====== Running InstructLab on a Lenovo ThinkPad X1 Carbon Gen 12 ======

This notebook's hardware is advertised as [[https://www.lenovo.com/us/en/p/laptops/thinkpad/thinkpadx1/thinkpad-x1-carbon-gen-12-14-inch-intel/len101t0083|Powered by Intel® Core™ Ultra processors, with integrated AI]],
and indeed there seem to be dedicated devices present for that purpose:

% lspci
[...]
00:08.0 System peripheral: Intel Corporation Meteor Lake-P Gaussian & Neural-Network Accelerator (rev 20)
[...]
00:0b.0 Processing accelerators: Intel Corporation Meteor Lake NPU (rev 04)
[...]

For the second one above there is a kernel driver, enabled by the
//CONFIG_DRM_ACCEL_IVPU// symbol. In the config menu, it sits in:
> Device Drivers > Compute Acceleration Framework
When built as a module, it is called //intel_vpu.ko//.
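
Whether that driver actually ended up loaded on a given system can be checked
without digging through dmesg; a minimal sketch that only parses
///proc/modules// and is not InstructLab-specific:

  # Sketch: check whether the intel_vpu module is currently loaded.
  # Note: a driver built into the kernel (=y) will not show up here.
  from pathlib import Path

  def intel_vpu_loaded() -> bool:
      return any(line.split()[0] == "intel_vpu"
                 for line in Path("/proc/modules").read_text().splitlines())

  print("intel_vpu loaded:", intel_vpu_loaded())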

Once it is loaded, a ///dev/accel/accel0// node appears in //devtmpfs// and
//sysfs// gains a new //class/accel/accel0// symlink pointing at the PCI
device. Interesting attributes in ///sys/class/accel/accel0/device//:
| npu_busy_time_us | The time this NPU spent executing jobs (in us) |
| npu_memory_utilization | Memory currently used (in bytes) |
| npu_current_frequency_mhz | Current clock frequency (in MHz) |
| npu_max_frequency_mhz | Maximum clock frequency (in MHz) |
(The latter three are available since linux-6.15.)
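
These counters come in handy later for checking whether the NPU is actually
being used. A small helper to dump them, just a sketch reading the sysfs files
listed above:

  # Sketch: print the NPU counters from sysfs (most need linux >= 6.15).
  from pathlib import Path

  NPU_SYSFS = Path("/sys/class/accel/accel0/device")

  for attr in sorted(NPU_SYSFS.glob("npu_*")):
      print(f"{attr.name}: {attr.read_text().strip()}")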

====== A first look at InstructLab ======

The [[https://github.com/instructlab/instructlab|GitHub page]] has installation
instructions, but they offer only four choices:

* Install with Apple Metal (accelerators in recent MacBooks)
* Install with AMD ROCm (to utilize AMD GPUs)
* Install with NVIDIA CUDA (utilizing NVIDIA GPUs)
* Install without acceleration (utilizing the CPU only)

After choosing the last variant and following the basic setup guide, serving a
model and chatting with it basically works:
>>> How are you today? [S][default]
╭──────────────────────────── granite-7b-lab-Q4_K_M.gguf ────────────────────────────╮
│ Thank you for asking! I'm doing well today. I'm an AI language model, so I don't │
│ have feelings or emotions, but I'm here and ready to help you with any questions │
│ or tasks you might have. How can I assist you today? │
╰──────────────────────────────────────────────────────────── elapsed 7.078 seconds ─╯
Attempting to train the model shows weird behaviour, though: the busy ''ilab
data generate'' command seems to read filesystem contents outside of the
(modified) taxonomy repository, and moreover it seems to follow symlinks, with
unintended results:
% strace -fxp <ilab PID>
[...]
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-amd-mp2-plat.c", {st_mode=S_IFREG|0644, st_size=9621, ...}) = 0 | |
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-at91.h", {st_mode=S_IFREG|0644, st_size=6823, ...}) = 0 | |
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-parport.c", {st_mode=S_IFREG|0644, st_size=10747, ...}) = 0 | |
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-cadence.c", {st_mode=S_IFREG|0644, st_size=46715, ...}) = 0 | |
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-npcm7xx.c", {st_mode=S_IFREG|0644, st_size=71028, ...}) = 0 | |
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-mv64xxx.c", {st_mode=S_IFREG|0644, st_size=31017, ...}) = 0 | |
Apparently it has found the //build// symlink typically present in kernel module
install directories. In this case, that symlink sits in a subdirectory of the
directory it points at, and the crawler is obviously ignorant of that. While it
is busy following symlinks, the command does not react to the CTRL-C key
combination; it does terminate when sent SIGTERM via ''kill'', at least.
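
For comparison, a crawler that insists on following symlinks would have to
remember which directories it has already visited (keyed by device and inode)
to avoid exactly this kind of loop. A rough sketch, not taken from InstructLab:

  # Sketch: walk a tree while following symlinks, but never descend into a
  # directory that was already visited (identified by st_dev/st_ino).
  import os

  def safe_walk(top):
      seen = set()
      for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
          st = os.stat(dirpath)
          key = (st.st_dev, st.st_ino)
          if key in seen:
              dirnames[:] = []  # prune: already seen via another path
              continue
          seen.add(key)
          yield dirpath, filenames

  # Example: count files below ~/git without getting stuck in symlink cycles.
  print(sum(len(files) for _, files in safe_walk(os.path.expanduser("~/git"))))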

====== Backends of Backends ======

Leaving model training aside for now, a closer look at ''ilab model serve
--help'' output reveals there are two possible backends to use:
[[https://github.com/vllm-project/vllm|vLLM]] and
[[https://github.com/ggml-org/llama.cpp|llama.cpp]].

===== vLLM =====

The former claims to support Intel GPUs: its
[[https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html|install
page]] has a tab named "Intel XPU". One has to build the package from source,
but apart from vague requirements to install Intel GPU drivers and oneAPI, the
instructions are pretty straightforward. As it turns out, installing the
//intel-compute-runtime// package via the distribution's package manager seems
to suffice.

Interestingly, the repository's //requirements/xpu.txt// file, which the
instructions point at, references XPU-enabled builds of ''pytorch''. There is a
quick way of checking whether it is happy with the system so far:
% . /tmp/my_venv/bin/activate
(my_venv) % python
>>> import torch
>>> torch.xpu.is_available()
True
On Fedora 42, for instance, the module complains and returns False:
>>> torch.xpu.is_available()
/home/me/ilab_venv/lib64/python3.12/site-packages/torch/xpu/__init__.py:60: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:115.)
return torch._C._xpu_getDeviceCount()
False
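
When the check does succeed, a couple more calls show which device ''pytorch''
has actually picked up; just a quick sanity check in the same venv, assuming the
XPU-enabled build referenced by //requirements/xpu.txt//:

  # Sketch: list the XPU devices an XPU-enabled pytorch build can see.
  import torch

  if torch.xpu.is_available():
      for i in range(torch.xpu.device_count()):
          print(i, torch.xpu.get_device_name(i))
  else:
      print("no XPU device available")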

Another simple health check is via the ''clinfo'' tool. If the //intel-compute-runtime//
package is correctly installed, it should find the local GPU:
% clinfo -l
Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Graphics
This is the case on Fedora 42 as well, so this check alone is obviously not
sufficient to verify accelerator availability.
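
The same listing can be reproduced from Python if the third-party //pyopencl//
package happens to be installed; it is not required by anything here, just
another way to poke at the OpenCL stack:

  # Sketch: enumerate OpenCL platforms/devices, roughly what `clinfo -l` shows.
  import pyopencl as cl  # third-party package, not part of the ilab setup

  for p_idx, platform in enumerate(cl.get_platforms()):
      print(f"Platform #{p_idx}: {platform.name}")
      for d_idx, device in enumerate(platform.get_devices()):
          print(f"  Device #{d_idx}: {device.name}")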

If things look fine, one may try to serve the model using the vLLM backend to see
what happens. The output is quite verbose, so the following listing omits
large parts:
(my_venv) % ilab model serve --backend vllm
WARNING 2025-06-27 00:22:00,347 instructlab.model.backends.backends:96: The serving backend 'vllm' was configured explicitly, but the provided model is not compatible with it. The model was detected as 'llama-cpp, reason: model is a GGUF file.'.
The backend startup sequence will continue with the configured backend but might fail.
[...]
DEBUG 06-27 00:22:07 [__init__.py:138] Checking if XPU platform is available.
[W627 00:22:08.949943771 OperatorEntry.cpp:154] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:37
new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/gpu/xpu/ATen/RegisterXPU_0.cpp:186 (function operator())
DEBUG 06-27 00:22:09 [__init__.py:146] Confirmed XPU platform is available.
[...]
WARNING 06-27 00:22:23 [_logger.py:68] device type=xpu is not supported by the V1 Engine. Falling back to V0.
WARNING 06-27 00:22:23 [_logger.py:68] Unknown device name intel(r) graphics, always use float16
WARNING 06-27 00:22:23 [_logger.py:68] bfloat16 is only supported on Intel Data Center GPU, Intel Arc GPU is not supported yet. Your device is Intel(R) Graphics, which is not supported. will fallback to float16
WARNING 06-27 00:22:23 [_logger.py:68] CUDA graph is not supported on XPU, fallback to the eager mode.
ERROR 06-27 00:22:23 [xpu.py:108] Both start methods (spawn and fork) have issue on XPU if you use mp backend, setting it to ray instead.
[...]
WARNING 06-27 00:23:20 [_logger.py:68] No existing RAY instance detected. A new instance will be launched with current node resources.
[...]
ERROR 06-27 00:23:42 [worker_base.py:622] NotImplementedError: The operator 'vllm::_apply_gguf_embedding' is not currently implemented for the XPU device. Please open a feature on https://github.com/intel/torch-xpu-ops/issues. You can set the environment variable `PYTORCH_ENABLE_XPU_FALLBACK=1` to use the CPU implementation as a fallback for XPU unimplemented operators. WARNING: this will bring unexpected performance compared with running natively on XPU.
[...]
RuntimeError: Engine process failed to start. See stack trace for the root cause.
A few things to notice from that:
* Maybe a different model is required for vLLM
* From vLLM's point of view, XPU devices seem to be pretty restricted (or maybe just the consumer one in this notebook?)
* There is a CPU fallback for unsupported operators. In this case it does not help, though: the call fails with ''NotImplementedError: Could not run 'vllm::_apply_gguf_embedding' with arguments from the 'CPU' backend.''

The next try is with a model in Safetensors format:
(my_venv) % ilab model serve --backend vllm --model-path ~/.cache/instructlab/models/instructlab/granite-7b-lab
[...]
(raylet) [2025-06-27 01:05:09,708 E 20006 20006] (raylet) node_manager.cc:3193: 14 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: e8d0da19e18cda1181e90e67d93f6cb3cc3a6ebbbad9c52ea82cfea1, IP: 192.168.0.11) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 192.168.0.11`
The OOM condition seems like a dead end.
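
The numbers make that plausible: a dense 7B-parameter model at the float16
precision vLLM fell back to needs roughly 13 GiB for the weights alone, before
the KV cache and whatever vLLM preallocates; on a laptop where CPU and GPU share
the same memory that does not leave much headroom. A quick back-of-the-envelope
check:

  # Rough estimate of weight memory for granite-7b-lab served at float16.
  params = 7e9          # ~7 billion parameters
  bytes_per_param = 2   # float16, as per the warnings in the vLLM log above
  print(f"~{params * bytes_per_param / 2**30:.1f} GiB for the weights alone")
  # prints: ~13.0 GiB for the weights alone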

===== llama.cpp =====

The GitHub page lists a number of
[[https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#supported-backends|supported backends]];
the interesting one is
[[https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md|SYCL]],
as it is described as "primarily designed for Intel GPUs".

To build with SYCL support, Intel's proprietary //icx// and //icpx// compilers
need to be present. These come in a
[[https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html|self-extracting archive with a binary installer]],
so basically a worst-case scenario for anyone interested in system security.

A convenient way to recompile the library is to reinstall the
//llama-cpp-python// wheel using pip:
(my_venv) % pip cache remove llama_cpp_python
(my_venv) % . /opt/intel/oneapi/setvars.sh
(my_venv) % CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install --verbose --force-reinstall 'llama-cpp-python[server]'
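
Whether the reinstalled wheel was actually built with offload support can be
checked directly, without going through ilab. A sketch using the low-level
binding; ''llama_supports_gpu_offload()'' mirrors the corresponding llama.cpp
C API function and is assumed to be exposed by the installed
//llama-cpp-python// version:

  # Sketch: ask the freshly built llama.cpp library whether it can offload.
  import llama_cpp

  print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())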

With the freshly built library in place, GPU offloading can be verified by
inspecting the debug output ilab prints when given the //--verbose// option:
(my_venv) % ilab --verbose model serve
[...]
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device SYCL0, is_swa = 0
load_tensors: layer 1 assigned to device SYCL0, is_swa = 0
[...]
load_tensors: layer 31 assigned to device SYCL0, is_swa = 0
load_tensors: layer 32 assigned to device SYCL0, is_swa = 0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type SYCL_Host, using CPU instead
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
Response time when chatting with the model increased, though:
>>> How are you today? [S][default]
╭──────────────────────────── granite-7b-lab-Q4_K_M.gguf ────────────────────────────╮
│ Thank you for asking! I'm doing well today. I'm an AI language model, so I don't │
│ have feelings or emotions, but I'm fully operational and ready to assist you with │
│ any questions or tasks you might have. How can I help you today? │
╰─────────────────────────────────────────────────────────── elapsed 19.407 seconds ─╯
This does not seem right. Also, the contents of the various
///sys/class/accel/accel0/device/npu_*// files remain unchanged. So either the
offloading is not working as intended, or it is simply not used for this
specific use-case; in the latter case, though, there should not be a difference
in performance at all.
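
One way to narrow this down is to drive the served model directly over its
OpenAI-compatible HTTP API while sampling the NPU counter from above around the
request. A rough sketch; the listen address below assumes ilab's default of
''127.0.0.1:8000'', and the model name in the request body is just illustrative:

  # Sketch: time one chat completion and check whether npu_busy_time_us moves.
  import json, time, urllib.request
  from pathlib import Path

  BUSY = Path("/sys/class/accel/accel0/device/npu_busy_time_us")

  def ask(prompt):
      payload = {"model": "granite-7b-lab-Q4_K_M.gguf",   # assumed model name
                 "messages": [{"role": "user", "content": prompt}]}
      req = urllib.request.Request(
          "http://127.0.0.1:8000/v1/chat/completions",    # assumed ilab default
          data=json.dumps(payload).encode(),
          headers={"Content-Type": "application/json"})
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)

  busy_before = int(BUSY.read_text())
  start = time.monotonic()
  ask("How are you today?")
  elapsed = time.monotonic() - start
  busy_after = int(BUSY.read_text())
  print(f"elapsed {elapsed:.3f}s, npu_busy_time_us delta {busy_after - busy_before}")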

====== Summary ======

While all involved software components allegedly support offloading to the
notebook's Intel GPU, doing so leads to a (slightly) worse user experience in
the best case and breaks functionality in the worst case.

Many questions remain, though, and more investigation is needed for a better
picture. The most promising direction seems to be the //llama.cpp// backend:
finding out why the NPU performance counters do not increase, whether the NPU
is used at all when it should be, and which use-case would actually leverage
it.
| |