Running InstructLab on a Lenovo ThinkPad X1 Carbon Gen 12
This notebook's hardware is advertised as “Powered by Intel® Core™ Ultra processors, with integrated AI”, and indeed there seem to be dedicated devices present for that purpose:
% lspci
[...]
00:08.0 System peripheral: Intel Corporation Meteor Lake-P Gaussian & Neural-Network Accelerator (rev 20)
[...]
00:0b.0 Processing accelerators: Intel Corporation Meteor Lake NPU (rev 04)
[...]
For the second one above, there is a kernel driver enabled by the CONFIG_DRM_ACCEL_IVPU symbol. In the config menu, it sits under:
> Device Drivers > Compute Acceleration Framework
When built as a module, it is called intel_vpu.ko.
Once the driver is loaded, a /dev/accel/accel0 node appears in devtmpfs, and sysfs gains a new class/accel/accel0 symlink pointing at the PCI device. Interesting attributes in /sys/class/accel/accel0/device:
| Attribute | Description |
| --- | --- |
| npu_busy_time_us | The time this NPU spent executing jobs (in µs) |
| npu_memory_utilization | Memory currently used (in bytes) |
| npu_current_frequency_mhz | Current clock frequency (in MHz) |
| npu_max_frequency_mhz | Maximum clock frequency (in MHz) |
(The latter three are available since linux-6.15.)
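For keeping an eye on these counters while experimenting, a few lines of Python suffice. A minimal sketch, assuming the driver is loaded and each attribute file contains a single decimal value:

import time
from pathlib import Path

DEV = Path("/sys/class/accel/accel0/device")

def read_attr(name: str) -> int:
    # Each attribute file holds one decimal number (plus a trailing newline).
    return int((DEV / name).read_text())

busy_before = read_attr("npu_busy_time_us")
time.sleep(5)  # ...run a workload in another terminal meanwhile...
busy_delta = read_attr("npu_busy_time_us") - busy_before
print(f"NPU busy for {busy_delta / 1e6:.3f} s "
      f"at {read_attr('npu_current_frequency_mhz')} MHz")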
A first look at InstructLab
The GitHub page has installation instructions, but they offer only four choices:
- Install with Apple Metal (accelerators in recent Macbooks)
- Install with AMD ROCm (to utilize AMD GPUs)
- Install with Nvidia CUDA (utilizing NVIDIA GPUs)
- Install without acceleration (utilizing the CPU only)
After choosing the last variant and following the basic setup guide, serving a model and chatting with it basically works:
>>> How are you today?                                                  [S][default]
╭──────────────────────────── granite-7b-lab-Q4_K_M.gguf ────────────────────────────╮
│ Thank you for asking! I'm doing well today. I'm an AI language model, so I don't │
│ have feelings or emotions, but I'm here and ready to help you with any questions │
│ or tasks you might have. How can I assist you today? │
╰──────────────────────────────────────────────────────────── elapsed 7.078 seconds ─╯
Attempting to train the model shows weird behaviour, though: the busy ilab data generate command seems to read filesystem contents outside of the (modified) taxonomy repository, and moreover it follows symlinks, with unfortunate results:
% strace -fxp <ilab PID>
[...]
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-amd-mp2-plat.c", {st_mode=S_IFREG|0644, st_size=9621, ...}) = 0
[pid 21254] stat("./git/linux-minime/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/modules/lib/modules/6.16.0-rc1-00201-g746c9b4f6a27/build/drivers/i2c/busses/i2c-at91.h", {st_mode=S_IFREG|0644, st_size=6823, ...}) = 0
[...]
Apparently it has found the build symlink typically present in kernel module install directories. In this case, that symlink sits in a subdirectory of the very directory it points at, and the crawler is obviously ignorant of that, so it keeps descending into the same tree over and over. While it is busy following symlinks, the command does not react to Ctrl-C; it does terminate when sent SIGTERM via kill, at least.
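For comparison, Python's os.walk does not follow directory symlinks unless explicitly asked to, and if following them is desired, remembering the real path of every directory already visited avoids such loops. A minimal sketch of a loop-safe crawler (hypothetical, not InstructLab's actual code):

import os

def walk_no_loops(root: str):
    """Yield file paths below root, following directory symlinks but
    never entering the same real directory twice."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # prune descent: directory already visited
            continue
        seen.add(real)
        for name in filenames:
            yield os.path.join(dirpath, name)

# The 'build' symlink above resolves to an already-visited ancestor,
# so this crawler would prune it instead of descending forever.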
Backends of Backends
Leaving model training aside for now, a closer look at the ilab model serve --help output reveals there are two possible backends to use: vLLM and llama.cpp.
vLLM
The former claims to support Intel GPUs; its install page has a tab named “Intel XPU”. One has to build the package from source, but apart from vague requirements to install Intel GPU drivers and oneAPI, the instructions are pretty straightforward. As it turns out, installing the intel-compute-runtime package via the distribution's package manager seems to suffice.
Interestingly, the repository's requirements/xpu.txt file which the instructions point at references XPU-enabled builds of pytorch. There is a quick way of checking whether it is happy with the system so far:
% . /tmp/my_venv/bin/activate
(my_venv) % python
>>> import torch
>>> torch.xpu.is_available()
True
On Fedora 42, for instance, the module complains and returns False:
>>> torch.xpu.is_available()
/home/me/ilab_venv/lib64/python3.12/site-packages/torch/xpu/__init__.py:60: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:115.)
  return torch._C._xpu_getDeviceCount()
False
Another simple health check is the clinfo tool. If the intel-compute-runtime package is correctly installed, it should find the local GPU:
% clinfo -l
Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Graphics
This is the case on Fedora 42 as well, so clinfo alone is obviously not sufficient to check accelerator availability.
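A slightly more telling probe is to ask PyTorch itself which XPU devices it can actually use, since the XPU backend relies on its own runtime stack beyond mere OpenCL visibility. A minimal sketch (device names will vary):

import torch

if torch.xpu.is_available():
    # Enumerate the XPU devices PyTorch can actually use.
    for i in range(torch.xpu.device_count()):
        print(f"xpu:{i} -> {torch.xpu.get_device_name(i)}")
else:
    # clinfo finding an OpenCL device does not guarantee this branch
    # is avoided, as the Fedora 42 case shows.
    print("no usable XPU device")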
If things look fine, one may try to serve the model using the vLLM backend to see what happens. The output is pretty verbose, so the following listing omits large parts:
(my_venv) % ilab model serve --backend vllm
WARNING 2025-06-27 00:22:00,347 instructlab.model.backends.backends:96: The serving backend 'vllm' was configured explicitly, but the provided model is not compatible with it. The model was detected as 'llama-cpp, reason: model is a GGUF file.'. The backend startup sequence will continue with the configured backend but might fail.
[...]
DEBUG 06-27 00:22:07 [__init__.py:138] Checking if XPU platform is available.
[W627 00:22:08.949943771 OperatorEntry.cpp:154] Warning: Warning only once for all operators, other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:37
  new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/gpu/xpu/ATen/RegisterXPU_0.cpp:186 (function operator())
DEBUG 06-27 00:22:09 [__init__.py:146] Confirmed XPU platform is available.
[...]
WARNING 06-27 00:22:23 [_logger.py:68] device type=xpu is not supported by the V1 Engine. Falling back to V0.
WARNING 06-27 00:22:23 [_logger.py:68] Unknown device name intel(r) graphics, always use float16
WARNING 06-27 00:22:23 [_logger.py:68] bfloat16 is only supported on Intel Data Center GPU, Intel Arc GPU is not supported yet. Your device is Intel(R) Graphics, which is not supported. will fallback to float16
WARNING 06-27 00:22:23 [_logger.py:68] CUDA graph is not supported on XPU, fallback to the eager mode.
ERROR 06-27 00:22:23 [xpu.py:108] Both start methods (spawn and fork) have issue on XPU if you use mp backend, setting it to ray instead.
[...]
WARNING 06-27 00:23:20 [_logger.py:68] No existing RAY instance detected. A new instance will be launched with current node resources.
[...]
ERROR 06-27 00:23:42 [worker_base.py:622] NotImplementedError: The operator 'vllm::_apply_gguf_embedding' is not currently implemented for the XPU device. Please open a feature on https://github.com/intel/torch-xpu-ops/issues. You can set the environment variable `PYTORCH_ENABLE_XPU_FALLBACK=1` to use the CPU implementation as a fallback for XPU unimplemented operators. WARNING: this will bring unexpected performance compared with running natively on XPU.
[...]
RuntimeError: Engine process failed to start. See stack trace for the root cause.
A few things to notice from that:
- Maybe a different model format is required for vLLM
- From vLLM's point of view, XPU devices seem to be pretty restricted (or maybe just the consumer one in this notebook?)
- There is a CPU fallback for unimplemented operators (PYTORCH_ENABLE_XPU_FALLBACK=1). Enabling it doesn't help in this case though; the call then fails with:
NotImplementedError: Could not run 'vllm::_apply_gguf_embedding' with arguments from the 'CPU' backend.
The next try is with a model in Safetensors format:
(my_venv) % ilab model serve --backend vllm --model-path ~/.cache/instructlab/models/instructlab/granite-7b-lab
[...]
(raylet) [2025-06-27 01:05:09,708 E 20006 20006] (raylet) node_manager.cc:3193: 14 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: e8d0da19e18cda1181e90e67d93f6cb3cc3a6ebbbad9c52ea82cfea1, IP: 192.168.0.11) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 192.168.0.11`
The OOM condition seems like a dead end.
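A back-of-the-envelope estimate makes the OOM at least plausible: the Safetensors model is served as float16, whose weights alone come to about 13 GiB before any KV cache, while the Q4-quantized GGUF file served earlier needs only roughly a quarter of that (treating Q4_K as approximately 4 bits per parameter, slightly more in practice):

# Rough weight-memory estimate for a 7B-parameter model.
params = 7e9
for fmt, bytes_per_param in (("float16 (Safetensors)", 2.0), ("Q4_K (GGUF)", 0.5)):
    print(f"{fmt}: {params * bytes_per_param / 2**30:.1f} GiB")
# float16 (Safetensors): 13.0 GiB
# Q4_K (GGUF): 3.3 GiB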
llama.cpp
The GitHub page lists a number of supported backends; the interesting one is SYCL, as it is described as “primarily designed for Intel GPUs”.
To build with SYCL support, Intel's proprietary icx and icpx compilers need to be present. These come as a self-extracting archive with a binary installer, basically a worst-case scenario for anyone interested in system security.
A convenient way to recompile the library with these compilers is to force-reinstall the llama-cpp-python wheel using pip:
(my_venv) % pip cache remove llama_cpp_python
(my_venv) % . /opt/intel/oneapi/setvars.sh
(my_venv) % CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install --verbose --force-reinstall 'llama-cpp-python[server]'
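Before wiring it into ilab, the freshly built wheel can be asked directly whether a GPU backend was compiled in. This assumes llama-cpp-python exposes llama.cpp's low-level C API function for this, which recent versions do; treat it as a sketch:

import llama_cpp

# True only if the library was built with a GPU backend (SYCL in this case).
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())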
With the freshly built library in place, GPU offloading may be verified by inspecting the debug output printed by ilab with the --verbose option:
(my_venv) % ilab --verbose model serve
[...]
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device SYCL0, is_swa = 0
load_tensors: layer 1 assigned to device SYCL0, is_swa = 0
[...]
load_tensors: layer 31 assigned to device SYCL0, is_swa = 0
load_tensors: layer 32 assigned to device SYCL0, is_swa = 0
load_tensors: tensor 'token_embd.weight' (q4_K) (and 0 others) cannot be used with preferred buffer type SYCL_Host, using CPU instead
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
Response time when chatting with the model got worse, though:
>>> How are you today?                                                  [S][default]
╭──────────────────────────── granite-7b-lab-Q4_K_M.gguf ────────────────────────────╮
│ Thank you for asking! I'm doing well today. I'm an AI language model, so I don't │
│ have feelings or emotions, but I'm fully operational and ready to assist you with │
│ any questions or tasks you might have. How can I help you today? │
╰─────────────────────────────────────────────────────────── elapsed 19.407 seconds ─╯
This does not seem right. Also, the contents of the various /sys/class/accel/accel0/device/npu_* files remain unchanged. To be fair, llama.cpp's SYCL backend targets the integrated GPU rather than the NPU, so idle npu_* counters alone do not prove the offload is broken; they merely show the NPU takes no part in this workload. So either the offloading is not functional as intended, or it is simply not used for this specific use case; in the latter case there should not be a difference in performance, though.
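Since the SYCL backend is supposed to load the GPU rather than the NPU, watching both at once during a chat is more conclusive. A minimal sketch; the gt_cur_freq_mhz path is the classic i915 location and may differ between kernels:

import time
from pathlib import Path

NPU_BUSY = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
GPU_FREQ = Path("/sys/class/drm/card0/gt_cur_freq_mhz")  # i915; adjust if needed

# Sample once per second while chatting with the model in another terminal:
# if offloading works, the GPU frequency should ramp up while the NPU
# busy counter stays flat.
for _ in range(10):
    print(f"npu busy: {int(NPU_BUSY.read_text()):>12} us, "
          f"gpu freq: {int(GPU_FREQ.read_text()):>4} MHz")
    time.sleep(1)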
Summary
While all involved software components allegedly support offloading to the notebook's Intel GPU, doing so leads to a (slightly) worse user experience in the best case and broken functionality in the worst case.
Many questions remain, though; more investigation is needed for a better picture. The most promising direction seems to be the llama.cpp backend: find out why the NPU performance counters do not increase, whether the NPU is even supposed to be used here, or which use case would actually leverage it.