Описание
vLLM is an inference and serving engine for large language models (LLMs). From versions 0.10.2 to before 0.11.1, a memory corruption vulnerability could lead to a crash (denial-of-service) and potentially remote code execution (RCE), exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
A vulnerability in vLLM allows attackers to supply malicious serialized prompt-embedding tensors that are deserialized using torch.load() without validation. Due to PyTorch 2.8.0 disabling sparse-tensor integrity checks by default, a crafted tensor can bypass bounds checks and cause an out-of-bounds write during to_dense(), leading to a crash (DoS) and potentially remote code execution on the vLLM server.
Отчет
This vulnerability is considered important rather than moderate because it involves unsafe deserialization leading to memory corruption in a network-reachable, unauthenticated API path. Unlike typical moderate flaws that may only allow limited DoS or require specific conditions, this issue allows an attacker to supply a crafted sparse tensor that triggers an out-of-bounds memory write during PyTorch’s to_dense() conversion. Memory corruption in a server process handling untrusted input significantly elevates security risk because it can lead not only to a reliable crash but also to potential remote code execution, enabling full compromise of the vLLM service. Additionally, the affected code path is part of the standard Completions API workflow, making the attack surface broadly exposed in real deployments. The combination of remote exploitability, unauthenticated access, memory corruption, and potential RCE clearly positions this issue above a moderate classification and into an important severity level.
Меры по смягчению последствий
No mitigation is currently available that meets Red Hat Product Security’s standards for usability, deployment, applicability, or stability.
Затронутые пакеты
| Платформа | Пакет | Состояние | Рекомендация | Релиз |
|---|---|---|---|---|
| Red Hat AI Inference Server | rhaiis/vllm-spyre-rhel9 | Affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-amd-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-aws-nvidia-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-azure-amd-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-azure-nvidia-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-gcp-nvidia-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-intel-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-nvidia-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/instructlab-amd-rhel9 | Not affected | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/instructlab-intel-rhel9 | Not affected |
Показывать по
Ссылки на источники
Дополнительная информация
Статус:
8.8 High
CVSS3
Связанные уязвимости
vLLM is an inference and serving engine for large language models (LLMs). From versions 0.10.2 to before 0.11.1, a memory corruption vulnerability could lead to a crash (denial-of-service) and potentially remote code execution (RCE), exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
vLLM is an inference and serving engine for large language models (LLM ...
vLLM deserialization vulnerability leading to DoS and potential RCE
Уязвимость компонента Completions API библиотеки для работы с большими языковыми моделями (LLM) vLLM, позволяющая нарушителю вызвать отказ в обслуживании и выполнить произвольный код
8.8 High
CVSS3