Описание
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
A vulnerability in vLLM allows an authenticated user to trigger unintended tokenization during chat template processing by supplying crafted chat_template_kwargs to the /v1/chat/completions or /tokenize endpoints. By forcing the server to tokenize very large inputs, an attacker can block the API server’s event loop for extended periods, causing a denial of service and delaying all other requests.
Отчет
The flaw is limited to a denial-of-service vector that requires an authenticated user and relies on abusing an optional, non-security-critical parameter (chat_template_kwargs) to force unexpected tokenization during template application, which is computationally expensive but not indicative of data corruption, privilege escalation, or code execution. The attacker cannot break isolation boundaries or execute arbitrary logic—they can only cause the server’s event loop to stall through large crafted inputs, and only if they already have access to the vLLM API. Moreover, the DoS condition is resource-intensive, depends heavily on model size and server configuration, and does not persist once the malicious request completes. Because the impact is bounded to temporary availability degradation without confidentiality or integrity loss, and because exploitation requires legitimate API access and large payloads, this issue aligns with a Moderate severity rather than an Important/High flaw.
Меры по смягчению последствий
No mitigation is currently available that meets Red Hat Product Security’s standards for usability, deployment, applicability, or stability.
Затронутые пакеты
| Платформа | Пакет | Состояние | Рекомендация | Релиз |
|---|---|---|---|---|
| Red Hat AI Inference Server | rhaiis/vllm-spyre-rhel9 | Fix deferred | ||
| Red Hat AI Inference Server | rhaiis/vllm-tpu-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-amd-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-aws-nvidia-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-azure-amd-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-azure-nvidia-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-gcp-nvidia-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-intel-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/bootc-nvidia-rhel9 | Fix deferred | ||
| Red Hat Enterprise Linux AI (RHEL AI) | rhelai1/instructlab-amd-rhel9 | Fix deferred |
Показывать по
Ссылки на источники
Дополнительная информация
Статус:
EPSS
6.5 Medium
CVSS3
Связанные уязвимости
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
vLLM is an inference and serving engine for large language models (LLM ...
vLLM vulnerable to DoS via large Chat Completion or Tokenization requests with specially crafted `chat_template_kwargs`
Уязвимость библиотеки для работы с большими языковыми моделями (LLM) vLLM, связанная с неограниченным распределением ресурсов, позволяющая нарушителю вызвать отказ в обслуживании
EPSS
6.5 Medium
CVSS3