### Your current environment
<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version : Could not collect
CMake version : version 3.22.1
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.7.1+cu126
Is debug build : False
CUDA used to build PyTorch : 12.6
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.11.13 | packaged by conda-forge | (main, Jun 4 2025, 14:48:23) [GCC 13.3.0] (64-bit runtime)
Python platform : Linux-6.8.0-1017-aws-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.4.131
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration : GPU 0: NVIDIA A10G
Nvidia driver version : 550.127.05
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R32
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.1
[pip3] torchaudio==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.55.4
[pip3] triton==3.3.1
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.6.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.5.1.17 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.0.4 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.11.1.6 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.1.2 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.4.2 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pyzmq 27.0.2 py311h2315fbb_2 conda-forge
[conda] torch 2.7.1 pypi_0 pypi
[conda] torchaudio 2.7.1 pypi_0 pypi
[conda] torchvision 0.22.1 pypi_0 pypi
[conda] transformers 4.55.4 pypi_0 pypi
[conda] triton 3.3.1 pypi_0 pypi
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.10.1rc2.dev346+ga1da2f381.d20250831 (git sha: a1da2f381, date: 20250831)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-15 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
VLLM_USE_PRECOMPILED=1
LD_LIBRARY_PATH=/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
```
</details>

### 🐛 Describe the bug
* In `gemma3n_mm.py::_process_audio_input`, vLLM calls
  `audio_input["input_features"].squeeze(1)`.
* For batched audio requests, `input_features` arrives as a Python list of tensors rather than a single tensor, so the call raises `AttributeError: 'list' object has no attribute 'squeeze'` and the EngineCore process dies.
* As a result, every in-flight and subsequent request to `/v1/chat/completions` returns HTTP 500, followed by an NCCL shutdown warning.

A repro sketch follows; the full server logs and a possible guard are attached after it.
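A minimal repro sketch, assuming the server from the logs below (`google/gemma-3n-E2B-it` on `http://localhost:8000`). The audio URL is a placeholder, and the `audio_url` content part uses vLLM's OpenAI-compatible multimodal extension. Firing several requests at once lets the scheduler co-schedule their audio inputs in one engine step, which is what hands `_process_audio_input` a list:

```python
# Hedged repro sketch (not from the original report): concurrent audio
# requests make the engine batch multimodal inputs in a single step.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

AUDIO_URL = "https://example.com/sample.wav"  # placeholder audio file

async def one_request() -> str:
    resp = await client.chat.completions.create(
        model="google/gemma-3n-E2B-it",
        messages=[{
            "role": "user",
            "content": [
                {"type": "audio_url", "audio_url": {"url": AUDIO_URL}},
                {"type": "text", "text": "Transcribe this audio."},
            ],
        }],
    )
    return resp.choices[0].message.content

async def main() -> None:
    # 8 in-flight requests are enough for several audio encoder inputs
    # to land in the same engine step; 500s are printed, not raised.
    results = await asyncio.gather(*[one_request() for _ in range(8)],
                                   return_exceptions=True)
    for r in results:
        print(r)

asyncio.run(main())
```

A single request completes fine; the crash appears only once two or more audio requests are scheduled together.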
<details>
<summary>Logs from the <code>vLLM API server</code></summary>

```text
(APIServer pid=479785) [vllm]WARNING 08-31 16:47:37 [__init__.py:1671] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_responses.py:126] Using default chat sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_chat.py:137] Using default chat sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_completion.py:79] Using default completion sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [api_server.py:1960] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:36] Available routes are:
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /docs, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /redoc, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /health, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /load, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /ping, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /ping, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /tokenize, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /detokenize, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/models, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /version, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/chat/completions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/completions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/embeddings, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /pooling, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /classify, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /score, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/score, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/audio/translations, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v2/rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /invocations, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /metrics, Methods: GET
(APIServer pid=479785) INFO: Started server process [479785]
(APIServer pid=479785) INFO: Waiting for application startup.
(APIServer pid=479785) INFO: Application startup complete.
(APIServer pid=479785) [vllm]INFO 08-31 16:47:50 [chat_utils.py:470] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
(APIServer pid=479785) [vllm]DEBUG 08-31 16:47:58 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(EngineCore_0 pid=480353) [vllm]DEBUG 08-31 16:48:07 [core.py:746] EngineCore loop active.
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.1rc2.dev346+ga1da2f381.d20250831) with config: model='google/gemma-3n-E2B-it', speculative_config=None, tokenizer='google/gemma-3n-E2B-it', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=google/gemma-3n-E2B-it, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null},
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-c7174594ad734987a8be9044aa54665e,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198], [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211], [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224], [112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237], [149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250]),num_computed_tokens=384,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-edd61485740249969aef39374e833eb8,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 251], [0, 0, 0, 0, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 252], [0, 0, 0, 0, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 253], [0, 0, 0, 0, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 254], [0, 0, 0, 0, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 255]),num_computed_tokens=576,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-073eb6d113954830a030a4a120e71394,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 256], [0, 0, 0, 0, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 257], [0, 0, 0, 0, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 258], [0, 0, 0, 0, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 259], [0, 0, 0, 0, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 260]),num_computed_tokens=576,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -3.3750, -3.4375, -3.7656, ..., -2.2812, -2.5156, -2.1719],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -3.9219, -3.3438, -2.9531, ..., -2.2812, -2.3281, -2.0938],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -4.2500, -3.6562, -3.1719, ..., -2.1719, -2.4844, -2.4375]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['1730b4fec2eab85509df76cfb7201ff0fb0bb0f9e9af266fa3398b172ad9469b'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273], [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286], [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299], [112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312], [149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325]),num_computed_tokens=384,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None))], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-9dccaf1e7b51468b8bcaf31ce2590bfb'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[null], num_computed_tokens=[588]), num_scheduled_tokens={chatcmpl-edd61485740249969aef39374e833eb8: 12, chatcmpl-c7174594ad734987a8be9044aa54665e: 204, chatcmpl-9dccaf1e7b51468b8bcaf31ce2590bfb: 1, chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb: 204, chatcmpl-073eb6d113954830a030a4a120e71394: 12}, total_num_scheduled_tokens=433, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb: [0], chatcmpl-c7174594ad734987a8be9044aa54665e: [0]}, num_common_prefix_blocks=[24, 0, 0, 0, 0], finished_req_ids=[], free_encoder_mm_hashes=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=5, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.006680232677642839, prefix_cache_stats=PrefixCacheStats(reset=False, requests=4, queries=2352, hits=1920), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] EngineCore encountered a fatal error.
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] Traceback (most recent call last):
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 705, in run_engine_core
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] engine_core.run_busy_loop()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 732, in run_busy_loop
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] self._process_engine_step()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 757, in _process_engine_step
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 291, in step
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 277, in execute_model_with_error_logging
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] raise err
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 268, in execute_model_with_error_logging
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return model_fn(scheduler_output)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 95, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] output = self.collective_rpc("execute_model",
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] output = self.model_runner.execute_model(scheduler_output,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1519, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] self._execute_mm_encoder(scheduler_output)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1168, in _execute_mm_encoder
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] curr_group_outputs = self.model.get_multimodal_embeddings(
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 627, in get_multimodal_embeddings
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] audio_embeddings = self._process_audio_input(multimodal_input)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 577, in _process_audio_input
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] input_features = audio_input["input_features"].squeeze(1)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] AttributeError: 'list' object has no attribute 'squeeze'
(EngineCore_0 pid=480353) Process EngineCore_0:
(EngineCore_0 pid=480353) Traceback (most recent call last):
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=480353) self.run()
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=480353) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_0 pid=480353) raise e
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 705, in run_engine_core
(EngineCore_0 pid=480353) engine_core.run_busy_loop()
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 732, in run_busy_loop
(EngineCore_0 pid=480353) self._process_engine_step()
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 757, in _process_engine_step
(EngineCore_0 pid=480353) outputs, model_executed = self.step_fn()
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 291, in step
(EngineCore_0 pid=480353) model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 277, in execute_model_with_error_logging
(EngineCore_0 pid=480353) raise err
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 268, in execute_model_with_error_logging
(EngineCore_0 pid=480353) return model_fn(scheduler_output)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 95, in execute_model
(EngineCore_0 pid=480353) output = self.collective_rpc("execute_model",
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=480353) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(EngineCore_0 pid=480353) output = self.model_runner.execute_model(scheduler_output,
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1519, in execute_model
(EngineCore_0 pid=480353) self._execute_mm_encoder(scheduler_output)
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1168, in _execute_mm_encoder
(EngineCore_0 pid=480353) curr_group_outputs = self.model.get_multimodal_embeddings(
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 627, in get_multimodal_embeddings
(EngineCore_0 pid=480353) audio_embeddings = self._process_audio_input(multimodal_input)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 577, in _process_audio_input
(EngineCore_0 pid=480353) input_features = audio_input["input_features"].squeeze(1)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) AttributeError: 'list' object has no attribute 'squeeze'
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] AsyncLLM output_handler failed.
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] Traceback (most recent call last):
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/async_llm.py", line 412, in output_handler
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] outputs = await engine_core.get_output_async()
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core_client.py", line 843, in get_output_async
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] raise self._format_exception(outputs) from None
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) [vllm]INFO 08-31 16:48:08 [loggers.py:123] Engine 000: Avg prompt throughput: 58.8 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
[rank0]:[W831 16:48:09.099020087 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: Shutting down
(APIServer pid=479785) INFO: Waiting for application shutdown.
(APIServer pid=479785) INFO: Application shutdown complete.
(APIServer pid=479785) INFO: Finished server process [479785]
```
</details>
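For reference, a guard along these lines sidesteps the crash in a local build. This is a hedged sketch under the assumption that batched items share one feature shape, not the upstream fix, and `_normalize_input_features` is a hypothetical helper name:

```python
import torch

def _normalize_input_features(input_features):
    """Accept either a (B, 1, T, F) tensor or a list of per-request
    (1, T, F) / (T, F) tensors and return a single (B, T, F) tensor."""
    if isinstance(input_features, torch.Tensor):
        # Single-tensor path: drop the singleton num-items dim,
        # matching the existing .squeeze(1) behavior.
        return input_features.squeeze(1)
    # Batched path: squeeze each item's leading dim, then stack.
    # NOTE: assumes all items share one (T, F) shape; ragged inputs
    # would instead need padding driven by input_features_mask.
    items = [t.squeeze(0) if t.dim() == 3 else t for t in input_features]
    return torch.stack(items, dim=0)
```

Inside `_process_audio_input`, `input_features = _normalize_input_features(audio_input["input_features"])` would then replace the bare `.squeeze(1)` call.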
### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.