### Your current environment
<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version : Could not collect
CMake version : version 3.22.1
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.7.1+cu126
Is debug build : False
CUDA used to build PyTorch : 12.6
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.11.13 | packaged by conda-forge | (main, Jun 4 2025, 14:48:23) [GCC 13.3.0] (64-bit runtime)
Python platform : Linux-6.8.0-1017-aws-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.4.131
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration : GPU 0: NVIDIA A10G
Nvidia driver version : 550.127.05
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R32
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.1
[pip3] torchaudio==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.55.4
[pip3] triton==3.3.1
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.6.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.5.1.17 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.0.4 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.11.1.6 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.1.2 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.4.2 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pyzmq 27.0.2 py311h2315fbb_2 conda-forge
[conda] torch 2.7.1 pypi_0 pypi
[conda] torchaudio 2.7.1 pypi_0 pypi
[conda] torchvision 0.22.1 pypi_0 pypi
[conda] transformers 4.55.4 pypi_0 pypi
[conda] triton 3.3.1 pypi_0 pypi
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.10.1rc2.dev346+ga1da2f381.d20250831 (git sha: a1da2f381, date: 20250831)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-15 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
VLLM_USE_PRECOMPILED=1
LD_LIBRARY_PATH=/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
```
</details>

### 🐛 Describe the bug
* In `gemma3n_mm.py::_process_audio_input`, vLLM calls
  `audio_input["input_features"].squeeze(1)`.
* For batched audio requests, `input_features` arrives as a Python list of tensors rather than a single tensor, so the call raises `AttributeError: 'list' object has no attribute 'squeeze'` and the EngineCore process dies.
* As a result, every in-flight and subsequent request to `/v1/chat/completions` returns HTTP 500, followed by an NCCL shutdown warning.

A repro sketch follows; the full server logs and a possible guard are attached after it.
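A minimal repro sketch, assuming the server from the logs below (`google/gemma-3n-E2B-it` on `http://localhost:8000`). The audio URL is a placeholder, and the `audio_url` content part uses vLLM's OpenAI-compatible multimodal extension. Firing several requests at once lets the scheduler co-schedule their audio inputs in one engine step, which is what hands `_process_audio_input` a list:

```python
# Hedged repro sketch (not from the original report): concurrent audio
# requests make the engine batch multimodal inputs in a single step.
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

AUDIO_URL = "https://example.com/sample.wav"  # placeholder audio file

async def one_request() -> str:
    resp = await client.chat.completions.create(
        model="google/gemma-3n-E2B-it",
        messages=[{
            "role": "user",
            "content": [
                {"type": "audio_url", "audio_url": {"url": AUDIO_URL}},
                {"type": "text", "text": "Transcribe this audio."},
            ],
        }],
    )
    return resp.choices[0].message.content

async def main() -> None:
    # 8 in-flight requests are enough for several audio encoder inputs
    # to land in the same engine step; 500s are printed, not raised.
    results = await asyncio.gather(*[one_request() for _ in range(8)],
                                   return_exceptions=True)
    for r in results:
        print(r)

asyncio.run(main())
```

A single request completes fine; the crash appears only once two or more audio requests are scheduled together.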
<details>
<summary>Logs from the <code>vLLM API server</code></summary>

```text
(APIServer pid=479785) [vllm]WARNING 08-31 16:47:37 [__init__.py:1671] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_responses.py:126] Using default chat sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_chat.py:137] Using default chat sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [serving_completion.py:79] Using default completion sampling params from model: {'top_k': 64, 'top_p': 0.95}
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [api_server.py:1960] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:36] Available routes are:
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /docs, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /redoc, Methods: HEAD, GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /health, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /load, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /ping, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /ping, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /tokenize, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /detokenize, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/models, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /version, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/chat/completions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/completions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/embeddings, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /pooling, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /classify, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /score, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/score, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/audio/translations, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v1/rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /v2/rerank, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /invocations, Methods: POST
(APIServer pid=479785) [vllm]INFO 08-31 16:47:37 [launcher.py:44] Route: /metrics, Methods: GET
(APIServer pid=479785) INFO: Started server process [479785]
(APIServer pid=479785) INFO: Waiting for application startup.
(APIServer pid=479785) INFO: Application startup complete.
(APIServer pid=479785) [vllm]INFO 08-31 16:47:50 [chat_utils.py:470] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
(APIServer pid=479785) [vllm]DEBUG 08-31 16:47:58 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(EngineCore_0 pid=480353) [vllm]DEBUG 08-31 16:48:07 [core.py:746] EngineCore loop active.
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.1rc2.dev346+ga1da2f381.d20250831) with config: model='google/gemma-3n-E2B-it', speculative_config=None, tokenizer='google/gemma-3n-E2B-it', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=google/gemma-3n-E2B-it, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,0],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null},
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-c7174594ad734987a8be9044aa54665e,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198], [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211], [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224], [112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237], [149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250]),num_computed_tokens=384,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-edd61485740249969aef39374e833eb8,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 251], [0, 0, 0, 0, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 252], [0, 0, 0, 0, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 253], [0, 0, 0, 0, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 254], [0, 0, 0, 0, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 255]),num_computed_tokens=576,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-073eb6d113954830a030a4a120e71394,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-5.4375, -6.1250, -5.2812, ..., -4.6562, -4.6250, -4.6562],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-5.1562, -5.3125, -6.1562, ..., -3.7656, -4.0938, -4.4688],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.2812, -6.0938, -4.9688, ..., -3.5938, -3.7344, -3.9375],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-6.3125, -5.0000, -4.1875, ..., -1.9688, -2.5312, -2.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.6406, -3.7031, -3.7344, ..., -2.8125, -2.0469, -2.6094],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-3.1562, -3.6250, -4.5938, ..., -2.4688, -1.7891, -2.5312]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['3b393679d5be4e9257e7a285dfaa25a0933ab1a6ef2252183a7254e3a495a6af'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 256], [0, 0, 0, 0, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 257], [0, 0, 0, 0, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 258], [0, 0, 0, 0, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 259], [0, 0, 0, 0, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 260]),num_computed_tokens=576,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None)), NewRequestData(req_id=chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb,prompt_token_ids_len=588,mm_kwargs=[{'input_features_mask': MultiModalFieldElem(modality='audio', key='input_features_mask', data=tensor([True, True, True, ..., True, True, True]), field=MultiModalBatchedField()), 'input_features': MultiModalFieldElem(modality='audio', key='input_features', data=tensor([[-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [-11.5000, -11.5000, -11.5000, ..., -11.5000, -11.5000, -11.5000],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] ...,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -3.3750, -3.4375, -3.7656, ..., -2.2812, -2.5156, -2.1719],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -3.9219, -3.3438, -2.9531, ..., -2.2812, -2.3281, -2.0938],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] [ -4.2500, -3.6562, -3.1719, ..., -2.1719, -2.4844, -2.4375]],
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalBatchedField())}],mm_hashes=['1730b4fec2eab85509df76cfb7201ff0fb0bb0f9e9af266fa3398b172ad9469b'],mm_positions=[PlaceholderRange(offset=391, length=192, is_embed=tensor([False, False, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] True, True, True, True, True, True, True, True, True, True,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:76] False, False]))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.5, top_p=0.95, top_k=64, min_p=0.0, seed=None, stop=[], stop_token_ids=[106], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273], [38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286], [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299], [112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312], [149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325]),num_computed_tokens=384,lora_request=LoRARequest(lora_name='lora0', lora_int_id=1, lora_path='/home/ubuntu/yashpratap/packages/infra/finetuning/adapters-E2B/gemma-3n-E2B-arrow-unscripted-assitant-only-loss-lr32/checkpoint-10200_cleaned', lora_local_path=None, long_lora_max_len=None, base_model_name=None, tensorizer_config_dict=None))], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-9dccaf1e7b51468b8bcaf31ce2590bfb'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[null], num_computed_tokens=[588]), num_scheduled_tokens={chatcmpl-edd61485740249969aef39374e833eb8: 12, chatcmpl-c7174594ad734987a8be9044aa54665e: 204, chatcmpl-9dccaf1e7b51468b8bcaf31ce2590bfb: 1, chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb: 204, chatcmpl-073eb6d113954830a030a4a120e71394: 12}, total_num_scheduled_tokens=433, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={chatcmpl-45ddc6c4fbc64e6c81d1926f0944aeeb: [0], chatcmpl-c7174594ad734987a8be9044aa54665e: [0]}, num_common_prefix_blocks=[24, 0, 0, 0, 0], finished_req_ids=[], free_encoder_mm_hashes=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=5, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.006680232677642839, prefix_cache_stats=PrefixCacheStats(reset=False, requests=4, queries=2352, hits=1920), spec_decoding_stats=None, num_corrupted_reqs=0)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] EngineCore encountered a fatal error.
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] Traceback (most recent call last):
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 705, in run_engine_core
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] engine_core.run_busy_loop()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 732, in run_busy_loop
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] self._process_engine_step()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 757, in _process_engine_step
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] outputs, model_executed = self.step_fn()
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 291, in step
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 277, in execute_model_with_error_logging
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] raise err
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 268, in execute_model_with_error_logging
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return model_fn(scheduler_output)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 95, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] output = self.collective_rpc("execute_model",
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] output = self.model_runner.execute_model(scheduler_output,
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] return func(*args, **kwargs)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1519, in execute_model
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] self._execute_mm_encoder(scheduler_output)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1168, in _execute_mm_encoder
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] curr_group_outputs = self.model.get_multimodal_embeddings(
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 627, in get_multimodal_embeddings
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] audio_embeddings = self._process_audio_input(multimodal_input)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 577, in _process_audio_input
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] input_features = audio_input["input_features"].squeeze(1)
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) [vllm]ERROR 08-31 16:48:08 [core.py:714] AttributeError: 'list' object has no attribute 'squeeze'
(EngineCore_0 pid=480353) Process EngineCore_0:
(EngineCore_0 pid=480353) Traceback (most recent call last):
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=480353) self.run()
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=480353) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_0 pid=480353) raise e
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 705, in run_engine_core
(EngineCore_0 pid=480353) engine_core.run_busy_loop()
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 732, in run_busy_loop
(EngineCore_0 pid=480353) self._process_engine_step()
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 757, in _process_engine_step
(EngineCore_0 pid=480353) outputs, model_executed = self.step_fn()
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 291, in step
(EngineCore_0 pid=480353) model_output = self.execute_model_with_error_logging(
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 277, in execute_model_with_error_logging
(EngineCore_0 pid=480353) raise err
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 268, in execute_model_with_error_logging
(EngineCore_0 pid=480353) return model_fn(scheduler_output)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 95, in execute_model
(EngineCore_0 pid=480353) output = self.collective_rpc("execute_model",
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=480353) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 362, in execute_model
(EngineCore_0 pid=480353) output = self.model_runner.execute_model(scheduler_output,
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=480353) return func(*args, **kwargs)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1519, in execute_model
(EngineCore_0 pid=480353) self._execute_mm_encoder(scheduler_output)
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 1168, in _execute_mm_encoder
(EngineCore_0 pid=480353) curr_group_outputs = self.model.get_multimodal_embeddings(
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 627, in get_multimodal_embeddings
(EngineCore_0 pid=480353) audio_embeddings = self._process_audio_input(multimodal_input)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 577, in _process_audio_input
(EngineCore_0 pid=480353) input_features = audio_input["input_features"].squeeze(1)
(EngineCore_0 pid=480353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=480353) AttributeError: 'list' object has no attribute 'squeeze'
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] AsyncLLM output_handler failed.
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] Traceback (most recent call last):
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/async_llm.py", line 412, in output_handler
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] outputs = await engine_core.get_output_async()
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core_client.py", line 843, in get_output_async
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] raise self._format_exception(outputs) from None
(APIServer pid=479785) [vllm]ERROR 08-31 16:48:08 [async_llm.py:453] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) [vllm]INFO 08-31 16:48:08 [loggers.py:123] Engine 000: Avg prompt throughput: 58.8 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
[rank0]:[W831 16:48:09.099020087 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34824 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34848 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34772 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34782 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34796 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34764 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34800 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34814 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34822 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: 127.0.0.1:34838 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=479785) INFO: Shutting down
(APIServer pid=479785) INFO: Waiting for application shutdown.
(APIServer pid=479785) INFO: Application shutdown complete.
(APIServer pid=479785) INFO: Finished server process [479785]
```
</details>
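For reference, a guard along these lines sidesteps the crash in a local build. This is a hedged sketch under the assumption that batched items share one feature shape, not the upstream fix, and `_normalize_input_features` is a hypothetical helper name:

```python
import torch

def _normalize_input_features(input_features):
    """Accept either a (B, 1, T, F) tensor or a list of per-request
    (1, T, F) / (T, F) tensors and return a single (B, T, F) tensor."""
    if isinstance(input_features, torch.Tensor):
        # Single-tensor path: drop the singleton num-items dim,
        # matching the existing .squeeze(1) behavior.
        return input_features.squeeze(1)
    # Batched path: squeeze each item's leading dim, then stack.
    # NOTE: assumes all items share one (T, F) shape; ragged inputs
    # would instead need padding driven by input_features_mask.
    items = [t.squeeze(0) if t.dim() == 3 else t for t in input_features]
    return torch.stack(items, dim=0)
```

Inside `_process_audio_input`, `input_features = _normalize_input_features(audio_input["input_features"])` would then replace the bare `.squeeze(1)` call.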
### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.