Illegal memory access due to tensor bigger than int32 elements

Pierre_Beaulieu · January 31, 2025, 2:41pm

Hi,

I’m using tensorflow==2.18.0 alongside tf_keras==2.18.0. I want to train a model on a huge dataset and I have an A100 80GB. With a batch size of 4096, I can train my model using only 20% of my GPU DRAM, but when I double the batch size, some tensors end up with more elements than an int32 can count, and I get an illegal memory access error because some variables, for example work_element_count, are encoded as int32. Has anyone faced this issue and found a fix? Do I need to rebuild TF from source with specific compiler options?

Divya_Sree_Kayyuri · January 13, 2026, 12:27pm

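Hi @Pierre_Beaulieu, the illegal memory access error with very large batch sizes occurs because int32 indexing overflows when the total number of elements in a tensor exceeds the maximum value of a signed 32-bit integer (2^31 - 1 = 2,147,483,647). Instead of rebuilding TensorFlow from source, you can try gradient accumulation: split each large batch into smaller micro-batches that stay under this limit, sum their gradients, and apply a single optimizer update. Thanks!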
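For reference, here is a minimal sketch of gradient accumulation with a custom training loop. It is not taken from the original post. The function name `accumulated_train_step`, the `micro_batch_size` parameter, and the tiny demo model are all hypothetical, and the sketch assumes the model is already built and that the loss is a mean over the batch. Each micro-batch gradient is weighted by its share of the full batch, so the summed gradient should match what a single large batch would give.

```python
import tensorflow as tf

def accumulated_train_step(model, optimizer, loss_fn, x, y, micro_batch_size):
    """Apply ONE optimizer update from gradients summed over micro-batches.

    Hypothetical helper: assumes `model` is already built and `loss_fn`
    returns the mean loss over the examples it is given.
    """
    variables = model.trainable_variables
    total = int(x.shape[0])
    # One zero-initialised accumulator per trainable variable.
    accum = [tf.zeros(v.shape, dtype=v.dtype) for v in variables]
    total_loss = 0.0
    for start in range(0, total, micro_batch_size):
        xb = x[start:start + micro_batch_size]
        yb = y[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch so the
        # summed gradient equals the full-batch mean gradient.
        weight = int(xb.shape[0]) / total
        with tf.GradientTape() as tape:
            loss = loss_fn(yb, model(xb, training=True)) * weight
        grads = tape.gradient(loss, variables)
        accum = [a + g for a, g in zip(accum, grads)]
        total_loss += float(loss)
    # Single update, as if the whole batch had been processed at once.
    optimizer.apply_gradients(zip(accum, variables))
    return total_loss

# Tiny demo with made-up data: an effective batch of 64 processed as 4 x 16.
demo_model = tf.keras.Sequential(
    [tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)]
)
demo_x = tf.random.normal((64, 8))
demo_y = tf.random.normal((64, 1))
demo_loss = accumulated_train_step(
    demo_model,
    tf.keras.optimizers.SGD(learning_rate=0.1),
    tf.keras.losses.MeanSquaredError(),
    demo_x,
    demo_y,
    micro_batch_size=16,
)
```

In the case above, an effective batch of 8192 could be processed as two micro-batches of 4096, the size already reported to work. Layers that depend on batch statistics, such as batch normalization, will still see only the micro-batch, so results may differ slightly from a true large-batch run.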
