Hello,
I’m a relative newcomer to TF and especially to GPU computing. I’m trying to build an NLP model for text classification, but I’m struggling with GPU memory allocation on recent TensorFlow releases. I already had a model mostly working on TF 2.4.1, but on later versions it crashes because the GPU runs out of memory. I’ve narrowed the problem down to the minimal examples presented here. It feels like this could be a regression introduced in TF 2.5 or 2.6 that I should perhaps report as a bug, but first I want to make sure I’m not making some simple mistake.
My machine is an Asus laptop with 16 GB of RAM and a GeForce MX150 GPU with 2 GB of VRAM. It is not a very powerful GPU, but I’ve nevertheless managed to run a version of Stable Diffusion on it, and for neural network computations it is significantly faster than the i7-8550U CPU. I’m running Ubuntu Linux 20.04 with Python 3.9.15 installed via miniconda, and I’m testing different versions of the tensorflow-gpu package installed in separate conda environments. The NVIDIA driver version is 515.76. The GPU is used only for computing, not for graphics.
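For completeness, this is roughly how I check in each environment that TF actually sees the GPU (just a minimal sanity check, not part of the model code):

import tensorflow as tf
# print the TF version and the GPUs visible to this environment
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))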
The problem appears when using large NumPy arrays as training data. The whole array doesn’t fit into GPU VRAM at once, but my understanding is that only a single batch needs to be in VRAM at a time. Here is a simple but large data set (all zeros) and a toy Keras model with very few parameters:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# 4GB array
X = np.zeros((1024*1024, 1024), dtype=np.float32)
# 4MB array
Y = np.ones(1024*1024, dtype=np.float32)
# define a toy model (linear regression, 1025 parameters)
model = Sequential()
model.add(Dense(1, input_shape=(1024,), activation='linear'))
model.compile(loss='mean_squared_error')
model.summary()
model.fit(X, Y, batch_size=32)
I first tried running this under TF 2.4.1, which is available in the conda defaults channel. It runs just fine. Here is the output:
Keras model TF 2.4.1 output
2022-12-12 22:09:37.062352: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-12 22:09:37.898164: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-12-12 22:09:37.898926: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-12-12 22:09:37.930082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.930474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce MX150 computeCapability: 6.1
coreClock: 1.5315GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 44.76GiB/s
2022-12-12 22:09:37.930511: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-12 22:09:37.932318: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-12-12 22:09:37.932411: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2022-12-12 22:09:37.934061: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-12-12 22:09:37.934470: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-12-12 22:09:37.936205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-12-12 22:09:37.937265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2022-12-12 22:09:37.940598: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2022-12-12 22:09:37.940717: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.941120: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.941426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-12-12 22:09:37.941726: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-12 22:09:37.942046: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-12-12 22:09:37.942131: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.942437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce MX150 computeCapability: 6.1
coreClock: 1.5315GHz coreCount: 3 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 44.76GiB/s
2022-12-12 22:09:37.942456: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-12 22:09:37.942471: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2022-12-12 22:09:37.942481: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2022-12-12 22:09:37.942491: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-12-12 22:09:37.942500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-12-12 22:09:37.942510: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-12-12 22:09:37.942519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2022-12-12 22:09:37.942531: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2022-12-12 22:09:37.942574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.942891: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:37.943178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-12-12 22:09:37.943206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2022-12-12 22:09:38.390832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-12-12 22:09:38.390870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2022-12-12 22:09:38.390876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2022-12-12 22:09:38.391072: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:38.391246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:38.391383: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:09:38.391506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1632 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
2022-12-12 22:09:38.431914: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 4294967296 exceeds 10% of free system memory.
2022-12-12 22:09:41.416372: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-12-12 22:09:41.433482: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 1999965000 Hz
2022-12-12 22:09:41.629588: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 1025
=================================================================
Total params: 1,025
Trainable params: 1,025
Non-trainable params: 0
_________________________________________________________________
32768/32768 [==============================] - 32s 976us/step - loss: 0.0534
TF 2.4 is quite old, so I wanted to try the more recent TF releases available from conda-forge. I quickly ran into problems with every one I tested (2.6.2, 2.7.1, 2.8.1 and 2.10.0). Here is the output of the same script on 2.6.2 (the other versions behave similarly):
Keras model TF 2.6.2 output
2022-12-12 22:15:32.102254: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.127075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.127450: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.127922: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-12 22:15:32.128265: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.128457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.128765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.649401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.649581: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.649722: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-12 22:15:32.649845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1612 MB memory: -> device: 0, name: NVIDIA GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1
2022-12-12 22:15:32.689420: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 4294967296 exceeds 10% of free system memory.
2022-12-12 22:15:45.639836: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00GiB (rounded to 4294967296)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2022-12-12 22:15:45.639923: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2022-12-12 22:15:45.639958: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256): Total Chunks: 8, Chunks in use: 8. 2.0KiB allocated for chunks. 2.0KiB in use in bin. 40B client-requested in use in bin.
2022-12-12 22:15:45.639985: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640011: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2022-12-12 22:15:45.640093: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640152: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096): Total Chunks: 2, Chunks in use: 1. 11.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2022-12-12 22:15:45.640194: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640234: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640294: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640339: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640392: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640445: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640496: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640539: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640576: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640614: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640661: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640709: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640766: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640812: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640855: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640918: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456): Total Chunks: 1, Chunks in use: 0. 1.57GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-12 22:15:45.640966: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 4.00GiB was 256.00MiB, Chunk State:
2022-12-12 22:15:45.641025: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] Size: 1.57GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 4.0KiB | Requested Size: 4.0KiB | in_use: 1 | bin_num: -1
2022-12-12 22:15:45.641060: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 1690894336
2022-12-12 22:15:45.641092: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000000 of size 256 next 1
2022-12-12 22:15:45.641122: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000100 of size 1280 next 2
2022-12-12 22:15:45.641158: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000600 of size 256 next 3
2022-12-12 22:15:45.641189: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000700 of size 256 next 4
2022-12-12 22:15:45.641231: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000800 of size 256 next 5
2022-12-12 22:15:45.641264: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000900 of size 256 next 6
2022-12-12 22:15:45.641307: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000a00 of size 256 next 9
2022-12-12 22:15:45.641347: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000b00 of size 256 next 10
2022-12-12 22:15:45.641379: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea000c00 of size 256 next 11
2022-12-12 22:15:45.641416: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free at 7f31ea000d00 of size 7168 next 7
2022-12-12 22:15:45.641446: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7f31ea002900 of size 4096 next 8
2022-12-12 22:15:45.641475: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free at 7f31ea003900 of size 1690879744 next 18446744073709551615
2022-12-12 22:15:45.641502: I tensorflow/core/common_runtime/bfc_allocator.cc:1065] Summary of in-use Chunks by size:
2022-12-12 22:15:45.641537: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 8 Chunks of size 256 totalling 2.0KiB
2022-12-12 22:15:45.641568: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2022-12-12 22:15:45.641600: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 4096 totalling 4.0KiB
2022-12-12 22:15:45.641644: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 7.2KiB
2022-12-12 22:15:45.641676: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 1690894336 memory_limit_: 1690894336 available bytes: 0 curr_region_allocation_bytes_: 3381788672
2022-12-12 22:15:45.641722: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 1690894336
InUse: 7424
MaxInUse: 14336
NumAllocs: 13
MaxAllocSize: 4096
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-12-12 22:15:45.641759: W tensorflow/core/common_runtime/bfc_allocator.cc:468] *___________________________________________________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 1025
=================================================================
Total params: 1,025
Trainable params: 1,025
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
File "/home/myuser/proj/ml-wordemb/./test-keras-fit.py", line 18, in <module>
model.fit(X, Y, batch_size=32)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/training.py", line 1134, in fit
data_handler = data_adapter.get_data_handler(
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 1383, in get_data_handler
return DataHandler(*args, **kwargs)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 1138, in __init__
self._adapter = adapter_cls(
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 230, in __init__
x, y, sample_weights = _process_tensorlike((x, y, sample_weights))
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 1031, in _process_tensorlike
inputs = tf.nest.map_structure(_convert_numpy_and_scipy, inputs)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 869, in map_structure
structure[0], [func(*x) for x in entries],
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/util/nest.py", line 869, in <listcomp>
structure[0], [func(*x) for x in entries],
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/keras/engine/data_adapter.py", line 1026, in _convert_numpy_and_scipy
return tf.convert_to_tensor(x, dtype=dtype)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1430, in convert_to_tensor_v2_with_dispatch
return convert_to_tensor_v2(
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1436, in convert_to_tensor_v2
return convert_to_tensor(
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
return func(*args, **kwargs)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 1566, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py", line 283, in _constant_impl
return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py", line 308, in _constant_eager_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/myuser/miniconda3/envs/tf-2.6.2/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
The key error message seems to be this line:
2022-12-12 22:15:45.639836: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00GiB (rounded to 4294967296)requested by op _EagerConst
Apparently TF 2.6.2 (and later) tries to copy the whole 4 GB array into VRAM, which obviously won’t fit. Why is the behavior different from TF 2.4.1? Has something changed, or is there a setting I need to adjust so that the data is fed in batches instead of being copied all at once?
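For what it’s worth, the warning in the log suggests TF_GPU_ALLOCATOR=cuda_malloc_async, and I know memory growth can be enabled per GPU. This is a rough sketch of what I could add before building the model, though I’m not sure either setting really addresses a 4 GB constant that can’t fit in 2 GB of VRAM:

import os
# suggested by the allocator warning; must be set before TF initializes the GPU
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import tensorflow as tf
# allocate VRAM on demand instead of reserving almost all of it up front
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)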
I also tried tf.data.Dataset, but this post is already getting too long, so I’ll put it in a follow-up reply.
Thanks in advance,
Osma