Unable to Load Large 18GB Dataset (Numpy Array) with 11GB GPU


I have my features and labels saved in a .npz file (numpy arrays) that’s about 18GB. My GPU has 11GB RAM (although the CPU RAM is 120GB).

I’m trying to load the data as a tf.data.Dataset, but I’m not having any luck.

I’m using TensorFlow 2.14 and the code below:

import tensorflow as tf
import numpy as np

# load from npz file
data = np.load('231017_encoded_seqs_labels.npz')
encoded_sequences = data['encoded_sequences']
input_labels = data['input_labels']

# Create tf.data.Dataset from your data.
dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

Everything runs fine except the last line, which gives the error below. Kindly assist.

2023-10-19 11:27:34.562484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10778 MB memory:  -> device: 0, name: Tesla K40m, pci bus id: 0000:05:00.0, compute capability: 3.5
2023-10-19 11:27:34.563653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10778 MB memory:  -> device: 1, name: Tesla K40m, pci bus id: 0000:81:00.0, compute capability: 3.5
2023-10-19 11:27:34.566228: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 19403961120 exceeds 10% of free system memory.
2023-10-19 11:27:59.419189: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 18.07GiB (rounded to 19403961344)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2023-10-19 11:27:59.419249: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2023-10-19 11:27:59.419273: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419289: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419303: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1024): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419317: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2048): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419331: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419344: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419358: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419371: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (32768): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419385: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419398: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (131072): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419411: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (262144): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419425: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419439: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419453: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419466: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419480: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419495: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419509: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419522: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419535: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419549: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419566: I tensorflow/tsl/framework/bfc_allocator.cc:1062] Bin for 18.07GiB was 256.00MiB, Chunk State: 
2023-10-19 11:27:59.419578: I tensorflow/tsl/framework/bfc_allocator.cc:1100]      Summary of in-use Chunks by size: 
2023-10-19 11:27:59.419589: I tensorflow/tsl/framework/bfc_allocator.cc:1107] Sum Total of in-use chunks: 0B
2023-10-19 11:27:59.419604: I tensorflow/tsl/framework/bfc_allocator.cc:1109] Total bytes in pool: 0 memory_limit_: 11301945344 available bytes: 11301945344 curr_region_allocation_bytes_: 11301945344
2023-10-19 11:27:59.419622: I tensorflow/tsl/framework/bfc_allocator.cc:1114] Stats: 
Limit:                     11301945344
InUse:                               0
MaxInUse:                            0
NumAllocs:                           0
MaxAllocSize:                        0
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2023-10-19 11:27:59.419639: W tensorflow/tsl/framework/bfc_allocator.cc:497] <allocator contains no memory>
InternalError                             Traceback (most recent call last)
Cell In[6], line 2
      1 # Create tf.data.Dataset from your data.
----> 2 dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py:821, in DatasetV2.from_tensor_slices(tensors, name)
    817 # Loaded lazily due to a circular dependency (dataset_ops ->
    818 # from_tensor_slices_op -> dataset_ops).
    819 # pylint: disable=g-import-not-at-top,protected-access
    820 from tensorflow.python.data.ops import from_tensor_slices_op
--> 821 return from_tensor_slices_op._from_tensor_slices(tensors, name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:25, in _from_tensor_slices(tensors, name)
     24 def _from_tensor_slices(tensors, name=None):
---> 25   return _TensorSliceDataset(tensors, name=name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:33, in _TensorSliceDataset.__init__(self, element, is_files, name)
     31 def __init__(self, element, is_files=False, name=None):
     32   """See `Dataset.from_tensor_slices` for details."""
---> 33   element = structure.normalize_element(element)
     34   batched_spec = structure.type_spec_from_value(element)
     35   self._tensors = structure.to_batched_tensor_list(batched_spec, element)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py:134, in normalize_element(element, element_signature)
    131       else:
    132         dtype = getattr(spec, "dtype", None)
    133         normalized_components.append(
--> 134             ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
    135 return nest.pack_sequence_as(pack_as, normalized_components)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/profiler/trace.py:183, in trace_wrapper.<locals>.inner_wrapper.<locals>.wrapped(*args, **kwargs)
    181   with Trace(trace_name, **trace_kwargs):
    182     return func(*args, **kwargs)
--> 183 return func(*args, **kwargs)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:698, in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
    696 # TODO(b/142518781): Fix all call-sites and remove redundant arg
    697 preferred_dtype = preferred_dtype or dtype_hint
--> 698 return tensor_conversion_registry.convert(
    699     value, dtype, name, as_ref, preferred_dtype, accepted_result_types
    700 )

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/tensor_conversion_registry.py:234, in convert(value, dtype, name, as_ref, preferred_dtype, accepted_result_types)
    225       raise RuntimeError(
    226           _add_error_prefix(
    227               f"Conversion function {conversion_func!r} for type "
    230               f"actual = {ret.dtype.base_dtype.name}",
    231               name=name))
    233 if ret is None:
--> 234   ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    236 if ret is NotImplemented:
    237   continue

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:328, in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    325 def _constant_tensor_conversion_function(v, dtype=None, name=None,
    326                                          as_ref=False):
    327   _ = as_ref
--> 328   return constant(v, dtype=dtype, name=name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:267, in constant(value, dtype, shape, name)
    170 @tf_export("constant", v1=[])
    171 def constant(value, dtype=None, shape=None, name="Const"):
    172   """Creates a constant tensor from a tensor-like object.
    174   Note: All eager `tf.Tensor` values are immutable (in contrast to
    265     ValueError: if called on a symbolic tensor.
    266   """
--> 267   return _constant_impl(value, dtype, shape, name, verify_shape=False,
    268                         allow_broadcast=True)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:279, in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    277     with trace.Trace("tf.constant"):
    278       return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 279   return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    281 const_tensor = ops._create_graph_constant(  # pylint: disable=protected-access
    282     value, dtype, shape, name, verify_shape, allow_broadcast
    283 )
    284 return const_tensor

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:289, in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    287 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    288   """Creates a constant on the current device."""
--> 289   t = convert_to_eager_tensor(value, ctx, dtype)
    290   if shape is None:
    291     return t

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
    100     dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Hey @Felix_M
Can you refactor your code a little to use tf.data.Dataset? This should help with memory-related issues.
You can follow the tutorial "Build TensorFlow input pipelines" to get started if you are not yet familiar with it.
Thank you.

That’s what my code is trying to do.

As I mentioned above, it’s the tf.data.Dataset line of code that gives the error.

The TF docs mention a 2 GB limit (for the tf.GraphDef protocol buffer) when NumPy arrays are embedded as constants.
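
For what it's worth, the traceback ends in a CPU-to-GPU copy for the _EagerConst op, which suggests the whole array is being placed on GPU:0 as a single constant. One thing that may be worth trying is pinning the dataset creation to the CPU, so the arrays stay in host RAM and only batches reach the GPU. This is a minimal sketch, not a confirmed fix: the small random arrays and the batch size of 64 are stand-ins chosen for illustration, not values from the original post.

```python
import numpy as np
import tensorflow as tf

# Small stand-in arrays (assumption); in the question these come from the 18GB .npz file.
encoded_sequences = np.random.rand(1000, 32).astype(np.float32)
input_labels = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

# Build the dataset under a CPU device scope so the NumPy arrays are
# converted to tensors in host memory instead of being copied to GPU:0.
with tf.device("/CPU:0"):
    dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

# Batch and prefetch; only individual batches are consumed downstream.
dataset = dataset.batch(64).prefetch(tf.data.AUTOTUNE)

# Pull one batch to check that the pipeline works.
features, labels = next(iter(dataset))
```

With 120GB of CPU RAM the 18GB array should fit in host memory, though the conversion may still make a second copy of the data there.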

Have you tried reading the file directly into a TF Dataset? There’s a library called TensorFlow I/O that’s pretty nice: see tfio.experimental.IODataset.
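
As an alternative that stays within core tf.data, the arrays could be streamed from disk with tf.data.Dataset.from_generator. The sketch below does not use TensorFlow I/O. It assumes the two arrays have been re-saved once as separate .npy files, because, as far as I know, np.load's mmap_mode does not apply to .npz archives. The file names, shapes, and batch size are hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write small stand-in arrays to .npy files (hypothetical names and shapes).
tmp_dir = tempfile.mkdtemp()
seq_path = os.path.join(tmp_dir, "encoded_sequences.npy")
lab_path = os.path.join(tmp_dir, "input_labels.npy")
np.save(seq_path, np.random.rand(1000, 32).astype(np.float32))
np.save(lab_path, np.random.randint(0, 2, size=(1000,)).astype(np.int32))


def sample_generator():
    # Memory-map the files so rows are read from disk on demand
    # rather than loading the full arrays up front.
    sequences = np.load(seq_path, mmap_mode="r")
    labels = np.load(lab_path, mmap_mode="r")
    for i in range(len(sequences)):
        yield sequences[i], labels[i]


# The output_signature must match the per-sample shape and dtype.
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(32,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).batch(64).prefetch(tf.data.AUTOTUNE)

# Pull one batch to check that the pipeline works.
features, labels = next(iter(dataset))
```

Yielding one row at a time from Python is simple but can be slow. Yielding larger slices and calling unbatch() may help if the input pipeline becomes the bottleneck.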

Looks interesting.

I’ll have a look and see if that’s the way to go.


Cool - let me know if it works.