Hello,
I have my features and labels saved as NumPy arrays in a .npz file that’s about 18 GB. My GPU has 11 GB of memory, while the machine has 120 GB of system RAM.
I’m trying to load the data as a tf.data.Dataset, but without success.
I’m using TensorFlow 2.14 and the code below:
import tensorflow as tf
import numpy as np
# load from npz file
data = np.load('231017_encoded_seqs_labels.npz')
encoded_sequences = data['encoded_sequences']
input_labels = data['input_labels']
# Create tf.data.Dataset from your data.
dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))
Everything runs fine except the last line, which raises the error below. Kindly assist.
2023-10-19 11:27:34.562484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10778 MB memory: -> device: 0, name: Tesla K40m, pci bus id: 0000:05:00.0, compute capability: 3.5
2023-10-19 11:27:34.563653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10778 MB memory: -> device: 1, name: Tesla K40m, pci bus id: 0000:81:00.0, compute capability: 3.5
2023-10-19 11:27:34.566228: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 19403961120 exceeds 10% of free system memory.
2023-10-19 11:27:59.419189: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 18.07GiB (rounded to 19403961344)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-10-19 11:27:59.419249: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2023-10-19 11:27:59.419273: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419289: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419303: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419317: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419331: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419344: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419358: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419371: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419385: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419398: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419411: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419425: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419439: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419453: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419466: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419480: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419495: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419509: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419522: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419535: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419549: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419566: I tensorflow/tsl/framework/bfc_allocator.cc:1062] Bin for 18.07GiB was 256.00MiB, Chunk State:
2023-10-19 11:27:59.419578: I tensorflow/tsl/framework/bfc_allocator.cc:1100] Summary of in-use Chunks by size:
2023-10-19 11:27:59.419589: I tensorflow/tsl/framework/bfc_allocator.cc:1107] Sum Total of in-use chunks: 0B
2023-10-19 11:27:59.419604: I tensorflow/tsl/framework/bfc_allocator.cc:1109] Total bytes in pool: 0 memory_limit_: 11301945344 available bytes: 11301945344 curr_region_allocation_bytes_: 11301945344
2023-10-19 11:27:59.419622: I tensorflow/tsl/framework/bfc_allocator.cc:1114] Stats:
Limit: 11301945344
InUse: 0
MaxInUse: 0
NumAllocs: 0
MaxAllocSize: 0
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-10-19 11:27:59.419639: W tensorflow/tsl/framework/bfc_allocator.cc:497] <allocator contains no memory>
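As a sanity check on the numbers in the allocator log above, here is a short sketch that converts the two byte counts to GiB and compares them. The variable names are my own, not TensorFlow's; the values are copied from the "Allocation of ..." warning and the `memory_limit_` field.

```python
# Byte counts copied from the log above (variable names are illustrative).
requested_bytes = 19_403_961_120   # from "Allocation of 19403961120 exceeds 10% ..."
gpu_limit_bytes = 11_301_945_344   # from "memory_limit_: 11301945344"

GIB = 1024 ** 3  # bytes per GiB

requested_gib = requested_bytes / GIB
gpu_limit_gib = gpu_limit_bytes / GIB
fits_on_gpu = requested_bytes <= gpu_limit_bytes

print(f"requested: {requested_gib:.2f} GiB")   # requested: 18.07 GiB
print(f"GPU limit: {gpu_limit_gib:.2f} GiB")   # GPU limit: 10.53 GiB
print(f"fits on one GPU: {fits_on_gpu}")       # fits on one GPU: False
```

So the single 18.07 GiB request is larger than the whole 10.53 GiB pool on GPU:0, which is consistent with the allocator reporting 0 bytes in use when it fails.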
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
Cell In[6], line 2
1 # Create tf.data.Dataset from your data.
----> 2 dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py:821, in DatasetV2.from_tensor_slices(tensors, name)
817 # Loaded lazily due to a circular dependency (dataset_ops ->
818 # from_tensor_slices_op -> dataset_ops).
819 # pylint: disable=g-import-not-at-top,protected-access
820 from tensorflow.python.data.ops import from_tensor_slices_op
--> 821 return from_tensor_slices_op._from_tensor_slices(tensors, name)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:25, in _from_tensor_slices(tensors, name)
24 def _from_tensor_slices(tensors, name=None):
---> 25 return _TensorSliceDataset(tensors, name=name)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:33, in _TensorSliceDataset.__init__(self, element, is_files, name)
31 def __init__(self, element, is_files=False, name=None):
32 """See `Dataset.from_tensor_slices` for details."""
---> 33 element = structure.normalize_element(element)
34 batched_spec = structure.type_spec_from_value(element)
35 self._tensors = structure.to_batched_tensor_list(batched_spec, element)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py:134, in normalize_element(element, element_signature)
131 else:
132 dtype = getattr(spec, "dtype", None)
133 normalized_components.append(
--> 134 ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
135 return nest.pack_sequence_as(pack_as, normalized_components)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/profiler/trace.py:183, in trace_wrapper.<locals>.inner_wrapper.<locals>.wrapped(*args, **kwargs)
181 with Trace(trace_name, **trace_kwargs):
182 return func(*args, **kwargs)
--> 183 return func(*args, **kwargs)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:698, in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
696 # TODO(b/142518781): Fix all call-sites and remove redundant arg
697 preferred_dtype = preferred_dtype or dtype_hint
--> 698 return tensor_conversion_registry.convert(
699 value, dtype, name, as_ref, preferred_dtype, accepted_result_types
700 )
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/tensor_conversion_registry.py:234, in convert(value, dtype, name, as_ref, preferred_dtype, accepted_result_types)
225 raise RuntimeError(
226 _add_error_prefix(
227 f"Conversion function {conversion_func!r} for type "
(...)
230 f"actual = {ret.dtype.base_dtype.name}",
231 name=name))
233 if ret is None:
--> 234 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
236 if ret is NotImplemented:
237 continue
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:328, in _constant_tensor_conversion_function(v, dtype, name, as_ref)
325 def _constant_tensor_conversion_function(v, dtype=None, name=None,
326 as_ref=False):
327 _ = as_ref
--> 328 return constant(v, dtype=dtype, name=name)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:267, in constant(value, dtype, shape, name)
170 @tf_export("constant", v1=[])
171 def constant(value, dtype=None, shape=None, name="Const"):
172 """Creates a constant tensor from a tensor-like object.
173
174 Note: All eager `tf.Tensor` values are immutable (in contrast to
(...)
265 ValueError: if called on a symbolic tensor.
266 """
--> 267 return _constant_impl(value, dtype, shape, name, verify_shape=False,
268 allow_broadcast=True)
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:279, in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
277 with trace.Trace("tf.constant"):
278 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 279 return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
281 const_tensor = ops._create_graph_constant( # pylint: disable=protected-access
282 value, dtype, shape, name, verify_shape, allow_broadcast
283 )
284 return const_tensor
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:289, in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
287 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
288 """Creates a constant on the current device."""
--> 289 t = convert_to_eager_tensor(value, ctx, dtype)
290 if shape is None:
291 return t
File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
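One thing that may be worth trying, assuming the failure comes from the whole array being copied to GPU:0 as a single eager constant (as the final InternalError suggests): build the dataset inside a CPU device scope so the arrays stay in host memory. This is only a sketch. The small random arrays, their shapes and the batch size are stand-ins for the real data, and I have not verified it on the 18 GB file.

```python
import numpy as np
import tensorflow as tf

# Small stand-in arrays with hypothetical shapes; the real ones come from the .npz file.
encoded_sequences = np.random.rand(1000, 16).astype(np.float32)
input_labels = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

# Pin dataset construction to the CPU so the arrays are converted to
# host-memory tensors instead of being copied to the GPU in one piece.
with tf.device("/CPU:0"):
    dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

# Batch and prefetch; the batch size of 32 is an arbitrary choice for illustration.
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# Pull one batch to confirm the pipeline yields (features, labels) pairs.
features, labels = next(iter(dataset))
print(features.shape, labels.shape)
```

Note that this still makes a TensorFlow copy of the arrays in system RAM in addition to the NumPy originals, so it relies on the host having enough memory for both.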