IMDB Example Getting Value Error when loading data source

Robert_Jaret · February 27, 2025, 4:52pm

Is there an easy solution for this?

ValueError Traceback (most recent call last)
in <cell line: 0>()
----> 1 (train_data, test_data), info = tfds.load(
2 "imdb_reviews/subwords8k",
3 split=(tfds.Split.TRAIN, tfds.Split.TEST),
4 with_info=True,
5 as_supervised=True,

10 frames
/usr/local/lib/python3.11/dist-packages/tensorflow_datasets/core/dataset_builder.py in _create_builder_config(self, builder_config, version)
1360 builder_config = self.builder_configs.get(f"{name}:{version}")
1361 if builder_config is None:
→ 1362 raise ValueError(
1363 "BuilderConfig %s not found with version %s. Available: %s"
1364 % (name, version, list(self.builder_configs.keys()))

ValueError: Failed to construct dataset "imdb_reviews", builder_kwargs "{'config': 'subwords8k', 'data_dir': None}": BuilderConfig subwords8k not found with version None. Available: ['plain_text']

Biruk_Daniel · February 28, 2025, 4:29am

Yes! The error occurs because, in the version of tensorflow_datasets you have installed, the imdb_reviews dataset has no subwords8k configuration. As the error message shows, the only available configuration is plain_text. (Older releases shipped a pre-tokenized subwords8k configuration, which is what this example was written for.) Change:
(train_data, test_data), info = tfds.load(
    "imdb_reviews/subwords8k",
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
to:
(train_data, test_data), info = tfds.load(
    "imdb_reviews/plain_text",  # Use "plain_text" instead of "subwords8k"
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
This should work because plain_text is the only configuration available for this dataset.
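The "dataset/config" string passed to tfds.load is split at the slash: the first part picks the dataset, and the second part must match one of that dataset's registered configurations, otherwise loading fails with the error above. The sketch below is a simplified, pure-Python illustration of that lookup under stated assumptions: `resolve_config` is a hypothetical helper, not a tensorflow_datasets function, it ignores version suffixes, and the fallback to the first listed config is an assumption for illustration.

```python
# Hypothetical helper, NOT part of tensorflow_datasets: a simplified
# illustration of how a "dataset/config" string is resolved.
def resolve_config(spec, available_configs):
    """Split 'dataset/config' and check the config against a known list."""
    name, _, config = spec.partition("/")
    if not config:
        # Assumption: with no config given, fall back to the first listed one.
        return name, available_configs[0]
    if config not in available_configs:
        raise ValueError(
            f"BuilderConfig {config} not found. Available: {available_configs}"
        )
    return name, config


# In the environment from the traceback, imdb_reviews only lists "plain_text".
imdb_configs = ["plain_text"]
print(resolve_config("imdb_reviews/plain_text", imdb_configs))
# ('imdb_reviews', 'plain_text')

try:
    resolve_config("imdb_reviews/subwords8k", imdb_configs)
except ValueError as err:
    print(err)  # mirrors the "not found ... Available" part of the real error
```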

Robert_Jaret · February 28, 2025, 3:09pm

Thanks so much, Daniel. That worked. I’m getting another error further down in the code, though. Is there a quick fix for this incompatible-shape error?

(train_data, test_data), info = tfds.load(
    "imdb_reviews/plain_text",
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
encoder = info.features["text"].encoder

# Shuffle and pad the data.

train_batches = train_data.shuffle(1000).padded_batch(
10, padded_shapes=((None,), ())
)
test_batches = test_data.shuffle(1000).padded_batch(
10, padded_shapes=((None,), ())
)
train_batch, train_labels = next(iter(train_batches))

It gives this error:
Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0…

Dl Completed…: 100%

1/1 [00:13<00:00, 13.98s/ url]

Dl Size…: 100%

80/80 [00:13<00:00, 5.61 MiB/s]

Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.

ValueError Traceback (most recent call last)

in <cell line: 0>()
      8
      9 # Shuffle and pad the data.
---> 10 train_batches = train_data.shuffle(1000).padded_batch(
     11     10, padded_shapes=((None,), ())
     12 )

3 frames

/usr/local/lib/python3.11/dist-packages/tensorflow/python/data/ops/padded_batch_op.py in _padded_shape_to_tensor(padded_shape, input_component_shape)
    121   if not _is_padded_shape_compatible_with(padded_shape_as_shape,
    122                                           input_component_shape):
  → 123     raise ValueError(f"The padded shape {padded_shape_as_shape} is not "
    124                      f"compatible with the shape {input_component_shape} of "
    125                      f"the corresponding input component.")

ValueError: The padded shape (None,) is not compatible with the shape () of the corresponding input component.
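The message comes from a rank check: the padded shape given for each component must have the same number of dimensions as that component, and where both sizes are known the padded size cannot be smaller. The sketch below is a simplified, pure-Python model of that rule, loosely based on the `_is_padded_shape_compatible_with` check named in the traceback; `padded_shape_compatible` is a hypothetical name, shapes are plain tuples with None for unknown dimensions, and unknown-rank shapes are not handled.

```python
# Simplified model of the padded_batch shape check (not TensorFlow's code).
# Shapes are tuples; None marks an unknown (variable-length) dimension.
def padded_shape_compatible(padded_shape, component_shape):
    # The padded shape must have the same rank as the component it pads.
    if len(padded_shape) != len(component_shape):
        return False
    # Where both sizes are known, the padded size may not be smaller.
    for padded_dim, dim in zip(padded_shape, component_shape):
        if padded_dim is not None and dim is not None and padded_dim < dim:
            return False
    return True


# plain_text review: one scalar string, shape ().
print(padded_shape_compatible((None,), ()))       # False: rank 1 vs rank 0
print(padded_shape_compatible((), ()))            # True
# A variable-length sequence of token IDs, shape (None,).
print(padded_shape_compatible((None,), (None,)))  # True
```

The first call is the situation in the traceback: the label component ((), ()) is fine, but the text component is rank 0 while (None,) is rank 1.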

Biruk_Daniel · February 28, 2025, 4:13pm

It is so hard, I don’t think I can do it.

Kiran_Sai_Ramineni · March 13, 2025, 11:40am

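Hi @Robert_Jaret, if you check the shape of the text feature after loading the dataset, you'll see that each review is a scalar:

for text, label in train_data.take(1):
    print(text.shape)  # output: ()

The plain_text configuration stores each review as a single string, so its shape is (). Passing padded_shapes=((None,), ()) declares the text component as a variable-length 1-D sequence instead, and that rank mismatch is what raises the error. (That padded shape fits the old subwords8k configuration, where each review was already a sequence of subword token IDs.) Please try padded_shapes=((), ()) or omit the padded_shapes argument; either way the error goes away. Note that the batches will then contain raw review strings, so you will still need to tokenize the text before feeding it to a model. Thank you.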
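To check this without downloading IMDB, the sketch below uses a small in-memory dataset as a hypothetical stand-in for imdb_reviews/plain_text with as_supervised=True (the `texts`, `labels`, `NUM_BUCKETS` and `to_ids` names are illustrative assumptions). It reproduces the error, applies the padded_shapes=((), ()) fix, and then shows one possible way to get back to padded integer sequences, which is what the subwords8k-based example expected: split each review on whitespace with tf.strings.split and map the words to IDs with tf.strings.to_hash_bucket_fast. Word hashing is a swapped-in technique, not the subword encoder the old example used, and different words can collide in the same bucket.

```python
import tensorflow as tf

# Hypothetical stand-in for imdb_reviews/plain_text (as_supervised=True):
# each element is (scalar tf.string review, scalar integer label).
texts = ["a great movie", "bad", "not bad at all"]
labels = [1, 0, 1]
ds = tf.data.Dataset.from_tensor_slices((texts, labels))
print(ds.element_spec)  # text component: shape=(), dtype=tf.string

# Reproduce the error: (None,) claims the text is a 1-D sequence.
try:
    ds.padded_batch(2, padded_shapes=((None,), ()))
except ValueError as err:
    print("ValueError:", err)

# Fix 1: scalar padded shapes (omitting padded_shapes also works).
raw_batches = ds.padded_batch(2, padded_shapes=((), ()))
text_batch, label_batch = next(iter(raw_batches))
print(text_batch.shape)  # (2,): a batch of raw strings

# Fix 2: tokenize first, so each review becomes a variable-length
# sequence of integer IDs and padded_shapes=((None,), ()) fits again.
NUM_BUCKETS = 10_000  # assumption: arbitrary vocabulary size

def to_ids(text, label):
    words = tf.strings.split(text)  # scalar string -> 1-D tensor of words
    # Shift by 1 so that ID 0 stays free for padding.
    ids = tf.strings.to_hash_bucket_fast(words, NUM_BUCKETS) + 1
    return ids, label

id_batches = ds.map(to_ids).padded_batch(2, padded_shapes=((None,), ()))
for ids, batch_labels in id_batches:
    print(ids.shape, batch_labels.numpy())  # (2, 3) first, then (1, 4)
```

In the real pipeline the same `to_ids` map would sit between `tfds.load(...)` and `padded_batch`. If a learned vocabulary is preferred over hashing, tf.keras.layers.TextVectorization is a common alternative.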

Related topics

Topic	Category / tags	Replies	Views	Activity
Unbatching a tensor is only supported for rank >= 1	General Discussion: datasets	2	1028	October 16, 2023
TfDataBuilder does not work with images	General Discussion: datasets, help_request	1	792	December 4, 2024
Tensorflow dataset has () shape	General Discussion: models, nlp, datasets, help_request	1	2308	May 12, 2022
Trouble using tfds load to vectorize text	Keras: datasets, epoc, tfkeras	2	254	February 6, 2024
AttributeError: Layer retrieval_1 has no inbound nodes	General Discussion: models, datasets, keras	2	1694	October 21, 2022
