IMDB example: ValueError when loading the data source

Is there an easy solution for this?


ValueError Traceback (most recent call last)
in <cell line: 0>()
----> 1 (train_data, test_data), info = tfds.load(
2 "imdb_reviews/subwords8k",
3 split=(tfds.Split.TRAIN, tfds.Split.TEST),
4 with_info=True,
5 as_supervised=True,

10 frames
/usr/local/lib/python3.11/dist-packages/tensorflow_datasets/core/dataset_builder.py in _create_builder_config(self, builder_config, version)
1360 builder_config = self.builder_configs.get(f"{name}:{version}")
1361 if builder_config is None:
-> 1362 raise ValueError(
1363 "BuilderConfig %s not found with version %s. Available: %s"
1364 % (name, version, list(self.builder_configs.keys()))

ValueError: Failed to construct dataset "imdb_reviews", builder_kwargs "{'config': 'subwords8k', 'data_dir': None}": BuilderConfig subwords8k not found with version None. Available: ['plain_text']

Yes! The error occurs because the imdb_reviews dataset does not have a subwords8k configuration. As the error message shows, plain_text is the only one available. Change
(train_data, test_data), info = tfds.load(
    "imdb_reviews/subwords8k",
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
to
(train_data, test_data), info = tfds.load(
    "imdb_reviews/plain_text",  # Use "plain_text" instead of "subwords8k"
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
This should work because plain_text is the only available configuration of the dataset.

Thanks so much, Daniel. That worked. I’m getting another error further down in the code, though. Is there a quick fix for the incompatible shapes?

(train_data, test_data), info = tfds.load(
    "imdb_reviews/plain_text",
    split=(tfds.Split.TRAIN, tfds.Split.TEST),
    with_info=True,
    as_supervised=True,
)
encoder = info.features["text"].encoder

# Shuffle and pad the data.
train_batches = train_data.shuffle(1000).padded_batch(
    10, padded_shapes=((None,), ())
)
test_batches = test_data.shuffle(1000).padded_batch(
    10, padded_shapes=((None,), ())
)
train_batch, train_labels = next(iter(train_batches))

Giving this error:
Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0…

Dl Completed…: 100%

1/1 [00:13<00:00, 13.98s/ url]

Dl Size…: 100%

80/80 [00:13<00:00, 5.61 MiB/s]

Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.


ValueError Traceback (most recent call last)
in <cell line: 0>()
8
9 # Shuffle and pad the data.
---> 10 train_batches = train_data.shuffle(1000).padded_batch(
11 10, padded_shapes=((None,), ())
12 )

3 frames
/usr/local/lib/python3.11/dist-packages/tensorflow/python/data/ops/padded_batch_op.py in _padded_shape_to_tensor(padded_shape, input_component_shape)
121 if not _is_padded_shape_compatible_with(padded_shape_as_shape,
122 input_component_shape):
--> 123 raise ValueError(f"The padded shape {padded_shape_as_shape} is not "
124 f"compatible with the shape {input_component_shape} of "
125 f"the corresponding input component.")

ValueError: The padded shape (None,) is not compatible with the shape () of the corresponding input component.
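
The second error follows from the config change. With plain_text, each review is a single scalar string (shape ()), not a sequence of subword ids, so padded_shapes=((None,), ()) no longer matches. For the same reason, info.features["text"].encoder will not give a usable subword encoder. The text has to be tokenized before it can be padded. Below is a minimal sketch, assuming tf.keras.layers.TextVectorization is an acceptable stand-in for the old subword encoder (it is a different tokenizer, not a drop-in replacement). It uses a tiny in-memory dataset in place of imdb_reviews, and the names texts, labels, raw_ds, and vectorizer are illustrative only.

```python
import tensorflow as tf

# Tiny in-memory stand-in for the (text, label) pairs that
# "imdb_reviews/plain_text" yields with as_supervised=True.
texts = ["a great movie", "terrible", "not bad at all really"]
labels = [1, 0, 1]
raw_ds = tf.data.Dataset.from_tensor_slices((texts, labels))

# Each text element is a scalar string, which is why a padded
# shape of (None,) is rejected for it.
print(raw_ds.element_spec[0].shape)

# Assumption: TextVectorization is used here instead of the
# removed subwords8k encoder. It learns a word-level vocabulary.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=8000)
vectorizer.adapt(raw_ds.map(lambda text, label: text).batch(2))


def encode(text_batch, label_batch):
    # Map a batch of strings to a batch of integer token ids.
    # The layer zero-pads to the longest review in the batch.
    return vectorizer(text_batch), label_batch


# Batch first, then tokenize. Because the layer pads each batch,
# plain batch() takes the place of padded_batch() here.
batches = raw_ds.batch(2).map(encode)
batch, batch_labels = next(iter(batches))
print(batch.shape, batch_labels.shape)
```

With the real data, the same pattern would be applied to train_data and test_data, for example train_data.shuffle(1000).batch(10).map(encode), after adapting the vectorizer on the training text only.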

It is so hard, I don’t think I can do it.