The naive option is to use something like this:

```
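import collections

import numpy as np
import tensorflow as tf

num_classes = 2
num_samples = 10000
data_np = np.random.choice(num_classes, num_samples)

# Assumption: `dataset` yields (class, other) pairs; built here from data_np for illustration.
dataset = tf.data.Dataset.from_tensor_slices((data_np, data_np))

# Count the elements of each class by iterating over the whole dataset.
y = collections.defaultdict(int)
for i in dataset:
    cls, _ = i
    y[int(cls.numpy())] += 1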
```

Bhack
March 8, 2022, 2:44pm
3
If you are looking for a non-numpy solution, there was an API request at:

https://github.com/tensorflow/datasets/issues/2902
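
In the meantime, a rough sketch of how the count could stay inside TensorFlow ops: `tf.data.Dataset.reduce` folds a per-class counter over the dataset. This assumes integer class labels in the first component of each element (as in the loop above) and a `num_classes` known in advance; the toy `labels` array and the `add_one` helper are made up for illustration.

```python
import numpy as np
import tensorflow as tf

num_classes = 2  # assumed to be known in advance
labels = np.array([0, 1, 1, 0, 1, 1], dtype=np.int64)  # toy labels, illustration only

# Assumption: each element is a (class, other) pair.
dataset = tf.data.Dataset.from_tensor_slices((labels, labels))

def add_one(counts, element):
    # Hypothetical helper: add a one-hot vector for this element's class
    # to the running per-class counts.
    cls, _ = element
    return counts + tf.one_hot(cls, num_classes, dtype=tf.int32)

# The initial state is a zero vector with one slot per class.
counts = dataset.reduce(tf.zeros([num_classes], dtype=tf.int32), add_one)
print(counts.numpy())
```

This still visits every element once, but the loop runs inside the tf.data runtime rather than in Python.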

Bhack
March 8, 2022, 2:51pm
4
With numpy you could use many solutions like:
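
As a minimal sketch of two common NumPy options, assuming the labels have already been collected into a single in-memory integer array (the toy `labels` below stands in for that): `np.unique` with `return_counts=True` reports only the classes that occur, while `np.bincount` returns a dense vector indexed by class id.

```python
import numpy as np

num_classes = 2
labels = np.array([0, 1, 1, 0, 1, 1])  # toy labels, illustration only

# Sorted distinct classes together with how often each one occurs.
classes, counts = np.unique(labels, return_counts=True)
per_class = dict(zip(classes.tolist(), counts.tolist()))

# Dense counts indexed by class id; minlength keeps a slot for absent classes.
dense_counts = np.bincount(labels, minlength=num_classes)

print(per_class, dense_counts)
```

Both need all labels in memory at once, so they fit best when the label column is small compared to the rest of the data.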

Not sure how these methods would scale.

I would give this one a try:

Bhack
March 8, 2022, 3:21pm
6
It does something similar, iterating over the full dataset, but in C++:

```
```
// Iterate through the input dataset.
while (true) {
  if (ctx->cancellation_manager()->IsCancelled()) {
    return errors::Cancelled("Operation was cancelled");
  }
  std::vector<Tensor> next_input_element;
  bool end_of_input;
  TF_RETURN_IF_ERROR(
      iterator->GetNext(&iter_ctx, &next_input_element, &end_of_input));
  if (end_of_input) {
    break;
  }