The naive option is to use something like this:
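import collections

import numpy as np
import tensorflow as tf

num_classes = 2
num_samples = 10000
data_np = np.random.choice(num_classes, num_samples)
# Assumption: the dataset yields (label, feature) pairs; its construction was not shown in the post.
dataset = tf.data.Dataset.from_tensor_slices((data_np, data_np))

# Iterate over the whole dataset in Python and count each class label.
y = collections.defaultdict(int)
for cls, _ in dataset:
    y[cls.numpy()] += 1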
Bhack
March 8, 2022, 2:44pm
3
If you are looking for a non-NumPy solution, there was an API request at:
https://github.com/tensorflow/datasets/issues/2902
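Independently of that request, one way to stay inside `tf.data` is to fold over the dataset with `Dataset.reduce`, accumulating a one-hot vector per label. This is only a sketch: the toy `labels` tensor, the `(label, feature)` element layout, and the helper name `add_label` are assumptions for illustration, not something from the linked issue.

```python
import tensorflow as tf

num_classes = 2
# Toy labels (assumed); in practice these come from your own dataset.
labels = tf.constant([0, 1, 1, 0, 1], dtype=tf.int64)
dataset = tf.data.Dataset.from_tensor_slices((labels, labels))


def add_label(counts, cls):
    # Add 1 to the slot of the current class label.
    return counts + tf.one_hot(cls, num_classes, dtype=tf.int64)


# Keep only the label, then accumulate per-class counts in one pass.
counts = dataset.map(lambda cls, _: cls).reduce(
    tf.zeros([num_classes], dtype=tf.int64), add_label
)
print(counts.numpy())
```

This still visits every element once, but the loop runs inside the TensorFlow runtime rather than in Python.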
Bhack
March 8, 2022, 2:51pm
4
With NumPy you could use one of many solutions, such as:
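As a sketch of what such NumPy solutions can look like (not necessarily the ones linked in the original post), `np.unique` with `return_counts=True` and `np.bincount` both count labels once they are in an array. The toy `labels` array is an assumption; with a real dataset you would first need to collect the labels into memory.

```python
import numpy as np

# Toy labels (assumed); replace with the labels gathered from your dataset.
labels = np.array([0, 1, 1, 0, 1])

# Option 1: unique values together with how often each occurs.
classes, counts = np.unique(labels, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))

# Option 2: for non-negative integer labels, index i holds the count of class i.
bin_counts = np.bincount(labels, minlength=2)

print(class_counts, bin_counts)
```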
Not sure how these methods would scale.
I would give this one a try:
Bhack
March 8, 2022, 3:21pm
6
It does something similar, iterating over the full dataset, but in C++:
// Iterate through the input dataset.
while (true) {
  if (ctx->cancellation_manager()->IsCancelled()) {
    return errors::Cancelled("Operation was cancelled");
  }
  std::vector<Tensor> next_input_element;
  bool end_of_input;
  TF_RETURN_IF_ERROR(
      iterator->GetNext(&iter_ctx, &next_input_element, &end_of_input));
  if (end_of_input) {
    break;
  }