Hello, I'm trying to train a CNN on two datasets that I labelled manually as negative and positive (80x60 depth images in each matrix).
import numpy as np
import tensorflow as tf

# dimensions of our images.
img_width, img_height = 80, 60
n_positives_img, n_negatives_img = 17874, 26308
n_total_img = 44182
# Load the datasets from Drive
ds_negatives = np.loadtxt('/content/drive/MyDrive/Colab Notebooks/negative_depth.txt')
ds_positives = np.loadtxt('/content/drive/MyDrive/Colab Notebooks/positive_depth.txt')
# Label arrays: 0 = negative, 1 = positive
arrayceros = np.zeros(n_negatives_img)
arrayunos = np.ones(n_positives_img)
# Reshape the flat text data into stacks of 80x60 images
arraynegativos = ds_negatives.reshape((n_negatives_img, img_width, img_height))
arraypositivos = ds_positives.reshape((n_positives_img, img_width, img_height))
# Pair each image array with its labels
ds_negatives_target = tf.data.Dataset.from_tensor_slices((arraynegativos, arrayceros))
ds_positives_target = tf.data.Dataset.from_tensor_slices((arraypositivos, arrayunos))
# Concatenate the two datasets and shuffle them
ds_concatenate = ds_negatives_target.concatenate(ds_positives_target)
datasetfinal = ds_concatenate.shuffle(n_total_img)
But when I try to split my dataset 80/20 to validate my CNN:
trainingdataset, validatedataset = train_test_split(datasetfinal, test_size=0.2, random_state=25)
I get this error:
TypeError: Singleton array array(<ShuffleDataset shapes: ((80, 60), ()), types: (tf.float64, tf.float64)>, dtype=object) cannot be considered a valid collection.
Any ideas? Thanks in advance!!!
It's not possible to split a TensorFlow dataset object by passing it to train_test_split from scikit-learn. Instead, choose the number of validation samples, which should be an int, and use the following example:
valid_ds = datasetfinal.take(n_samples)
train_ds = datasetfinal.skip(n_samples)
It does what the method names say: take returns the first n_samples from the dataset and drops the rest, while skip drops the first n_samples and returns all the rest.
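For an 80/20 split, n_samples can be derived from the total count defined earlier; a minimal sketch (rounding with int() is my assumption, pick whatever split you need):
# Reserve roughly 20% of the 44182 images for validation
n_samples = int(0.2 * n_total_img)  # -> 8836
valid_ds = datasetfinal.take(n_samples)
train_ds = datasetfinal.skip(n_samples)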
I fixed that, but then when I build the model and fit it:
model = Sequential()
model.add(Conv2D(5, kernel_size=(5, 5),activation='linear',input_shape=(80,60,1),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2),padding='same'))
model.add(Conv2D(5, (5, 5), activation='linear',padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
model.add(Conv2D(5, (5, 5), activation='linear',padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
model.add(Flatten())
model.add(Dense(100, activation='linear'))
model.add(Dense(1, activation='linear'))
# Compiling the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(train_ds, validation_data=valid_ds, batch_size=32, epochs=10)
I get this error:
ValueError: Input 0 of layer sequential_6 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (80, 60)
If each element of your data has shape (80, 60), it is a 2D image without a channel dimension, but you defined input_shape=(80, 60, 1), which is 3D, so every image needs an explicit channel axis.
When the error says that it expected 4 dimensions, it means it expects the 3D input as you defined it plus the batch dimension of the dataset. You probably did not apply the .batch(batch_size) method to your datasets before passing them to the model.
You can read about preparing data and the various dataset methods here (tf.data: Build TensorFlow input pipelines | TensorFlow Core). You could probably benefit from using the .cache() method as well.
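A minimal sketch of both fixes, reusing the datasetfinal and n_samples names from above (adding the channel axis with an extra map is my assumption; other reshaping approaches work too):
# Add the channel axis so each image is (80, 60, 1) instead of (80, 60)
datasetfinal = datasetfinal.map(lambda x, y: (tf.expand_dims(x, -1), y))
# Batch (and optionally cache) before passing the datasets to the model
valid_ds = datasetfinal.take(n_samples).batch(32).cache()
train_ds = datasetfinal.skip(n_samples).batch(32).cache()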
I have followed your reply and trained my CNN correctly, but now that I want to build a confusion matrix of my results, I can't, because my TensorFlow dataset is not separated into x_ds and y_ds with the label array, so I can't compare the true labels with the predicted ones.
How could I do that?
THANKS!
To extract validation labels from a batched dataset you can do this:
valid_labels = list(valid_ds.flat_map(lambda x, y: tf.data.Dataset.from_tensor_slices((x, y))).as_numpy_iterator())
valid_labels = [y for x, y in valid_labels]
Probably there is some easier way to do it, but right now I can't think of anything else.
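One simpler equivalent seems to be the dataset's unbatch() method, which flattens the batches in a single step (a sketch, assuming valid_ds yields (image, label) batches):
# unbatch() splits each batch back into individual (image, label) pairs
valid_labels = [y for x, y in valid_ds.unbatch().as_numpy_iterator()]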
Then you get predicted values from the model:
pred_labels = model.predict(valid_ds)
And use this to get a confusion matrix:
conf_m = tf.math.confusion_matrix(valid_labels, pred_labels)
It gives me this error:
InvalidArgumentError: `predictions` contains negative values
Condition x >= 0 did not hold element-wise:
x (shape=(20000, 10) dtype=int64) =
['3', '3', '-8', '...']
It makes no sense because my labels are just 0 and 1.
This is what pred_labels contains:
[[  6.11138     2.9243512 -11.660926  ... -11.982912  -12.400366  -12.061557 ]
 [  6.1406865   2.6330147 -11.452074  ... -11.73517   -12.167985  -11.924534 ]
 [  6.0676413   2.5402145 -11.498982  ... -11.899355  -12.084745  -11.687552 ]
 ...
 [  6.1307297   2.6329107 -11.447449  ... -11.732571  -12.161408  -11.918484 ]
 [  6.056893    3.654493  -10.960058  ... -11.586772  -11.884876  -11.678584 ]
 [  6.1401978   2.6324804 -11.449804  ... -11.733837  -12.1662035 -11.92291  ]]
It seems to be one image, not the predicted labels.
Check the final layer of the model. It should have an activation suitable for the classification task (sigmoid in this case) and 1 neuron. The loss function should be BinaryCrossentropy.
You can see this example of a binary image classification:
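In the meantime, a minimal sketch of those changes, reusing the model and datasets from above (the 0.5 threshold for turning sigmoid probabilities into 0/1 labels is my assumption, not part of the original advice):
# Final layer: one neuron with a sigmoid activation for binary classification
model.add(Dense(1, activation='sigmoid'))
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)
model.fit(train_ds, validation_data=valid_ds, epochs=10)
# predict() now returns probabilities in [0, 1]; threshold to get class labels
pred_labels = (model.predict(valid_ds) > 0.5).astype(int).ravel()
conf_m = tf.math.confusion_matrix(valid_labels, pred_labels)
Without the threshold, tf.math.confusion_matrix casts the raw float outputs to integers, which can make a model that evaluates well look much worse in the matrix.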
Firstly, thank you, that worked!!
But this is the confusion matrix I get:
[[7829 4138]
[5346 2687]]
And the precision is 52% and the accuracy is 39%.
Why do I get these values if model.evaluate and fit report around 93% accuracy?