Thanks for responding, Chunduriv. Any input on what I’m doing wrong is welcome.
I’m expecting improvements in speed. Right now the training is taking 15 minutes per epoch regardless of whether I pass it in as RGB or grayscale. Pretty sure I’m doing something wrong.
The images are actually signatures, so the pixels are binary "did the pen touch the paper" values. Signatures are either "good" or "bad," so two classes. I receive them as RGB PNGs, but all three channels are identical and each pixel is either 0 or 255.
I converted the corpus to grayscale PNGs in a separate directory. Each image is 90x800.
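For reference, the conversion was along these lines (directory names here are illustrative, not my actual paths):

```python
from pathlib import Path
from PIL import Image

def rgb_corpus_to_grayscale(src: Path, dst: Path) -> int:
    """Mirror src's class subdirectories into dst, converting each PNG to 8-bit grayscale."""
    count = 0
    for png in src.rglob("*.png"):
        out = dst / png.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        # "L" collapses the three identical channels into a single 8-bit channel
        Image.open(png).convert("L").save(out)
        count += 1
    return count

# hypothetical directory names -- adjust to the real corpus layout
# rgb_corpus_to_grayscale(Path("data_dir_rgb"), Path("data_dir"))
```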
The samples are in directories:
data_dir/
    good/
        img1.png
        img2.png
    bad/
        img3.png
        img4.png
Here are the parts of the code that I think matter.
color_mode is either "grayscale" or "rgb"
input_shape is either (90, 800, 1) or omitted so Keras infers it. (I originally passed (32, 90, 800, 1), but input_shape should not include the batch dimension.)
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="both",
    color_mode="grayscale",
    seed=123,
    image_size=(90, 800))
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
model = Sequential([
    # input_shape omits the batch dimension; only the first layer needs it
    layers.Rescaling(1./255, input_shape=(90, 800, 1)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])