Exception: mismatch in NN input and output sizes?

I am training an encoder neural network with TensorFlow to encode/decode images of shape (None, 100, 100, 3), i.e. (batch, height, width, channels). I want to output both the reconstructed image and the coded image (the output of the encoder part of the NN), but when I fit the model an exception occurs, stating:

  • ValueError: Shapes (None, 100, 100, 3) and (None, 25, 25, 3) are incompatible
    File "/tmp/autograph_generated_fileh4snkmtb.py", line 15, in tf__train_function
    retval = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
    ValueError: Shapes (None, 100, 100, 3) and (None, 25, 25, 3) are incompatible

This is my NN:

from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

def custom_architecture(input_shape):
    inputs = input_shape  # a Keras Input tensor is passed in here
    # Encoder
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2, 2), padding='same')(conv1)   # (None, 50, 50, 64)

    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D((2, 2), padding='same')(conv2)   # (None, 25, 25, 128)

    # Bottleneck
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(conv3)  # (None, 25, 25, 256)

    # Decoder
    upconv2 = UpSampling2D((2, 2))(conv3)
    conv2_upsampled = Conv2D(128, (1, 1), activation='relu', padding='same')(conv2)
    concat2 = Concatenate(axis=-1)([upconv2, conv2_upsampled])
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(concat2)
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv4)

    upconv1 = UpSampling2D((2, 2))(conv4)
    conv1_upsampled = Conv2D(64, (1, 1), activation='relu', padding='same')(conv1)
    concat1 = Concatenate(axis=-1)([upconv1, conv1_upsampled])
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(concat1)
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv5)

    # Output layer: reconstructed image, (None, 100, 100, 3)
    outputs = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(conv5)

    # Create the model with two outputs: reconstruction and bottleneck code
    model = Model(inputs, [outputs, conv3])

    return model

This is the part where I call the compile and fit methods:

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
model.compile(optimizer='adam', loss=['binary_crossentropy', None], metrics=['accuracy'], loss_weights=[1.0, 0.0])
history = model.fit(train_ds, epochs=1001, batch_size=32, shuffle=True, callbacks=[tensorboard_callback])

I do not understand the problem. If I understand correctly, you are supposed to be able to return more than one output, and the extra outputs should not be required to match the input size. The problem seems to be that it compares the second output against the target to compute the loss and metrics, but I have specified None as its loss, so it should skip that for the second output (the output of the encoder).

I have tried changing the loss function, passing None as the function for the second output, and also returning the encoder output at the same size as the input and processing it afterwards. This last option is not desirable because it outputs a very big chunk of information and I need to reduce it (the NN processes chunks of 100x100 called patches, for an image of size X,X).

Hi @adrianferal_is ,

I have checked some of the resources/issues/discussions with experts of mine and came to the following observation:

You specified the loss function as binary crossentropy for the reconstructed image and None for the encoder output. It should compute the loss only for the reconstructed image, and not for the encoder output. However, TensorFlow still performs an internal shape compatibility check for all outputs, even if you don't specify a loss function for one. This ensures the outputs can be processed together during calculations like backpropagation. In your case, the reconstructed image and the encoded representation have mismatched shapes, causing the error.

Maybe you can define your own custom loss function, even if you don't weight it during training, so that you maintain control over the training; or modify the output shape of the encoder. One concrete way to sidestep the shape check is sketched below.
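
For example, a minimal sketch of that idea (the conv/pool sizes here are placeholders, not your architecture): name the outputs, attach a loss only to the reconstruction head, and have the dataset supply a target only for that name, so Keras never tries to match a target against the encoder output.

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(100, 100, 3))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)
code = layers.MaxPooling2D((4, 4), name='code')(x)                # (None, 25, 25, 16)
up = layers.UpSampling2D((4, 4))(code)
recon = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same',
                      name='reconstruction')(up)                  # (None, 100, 100, 3)

model = Model(inputs, {'reconstruction': recon, 'code': code})
model.compile(optimizer='adam',
              loss={'reconstruction': 'binary_crossentropy'})    # no loss for 'code'

# The dataset then yields targets keyed by output name,
# e.g. each element is (image, {'reconstruction': image}).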

Thanks.

How do I define my own custom loss? I tried the following:

def custom_loss(y_true, y_pred1):
    # Compute binary cross-entropy loss for the first output
    bce = tf.keras.losses.BinaryCrossentropy()
    loss1 = bce(y_true, y_pred1)
    
    return loss1

def custom_architectureV2(input_shape):
    inputs = input_shape  # a Keras Input tensor is passed in here
    # Encoder
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2, 2), padding='same')(conv1)

    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D((2, 2), padding='same')(conv2)

    # Bottleneck
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(conv3)  # (None, 25, 25, 256)

    # Flatten and reduce dimensionality
    flat = Flatten()(conv3)                      # currently unused
    avg_pool = GlobalAveragePooling2D()(conv3)   # (None, 256)

    # Output layer: built on conv3, so its shape is (None, 25, 25, 3),
    # not (None, 100, 100, 3) like the input
    conv_output = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(conv3)
    outputs = conv_output

    # Create the model
    model = Model(inputs, [outputs, avg_pool])

    return model

In the variable inspector I see that y_pred and y_true look like this:

y_pred1 -> <tf.Tensor 'model/conv2d_6/Sigmoid:0' shape=(None, 25, 25, 3), dtype=float32>
y_true -> <tf.Tensor 'IteratorGetNext:1' shape=(None, 100, 100, 3), dtype=float32>

Still I get an error:
ValueError: logits and labels must have the same shape, received ((None, 25, 25, 3) vs (None, 100, 100, 3)).
ValueError: Shapes (None, 100, 100, 3) and (None, 25, 25, 3) are incompatible

My guess is that for some reason it is interpreting my output list so that the first item is the y_true and the second is the label, which doesn't make sense to me at all. I mean, why does it do that? They are different outputs.

Please help!

Official TensorFlow 2.16 + Python 3.12 Documentation via JARaaS Hybrid RAG - 6/17/2024
Note: sources are listed at the end of the response.

The error you are encountering arises because the shapes of your model's outputs do not match the shapes expected when the loss is computed. Specifically, your final output conv_output has the shape (None, 100, 100, 3), while the intermediate output from the bottleneck path is 25x25 spatially, (None, 25, 25, 3) per the error. TensorFlow is trying to compute losses for both outputs, but the shapes are incompatible.

Here’s a detailed breakdown of the issues and how to resolve them:

Issues:

  1. Incompatible Shapes:

    • The final output has shape (None, 100, 100, 3).
    • The intermediate output (encoder output) has shape (None, 25, 25, 3).
  2. Loss Specification:

    • You specified None as the loss function for the encoder output. This setting might not be passed through correctly, leading to the error.

Suggested Solution:

  1. Change the Model Architecture Slightly:

    • Make sure the Conv2D layers in the decoder mirror the encoder layers more closely regarding the shape changes.
  2. Correct the Loss Function:

    • Specify the loss function appropriately, or ensure the second output’s loss is not being calculated.

Modified Code:

Here’s a refined version of your code that should address these issues:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Concatenate, Input
from tensorflow.keras.models import Model

def custom_architecture(input_shape):
    inputs = Input(shape=input_shape)
    
    # Encoder
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2, 2), padding='same')(conv1)
    
    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D((2, 2), padding='same')(conv2)
    
    # Bottleneck
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(conv3)
    
    # Decoder
    upconv2 = UpSampling2D((2, 2))(conv3)
    conv2_upsampled = Conv2D(128, (1, 1), activation='relu', padding='same')(conv2)
    concat2 = Concatenate(axis=-1)([upconv2, conv2_upsampled])
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(concat2)
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv4)
    
    upconv1 = UpSampling2D((2, 2))(conv4)
    conv1_upsampled = Conv2D(64, (1, 1), activation='relu', padding='same')(conv1)
    concat1 = Concatenate(axis=-1)([upconv1, conv1_upsampled])
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(concat1)
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv5)
    
    # Output layer
    conv_output = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(conv5)
    outputs = conv_output
    
    # Create the model, notice we are not using the bottleneck output here as we don't compute its loss
    model = Model(inputs, outputs)
    
    return model

input_shape = (100, 100, 3)
model = custom_architecture(input_shape)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Assuming train_ds is your dataset
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)
history = model.fit(train_ds, epochs=1001, batch_size=32, shuffle=True, callbacks=[tensorboard_callback])

Explanation:

  1. Model Architecture:

    • The encoder and decoder parts are structured to handle the dimensions correctly.
    • The bottleneck (conv3) is not included in the model's outputs, since no loss needs to be computed for it.
  2. Loss Function:

    • The loss function is binary cross-entropy for the final output, which is the reconstructed image. If you need a specific loss for the intermediate output in your use case, ensure the shapes are correctly managed and handle the custom loss computation manually, as sketched below.
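
For instance, if you do later want a training signal on the bottleneck itself, model.add_loss attaches a loss tensor that needs no target and therefore no shape matching (a sketch; the 1e-4 weight is a placeholder):

# Inside custom_architecture, after Model(...), while conv3 is still in scope:
model = Model(inputs, outputs)
# Penalize large bottleneck activations; no y_true is involved in this term.
model.add_loss(1e-4 * tf.reduce_mean(tf.square(conv3)))
model.compile(optimizer='adam', loss='binary_crossentropy')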

Notes:

  • It’s crucial to align the shapes of intermediate layers correctly.
  • If you need the bottleneck features (conv3) for another purpose, consider managing them separately outside the primary model (see the sketch below) or adjust the concatenation strategy.
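
For example, after training you can expose the bottleneck through a second model that shares the trained layers (a sketch; 'conv2d_5' is a placeholder layer name, check model.summary() for the real one):

# A read-only view of the trained encoder; it reuses the trained weights.
encoder = Model(inputs=model.input,
                outputs=model.get_layer('conv2d_5').output)
codes = encoder.predict(train_ds)   # targets in the dataset are ignored here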

Sources:

Dynamic Shapes in Deep Learning: torch.compiler_dynamic_shapes.rst (internal document)

Working with tf.keras Layers: extension_type.ipynb (internal document)


I cannot thank you enough for the explanation. However, I need the encoder output. So far I have managed to run the program by modifying the outputs: instead of returning conv3 directly, I process conv3 separately at the end, as shown in the code, so that it matches the same shape.

def custom_architecture(input_shape):
    inputs = input_shape  # a Keras Input tensor is passed in here
    # Encoder
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(inputs)
    conv1 = Conv2D(64, (13, 13), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D((2, 2), padding='same')(conv1)

    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(128, (13, 13), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D((2, 2), padding='same')(conv2)

    # Bottleneck
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(256, (13, 13), activation='relu', padding='same')(conv3)  # (None, 25, 25, 256)

    # Decoder
    upconv2 = UpSampling2D((2, 2))(conv3)
    conv2_upsampled = Conv2D(128, (1, 1), activation='relu', padding='same')(conv2)
    concat2 = Concatenate(axis=-1)([upconv2, conv2_upsampled])
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(concat2)
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv4)

    upconv1 = UpSampling2D((2, 2))(conv4)
    conv1_upsampled = Conv2D(64, (1, 1), activation='relu', padding='same')(conv1)
    concat1 = Concatenate(axis=-1)([upconv1, conv1_upsampled])
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(concat1)
    conv5 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv5)

    # Output layer: reconstructed image, (None, 100, 100, 3)
    outputs = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(conv5)

    # Reshape the bottleneck so it matches the input's spatial size:
    # 25 * 25 * 256 = 160000 = 100 * 100 * 16
    flat = Flatten()(conv3)
    flat = Reshape((100, 100, 16))(flat)
    flat = Conv2D(3, (1, 1), activation='relu', padding='same')(flat)  # (None, 100, 100, 3)

    codification_output = flat

    # Create the model
    model = Model(inputs, [outputs, codification_output])

    return model

Then, as I want to use that as an array to compare against other images, I applied a Flatten layer and a Dense(25) layer to turn the tensor into an array and to reduce its size, respectively. Snippet of my code below:

coded_img = infered_patch[1]  # position 0 of the result is the reconstructed image, position 1 the output of the encoder
flatten_coded_img = layers.Flatten()(coded_img)
dense_flatten_coded_img = layers.Dense(25, activation='relu')(flatten_coded_img)
coded_array = dense_flatten_coded_img[0].numpy()
shrinked_array = shrink_array(coded_array)

My objective is to use the encoder part of the NN to identify how different two images of the same object are, in order to clarify whether it is the exact same object or another object of the same type. (I have photos taken by a camera and by a phone, and I need to say whether a given phone photo is of a given product, etc.; as expected, photos taken by the phone are at different angles, and due to external factors some features of the object may be eroded.)

To sum up, I want to get the array from the encoder, use it as a sort of id, and compare it to the ids of the rest of the photos. If my hypothesis is correct, the phone photo of an object should be closest to the camera photo of that same object.
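
For reference, this is roughly how I intend to compare two ids (a simple cosine-similarity sketch; code_a and code_b stand for two shrunk arrays from the snippet above). One caveat I noticed: a layers.Dense(25) created inside the loop gets fresh random weights on every call, so the same Dense instance has to be reused for all images for the ids to be comparable.

import numpy as np

def cosine_similarity(code_a, code_b):
    # 1.0 means the codes point in the same direction, 0.0 means orthogonal.
    code_a = code_a / (np.linalg.norm(code_a) + 1e-8)
    code_b = code_b / (np.linalg.norm(code_b) + 1e-8)
    return float(np.dot(code_a, code_b))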

Thanks again for the support. What do you think of my approach, is it valid? Do you think there is another way to implement this, maybe another type of NN?