I have two models that both work with 224x224x3 images: one does binary classification with an output of shape (None, 1), and the other returns (None, 365); the final result should be the same binary classification as the first model. This is for my thesis, and the idea is to check whether early fusion improves the accuracy of the first, standalone model.
For this I need to perform early fusion of these two models, but I have never implemented multi-modal fusion before and, despite understanding the theory of how it works, I find myself lost when implementing it in code (I have only seen late-fusion examples, not early-fusion ones). So far, I've come to understand that I have to:
- Extract input-level features from both models.
- Concatenate them in a single layer.
- Build a new model that ends in a binary classification output.
- Not retrain this new model (as far as I understand, this isn't necessary).
What advice can you give me for solving this issue? Are there any concepts I am not understanding correctly?
This is what I have so far:
import keras
import pandas as pd
from places_365 import VGG16_Places365
from keras.models import load_model
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam
from keras.metrics import Precision, Recall
NUM_CLASSES = 2
EPOCHS = 5
BATCH_SIZE = 32
places365_model = VGG16_Places365(weights='places')
model_365_features = places365_model(Input(shape=(224,224,3)))
vgg19_model_location = 'models/vgg19_binary.keras'
vgg19_model = load_model(vgg19_model_location)
vgg19_model_features = vgg19_model(Input(shape=(224,224,3)))
concatenate_layer = keras.layers.Concatenate([model_365_features, vgg19_model_features])
model = Sequential()
model.add(concatenate_layer)
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=losses.BinaryCrossentropy(),
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy', Precision(), Recall()])
Hi @Javier_Romero1,
Sorry for the delayed response.
I suggest using the Keras functional API instead of a Sequential model for more complex architectures like this one with multiple inputs. Define an input layer for each of the Places365 and VGG19 models, concatenate their feature outputs, and build your final model with several dense layers leading to a binary classification output. Make sure to freeze the layers of both pre-trained models so their weights are not updated; note, though, that the new dense head on top of the concatenated features does need to be trained, so your last point above is not quite right. Finally, compile the model with binary cross-entropy loss and relevant metrics before training it on your dataset.
import keras
import pandas as pd
from places_365 import VGG16_Places365
from keras.models import load_model, Model
from keras import losses
from keras.layers import Dense, Input, Concatenate
from keras.optimizers import Adam
from keras.metrics import Precision, Recall
NUM_CLASSES = 2
EPOCHS = 5
BATCH_SIZE = 32
# Load Places365 model
places365_model = VGG16_Places365(weights='places')
# Get features from Places365 model (assuming it outputs a feature vector)
places365_input = Input(shape=(224, 224, 3))
model_365_features = places365_model(places365_input)
# Load VGG19 binary classification model
vgg19_model_location = 'models/vgg19_binary.keras'
vgg19_model = load_model(vgg19_model_location)
# Get features from VGG19 model (assuming it outputs a feature vector)
vgg19_input = Input(shape=(224, 224, 3))
vgg19_model_features = vgg19_model(vgg19_input)
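# Note (assumption): if both loaded models end in classification layers, the
# tensors above are class scores ((None, 1) and (None, 365)), not intermediate
# features. To fuse penultimate-layer activations instead, one option is a
# feature extractor built from the loaded model, e.g. (hypothetical layer index):
# vgg19_feature_extractor = Model(inputs=vgg19_model.input,
#                                 outputs=vgg19_model.layers[-2].output)
# vgg19_model_features = vgg19_feature_extractor(vgg19_input)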
# Concatenate features from both models
concatenated_features = Concatenate()([model_365_features, vgg19_model_features])
# Build the final model using Functional API
output = Dense(256, activation='relu')(concatenated_features)
output = Dense(128, activation='relu')(output)
output = Dense(64, activation='relu')(output)
output = Dense(32, activation='relu')(output)
output = Dense(16, activation='relu')(output)
output = Dense(8, activation='relu')(output)
final_output = Dense(1, activation='sigmoid')(output)
# Create the new model
model = Model(inputs=[places365_input, vgg19_input], outputs=final_output)
# Freeze layers of base models if you don't want to retrain them
for layer in places365_model.layers:
    layer.trainable = False
for layer in vgg19_model.layers:
    layer.trainable = False
# Compile the model
model.compile(loss=losses.BinaryCrossentropy(),
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy', Precision(), Recall()])
# Summary of the model architecture
model.summary()
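Since the fused model has two image inputs, you pass the same batch of images twice when fitting or predicting. A minimal sketch, assuming train_images is a NumPy array of shape (N, 224, 224, 3) and train_labels a binary vector of shape (N,) (train_images, train_labels, and test_images are placeholder names, not from your code):
# Only the new dense head is trained; the frozen backbones keep their weights.
model.fit([train_images, train_images], train_labels,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          validation_split=0.1)
# Inference: sigmoid probabilities, thresholded at 0.5 for the binary label
probs = model.predict([test_images, test_images])
preds = (probs > 0.5).astype(int)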
Hope this helps. Thank you.