I have two models that both work with 224x224x3 images: one does binary classification with an output of shape (None, 1), and the other returns (None, 365); the final result should be the same binary classification as the first model. This is for my thesis, and the idea is to check whether early fusion improves the accuracy of the first, standalone model.
For this I need to perform early fusion of these two models, but I have never implemented multi-modal fusion before and, despite understanding the theory of how it works, I find myself lost when implementing it in code (I have only seen late-fusion examples, not early-fusion ones). So far, I've come to understand that I have to:
- Extract input-level features from both models.
- Concatenate them in a single layer.
- Build a new model that ends in a binary classification output.
- Not retrain this new model (as far as I understand, this isn't necessary).
What advice can you give me for solving this issue? Are there any concepts I am not understanding correctly?
This is what I have so far:
import keras
import pandas as pd
from places_365 import VGG16_Places365
from keras.models import load_model
from keras import losses
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam
from keras.metrics import Precision, Recall
NUM_CLASSES = 2
EPOCHS = 5
BATCH_SIZE = 32
places365_model = VGG16_Places365(weights='places')
model_365_features = places365_model(Input(shape=(224,224,3)))
vgg19_model_location = 'models/vgg19_binary.keras'
vgg19_model = load_model(vgg19_model_location)
vgg19_model_features = vgg19_model(Input(shape=(224,224,3)))
concatenate_layer = keras.layers.Concatenate([model_365_features, vgg19_model_features])
model = Sequential()
model.add(concatenate_layer)
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss=losses.BinaryCrossentropy(),
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy', Precision(), Recall()])
Hi @Javier_Romero1,
Sorry for the delayed response.
I suggest using the Keras functional API instead of a Sequential model for more complex architectures like this one with multiple inputs. Define an input layer for each of the Places365 and VGG19 models, concatenate their feature outputs, and build your final model with several dense layers leading to a binary classification output. Make sure to freeze the layers of both pre-trained models so their weights are not updated; note, though, that the new dense head on top of the concatenated features does need to be trained, so your last point above is not quite right. Finally, compile the model with binary cross-entropy loss and relevant metrics before training it on your dataset.
import keras
import pandas as pd
from places_365 import VGG16_Places365
from keras.models import load_model, Model
from keras import losses
from keras.layers import Dense, Input, Concatenate
from keras.optimizers import Adam
from keras.metrics import Precision, Recall
NUM_CLASSES = 2
EPOCHS = 5
BATCH_SIZE = 32
# Load Places365 model
places365_model = VGG16_Places365(weights='places')
# Get features from Places365 model (assuming it outputs a feature vector)
places365_input = Input(shape=(224, 224, 3))
model_365_features = places365_model(places365_input)
# Load VGG19 binary classification model
vgg19_model_location = 'models/vgg19_binary.keras'
vgg19_model = load_model(vgg19_model_location)
# Get features from VGG19 model (assuming it outputs a feature vector)
vgg19_input = Input(shape=(224, 224, 3))
vgg19_model_features = vgg19_model(vgg19_input)
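# Note (assumption): if both loaded models end in classification layers, the
# tensors above are class scores ((None, 1) and (None, 365)), not intermediate
# features. To fuse penultimate-layer activations instead, one option is a
# feature extractor built from the loaded model, e.g. (hypothetical layer index):
# vgg19_feature_extractor = Model(inputs=vgg19_model.input,
#                                 outputs=vgg19_model.layers[-2].output)
# vgg19_model_features = vgg19_feature_extractor(vgg19_input)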
# Concatenate features from both models
concatenated_features = Concatenate()([model_365_features, vgg19_model_features])
# Build the final model using Functional API
output = Dense(256, activation='relu')(concatenated_features)
output = Dense(128, activation='relu')(output)
output = Dense(64, activation='relu')(output)
output = Dense(32, activation='relu')(output)
output = Dense(16, activation='relu')(output)
output = Dense(8, activation='relu')(output)
final_output = Dense(1, activation='sigmoid')(output)
# Create the new model
model = Model(inputs=[places365_input, vgg19_input], outputs=final_output)
# Freeze layers of base models if you don't want to retrain them
for layer in places365_model.layers:
    layer.trainable = False
for layer in vgg19_model.layers:
    layer.trainable = False
# Compile the model
model.compile(loss=losses.BinaryCrossentropy(),
              optimizer=Adam(learning_rate=0.0001),
              metrics=['accuracy', Precision(), Recall()])
# Summary of the model architecture
model.summary()
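Since the fused model has two image inputs, you pass the same batch of images twice when fitting or predicting. A minimal sketch, assuming train_images is a NumPy array of shape (N, 224, 224, 3) and train_labels a binary vector of shape (N,) (train_images, train_labels, and test_images are placeholder names, not from your code):
# Only the new dense head is trained; the frozen backbones keep their weights.
model.fit([train_images, train_images], train_labels,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          validation_split=0.1)
# Inference: sigmoid probabilities, thresholded at 0.5 for the binary label
probs = model.predict([test_images, test_images])
preds = (probs > 0.5).astype(int)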
Hope this helps. Thank you.