I'm doing image captioning with the model below:

```
from tensorflow.keras.layers import (Input, Dropout, Dense, Embedding,
                                     BatchNormalization, LSTM, Concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras import regularizers
import tensorflow as tf

def define_model(vocab_size, max_length):
    # image feature branch: one 1120-dim feature vector per image
    inputs1 = Input(shape=(1120,))
    fe1 = Dropout(0.3)(inputs1)
    fe2 = Dense(512, kernel_regularizer=regularizers.l2(1e-4), activation='relu')(fe1)

    # caption branch: zero-padded word-id sequences
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.3)(se1)
    se2 = BatchNormalization()(se2)
    se3 = LSTM(512)(se2)

    # decoder: merge both branches and predict the next word
    decoder1 = Concatenate()([fe2, se3])
    decoder2 = Dense(512, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)

    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    opt = tf.keras.optimizers.Adam(learning_rate=1e-4)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.summary()
    return model
```
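For context, the model takes a pair of inputs per sample: a 1120-dim image feature vector and a zero-padded caption prefix, and the target is a one-hot encoding of the next word (which is what `categorical_crossentropy` expects here). A minimal numpy sketch of the batch shapes my generator produces — the vocab size, max length, and batch size below are made-up values, not my actual ones:

```python
import numpy as np

vocab_size, max_length, feat_dim = 5000, 34, 1120  # feat_dim matches Input(shape=(1120,))
batch = 64

# image features: one 1120-dim vector per sample
X1 = np.random.rand(batch, feat_dim).astype("float32")

# caption prefixes: integer word ids, zero-padded to max_length
X2 = np.random.randint(1, vocab_size, size=(batch, max_length))

# target: one-hot next word over the vocabulary
next_word = np.random.randint(0, vocab_size, size=batch)
y = np.zeros((batch, vocab_size), dtype="float32")
y[np.arange(batch), next_word] = 1.0

print(X1.shape, X2.shape, y.shape)  # (64, 1120) (64, 34) (64, 5000)
```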

These are the training results I got:

```
epoch 1 loss: 5.3386 - accuracy: 0.1759 - val_loss: 4.9454 - val_accuracy: 0.2214
epoch 2 loss: 4.2776 - accuracy: 0.2832 - val_loss: 3.9343 - val_accuracy: 0.3105
epoch 3 loss: 3.5354 - accuracy: 0.3599 - val_loss: 3.8278 - val_accuracy: 0.3210
epoch 4 loss: 3.2257 - accuracy: 0.4039 - val_loss: 3.9480 - val_accuracy: 0.3101
epoch 5 loss: 3.0297 - accuracy: 0.4326 - val_loss: 4.1156 - val_accuracy: 0.3072
epoch 6 loss: 2.9005 - accuracy: 0.4505 - val_loss: 4.2219 - val_accuracy: 0.3053
epoch 7 loss: 2.8103 - accuracy: 0.4622 - val_loss: 4.2751 - val_accuracy: 0.3020
epoch 8 loss: nan - accuracy: 0.1060 - val_loss: nan - val_accuracy: 0.0000e+00
```
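The loss collapses to `nan` at epoch 8. For reference, one way softmax + `categorical_crossentropy` can produce `nan` is a predicted probability underflowing to exactly zero, so the target word contributes `0 * log(0)` to the sum — a small numpy illustration with made-up values, not my actual training data:

```python
import numpy as np

p = np.array([1.0, 0.0, 0.0])   # predicted softmax that underflowed to exact zeros
y = np.array([0.0, 1.0, 0.0])   # one-hot target on the zero-probability word

with np.errstate(divide="ignore", invalid="ignore"):
    # log(0) = -inf, and the masked terms produce 0 * (-inf) = nan in the sum
    loss = -np.sum(y * np.log(p))
    # clipping probabilities away from zero keeps the loss finite
    eps = 1e-7
    loss_clipped = -np.sum(y * np.log(np.clip(p, eps, 1.0)))

print(np.isnan(loss), np.isfinite(loss_clipped))  # True True
```

I don't know whether this is the actual cause in my run, but it's the failure mode I'm trying to rule out.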

```
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callback = EarlyStopping(monitor='loss', patience=3)
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, mode='min')

# batch_size comes from the generator itself, so it is not passed to fit()
history = model.fit(train_generator, epochs=50, steps_per_epoch=train_steps,
                    verbose=1, callbacks=[callback, checkpoint],
                    validation_data=val_generator, validation_steps=val_steps)
```