I have been trying to create a decoder-based transformer for text generation, but the text it generates is the same no matter the input sequence.
The following is my code; some of the preprocessing code was removed.
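(To keep the snippet self-contained: the removed part boils down to imports, a tokenizer, and a few constants, roughly like the placeholders below. The values, toy vocabulary, and sample text here are stand-ins, not my real ones.)

import tensorflow as tf
import keras_nlp
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Dropout
from keras_nlp.layers import StartEndPacker, TokenAndPositionEmbedding, TransformerDecoder

# Toy vocabulary so the snippet runs; the real one comes from
# keras_nlp.tokenizers.compute_word_piece_vocabulary on the training text.
vocab = ["[PAD]", "[UNK]", "[START]", "[END]", "the", "quick", "brown", "fox"]
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(vocabulary=vocab, lowercase=True)

# Placeholder hyperparameters (my real values differ)
VOCAB_SIZE = tokenizer.vocabulary_size()
MAX_SEQUENCE_LENGTH = 64
EMBED_DIM = 256
INTERMEDIATE_DIM = 1024
NUM_HEADS = 8
BATCH_SIZE = 64

# Placeholder training/validation text
train_seq = ["the quick brown fox", "the brown fox"]
val_seq = ["the quick fox"]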
def process_batch(ds):
    ds = tokenizer(ds)
    # Pad short sentences to the max length with the [PAD] id
    # and add the [START] and [END] special tokens.
    ds_start_end_packer = StartEndPacker(
        sequence_length=MAX_SEQUENCE_LENGTH + 1,
        start_value=tokenizer.token_to_id("[START]"),
        end_value=tokenizer.token_to_id("[END]"),
        pad_value=tokenizer.token_to_id("[PAD]"),
    )
    ds = ds_start_end_packer(ds)
    # Shift by one token: the model sees tokens [0:-1] and predicts tokens [1:].
    return ({"decoder_inputs": ds[:, :-1]}, ds[:, 1:])
def make_ds(seq):
    dataset = tf.data.Dataset.from_tensor_slices(seq)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.map(process_batch, num_parallel_calls=tf.data.AUTOTUNE)
    # cache before shuffle so every epoch gets a fresh shuffle order
    return dataset.cache().shuffle(128).prefetch(32)
train_ds = make_ds(train_seq)
val_ds = make_ds(val_seq)
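As a sanity check, each batch should come out with the extra token consumed by the shift:

inputs, targets = next(iter(train_ds))
print(inputs["decoder_inputs"].shape)  # (batch_size, MAX_SEQUENCE_LENGTH)
print(targets.shape)                   # (batch_size, MAX_SEQUENCE_LENGTH)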
This is the decoder section; I was using keras_nlp.
It has 2 decoder layers.
decoder_inputs = Input(shape=(None,), dtype="int64", name="decoder_inputs")
x = TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=MAX_SEQUENCE_LENGTH,
    embedding_dim=EMBED_DIM,
    mask_zero=True,
)(decoder_inputs)
# Two decoder-only blocks (causal self-attention, no cross-attention input)
x = TransformerDecoder(
    intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(x)
x = TransformerDecoder(
    intermediate_dim=INTERMEDIATE_DIM, num_heads=NUM_HEADS
)(x)
x = Dropout(0.5)(x)
decoder_output = Dense(VOCAB_SIZE, activation="softmax")(x)
decoder = Model([decoder_inputs], decoder_output)
decoder_outputs = decoder([decoder_inputs])

transformer = Model(inputs=decoder_inputs, outputs=decoder_outputs, name="transformer")
# transformer.load_weights("/content/my-drive/MyDrive/projects/Olsen/weights-improvement-07-0.41.hdf5")
transformer.compile("adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
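For context, generation from the trained model is a standard autoregressive loop. Below is a minimal greedy (argmax) sketch using the placeholder tokenizer above; the helper name and prompt handling are just for illustration, not my exact generation code:

def generate(prompt, max_new_tokens=40):
    start_id = tokenizer.token_to_id("[START]")
    end_id = tokenizer.token_to_id("[END]")
    ids = [start_id] + tokenizer(prompt).numpy().tolist()
    for _ in range(max_new_tokens):
        if len(ids) >= MAX_SEQUENCE_LENGTH:   # stay within the position embedding range
            break
        probs = transformer.predict(tf.constant([ids], dtype="int64"), verbose=0)
        next_id = int(tf.argmax(probs[0, -1]))   # greedy pick at the last position
        if next_id == end_id:
            break
        ids.append(next_id)
    return tokenizer.detokenize(tf.constant([ids[1:]]))   # drop [START]

print(generate("the quick"))

Greedy argmax decoding is deterministic for a given prompt, but different prompts should still lead to different continuations, which is what I'm not seeing.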