Fine-tuned GPT-2 summary model generates from the very beginning, not from 'Summary'

I created a text summarization model by fine-tuning GPT-2 on a custom dataset.

Each training example was created by concatenating the input and target like this: ‘Document:{document}\nSummary:{summary}’
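For reference, a minimal sketch of how such training strings might be assembled (the function and field names here are illustrative, not from the original code):

```python
def build_example(document: str, summary: str) -> str:
    # Concatenate input and target in the format described above:
    # 'Document:{document}\nSummary:{summary}'
    return f"Document:{document}\nSummary:{summary}"

# Each resulting string becomes one training sequence for the causal LM.
example = build_example("The cat sat on the mat.", "A cat sits on a mat.")
```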

But the problem is that the model starts generating from the Document part, not from the Summary. Is there any way to handle this?

Or is it simply not possible?

@Seungjun_Lee Welcome to the Forum!

When fine-tuning a causal language model like GPT-2, it’s common for the model to generate text as a continuation of the provided input rather than following a specific prompt or instruction. In your case, since the training examples are constructed by concatenating the document and summary as ‘Document:{document}\nSummary:{summary}’, the model learns to generate text from the document part, since it comes first in the concatenated input.
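Since GPT-2 is a causal language model, generation is always a continuation of whatever prompt it is given. A minimal sketch of prompt construction, with the `transformers` generation calls shown only as comments (the checkpoint name is a placeholder, not from this thread):

```python
def make_prompt(document: str) -> str:
    # End the prompt right after 'Summary:' so that whatever the model
    # generates next is read as the summary continuation.
    return f"Document:{document}\nSummary:"

# Generation itself would use Hugging Face transformers, e.g.:
# from transformers import GPT2LMHeadModel, GPT2Tokenizer
# tokenizer = GPT2Tokenizer.from_pretrained("your-finetuned-checkpoint")
# model = GPT2LMHeadModel.from_pretrained("your-finetuned-checkpoint")
# inputs = tokenizer(make_prompt("..."), return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=60)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```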

If you want the model to generate text starting from the summary, you can modify the way you construct the training examples. Instead of concatenating the document and summary in that order, you can provide the summary as the initial prompt and the document as the continuation. For example:

Training example:
Input: ‘Summary:{summary}\nDocument:{document}’

By reversing the order of the summary and document in the input, you are instructing the model to generate text starting from the summary part. During training, the model will learn to generate text that follows the summary.
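The reversed construction described above can be sketched as follows (again with hypothetical field names):

```python
def build_reversed_example(document: str, summary: str) -> str:
    # Put the summary first so the model sees it at the start of the sequence:
    # 'Summary:{summary}\nDocument:{document}'
    return f"Summary:{summary}\nDocument:{document}"
```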

Keep in mind that this change in training data construction will require corresponding adjustments in your data preprocessing and model input handling. You’ll need to update your code to separate the summary and document parts appropriately during training and inference.
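For the inference side, one sketch of splitting a generated sequence back into its parts under the reversed format (the delimiter strings follow the training example above; the helper itself is hypothetical and requires Python 3.9+ for `removeprefix`):

```python
def split_generated(text: str) -> dict:
    # Expect 'Summary:{summary}\nDocument:{document}' and split on the
    # first '\nDocument:' delimiter.
    summary_part, _, document_part = text.partition("\nDocument:")
    return {
        "summary": summary_part.removeprefix("Summary:").strip(),
        "document": document_part.strip(),
    }

parts = split_generated("Summary:A cat sits.\nDocument:The cat sat on the mat.")
```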

By modifying the training examples in this way, you can guide the model to generate text starting from the summary rather than the document.

Let us know if this solves the query.


Thanks, will try that!

So to clarify, is it not possible to fine-tune GPT-2 in a seq2seq manner?