Fine-tuned GPT-2 summary model generates from the very beginning, not from 'Summary'

I created a text summarization model by fine-tuning GPT-2 on a custom dataset.

Each training example was created by concatenating the input and target like this: ‘Document:{document}\nSummary:{summary}’
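For reference, a minimal sketch of how such training strings might be assembled (the function and field names here are illustrative, not from the original code):

```python
def build_example(document: str, summary: str) -> str:
    # Concatenate input and target in the format described above:
    # 'Document:{document}\nSummary:{summary}'
    return f"Document:{document}\nSummary:{summary}"

# Each resulting string becomes one training sequence for the causal LM.
example = build_example("The cat sat on the mat.", "A cat sits on a mat.")
```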

But the problem is that the model starts generating from the Document part, not from the Summary. Is there any way to handle this?

Or is it simply not possible?

@Seungjun_Lee Welcome to the Forum!

When fine-tuning a causal language model like GPT-2, it’s common for the model to generate text as a continuation of the provided input rather than following a specific prompt or instruction. In your case, since the training examples are constructed by concatenating the document and summary as ‘Document:{document}\nSummary:{summary}’, the model learns to generate text from the document part, since it comes first in the concatenated input.
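Since GPT-2 is a causal language model, generation is always a continuation of whatever prompt it is given. A minimal sketch of prompt construction, with the `transformers` generation calls shown only as comments (the checkpoint name is a placeholder, not from this thread):

```python
def make_prompt(document: str) -> str:
    # End the prompt right after 'Summary:' so that whatever the model
    # generates next is read as the summary continuation.
    return f"Document:{document}\nSummary:"

# Generation itself would use Hugging Face transformers, e.g.:
# from transformers import GPT2LMHeadModel, GPT2Tokenizer
# tokenizer = GPT2Tokenizer.from_pretrained("your-finetuned-checkpoint")
# model = GPT2LMHeadModel.from_pretrained("your-finetuned-checkpoint")
# inputs = tokenizer(make_prompt("..."), return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=60)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```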

If you want the model to generate text starting from the summary, you can modify the way you construct the training examples. Instead of concatenating the document and summary in that order, you can provide the summary as the initial prompt and the document as the continuation. For example:

Training example:
Input: ‘Summary:{summary}\nDocument:{document}’

By reversing the order of the summary and document in the input, you are instructing the model to generate text starting from the summary part. During training, the model will learn to generate text that follows the summary.
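The reversed construction described above can be sketched as follows (again with hypothetical field names):

```python
def build_reversed_example(document: str, summary: str) -> str:
    # Put the summary first so the model sees it at the start of the sequence:
    # 'Summary:{summary}\nDocument:{document}'
    return f"Summary:{summary}\nDocument:{document}"
```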

Keep in mind that this change in training data construction will require corresponding adjustments in your data preprocessing and model input handling. You’ll need to update your code to separate the summary and document parts appropriately during training and inference.
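For the inference side, one sketch of splitting a generated sequence back into its parts under the reversed format (the delimiter strings follow the training example above; the helper itself is hypothetical and requires Python 3.9+ for `removeprefix`):

```python
def split_generated(text: str) -> dict:
    # Expect 'Summary:{summary}\nDocument:{document}' and split on the
    # first '\nDocument:' delimiter.
    summary_part, _, document_part = text.partition("\nDocument:")
    return {
        "summary": summary_part.removeprefix("Summary:").strip(),
        "document": document_part.strip(),
    }

parts = split_generated("Summary:A cat sits.\nDocument:The cat sat on the mat.")
```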

By modifying the training examples in this way, you can guide the model to generate text starting from the summary rather than the document.

Let us know if this solves the query.


Thanks, will try that!

So to clarify, is it not possible to fine-tune GPT-2 in a seq2seq manner?