You can take a look at:
https://tensorflow-prod.ospodiscourse.com/t/research-finetuned-language-models-are-zero-shot-learners-by-google-research/4206
The OpenbookQA dataset mentioned there is a multiple-choice Q/A dataset, and it is available at:
You can also see how Hugging Face fine-tunes a model with the Keras API:
And an example of BERT on a multiple-choice task
(ROCStories/SWAG dataset)
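
To make the multiple-choice setup a bit more concrete, here is a minimal sketch of how one question is usually expanded into one (question, choice) pair per candidate answer before tokenization. A multiple-choice model then scores each pair and picks the highest-scoring one. The record layout (`question`, `choices`, `label`) and the helper name `to_choice_pairs` are assumptions made for illustration, not the actual schema of OpenbookQA or SWAG.

```python
from typing import List, Tuple


def to_choice_pairs(question: str, choices: List[str]) -> List[Tuple[str, str]]:
    """Pair the question with every candidate answer (hypothetical helper)."""
    return [(question, choice) for choice in choices]


# Hypothetical record; real datasets may use different field names.
example = {
    "question": "Which object conducts electricity best?",
    "choices": ["a wooden spoon", "a copper wire", "a rubber band", "a glass rod"],
    "label": 1,  # index of the correct choice
}

pairs = to_choice_pairs(example["question"], example["choices"])

# One pair per choice; the label indexes into this list.
for index, (question, choice) in enumerate(pairs):
    marker = "*" if index == example["label"] else " "
    print(f"{marker} [{index}] {question} || {choice}")
```

Each pair would then be tokenized as a two-segment input, so a batch has the shape (batch size, number of choices, sequence length), and the label is the index of the correct choice.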