The challenge is that I have a CSV with more than 150,000 medicines and their characteristics.
I need to train a model (something like a GPT, for example, but obviously much less powerful) that I could ask anything about a certain medication and it would answer. The model should only be an expert on the data in this CSV.
Does anyone have ideas on how to proceed with this project? The idea is that the training would not be like sentiment analysis, where we have labels, but a fluid model where we ask questions and get an answer.
The answer does not need to be creative, just the CSV data.
The point of the model is precisely that not everyone knows the exact name. So if someone asked about Pacentamol, it would identify that they meant Paracetamol and return the data.
Hi @Rhaymison_Cristian
What does your data/dataset look like?
Can you provide an example of a question? How much more advanced does it have to be than the basic case where the user inputs the name of the medication and the model spits out the “hard-coded” data?
@tagoma
Of course I can provide it. Below are the CSV header and an example line:
PRODUCT_TYPE;PRODUCT_NAME;PROCESS_END_DATE;REGULATORY_CATEGORY;PRODUCT_REGISTRATION_NUMBER;REGISTRATION_DUE_DATE;PROCESS_NUMBER;THERAPEUTIC_CLASS;REGISTER_HOLDING_COMPANY;REGISTRATION_SITUATION;ACTIVE_PRINCIPLE
"MEDICATION";" (VITAMINS A ) + ASSOCIATIONS";"25/04/2000";"SIMILAR";104540166;"01/04/2005";"25000025416";"VITAMINS AND MINERAL SUPPLEMENTS";"60874187000184 - DAIICHI SANKYO BRASIL FARMACÊUTICA LTDA";"CADUCO/CANCELED";
The idea is basically this: the user just types in the name of the medicine and the model returns the data, which would be the correct name, active ingredient, and therapeutic class.
example:
User input: Diasepan
output:
Diazepam, antidepressant, anxiolytic.
All this information is in the CSV for training, as you can see.
Below is the diazepam line:
"MEDICAMENT";"FUNED DIAZEPAM";"01/30/2001";"SIMILAR";112090016;"06/01/2015";"25000012115";"ANTIDEPRESSANTS";"17503475000101 - EZEQUIEL DIAS FUNDATION - FUNED ";"ANSIOLYTIC";
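To make the expected behaviour concrete, here is a rough sketch of the lookup I have in mind: plain fuzzy string matching over the CSV, no trained model yet. The file name and the similarity cutoff are just placeholders.

```python
# Sketch only: assumes the CSV layout shown above (semicolon-separated,
# quoted fields, columns named as in the header).
import csv
import difflib

def load_rows(path="medicines.csv"):  # placeholder file name
    with open(path, encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter=";"))

def build_index(rows):
    # Map normalized product names to their full CSV rows.
    return {row["PRODUCT_NAME"].strip().upper(): row for row in rows}

def lookup(query, index, cutoff=0.6):
    # Fuzzy-match the (possibly misspelled) user input against product names,
    # so "Diasepan" can still land on a DIAZEPAM entry.
    matches = difflib.get_close_matches(query.strip().upper(), list(index),
                                        n=1, cutoff=cutoff)
    return index[matches[0]] if matches else None

# index = build_index(load_rows())
# row = lookup("Diasepan", index)
# if row:
#     print(row["PRODUCT_NAME"], row["THERAPEUTIC_CLASS"], row["ACTIVE_PRINCIPLE"])
```

Since the product names carry extra tokens (e.g. "FUNED DIAZEPAM"), matching against individual words or lowering the cutoff will probably work better than matching whole names.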
Perhaps the most common approach to customizing the content of an LLM, for companies that are not cloud vendors, is to tune it through prompts. With this approach, the original model is kept frozen and is modified through prompts in the context window that contain domain-specific knowledge. After prompt tuning, the model can answer questions related to that knowledge. This is the most computationally efficient option, and it does not require a vast amount of data for training on a new content domain.
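A minimal sketch of that idea, assuming the relevant CSV row has already been retrieved (for example with fuzzy matching as in the post above): the model stays frozen and the row is pasted into its context window. The model name is only an example, and the demo row values are illustrative, loosely based on the diazepam line.

```python
# Frozen model + retrieved CSV row in the context window (sketch, not production code).
from transformers import pipeline

# Any small instruction-tuned model can be swapped in; flan-t5-base is just an example.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

def answer(question, row):
    # `row` is the CSV record found for the (fuzzy-matched) product name.
    context = (
        f"Product name: {row['PRODUCT_NAME']}\n"
        f"Therapeutic class: {row['THERAPEUTIC_CLASS']}\n"
        f"Active principle: {row['ACTIVE_PRINCIPLE']}"
    )
    prompt = (
        "Answer the question using only the medication data below.\n"
        f"{context}\n"
        f"Question: {question}"
    )
    return generator(prompt, max_new_tokens=80)[0]["generated_text"]

# Illustrative row:
row = {
    "PRODUCT_NAME": "FUNED DIAZEPAM",
    "THERAPEUTIC_CLASS": "ANTIDEPRESSANTS",
    "ACTIVE_PRINCIPLE": "ANXIOLYTIC",
}
print(answer("What is Diasepan used for?", row))
```

The heavy lifting is the retrieval step; the frozen model only phrases an answer from whatever data is placed in its context window.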