Hybrid Language - LLM route?

Hello community!

I’m new here and looking for tutorials that match a specific use case.

It is a new LANGUAGE of new unique tokens.

Basically: 1,000 new tokens to train.

How best to do that?

ENVIRONMENT: WEB and Node.js.

The setup includes a SPEECH to TEXT app, which does not recognize the 1,000 new tokens yet.

GOAL: get the STT app to recognize the NEW tokens.

Imagine medical terminology: long, complex words built from concatenated sub-tokens.

Q: Is this similar to any existing projects?

Q: What tutorials would be most relevant?

TYSM.
