Tokenizer vs TextVectorization

Jeremy_Ouyang · July 21, 2023, 6:41am

I was wondering what the difference between these two classes is. Can I use TextVectorization directly in place of Tokenizer?

chunduriv · July 21, 2023, 7:02am

@Jeremy_Ouyang,

Both have different purposes and use cases. If you’re building an end-to-end deep learning model for a specific NLP task, TextVectorization is usually more convenient because it handles the tokenization and vectorization in one step and can be easily integrated into your model as a layer.
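A minimal sketch of what "one step" means here, assuming TensorFlow 2.x with `tf.keras.layers.TextVectorization` available. The sample sentences, `max_tokens=100`, `output_sequence_length=5`, and the embedding size of 8 are arbitrary values picked for illustration, not recommendations:

```python
import tensorflow as tf

# Toy corpus, made up for this illustration.
texts = tf.constant(["the cat sat", "the dog barked loudly"])

# The layer lowercases, strips punctuation, splits on whitespace,
# and maps each token to an integer id.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=100,            # assumed vocabulary cap
    output_mode="int",
    output_sequence_length=5,  # pad or truncate every row to 5 ids
)

# Build the vocabulary from the corpus.
vectorizer.adapt(texts)

# Raw strings in, padded integer sequences out.
# Id 0 is reserved for padding and id 1 for out-of-vocabulary tokens.
token_ids = vectorizer(texts)

# Because it is a regular Keras layer, its output can feed the next
# layer directly, for example an Embedding.
embedding = tf.keras.layers.Embedding(input_dim=100, output_dim=8)
embedded = embedding(token_ids)

print(vectorizer.get_vocabulary())
print(token_ids.numpy())
print(embedded.shape)
```

Since the vocabulary lives inside the layer, it is saved and loaded together with whatever model the layer is part of, so raw strings can be passed at inference time without a separate preprocessing step.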

Tokenizer, on the other hand, may be more appropriate if you require more control over the tokenization process.

Thank you!
