LaBSE is a good choice for high-quality, language-agnostic sentence embeddings. But because of its parameter count (BERT-base architecture, yet 471M parameters), it is hard to fine-tune or deploy on a small GPU/machine.
So I applied the method from the paper “Load What You Need: Smaller Versions of Multilingual BERT” to build a smaller version of LaBSE, reducing its parameters to 47% of the original without a big performance drop, using TF-Hub and tensorflow/models.
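In case it helps, the core of the vocabulary-reduction idea is just selecting the embedding-table rows for the tokens you keep and writing a matching vocab file. Below is a minimal sketch of that step in plain TensorFlow; the function names and the `keep_token_ids` list are placeholders, not the actual script:

```python
# Minimal sketch of the "Load What You Need" vocabulary reduction.
# `keep_token_ids` (ids of the tokens to retain, in their new order) is assumed
# to come from tokenizing your target-language corpora beforehand.
import tensorflow as tf


def shrink_word_embeddings(word_embeddings, keep_token_ids):
    """Keep only the embedding-table rows for the retained tokens."""
    return tf.gather(word_embeddings, keep_token_ids, axis=0)


def write_reduced_vocab(old_vocab_path, new_vocab_path, keep_token_ids):
    """Write a vocab file containing only the retained tokens, in their new order."""
    with open(old_vocab_path, encoding="utf-8") as f:
        tokens = [line.rstrip("\n") for line in f]
    with open(new_vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(tokens[i] for i in keep_token_ids) + "\n")
```

All the other encoder weights (transformer layers, pooler) stay untouched; only the word-embedding table and the vocab file shrink, which is where most of LaBSE's 471M parameters live.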
The preprocessing model is exported using the modified vocab file, so this model must be used with the updated preprocessing model, not the original one. (You can check make_smaller_labse.py#L37.)
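To illustrate why the original preprocessor won't work: the token ids it produces are row indices into the now shrunken and reordered embedding table, so the tokenizer has to be built from the reduced vocab. A rough sketch with tensorflow_text (the file name is hypothetical, and this is not the exported TF-Hub preprocessing model itself):

```python
# Illustrative only: the tokenizer must read the *reduced* vocab so that its
# output ids line up with the rows of the shrunken embedding table.
import tensorflow_text as tf_text

new_vocab_path = "vocab_reduced.txt"  # hypothetical path to the modified vocab file
tokenizer = tf_text.BertTokenizer(new_vocab_path, lower_case=False)  # assuming a cased vocab
wordpiece_ids = tokenizer.tokenize(["LaBSE sentence embeddings"])
```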
I hadn’t planned on publishing this model, since I didn’t train it myself but only patched it. Is it okay to publish?