Fine-Tuning Embedding Models

FAQs

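How much data is needed to fine-tune?

The amount of data depends on the use case, but several thousand examples of similar content pairs are generally needed. Adding more high-quality, representative examples usually improves the fine-tuned model, though the gains tend to shrink as the dataset grows.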
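As a rough illustration of what "similar content pairs" might look like in practice, the sketch below stores each pair as a small record and holds out part of the data for later evaluation. The names `ContentPair` and `split_pairs`, and the 10% default hold-out, are assumptions made for this example and are not part of any particular library.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ContentPair:
    """Two pieces of content that should end up close together in embedding space."""
    anchor: str
    positive: str


def split_pairs(
    pairs: Sequence[ContentPair],
    validation_fraction: float = 0.1,  # assumed default, tune for your data
    seed: int = 0,
) -> Tuple[List[ContentPair], List[ContentPair]]:
    """Return (train, validation) lists after removing exact duplicate pairs."""
    unique = list(dict.fromkeys(pairs))  # drops duplicates, keeps first-seen order
    rng = random.Random(seed)            # fixed seed makes the split reproducible
    rng.shuffle(unique)
    if len(unique) > 1:
        n_val = max(1, int(len(unique) * validation_fraction))
    else:
        n_val = 0
    return unique[n_val:], unique[:n_val]
```

Holding out a validation set before training gives you pairs the model has never seen, which makes any later before-and-after comparison more trustworthy.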

How long does fine-tuning take?

Depending on the amount of data and the computational resources available, fine-tuning typically takes from several hours to a few days.

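Will fine-tuning affect the model's general language capabilities?

Done properly, fine-tuning significantly improves domain-specific understanding while preserving general language capabilities. However, training on an overly narrow dataset can make the model too specialized, so that it performs worse on content outside that domain.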

How do I know if fine-tuning is working?

Test the model on sample content from your use case. Compare the similarity scores and rankings of related content pairs before and after fine-tuning to measure the improvement.
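The before-and-after comparison can be sketched with cosine similarity and a simple ranking metric. This is a minimal example, not a definitive evaluation harness: `mean_reciprocal_rank` is a helper written for this illustration, and `embed` stands for any function that maps text to a vector (for example, a wrapper around your base or fine-tuned model), which is an assumption here.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_reciprocal_rank(
    embed: Callable[[str], Vector],   # hypothetical: wraps the model under test
    queries: Sequence[str],
    candidates: Sequence[str],
    relevant_index: Sequence[int],    # relevant_index[i] points into candidates
) -> float:
    """Average of 1/rank of the related candidate for each query (higher is better)."""
    candidate_vecs = [embed(c) for c in candidates]
    total = 0.0
    for query, target in zip(queries, relevant_index):
        query_vec = embed(query)
        scores = [cosine_similarity(query_vec, vec) for vec in candidate_vecs]
        # Rank is 1 plus the number of candidates scoring strictly higher than the target.
        rank = 1 + sum(1 for s in scores if s > scores[target])
        total += 1.0 / rank
    return total / len(queries)
```

Running `mean_reciprocal_rank` once with the base model's `embed` and once with the fine-tuned model's, on the same held-out pairs, gives two numbers that can be compared directly. A higher value after fine-tuning suggests that related content is being ranked closer to the top.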

Is fine-tuning the right choice for my use case?

Fine-tuning is most beneficial when you have a large amount of domain-specific data that differs significantly from the model's base training data. Before deciding to fine-tune, consider the uniqueness and complexity of your content, the amount of labeled data available, and the improvement you expect to gain.

Get started with MongoDB Atlas


