Which embedding model should you use?
Apr 19, 2024
Today you will learn how to find the right embedding model for your RAG application. Let’s get started!
Problem
Text embeddings are vector representations of raw text that you compute using an embedding model.
These vector representations are then used for downstream tasks, like:
- Classification → for example, to classify tweet sentiment as either positive or negative.
- Clustering → for example, to automatically group news into topics.
- Retrieval → for example, to find similar documents to a given query.
Retrieval (the “R” in RAG) is the task of finding the most relevant documents given an input query. This is one of the most popular use cases for embeddings these days, and the one we will focus on today.
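To make the retrieval idea concrete, here is a minimal sketch of embedding-based retrieval, assuming the sentence-transformers package is installed; the model name, documents and query are purely illustrative.

from sentence_transformers import SentenceTransformer, util

# Load an open-source embedding model (an example choice, not a recommendation yet)
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

documents = [
    "Qdrant is an open-source vector database written in Rust.",
    "NDCG is a ranking metric used in information retrieval.",
    "Python is a popular language for machine learning.",
]
query = "Which vector database is built in Rust?"

# Embed the documents and the query into the same vector space
doc_embeddings = model.encode(documents, normalize_embeddings=True)
query_embedding = model.encode(query, normalize_embeddings=True)

# Rank documents by cosine similarity to the query and keep the best match
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
print(documents[int(scores.argmax())])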
There are many embedding models, both open and proprietary, so the question is:
What embedding model is best for your problem?
Let me show you how to find the right model for your RAG application ↓
Solution
First, go to the Massive Text Embedding Benchmark (MTEB) Leaderboard to find the best embedding models for the retrieval task in your language, for example English.
As of today (April 18th, 2024) the number 1 model on the leaderboard is Salesforce/SFR-Embedding-Mistral, with:
- Embedding quality: 59%, measured as the average Normalized Discounted Cumulative Gain (NDCG) over 15 different datasets (see the NDCG sketch after this list).
- Model size: 7.1 billion parameters
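As a side note, here is a minimal sketch of how NDCG@k can be computed for a single query; this is a simplified illustration of the metric, not the exact MTEB implementation.

import math

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """Normalized Discounted Cumulative Gain for one ranked list of results."""
    def dcg(rels: list[float]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Example: the only relevant document was retrieved at rank 2 instead of rank 1
print(ndcg_at_k([0.0, 1.0, 0.0], k=3))  # ~0.63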
At this point you might think that Salesforce/SFR-Embedding-Mistral is the model you need… and you are probably wrong.
Why?
Because embedding quality is not the only measure you should look at when you build a real-world RAG app. Model size matters too, because larger models are slower and more expensive to run.
For example:
The 7th model on the leaderboard is snowflake-arctic-embed-l, with:
- Embedding quality: 55.98% → about 5% worse than the leader (in relative terms).
- Model size: 331 million parameters → about 95% smaller than the leader.
So, if you are willing to trade 5% of quality for a 95% reduction in model size (and cost), you would pick snowflake-arctic-embed-l.
In general, to find the sweet spot between embedding quality and cost, you need to run a proper evaluation of your retrieval step, using your:
- Dataset → e.g. explodinggradients/ragas-wikiqa
- Vector DB → e.g. Qdrant
- Other important RAG hyper-parameters, like your chunk size and chunk overlap (see the chunking sketch after this list).
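For the chunking hyper-parameters, here is a minimal sketch of fixed-size chunking with overlap; splitting by characters is an assumption made for illustration, real pipelines often split by tokens or sentences.

def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    # Slide a window of `chunk_size` characters, advancing by (chunk_size - chunk_overlap)
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("a long document " * 200, chunk_size=200, chunk_overlap=20)
print(len(chunks), len(chunks[0]))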
Let’s go through an example with full source code.
Hands-on example
All the source code shown in the video is available in this GitHub repository.
Give it a star ⭐ on GitHub to support my work.
Step 1. Git clone the code
From the terminal
$ git clone https://github.com/Paulescu/text-embedding-evaluation.git
Step 2. Install Python dependencies
$ make install
Step 3. Set up external services
Create a .env file
$ cp .env.example .env
and paste your:
- OPENAI_API_KEY
- QDRANT_URL
- QDRANT_API_KEY
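A minimal sketch of how code can read these variables at runtime, assuming the python-dotenv package is installed:

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
QDRANT_URL = os.environ["QDRANT_URL"]
QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]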
OpenAI GPT-3.5 Turbo
You will need an OpenAI API key because ragas, the RAG evaluation framework we use, makes calls to `GPT-3.5 Turbo` to evaluate the quality of the retrieved context.
Qdrant
We will use Qdrant as the Vector DB, so you also need to create a FREE account on Qdrant.cloud to get your QDRANT_URL and QDRANT_API_KEY.
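As a quick sanity check, here is a minimal sketch of connecting to Qdrant Cloud with those credentials, assuming the qdrant-client package is installed:

import os
from qdrant_client import QdrantClient

client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)
# Lists the collections in your cluster (empty on a fresh account)
print(client.get_collections())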
Step 4. Select the models and dataset you want to evaluate
Update the list of models you want to evaluate and the dataset in the config.yml file:
models:
  # 109 million parameters
  - sentence-transformers/all-mpnet-base-v2
  # 334 million parameters
  # - 'Snowflake/snowflake-arctic-embed-l'
  # 7.11 billion parameters
  # - 'Salesforce/SFR-Embedding-Mistral'
datasets:
  - explodinggradients/ragas-wikiqa
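For reference, here is a minimal sketch of how a script might loop over this config, assuming PyYAML is installed; the repository's actual loading code may differ.

import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

# Evaluate every (model, dataset) combination listed in the config
for model_name in config["models"]:
    for dataset_name in config["datasets"]:
        print(f"Evaluating {model_name} on {dataset_name}")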
Step 5. Run the evaluation
From the command line
$ make run-evals
The Python script behind this command:
- Loads the model and the dataset (questions, contexts and answers) from Hugging Face.
- Embeds the contexts into the Vector DB, in our case Qdrant.
- For each question, retrieves the top K most relevant documents from the Vector DB.
- Compares the information overlap between the retrieved documents and the correct answers, using context precision and context recall (see the sketch below).
- Finally, logs the results, so you know what worked best.
{ "model_name": "sentence-transformers/all-mpnet-base-v2", "dataset_name": "explodinggradients/ragas-wikiqa", "top_k_to_retrieve": 2, "context_precision": 0.9999999999499998, "context_recall": 0.7666666666666666, "seconds_taken_to_embed": 4.0, "seconds_taken_to_retrieve": 0.0 }
By benchmarking different models you will find where the sweet spot between quality and cost lies for your particular dataset and RAG setup.
Bonus → Which Vector DB should I use?
If you want to build a demo RAG app, any Vector DB will do the job.
However, if you plan on building real-world ML products you need to be more careful with your choice.
My personal recommendation when it comes to Vector DBs is Qdrant.
Why?
Because:
- It is high-performance (thanks to Rust), so you get the fastest and most accurate results at the cheapest cloud costs.
- It is extremely easy to scale and upgrade, and
- It gives you the option to keep your data 100% private, thanks to the new Qdrant Hybrid Cloud.
And before you leave…
Generative AI is very cool, but the reality is that most real world business problems are solved using tabular data and predictive ML models.
If you are interested in learning how to build end-2-end ML systems using tabular data and MLOps best practices, join the Real-World ML Tutorial + Community and get lifetime access to
→ 3 hours of video lectures
→ Full source code implementation
→ Discord private community, to connect with me and 350+ students
🎁 Gift
Use this direct payment link in the next 5 days and get an exclusive 30% discount!