
GPT-4 or Llama 2?

Oct 10, 2023

The first question you face when you build a product on top of a Large Language Model is this:

Should I use a third-party API, like OpenAI's GPT-4, or run an open-source LLM, like Llama 2, inside my cloud environment?

This is how I approach this dilemma ↓

 

OpenAI APIs to validate your product idea 💡

When your goal is to quickly put together a Proof of Concept (PoC), use a third-party API, like OpenAI's GPT-4.

Why?

Because third-party LLM APIs

→ are cheap at small scale and do not have upfront costs,
→ are easy to integrate with your code (all you need is an API key), and
→ require no deployment and maintenance on your end.

Third-party APIs bring you up to speed without much effort. This way you can validate your PoC without going down a rabbit hole of technical and engineering challenges.
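To make "all you need is an API key" concrete, here is a minimal sketch using the openai Python package (pre-1.0 interface); the prompt is just an illustrative placeholder:

```python
import os
import openai

# All the setup you need: an API key in your environment
openai.api_key = os.environ["OPENAI_API_KEY"]

# One call gives you GPT-4 completions, with no deployment on your end
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(response["choices"][0]["message"]["content"])
```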

Once you have validated your Proof of Concept, it is time to go one step further.

 

Open-source local LLMs to build your product 🚀 

You can build a PoC using closed LLMs, but if you plan to play the long game, you better invest in

→ developing an LLM fine-tuning stack inside your cloud environment, and
→ creating high-quality datasets for the business problems you want to solve.

Why?

For 3 reasons ↓

 

Reason 1. Fine-tuned models are better and cheaper 🤑

Small fine-tuned models often beat 100x larger models, like GPT-4, when you train them on a high-quality dataset for the particular problem you want to solve.

For example, SQLCoder is a small 15B-parameter model that outperforms GPT-3.5 on natural-language-to-SQL generation tasks.
Moreover, when fine-tuned on a given schema, it also outperforms GPT-4❗

Smaller models are also cheaper to run, as you need less powerful hardware to deploy them.

Hence, invest some resources in building high-quality datasets and in learning how to deploy open-source models in your cloud environment. The investment will pay off fast.
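As a rough illustration, here is what a minimal LoRA fine-tuning setup could look like with Hugging Face transformers + peft. The dataset name and hyperparameters are assumptions, not a recipe:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships with no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains a small set of adapter weights instead of all 7B parameters,
# which is what makes fine-tuning affordable on modest hardware
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

dataset = load_dataset("my-org/text-to-sql-pairs")  # hypothetical dataset name

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-sql-lora", num_train_epochs=3),
    train_dataset=dataset["train"].map(tokenize, batched=True, remove_columns=["text"]),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```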

How to deploy an open-source LLM?

Open-source models, like Llama 2, Falcon or Mistral, are quickly becoming easier to deploy on consumer hardware, thanks to open-source projects like

→ llama.cpp 42k⭐
→ Candle (Rust) 9k⭐, or
→ vLLM 8k⭐

With these libraries you can run models with up to 10B parameters on a single MacBook 🎉
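For instance, here is a minimal sketch with the llama-cpp-python bindings. The quantized GGUF file is something you download yourself, so the path is an assumption:

```python
from llama_cpp import Llama

# Load a quantized Llama 2 checkpoint from local disk
# (hypothetical file name; pick a GGUF quantization that fits your RAM)
llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf")

# Plain text in, plain text out, all on your own machine
output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])
```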

 

Reason 2. Keep your data at home 🏠

LLMs are only as good as the data you

→ embed in your prompts (aka the context), and
→ use to fine-tune them (aka supervised fine-tuning data).

Data is your differentiator, so you better keep it safe inside your cloud environment.

When you use external third-party APIs, like OpenAI, you send your private data through the Internet. And this is not good.

If you deploy your own open-source model inside your cloud environment, you keep your data at home.
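One way to do that, sketched below, is to serve the model inside your VPC with vLLM's OpenAI-compatible server, so prompts never leave your network. The model name and endpoint are assumptions:

```python
# Start the server inside your cloud environment first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-13b-chat-hf
import openai

# Point the client at your private endpoint instead of api.openai.com
openai.api_base = "http://localhost:8000/v1"  # your VPC address in practice
openai.api_key = "unused-for-a-local-server"

response = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-13b-chat-hf",
    messages=[{"role": "user", "content": "Classify this customer email: ..."}],
)
print(response["choices"][0]["message"]["content"])
```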

 

Reason 3. Switching back to third-party APIs is always an option 🔙 

Going back to OpenAI APIs is easy. Going the other way around is hard.

Hence, you don’t lose much by investing today in levelling up your LLM stack and in-house human expertise.
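A cheap way to keep that option open is to hide the provider behind one small function from day one. A minimal sketch, with illustrative environment variable names:

```python
import os
import openai

def complete(prompt: str) -> str:
    # Swap providers by changing env vars, not code:
    # point LLM_BASE_URL at api.openai.com or at your in-house deployment
    openai.api_base = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
    openai.api_key = os.environ["LLM_API_KEY"]
    response = openai.ChatCompletion.create(
        model=os.environ.get("LLM_MODEL", "gpt-4"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]
```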

Wanna learn more Real World ML?

Subscribe to my weekly newsletter

Every Saturday

For FREE

Join 22k+ ML engineers ↓