GPT-4 or Llama 2?
Oct 10, 2023
The first question you have when you build a product using a Large Language Model is the following:
Should I use a third-party API, like OpenAI's GPT-4, or run an open-source LLM, like Llama 2, inside my cloud environment?
This is how I approach this dilemma ↓
OpenAI APIs to validate product idea 💡
When your goal is to quickly put together a Proof of Concept (PoC), use a third-party API, like OpenAI's GPT-4.
Why?
Because third-party LLM APIs
→ are cheap at small scale and do not have upfront costs,
→ are easy to integrate with your code (all you need is an API key), and
→ require no deployment and maintenance on your end.
Third-party APIs get you up to speed without much effort. This way you can validate your PoC without going down a rabbit hole of technical difficulties and engineering challenges.
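To show how little integration work a third-party API needs, here is a minimal sketch of a chat completions request to OpenAI. The endpoint URL and payload shape are the public ones; the actual network call is left commented out so the sketch runs without an API key, and the example prompt is made up:

```python
import json

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(user_prompt: str, model: str = "gpt-4") -> dict:
    """Build the JSON body for a chat completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize our Q3 sales report in 3 bullet points.")
print(json.dumps(payload, indent=2))

# To actually call the API, POST the payload with your key, e.g.:
# import requests
# resp = requests.post(
#     API_URL,
#     json=payload,
#     headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
# )
```

That is the whole integration surface: an API key and a JSON payload. No GPUs, no model weights, no serving infrastructure.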
Once you have validated your PoC, it is time to go one step further.
Open-source local LLMs to build your product 🚀
You can build a PoC using closed LLMs, but if you plan to play the long game, you are better off investing in
→ building an LLM fine-tuning stack inside your cloud environment, and
→ creating high-quality datasets for the business problems you want to solve.
Why?
For 3 reasons:
Reason 1. Fine-tuned models are better and cheaper 🤑
Small fine-tuned models often beat 100x larger models, like GPT-4, when you fine-tune them on a high-quality dataset for the particular problem you want to solve.
For example
SQLCoder is a small 15B parameter model that outperforms GPT-3.5 for natural language to SQL generation tasks.
Moreover, when fine-tuned on a given schema, it also outperforms GPT-4❗
Smaller models are also cheaper to run, as you need less powerful hardware to deploy them.
Hence, invest some resources in building high-quality datasets and in learning how to deploy open-source models in your cloud environment. The investment will pay off real fast.
How to deploy an open-source LLM?
Open-source models, like Llama 2, Falcon or Mistral, are quickly becoming easier to deploy on consumer hardware, thanks to open-source projects like
→ llama cpp 42k⭐
→ Candle Rust 9k⭐, or
→ vLLM 8k⭐
With these libraries you can run models up to 10B parameters on a single MacBook 🎉
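A back-of-the-envelope calculation shows why a ~10B-parameter model fits on a laptop once quantized. The numbers below are raw weight footprints only; actual runtime memory adds overhead for the KV cache and activations:

```python
def weight_footprint_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Raw size of the model weights in gigabytes (decimal GB)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 10B-parameter model at different precisions:
for bits, name in [(16, "fp16"), (8, "int8"), (4, "4-bit quantized")]:
    print(f"{name:>16}: {weight_footprint_gb(10, bits):.1f} GB")
# fp16 needs ~20 GB of weights, but 4-bit quantization (e.g. the GGUF
# quants used by llama.cpp) brings that down to ~5 GB, which fits
# comfortably in a modern MacBook's unified memory.
```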
Reason 2. Keep your data at home 🏠
LLMs are only as good as the data you
→ embed in your prompts (aka the context), and
→ use to fine-tune them (aka supervised fine-tuning data).
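One common shape for that supervised fine-tuning data is a JSONL file of prompt/completion pairs. Field names vary by training stack, and the SQL examples below are invented for illustration:

```python
import json

# Each line of the JSONL file is one training example.
examples = [
    {
        "prompt": "Question: How many orders shipped in March?\nSQL:",
        "completion": " SELECT COUNT(*) FROM orders WHERE ship_month = 3;",
    },
    {
        "prompt": "Question: Who are the top 5 customers by revenue?\nSQL:",
        "completion": " SELECT customer, SUM(revenue) FROM sales"
                      " GROUP BY customer ORDER BY 2 DESC LIMIT 5;",
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A few thousand clean examples like these, drawn from your own historical data, are exactly the asset a generic third-party model does not have.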
Data is your differentiator, so you better keep it safe inside your cloud environment.
When you use external third-party APIs, like OpenAI, you send your private data through the Internet. And this is not good.
If you deploy your own open-source model inside your cloud environment, you keep your data at home.
Reason 3. Switching back to third-party APIs is always an option 🔙
Going back to OpenAI APIs is easy. Going the other way around is hard.
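One way to keep that option open is a thin provider-agnostic wrapper in your codebase, so swapping backends is a one-line config change rather than a rewrite. The two backend functions here are stubs standing in for the real OpenAI and self-hosted Llama 2 calls:

```python
from typing import Callable

def call_openai(prompt: str) -> str:
    # Stub: in production this would POST to the OpenAI API.
    return f"[openai] {prompt}"

def call_local_llama(prompt: str) -> str:
    # Stub: in production this would hit your self-hosted Llama 2 endpoint.
    return f"[llama2] {prompt}"

BACKENDS: dict[str, Callable[[str], str]] = {
    "openai": call_openai,
    "llama2": call_local_llama,
}

def complete(prompt: str, backend: str = "llama2") -> str:
    """The single entry point the rest of the codebase uses."""
    return BACKENDS[backend](prompt)

print(complete("Hello", backend="openai"))  # routed to the third-party API
print(complete("Hello"))                    # routed to the in-house model
```

With this seam in place, falling back to OpenAI during an outage, or for a task your fine-tuned model handles poorly, costs you nothing.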
Hence, you don’t lose much by investing today in levelling up your LLM stack and in-house human expertise.