
Batch and Real-time ML deployments

Jan 15, 2024

These are the two most popular ways to deploy ML models in the real world:

  • Batch prediction

  • Online prediction as a REST API

Let’s take a closer look at each one.

 

Wanna get more real-world MLOps videos for FREE?
→ Subscribe to the Real-World ML YouTube channel ←

 

Batch prediction

 

In batch prediction, your model is

→ fed a batch of input features on a schedule (e.g. every hour or every day),

→ generates predictions for that batch, and

→ stores them in a database, from where developers and downstream services can fetch them whenever they need them.

Batch inference fits perfectly with Spark’s philosophy of large-scale batch processing, and the job can be scheduled using tools like Apache Airflow.
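
Here is a minimal sketch of such a job (assuming Airflow 2.x; the model path, connection string, and table names are hypothetical placeholders):

```python
# A minimal sketch of an hourly batch-scoring job (assumes Airflow 2.x;
# the model path, connection string, and table names are hypothetical).
from datetime import datetime

import joblib
import pandas as pd
import sqlalchemy
from airflow import DAG
from airflow.operators.python import PythonOperator


def score_batch():
    model = joblib.load("/models/latest.joblib")  # hypothetical model artifact
    engine = sqlalchemy.create_engine("postgresql://user:pass@db/ml")  # hypothetical DB

    # 1. Fetch the batch of input features.
    features = pd.read_sql("SELECT * FROM user_features", engine)

    # 2. Generate predictions for the batch.
    predictions = pd.DataFrame({
        "user_id": features["user_id"],
        "prediction": model.predict(features.drop(columns=["user_id"])),
        "scored_at": datetime.utcnow(),
    })

    # 3. Store them where downstream services can read them.
    predictions.to_sql("predictions", engine, if_exists="append", index=False)


with DAG(
    dag_id="batch_scoring",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # the "on a schedule" part
    catchup=False,
) as dag:
    PythonOperator(task_id="score_batch", python_callable=score_batch)
```

Airflow handles the scheduling; the function itself is just the three steps above: read a batch, predict, write the results.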

 

Pros

 

✅ It is the most straightforward deployment strategy, and this is its main advantage.

 

Cons

 

❌ It can be terribly inefficient, as most predictions generated by a batch-scoring system are never used. For example:

  • If you work for an e-commerce site where only 5% of users log in every day, and you have a batch-scoring system that runs daily, 95% of the predictions will not be used.

❌ The system reacts slowly to data changes. For example:

  • Imagine a movie recommender model that generates predictions every hour for each user. It will not take into account the most recent user activity from the last 10 minutes, which can noticeably degrade the user experience.

  • Oftentimes, the slowness of the system is not just an inconvenience but a deal breaker. As an example, an ML-powered credit card fraud detection system CANNOT be deployed as a batch-scoring system, as this would lead to catastrophic consequences.

To make ML models react faster to new data, many companies and industries are transitioning their systems to online prediction.

 

Online prediction (aka real-time ML)

 

The key difference between batch inference and real-time inference is the ability of your model to react to recent data. For that, you need to collect model inputs in real time (or near real time, with a streaming tool like Apache Kafka) and pipe them into your model.
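
As a rough sketch of that ingestion step, here is a consumer loop using the kafka-python client; the topic name, servers, message schema, and model path are all assumptions:

```python
# A rough sketch of the "collect inputs in (near) real time" step,
# using the kafka-python client. The topic name, servers, message
# schema, and model path are assumptions.
import json

import joblib
from kafka import KafkaConsumer

model = joblib.load("/models/latest.joblib")  # hypothetical model artifact

consumer = KafkaConsumer(
    "user-events",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Each incoming event is scored as soon as it arrives.
for event in consumer:
    features = [[event.value["recent_clicks"], event.value["session_length"]]]
    prediction = model.predict(features)[0]
    # In a real system you would publish this to another topic or a database.
    print(event.value["user_id"], prediction)
```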

The model is deployed either

  • as a container behind a REST (or RPC) endpoint, using a library like Flask or FastAPI (see the sketch after this list), or

  • as a lambda function inside your stream-processing pipeline.
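
Here is a minimal sketch of the first option with FastAPI; the model path and the feature names are hypothetical:

```python
# A minimal sketch of the REST-endpoint option with FastAPI.
# The model path and the feature names are hypothetical.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("/models/latest.joblib")  # hypothetical model artifact


class Features(BaseModel):
    user_id: int
    recent_clicks: int      # hypothetical input features
    session_length: float


@app.post("/predict")
def predict(features: Features):
    X = [[features.recent_clicks, features.session_length]]
    return {"user_id": features.user_id, "prediction": float(model.predict(X)[0])}
```

You would run this with uvicorn (e.g. `uvicorn main:app`) and package it into a container image to put it behind a load balancer.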


Although real-time ML systems unlock new possibilities, they also come with high implementation and maintenance costs, and most companies lack the in-house expertise to keep them running.

 

Pros

 

✅ Your model takes into account (almost) real-time data, so its predictions are as fresh and relevant as they can be. Real-world examples are recommendation systems like TikTok’s Monolith or Netflix’s recommender.

 

Cons

 

❌ Real-time ML has a steep learning curve. While Python is the lingua franca of ML, streaming tools like Apache Kafka and Apache Flink are built on Java and Scala, languages that most data scientists and ML engineers don’t know.

Fortunately, things have been changing fast over the last few years, with the emergence of Python-first tools like Bytewax, Quix, or Pathway.

 

My advice 💡

 If you haven’t built any ML system end-to-end, I recommend you start with a batch-scoring system.

This will help you build confidence, learn tons of things, and add your first real-world project to your ML portfolio.

 

Wanna build your first ML app with my help?

Join the Real-World ML Tutorial + Community and build, step by step, a batch-scoring system that predicts taxi demand in NYC hour by hour and serves predictions to this public dashboard.

You will get lifetime access to

→ 3 hours of video lectures and slides 🎬
→ Full source code implementation of the system 👨‍💻
→ Discord private community, to connect with me and 200+ students 👨‍👩‍👦
