Let's build a real-time ML system

Sep 20, 2024

Let me show you, step by step, how to build an ML system that predicts crypto prices 5 minutes ahead using real-time market data.

Let’s start!

 

System design 💡📐

Like any ML system, ours can be decomposed into 3 types of pipelines:

  • Feature pipeline transforms real-time market trades into 1-minute Open-High-Low-Close candles.

  • Training pipeline loads historical candles, trains a predictive ML model, and pushes it to the model registry.

  • Inference pipeline exposes a REST API that loads the model from the registry and, for each incoming request, fetches the freshest features from the feature store, generates a prediction and returns it to the client.

 

Let’s break down each of these 3 pipelines.

 

1. Feature pipeline ⚡

Our feature pipeline does 3 things:

  1. Ingests real-time trade data from the Kraken websocket API,

  2. Transforms this stream of trades into 1-minute Open-High-Low-Close (OHLC) candles, and

  3. Saves these candles into the Feature Store.
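
To make step 2 concrete, here is a minimal sketch in plain Python of how a stream of trades can be folded into 1-minute candles. The trade fields (`product_id`, `price`, `quantity`, `timestamp_ms`) are names I made up for illustration, not Kraken's actual message schema, and the sketch assumes trades for a single product arrive in timestamp order. The `init_candle` / `update_candle` pair has the same shape as the initializer and reducer you would hand to a windowed aggregation in a streaming library.

```python
from typing import Iterable, Iterator

WINDOW_MS = 60_000  # 1-minute candles


def init_candle(trade: dict) -> dict:
    """Start a new candle from the first trade seen in a window."""
    window_start = trade["timestamp_ms"] // WINDOW_MS * WINDOW_MS
    return {
        "product_id": trade["product_id"],
        "window_start_ms": window_start,
        "open": trade["price"],
        "high": trade["price"],
        "low": trade["price"],
        "close": trade["price"],
        "volume": trade["quantity"],
    }


def update_candle(candle: dict, trade: dict) -> dict:
    """Fold one more trade into an open candle."""
    return {
        **candle,
        "high": max(candle["high"], trade["price"]),
        "low": min(candle["low"], trade["price"]),
        "close": trade["price"],
        "volume": candle["volume"] + trade["quantity"],
    }


def trades_to_candles(trades: Iterable[dict]) -> Iterator[dict]:
    """Emit a candle as soon as a trade from a later window arrives."""
    candle = None
    for trade in trades:
        window_start = trade["timestamp_ms"] // WINDOW_MS * WINDOW_MS
        if candle is not None and window_start != candle["window_start_ms"]:
            yield candle
            candle = None
        candle = init_candle(trade) if candle is None else update_candle(candle, trade)
    if candle is not None:
        yield candle  # flush the last (possibly incomplete) candle


# Toy trades, made up for illustration.
trades = [
    {"product_id": "BTC/USD", "price": 100.0, "quantity": 1.0, "timestamp_ms": 0},
    {"product_id": "BTC/USD", "price": 105.0, "quantity": 0.5, "timestamp_ms": 20_000},
    {"product_id": "BTC/USD", "price": 98.0, "quantity": 2.0, "timestamp_ms": 59_999},
    {"product_id": "BTC/USD", "price": 101.0, "quantity": 1.0, "timestamp_ms": 60_000},
]
candles = list(trades_to_candles(trades))
```

The first three trades fall into the same minute and become one candle; the fourth opens the next one. A production version would also need to handle late or out-of-order trades and minutes with no trades at all.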

I recommend you implement each of these steps as a separate microservice, and use a streaming data platform (aka message bus) like Apache Kafka or Redpanda to transfer data between them.

Why do I need a message bus? 🤔

The message bus makes the pipeline recoverable without data loss if something bad happens, for example, if the Feature Store service is down.

It also allows one-to-many communication, so the transformed data can be read in real-time by more than one service (more on this later).
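
Both properties come from the same idea: the bus keeps messages in an append-only log, and each consuming service tracks its own read position. Here is a toy in-memory sketch of that idea. `InMemoryTopic` and the group names are hypothetical stand-ins I wrote for illustration; a real Kafka or Redpanda topic adds partitions, retention and explicit offset commits on top.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy stand-in for a topic: an append-only log plus one
    read offset per consumer group."""

    def __init__(self) -> None:
        self._log: list[dict] = []
        self._offsets: dict[str, int] = defaultdict(int)

    def produce(self, message: dict) -> None:
        self._log.append(message)

    def poll(self, group: str, max_messages: int = 10) -> list[dict]:
        """Return unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


candle_topic = InMemoryTopic()
candle_topic.produce({"close": 100.0})
candle_topic.produce({"close": 101.0})

# One service is up and reads both candles right away.
seen_by_inference = candle_topic.poll("inference")

# The feature store writer was down; more data arrives in the meantime.
candle_topic.produce({"close": 102.0})

# When it comes back, it resumes from its own offset, so nothing is lost.
seen_by_writer = candle_topic.poll("feature-store-writer")
```

Each group sees every message exactly once in this toy version, no matter when it connects or how many other groups are reading.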

 

Each of these 3 micro services is a streaming application that continuously reads, transforms and produces data to Kafka/Redpanda topics.

 

My recommendation 💡

The easiest way I know to build streaming applications is with the Quix Streams open-source Python library. It is written in pure Python and exposes a Pandas-like API that makes it very easy to use from day 1.

➡️ Give it a star ⭐ on GitHub to support open source

 

Once you have the feature pipeline up and running, you need to run it in “historical” mode to backfill historical candles and save them to the feature store. Kraken has a historical data API you can ingest trade data from, then transform it and save it in the EXACT SAME WAY you process the real-time data.
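
One way to guarantee the "exact same way" part is to make the data source the only thing that changes between modes. Below is a minimal sketch of that design. All names (`make_source`, `to_feature`, the trade fields) are hypothetical, and the two source functions are placeholders for a websocket reader and a historical API reader rather than real Kraken clients.

```python
from typing import Callable, Iterator


def live_trades() -> Iterator[dict]:
    # Placeholder for a websocket reader.
    yield {"product_id": "BTC/USD", "price": 101.0, "timestamp_ms": 120_000}


def historical_trades() -> Iterator[dict]:
    # Placeholder for a historical API reader.
    yield {"product_id": "BTC/USD", "price": 100.0, "timestamp_ms": 0}
    yield {"product_id": "BTC/USD", "price": 100.5, "timestamp_ms": 60_000}


def make_source(mode: str) -> Iterator[dict]:
    """The mode only decides where trades come from."""
    if mode == "live":
        return live_trades()
    if mode == "historical":
        return historical_trades()
    raise ValueError(f"unknown mode: {mode!r}")


def to_feature(trade: dict) -> dict:
    """The single transformation shared by both modes."""
    return {
        "product_id": trade["product_id"],
        "minute": trade["timestamp_ms"] // 60_000,
        "price": trade["price"],
    }


def run_pipeline(mode: str, sink: Callable[[dict], None]) -> None:
    """Push every trade from the chosen source through the same code path."""
    for trade in make_source(mode):
        sink(to_feature(trade))


feature_store: list[dict] = []  # in-memory stand-in for the real store
run_pipeline("historical", feature_store.append)  # backfill first
run_pipeline("live", feature_store.append)        # then go live
```

Because both modes share `to_feature`, the features the model is trained on are computed by the same code as the features it sees at inference time.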

Once you have historical features in the feature store, we can move on to the training pipeline.

 

2. Training pipeline 🏋️ 

The training pipeline does 3 things:

  1. Reads historical features and targets from the feature store,

  2. Builds a predictive model that maps features to targets, and

  3. Saves the model artifact to the model registry.
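
Before step 2 you need (features, target) pairs. Here is a small sketch of how they can be derived from candle closes. It assumes gap-free 1-minute candles, uses the last `N_LAGS` closes as features (a choice I made for illustration), and takes the close 5 minutes later as the target, matching the 5-minute prediction horizon.

```python
HORIZON = 5  # predict 5 candles (minutes) ahead
N_LAGS = 3   # hypothetical: number of past closes used as features


def make_training_rows(closes: list[float]) -> list[tuple[list[float], float]]:
    """Pair each window of past closes with the close HORIZON minutes later."""
    rows = []
    for t in range(N_LAGS - 1, len(closes) - HORIZON):
        features = closes[t - N_LAGS + 1 : t + 1]  # closes known at time t
        target = closes[t + HORIZON]               # what we want to predict
        rows.append((features, target))
    return rows


# Toy close prices, one per minute, made up for illustration.
closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
rows = make_training_rows(closes)
```

Note that the most recent `HORIZON` candles produce no training rows, because their targets are not known yet. When you split these rows into train and test sets, split by time rather than at random, so the model is never evaluated on data older than what it was trained on.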

My recommendation 💡

I recommend you first build a quick baseline model, without using ML, to establish the baseline performance.
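
As an example of such a baseline (my pick, other simple rules work too), you can predict that the price in 5 minutes will equal the current price, and measure its mean absolute error. Any ML model you train later has to beat this number to be worth deploying.

```python
def baseline_predict(current_close: float) -> float:
    """Naive baseline: the price 5 minutes from now equals the price now."""
    return current_close


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy (current close, close 5 minutes later) pairs, made up for illustration.
current = [100.0, 101.0, 102.0, 103.0]
future = [101.0, 100.0, 104.0, 103.0]

baseline_mae = mean_absolute_error(future, [baseline_predict(c) for c in current])
```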

Then you start iterating on the model, trying to squeeze as much signal from the features as possible. A good model for tabular data is XGBoost, which you can further optimize with hyperparameter tuning.

 

Once you are happy with the results, push the model to the model registry, so it can be later used by our inference pipeline.

 

3. Inference pipeline 🔮

We can serve predictions in real time using a REST API that

  1. Loads the model from the registry and starts listening for incoming requests,

  2. For each request, fetches the freshest features from the feature store,

  3. Feeds them into the ML Model to generate a fresh prediction, and

  4. Returns this prediction to the client app.
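
The four steps above can be sketched without committing to any web framework. In this sketch `PredictionService`, `LastValueModel` and `fake_store` are hypothetical stand-ins for the real service, the model pulled from the registry, and the feature store. In a real deployment, a REST route would simply call `handle_request`.

```python
from typing import Callable, Protocol


class Model(Protocol):
    def predict(self, features: list[float]) -> float: ...


class PredictionService:
    def __init__(
        self,
        load_model: Callable[[], Model],
        get_latest_features: Callable[[str], list[float]],
    ) -> None:
        self._model = load_model()  # step 1: load once, at startup
        self._get_latest_features = get_latest_features

    def handle_request(self, product_id: str) -> dict:
        features = self._get_latest_features(product_id)  # step 2
        prediction = self._model.predict(features)         # step 3
        return {"product_id": product_id, "prediction": prediction}  # step 4


class LastValueModel:
    """Stand-in for the model loaded from the registry."""

    def predict(self, features: list[float]) -> float:
        return features[-1]


# Stand-in for the feature store: latest closes per product.
fake_store = {"BTC/USD": [100.0, 101.0, 102.0]}

service = PredictionService(
    load_model=LastValueModel,
    get_latest_features=lambda product_id: fake_store[product_id],
)
response = service.handle_request("BTC/USD")
```

The model is loaded once, while features are fetched on every request. That split is what keeps latency low and predictions fresh at the same time.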

 

Bonus 🎁 → Incremental learning

Predicting crypto prices is very hard, because any pattern between historical data and future prices is doomed to be short-lived.

Because of this, I suggest you implement an incremental learning system that keeps your deployed model as fresh as possible.
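
To give you a feel for what "incremental" means, here is a minimal sketch of one common approach: a linear model updated with a single stochastic gradient descent step per new example, instead of being retrained from scratch. This is my own illustration, not necessarily the method used in the full system, and the toy data (feature in (0, 1], target = 2 × feature) is made up. Real prices would need scaling before being fed to a model like this.

```python
class OnlineLinearRegression:
    """Linear model trained one example at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.01) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x: list[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x: list[float], y: float) -> None:
        """One SGD step on squared error for a single (x, y) pair."""
        error = self.predict_one(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLinearRegression(n_features=1, learning_rate=0.1)
for step in range(2000):
    x = [(step % 10 + 1) / 10]  # toy feature in (0, 1]
    y = 2.0 * x[0]              # toy target
    _ = model.predict_one(x)    # serve the prediction first...
    model.learn_one(x, y)       # ...then learn once the true value is known
```

In our system the true value only becomes known 5 minutes after the prediction, so the `learn_one` call would run with that delay.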

 

Attention 🚨

Incremental learning is not necessary for 95% of real-world problems, but this one falls in the other 5%.

 

 

Wanna build this system with me? 👩‍💻👨🏽‍💻👨‍💻

After completing this 4-week program (+ A LOT of hard work on your end) you will know how to

  • Build modular and scalable real-time ML systems

  • For the business problem you care about

  • Using any real-time data source

  • Following MLOps best practices, for fast iteration and short time-to-market.

And of course, we will implement incremental learning ⚡

 

Wanna know more about

Building a Real-time ML System. Together?
↓↓↓

👉 Click HERE to learn more

Talk to you next week

Peace and Love

The Real World ML Newsletter
