
Real-time ML for asset price prediction

Dec 24, 2023

Let’s design a real-time ML system that can predict crypto prices in the next 60 seconds, using 3 pipelines.

  • Feature pipeline, to transform raw trades into trading indicators.

  • Training pipeline, to generate a good predictive model.

  • Inference pipeline, to serve the system predictions in real-time.

Warning 🚨
This blog post won’t make you rich. However, you will learn the tools that can help you (together with lots of experimentation) make some money with ML.

Let’s get started!

 

The feature pipeline 📊

 

is the Python script that

→ Ingests raw market data, like a real-time stream of trades from the Kraken Websocket API

→ Transforms this raw data into predictive trading indicators, like momentum indicators, and

→ Saves these indicators (aka features) in a feature store. Features are saved in an online feature group, so they can be later retrieved by the inference pipeline at low latency.
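
To make the transformation step concrete, here is a minimal sketch of one momentum indicator: the rate of change of the price over the last few trades. The `Trade` shape, the `MomentumIndicator` class and the window size are assumptions made for illustration. They are not the Kraken message schema or a feature store API, and the trades are made-up numbers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    """Hypothetical shape of one raw trade (not the Kraken message schema)."""
    product_id: str
    price: float
    timestamp_ms: int


class MomentumIndicator:
    """Rate of change of the price over the last `window` trades."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        # Keep window + 1 prices, so the newest price can be compared
        # with the price seen `window` trades earlier.
        self.prices = deque(maxlen=window + 1)

    def update(self, trade: Trade) -> Optional[dict]:
        """Consume one trade and return a feature row, or None while warming up."""
        self.prices.append(trade.price)
        if len(self.prices) <= self.window:
            return None  # not enough history yet
        oldest, newest = self.prices[0], self.prices[-1]
        return {
            "product_id": trade.product_id,
            "timestamp_ms": trade.timestamp_ms,
            "momentum": (newest - oldest) / oldest,
        }


# Made-up trades standing in for the real-time stream.
trades = [Trade("BTC/USD", p, t) for t, p in enumerate([100.0, 101.0, 102.0, 110.0])]
indicator = MomentumIndicator(window=3)
features = [f for f in (indicator.update(t) for t in trades) if f is not None]
```

In a real pipeline, each non-empty row returned by `update` is what you would push to the feature store.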

What is an online feature group? 🤔

A feature store is like a dragon with 2 heads 🐉. Internally, it has 2 databases:

  • An offline database that stores all the historical feature values. This database has effectively unlimited capacity, but it is slow to query. Hence, it is not a good fit for serving features in a real-time ML system, but it is great for generating training data.

  • An online database, to store the latest key-value feature pairs. It is designed for speed, so it is ideal for fast serving in a real-time ML system.
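
The two heads can be pictured with a toy in-memory model. This is only a sketch of the idea: `ToyFeatureStore` and its methods are hypothetical names, and a real feature store backs each side with a different database.

```python
class ToyFeatureStore:
    """Toy model of a feature store: one write, two views of the data."""

    def __init__(self) -> None:
        self.offline = []  # every row ever written, used to build training data
        self.online = {}   # only the latest row per key, used for fast serving

    def write(self, key: str, features: dict) -> None:
        # Each write appends to the history and overwrites the latest value.
        self.offline.append({"key": key, **features})
        self.online[key] = features

    def read_online(self, key: str) -> dict:
        # Single key lookup: this is the low-latency path.
        return self.online[key]

    def read_offline(self, key: str) -> list:
        # Full scan over the history: slower, but returns everything.
        return [row for row in self.offline if row["key"] == key]


store = ToyFeatureStore()
store.write("BTC/USD", {"timestamp_ms": 1, "momentum": 0.01})
store.write("BTC/USD", {"timestamp_ms": 2, "momentum": 0.02})

latest = store.read_online("BTC/USD")    # only the most recent row
history = store.read_offline("BTC/USD")  # both rows
```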

 

 

The training pipeline 🏋️‍♂️

 

is the Python script that

→ Fetches historical features and targets from the offline feature store,

→ Generates a predictive model, possibly using Machine Learning and hyper-parameter tuning, and

→ Pushes the model to the model registry.

Advice 💡

Before building any ML model, establish a baseline performance using a simple heuristic (`Predicted price = Average of last 10 prices`).

Predicting asset prices is very difficult, and simple baselines like these are hard to beat.

Then, try a linear model, like Lasso, to make sure you are not overfitting the training data. And if things look good, give XGBoost a try.
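
The baseline heuristic above fits in a few lines. The sketch below also scores it with the mean absolute error on a walk-forward pass over a price series. As a simplification, it predicts the next observed price rather than the price 60 seconds ahead. The function names and the price series are made up for illustration.

```python
from statistics import mean


def baseline_prediction(prices, window=10):
    """Predicted price = average of the last `window` prices."""
    if not prices:
        raise ValueError("need at least one price")
    return mean(prices[-window:])


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate_baseline(prices, window=10):
    """Walk forward: predict each price using only the prices that came before it."""
    preds = [baseline_prediction(prices[:i], window) for i in range(window, len(prices))]
    return mean_absolute_error(prices[window:], preds)


# Made-up, steadily rising prices.
prices = [float(p) for p in range(100, 112)]
baseline_mae = evaluate_baseline(prices)
```

Any model you train later, Lasso or XGBoost, has to beat `baseline_mae` on the same data to be worth deploying.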

 

 

The inference pipeline 🔮

 

Finally, you need to serve the model's predictions through a REST API. You can build one in Python using a framework like FastAPI. The logic is the following:

→ Load the model from the registry and start accepting requests.

→ And for each request:

  • fetch the most recent features from the online Feature Store,

  • pass them to the model to generate a prediction,

  • send the response to the client.
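
The request logic above can be sketched without any web framework. Everything here is a placeholder: `PredictionService` is a hypothetical name, the online store is a plain dict, and the "model" is a toy function instead of one loaded from a registry. In a FastAPI app, `handle_request` would be the body of a route handler.

```python
class PredictionService:
    """Holds the model in memory and answers one request at a time."""

    def __init__(self, model, online_store) -> None:
        self.model = model                # loaded once, at startup
        self.online_store = online_store  # key-value view of the latest features

    def handle_request(self, product_id: str) -> dict:
        # 1. Fetch the most recent features for this product.
        features = self.online_store[product_id]
        # 2. Pass them to the model to generate a prediction.
        predicted_price = self.model(features)
        # 3. Build the response that goes back to the client.
        return {"product_id": product_id, "predicted_price": predicted_price}


def toy_model(features: dict) -> float:
    """Placeholder model: extrapolates the last price using the momentum."""
    return features["last_price"] * (1 + features["momentum"])


# Made-up feature values standing in for the online feature store.
online_store = {"BTC/USD": {"last_price": 100.0, "momentum": 0.5}}
service = PredictionService(model=toy_model, online_store=online_store)
response = service.handle_request("BTC/USD")
```

Loading the model once at startup, rather than on every request, keeps the per-request work down to a key lookup and one model call.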

 

How can you improve this system? 🤔

 

The most effective way to improve an ML system is to build more feature pipelines that add valuable signals to your model.

In this case, we could build a second real-time feature pipeline that computes sentiment from a stream of financial news.

 

Video lecture 🎬

 

👉🏽 Subscribe to the Real-World ML YouTube channel for more free lectures like this ↓

 

Let’s keep on learning.