Imagine you want to build a trading bot for crypto currencies using ML.
Before you even get to work on your ML model, you need to develop a real-time feature pipeline that produces the data your model need both at training time, and at inference time.
Pipeline design
A real-time feature pipeline has 3 steps:
Ingestion step → connects to an external real-time data source, for example, raw trades from the Kraken Websocket API
Transformationstep → transforms this raw data into meaningful signals for your ML model, for example, Open-High-Low-Close (OHLC) candles.
Storing step → these features are saved in a feature store, from where your ML models can retrieve them.
In a real-world setting, each of these steps is implemented as a separate service and containerised with Docker. Communication between these services is implemented with a message broker like Apache Kafka or Redpanda.
This way you make your system scalable, by spinning up more containers as needed, and leveraging Kafka consumer groups.
This is all great, but
How do you implement this in practice? 🤔
Development
We will use an open-source Python library called Quix Streams to develop our real-time feature pipeline.
is a cloud native library for processing data in Kafka using pure Python.
With Quix Streams you get the best from both worlds, low-level scalability and resiliency from Kafka, plus an easy-to-use Python interface.
In our example, we have 3 pipeline steps, each of them implemented in pure Python in their own subdirectory:
trade_producer (ingestion) → reads data from Kraken and pushes it into a Kafka topic.
trade_to_ohlc (transformation) → transforms these raw trades into OHLC candles, which are typically used to derived trading indicators, and push them into another Kafka topic.
ohlc_to_feature_store(destination) → saves the final features in the Hopsworks Feature Store.
Plus, a containerised Streamlit dashboard that fetches and plots these features in real-time.
Let’s take a closer look at the transformation step trade_to_ohlc
Real-time data transformation with Quix
Transforming real-time data in Quix is as easy as using a standard Pandas-like API, thanks to the Quix Streaming Data Frame object.
Streaming Data Frames ⚡
offer a Pandas-like API to work with real-time data. Which means, you can add, remove and transform columns as you do in Pandas, but for real-time data ⚡
To transform raw trades into OHLC 10-seconds candles, we use stateful window operators. More precisely, we build 10-second tumbling windows to bucket trades, and then compute bucket-level metrics, like
Open → first price in the window
High → highest price in the window
Low → lowest price in the window
Close → last price in the window
You can easily implement tumbling windows with Quix using the newly added tumbling_window API
provides fully managed containers, Kafka and observability tools to run your applications in production.
To deploy your pipeline to Quix Cloud you need to
→ Sign up for FREE → Create a Quix Cloud Project and an environment, and → Fork this repository and link it to your newly created Quix Cloud environment.
Now it is your turn 🫵
If you wanna learn real-time ML, you need to get your hands dirty.