Let's build a REST API in Rust 🦀 - Part 3
Nov 11, 2024
Data processing with Rust ✨
Today, I will show you, step by step, how to serve Real World data, and along the way I will introduce you to your new best friend for data processing in Rust: Polars 🐻❄️
Let’s start!
You can find all the source code in this repository
The starting point 📍
Our API has 2 endpoints:
- /health to indicate if the API is up and running.
- /trips to serve historical data of taxi rides.
And it works like a charm on localhost. However, our API currently returns the fake data we hard-coded last week.
Let me show you, step by step, how to adjust the code so we serve Real World data from this website.
Our plan 🧭
Let’s write a new function, called get_trips(), in our backend.rs file, that
- Extracts the year and month from the timestamp in from_ms,
- Downloads the corresponding parquet file,
- Loads the parquet file, filters and sorts it using Polars, and
- Returns the filtered data as a list of Trips.
Let’s go one by one:
Step 1 → Get year and month from the timestamp in milliseconds
fn get_year_and_month(from_ms: i64) -> (i32, i32) { ... }
extracts the year and month given the timestamp in milliseconds from_ms.
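To get a feel for what such a function has to do, here is a minimal, dependency-free sketch. It is not the code from the repository: the version in this post relies on from_timestamp(), while this one swaps in plain integer arithmetic (the well-known days-to-civil-date algorithm) so it runs with the standard library alone. It assumes from_ms is a UTC Unix timestamp in milliseconds and returns the month as a number from 1 to 12.

```rust
/// Converts a UTC Unix timestamp in milliseconds into (year, month).
/// Sketch only: uses integer arithmetic instead of a date-time crate.
fn get_year_and_month(from_ms: i64) -> (i32, i32) {
    const MS_PER_DAY: i64 = 86_400_000;

    // Whole days since 1970-01-01. div_euclid rounds towards negative
    // infinity, so timestamps before 1970 land on the right day.
    let days = from_ms.div_euclid(MS_PER_DAY);

    // Shift the epoch to 0000-03-01, so leap days fall at the end of a "year".
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // number of 400-year cycles
    let day_of_era = z.rem_euclid(146_097); // 0..=146_096
    let year_of_era =
        (day_of_era - day_of_era / 1_460 + day_of_era / 36_524 - day_of_era / 146_096) / 365;
    let day_of_year = day_of_era - (365 * year_of_era + year_of_era / 4 - year_of_era / 100);
    let shifted_month = (5 * day_of_year + 2) / 153; // 0 = March, ..., 11 = February

    let month = if shifted_month < 10 {
        shifted_month + 3
    } else {
        shifted_month - 9
    };
    // In the shifted scheme, January and February belong to the next calendar year.
    let year = year_of_era + era * 400 + if month <= 2 { 1 } else { 0 };

    (year as i32, month as i32)
}
```

For example, get_year_and_month(1_704_067_200_000) returns (2024, 1), because that timestamp is midnight UTC on January 1st, 2024.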
Why unwrap() ❓
The function from_timestamp() returns an Option<DateTime<Utc>> because not all timestamps are valid datetime values.
When you call unwrap():
- If the Option contains a value (Some(value)), it returns the value.
- If the Option is None, it panics (crashes the program with an error message).
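If a panic is not what you want, you can handle the None case yourself. The sketch below uses only the standard library. checked_month() is a made-up helper for illustration, not a function from this project or from any crate.

```rust
/// Hypothetical helper: returns Some(month) only for values 1 to 12.
fn checked_month(month: i32) -> Option<i32> {
    if (1..=12).contains(&month) {
        Some(month)
    } else {
        None
    }
}

/// Handles both cases explicitly instead of calling unwrap().
fn describe_month(month: i32) -> String {
    match checked_month(month) {
        Some(m) => format!("month {m} is valid"),
        None => format!("month {month} is not valid"),
    }
}

/// Turns the Option into a Result, so the caller can propagate the error with `?`.
fn month_or_error(month: i32) -> Result<i32, String> {
    checked_month(month).ok_or_else(|| format!("invalid month: {month}"))
}
```

Here checked_month(13).unwrap() would panic, while describe_month(13) and month_or_error(13) let the caller decide what to do next.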
Step 2 → Download parquet file with real world data
The function
fn download_parquet_file(year: i32, month: i32) -> String { ... }
downloads the parquet file with historical data for the given year and month from this website, using the reqwest crate, and saves the data asynchronously into a local file, using tokio.
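Before any bytes are downloaded, the function needs to know which file to ask for. Here is a small sketch of that part only, with no networking. BASE_URL is a placeholder and the trips_YYYY-MM.parquet naming scheme is an assumption for illustration, so check the website for the real address and file names.

```rust
/// Placeholder base URL: replace it with the address of the site hosting the files.
const BASE_URL: &str = "https://example.com/trip-data";

/// Hypothetical naming scheme: one parquet file per month, zero-padded month.
fn parquet_file_name(year: i32, month: i32) -> String {
    format!("trips_{year}-{month:02}.parquet")
}

/// Full URL of the file to download for the given year and month.
fn parquet_url(year: i32, month: i32) -> String {
    format!("{}/{}", BASE_URL, parquet_file_name(year, month))
}
```

With these assumptions, parquet_url(2024, 1) gives "https://example.com/trip-data/trips_2024-01.parquet". The zero padding matters, because without it January would become "2024-1".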
Step 3 → Load parquet file into memory
The next function
fn get_trips_from_file(
file_path: &str,
from_ms: i64,
n_results: i64
) -> Result<Vec<Trip>> { ... }
uses Polars to load and filter the parquet file that the previous function downloaded.
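To make the filter, sort and limit logic concrete, here is the same idea in plain Rust on an in-memory Vec, without Polars and without reading a file. The Trip fields are hypothetical (the real struct in the project may look different), and I am assuming the filter keeps trips that start at or after from_ms.

```rust
/// Hypothetical shape of a trip; the real Trip struct may differ.
#[derive(Debug, Clone, PartialEq)]
struct Trip {
    pickup_ms: i64,   // pickup time as a Unix timestamp in milliseconds
    distance_km: f64, // trip distance
}

/// Keeps trips that start at or after `from_ms`, sorts them by pickup time,
/// and returns at most `n_results` of them.
fn select_trips(mut trips: Vec<Trip>, from_ms: i64, n_results: i64) -> Vec<Trip> {
    // Filter: drop everything before the requested timestamp.
    trips.retain(|t| t.pickup_ms >= from_ms);
    // Sort: earliest pickup first.
    trips.sort_by_key(|t| t.pickup_ms);
    // Limit: a negative n_results is treated as zero.
    trips.truncate(n_results.max(0) as usize);
    trips
}
```

Polars expresses the same three steps on a DataFrame, and it can do so without loading every row of the file into a Vec first.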
What is Rust Polars? 🐻❄️
Polars is a lightning-fast DataFrame library written in Rust, similar to pandas in Python. Think of it as pandas' speedy cousin, but with some key modern design choices that make it great for handling large datasets.
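Key differences from pandas:
- It is often significantly faster (5-10x is a commonly quoted figure, though it depends on the workload)
- It offers a lazy API (like Spark), so a whole query can be optimized before it runs
- Better memory efficiency
- More predictable behavior (stricter rules about data types)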
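Lazy evaluation may be new to you if you come from pandas. Rust's own iterators follow the same idea, so here is a small standard-library analogy (it does not use Polars): describing a pipeline does no work until you ask for the result, and the pipeline can stop as soon as it has enough rows.

```rust
use std::cell::Cell;

/// Returns (filter calls before collecting, filter calls after collecting, result).
fn lazy_demo() -> (usize, usize, Vec<i64>) {
    // Counts how many times the filter closure actually runs.
    let calls = Cell::new(0usize);
    let data = vec![5_i64, 1, 4, 2, 3];

    // Building the pipeline does no work yet: iterator adapters are lazy.
    let pipeline = data
        .iter()
        .filter(|&&x| {
            calls.set(calls.get() + 1);
            x >= 3
        })
        .take(2);
    let calls_before = calls.get();

    // The work happens only when the result is requested.
    let result: Vec<i64> = pipeline.copied().collect();

    (calls_before, calls.get(), result)
}
```

lazy_demo() returns (0, 3, vec![5, 4]): the filter never runs while the pipeline is being built, and it inspects only three of the five values because take(2) stops early.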
Finally, we need to transform the Polars dataframe into a list of Trips, and return them.
BOOM!
Next steps 👣
Next week we will
- add some logging, and
- dockerize our REST API for production
Talk to you next week,
Take care,
Love your loved ones,
And of course, keep on learning Real-World-Zero-BS ML/MLOps and GenAI with me 😉
Pau