
Let's build a REST API in Rust 🦀 - Part 3

Nov 11, 2024

Data processing with Rust ✨

 In the last 2 weeks we have built:

Today, I will show you step by step how to serve Real World data, and along the way I will introduce you to your new best friend for data processing in Rust: Polars 🐻‍❄️

Let’s start!

You can find all the source code in this repository


 

The starting point 📍

Our API has 2 endpoints:

  • /health to indicate if the API is up and running.

  • /trips to serve historical data of taxi rides.

And it works like a charm on localhost. However, our API currently returns fake data that we hard-coded last week.

Let me show you, step by step, how to adjust the code so we serve Real World data from this website.

 

Our plan 🧭

Let’s write a new function called get_trips() in our backend.rs file that:

  • Extracts the year and month from the timestamp from_ms,

  • Downloads the corresponding parquet file,

  • Loads the parquet file, then filters and sorts it using Polars, and

  • Returns the filtered data as a list of Trips.

Let’s go one by one:

 

Step 1 → Get year and month from the timestamp in milliseconds

The function

fn get_year_and_month(from_ms: i64) -> (i32, i32) { ... }

extracts the year and month (in UTC) from the timestamp from_ms, given in milliseconds.

Why unwrap()?

The function DateTime::from_timestamp() from the chrono crate returns an Option<DateTime<Utc>> because not every timestamp maps to a valid datetime: values outside the supported range yield None.

When you call unwrap():

  1. If the Option contains a value (Some(value)), it returns the value

  2. If the Option is None, it will panic (crash the program with an error message)

 

Step 2 → Download parquet file with real world data

The function

fn download_parquet_file(year: i32, month: i32) -> String { ... }

downloads the parquet file with historical data for the given year and month from this website using the reqwest crate, and saves it asynchronously to a local file using tokio.

 

Step 3 → Load parquet file into memory

The next function

fn get_trips_from_file(
    file_path: &str, 
    from_ms: i64, 
    n_results: i64
) -> Result<Vec<Trip>> { ... }

uses Polars to load, filter, and sort the parquet file that the previous function downloaded.

What is Rust Polars? 🐻‍❄️

Polars is a lightning-fast DataFrame library written in Rust, similar to pandas in Python. Think of it as pandas' speedy cousin, but with some key modern design choices that make it great for handling large datasets.

Key differences from pandas:

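  1. It's typically much faster (often 5-10x, depending on the workload)

  2. Supports lazy evaluation through its LazyFrame API (like Spark), so it can optimize the whole query before running it

  3. Better memory efficiency (it is built on the columnar Apache Arrow memory format)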

  4. More predictable behavior (stricter rules about data types)

 

Finally, we need to transform the Polars DataFrame into a list of Trips and return it.

BOOM!

 

Next steps 👣

Next week we will

  • add some logging, and

  • dockerize our REST API for production 

Talk to you next week,

Take care,

Love your loved ones,

And of course, keep on learning Real-World-Zero-BS ML/MLOps and GenAI with me 😉

Pau

The Real World ML Newsletter

Every Saturday

For FREE
