
3 ML systems you can build TODAY

Feb 12, 2024

Let me share with you 3 complete ML systems built by Master's students at KTH Royal Institute of Technology in Stockholm, under the supervision of Jim Dowling.

 

This is what makes you stand out from the crowd 🙋🏽👨‍👩‍👧‍👦

These projects are not ML model prototypes inside notebooks, but end-to-end ML apps that

  • ingest live data, and

  • output fresh predictions

that are useful to solve a particular problem.

This is what has business value, and gets you an ML job.

 

Design principles 📐🏗️

The 3 systems are built using the Feature-Training-Inference pipeline blueprint, where

  • The Feature pipeline transforms live raw data into ML model features, and saves them in the Feature Store.

  • The Training pipeline reads historical training data from the Feature Store, trains a good predictive model and pushes it to the model registry.

  • The Inference pipeline loads a batch of recent features, and generates fresh predictions using the ML model produced by the training pipeline.

This is a universal blueprint you can follow to build any ML system, both offline and real-time.
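To make the blueprint concrete, here is a minimal sketch of the 3 pipelines in plain Python. Everything in it is an assumption made for illustration: `FeatureStore` and `ModelRegistry` are hypothetical in-memory stand-ins (not the API of any real product), the data is made up, and the "model" is a one-parameter least-squares fit.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for a real feature store."""
    rows: list = field(default_factory=list)

    def insert(self, new_rows):
        self.rows.extend(new_rows)

    def read(self):
        return list(self.rows)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a real model registry."""
    models: dict = field(default_factory=dict)

    def push(self, name, model):
        self.models[name] = model

    def load(self, name):
        return self.models[name]


def feature_pipeline(raw_records, store):
    """Transform raw records into feature rows and save them in the store."""
    rows = [
        # Live records have no label yet, so "label" may be None.
        {"scaled": rec["raw_value"] / 10, "label": rec.get("label")}
        for rec in raw_records
    ]
    store.insert(rows)


def training_pipeline(store, registry):
    """Fit label ≈ slope * scaled on labeled rows, then push the model."""
    labeled = [r for r in store.read() if r["label"] is not None]
    num = sum(r["scaled"] * r["label"] for r in labeled)
    den = sum(r["scaled"] ** 2 for r in labeled)
    registry.push("toy_model", {"slope": num / den})


def inference_pipeline(store, registry, batch_size):
    """Load the most recent unlabeled rows and score them with the model."""
    model = registry.load("toy_model")
    unlabeled = [r for r in store.read() if r["label"] is None]
    batch = unlabeled[-batch_size:]
    return [model["slope"] * r["scaled"] for r in batch]


if __name__ == "__main__":
    store, registry = FeatureStore(), ModelRegistry()
    # Historical (labeled) data goes through the feature pipeline first.
    feature_pipeline(
        [{"raw_value": 100, "label": 20}, {"raw_value": 200, "label": 40}],
        store,
    )
    training_pipeline(store, registry)
    # Later, a fresh live record arrives without a label.
    feature_pipeline([{"raw_value": 150}], store)
    print(inference_pipeline(store, registry, batch_size=1))
```

The point of the sketch is the separation: the 3 functions only talk to each other through the store and the registry, so each one can run on its own schedule.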

Without further ado, let’s get to the projects!

 

Wanna get more real-world MLOps videos for FREE?
→ Subscribe to the Real-World ML Youtube channel ←

 

1. Flight Delay Predictor 🛬

In this project, Giovanni Manfredi and Sebastiano Meneghin have developed a batch-scoring system that uses

  • Weather data, and

  • Historical flight delay information

to predict flight delays at Stockholm’s Arlanda airport for the upcoming day.

Image by Giovanni Manfredi and Sebastiano Meneghin

 

Data sources 

They combine 4 different data sources to get

Historical data is used to train their predictive model. Live data is used to compute fresh features and generate predictions every day.

 

Stack

They have used a 100% serverless stack, with

  • Modal as the compute engine, where feature processing and model training happen.

  • Hopsworks as the Feature Store and Model Registry

  • Hugging Face Spaces to host a public Gradio UI, where you can input

 

GitHub repo

Wanna see their full source code implementation?
→ Click here to see the full source code implementation

 

2. Wave Height Predictor 🌊🏄‍♀️

Mischa Rauch has built a system that uses

  • historical wave height, period and direction, and

  • surf information

to predict wave heights at Huntington Beach, California.

Image by Mischa Rauch

 

Data sources

Mischa combines 2 different data sources to generate the training data he needs for the prediction.

Tip 💁‍♀️

In real-world ML projects, transforming raw data into features and computing the target metric to predict often involve a lot of data engineering work.

This is something you can see in this project.
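To give a feel for that kind of work, here is a small sketch that turns an ordered series of measurements into training rows made of lag features plus a future target. It is an illustration under assumptions, not Mischa's code: the function name `build_training_rows`, its parameters, and the wave heights are all made up.

```python
def build_training_rows(wave_heights, n_lags=3, horizon=1):
    """Turn an ordered series into rows of lag features plus a future target.

    Each row uses `n_lags` consecutive values as features and the value
    `horizon` steps after the last lag as the target to predict.
    """
    rows = []
    last_start = len(wave_heights) - n_lags - horizon
    for start in range(last_start + 1):
        lags = wave_heights[start:start + n_lags]
        target = wave_heights[start + n_lags + horizon - 1]
        rows.append({"lags": lags, "target": target})
    return rows


if __name__ == "__main__":
    # Made-up wave heights in meters, ordered in time.
    series = [1.0, 1.2, 1.1, 1.4, 1.3]
    for row in build_training_rows(series, n_lags=3, horizon=1):
        print(row)
```

If the series is shorter than `n_lags + horizon`, the function returns an empty list instead of failing, which is usually what you want in a scheduled feature pipeline.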

 

GitHub repo

Wanna see his full source code implementation?
→ Click here to see the full source code implementation

 

3. Twin Celebrity finder 🔎👸

Beatrice Insalata built this app to help you find your Twin Celebrity. She uses

  • A dataset of celebrity portrait images from Hugging Face datasets, and

  • A pre-trained Computer Vision model (ResNet-50)

to help you find your celebrity twin.
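One common way to build a look-alike finder on top of a pre-trained vision model is to compare image embeddings with cosine similarity and return the closest match. The sketch below shows that idea only; it is an assumption about the general technique, not a description of how Beatrice's app works. The function names and the tiny 3-number vectors are made up (real embeddings from a model like ResNet-50 are much longer).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_twin(query_embedding, celebrity_embeddings):
    """Return the name whose embedding is most similar to the query."""
    return max(
        celebrity_embeddings,
        key=lambda name: cosine_similarity(
            query_embedding, celebrity_embeddings[name]
        ),
    )


if __name__ == "__main__":
    # Hypothetical embeddings; in practice they come from the vision model.
    celebrities = {
        "celebrity_a": [1.0, 0.0, 0.0],
        "celebrity_b": [0.0, 1.0, 0.0],
    }
    selfie = [0.9, 0.1, 0.0]
    print(find_twin(selfie, celebrities))
```

With this design the expensive step (computing celebrity embeddings) happens once, offline, and finding a twin at request time is just a similarity search.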

 

Tip 💁‍♀️

In real-world Computer Vision projects, you DON’T need to train any Neural Network model from scratch.

Instead, you

  • pick a pre-trained model, like ResNet-50, and

  • fine-tune it using a labeled dataset that is relevant for your problem, like this

Image by Beatrice Insalata

My Twin Celebrity is Tom Hiddleston. Which one is yours?

GitHub repo

Wanna see her full source code implementation?
→ Click here to see the full source code implementation

 

Need more project ideas? 🔎 

On this website you will find 30 other projects developed by KTH Master's students under the supervision of Jim Dowling.

→ Click here to see all the projects 🔎

 
Image by Serverless ML

 

That is it for today.

Enough reading.

Now it is time to find a project idea and get your hands dirty.

The Real World ML Newsletter
