3 ML systems you can build TODAY
Feb 12, 2024
Let me share with you 3 complete ML systems, built by Master's students at KTH Royal Institute of Technology in Stockholm under the supervision of Jim Dowling.
This is what makes you stand out from the crowd 🙋🏽👨👩👧👦
These projects are not ML model prototypes inside notebooks, but end-to-end ML apps that
- ingest live data, and
- output fresh predictions
that help solve a particular problem.
This is what has business value and gets you an ML job.
Design principles 📐🏗️
The 3 systems are built using the Feature-Training-Inference pipeline blueprint, where
- The Feature pipeline transforms live raw data into ML model features, and saves them in the Feature Store.
- The Training pipeline reads historical training data from the Feature Store, trains a good predictive model and pushes it to the model registry.
- The Inference pipeline loads a batch of recent features, and generates fresh predictions using the ML model produced by the training pipeline.
This is a universal blueprint you can follow to build any ML system, both offline and real-time.
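To make the blueprint concrete, here is a minimal sketch of the three pipelines in plain Python. The in-memory `FeatureStore` and `ModelRegistry` classes are hypothetical stand-ins for a real feature store and model registry (such as Hopsworks, used in the first project below), the field names `temperature_c` and `delay_min` are invented for illustration, and the one-feature least-squares line is only a placeholder for "a good predictive model".

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical in-memory stand-ins for a Feature Store and a Model Registry.
@dataclass
class FeatureStore:
    rows: list = field(default_factory=list)

    def insert(self, new_rows):
        self.rows.extend(new_rows)

    def read(self, last_n=None):
        # All rows for training, or only the most recent ones for inference.
        return self.rows if last_n is None else self.rows[-last_n:]


@dataclass
class ModelRegistry:
    models: dict = field(default_factory=dict)

    def push(self, name, model):
        self.models[name] = model

    def load(self, name):
        return self.models[name]


def feature_pipeline(raw_records, store):
    """Turn raw records into features (x) and optional labels (y), then save them."""
    features = [{"x": r["temperature_c"], "y": r.get("delay_min")} for r in raw_records]
    store.insert(features)


def training_pipeline(store, registry):
    """Fit y = a * x + b with ordinary least squares on the labeled rows."""
    labeled = [r for r in store.read() if r["y"] is not None]
    xs = [r["x"] for r in labeled]
    ys = [r["y"] for r in labeled]
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b = y_bar - a * x_bar
    registry.push("delay_model", (a, b))


def inference_pipeline(store, registry, last_n):
    """Load the latest model and a batch of recent features, then predict."""
    a, b = registry.load("delay_model")
    return [a * r["x"] + b for r in store.read(last_n)]


store, registry = FeatureStore(), ModelRegistry()
# Historical data: labels are known, so it can be used for training.
feature_pipeline([{"temperature_c": t, "delay_min": 2 * t + 1} for t in range(5)], store)
training_pipeline(store, registry)
# Live data: no label yet, only features to score.
feature_pipeline([{"temperature_c": 10}], store)
print(inference_pipeline(store, registry, last_n=1))  # → [21.0]
```

The point of the split is that each pipeline only talks to the store and the registry, never to the others directly, so each one can run on its own schedule.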
Without further ado, let’s get to the projects!
1. Flight Delay Predictor 🛬
In this project, Giovanni Manfredi and Sebastiano Meneghin have developed a batch-scoring system that uses
- Weather data, and
- Historical flight delay information
to predict flight delays at Stockholm’s Arlanda airport for the upcoming day.
Data sources
They combine 4 different data sources to get
- Weather data, both live and historical,
- Flight data, both live and historical.
Historical data is used to train their predictive model. Live data is used to compute fresh features and generate predictions every day.
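The historical/live split can be sketched as a daily job that joins flights with weather by date and routes each row either to training or to scoring. This is a hypothetical illustration, not the students' code: the schema (`flight`, `date`, `delay_min`, `wind_ms`, `snow_mm`) and the flight IDs are invented.

```python
from datetime import date, timedelta


def build_daily_rows(flights, weather_by_date, today):
    """Split flights into labeled training rows and rows to score for tomorrow.

    Hypothetical schema: each flight is a dict with "flight", "date" and
    "delay_min" (None while the delay is still unknown); weather_by_date maps
    a date to weather features (observations for past days, a forecast for
    tomorrow).
    """
    tomorrow = today + timedelta(days=1)
    train_rows, score_rows = [], []
    for flight in flights:
        weather = weather_by_date.get(flight["date"])
        if weather is None:
            continue  # no weather features for that day, so skip the flight
        row = {**flight, **weather}
        if flight["date"] < today and flight["delay_min"] is not None:
            train_rows.append(row)  # historical data with a known delay label
        elif flight["date"] == tomorrow:
            score_rows.append(row)  # live data: tomorrow's flights to predict
    return train_rows, score_rows


flights = [
    {"flight": "XX101", "date": date(2024, 2, 10), "delay_min": 12},
    {"flight": "XX102", "date": date(2024, 2, 11), "delay_min": 0},
    {"flight": "XX103", "date": date(2024, 2, 13), "delay_min": None},
]
weather_by_date = {
    date(2024, 2, 10): {"wind_ms": 9.0, "snow_mm": 4.0},
    date(2024, 2, 11): {"wind_ms": 3.0, "snow_mm": 0.0},
    date(2024, 2, 13): {"wind_ms": 7.5, "snow_mm": 1.0},
}
train_rows, score_rows = build_daily_rows(flights, weather_by_date, today=date(2024, 2, 12))
print(len(train_rows), [r["flight"] for r in score_rows])  # → 2 ['XX103']
```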
Stack
They have used a 100% serverless stack, with
- Modal as the compute engine, where feature processing and model training happen.
- Hopsworks as the Feature Store and Model Registry.
- Hugging Face Spaces to host a public Gradio UI, where you can input
GitHub repo
Wanna see their full source code implementation?
→ Click here to see the full source code implementation
2. Wave Height Predictor 🌊🏄♀️
Mischa Rauch has built a system that uses
- historical wave height, period and direction, and
- surf information
to predict wave heights at Huntington Beach, California.
Data sources
Mischa combines 2 different data sources to generate the training data he needs for the prediction.
Tip 💁♀️
In real-world ML projects, transforming raw data into features and computing the target metric to predict often involve a lot of data engineering work.
This is something you can see in this project.
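A small example of that data engineering work: joining two hourly sources on their timestamp and labeling each row with the wave height observed some hours later. This is a sketch under assumptions, not Mischa's pipeline; the field names, including the `surf_rating` column standing in for "surf information", are hypothetical.

```python
from datetime import datetime, timedelta


def make_training_rows(waves, surf, horizon=timedelta(hours=24)):
    """Join two hourly sources and attach a future wave height as the target.

    waves: datetime -> {"height_m", "period_s", "direction_deg"}
    surf:  datetime -> {"surf_rating"}  (hypothetical second source)
    """
    rows = []
    for ts in sorted(waves):
        if ts not in surf:
            continue  # inner join on the hourly timestamp
        future = waves.get(ts + horizon)
        if future is None:
            continue  # target not observed yet, so this row cannot be labeled
        rows.append({
            "timestamp": ts,
            **waves[ts],
            **surf[ts],
            "target_height_m": future["height_m"],
        })
    return rows


t0 = datetime(2024, 1, 1, 0, 0)
hour = timedelta(hours=1)
waves = {
    t0: {"height_m": 1.0, "period_s": 12.0, "direction_deg": 270.0},
    t0 + hour: {"height_m": 1.5, "period_s": 13.0, "direction_deg": 265.0},
    t0 + 2 * hour: {"height_m": 2.0, "period_s": 14.0, "direction_deg": 260.0},
}
surf = {t0: {"surf_rating": 3}, t0 + 2 * hour: {"surf_rating": 4}}
rows = make_training_rows(waves, surf, horizon=hour)
print(len(rows), rows[0]["target_height_m"])  # → 1 1.5
```

Only one row survives: the second hour has no surf record, and the last hour has no future observation to use as its target. Dropping unlabeled rows is what keeps the target from leaking future data into training.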
GitHub repo
Wanna see his full source code implementation?
→ Click here to see the full source code implementation
3. Twin Celebrity Finder 🔎👸
Beatrice Insalata built this app to help you find your Twin Celebrity. She uses
- A dataset of celebrity portrait images from Hugging Face datasets, and
- A pre-trained Computer Vision model (ResNet-50)
to help you find your celebrity twin.
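One common way to build a "find your twin" lookup (an assumption here, not necessarily how this app works) is to turn every portrait into an embedding vector with the pre-trained model, for example ResNet-50 with its classification head removed, and then return the celebrity whose embedding is closest to yours by cosine similarity. The sketch below shows only the lookup step; the 3-dimensional vectors and the names `celebrity_a`, `celebrity_b`, `celebrity_c` are toy placeholders for real image embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_twin(user_embedding, celebrity_embeddings):
    """Return (name, similarity) of the most similar celebrity embedding."""
    return max(
        (
            (name, cosine_similarity(user_embedding, emb))
            for name, emb in celebrity_embeddings.items()
        ),
        key=lambda pair: pair[1],
    )


# Toy embeddings; a real system would get these from the CV model.
celebrity_embeddings = {
    "celebrity_a": [1.0, 0.0, 0.0],
    "celebrity_b": [0.0, 1.0, 0.0],
    "celebrity_c": [0.7, 0.7, 0.0],
}
name, score = find_twin([0.9, 0.1, 0.0], celebrity_embeddings)
print(name)  # → celebrity_a
```

For thousands of celebrities this brute-force scan is fine; for much larger catalogs you would typically swap it for an approximate nearest-neighbor index.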
Tip 💁♀️
In real-world Computer Vision projects, you usually DON’T need to train a Neural Network model from scratch.
Instead, you
- pick a pre-trained model, like ResNet-50, and
- fine-tune it using a labeled dataset that is relevant to your problem.
My Twin Celebrity is Tom Hiddleston. Which one is yours?
GitHub repo
Wanna see her full source code implementation?
→ Click here to see the full source code implementation
Need more project ideas? 🔎
On this website you will find 30 more projects developed by KTH Master's students under the supervision of Jim Dowling.
→ Click here to see all the projects 🔎
That is it for today.
Enough reading.
Now it is time to find a project idea and get your hands dirty.