Let's build your first real-world ML app
Dec 30, 2023
Companies hiring ML engineers do not look for ML course completers. They look for problem solvers who can solve real-world business problems using data, a bit of science, and lots of engineering.
And the thing is, the only way to become good at problem solving is by… (drums 🥁🥁🥁) … solving problems!
👉🏽 Subscribe to the Real-World ML Youtube channel for more free lectures like this
What is preventing you from building your first ML app?
- “I don’t have data” (lack of data)
- “I don’t have any project idea” (lack of inspiration)
- “It costs money” (tight budget)
- “I don’t know how to do it” (lack of knowledge)
Whatever is blocking you from building an ML product, I have a solution for you 🤗 ↓
Excuse 1 → “I don’t have data”
- Search in Kaggle datasets, one of the largest repositories of public datasets in the world.
- Find a public API in this TOP repository.
- Find a website you are interested in (for example, the NBA stats page) and build your own scraper. True, it is time-consuming, but you will learn tons of Python in return.
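If you go the scraper route, here is a minimal sketch of what the parsing step could look like, using only Python's standard library. The sample HTML, the `TableScraper` class, and the `scrape_table` function are made up for illustration. A real page will have a different structure, and you should check the site's terms of use before scraping it.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page. In practice you would download the HTML first,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Player A</td><td>31</td></tr>
  <tr><td>Player B</td><td>27</td></tr>
</table>
"""

header, *records = scrape_table(SAMPLE_HTML)
stats = [dict(zip(header, record)) for record in records]
print(stats)
```

Each entry in `stats` is a dictionary keyed by the table's header row, which is a convenient shape to hand over to whatever you use to store your raw data.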
Excuse 2 → “I don’t have a project idea”
I bet you will find something that will inspire you.
Excuse 3 → “It costs money”
To run an ML app you need 3 types of services:
- Computing services, to run your model training jobs and your model inference in a real-time ML system.
  - Solution: GitHub Actions gives you free VMs (hosted runners) you can use to run such jobs. And Streamlit offers free computing to run your inference. Boom.
- Storage service, to store and serve data (aka features), model artifacts (e.g. pickle files with serialized models), and metadata (e.g. offline validation metrics of your latest model).
  - Solution: Hopsworks is a managed feature store with up to 25GB of free storage, which is more than enough to build a serious ML project. Click here to get your feature store for free. If you wanna use a full-featured experimentation tool + model registry you can use Weights & Biases too.
- Orchestration, to coordinate the execution of the 3 pipelines of your system (feature pipeline, training pipeline, inference/deployment pipeline). I recommend you read this previous installment of The Real-World ML Newsletter to have the full context.
  - Solution: Again, GitHub Actions is flexible enough to build a fully working ML app with the 3-pipeline design.
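To make the storage part concrete, here is a rough sketch of what "model artifacts" and "metadata" look like on disk. It is not how Hopsworks or Weights & Biases work internally. It uses a local temporary folder as a stand-in for a model registry, and `save_artifacts`, `load_artifacts`, the toy model, and the metric value are all made up for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_artifacts(model, metrics, registry_dir, version):
    """Write the pickled model and its validation metrics side by side."""
    target = Path(registry_dir) / version
    target.mkdir(parents=True, exist_ok=True)
    with open(target / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(target / "metrics.json", "w") as f:
        json.dump(metrics, f, indent=2)
    return target


def load_artifacts(registry_dir, version):
    """Read back the model and metrics stored under one version."""
    target = Path(registry_dir) / version
    with open(target / "model.pkl", "rb") as f:
        model = pickle.load(f)  # only unpickle files you created yourself
    with open(target / "metrics.json") as f:
        metrics = json.load(f)
    return model, metrics


# Toy "model": any picklable Python object is handled the same way.
toy_model = {"kind": "mean_baseline", "mean": 24.5}
toy_metrics = {"validation_mae": 3.2}  # made-up number, for illustration only

with tempfile.TemporaryDirectory() as registry:
    save_artifacts(toy_model, toy_metrics, registry, version="v1")
    restored_model, restored_metrics = load_artifacts(registry, version="v1")

print(restored_model, restored_metrics)
```

A managed service gives you the same two things (the serialized model plus the numbers that describe it), with versioning and remote access on top.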
Excuse 4 → “I don’t know MLOps”
- “tool-specific”, meaning it is hard to grasp the underlying structure and essence of an ML app, beyond specific tools or stacks.
- “too large to understand”, like most engineering blogs published by tech giants like Uber, Binance, or Facebook.
The idea is simple 💡
Is there a universal way to design ML systems that you can learn once and apply every time?
Yes!
Stop thinking in terms of tools, and start thinking of 3 pipelines:
→ Feature pipelines → transform raw data into model features
→ Training pipelines → produce models
→ Inference pipelines → serve models’ predictions.
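Here is a toy sketch of the 3 pipelines as plain Python functions, just to show how they hand data to each other. The raw data, the function names, and the "model" (a least-squares slope through the origin) are assumptions made for this example. In a real system each pipeline would run as its own job and talk to the others through the feature store and the model registry.

```python
def feature_pipeline(raw_rows):
    """Raw data -> features: pair each day's count with the previous day's."""
    counts = [float(row.split(",")[1]) for row in raw_rows]
    return [
        {"previous": prev, "target": cur}
        for prev, cur in zip(counts, counts[1:])
    ]


def training_pipeline(features):
    """Features -> model: least-squares slope through the origin."""
    numerator = sum(f["previous"] * f["target"] for f in features)
    denominator = sum(f["previous"] ** 2 for f in features)
    return {"slope": numerator / denominator}


def inference_pipeline(model, latest_value):
    """Model + fresh feature -> prediction for the next day."""
    return model["slope"] * latest_value


# Made-up raw data in "date,count" format.
RAW = ["2023-12-27,10", "2023-12-28,20", "2023-12-29,40", "2023-12-30,80"]

features = feature_pipeline(RAW)
model = training_pipeline(features)
prediction = inference_pipeline(model, latest_value=80.0)
print(prediction)
```

Swapping the scraper, the model, or the storage tool only changes what happens inside one function. The three-step shape stays the same.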
Now it is YOUR turn 👊
You have no excuses.
It is time to get your hands dirty.
Enjoy the journey.
Pau