One Project To Learn MLOps
Aug 14, 2023
Let’s build a Machine Learning service to predict the Air Quality Index (AQI) in your city for the next 3 days, using a 100% serverless stack.
You will learn a lot, AND you will build something useful for society. Win-win.
These are the steps to build it ↓
Step 1 – Feature generation script
1 → fetches raw weather and pollutant data from an external API like https://aqicn.org
2 → computes features from this raw data (aka model inputs), and targets (aka model outputs)
3 → stores these features in the *Feature Store*
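Here is a minimal sketch of sub-step 2, assuming the raw data has already been fetched as one reading per day, sorted by date, with no gaps. The `Reading` fields, the feature names (`aqi_lag_0`, `wind_speed`) and the target names (`aqi_plus_1d`) are made up for illustration; your real features depend on what the API returns.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Reading:
    """One day of raw data (hypothetical layout)."""
    day: date
    aqi: float
    wind_speed: float


def build_rows(readings, n_lags=3, horizon=3):
    """Turn sorted daily readings into (features, targets) rows.

    Features: the AQI of the last `n_lags` days plus today's wind speed.
    Targets: the AQI observed on each of the next `horizon` days.
    """
    rows = []
    # Only keep days that have enough history behind and enough future ahead.
    for i in range(n_lags - 1, len(readings) - horizon):
        window = readings[i - n_lags + 1 : i + 1]
        # aqi_lag_0 is today, aqi_lag_1 is yesterday, and so on.
        features = {f"aqi_lag_{k}": r.aqi for k, r in enumerate(reversed(window))}
        features["wind_speed"] = readings[i].wind_speed
        targets = {
            f"aqi_plus_{h}d": readings[i + h].aqi for h in range(1, horizon + 1)
        }
        rows.append({"day": readings[i].day, "features": features, "targets": targets})
    return rows
```

Each row pairs what you knew on a given day with what happened over the following 3 days, which is what the model needs to learn from.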
Step 2 – Backfill historical (features, targets)
To train a Machine Learning model later, you need enough historical data (features, targets) in your Feature Store. Run the feature script for a range of past dates, to get enough training data.
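One way to sketch the backfill, assuming your feature script exposes a function that processes a single day. `run_for_day` is a hypothetical callable standing in for that function. External APIs fail now and then, so the sketch collects failed days instead of stopping at the first error.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(start, end, run_for_day):
    """Run the feature script once per past day.

    Returns the list of (day, error message) pairs that failed,
    so you can retry only those days later.
    """
    failed = []
    for day in date_range(start, end):
        try:
            run_for_day(day)
        except Exception as error:
            failed.append((day, str(error)))
    return failed
```

You would call it as `backfill(date(2022, 1, 1), date(2023, 8, 13), run_for_day)` and re-run the returned days until the list comes back empty.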
Step 3 – Model training script
1 → fetches historical (features, targets) from the Feature Store.
2 → trains and evaluates the best ML model possible for this data, e.g. an XGBoost regressor (XGBRegressor).
3 → stores the trained model in the Model Registry.
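Before reaching for a fancy model, it helps to pin down how you will evaluate it. A rough sketch, assuming rows are sorted from oldest to newest and each row is a (features, target) pair: hold out the most recent rows, and score a naive baseline that predicts tomorrow's AQI as today's. The `aqi_lag_0` feature name is hypothetical, and the baseline is my own addition, not a step from the pipeline above.

```python
def chronological_split(rows, n_test):
    """Hold out the most recent `n_test` rows, so the test set is
    strictly later in time than the training set (no shuffling)."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def persistence_forecast(features):
    """Naive baseline: tomorrow's AQI equals today's AQI."""
    return features["aqi_lag_0"]  # hypothetical feature name


def evaluate(predict, rows):
    """Score any `predict(features) -> float` function on held-out rows."""
    y_true = [target for _, target in rows]
    y_pred = [predict(features) for features, _ in rows]
    return mean_absolute_error(y_true, y_pred)
```

Any model you train, XGBoost or otherwise, plugs into `evaluate` through its predict function and should beat the baseline's error on the held-out rows. If it does not, it is not worth storing.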
Step 4 – Automate execution of the feature script
Create a GitHub Actions workflow to automatically run the feature script (from step 1) every hour.
GitHub Actions gives you serverless compute to run your code on a schedule. Free for public repositories. Beautiful.
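The scheduled job needs a single command to run, something like `python feature_script.py` (a hypothetical file name). Below is a small sketch of such an entry point. The `--hour` flag and the function name are my own choices, not part of any library. Cron schedules in GitHub Actions are expressed in UTC, so the sketch works in UTC throughout.

```python
import argparse
from datetime import datetime, timezone


def parse_run_hour(argv=None, now=None):
    """Decide which hour of data this run should process.

    With --hour, process that specific UTC hour (handy for manual re-runs).
    Without it, process the current UTC hour, which is what the
    scheduled job does.
    """
    parser = argparse.ArgumentParser(
        description="Run the feature script for one hour of data."
    )
    parser.add_argument(
        "--hour",
        help="UTC hour to process, e.g. 2023-08-14T09 (default: current hour)",
    )
    args = parser.parse_args(argv)
    if args.hour:
        parsed = datetime.strptime(args.hour, "%Y-%m-%dT%H")
        return parsed.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Truncate to the top of the hour so repeated runs map to the same key.
    return now.replace(minute=0, second=0, microsecond=0)
```

Truncating to the hour means a run that starts a few minutes late still writes to the same slot in the Feature Store as an on-time run.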
Step 5 – Create a web app to show model predictions
Streamlit is a powerful Python library to develop and deploy web data apps.
Your app
1 → loads the model from the Model Registry and the latest features from the *Feature Store*
2 → computes model predictions and shows them on a beautiful UI.
BOOM!
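The "shows them" part could look like the sketch below, which turns 3 raw predictions into rows a UI can render. The band thresholds follow the US EPA AQI categories; the function names and the row layout are assumptions for illustration.

```python
from datetime import date, timedelta

# Upper bound of each AQI band (US EPA categories), in ascending order.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi):
    """Map an AQI value to its human-readable category."""
    for upper_bound, label in AQI_BANDS:
        if aqi <= upper_bound:
            return label
    return "Hazardous"


def forecast_table(today, predictions):
    """Build one display row per forecast day, starting tomorrow."""
    rows = []
    for offset, aqi in enumerate(predictions, start=1):
        value = round(aqi)  # show whole numbers, and categorize what is shown
        rows.append(
            {
                "day": (today + timedelta(days=offset)).isoformat(),
                "aqi": value,
                "category": aqi_category(value),
            }
        )
    return rows
```

In a Streamlit app you could hand these rows to something like `st.table`. Keeping this logic in plain functions also means you can test it without starting the app.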
Bonus
You can create another GitHub action to automate the model training script.
Why re-train the model?
Because ML model performance degrades over time, as fresh data drifts away from the data the model was trained on. The best way to mitigate this is to re-train the model regularly, for example once a week.
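To see that decay for yourself, you can log each prediction next to the AQI that was later observed, and track the error week by week. A minimal sketch, assuming you keep such a log as (day, predicted, actual) tuples; that layout is hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_mae(records):
    """Group prediction errors by ISO week and average them.

    records: iterable of (day, predicted_aqi, actual_aqi) tuples.
    Returns {(iso_year, iso_week): mean absolute error}, oldest week first.
    """
    errors = defaultdict(list)
    for day, predicted, actual in records:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        errors[(iso[0], iso[1])].append(abs(predicted - actual))
    return {week: sum(values) / len(values) for week, values in sorted(errors.items())}
```

If the weekly error creeps up, that is your cue that re-training is overdue.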