Automatic Model Re-training
Aug 11, 2023
No matter how good your predictive Machine Learning model is today, it will eventually expire.
Why?
Because a predictive ML model is essentially a mapping between:
- a set of features (aka inputs) → what you know at the time of the prediction, and
- a target (aka output) → what you want to predict.
And the thing is, the relationship (aka correlation) between the features and the target can change a lot over time.
If you do not re-train your models, they degrade over time and eventually become obsolete, as they no longer capture the current relationship between features and targets.
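To make this degradation concrete, here is a minimal sketch with made-up data. A one-parameter model learns that target ≈ 2 × feature. Later the true relationship drifts to target ≈ 3 × feature, and the model's Mean Absolute Error jumps even though nothing in the model changed. The data, the drift and the helper names `fit_slope` and `mae` are illustrative assumptions, not taken from any real system.

```python
def fit_slope(xs, ys):
    # Least-squares slope of a line through the origin: y ≈ w * x
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mae(y_true, y_pred):
    # Mean Absolute Error between targets and predictions
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature values: 0.1, 0.2, ..., 10.0
xs = [i / 10 for i in range(1, 101)]

# Training period: the target is 2 * feature
past_ys = [2 * x for x in xs]
w = fit_slope(xs, past_ys)
preds = [w * x for x in xs]

# Later on, the relationship has drifted: the target is now 3 * feature
today_ys = [3 * x for x in xs]

past_mae = mae(past_ys, preds)    # ~0: the model fits the data it was trained on
today_mae = mae(today_ys, preds)  # ~5: same model, same features, stale relationship
print(f"MAE before drift: {past_mae:.2f}")
print(f"MAE after drift:  {today_mae:.2f}")
```

The features did not change at all; only the mapping to the target did, which is exactly what a model trained once cannot follow.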
Obsolete models harm the business: they generate wrong predictions that lead to sub-optimal business decisions.
Model re-training is especially important in domains like fraud detection or recommender systems, where human behavior changes quickly.
The question is: how can you automatically re-train your models?
To add automatic model re-training to your ML system, you need to follow these 3 steps:
Step 1. Monitor the model error
The easiest way is to build a monitoring dashboard where you plot your model's error (e.g. Mean Absolute Error) at an hourly or daily frequency.
By looking at this chart, you can understand how often it makes sense to re-train your model.
Here is an example 👉 the monitoring dashboard of the Real-World ML Tutorial.
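Behind a dashboard like this there is usually a simple aggregation. The sketch below assumes you log every prediction together with the actual target once it becomes known, computes the daily MAE, and counts how many days pass before the error first exceeds a tolerance you choose. The record layout, the tolerance value and the helpers `daily_mae` and `days_until_degraded` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_mae(records):
    """Aggregate logged (day, y_true, y_pred) records into {day: MAE}, sorted by day."""
    abs_errors = defaultdict(list)
    for day, y_true, y_pred in records:
        abs_errors[day].append(abs(y_true - y_pred))
    return {day: sum(errs) / len(errs) for day, errs in sorted(abs_errors.items())}


def days_until_degraded(mae_by_day, tolerance):
    """Days between the first monitored day and the first day whose MAE exceeds
    `tolerance`, or None if the error never crosses it."""
    days = list(mae_by_day)
    for day in days:
        if mae_by_day[day] > tolerance:
            return (day - days[0]).days
    return None


# Hypothetical prediction log, starting the day after the model was trained
records = [
    (date(2023, 8, 1), 10.0, 10.5),
    (date(2023, 8, 1), 12.0, 11.0),
    (date(2023, 8, 2), 10.0, 11.0),
    (date(2023, 8, 3), 10.0, 12.0),
    (date(2023, 8, 4), 10.0, 13.0),
]
mae_by_day = daily_mae(records)
print(mae_by_day)
print(days_until_degraded(mae_by_day, tolerance=1.5))  # 2
```

If, after each training run, the error tends to cross your tolerance about N days later, re-training somewhat more often than every N days is a reasonable starting point for the schedule.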
🚨 Attention
As seen below, monitoring is necessary to detect other issues, like model downtime.
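One simple way to catch downtime, sketched here under the assumption that the service logs a timestamp for every prediction it serves, is to look for windows with no predictions that are longer than the expected interval. The helper `find_downtime` and the hourly schedule are hypothetical.

```python
from datetime import datetime, timedelta


def find_downtime(prediction_times, now, expected_interval):
    """Return (start, end) windows in which no prediction was logged for longer
    than `expected_interval`, including a still-open window that ends at `now`."""
    times = sorted(prediction_times) + [now]
    return [
        (prev, curr)
        for prev, curr in zip(times, times[1:])
        if curr - prev > expected_interval
    ]


# Hypothetical hourly prediction service that went silent between 02:00 and 05:00
start = datetime(2023, 8, 11, 0, 0)
logged = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
gaps = find_downtime(logged, now=start + timedelta(hours=7), expected_interval=timedelta(hours=1))
for gap_start, gap_end in gaps:
    print(f"No predictions between {gap_start} and {gap_end}")
```

Appending `now` to the list is what lets the same check flag a model that is down right now, not only past outages.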
Step 2. Trigger the training script
Once you understand the frequency at which you need to refresh the model, you need to automate the training.
For that, you can create a GitHub Actions workflow in your repository that triggers your training script at the frequency that makes sense for your problem.
I also recommend adding the option to trigger re-training on demand, in case you need to urgently replace the model in production.
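In GitHub Actions, the periodic run is usually declared with an `on: schedule` cron trigger, and the on-demand option with a `workflow_dispatch` trigger in the same workflow file. The sketch below covers the on-demand path from code, using GitHub's REST endpoint that creates a workflow dispatch event. The owner, repository, workflow file name `retrain.yml` and token are hypothetical placeholders; the token must be allowed to trigger workflows in that repository.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, token, ref="main"):
    """Build a POST to GitHub's "create a workflow dispatch event" endpoint,
    which starts a workflow that declares the `workflow_dispatch` trigger."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def trigger_retraining(owner, repo, workflow_file, token, ref="main"):
    """Send the dispatch request and return the HTTP status code.
    urlopen raises urllib.error.HTTPError on 4xx/5xx responses."""
    request = build_dispatch_request(owner, repo, workflow_file, token, ref)
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `trigger_retraining("your-org", "your-repo", "retrain.yml", os.environ["GITHUB_TOKEN"])` would then queue an urgent re-training run outside the regular schedule. Building the request separately from sending it keeps the logic testable without network access.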
For example, this is what the re-training pipeline for the Real-World ML Tutorial looks like.
Step 3. Validate the model
Before pushing your re-trained model to production, you need to make sure it meets the minimum performance threshold.
For example, if your model's test error is test_mae and the maximum error you want to allow is MAX_MAE, you only push the model to production when test_mae < MAX_MAE.
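A minimal sketch of this validation gate, assuming the model is any callable that maps a feature to a prediction. The `MAX_MAE` value, the helper `validate_and_push` and the `push_to_production` callback (standing in for whatever model registry or deployment step you use) are hypothetical.

```python
MAX_MAE = 2.0  # hypothetical maximum test error you are willing to accept


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def validate_and_push(model, X_test, y_test, push_to_production, max_mae=MAX_MAE):
    """Evaluate a freshly re-trained model and push it only if test_mae < max_mae.

    Returns (pushed, test_mae) so the caller can log the decision."""
    test_mae = mean_absolute_error(y_test, [model(x) for x in X_test])
    if test_mae < max_mae:
        push_to_production(model)
        return True, test_mae
    # Otherwise the current production model stays untouched
    return False, test_mae


def candidate(x):
    # Hypothetical re-trained model
    return 2 * x


# `deployed` stands in for a real model registry
deployed = []
pushed, test_mae = validate_and_push(candidate, [1, 2, 3], [2.5, 4.0, 6.5], deployed.append)
print(pushed, test_mae)
```

Returning the measured error alongside the decision makes it easy to record rejected models too, which helps when you investigate why re-training stopped improving things.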
Here is an example from the RWML Tutorial 👇
Wanna learn hands-on how to implement model re-training? 👩💻👨🏽💻
The only way to learn ML is to build ML.
And in the Real-World ML Tutorial, you will learn step-by-step how to implement automatic model re-training.
If you want to implement your first automatic re-training system with my help, enroll in the Real-World ML Tutorial + Community.