
How to structure your ML code

Feb 16, 2024

Because real-world ML projects do not fit in one Jupyter notebook

Jupyter notebooks are a great tool for fast iteration and experimentation during ML development.

However, they are not enough once you move beyond the experimentation phase and want to build a real-world, end-to-end ML app.

 

The problem

ML apps, like any other piece of software, can only generate business value once they are deployed and used in a production environment.

And the thing is, deploying an all-in-one messy Jupyter notebook from your local machine to a production environment is neither easy nor recommended from an MLOps perspective.

Often a senior DevOps or MLOps colleague has to rewrite your all-in-one messy notebook, which adds friction and frustration for both you and the colleague helping you.

So the question is:

Is there a better way to develop and package your ML code, so you ship faster and better?

Yes, there is.

Let me show you.

 

Solution

Here are 3 tips to structure your ML project code with the help of Python Poetry.

 

What is Python Poetry? ✍️

Python Poetry is an open-source tool that helps you declare, manage and install dependencies of Python projects, ensuring you have the right stack everywhere.

You can install it for free on your system with the official one-line installer

$ curl -sSL https://install.python-poetry.org | python3 -

 

Tip 1 → Poetry new 🏗️

Imagine you want to build an ML app that predicts earthquakes.

Go to the command line and type

$ poetry new earth-quake-predictor

With this command, Poetry generates the following project structure.

earth-quake-predictor
├── README.md
├── earth_quake_predictor
│   └── __init__.py
├── pyproject.toml
└── tests
    └── __init__.py

You can now cd into this newly created folder

$ cd earth-quake-predictor

and generate the virtual environment

$ poetry install

where all your project dependencies and code will be installed.
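Those dependencies live in the generated pyproject.toml. After you add a few packages, it might look roughly like this (the exact section names depend on your Poetry version, and the packages below are just illustrative):

```toml
[tool.poetry]
name = "earth-quake-predictor"
version = "0.1.0"
description = "An ML app that predicts earthquakes"

[tool.poetry.dependencies]
python = "^3.10"
pandas = "^2.2"
scikit-learn = "^1.4"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Poetry also writes a poetry.lock file next to it, pinning exact versions so every machine resolves the same stack.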

I recommend you write modular code for the different parts of your system, including:

  • data processing and feature engineering

  • model training

  • model serving

like this ↓

earth-quake-predictor
├── README.md
├── earth_quake_predictor
│   ├── __init__.py
│   ├── data_processing.py
│   ├── plotting.py
│   ├── predict.py
│   └── train.py
├── pyproject.toml
└── tests
    └── __init__.py
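For instance, data_processing.py can hold small, testable functions that both train.py and predict.py import (the function below is a hypothetical example, not from the original post):

```python
# File -> earth_quake_predictor/data_processing.py (hypothetical example)
from typing import List


def clean_magnitudes(readings: List[float], max_magnitude: float = 10.0) -> List[float]:
    """Drop sensor readings outside a physically plausible magnitude range."""
    return [r for r in readings if 0.0 <= r <= max_magnitude]
```

Because each function lives in a plain .py module, you can unit-test it under tests/ and reuse it from a notebook, the training script, or the serving code alike.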

 

Tip 2 → Doing notebooks the right way 📔

If you are into notebooks and want to use them while developing your training script, I recommend you create a separate folder to store them

earth-quake-predictor
├── README.md
├── earth_quake_predictor
│   ├── __init__.py
│   ├── data_processing.py
│   ├── plotting.py
│   ├── predict.py
│   └── train.py
├── notebooks
│   └── model_prototyping.ipynb
├── pyproject.toml
└── tests
    └── __init__.py

Now, instead of piling spaghetti code into an all-in-one Jupyter notebook, I suggest you follow these 3 steps

  • Write modular functions inside a regular .py file, for example a function that plots your data

    # File -> earth_quake_predictor/plotting.py
    
    def my_plotting_function():
      # your code goes here
      # ....
  • Add this cell at the top of your Jupyter notebook, so the kernel automatically reloads your imports without you having to restart it

    %load_ext autoreload
    %autoreload 2
  • Import the function and call it from the notebook, without having to rewrite it.

    from earth_quake_predictor.plotting import my_plotting_function
    
    my_plotting_function()

 

Tip 3 → Dockerize your code 📦

To make sure your code works in production the same way it works locally, you need to dockerize it.

For example, to dockerize your training script you need to add a Dockerfile

earth-quake-predictor
├── Dockerfile
├── README.md
├── earth_quake_predictor
│   ├── __init__.py
│   └── ...
├── notebooks
│   └── ...
├── pyproject.toml
└── tests
    └── __init__.py

The Dockerfile in this case looks as follows
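A minimal version could be something like this (a sketch, assuming Poetry-managed dependencies, a poetry.lock committed to the repo, and a hypothetical train module as the entry point):

```dockerfile
FROM python:3.10-slim

# Install Poetry inside the image
RUN pip install poetry

WORKDIR /app

# Copy only the dependency files first, so this layer is cached
# between builds as long as the dependencies do not change
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root

# Copy the source code
COPY earth_quake_predictor ./earth_quake_predictor

# Run the training script
CMD ["poetry", "run", "python", "-m", "earth_quake_predictor.train"]
```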

 

Each instruction in the Dockerfile creates a layer that builds on top of the previous one.

 

From this Dockerfile you can create a Docker image

$ docker build -t earth-quake-model-training .

and run your model training inside a Docker container

$ docker run earth-quake-model-training

BOOM!

 

That’s it for today guys.

Talk to you next week.

Enjoy the weekend.

Peace, Love and Laugh.

Pau

Wanna learn more Real World ML?

Subscribe to my weekly newsletter

Every Saturday

For FREE

Join 22k+ ML engineers ↓