
The MLOps project that gets you the job

Oct 04, 2023

If you build one open-source MLOps project and share it with the world (which obviously includes potential employers), you will land an ML Engineering job. As simple as that.

 

Why?

Because most job applicants don’t do it. And employers know that.

Having 10 online course certificates on your LinkedIn profile is cool. But having one solid open-source project on your GitHub is what differentiates you and gets you the job.

 

How to build a kick-ass ML/MLOps project?

To build a real-world ML project that gets you the job, you need 3 things:

  1. A real-world problem that interests you.

  2. A source of data, because there is no Machine Learning without data.

  3. A basic understanding of MLOps.

Let’s go one by one.

 

1. How to choose the problem 

The trick is to work on a real-world problem that genuinely interests you, so you don’t quit when things get tough.

Building a project is way harder than completing an online course. You will go through ups and downs, and working on a problem you are passionate about is what keeps you going through the downs.

For example, here are a few projects I would work on:

  1. Predict the outcome of NBA matches. I love basketball, played it for a long time, and have some domain expertise I could leverage here.

  2. Build a chatbot that writes stand-up comedy. I occasionally perform in stand-up clubs and admire the ability of humans to make other humans laugh. Moreover, NLP is an area that is booming and generating lots of job opportunities with the emergence of Large Language Models.

  3. Build a trading bot. I am deeply interested in real-time ML, and financial trading is the perfect playground for that: plenty of data, APIs, and also market interest.

I hope these examples inspire you. Now, go and think about yours.

What do you really want to build?

 

2. How to find the data 

Without data, there is no Machine Learning.

Luckily, there is still plenty of freely available data to build amazing ML apps. Here I am giving you 3 options.

 

Option 1. Use a dataset from Kaggle and simulate API calls 

Kaggle has plenty of historical datasets you can use to get started. Such datasets help you train an ML model, but they do not give you access to recent, live data. To simulate live data, you can sample historical data from your CSV/Parquet file and replay it as if it were arriving in real time.

💡 This is the technique we use in The Real-World ML Tutorial to generate live taxi rides from historical data.
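As a minimal sketch of Option 1 (our own illustration, not the code from the tutorial), the generator below replays the rows of a historical CSV in timestamp order, optionally sleeping so the gaps between events match their original spacing. The function name `replay_as_live` and the column names `pickup_datetime` and `pickup_location_id` are hypothetical. It reads CSV with the standard library; a Parquet file would first need a library such as pandas or pyarrow to load it.

```python
import csv
import io
import time
from datetime import datetime
from typing import Iterator, TextIO


def replay_as_live(csv_file: TextIO, timestamp_col: str, speedup: float = 0.0) -> Iterator[dict]:
    """Yield historical rows in timestamp order, as if they were arriving live.

    speedup=0 replays as fast as possible. speedup=60 replays one hour of
    history per real minute, sleeping between rows to preserve their spacing.
    """
    rows = list(csv.DictReader(csv_file))
    for row in rows:
        row[timestamp_col] = datetime.fromisoformat(row[timestamp_col])
    rows.sort(key=lambda r: r[timestamp_col])

    previous_ts = None
    for row in rows:
        if speedup > 0 and previous_ts is not None:
            gap_seconds = (row[timestamp_col] - previous_ts).total_seconds()
            time.sleep(gap_seconds / speedup)
        previous_ts = row[timestamp_col]
        yield row


# Hypothetical taxi-ride columns, inlined so the example runs without a file.
historical = io.StringIO(
    "pickup_datetime,pickup_location_id\n"
    "2022-01-01T00:05:00,43\n"
    "2022-01-01T00:01:00,161\n"
)
for event in replay_as_live(historical, "pickup_datetime"):
    print(event["pickup_datetime"], event["pickup_location_id"])
```

Your feature pipeline can consume this generator the same way it would consume events from a real API, so switching to live data later only changes the data source.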

 

Option 2. Build a web scraper

 

For example, if I wanted to predict NBA match outcomes, I could build a web scraper to get all the historical AND live data from NBA matches from here.

Building a web scraper in Python is a great exercise that will sharpen your software engineering skills. So, if you haven’t done it before, I encourage you to do it.
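A minimal sketch of a scraper using only Python's standard library (`html.parser` and `urllib.request`) is below. It assumes the stats you want sit in a plain HTML `<table>`; the class and function names, the User-Agent string, and the sample markup are hypothetical, and no particular NBA site is assumed. Pages that build their tables with JavaScript would need a different approach.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html: str) -> list[list[str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page; the User-Agent value is a placeholder of our own."""
    request = Request(url, headers={"User-Agent": "my-ml-project-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Offline example with made-up markup; against a real site you would call
# parse_table_rows(fetch_page(url)).
sample_html = """
<table>
  <tr><th>Date</th><th>Home</th><th>Away</th><th>Score</th></tr>
  <tr><td>2023-04-01</td><td><a href="#">Team A</a></td><td>Team B</td><td>101-99</td></tr>
</table>
"""
for row in parse_table_rows(sample_html):
    print(row)
```

Keeping `fetch_page` and `parse_table_rows` separate lets you save the raw HTML once and re-run the parser as often as you like while you work out the page structure.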

 

Option 3. Find a public API

 

There are tons of public APIs you can use for many real-world problems, like the ones in this repository.

Using a battle-tested API is the ideal option to build a robust feature pipeline for your ML app.
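Much of what makes an API-based feature pipeline robust is how it handles failures. The sketch below, using only the standard library, retries network errors with exponential backoff and drops incomplete records before they reach the feature store. The function names, the retry defaults, and the `opener` parameter are our own assumptions, not part of any specific public API.

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable
from urllib.error import URLError
from urllib.request import urlopen


def fetch_json(
    url: str,
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    timeout: float = 10.0,
    opener: Callable = urlopen,
) -> Any:
    """GET a JSON endpoint, retrying network errors with exponential backoff.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    pipeline can be tested offline with a fake.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except (URLError, TimeoutError):
            if attempt == max_retries:
                raise
            # With the default backoff_seconds=1.0 this waits 1s, 2s, 4s, ...
            time.sleep(backoff_seconds * 2 ** attempt)


def keep_valid_records(records: list[dict], required_fields: tuple[str, ...]) -> list[dict]:
    """Drop records missing any required field before they are stored."""
    return [r for r in records if all(r.get(f) is not None for f in required_fields)]
```

One caveat: `urllib.error.HTTPError` is a subclass of `URLError`, so this version also retries on HTTP error statuses such as 404; a stricter version would re-raise client errors immediately.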

 

3. How to learn the fundamentals of MLOps 

MLOps becomes way simpler when you realize every ML system is made of 3 pipelines:

  1. The Feature pipeline computes model features and saves them in the feature store.

  2. The Training pipeline fetches model features and targets from the store, trains the ML model, and stores it in the model registry.

  3. The Inference pipeline makes the model predictions available to downstream services or human operators, either offline (aka a batch-scoring system) or in real time (aka a REST API).

This design is called the 3-pipeline architecture, and it applies to EVERY ML system.
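To make the three pipelines concrete, here is a toy, in-memory sketch. `FeatureStore` and `ModelRegistry` are hypothetical stand-ins for real services, and the "model" is a one-feature least-squares line that predicts next-hour demand from the previous hour's count. The point is the structure: the pipelines never call each other and only communicate through the store and the registry.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class FeatureStore:
    """Toy in-memory stand-in for a real feature store."""
    rows: list[dict] = field(default_factory=list)

    def save(self, rows: list[dict]) -> None:
        self.rows.extend(rows)

    def load(self) -> list[dict]:
        return list(self.rows)


@dataclass
class ModelRegistry:
    """Toy in-memory stand-in for a real model registry."""
    versions: list[dict] = field(default_factory=list)

    def push(self, model: dict) -> int:
        self.versions.append(model)
        return len(self.versions)  # version numbers start at 1

    def latest(self) -> dict:
        return self.versions[-1]


def feature_pipeline(hourly_counts: list[float], store: FeatureStore) -> None:
    """Raw data -> features: pair each hour's count (target) with the previous hour's (lag_1)."""
    rows = [
        {"lag_1": hourly_counts[i - 1], "target": hourly_counts[i]}
        for i in range(1, len(hourly_counts))
    ]
    store.save(rows)


def training_pipeline(store: FeatureStore, registry: ModelRegistry) -> int:
    """Features -> model: fit target = intercept + slope * lag_1 by least squares."""
    rows = store.load()
    xs = [r["lag_1"] for r in rows]
    ys = [r["target"] for r in rows]
    x_mean, y_mean = mean(xs), mean(ys)
    var_x = sum((x - x_mean) ** 2 for x in xs)
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0  # constant input: fall back to predicting the mean
    intercept = y_mean - slope * x_mean
    return registry.push({"intercept": intercept, "slope": slope})


def inference_pipeline(last_hour_count: float, registry: ModelRegistry) -> float:
    """Model -> prediction: load the latest registered model and predict the next hour."""
    model = registry.latest()
    return model["intercept"] + model["slope"] * last_hour_count


store, registry = FeatureStore(), ModelRegistry()
feature_pipeline([10, 12, 14, 16, 18], store)
version = training_pipeline(store, registry)
prediction = inference_pipeline(18, registry)
print(version, prediction)  # 1 20.0
```

Because each pipeline depends only on the store or the registry, you can run them on different schedules (for example, features every hour and training every week), and a batch-scoring version of the inference pipeline would simply loop over many inputs.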

 

Now it is YOUR time 🤟

Find a problem you are interested in, collect relevant data and use the 3-pipeline design to build and publish a working ML app.

This is what makes you stand out from the crowd.

Building your first project is hard, so if you need help, consider joining 200+ students in The Real-World ML Tutorial + Community.

It is a self-paced, 3-hour course where you will learn step by step how to build a Machine Learning system that predicts taxi demand in NYC, hour by hour.

At the end of the tutorial, you will have all the tools you need to build your OWN project.

And land the job.
