The Machine Learning engineer of the future
Aug 04, 2024
Machine Learning is moving fast. Very fast.
Job requirements for ML positions have changed a lot in the last year, since ChatGPT showed the world what Large Language Models are capable of, and companies started (often without a plan) to look for ML engineers who can integrate these models with their proprietary data (e.g. RAG) and generate business value.
So, as an ML engineer I constantly ask myself:
What are the skills I should be learning today to stay relevant in the Machine Learning world?
Let me share with you 3 things that I’ve done (and I am still doing) to bulletproof my ML career.
This is no BS advice.
This is what I am actually doing, because I think it makes a lot of sense.
And I think it will help you.
Let’s start ↓
Tip 1 → Build a personal project using LLMs
Large Language Models (LLMs) are like electricity, and what most companies need are professionals who can build machines (e.g. RAG systems) using this electricity.
They don’t need scientists to research deeper into the magic process of generating electricity. They need engineers to build machines using this electricity that can solve their problems.
What about ML researchers?
Sure, there are (and will always be) companies like OpenAI, Meta or DeepMind, looking for ML researchers that can improve the existing LLMs. However, this is not what most companies need. So I would not put my money (aka career) here.
Most companies out there need ML engineers who can integrate these LLMs with their company's proprietary data to build products that improve business metrics. This is where the money is.
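To make "integrating an LLM with proprietary data" a bit more concrete, here is a minimal RAG sketch in plain Python. Everything in it is an assumption for illustration: the documents are made up, the "embedding" is a toy bag-of-words vector instead of a real embedding model, and `call_llm` is a hypothetical stub standing in for whatever LLM API you would actually call.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a company's proprietary documents.
DOCUMENTS = [
    "Refunds are processed within 14 days of the return request.",
    "Premium support is available on weekdays from 9am to 6pm CET.",
    "Orders above 50 euros ship for free within the EU.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved context in front of the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


def call_llm(prompt: str) -> str:
    """Hypothetical stub. Replace with a real LLM API call."""
    return f"[stub reply to a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    question = "How long do refunds take?"
    prompt = build_prompt(question, retrieve(question, DOCUMENTS))
    print(prompt)
    print(call_llm(prompt))
```

The retrieval step is the part companies care about: it decides which slice of their data the LLM gets to see. Swapping the toy pieces for a real embedding model, a Vector DB and an LLM API keeps the same shape.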
My advice
Every ML job listing nowadays has the keyword RAG in it. The problem is that LLMs and RAG are so new that almost no one has any professional experience to showcase.
So if you have no experience, you don’t even get the chance to get that experience.
To solve this chicken-and-egg problem, I recommend you stop reading blog posts about RAG and completing courses, and build your own project instead.
Build a project, publish it on GitHub, and add it to every job application.
If you need help, you can follow the course that Paul Iusztin, Alexandru Răzvanț and I published a few months ago.
The Hands-on LLM course
Full source code and video lectures are FREE
→ Click here
It is straight to the point, and comes with a full source code implementation you can reuse for your problem.
Be brave, and you will be ahead of the pack!
Tip 2 → Learn Machine Learning System design
The design principles behind ML systems, whether you use classic ML models or LLMs, are THE SAME.
Any ML system can be decomposed into at least 3 types of programs (aka pipelines)
→ Feature pipelines, that transform raw data into ML model features (e.g. vector embeddings) that are saved in a Feature Store or Vector DB.
→ Training (or fine-tuning) pipelines, that read historical features from the Feature Store/Vector DB and generate a new model artifact, either by training from scratch or fine-tuning a base LLM. This model artifact is then pushed to the model registry.
→ Inference pipelines, that load the model from the registry, and the input from the client app (for example a vector of numerical features, or a text prompt), generate a prediction (or a generation) and return it to the client app.
This is a universal blueprint that, together with CI/CD workflows (aka MLOps), helps you build any ML system faster.
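Here is a minimal sketch of the three pipelines above, using only the Python standard library. The names are all assumptions made for illustration: two in-memory dicts play the role of the Feature Store and the model registry, the raw data is a list of prices, and the "model" is a single weight fitted by least squares. A real system would replace each piece with proper infrastructure, but the hand-offs between pipelines stay the same.

```python
from statistics import mean

# In-memory stand-ins (assumptions) for a Feature Store and a model registry.
feature_store: dict[str, list[float]] = {}
model_registry: dict[str, dict] = {}


def feature_pipeline(raw_prices: list[float], window: int = 3) -> None:
    """Transform raw data into features and save them in the feature store."""
    features, targets = [], []
    for i in range(window, len(raw_prices)):
        # Feature: average of the previous `window` prices. Target: the next price.
        features.append(mean(raw_prices[i - window:i]))
        targets.append(raw_prices[i])
    feature_store["moving_avg"] = features
    feature_store["target"] = targets


def training_pipeline(model_name: str) -> None:
    """Read historical features, fit a model, push the artifact to the registry."""
    x = feature_store["moving_avg"]
    y = feature_store["target"]
    # One-parameter least squares through the origin: target ≈ weight * feature.
    weight = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    model_registry[model_name] = {"weight": weight}


def inference_pipeline(model_name: str, recent_prices: list[float]) -> float:
    """Load the model from the registry and return a prediction for the client."""
    model = model_registry[model_name]
    return model["weight"] * mean(recent_prices)


if __name__ == "__main__":
    feature_pipeline([100.0, 101.0, 103.0, 102.0, 104.0, 105.0, 107.0])
    training_pipeline("price_model")
    print(inference_pipeline("price_model", [104.0, 105.0, 107.0]))
```

Notice that the pipelines never call each other. They only communicate through the store and the registry, which is what lets you run, test and deploy each one on its own schedule.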
My advice
The only way to learn ML system design is to design and build a system from scratch. This is precisely what we will do (again) in September, in the 2nd cohort of Building a Real-Time ML System. Together.
Building a Real-Time ML System. Together
25 hours of coding sessions, full source code implementation, and lifetime access to all future cohorts (next cohort starting on September 16th)
→ Click here to learn more
Tip 3 → Go beyond Python
Python has been (and still is) the number 1 programming language in the Machine Learning world, because
- it is easy to write and read, and
- it has a rich ecosystem of high-performance computing libraries, such as PyTorch, NumPy or Polars, written in lower-level languages like C, C++ or Rust.
However, Python was never designed for efficiency, which means it is not the best language for ML model inference, especially for extremely large language models like the ones we see today.
Because of this, Python is a very good option for ML researchers who want to quickly build and experiment with different models. However, for ML engineers who want to deploy cost-efficient solutions, Python is not the best option.
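You can get a feel for this gap without installing anything. The sketch below computes the same dot product twice: once with a loop that runs entirely in the Python interpreter, and once with `sum`, `map` and `operator.mul`, which are implemented in C in CPython. The function names and input sizes are my own choices for illustration, and the timings you see will depend on your machine and Python version.

```python
import operator
import timeit


def dot_interpreted(a, b):
    """Dot product where every multiply and add runs as Python bytecode."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_builtin(a, b):
    """Same result, but the loop runs inside C-implemented builtins (in CPython)."""
    return sum(map(operator.mul, a, b))


if __name__ == "__main__":
    a = list(range(100_000))
    b = list(range(100_000))
    # Both versions must agree before comparing their speed.
    assert dot_interpreted(a, b) == dot_builtin(a, b)
    for fn in (dot_interpreted, dot_builtin):
        seconds = timeit.timeit(lambda: fn(a, b), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Libraries like NumPy or Polars apply the same idea at a much larger scale: Python stays as the friendly interface, and the heavy lifting happens in compiled code.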
What about Mojo?
Mojo is a new programming language, still in a very early stage. Its mission is incredibly exciting: blend the expressiveness of Python with the performance of C.
It has a growing community that will surely push it very far.
However, at the moment, it is still far from competing with Python for most real-world problems.
My advice
I am personally very bullish on Rust.
Rust is a modern compiled language that is already used to power popular Python libraries like Polars or tokenizers. Rust syntax is more involved than Python's, but still way more accessible than C or C++.
I have been trying to learn Rust in my spare time for the last 3 months... but things haven't gone well, because I haven't been disciplined enough.
I haven't committed 100%, and I want to change that.
Let’s Rust
In November I will start a new live course in which we will learn how to use Rust to build ML software.
We will build small projects, and learn the Rust language as we build, together.
There will be lots of errors, debugging and questions... It will be painful. But I truly believe we will learn tons of stuff that will make us all better ML engineers.
This course will be 100% free for all my students from my Real-World ML Tutorial or Building a Real-Time ML System. Together.
Now it’s your turn
Don’t be afraid to try new things.
Data Scientists who live inside Jupyter notebooks are a thing of the past.
You need to be brave, learn to design ML systems, build projects and go beyond Python.
These are investments that will pay off.
Talk to you next week,
Peace and Love
Pau