7 Technologies Every Machine Learning Engineer Should Know
Updated: Aug 2, 2020
Machine learning engineer has become one of the most important roles at companies looking to leverage data and automated statistical tooling to derive meaningful business insights. It is therefore no surprise that the number of machine learning engineer job openings grew 344% between 2015 and 2018.
In this article, we discuss 7 technologies and tools that every machine learning engineer should know to succeed in their job. Being familiar with these tools matters not only on the job but also in machine learning interviews, where companies often test your knowledge of these technologies. Let's get started!
Python
While it's often difficult to recommend a single programming language as the de facto one to learn, in recent years Python has unequivocally become the go-to language for the kinds of analyses that data scientists and machine learning engineers do.
First released in 1991, Python has evolved into a powerful, easy-to-use language that is well suited to rapid prototyping of data analyses and machine learning models. In fact, many of the most widely used machine learning and data analysis libraries (pandas, TensorFlow, PyTorch, scikit-learn, etc.) are either written in Python or provide first-class Python APIs.
Unix
Though Unix dates back to the 1970s, Unix-like operating systems such as Linux are still the primary workhorses of virtual machines and servers. As a machine learning engineer, you will very likely interact with a Unix-based system at some point in your job.
Unix shells are the primary entry points for interacting with Unix machines, so it is extremely important to familiarize yourself with shell commands and tools such as ls, cd, cat, ssh, grep, curl, and wget. These will allow you to navigate server filesystems, inspect and modify files, connect to remote machines, and download data through web requests.
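These commands are normally typed directly into a shell, but they are also frequently glued into data pipelines from Python. Here is a minimal sketch of calling two of them, `ls` and `grep`, through the standard-library `subprocess` module. It assumes a Unix-like machine with `ls` and `grep` on the `PATH`; the helper names `list_directory` and `count_matching_lines` and the sample log file are hypothetical, chosen only for illustration.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def list_directory(path: str) -> List[str]:
    """Return the entries printed by `ls -1 path` (one name per line)."""
    result = subprocess.run(
        ["ls", "-1", path], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def count_matching_lines(pattern: str, path: str) -> int:
    """Count lines in `path` matching `pattern`, like `grep -c pattern path`."""
    result = subprocess.run(["grep", "-c", pattern, path], capture_output=True, text=True)
    # grep exits with 0 when lines match, 1 when none match, and 2 on errors
    # (e.g. a missing file); with -c it still prints "0" when nothing matches.
    if result.returncode == 2:
        raise RuntimeError(result.stderr.strip())
    return int(result.stdout.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical training log used only for this demo.
        log = Path(tmp) / "train.log"
        log.write_text("epoch 1 ok\nepoch 2 ERROR nan loss\nepoch 3 ok\n")
        print(list_directory(tmp))                      # ['train.log']
        print(count_matching_lines("ERROR", str(log)))  # 1
```

Typing `grep -c ERROR train.log` in a shell gives the same count; the Python wrapper is only worthwhile when the result feeds into further processing.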
Git
Any engineering job today will require you to work either with codebases that you inherit or with codebases that you start from scratch. Version control systems are arguably the most important tool for managing those codebases: they keep track of modifications to source files, which enables teams to collaborate efficiently and scale their projects.
Git is an open-source distributed version control system that underpins platforms like GitHub and has become the predominant version control tool today. Knowing Git will make you a more productive machine learning engineer, so it is well worth the investment to learn it.
Scikit-learn
Scikit-learn is a general-purpose library for building machine learning models from data. It has been around since 2007 and is still a robust go-to solution for prototyping models on the data you have.
It provides extensive support for model training and evaluation, with implementations of most classical machine learning algorithms, from linear models and support vector machines to tree ensembles and clustering. In addition, it is an exceptionally well-architected library: its models share a consistent interface, so experimenting with a new model often takes just a few lines of Python.
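To make "a few lines of Python" concrete, here is a minimal sketch that trains and evaluates a classifier on the iris dataset bundled with scikit-learn. It assumes scikit-learn is installed; the choice of `LogisticRegression`, the 25% test split, and the random seed are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 iris flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation; the fixed seed makes the split reproducible,
# and stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator exposes the same fit/predict interface.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Because estimators share that interface, trying a different model, for example `RandomForestClassifier` from `sklearn.ensemble`, usually means changing only the line that constructs `model`.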
A Deep Learning Library
Today there are many deep learning libraries for building neural networks, the two most popular being TensorFlow and PyTorch. Rather than recommending a specific library, we simply recommend that you learn one of them.
At the end of the day, it doesn't much matter which one you are comfortable with, since any mainstream deep learning library will give you the tooling you need to build flexible neural networks. However, do spend the time to learn one library well, because neural networks have become the name of the game in machine learning today. They power state-of-the-art systems in domains ranging from computer vision to machine translation to recommendation systems, and as a machine learning engineer you will almost certainly find yourself building one.
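As one illustration, here is a minimal sketch of the workflow most deep learning libraries share: define a network, choose a loss and an optimizer, and run a training loop in which the library computes gradients for you. PyTorch is used here only as an example (TensorFlow/Keras provides equivalent building blocks), and the synthetic data (a noisy line y = 2x + 1), layer sizes, learning rate, and step count are assumptions chosen for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible initialization and noise

# Synthetic data for illustration only: y = 2x + 1 plus a little Gaussian noise.
x = torch.linspace(-1, 1, 100).unsqueeze(1)  # shape (100, 1)
y = 2 * x + 1 + 0.05 * torch.randn_like(x)

# A tiny network: one hidden layer of 16 units with a ReLU nonlinearity.
model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for _ in range(500):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss
    loss.backward()              # autograd computes gradients for every parameter
    optimizer.step()             # update the weights

print(f"training loss at last step: {loss.item():.4f}")
```

The same four-step loop (zero gradients, forward, backward, step) carries over to much larger networks; what changes is mainly the model definition and the data loading.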
A Plotting Library
Plotting is one of the most important ways to derive insights from your data. Within the Python ecosystem, commonly used plotting libraries include matplotlib, seaborn, plotly, and bokeh.
As with deep learning libraries, it matters less which one you learn than that you learn one of them well. Visualizing your data will help you understand its characteristics, analyze the behavior of your models, and communicate your findings.
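As a concrete example, here is a minimal matplotlib sketch of two plots that serve those purposes: a histogram to inspect how a variable is distributed, and a predicted-versus-actual scatter plot to check a model's behavior. The data are randomly generated stand-ins rather than real model output, and the filename `model_diagnostics.png` is hypothetical. The `Agg` backend lets the script save images on a server that has no display.

```python
import matplotlib

matplotlib.use("Agg")  # render to image files without a display (e.g. on a remote server)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: "true" targets and noisy "predictions" from a hypothetical model.
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.3, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: distribution of the target variable.
ax_hist.hist(y_true, bins=20)
ax_hist.set_title("Distribution of targets")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

# Right panel: predictions against ground truth, with a dashed y = x reference line.
ax_scatter.scatter(y_true, y_pred, s=10, alpha=0.6)
lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
ax_scatter.plot(lims, lims, linestyle="--", color="gray")
ax_scatter.set_title("Predicted vs. actual")
ax_scatter.set_xlabel("actual")
ax_scatter.set_ylabel("predicted")

fig.tight_layout()
fig.savefig("model_diagnostics.png", dpi=100)
plt.close(fig)  # free the figure's memory once it is saved
```

Points far from the dashed line in the right panel flag examples the model gets badly wrong, which is often the quickest way to spot a data or modeling problem.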
A Cloud Computing Framework
Today there are three major cloud computing providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Unless a business runs its products entirely on on-premises servers, it almost certainly relies on the cloud for its machine learning and data science applications.
Cloud computing provides easy, cost-effective access to servers for training and hosting machine learning models, storage services for your data, and resources for scaling services up to web-scale traffic. As a machine learning engineer, you should absolutely become familiar with at least one cloud provider's offerings, because the cloud has become the primary means of rolling out new software to the world.
And with that, we are done with our odyssey through the key technologies needed for machine learning today.
While the amount of knowledge required to become a machine learning engineer can certainly feel daunting, at Confetti AI we are committed to helping individuals learn the skills they need to succeed in these roles.
If you're interested in pursuing a career in machine learning or data science, get in touch!