Welcome, dear reader. In this post we will learn what AutoML is, who it is for, and how you can use it. We will also engage in a little bit of debate and list the main platforms where you can find it. Ready? Let's get to it then!
What is AutoML?
Before explaining what AutoML is, let's see what a traditional end-to-end Machine Learning pipeline looks like:
As we see, this pipeline has many steps, and running through it correctly requires a deep understanding of the Machine Learning and Data Science world. It requires analytical skills for exploring and preparing the data and building the models, and also software engineering skills for deploying the model and putting it into a production environment.
Few people have both of these skill sets, so the question is:
How can we use the awesome power of Machine Learning without having to be an expert in the field?
The answer to this question is AutoML.
AutoML is short for Automated Machine Learning: a family of Machine Learning solutions and software that allow people who are not machine learning experts to build and deploy their own models.
It provides intuitive methods and processes that take a data set and allow you to play around with it, explore it, and build machine learning models out of it, all without having to write a single line of code!
Aside from being used by people who are not Data Scientists, AutoML can also be used by individuals who already know the field of Machine Learning to speed up their processes and make them more efficient.
AutoML platforms usually have the following characteristics:
- They have a nice, intuitive interface that is controlled by clicks or drag and drop.
- They are usually no-code platforms, although some of them allow developers to insert code snippets to enhance their workflows.
- They pre-process and clean the data automatically.
- They select, and sometimes construct, the features that will go into the Machine Learning models.
- They try a wide range of model families and hyper-parameter settings to find the best model and parameter combination for a specific task.
- They have a clean and self-explanatory interface for analysing the results of the models.
- They assist in model deployment.
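To give a feel for what the automatic cleaning and feature selection steps above involve, here is a minimal sketch written with Scikit-Learn. It is not how any particular AutoML platform works internally. The toy data, the median imputation and the choice of keeping the two best features are all assumptions we made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy tabular data, made up for this example: 6 rows, 3 numeric columns,
# one missing value (np.nan) in the second column.
X = np.array([
    [1.0, 10.0, 0.5],
    [2.0, np.nan, 0.4],
    [3.0, 30.0, 0.6],
    [8.0, 20.0, 0.7],
    [9.0, 15.0, 0.3],
    [10.0, 30.0, 0.6],
])
y = np.array([0, 0, 0, 1, 1, 1])  # target column we want to predict

# Chain the steps an AutoML tool would typically run for us:
# fill missing values, put columns on the same scale, keep the best features.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=2)),
])

X_clean = prep.fit_transform(X, y)
kept = prep.named_steps["select"].get_support()  # boolean mask over columns

print("Shape after cleaning and selection:", X_clean.shape)
print("Columns kept:", kept)
```

An AutoML platform hides these choices behind its interface and usually tries several alternatives for each step instead of fixing them up front as we did here.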
The usual functionalities of an AutoML platform are shown in the following diagram:
As you can see, this image reflects the main strength of AutoML: fully automating an end-to-end Machine Learning process. You put the data in, select the target (dependent) variable you want to predict, click a button, wait and get the results.
It is worth highlighting that an AutoML platform does more than select a model or algorithm, optimise its hyper-parameters, and return a predictive model. It goes further than this, allowing us to do feature selection and raw data analysis, and giving us an interface to make data-driven decisions.
As wonderful as they might seem, AutoML platforms do have their drawbacks. Most of them only work on tabular data (data presented in a table structure), so Natural Language Processing (NLP) or Computer Vision projects cannot be carried out on them.
Also, depending on the platform, developers have more or fewer customisation options. The least advanced platforms can be restrictive, stopping individuals who have the skills to build complex pipelines from doing so.
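To make the model and hyper-parameter search a bit more concrete, here is a minimal sketch of the idea using Scikit-Learn's `GridSearchCV`. Real AutoML platforms use much larger search spaces and smarter search strategies. The Iris dataset, the three candidate model families and their small grids are assumptions we picked just to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Small tabular dataset that ships with Scikit-Learn (illustrative choice).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The "model" step is a placeholder that the search swaps out.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# One entry per model family, each with its own hyper-parameter grid.
search_space = [
    {"model": [LogisticRegression(max_iter=1000)], "model__C": [0.1, 1.0, 10.0]},
    {"model": [KNeighborsClassifier()], "model__n_neighbors": [3, 5, 7]},
    {"model": [DecisionTreeClassifier(random_state=0)], "model__max_depth": [2, 4, None]},
]

# Try every combination with 5-fold cross-validation and keep the best one.
search = GridSearchCV(pipeline, search_space, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_model_name = type(search.best_params_["model"]).__name__
test_accuracy = search.score(X_test, y_test)

print("Best model family:", best_model_name)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
print("Accuracy on held-out data:", round(test_accuracy, 3))
```

This is the "click a button and wait" part done by hand: an AutoML platform runs a loop like this for us and then presents the winning model through its interface.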
Who is AutoML for?
The beautiful thing about AutoML is that it can be used by a wide range of end-users: from people with little or no Machine Learning expertise to experts.
Citizen Data Scientists, business managers who are comfortable with technology but have no ML expertise, and analysts can all use AutoML platforms to exploit the benefits of Machine Learning and Big Data.
This is because AutoML requires no coding or deep ML knowledge, and its main goal is increasing the widespread adoption of AI and bringing Machine Learning to the masses.
Machine Learning experts can also use AutoML to quickly implement and study models. However, our feeling is that these kinds of platforms, while sparing you some tedious scripting, don’t allow you to go very deep inside the guts of the algorithms, and often have restrictions that, for very specific or complex models, constrain what you can do compared with writing the code yourself.
On the other hand, they do make finding optimal solutions easier by quickly trying a wide range of pipelines that might not even occur to a data scientist, and they usually have a really neat and simple way of deploying models to production.
Overall, AutoML is targeted towards users with little knowledge of Machine Learning, as it removes the need to be an expert in the field in order to be able to use this technology.
Let's be honest: ML and AI seem like magic to many people, and tools like these will play a huge role in making them more human-like.
Will AutoML replace Data Scientists?
One question that we could ask ourselves is:
Does AutoML do a better job than human experts?
The blunt answer is no. AutoML is not here to replace individuals who are experts in Machine Learning but rather to assist them and make their work faster, while allowing non-experts to play around, build applications and extract insights using this amazing technology.
However, while AutoML probably still performs worse than human experts when building a model and getting results, those experts probably spend a lot more time building the models manually. Human experts working in combination with this technology could therefore be empowered, boosting productivity and reducing the time to market of Machine Learning products.
Awesome, let's see some examples of AutoML platforms and finish off with where we think this is heading and some further resources.
Automated Machine Learning platforms
Some of the most popular AutoML platforms on the market (both free and paid) are:
- Auto-sklearn: An open-source AutoML toolkit built on top of Scikit-Learn, the famous and widely used Python library for Machine Learning.
- Google AutoML: The AutoML tool of Google Cloud Platform, to train and build ML models with minimal effort and expertise.
- H2O AutoML: The H2O Cloud platform is one of the best-known AutoML platforms for institutions.
- BigML: BigML is another widely known company that offers AutoML. Follow them on LinkedIn; they are very active and produce a lot of great content.
- JADBIO: Another great AutoML platform that provides leading-edge AI tools and automation capabilities, enabling life scientists to build and deploy accurate and interpretable predictive models with no coding.
There are many more, both paid and open source, so feel free to do your own research and find the one that suits you best. Let's end with where we think the future of AutoML is going.
The future of AutoML
AutoML will not replace data scientists, but it will boost their productivity and make their work much more interesting and challenging. Also, domain knowledge of a specific sector will become more important: knowing what data to use and when, and coming up with automated AI solutions.
Complementary scripting will become less important and experts will have to be more specialised. Manual scripting is becoming more and more automated, as we have seen with this week's news about GitHub Copilot.
However, what you will always need is people who are qualified enough to choose the right data, do the right analysis and make the right decisions.
Finally, today only approximately 5% of data is analysed, and AutoML will allow us to extract insights from far more data than we do now. In the coming years we expect all the big cloud providers, such as Google, Amazon and Azure, to implement AutoML systems.
Summary and further resources
That is it! We hope that you have learnt what AutoML is all about, and that we have lit a small spark in your mind so that you will research this technology by yourself. Some further resources are:
- arxiv.org paper: Taking Human out of Learning Applications: A Survey on Automated Machine Learning.
- The Hundred Page Machine Learning Book is the perfect companion for an AutoML platform – short, concise, with little code, it is perfect to have by your side when using one of these tools.
- Other books like Python Machine Learning will take you a step further and help you use these platforms to the maximum of their capabilities.
Lastly, check out this awesome StatQuest video on the topic.