23 Nov 2021

Data science is more than machine learning

We all know that a typical Data Scientist spends most of their time not building machine learning models. Instead, they do “boring stuff” such as cleaning and manipulating data, running endless exploratory analyses, and constantly searching for new and better data. Still, the premise of this description is that, in the end, the goal of a Data Scientist is to build a machine learning model. Some will even use the number of ML models put into production as a success metric for data science projects. I often meet Junior Data Scientists who are disappointed at how little machine learning they do in their daily work.

This definition of data science is harmful. It encourages Data Scientists to employ complex models where there might be easier solutions. Just because the problem is complex does not mean the solution has to be. Many times a descriptive dashboard, some smart aggregations or just an informative plot might be more useful for the organisation than a machine learning model. Furthermore, the problem at hand might not even be a machine learning problem but rather an engineering or optimisation problem.

As a Data Science Consultant, I see this all the time. Data Scientists are hired to work on complex problems and are expected to solve them with AI and all that cool stuff. Many managers think that data science is the same thing as machine learning. However, the most successful Data Scientists I have met tend to focus on one thing only: how can we bring value to an organisation using data?

Rather than thinking about how they can build a machine learning model, they think about the actual problem at hand and weigh a number of solutions against each other. They recognise that data science is a process in which they can employ a number of different methods to solve problems.

If you frame data science like this, it does not matter whether the data products developed by Data Scientists are simple aggregations or deep learning models. If anything, simple solutions should be favoured over complex ones, because maintaining data science products is hard, even if you have a state-of-the-art, fully scalable data science platform.

Does this mean that Data Scientists should let go of their programming languages and instead write SQL and build business intelligence dashboards? Quite the opposite. Programming provides a rich toolbox for solving complex tasks that is unmatched by such tools. Furthermore, programming gives Data Scientists a language in which to express themselves, which is essential for solving data problems. It is the perfect way to approach a problem with an open mind. Every solution to a data science problem is likely to begin in a notebook. But we need to be open about how the end product will best provide value to the organisation.

After all, machine learning has no inherent value. Value is created by the organisation that can use the products developed by Data Scientists, regardless of whether they are machine learning models or not.