9 November 2017 | Blog

Competent data scientists using ready-made machine learning solutions?

Some 350 European data professionals attended the Nordic Data Science and Machine Learning Summit, held in Stockholm on 18 and 19 October. There was plenty of content: some two dozen half-hour talks were given on two tracks. Here are some thoughts on one current issue that was raised: where can you find more skilled data scientists?

The recruitment of data scientists was a hot topic that came up in several talks. Right now, companies face great difficulties in trying to find good data scientists. This is understandable, as the list of requirements for the ideal data scientist is rather lengthy. The problem had clearly motivated the speakers to consider how it could be alleviated.

One suggested solution was for companies to train data scientists internally. This is how my workplace has chosen to do it. Another suggestion was to make machine learning and artificial intelligence more accessible and user-friendly with the right tools. Some of the larger IT companies have developed ready-made analytics platforms with user-friendly web interfaces, meant to bring the opportunities of machine learning to a wider audience than just data scientists.

Pre-made analytics solutions are not straightforward for data scientists

I have some experience with a ready-made, closed analytics solution. This type of software allows simple models to be created rather quickly. The implemented machine learning models are easily connected to the data and trained through a simple graphical user interface. In the right circumstances, a model's parameters can be adjusted to make it accurate and well suited to new data. Certain visualisations can also be generated quickly, which is always important for understanding data.

Unfortunately, the attempt to build a user-friendly analytics solution in a single software package also usually poses some limitations for the user. If the machine learning algorithms are not open source, the implementation of the model cannot be examined. Adjusting the model is then limited to the parameters that are offered to the user, and often the graphical user interface only scratches the surface of the model implementation. If the pre-made machine learning model does not suit the problem at hand, it may be extremely difficult to improve, which can hurt the results. Everyday analytics problems are much more complex than textbook examples.

The black box nature of machine learning models poses just as big a problem. If data scientists cannot examine an algorithm’s source code, it may be difficult or even impossible for them to explain to others how the models they have created work. Communication is one of the most important duties of data scientists. The decisions made by an artificial intelligence may be hard to trust if no one can give a satisfactory explanation of how it works.

Data scientists should be given free rein

Instead of closed package solutions, data scientists should be given free rein to solve problems. Open source machine learning libraries are the best and most popular tools for data scientists. Solutions based on them can be both explained and extended.

One talk at the Nordic Data Science and Machine Learning Summit also discussed the theme of open and closed source code from the perspective of analytics. Based on the other presentations and discussions, it can be concluded that solutions based on open source libraries are clearly in the lead.