6 Nov 2017 | Blog

Data DevOps – breathing new life into the living dead

Nowadays, many think that data warehousing is slow, not very agile, or even dead. Many also seem to think that it is enough to take some generic cloud services and simply start running analytics on them. Unfortunately, it is not that simple.

As more and more analytics becomes operational and continuous, coherent and well-modelled data becomes crucial. In addition, the data needs to be managed in an automated manner for a system to run in 24/7/365 operational mode. On top of this, the development team needs to produce business value continuously from day one. In other words, data warehouse and analytics development and deployment need Data DevOps to come alive. We have developed a cure for the living dead.

Programmers’ revolution in data management

The programmers’ revolution started over ten years ago with NoSQL databases and Hadoop data management frameworks. It disrupted the data management landscape for years. The revolutionary software survived, but in the end it had to return to relational algebra and say yes to SQL (SQL on Hadoop, NoSQL reinterpreted as Not Only SQL, etc.). Along the way, the revolution turned queries into one-off, custom analysis programs hundreds of lines long, not to mention the loss of transactional safety and optimal performance.
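To illustrate the point about queries turning into custom programs, here is a minimal sketch that computes the same result twice: once as a declarative SQL query and once as a hand-written one-off program. The sample data, the table name `orders`, and both function names are hypothetical and chosen only for this illustration. Python's built-in `sqlite3` module stands in for a real database.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) order rows.
ORDERS = [("acme", 120.0), ("acme", 80.0), ("globex", 50.0), ("initech", 200.0)]


def totals_with_sql(rows):
    """Declarative version: the database engine plans and runs the aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = (
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer HAVING SUM(amount) >= 100 ORDER BY customer"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def totals_by_hand(rows):
    """One-off version: grouping, filtering and sorting are all hand-coded."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    kept = [(customer, total) for customer, total in totals.items() if total >= 100]
    return sorted(kept)


sql_result = totals_with_sql(ORDERS)
hand_result = totals_by_hand(ORDERS)
print(sql_result == hand_result)
```

Both versions return the same rows for this small example. The difference shows up as requirements grow: each new join, filter or window adds another clause to the SQL query, while the hand-written version needs new custom logic, and it gets neither a query optimizer nor transactions for free.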


At worst, cloud data lakes lead to the same result. In addition, cloud implementations bring IT infrastructure management and its skill set into data management projects. On top of all that, there still needs to be a vision for data management, data modelling, and data utilization: queries, reports, analytics, APIs. As a result, the full stack of an information system is in danger of becoming custom-programmed and different in every implementation, at worst producing lock-in to individual people.

In the end, the problem is not in relational algebra or in SQL, quite the opposite, but in the labor-intensive management of traditional databases, in other words in DBA work. Therefore, the revolution should have been NoDBA instead of NoSQL. The worst of both worlds, DW/Analytics and Software Development, is a highly custom-programmed system with high maintenance overhead. To me, this combination, presented as something new and evolved, paints a picture of the living dead: a zombie.

New revolution: Software-Defined Operational Analytics

Traditionally, the data warehouse and analytics stack (Figure 1) consists of custom deployment frameworks and separate tools and software, with no designed and automated approach that cuts across the whole stack. This makes the process tool-by-tool oriented.

To be able to concentrate on business value creation, we need to combine the best sides of data warehousing and analytics with those of software development, and add a software-defined vision on top.

In a greater context, combining software development principles in data warehouse and analytics development with the software-defined vision enables a unique Data DevOps approach. This new approach cuts vertically across the whole stack.

Figure 1. Traditional Data Warehouse Stack, tool-by-tool oriented thinking.

Automating and supporting the most common and important features of the different layers of the stack (Figure 2), which cover about 85% of the solution work, results in a data warehouse and analytics stack optimized for development, deployment and runtime management. This is what we call Software-Defined Operational Analytics. The software-defined concept is familiar from cloud architectures, for example software-defined storage (Hadoop Distributed File System, AWS S3 Object Storage) and software-defined networks.

Figure 2. Solita’s Agile Data Engine Stack is a vertical approach, which enables Data DevOps.

By using the same concept, we refer to the different automation areas (Figure 3) in the stack. We are currently implementing this approach in Solita’s Agile Data Engine software, which has received good feedback from current customers and from data engineers. Now, automation is easily available for everyone!

Figure 3. The automation areas of operational analytics.
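As a rough illustration of what "software-defined" can mean in a data warehouse, the sketch below declares a table as metadata and generates the DDL from it instead of writing the DDL by hand. This is not Agile Data Engine's API or implementation. The `Column` and `Entity` classes, the `create_table_sql` function and the `dw.customer` table are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical metadata for one table column."""
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Entity:
    """Hypothetical metadata for one warehouse table."""
    schema: str
    name: str
    columns: tuple


def create_table_sql(entity):
    """Generate a CREATE TABLE statement from the declared metadata."""
    lines = []
    for col in entity.columns:
        null_sql = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {col.sql_type}{null_sql}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {entity.schema}.{entity.name} (\n{body}\n);"


# The table is described once as data; the DDL is derived from it.
customer = Entity(
    schema="dw",
    name="customer",
    columns=(
        Column("customer_id", "INTEGER", nullable=False),
        Column("customer_name", "VARCHAR(200)"),
    ),
)
ddl = create_table_sql(customer)
print(ddl)
```

Because the definition is plain data, the same metadata could in principle also drive load scripts, tests or deployment steps, which is the kind of cross-stack automation the text describes.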

Is your data warehouse and analytics system one of the living dead? If you recognize that it is, please let us help you. We have an easy solution for bringing it back to life.

Read more about Agile Data Engine
Agile Data Engine on Twitter

Harri Kallio works in Solita’s Cloud and Analytics Business Development team. He currently leads Solita’s Agile Data Engine software development. He has 12 years of experience in software product development in the areas of Logistics Industrial Internet and Telecom Network Monitoring. Harri speaks SQL at breakfast and about performance architectures while sleeping; concepts while awake. He is a proud father of two daughters.