In the previous blog post, Veli-Matti Ojala considered how companies can enrich data from IoT devices with context-related data from other source systems. He also highlighted the risks of creating silos within a company data infrastructure. Breaking those silos and creating a “single version of the truth” among all data sources is traditionally done by data warehousing. The term data warehousing has gotten a bad reputation, however, and many people now prefer to talk about data hubs or data platforms instead (see my colleague Pasi Jalonen’s blog post). In this post, I use the latter two terms to refer to the same thing.
1. Don’t build single-point solutions – build a data platform
When it comes to IoT, the question to ask is whether data hubs or platforms are still necessary – and what are they, anyway? They are centralised systems for collecting and storing structured and unstructured data from separate sources, in order to deliver consistent and timely data to users.
There might be cases where an organisation has already implemented or is planning to implement a data platform. Those who don’t yet have one might be tempted to connect their first device to the IoT by creating a single-point solution for that IoT-enabled device. This is usually a single web or mobile application or basic analytics tool with a simple backend that is useful only for the one device being connected.
Experience shows, however, that companies that take this unsystematic approach sooner or later run into problems with managing the infrastructure and coping with the varying requirements of different users.
Data platforms are often considered to be hard to implement, which is why many companies avoid creating one. At Solita, we have adopted methods for developing a data platform incrementally, and for modelling only the data that is needed in a given situation. Robustness and flexibility can be achieved with a data vault modelling framework, for example.
2. Seek the best possible technological combination of storages for different types of data
For companies that already have a data platform in place, or are starting to build one, it seems natural to store all the IoT data using the same storage technologies, i.e. the relational database management systems that are used for all the other data in the data platform. This approach might work for a while, but the quantity of data in the context of IoT can overwhelm the database engine in use, causing performance problems and suboptimal use of computing and storage capacity. Of course, some database technologies separate computation and storage to tackle the problem, but as the amount of data grows and older data is no longer needed frequently, it should be moved to cheaper options, for example long-term archiving systems.
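As a rough illustration of moving older data to cheaper storage, the sketch below picks a storage tier from the age of a reading. The tier names, the three-tier split and the 30-day and one-year thresholds are assumptions made for this example, not recommendations; real values depend on how often the data is queried and what each option costs.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def choose_tier(reading_time: datetime, now: datetime) -> str:
    """Return a storage tier name for a reading, based on its age."""
    age = now - reading_time
    if age <= HOT_WINDOW:
        return "database"        # frequently queried, stays in the database engine
    if age <= WARM_WINDOW:
        return "object_storage"  # cheaper storage, still reachable when needed
    return "archive"             # long-term archiving system


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
for days_old in (3, 200, 800):
    print(days_old, choose_tier(now - timedelta(days=days_old), now))
```

A scheduled job could apply a rule like this to decide which partitions to move, so that the database only holds the data that is actually queried often.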
Conventional database technologies do still have a role in the IoT world, though.
Calculated and aggregated values – that is, key performance indicators – should be stored together with the context-related data for visualisation solutions, and perhaps also for basic ad-hoc analytical querying. However, as knowledge of the analytical possibilities of IoT data increases, for instance in the context of data science or machine learning, data will also need to be available in its most detailed, rawest form, i.e. the form in which it was collected from the device in the first place. That data should remain accessible for more than just a few years. Heavy querying and the retrieval of large amounts of data disturb normal usage of a data platform, and in turn affect the applications that are built on the platform, which is a further reason to keep the raw data in a separate storage.
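The split between aggregated values and raw data can be sketched as follows. The example computes an hourly average per device as a stand-in for a key performance indicator, while the raw readings are left untouched so they can go to a separate raw storage. The tuple layout, the device name and the choice of an hourly mean are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def hourly_average(readings):
    """Aggregate raw (device_id, iso_timestamp, value) tuples into hourly means."""
    buckets = defaultdict(list)
    for device_id, timestamp, value in readings:
        hour = timestamp[:13]  # e.g. "2024-01-01T10" from an ISO 8601 string
        buckets[(device_id, hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical raw readings, kept exactly as they were collected.
raw_readings = [
    ("pump-7", "2024-01-01T10:05:00", 20.0),
    ("pump-7", "2024-01-01T10:35:00", 22.0),
    ("pump-7", "2024-01-01T11:01:00", 30.0),
]

# The small aggregated result is what would be stored next to the context data.
kpis = hourly_average(raw_readings)
print(kpis)
```

The aggregated result is small enough for a conventional database, whereas the raw list grows with every reading and is better served by a storage built for volume.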
3. When you input data, don’t use tools that used to work – use tools that work now
When inputting and processing data, what needs to be changed when using a data platform for IoT-related purposes? In conventional data warehousing environments, data is retrieved through “pulling”, rather than pushing, from the originating source systems using extract, transform and load (ETL/ELT) technologies that are usually run in a batch-oriented way at certain intervals.
IoT data usually comes in streams, and cannot be handled in a batch-oriented manner. The volume and velocity of the incoming data emphasise the need to think differently. It is not possible, for example, to carry out lookups for foreign keys – they need to be created on the go. Data loading needs to be fast, and scalable when necessary. In many cases, companies will also need to think about building multiple data pipelines for the same data: one pipeline for real-time alerts, one for heavy calculations, and one for the intake of raw data.
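One possible way to create keys on the go, common in data vault style modelling, is to derive the key deterministically from the device's own identifier with a hash, so no lookup against a key table is needed. The sketch below does that and then passes each event to three in-memory lists standing in for the three pipelines. The function names, the event fields and the alert threshold are hypothetical, and a real system would write to a streaming service rather than to lists.

```python
import hashlib

ALERT_THRESHOLD = 90.0  # hypothetical limit for raising a real-time alert


def hash_key(business_key: str) -> str:
    """Derive a key from the business key alone, without any lookup."""
    normalised = business_key.strip().upper()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def route(event, alerts, calculations, raw_store):
    """Send one event to the raw, calculation and alert pipelines."""
    keyed = {**event, "device_key": hash_key(event["device_id"])}
    raw_store.append(event)      # raw intake: stored exactly as received
    calculations.append(keyed)   # heavy calculations use the keyed event
    if event["value"] > ALERT_THRESHOLD:
        alerts.append(keyed)     # real-time alerts only for high values
    return keyed


alerts, calculations, raw_store = [], [], []
route({"device_id": "pump-7", "value": 95.2}, alerts, calculations, raw_store)
route({"device_id": "pump-7", "value": 41.0}, alerts, calculations, raw_store)
print(len(alerts), len(calculations), len(raw_store))
```

Because the key depends only on the identifier, every pipeline computes the same key independently, which is what allows loading to run in parallel and scale out.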
Even though IoT may well be just another source of data, it definitely shouldn’t be treated as being just the same as other forms of data.
Unfortunately, many companies seem to make the same mistakes around IoT. IoT offers us a great many opportunities, but it can also be frustrating to get started with. Keep in mind the following points and you will be fine:
- Don’t build single-point solutions – build a data platform
- Seek the best possible technological combination of storages for different types of data
- When you input data, don’t use tools that used to work – use tools that work now
I assure you that getting started with IoT data platforms will be more than worth it for your company’s productivity and profitability.