Data catalogs are dead! Long live data catalogs!

Juha-Pekka Joutsenlahti Data Advisor, Solita

Published 30 Apr 2024

After seeing all the great presentations and having plenty of insightful discussions at the Data Innovation Summit 2024, it became quite obvious: data cataloging, as we currently know it, is dying. And there are several reasons for that.

From the architecture perspective, we are going to see a clearer distinction between technical data catalogs and discovery tools that are meant for more business-oriented people. The main reason for this is that the current blueprint of data catalogs does not sufficiently serve either business people or data and dev people.

Developers want automation and technical data contracts that enable CI/CD pipelines for deploying new data products to data catalogs. Current data catalog solutions have been improving their API capabilities, but surprisingly many of these tools still struggle to support publishing content in a standardised format through an API.
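
To make the idea of a technical data contract in a CI/CD pipeline a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `DataContract` and `Column` classes, the field names, the validation rules, and the JSON layout are hypothetical and do not follow any particular catalog vendor's API or any published contract standard. The idea is only that a pipeline step could check a contract and serialise it into a predictable format before handing it to whatever publishing API the catalog offers.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Column:
    # Hypothetical column description inside a data contract.
    name: str
    dtype: str
    description: str = ""


@dataclass
class DataContract:
    # Hypothetical contract fields; a real contract would follow
    # whatever format the organisation has agreed on.
    product_name: str
    owner: str
    version: str
    columns: list = field(default_factory=list)


def validate_contract(contract):
    """Return a list of problems; an empty list means the contract passes."""
    problems = []
    if not contract.product_name.strip():
        problems.append("product_name is empty")
    if not contract.owner.strip():
        problems.append("owner is empty")
    if not contract.columns:
        problems.append("contract declares no columns")
    seen = set()
    for col in contract.columns:
        if col.name in seen:
            problems.append(f"duplicate column: {col.name}")
        seen.add(col.name)
        if not col.description.strip():
            problems.append(f"column {col.name} has no description")
    return problems


def to_catalog_payload(contract):
    """Serialise the contract to stable JSON that a publish step could send."""
    return json.dumps(asdict(contract), indent=2, sort_keys=True)


if __name__ == "__main__":
    contract = DataContract(
        product_name="orders",
        owner="sales-domain",
        version="1.0.0",
        columns=[Column("order_id", "string", "Unique order identifier")],
    )
    issues = validate_contract(contract)
    if issues:
        # In a pipeline, this is where the build would fail.
        print("Contract rejected:", issues)
    else:
        print(to_catalog_payload(contract))
```

In a setup like this, the contract lives in version control next to the data product, and the pipeline refuses to publish anything that fails validation. The actual call to the catalog is left out on purpose, because that part depends entirely on the tool in use.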

And then there is another, completely different, group of data catalog users who are not interested in the technical details but just want to find the available data, understand the content, and evaluate (usually from the data quality perspective) whether the available data is sufficient for their need. For these users, current data catalog solutions are way too complex to use and are filled with a lot of information they do not need, making it very difficult to find what they are looking for.

This is why data product catalog and data marketplace type of solutions are now getting more attention. We need a very simple UI with a good user experience and clear, intuitive search functionality that shows only the necessary details about the quality and access point of a data product. This kind of practical UI is something that would be useful for both technical and business users.

Also, whereas technical people have data contracts to agree on data distribution on a technical level, there is also a need for data sharing agreements between business people to agree on, for example, the permitted usage of the data, data classifications, and regulatory/legal compliance.
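
As a rough illustration of how little a business-facing listing might need to carry, here is a small Python sketch of a data product entry and a keyword search over such entries. The `DataProductListing` class, its fields (including the 0.0 to 1.0 `quality_score` and the free-text `permitted_use`), the sample access point strings, and the matching logic are all hypothetical choices made for this example, not features of any specific marketplace product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductListing:
    # Only the details a business user would need (all fields hypothetical).
    name: str
    summary: str
    quality_score: float   # assumed scale: 0.0 (poor) to 1.0 (excellent)
    access_point: str      # where the data can be requested or read
    permitted_use: str     # short summary from the data sharing agreement


def search(listings, query, min_quality=0.0):
    """Return listings whose name or summary contains every query term,
    filtered by a minimum quality score, best quality first."""
    terms = query.lower().split()
    hits = []
    for item in listings:
        text = f"{item.name} {item.summary}".lower()
        if item.quality_score >= min_quality and all(t in text for t in terms):
            hits.append(item)
    return sorted(hits, key=lambda item: item.quality_score, reverse=True)


if __name__ == "__main__":
    listings = [
        DataProductListing("Daily orders", "Confirmed customer orders per day",
                           0.92, "warehouse.sales.orders_daily",
                           "Internal analytics only"),
        DataProductListing("Raw order events", "Unfiltered order event stream",
                           0.55, "stream.sales.order_events",
                           "Engineering use only"),
    ]
    for hit in search(listings, "order", min_quality=0.8):
        print(hit.name, "|", hit.quality_score, "|", hit.access_point)
```

The point of the sketch is the shape of the record rather than the search itself: a name, a plain-language summary, one quality indicator, one access point, and a short note on permitted use.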

Organisations that have been gathering experience from data catalogs are rightfully starting to question the need for certain “traditional” data catalog features. For example, do we really need to have lineage in a data catalog? Many developers already have their own way of seeing the lineage through other systems they are using, and often the domain is also quite well aware of the lineage within its own context. This does not mean that lineage should not be in a data catalog, but it is good to stop for a while and think about the actual role of a data catalog. What are the actual features needed? And what is the role of different systems in the ecosystem?

There is also another phenomenon emerging that will strongly affect how we do data cataloging: a more human-centric approach. After several data catalog implementation projects, it has become obvious that initiatives that involve end users from the beginning are more likely to succeed in terms of both functionality and user adoption. So if you want to make the behaviour change a little bit easier (it will never be easy), involve people in the early phases to understand their needs and the use cases that the catalog would help with. Instead of blindly listing tons of features that a data catalog should have, we should stop for a while and think about the actual usage. Who are the users? What are the actual use cases that a data catalog should support to make people’s lives better?

In summary, we are seeing some early indicators of a disruptive change in data cataloging as a practice. The traditional data catalog will vanish and be replaced with a new kind of cataloging solution that provides services for both technical and non-technical people. We will see data catalog vendors either adapt to this direction or be replaced by metadata lake solutions that allow organisations to independently build such services on top of the available metadata.

Also, the number of data catalogs on the market has increased rapidly, and this is not a sustainable situation. So it is fair to estimate that those vendors who are not capable of adapting to this new phenomenon will slowly fade away.

Many times, the people responsible for implementing a data catalog rely heavily on what they themselves think the catalog should do. But why couldn’t we ask the users what they actually need? This way we can create an impact that lasts.
