14 Sep 2021

Your data catalog should take your data teams back to the future of metadata

Make your data catalog promote future possibilities

We don’t sell cars by describing their assembly lines, nor do we sell computers by providing exact drawings of their motherboards. While we do care about what goes into the food we buy, we would still rather read the recipe suggestions on the pack than study the list of ingredients in detail. In data management, however, and especially in metadata management, we think that endless documentation, diagram drawing, dictionaries, glossaries, taxonomies and various definitions are the main drivers of data awareness and literacy. We assume that the better we are at explaining our most intricate data details and processing paths, the more data literate and data-driven we will become.

Stop the Madness

It is time to stop this documentation madness. Metadata management, and by extension data literacy, should be much more about promoting the potential of data than about explaining and tracing its past. We data professionals spend too much time and effort on the past, trying to capture its history as accurately as possible, while ignoring the value of promoting future possibilities with data. Our data lineage drawings should outline a fascinating treasure map to a possible new data product instead of being yet another (boring) schematic of which systems and processes have manipulated the data. The data dictionary should read like a Sherlock Holmes novel, where every new attribute provides an intriguing clue to unlock a new data insight. Our data catalogs should be community platforms where data detectives share their findings, post new clues and solve the mysteries of data together.

I know the above statements will not go down easily with many data professionals; they challenge many conventions. To be clear, this is not an either/or question but a much more nuanced topic. Rest assured, though, that my claims are based on insights from years and years of speaking with data professionals in many different roles across many different organisations. I clearly remember one specific conversation with an analytic data consumer that made me understand the true underlying needs of this person and of all data workers. Those needs are not about better documentation. Let me explain.

Data workers want safety and inspiration too

In my countless conversations about metadata management and data catalogs with various organisations and roles, the conversation always begins and ends with information about data, in some shape or form, being the solution. But once you ask ‘Why?’ enough times, you start to realise that documentation, in its broadest sense, is an assumed solution to a problem that is widely misrepresented. As data consumers, we don’t actually want to read more data documentation or, in general, spend our precious time studying where the data comes from and who has done what to it. We think we do because it has been the only solution offered to date. What we really need are two things:

  1. We need to feel safe using the data, i.e. we need to trust it, and
  2. We want to know how to use it, and ideally be inspired to use it in novel ways.

Sounds simple, right? As mentioned, cars and computers are not sold on how they were manufactured. Their sales pitches are promises. It’s all about potential and possibilities. Maybe it’s the ability to go somewhere conveniently without polluting the environment, or maybe it is the ability to connect with a global audience through amazing digital networks. Whatever it may be, it’s rarely about the mechanics of the product itself. Have you ever seen anyone sell a TV by pitching the great user manual it comes with?

In practice, what does this mean for data management, and especially for metadata management, data intelligence, data literacy and data catalogs? Remember, the underlying need of data professionals is two-fold: they need to feel safe working with the data, and they need advice, ideas and inspiration on how to use it. Traditionally we have addressed this with approaches skewed to the left on a timeline: we travel back in time in our metadata DeLorean and trace the history of the data up to where it is today, thinking that this is what a data worker needs. But this underlying need is better addressed by approaches that look to the right on the timeline: data workers want to know what to do next, what not to do next, which paths are available, and which paths are already covered. Data workers want their metadata discovery solution to take them into the future and show them the possibilities of data.

You don’t actually need a DeLorean, but you do need to rethink your communication efforts

Here are some simple steps you can take in your organisation to start solving for these fundamental needs:

  • Whether the effort is labeled data literacy, data awareness, data intelligence, metadata management or data cataloging, shift the balance of your efforts from roughly 90% looking into the past to something closer to 50/50 (past/future), or lean even more aggressively toward the future. Promote stories of how the data is being used, what data workers have learned, and what value they have created.
  • Provide more explicit “recipes” that get data workers going faster and let them learn by doing. Nobody wants to spend time reading a manual full of technical documents, diagrams and specifications, or to build everything from scratch every single time.
  • Promote the work of those building data products, models and other artifacts. This is not to disrespect the hard work of managing the data up to its current state (the left side of the timeline), but if you want data workers to be engaged and productive, the work happening on the future side of the timeline is much more interesting. As a side note, this is also why I truly believe in collaborative analytics and data science platforms, as opposed to laptop-bound and/or purely code-based notebook approaches. But that is a different story.
  • Change the data management philosophy in your organisation so that it is also accountable for what data workers do with the data. If that sounds scary, note that it also means you can take credit for the positive outcomes. Instead of just saying “here is the data and its source documentation, the rest is up to you”, say “here is the data and here are a few simple things you can or should do with it…”. Being more prescriptive does not necessarily mean more work for the data management team. It just shifts the focus to things like promoting good practices and the data community’s efforts.

The bottom line: regardless of whether your data asset or product is a flat file in a data lake, a nicely structured schema in a data warehouse, a visual BI report or a REST API, you should think about promoting its potential uses and opportunities at least as much as you think about describing its history, lineage, dictionary, sources, transformations, etc. Look to the future, not just the past. Data workers are just like the rest of us: they don’t want to read a manual full of historical information. They need to trust the data and be inspired to use it, and that is better achieved by focusing on where the data should flow next, not where it has come from.

  • We talk about data catalogs, data products, promoting data assets and many other data-related topics in our podcast, Data by the Slice, with my colleague Antti Loukiala and expert guests.