Imagine having a list of data domains, terms, KPIs, formulas for KPIs and other metrics, and technical and business data lineage in front of your eyes at any moment. Just think about what a benefit that would be in your core work-related tasks!
Bring your company’s data to the table, fully described. Make your company truly data-driven.
Who can benefit from the data catalog in my company?
Let’s have a look at how many roles in a company involve people who are interested in data in one way or another. Very often, hints of such interest are not recognised as a sign that a change is needed. Instead, they are dealt with ad hoc, always in much the same way: explaining verbally, whiteboarding, sketching on paper, or reverse engineering together or alone.
Role Name
- Chief Data Officer
- Risk, Fraud & Payment Management Officer
- Compliance Officer
- Chief Information Security Officer
- Financial Controller
- Product Manager
- Data Analyst
- Technical Analyst
- Product Owner
- Team/Technical Lead
- Backend Developer
- QA Engineer
- BI Engineer
- Data Engineer
- Data Scientist
- Data Architect
- Solution / Lead Architect
- Subject Incident Manager
- DevOps Engineer
The list is quite impressive, isn’t it? Most of these roles are present in any company, whether in the government, public, or private sector. Even startups have about half of them. You might disagree when it comes to startups, but take a second look inside: several roles are often combined in one person.
Signs indicating the need for a data catalog
Hey, let’s puzzle out some signs that might be showing up here and there inside your company! Such signs could indicate that bringing a data catalog to the table would help.
The questions below are taken from real cases. They come up in meetings, kitchen talks, and various Slack channels, as ad-hoc questions from colleagues during the day, in conversations between business colleagues or with dev representatives, and even in your own thinking:
- What is the formula that is used to calculate Metric-X / KPI Y?
- Who are the downstream consumers of a given Kafka topic?
- If we split the first name and last name in the client dimension table of the data warehouse into 2 columns, which dependencies need refactoring?
- Which data source is used for a particular Tableau (Power BI, name your tool) report?
- Which columns from the data warehouse fact and dimension tables are used in Report-X for the marketing team?
- By the way, where is the customer activity data located? The sales team has requested that the customer churn rate be calculated every 30 minutes.
- Why does this report for the sales team keep having so many bugs every month?
- What is the business value that report XYZ delivers? Hopefully it is fine if it is not accessible for 3 hours on Thursday.
And so on… The list could be continued with different variations of such questions. The main point is that each question is specific and the questions are not related to each other. However, they all have one thing in common: the purpose in each case is to obtain some knowledge about the data in question, that is, to discover the corresponding as-is picture in order to make a decision in context and proceed further.
Why will the data catalog help me?
All right, how does the data catalog help here?
Registry of metadata for the company’s data
A data catalog collects metadata about data from various sources and exposes it to a company user as a registry of all data assets that exist in the company.
This is how the data schemas of microservice storage, the data warehouse, the data lake, business intelligence reports, Excel files, and message broker queues and topics become visible to a company user.
Every discovered data schema can be described in plain human language directly in the data catalog. This adds business meaning to the presented schema.
As a result, there is no need to ask what kind of data the company has in service-X or in the data warehouse. Such a feature increases both the productivity of employees, because information is found faster, and the quality of the obtained knowledge, because the business meaning provided can be verified by a subject matter expert.
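To make the idea of a metadata registry more concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions, not how any particular data catalog product works: the `DataAsset` and `Catalog` classes, the asset names, and the columns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAsset:
    """One catalog entry: a schema discovered in some source system."""
    name: str                          # hypothetical identifier, e.g. "dwh.dim_client"
    source: str                        # where the schema was discovered
    columns: list[str]
    description: str = ""              # business meaning in plain language
    verified_by: Optional[str] = None  # subject matter expert who confirmed it


class Catalog:
    """A toy in-memory registry of data assets."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def describe(self, name: str, description: str, verified_by: str) -> None:
        """Attach a human-language description to an already registered asset."""
        asset = self._assets[name]
        asset.description = description
        asset.verified_by = verified_by

    def search(self, keyword: str) -> list[DataAsset]:
        """Find assets whose name, description, or columns mention the keyword."""
        needle = keyword.lower()
        return [
            a for a in self._assets.values()
            if needle in a.name.lower()
            or needle in a.description.lower()
            or any(needle in c.lower() for c in a.columns)
        ]


# Hypothetical example data.
catalog = Catalog()
catalog.register(DataAsset("dwh.dim_client", "data warehouse", ["client_id", "full_name"]))
catalog.register(DataAsset("orders-events", "Kafka topic", ["order_id", "client_id", "amount"]))
catalog.describe("dwh.dim_client", "One row per client known to the company.", verified_by="sales SME")

hits = [a.name for a in catalog.search("client")]
```

A single search for "client" finds both the warehouse table and the Kafka topic, which is the kind of question that would otherwise be asked in a Slack channel.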
Technical data lineage
This feature discovers and presents how the data flows between data locations. Depending on the metadata available from the integrated source, technical lineage can be shown either at the entity (table) level or at the column level.
If the company uses a message broker such as Kafka, an intermediate tool might be needed to obtain metadata on topics before it can be integrated into the data catalog.
Such a feature helps a lot with reverse engineering, because everything about the data can be explored in one place. Lineage is often laid out as a horizontally or vertically oriented tree flow diagram. You can drill through it by clicking on entities or their data attributes to explore how the data flows from the originating source to its destination, including the transformation rules applied along the way.
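Under the hood, lineage can be thought of as a directed graph, and a question such as "what is affected if I change this column?" becomes a graph traversal. The sketch below shows this with a breadth-first search. The edges in `LINEAGE` and all column names are made up for illustration; a real catalog would derive them from the metadata of the integrated sources.

```python
from collections import deque

# Hypothetical column-level lineage: source column -> columns derived from it.
LINEAGE = {
    "crm.clients.full_name": ["dwh.dim_client.full_name"],
    "dwh.dim_client.full_name": ["report_x.client_name", "churn_model.features.name"],
    "dwh.dim_client.client_id": ["report_x.client_id"],
}


def downstream(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every column reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that depends, directly or indirectly, on the CRM name column.
impacted = downstream("crm.clients.full_name", LINEAGE)
```

Here `impacted` lists the warehouse column first and then the report and model columns that read from it, which is the dependency list you would want before splitting a name column in two.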
Internal meta model
Very often data catalogs come with an internal meta model. It provides terms and the relations between them, so that metadata imported from a company’s ecosystem can be categorised under those terms. These terms could be data domain (or domain), glossary, business term, KPI, metric, business process, data entity, data attribute, report, dashboard, database, table, column, or event.
The internal meta model is a powerful help in implementing a data catalog in a clean way. It also helps people in business roles understand data better without going down to the technical level of databases and columns.
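As a rough illustration of such a meta model, the sketch below links a KPI (a business term) to its data domain and to the technical columns behind it. The class names, the table and column names, and the wording of the churn formula are assumptions made for this example only; real catalogs ship their own, much richer models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """Technical level: a column in a table."""
    table: str
    name: str


@dataclass
class KPI:
    """Business level: a KPI with a human-readable formula."""
    name: str
    formula: str
    domain: str                                          # data domain the KPI belongs to
    columns: list[Column] = field(default_factory=list)  # columns it is computed from


@dataclass
class Glossary:
    kpis: dict[str, KPI] = field(default_factory=dict)

    def add(self, kpi: KPI) -> None:
        self.kpis[kpi.name] = kpi

    def formula_of(self, name: str) -> str:
        return self.kpis[name].formula

    def by_domain(self, domain: str) -> list[str]:
        """Names of all KPIs in a data domain, sorted alphabetically."""
        return sorted(k.name for k in self.kpis.values() if k.domain == domain)


# Hypothetical glossary content.
glossary = Glossary()
glossary.add(KPI(
    name="Customer Churn Rate",
    formula="customers lost during the period / customers at the start of the period",
    domain="Sales",
    columns=[Column("dwh.fact_activity", "client_id"), Column("dwh.dim_client", "status")],
))

answer = glossary.formula_of("Customer Churn Rate")
```

A business user can look up the formula by the KPI name alone, while an engineer can follow the `columns` relation down to the tables when needed.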
Collaborate on data right there!
The ability to ask questions or leave comments about data is a very nice touch that catalogs bring to the table collaboration-wise. This can break down silos and help people get better answers when they don’t understand some data. In the end, it’s all about sharing knowledge and the right answers about data.
All discussions are in one place and not scattered across different channels (coffee corner, Slack channels, etc.), which increases the level of awareness and certainty.
Making the data catalog work for you
For Product Owners, Scrum Masters, and Delivery Engineers, a data catalog can provide even greater value if it is integrated back into an engineering task management system as a data supplier.
Data domains, reports, and dashboards are assets that can be used to categorise tasks in the backlog and prioritise them with business stakeholders. As a result, the operational analytics provided per sprint is enriched with these assets.
Contact us if you want to know more and get your free Data Catalog guide.