03.02.2021

Distributed data, decentralised governance and organic accountability – how to make a data mesh work

As formalised in Conway’s law, an organisation’s structure and its software architecture are closely related. Because the law applies to people and organisational structures as well as to software and technology, it would be foolish to ignore it when designing data and information flows and structures.

Have you felt the pain of setting up an enterprise-wide data management program only to realise that old organisational patterns are hard to overcome or change? Maybe you have experienced the construction of a new all-encompassing data platform for the whole organisation, only to witness the data platform team being buried under conflicting demands from various functions, desperately trying to deliver business value while also building new organisation-wide capabilities? Are you working on a “business-driven data strategy” or is it, in fact, a data management and governance strategy trying to push business functions to work in new, common ways? In this post, we will explain what causes these challenges and this friction, how to be less of a “control freak”, and how to embrace data, information and insight flowing freely where they are meant to flow.

Melvin Conway’s “How Do Committees Invent?” [1] is a fantastic read. Although it was published as early as 1968 and the world has come a long way since, its insights are more valid than ever. Conway wanted to publish the article in Harvard Business Review, but it was rejected because there was no proof that his thesis held. He ended up publishing it in the computer magazine Datamation. Since then, the ideas in the paper have been studied extensively and found to hold. They are not tied to software development; they are more abstract. The first sentence of the paper’s conclusion is now known as Conway’s law. It states:

“Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation’s communication structure”

In practice, this means that if we organise our development teams into three groups, we will get a three-layer architecture. Likewise, if we have three teams working on a solution, we will get three different solutions. In the end, the law is pretty simple.

Friction is caused by misaligned system design

Organisations and companies can be seen as systems and, like all larger systems, they are made up of smaller connected subsystems. In an organisational context these subsystems are teams, projects or departments; in other words, groups of people focusing on specific business functions. Conway’s law also implies that defining the organisational structure is the first act of system design: once the organisation is divided into subsystems, each one contributes to the overall system design. Today, organisations are as much digital as they are physical, and here we see Conway’s law embodied: each department has its own digital domain made up of applications built specifically for its needs. These applications also host and persist the digital representation of the business function’s context, in the form of data.

To put it more plainly: data consists of persisted events produced by sensors, people or software used in specific business functions. These business functions form an interconnected system called an organisation. Therefore data, too, is bound by Conway’s law. There is no escaping this fact.

As data is distributed across different subsystems, our data management systems need to align with this as well. This means that data needs to be managed in the subsystem and business function where it is created, persisted and understood. Then, to enable communication across the organisation through information, subsystems need to expose their information through interfaces. These communication paths and interfaces do not necessarily need to be digital, but they must exist.
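
When the interface is digital, the idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names DomainInterface, SalesDomain and total_revenue, and the order fields, are hypothetical and not part of any particular product or standard.

```python
from dataclasses import dataclass
from typing import Protocol


class DomainInterface(Protocol):
    """What a subsystem agrees to expose to the rest of the organisation."""

    def describe(self) -> str: ...

    def read(self) -> list[dict]: ...


@dataclass
class SalesDomain:
    """Hypothetical business function that owns and manages its own order data."""

    _orders: list[dict]

    def describe(self) -> str:
        return "Confirmed orders, one record per order, amounts in EUR"

    def read(self) -> list[dict]:
        # Expose a stable, documented view instead of the internal records.
        return [{"order_id": o["id"], "amount_eur": o["amount"]} for o in self._orders]


def total_revenue(source: DomainInterface) -> float:
    """A consumer in another function depends only on the interface."""
    return sum(row["amount_eur"] for row in source.read())


sales = SalesDomain([
    {"id": 1, "amount": 120.0, "internal_note": "x"},
    {"id": 2, "amount": 80.0, "internal_note": "y"},
])
print(total_revenue(sales))  # 200.0
```

The data stays inside the sales function, internal fields such as internal_note never leave it, and the consumer does not need to know how the orders are stored.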

A company’s organisational structure mirrors what kind of product or service the company produces. If we organise people, technology and data in misaligned ways, tension is bound to happen. Hence, any (new) data management or governance program will also bump against these existing structures and this tension is hard to overcome by just introducing new ways of working and/or a new, shiny data platform.

Three different ways to design the organisation

Systems can be represented by networks and an organisation can be seen as a network of departments or business functions. Networks can roughly be divided into three types: centralised, decentralised and distributed.

The centralised network is simple. It has a single central point for sharing information between its subsystems. This kind of network is usually a small company or an organisation with a strong single domain. In a small company, this is typically the optimal way of working. However, as the company grows, the central point becomes a single point of failure and bottlenecks appear. The decentralised network has multiple information-sharing nodes that are connected to different subsystems. From an organisational perspective, each department forms its own local centre that acts as a hub for that department. These hubs can then connect to each other to exchange information. Finally, in the distributed network, each subsystem is a small, autonomous, cross-functional team. The team is responsible for its business function, but also has freedom of execution.

Fig. Different types of networks
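
To make the difference between the three network types concrete, here is a small Python sketch. It models each network as an undirected graph and lists the nodes whose removal would cut the remaining nodes off from each other. The node names and the exact shape of each example network are our own assumptions, and a purely structural check like this says nothing about workload bottlenecks.

```python
from collections import deque


def undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an adjacency map from a list of undirected edges."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: dict[str, set[str]], start: str, removed: str) -> set[str]:
    """Breadth-first search from start, pretending the removed node is gone."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(graph: dict[str, set[str]]) -> list[str]:
    """Nodes whose removal disconnects the rest of the network."""
    critical = []
    for node in graph:
        others = [n for n in graph if n != node]
        if len(others) > 1 and reachable(graph, others[0], node) != set(others):
            critical.append(node)
    return sorted(critical)


teams = ("sales", "finance", "hr", "ops")
centralised = undirected([("hub", team) for team in teams])
decentralised = undirected([
    ("hub_a", "sales"), ("hub_a", "finance"),
    ("hub_b", "hr"), ("hub_b", "ops"),
    ("hub_a", "hub_b"),
])
distributed = undirected([
    ("sales", "finance"), ("finance", "hr"), ("hr", "ops"),
    ("ops", "sales"), ("sales", "hr"), ("finance", "ops"),
])

print(single_points_of_failure(centralised))    # ['hub']
print(single_points_of_failure(decentralised))  # ['hub_a', 'hub_b']
print(single_points_of_failure(distributed))    # []
```

In this toy example the centralised network depends entirely on one hub, the decentralised network depends on each departmental hub, and the distributed network has no single node that everything depends on.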

To align your data management with your organisational structure, you first need to identify your organisation’s communication structure. The three network types above help you recognise which one you have. Unless you are a small company or one with a strong single domain, centralised data management will not align with your organisational structure and you will experience friction in your data governance, management and platform programs.

Decentralised, domain-oriented data management

In larger organisations with more complex communication structures, a centralised approach runs against Conway’s law. Hence, we need a decentralised data management architecture. Luckily, we can learn from nearby technology domains. Decentralisation is nothing new in software development, where we have years of experience and extensive research findings. More recently, Zhamak Dehghani from ThoughtWorks coined the term Data Mesh for decentralised, domain-oriented data management [2]. For those of us who have spent our professional careers in data management, BI and analytics, this new paradigm is a profound challenge. It describes our approaches as “monolithic”, and some of the thinking is close to revolutionary. However, for those familiar with software design, this is nothing new or radical. It is just how things have organically evolved, and it has worked.

Data mesh builds on top of four principles that also help to explain its core idea.

  1. The first principle is domain-oriented, decentralised data ownership and architecture, meaning that each subsystem or organisational function needs to own its data. Central to this idea is a concept from domain-driven design called “bounded context” [3]. A “domain” is essentially equivalent to an organisational function: it has its own unambiguous language and is in the best position to manage its data.
  2. The second principle is to think of data as a product and to expose the domain’s information in a form that is usable to others.
  3. The third principle defines the data infrastructure as a platform that offers different capabilities in a self-service manner.
  4. The fourth and final principle is federated computational governance: the balancing act of having just enough centralised control to ease the work while keeping decision making as local as possible. Governance relies on standards and ways of working more than on, for example, controlling access and “hoarding” all data into one platform.
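
One way to read “data as a product” together with “federated computational governance” is that each domain publishes a small descriptor for its data product, and a handful of organisation-wide standards are expressed as automated checks rather than as a central gatekeeper. The Python sketch below illustrates that reading only. The DataProduct fields and the three policies are hypothetical and are not taken from the Data Mesh literature.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain publishes for one of its data sets."""

    name: str
    owner_domain: str
    description: str = ""
    schema: dict[str, str] = field(default_factory=dict)  # column name -> type


# A policy returns an error message, or None when the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def has_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner_domain else "owner_domain is missing"


def is_described(product: DataProduct) -> Optional[str]:
    return None if product.description.strip() else "description is missing"


def has_schema(product: DataProduct) -> Optional[str]:
    return None if product.schema else "schema is missing"


# The few standards agreed centrally; everything else is decided in the domain.
GLOBAL_POLICIES: list[Policy] = [has_owner, is_described, has_schema]


def check(product: DataProduct, policies: list[Policy] = GLOBAL_POLICIES) -> list[str]:
    """Run every policy and collect the violations."""
    return [msg for policy in policies if (msg := policy(product)) is not None]


orders = DataProduct(
    name="orders",
    owner_domain="sales",
    description="Confirmed orders, one row per order",
    schema={"order_id": "int", "amount_eur": "float"},
)
draft = DataProduct(name="leads", owner_domain="marketing")

print(check(orders))  # []
print(check(draft))   # ['description is missing', 'schema is missing']
```

The checks only say what a published product must look like. How the sales or marketing function produces and stores the data remains its own decision.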

A domain-oriented, distributed data platform embraces and builds on the idea that an organisation is a set of connected subsystems. The approach is to keep data in its own organisational context, where it belongs and where it was created. A Data Mesh is built on a few selected central infrastructure components that allow local teams to flourish in developing products and services in their own domain, and to be accountable for their own data.

Data governance does not have to be about centralised control

We all know this is not just an infrastructure or technology problem, so how do we make the processes and governance models work too? Do not all the data governance models out there assume a centralised network? In fact, there are supporting concepts for data leaders and professionals. In 2008, Bob Seiner introduced his thinking around the Non-Invasive Data Governance framework [4], and it fits perfectly with Conway’s law. Seiner’s point is that data governance should not be threatening and aggressive (that is, invasive). Most other data governance approaches, however, are exactly that.

Typically, data governance frameworks are additional layers placed on top of existing organisational structures. They will therefore be perceived as threatening and invasive, especially if they are introduced by a fairly new data management or governance function. Just as Data Mesh is about building the connectors between different teams and functions, Seiner’s Non-Invasive Data Governance framework recognises that most of the governance work is already being done by someone, somewhere in the organisation. The point is merely to make it formal, transparent and visible to all levels of the organisation, and to build those relationships and connectors.

So what are these connectors and bridging enablers in large organisations? If you cannot or do not want to enforce common systems and practices on a distributed organisation, how do you still keep data and teams connected? As an example, many large organisations today look to data catalogue platforms to be that connector, and we believe this is one good place to start.

Community-based data catalogues, ideally with crowdsourcing aspects, give functions a strong incentive to catalogue and describe their data assets. In return, they get access points to all other data assets across the organisation and learn about use cases, approaches and work patterns. Similarly, API management suites make it easier to connect across the organisation by exposing data products to other users without dictating how the API itself is built. As always, technology alone will not make a distributed system succeed. The key is to first understand the current system and networks. Then, we need to design and organise our data, teams, ways of working and technology with two main purposes:

  1. existing domains can keep on doing what they do best, and
  2. we develop central connectors and interfaces, be they technical, human or organisational, that allow finding, sharing and leveraging organisation-wide data assets and their associated value.
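
As a rough illustration of a catalogue acting as such a central connector, the Python sketch below lets each domain register its own entries while the data itself stays where it is. The class names, fields and example.com addresses are hypothetical and do not describe any specific catalogue product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogueEntry:
    """What a domain chooses to tell the rest of the organisation."""

    name: str
    owner_domain: str
    description: str
    access_point: str  # where consumers go for the data; hypothetical URL


@dataclass
class DataCatalogue:
    """Central connector: it stores descriptions and pointers, not the data."""

    _entries: dict[str, CatalogueEntry] = field(default_factory=dict)

    def register(self, entry: CatalogueEntry) -> None:
        # Each domain registers and maintains its own entries.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogueEntry]:
        # Case-insensitive match on name or description.
        term = term.lower()
        return [
            entry for entry in self._entries.values()
            if term in entry.name.lower() or term in entry.description.lower()
        ]


catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="orders",
    owner_domain="sales",
    description="Confirmed customer orders",
    access_point="https://example.com/sales/orders",
))
catalogue.register(CatalogueEntry(
    name="headcount",
    owner_domain="hr",
    description="Monthly headcount per department",
    access_point="https://example.com/hr/headcount",
))

hits = catalogue.search("order")
print([(entry.name, entry.owner_domain) for entry in hits])  # [('orders', 'sales')]
```

The catalogue knows who owns what and where to find it, which is enough for other functions to discover and request data without the catalogue team controlling how each domain works.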

The friction described by Conway’s law does not arise because individuals are unwilling to learn or share with others; they usually are willing. The friction is organisational. We should build communities across the domains in the Mesh and offer opportunities to share and learn, but we need to be careful about trying to instil control, decision making or prioritisation into these communities. Let them grow organically, innovate and take your company to new heights.

Organise data like you organise your business and people

In the end, it is about your customer and your business. You design services and products to fulfil your customers’ needs, and it all begins with designing the organisational structure you believe can deliver on those needs. Data will allow you to uplift your company and it is an integral part of your organisation. Data has to be managed and accounted for in the organisation, in the subsystem where it is created, consumed, and turned into information and insights.

In his book ‘Skin in the Game’ (2018), Nassim Taleb argues that people should bear the consequences of their own decisions. Applied to data, this means business functions need to get their skin in the data game: manage the data locally and take responsibility for it. The book is full of punchy quotes that data management professionals should study carefully:

“Decentralisation reduces large structural asymmetries”, and “Bureaucracy is a construction by which a person is conveniently separated from the consequences of his or her actions”.

Let’s not become or create data governance bureaucrats with no skin in the game. Data lying outside the organisation in which teams work and products are created is out of the subsystem, out of place, and out of context.

To sum up and help you get started, these are our key points:

  • How you organise your data management, governance and technology must fit your organisation structure, not the other way around. The friction you experience in your data management program, the thing you label as “organisational resistance to change”, is because you are invading and disrupting the systems and networks in your organisation.
  • There are proven models of distributed systems in software development and the technology is available. When combining learnings from software development with key principles from data management, we can develop distributed data management models that work and create business value.
  • There are supporting frameworks for data governance as well as platforms that help establish the touchpoints between functions in the decentralised organisation.

Supported by human and team-based connectors, we can combine distributed systems with centralised data and information sharing.

The biggest challenge for those of us looking at the world through traditional data management lenses is to let go of the need to control everything and let the organisation carry on “doing its thing”. If the system has worked in establishing the core business processes of the company to date, then why fight it? Instead, learn from it, and adjust and adapt your data management strategy to fit the system. The result: less friction, happier people and more value from data.

Do you want to learn more about distributed data systems? Check out our introduction to the data mesh video!

About the authors:

Antti Loukiala is an all-around architect, engineer and developer with deep knowledge of both microservice-based architectures and monolithic data management architectures.
Lasse Girs is a data business designer who is fluent in both business and tech language and has diverse experience in making organisations data-driven, including initiatives like data literacy, data governance, and data cataloguing.

References:

[1] M.E. Conway, “How Do Committees Invent?” Datamation, Vol. 14, No. 4, 1968, pp. 28-31.
[2] https://martinfowler.com/articles/data-mesh-principles.html
[3] https://martinfowler.com/bliki/BoundedContext.html
[4] See e.g. https://tdan.com/what-is-non-invasive-data-governance/7354 or Seiner’s book “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” from 2014.