14 Jan 2022 | Blog

Factories, words and ‘being data-driven’

Photo: “Charlie Chaplin” by twm1340 is licensed under CC BY-SA 2.0

Everyone today wants to be ‘data-driven’, at least if you read organisations’ strategy papers, listen to presentations, and observe the public discussion around data and digitalisation. However, a gentle scratch on the surface of that aspiration will reveal that the expression ‘being data-driven’ is understood in vastly different ways. In this blog post I aim to identify underlying problems in the language we use in the data domain and emphasise one key point: You need to be clear and unambiguous in articulating your data strategy and data aspirations.

It’s almost four years since I moved over to ‘the dark side’ to become a consultant. For years I had already been defining, developing, and implementing various types of systems (technologies, processes, skills, etc.) to enable organisations to benefit from data. In my mind, this meant that I was working on making organisations data-driven.

As a new consultant, one of the first things I did was read as many data utilisation success stories as possible. Primarily, this was to collect an inventory of great ideas and to understand the secrets of making these data ideas work, but partly it was also to understand how organisations were articulating their data-related aspirations.

This research led me to identify three types of meanings for ‘being data-driven’:

  1. Making informed decisions, typically enabling (human) decision-makers to use facts in conscious decision making.
  2. Monetising data, typically creating or defining direct or indirect revenue and/or value streams that are somehow based on data.
  3. Being innovative, typically enabling new innovations and an ‘innovation culture’ or ‘innovation mindset’ by measuring experiments and innovations with ‘hard facts’ (as opposed to gut feel, sales pitches, and qualitative methods).

Type one, informed decision making, is to date still the most common meaning for ‘being data-driven’, often perceived as difficult to implement because of resistance to change and cultural reasons. A brilliant colleague of mine, Antti Rannisto, opened my eyes and mind to why this was so difficult for everyone in the data world. Antti, an ethnographer and sociologist, and I, a data enthusiast and former data analyst, then wrote a blog post describing this collision of our two worlds. In the post, we outlined our synthesis and recommendations for organisations aspiring to become data-driven by enabling their decision-makers to make informed decisions. If you haven’t yet, you should read that blog.

Data monetisation (type two), be it direct or indirect, is today not as topical as it was a few years ago. Partly, I believe this is because it is extremely difficult to assign a monetary value to indirect monetisation (typically using data to save costs or drive revenue). Direct monetisation (typically selling data or data products/services), on the other hand, is relevant only for a small subset of organisations out there (to date). While the revenues are massive, and these few companies can disrupt global economies and influence politics, there are still not that many Googles and Metas out there.

Being data-driven to foster an innovation culture or mindset (type three) does not exclude the two aforementioned types; it is just another way of expressing what ‘being data-driven’ means. It most likely involves informed decision-making (type one) and can also involve designing innovative data-based products or services (type two), but in my research I noticed a distinct group of organisations expressing their data aspirations around being innovative, taking calculated risks, and looking for new business opportunities, all in some way ‘driven by data’ to a much bigger extent than before.

Wait, there is more to ‘being data-driven’?

I have since realised that being data-driven is not limited to those three types alone. At Solita, we do a podcast called ‘Data by the Slice’ where we talk with leaders and thinkers across many fields. In a recent episode, my former boss and currently VP of Analytic Tools and Platforms at Pfizer, Debbie Reynolds, described being data-driven as something where everything is connected. Not just in a narrow IoT type of way, but more broadly as a system of connected applications, processes, people, and algorithms. This sounds quite different to ‘enabling human decision making’ or ‘monetising data’, right? These are not opposites and don’t necessarily exclude each other, but they are different verbalisations, nonetheless.

Many data management professionals will frown upon, and prefer to avoid, anything to do with microservices architectures, which are typical in software-development-intensive organisations. However, it is inevitable that being data-driven also relates to how operational applications and software work, and how people and machines make micro-decisions in vast volumes. This blog is not about data mesh, distributed ecosystems, API architectures, or data products, so I won’t drill deeper into ‘the mesh’ now, but I hope the point is clear.

Being data-driven is also about everything being connected, thus allowing data, information, and algorithms to be exchanged within complex systems of people and machines.

Enabling humans to make informed decisions, as a meaning for ‘being data-driven’, places humans as the utilisers of data (to inform their decisions). Increasingly though, machines are becoming the main targets of the information supply, with decision-making becoming automated. Naturally, this is also very much in line with ‘being data-driven’; we use data to automate decisions and processes, thus saving time and resources, and possibly also improving the decision quality by removing the human bias from the equation (at least in theory).

‘Being data-driven’, or nowadays often ‘being AI-driven’, increasingly refers to process and decision automation, which has very different implications for an organisation and the individuals within it. Originally, we were aiming to equip humans with more facts for their decision-making – thus potentially suggesting they have not been making good decisions to date – and now we are replacing those humans with automation. Still, it’s all about being data-driven.

Words matter

Now that we have established that the term ‘being data-driven’ can mean many different things, let’s explore the actual words and the semantics a little further. Again, if you want your data strategy to be understood (which you should want), you need to consider the words you use. My aim is not to dive deep into an academic etymology study, but I simply want to highlight the importance of the language used in a data strategy and eventually reduce the use of ambiguous terms like ‘being data-driven’.

Let’s begin with commonly used terms like ‘data-driven’, ‘data-fueled’ and ‘data-powered’. These all paint a picture of a machine of some kind. The more I think about these the more they take me back to the first industrial revolutions. I think of cogs and assembly lines, factories and production plants, big machines powered by massive electric plants and fueled by black oil. Despite everything in data today being digital, we still use language from a few hundred years back and from an era where everything was tangible and physical. In the data domain, we are surprisingly used to this language of factories, engineering, and production processes. How did we end up here and could we have chosen differently?

When writing about data-related metaphors, it is unavoidable to mention oil. We’ve all seen and heard the metaphor. Once again, the language takes us back to physical machines, cogs, engines, smoke, and pollution. I think of Charlie Chaplin in ‘Modern Times’. It’s a man’s world and these are masculine words. You need to be strong to endure the hard labor. It is also hierarchical.

I picture an upper-class factory owner sitting behind his massive wooden desk, smoking a pipe, and shouting to his subordinates ‘We need more data, faster and at a higher quality!’.

Occasionally you see different language used. Sometimes data workers are referred to as artisans or craftsmen. The term ‘artisan’ takes us further back in history (mid 16th century) and nicely blends art and creativity with skilled production. But we are still manufacturing something concrete, tangible, and physical. Just like ‘mining’ stems from an extremely physical activity and ‘engineering’ is all about machines.

One of the most common analogies used in a data context is the restaurant analogy. There are ingredients (data), kitchen appliances (hardware and software), recipes (code or algorithms), and kitchen staff (data workers), and in the end, it’s all about producing nice meals (data products or endpoints) for the customers. Sometimes the output can be extremely fancy (white-glove custom development or analytics), and sometimes all you need is a buffet (self-service BI or REST APIs).

As an analogy, this one obviously works because it is so commonly used. Sometimes I wonder, though, whether we should ask our data workers if they truly want to work long hours in sweaty kitchens, repeating the same tasks over and over, with a foul-mouthed Chef Ramsay screaming in their faces at every mistake they make. I have never worked in a restaurant kitchen, so my perception is not based on the actual work, but I would argue that most data professionals have not worked in one either, and they might have a similar perception of, and association with, this analogy.

Let’s explore one more metaphoric theme. There is a common idea that everyone should have access to relevant data and/or information and then make their own decisions using this data. The term we typically use for this utopia is data democratisation. For us in the western world, ‘democratisation’ has a positive connotation. Our representative democracies mostly work – or at least we don’t know any better model. But go around the world and ask what people truly think of democratisation efforts and you might get a different perspective.

‘Democratisation’, however well we mean it, can be an aggressive word, a word that suggests enforcing one specific way of governing. However common it is, it might not always be the best metaphor to describe your data strategy, especially in a global and multi-cultural context.

As a comparison, what I like about the underlying mindset with Data Mesh is that it seems to be based not on an inside-out enforcement to ‘democratise’ data development but instead on an outside-in mindset, a more organic approach to data development. By inside-out I refer to the model where we, the central data strategy function (the inside), define and design what the organisation’s ‘data democracy’ will look like. We build the models and structures for the rest of the organisation (the outside) and then work tirelessly to implement these structures across the organisation (hence inside-out).

As I have studied the discussions around the Data Mesh movement, especially the aspects around distributed systems and decentralised models, it looks to me like an aspiration for a model where the different functions (the outside) in the organisation can keep growing and evolving as they have to date, thus continuing their own strand of being data-driven.

Sure, a large organisation needs some common principles and approaches to make everything interoperable, something like an internal United Nations or Jedi Council for data (this is what makes it outside-in), but then let each function or domain do what they do and be who they are. I have tried to coin the term organic data management to represent this approach. Calling it organic, in contrast to democratisation, is less disruptive and less enforcing, at least to me.

Metaphors are like portals we travel through all the time

In Minecraft, a game my 8-year-old son plays frequently, you can travel from one world to another through portals. As an example, you can travel from the normal world to something called the Nether through a Nether Portal. As a sidenote, while speaking of metaphors for parallel universes as well as the afterlife, it is quite interesting to note the direction of travel for these ‘other worlds’. While vertical movement is common in many belief systems, others, e.g. the ancient Celts, were less specific about the direction and simply called it the ‘Otherworld’.

My colleagues Paavo Toivanen and Antti Rannisto introduced me to a groundbreaking book in the field of linguistics, ‘Metaphors We Live By’ (1980), by George Lakoff and Mark Johnson. The writers define metaphors as ‘understanding and experiencing one kind of thing in terms of another’. We take something from the familiar real world and use it to describe something less familiar and often abstract in nature. We travel from one universe to another, using language from the familiar universe to describe what we see in the new universe (yes, I did just describe the term metaphor by using a metaphor).

In the data world, we do this all the time. As discussed, we talk about manufacturing tangible goods using machinery and manual labor. We do this to make sense of what we do and to make it understandable and real. However, most of us are not linguists, nor do we have backgrounds in the human sciences, and we are not trained or even inclined to think about and question the metaphors we come up with. We mostly evaluate them from a mechanical perspective, not considering their possible ideological weight or other, often unconscious, connotations. Think of everything you have read about comparing data to oil. How much of that focuses on the mechanics and the tangible aspects of data and oil, and how little of that deliberation considers oil more broadly as a phenomenon, its impact on our societies, and how that influences how we view, use, react to, and feel about the word oil?

Or think of the word ‘drive’. Do you associate this term with steering more than with moving? I personally think of movement, but for some of my colleagues this is about steering, and the Finnish term for data-driven, ‘dataohjattu’, directly translates to data-guided. Driving also involves machinery, power transmission, and some type of energy source, so we’re back to the machinery world again. Long before engines existed, the word drive related to hunting animals, so it’s also not much of a stretch to associate driving with survival and, subsequently, survival through violence. Driving is associated with motion and suggests that we should aspire to move (forward), the point being that standing still is perceived as bad. But how sustainable is it to always aspire to drive and be in motion?

A lot of our data-related metaphors and analogies are already so established that we don’t really think about them, question them, or analyse their possible connotations. But maybe, once in a while, we should stop and think about the language we use. ‘Language is an essential component of creating and shaping our reality’ is something my colleague Antti R. keeps saying, and he knows what he is talking about. When we popularise the word ‘science’, along with symbols like lab coats and test tubes, to describe specific types of data work, does that make data more approachable and accessible (remember what data democratisation is about), or does it in fact alienate parts of the target audience?

My ask and recommendation, following this layman’s deliberation of data linguistics, is simple: As you are defining your data strategy and preparing to articulate your organisation’s aims to become data-driven, consider these points:

  • ‘Being data-driven’ can mean many different things depending on a multitude of factors, and you need to use more articulate and explicit language in your data strategy.
  • The language we have learned to use in our data communities is full of metaphors and analogies referring to very different times and places. While metaphors are helpful in making the abstract more tangible, the ones we have chosen are probably not helping with issues like inclusion and diversity, and maybe it is time we left the male-dominated factory and assembly lines and moved on to something more inclusive and current.
  • While we don’t consciously think about the metaphors and analogies we use every day, they still influence how we perceive our reality, thus creating our reality. Every now and then, it can be a healthy exercise to stop and think about our language, dive into the semantics and even the etymology of our words, understand the broader context and connotations, and reconsider what meanings these words are associated with.
  • A data strategy should be communicated using various means and as a dialogue. Words are great but so are images, sounds, emotions, and the way we tell the strategy story. Try, test, and listen to what resonates and what does not.

As with so many other improvement areas in our data communities, this is also one that requires a diverse, cross-functional approach. You need to include more than just data experts when developing your data strategy. Language is extremely important, but also extremely complicated, especially in multi-lingual organisations.

You don’t want your data strategy to fail just because you didn’t think about and test how people would understand, react to, and feel about it. The data domain is still unknown to most people, and metaphors are a very effective way to communicate these complex matters, but we should always remember to consider the broader context of the metaphors and the experience of those we are communicating with.