The Trouble with Data

I am a bit of a data analysis freak. I like my data, preferably correct.

How do you like your data? “Accurate, not cut.”

Me, James Bond style, approaching the bar

That would be me. I don’t take it re-worked, re-cut or in extracts. I take it raw, so I know what it is, what it means, what it tells me and what flaws it has.

The problem with data is that people more often than not do not understand what they are looking at.

The Problem with Data Analysis

Say you asked for a piece of analysis. Someone sends you a chart and you take it as it is. You believe you have asked the right question. The person believes they have interpreted what you were after the right way and extracted what you intended. That is a whole bunch of assumptions, and my time working inside an organization (rather than on the client-facing side of the business) has taught me they are mostly flawed.

When we start looking at a problem, most likely we don’t really know what we are after. We feel that there is some data that illustrates our story, but we may not even be asking for the right things. So we get Finance, Ops or HR to pull whatever data we feel is the right data. After what is usually a long turnaround, and probably a cumbersome project on their side, we get some nicely formatted charts.

Kindly send through the raw data

Me, straight away

I can feel the mouse shaking on the other side. Why does she want the raw data? She does not even understand what it is. Haven’t I answered your question? Well yes, partially. However, I am conscious that the question I am asking may not be the right one. Therefore, I want to look at the data and just let the screen tell me a story, rather than start with a ready-made story and find the facts to go along with it.

Because I typically come to many problems with an outsider’s perspective, it is not uncommon for an analysis to take many iterations before the data set is even aligned with what I am really after.

Why raw data?

So now you have this raw data, you know there are different questions to be asked, and you start drawing new conclusions. Data is king, right? So by now your data analysis is looking unique and truly interesting. You draft charts, you even put together some slides telling this story. What does it all mean?

Well, look at all this data.

And suddenly someone asks a basic question that immediately shows the data is not correct. The analysis is now flawed, even if the reasoning behind it is perfectly valid. You lose an incredible amount of credibility and you go back to square one.

Checks and balances. That is something that often does not exist with data producers. Because they do not have the full picture, the people who pull the reports are more often than not unable to evaluate the accuracy of the data they are providing.

I know, ludicrous. This should not even be possible; that is why you have people extracting reports rather than machines. But it is the case in many places.

So, as an analytical thinker, before analysing, stop and think about whether what you are looking at even makes sense. I have sent data sets back within 30 seconds of receipt by simply asking “kindly clarify why the total does not match the total in report x that you sent last week”. This usually gives rise to a whole other round of iterations that leads to new data sets and, ultimately, an accurate and complete data set.
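That 30-second check can be scripted so it runs before any charting starts. The sketch below is a minimal illustration, assuming the raw data arrives as a list of rows with a numeric `amount` field and that you have the headline total from the earlier report. The function name `reconcile_totals`, the field names and all figures are hypothetical, not taken from any real report.

```python
from decimal import Decimal

def reconcile_totals(rows, reference_total, field="amount", tolerance=Decimal("0.01")):
    """Sum `field` across the raw rows and compare it with a total reported elsewhere.

    Returns (raw_total, difference, matches). Decimal avoids float rounding noise.
    `reconcile_totals` and the `amount` field are hypothetical names for illustration.
    """
    raw_total = sum((Decimal(str(row[field])) for row in rows), Decimal("0"))
    difference = raw_total - Decimal(str(reference_total))
    return raw_total, difference, abs(difference) <= tolerance

# Hypothetical raw extract and the headline total from last week's "report x".
raw_rows = [
    {"region": "North", "amount": 1200.50},
    {"region": "South", "amount": 980.25},
    {"region": "East", "amount": 1510.00},
]
report_x_total = 3790.75

total, diff, ok = reconcile_totals(raw_rows, report_x_total)
if not ok:
    # The raw rows do not add up to the reported total: send it back before charting.
    print(f"Kindly clarify why the total ({total}) does not match report x "
          f"({report_x_total}); difference: {diff}")
```

The tolerance is a judgment call: a cent of rounding is noise, a gap of 100 is a question for whoever produced the extract.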

Is this it then? Well, only partially.

How to do data analysis?

Data sets sometimes give you only one side of the story, or only the icing on the cake. If you get data that is aggregated at too high a level, chances are you will need to go back and forth trying to figure out why trends changed and what is driving them, because the data lacks the detail. Here, there is a fine balance.

I always err on the side of too much data, but I recognise that some data sets are so large that we have to filter out fields, and it is sometimes more efficient to ask for additions later than to deal with spreadsheets that will not load in an ordinary person’s Excel, or that freeze each time you add values. You don’t want to stare at the spinning wheel while Excel is “thinking”. So aim for the common fields, but keep an open mind about what may be missing, so you can dig deeper into a few points.
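To make the point about over-aggregated data concrete, here is a small sketch with made-up numbers: the monthly total stays flat, and only the more detailed breakdown shows one product falling while another compensates. The records, product names and the `aggregate` helper are hypothetical; in practice this would be a pivot table in Excel or a group-by in whatever tool you use.

```python
from collections import defaultdict

# Hypothetical detailed extract: (month, product, sales).
records = [
    ("Jan", "A", 100), ("Jan", "B", 100),
    ("Feb", "A", 60), ("Feb", "B", 140),
]

def aggregate(records, key_positions):
    """Sum sales (tuple position 2) grouped by the given tuple positions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in key_positions)
        totals[key] += rec[2]
    return dict(totals)

by_month = aggregate(records, key_positions=(0,))                 # headline view
by_month_and_product = aggregate(records, key_positions=(0, 1))   # driver view

print(by_month)              # the headline trend looks flat
print(by_month_and_product)  # product A fell while product B grew
```

If you only ever receive the first view, you cannot tell which driver moved without going back to ask for more detail.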

But there is digging deeper, and there is digging laterally.

Sometimes the data may only tell one side, and as we look at different sources we may reach complementary, more insightful (though sometimes contradictory) outcomes. Data is king, but there is a queen in the game as well. How do you choose, then?

There is no choice, really; there is only understanding what different sources tell you and why they may represent things in different ways. That will broaden your understanding and keep different players from undermining your analysis.

The thing with data is that people make decisions on the basis of it. It is called management information (MI), on the assumption that it informs the decisions management has to make. However, in large organizations, MI can be built in many different ways and tell many different stories. The key is to know the different sources, so you understand the different stories and can challenge each and every one of them.

Photo by Markus Spiske on Unsplash
