As a philosophy major, I spent more than my fair share of time thinking about what knowledge is, as a general concept. Although this academic indulgence was partially mandated by my professors, my friends in business school viewed it as entirely unhelpful for a career in business, or as they bluntly put it, ‘the real world’.
Since entering the workforce, I’ve spent significantly less time pondering those higher-order epistemological questions and more time ‘just getting on with it’, which usually means some variation of:
gathering and analysing information (anything from the price of electricity in southern Europe to colleagues’ dietary requirements),
discussing said information,
deciding what to do as a result,
doing those things,
reviewing the outcomes,
and repeat.
Little did I know that in his 2020 Coalesce keynote, Tristan was exploring how groups can know stuff by way of philosophy, the history of communication, and evolutionary biology. Turns out that it’s hard to build consensus in groups, but that certain conditions help - like one-to-many communication. There were some recommendations:
Prioritize trust and accuracy of the information that gets circulated
Enable decision makers to plug directly into an organisation’s ‘central nervous system’ of information
Build cultural norms that reward transparency and ‘epistemic humility’1
At dbt Labs, we have recently formalised this process of building knowledge at scale as the analytics development lifecycle, including opinions on how to do it well. Considering all the attention we give to knowledge and the investments we constantly make in pursuit of it, you’d think we’d have a fairly good sense of what knowledge is. Consensus, even.
The truth is, we rarely think about it, because we take it for granted. But could digging into the concept of knowledge itself unearth something valuable about how to build it?
Knowledge from first principles
We do, at least, seem to collectively agree on some key ideas:
Evidence matters: statements seem to need some kind of backup if we’re going to feel confident relying on or trusting them. We expect others to provide evidence for their assertions, and may call them out if we think it’s lacking.
Context matters: the level of accuracy we need depends on what we are trying to achieve. A range might be enough for estimating the size of an addressable market, but much more precise numbers might be needed for financial reporting.
And it turns out that these everyday intuitions around knowledge are pretty close to those of the professionals. Going back to Plato’s time, philosophers have tended to agree that a statement of fact has to be, at minimum, a justified true belief for it to potentially be knowledge. Let’s break that down:
Justified: you need to have a reason for believing the fact for it to be knowledge; you can’t just happen to believe it, or believe it by accident.
True: you can’t know something that’s false.
Belief: to know something, you have to believe it.
Until fairly recently (by the standards of academic philosophy), the idea that justified true belief = knowledge was widely accepted in the field. But in 1963, the philosopher Edmund Gettier, who was working as an assistant professor at Wayne State University, started to question this status quo. Following a flash of inspiration2, he published a 3-page paper aptly entitled “Is Justified True Belief Knowledge?” which set out some hypothetical scenarios, or thought experiments, designed to challenge our intuitions about the definition of knowledge, and specifically, the idea that it’s justified true belief. Thought experiments of this kind quickly became known as ‘Gettier problems’ – and although they have never quite achieved the A-list level of fame that trolley problems from moral philosophy have enjoyed, they remain a commendable runner-up.
From epistemics to analytics
To this day, Gettier problems continue to be hotly debated, but can they teach us anything useful in the world of data, or are they just academic edge cases?
Data folk do at least seem to have a similar fondness for hypothetical scenarios. And just as moral philosophers love a trolley problem, in data, all roads lead back to the CEO’s Dashboard. A hackneyed example, but a classic for a reason: familiar ground to get us all on board, and with an undeniable sense of importance and urgency. We intuitively accept that if it were broken, that would be a Bad Thing that should be fixed immediately. So, I hereby offer you the meeting of two worlds - the CEO-dashboard-Gettier-problem:
Our canonical CEO sips her coffee and looks at her KPI dashboard, which indicates that the business acquired 4 new customers in the past day.
Looking at this, she believes that they acquired 4 new customers. This belief is justified, and it also happens to be true. However, unbeknownst to her, there was an error causing the daily data pipeline refresh to fail.3 So she was actually looking at yesterday’s numbers.
As it happens, the business did acquire 4 new customers in the past day too (the same as the day before), so her belief is true - even though the dashboard doesn’t reflect the latest data.
Gettier would say the CEO does not know that the business acquired 4 new customers in the past day, even though this is true, she believes it, and there was a reasonable justification for her belief (the dashboard, though it happened to have stale data). Justified true belief wasn’t good enough to create knowledge in this case. Luckily, her belief isn’t causing any harm because it happens to be true, but things could have so easily gone wrong and she would have been none the wiser – until it was too late.
I’m willing to bet that we all have our own real-life version of this story. These stories teach us what can go wrong with miscalibrated confidence (too much, or too little). False confidence can lead to errors: if the company had actually acquired 400 customers yesterday, the CEO would still have believed the number was 4. On the flip side, lack of context about the information being offered up can result in underconfidence, and that can impair our ability to act - especially as groups of humans trying to achieve complex things together.
Once you’ve been burned by inaccurate data, you can become more cautious. So how can we instil confidence, or provide appropriate caveats, when communicating with many people at once?
Fixing the knowledge creation process
It seems like we need to offer up additional context alongside information to illustrate how trustworthy it is (evidence matters). What, specifically, would’ve helped in the CEO’s case? A data health indicator next to the KPIs could have helped avoid false confidence in a number that could have been wrong, and could have spurred action to talk to the data team and establish the facts. Even better if she could have dug into the data health indicator to establish exactly what was wrong - in this case, stale data - and determine whether that problem would have an impact on what she wanted to use the information for. Maybe she just needed an overall view for the year, and so a day’s worth of data missing wouldn’t have been a problem (context matters).
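To make this concrete, here’s a minimal sketch of the kind of freshness check that could power such a health indicator. It’s illustrative only - the thresholds, function name, and Python implementation are my assumptions, not a description of any particular product:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds - how stale is too stale depends on the use case
# (context matters).
WARN_AFTER = timedelta(hours=12)
ERROR_AFTER = timedelta(hours=24)

def freshness_status(last_loaded_at: datetime, now: datetime | None = None) -> str:
    """Classify a dataset's freshness from its last successful load time."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    if age > ERROR_AFTER:
        return "error"  # surface a prominent warning next to the KPI
    if age > WARN_AFTER:
        return "warn"   # fine for a year-level view, risky for "yesterday's number"
    return "fresh"

# In our story the pipeline refresh failed, so the data was last loaded ~30 hours ago:
last_refresh = datetime.now(timezone.utc) - timedelta(hours=30)
print(freshness_status(last_refresh))  # -> "error"
```

This is the same basic idea as dbt’s source freshness checks, which compare a loaded-at timestamp against configurable warn and error thresholds.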
We usually want to move from knowledge to inference so that we can use the knowledge we’ve gained to achieve some sort of goal. This is the raison d’être of the Analytics Development Lifecycle. But without guardrails in place, we could make inferences from seemingly justified beliefs based on perceived patterns that do not actually exist, and then act on mistaken beliefs. And if we could make these kinds of mistakes, so could an LLM. Adding in contextual information - such as freshness and accuracy metadata - helps avoid this, while letting us move forward.
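As a sketch of what that could look like in practice (the class and field names here are hypothetical, not an existing API), the contextual metadata can travel with the number itself, so that whoever - or whatever - consumes it sees the caveats alongside the value:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class QualifiedMetric:
    """A metric value carried together with the context needed to judge it."""
    name: str
    value: float
    as_of: datetime            # the point in time the value describes
    last_refreshed: datetime   # when the underlying data was last loaded
    freshness_status: str      # e.g. "fresh" / "warn" / "error", from a check like the one above
    tests_passed: bool         # did upstream quality/accuracy tests pass?

def safe_for_daily_inference(metric: QualifiedMetric) -> bool:
    # A crude guardrail: only draw day-level conclusions from data that is
    # both fresh and passing its tests. Anything else needs a human (or an
    # LLM prompt) to see the caveat before acting on it.
    return metric.freshness_status == "fresh" and metric.tests_passed
```

The point isn’t this particular structure; it’s that freshness and accuracy metadata stay attached to the information as it moves, instead of living only in the data team’s heads.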
Building knowledge is an experimental, collective process, whereby individual pieces of knowledge get knitted together to create discoveries that have meaningful impact. By providing richer context and the ability to map how each individual piece links to existing knowledge, we can guard against poor inference, whether it is humans or machines doing the work, or both. Doing this at scale requires good tooling as well as organisational norms around how we work together.
How’s that, business school friends?
1. No Dunning-Kruger effect here, pls.
2. Sadly, there’s no evidence to suggest it was triggered by anything as iconic as an apple falling on his head, or a eureka moment on exiting a bathtub.
3. Apologies for any painful memories this may surface.