The Cultural Context of Data
Some armchair anthropology to supplement your diet.
Every year I try my best to inject some new thinking into the analytics engineering ecosystem at my Coalesce keynote. Originally, this was not so hard: back in 2020 there were only so many of us in the ecosystem, and expanding the frontier of knowledge could be done by simply tripping over an idea and having the perspicacity to recognize it.
In 2023, contributing a new thought is a much more daunting prospect. This community that we’ve built together is full of incredibly smart, ambitious, accomplished people—hundreds of thousands of them. That’s amazing (obviously). But it does make delivering a keynote to y’all a real nightmare :P
The way I try to do this is by pulling in new perspectives from outside our field, outside of our conversation. I try to be a shittier Malcolm Gladwell, combining the best of others’ thinking and making it relevant to what we do. Jellyfish and salamanders. Easter Island. Telegraphs. The electrification of manufacturing. Etc.
I want to test out a theme with you this week. It is not a keynote, but it is maybe one thread of a keynote. My hope in doing this here is twofold. First, I’m hoping you can critique my thinking and make it better. Second, I’m hoping you can suggest new avenues of research to explore. What other threads should I weave in? What other books / papers should I read? Seriously, give me a reading list and I will mow through it. If your thread gets pulled into the tapestry, I will 100% call you out live on the main stage.
Enough preamble, let’s get into it.
I’m always fascinated by research that attempts to explain the mystery of humanity. It’s a knot that we’ll never fully untie; many of the mysteries lie too deeply interred in the past for us ever to fully explain ourselves to ourselves. Which is so strange to me: we’ll never really know how we got to be like this…whatever this is.
The best book I’ve read on this topic in the recent past is called The Secret of Our Success by Joseph Henrich. The book isn’t brand new (2015), but it was new to me. Wikipedia’s summary of Henrich’s research interests is pithy:
[Henrich] is interested in the question of how humans evolved from "being a relatively unremarkable primate a few million years ago to the most successful species on the globe", and how culture shaped our species' genetic evolution.
“Relatively unremarkable primate” puts quite a fine point on it.
The big idea in the book is “gene-culture coevolution”[1]:
Human characteristics are the product of gene–culture coevolution, which is an evolutionary dynamic involving the interaction of genes and culture over long time periods.
OK, maybe that’s only marginally helpful. Here’s an example.[2]
Fire is a physical process, but starting a fire, keeping it going, and cooking are all culture. They are things that humans have to figure out how to do, how to refine, and then how to pass down to subsequent generations. If we couldn’t form and transmit culture, we couldn’t cook our food.
But the fact that we can do all of these things in turn shapes us at the most basic level of all: our genes. In the ~100,000 years that we’ve had fire and cooking, our jaws have changed, our teeth have changed, and, most importantly, our guts have changed. Cooking, it turns out, is essentially “pre-digestion”: it externalizes the digestive process, allowing the initial stages to happen outside of our bodies. This allows our bodies to be more efficient and our intestines to be much smaller.
It also means that we cannot go back. Having learned to cook, and having had our genes rewired around the expectation of cooked food, we are no longer equipped to survive in the wild without this cultural knowledge.
Gene-culture coevolution. Culture is not a layer on top of genes; it is hard-wired into who we are. There is a feedback loop.
Credibility is central
If we can no longer survive without our culture, and cultural knowledge has to be passed down, then the act of passing down cultural knowledge is critical for our survival.
Obviously this is true on a group level. If our entire generation stopped educating our children at the same time, the prospects for human civilization in 100 years (after we’re all dead) would clearly be rather bleak. This idea, the importance of education to society at large, is a pretty well-understood one.
We’re also familiar with this at the individual level. Each one of us has felt the pressure to succeed as children, in school, in college…get good grades, get into a good school, learn quickly on the job. We’re all competing to position ourselves to receive the best cultural knowledge (what else is a degree from an Ivy League school?), and on some level what we’re all afraid of is being left out in the cold…outside the warm glow of the cultural fire.
Not so different from our oldest homo sapiens ancestors.
It’s a bit hard to see how these dynamics play out in our modern context, though. We’re enmeshed in this cultural fabric ourselves, which always makes it hard to get perspective. So Henrich illustrates this dynamic on the micro scale in two ways: through studies of young children and through observational studies of tribes with traditional cultures. Here is my biggest takeaway:
In cultural knowledge transmission, credibility is everything.
The reason not to eat a particular berry is frequently non-obvious. Foods that kill you immediately are easy to spot. But many harmful foods don’t kill you immediately; they cause long-term health or nutritional problems, and it’s possible that no single individual has ever truly seen the cause-and-effect relationship between the food and the harm. Food prohibitions are arrived at through experimentation, guesses, and, ultimately, empirical success (survival).
This is just one example, but the pattern extends to countless aspects of our cultural knowledge. Success is hard to observe directly and causality is hard to assess, but culture evolves over hundreds of generations towards successful solutions. Because we often cannot observe the deeper reasons and judge success for ourselves, we absolutely need to be able to trust that the person passing cultural knowledge on to us is doing so with high fidelity. That’s often all we have to go on.
Assessing an individual’s credibility is, then, critical to survival. Trusting the wrong person as a mentor could be a deadly decision. And so we are wired to do this nearly from birth. In study after study, even young babies develop their own heuristics for who is trustworthy and who isn’t. Babies even differentiate whom they trust in which domains! I’m not going to do a full lit review here; if you want it, read the book. But this stuff is fascinating. Who knew your 6-month-old was constantly assessing your credibility!?
Many of the status games that adults play are ways of positioning themselves either to receive cultural knowledge from the most credible mentors or, conversely, to mentor the most qualified members of the next cohort. Both confer individual and group survival benefits.
We are constantly, constantly evaluating each other’s credibility. We cannot avoid it: it is literally central to who we are as a species.
Personal vs. institutional credibility
There are fundamentally two things that can have credibility: individuals or institutions. You can either trust or not trust a single professor, and you can either trust or not trust your university.
Perhaps the biggest change in human cultural evolution over the past couple of thousand years has been the shift of our primary trust/credibility relationships from human-to-human to human-to-institution.
Human-to-human trust networks don’t scale beyond a certain point and are as fragile as the people in them (i.e. very fragile).
Human-to-institution trust networks scale and can be made more robust than any individual human.
Now, instead of trusting that our local sheriff is just, we are asked to trust that the legal system is just. In the best case, this works far better. We are living through an interregnum in the growth of institutional trust (trust in institutions is at a low ebb), but the broader trend has been going on for hundreds of years at this point and is likely fairly robust.
Here’s the punchline. dbt asks people to move from human-to-human trust networks to human-to-institution trust networks.
Prior to dbt, the primary way that data was passed around a company was via emailed spreadsheets. When you get a spreadsheet in an email, the credibility of that spreadsheet flows directly from the human sending it to you, and you trust it exactly as much as you trust them. As a species, we are very good at assessing this kind of credibility.
After dbt, data moves through a company via queries against a set of curated tables maintained through a (seemingly) arcane process. Those of us who produce those tables understand and trust the process. We know how to introspect it and how to assess its credibility. Did the job run? Did the tests pass? Did anyone merge code, and did CI run? We have very high trust in the institution.
Those on the outside still have little understanding of this new institution. And now the people who used to email them spreadsheets, the people they knew how to hold accountable for the accuracy of those spreadsheets, are telling them to just query the data themselves. Here are tools, go to town, trust the results, the institution is solid!
Let’s just be honest here: that’s not gonna be good enough. If self-service has so far failed to achieve everything we know it can, this is a huge part of the reason. We’ve torn down a system that relied on human-to-human credibility and haven’t given the people who rely on data to do their jobs a clear alternative way to assess credibility.
This isn’t a critique of dbt, of any particular product, or of the MDS at large! It’s simply a recognition that the shift to the modern data stack is not only a technical transition but also a cultural one. It requires brand-new ways to do one of the most fundamentally human things of all: assess the credibility of knowledge.
There are many things going on in the space that help with this problem. Catalogs. Observability solutions. The dbt Semantic Layer. I don’t think any one of these things is a silver bullet for this problem. I do have some thoughts on what characteristics a solution would need to possess, and some ideas on what good looks like here. But for now I’m going to leave this as simply a description of the problem:
How do we empower data consumers to assess the credibility of MDS-generated data products?
If we can do this well, all of the success this community has seen to date will be as nothing compared with what is yet to come.
FYI, everything below that I’m claiming but don’t explicitly cite comes from The Secret of Our Success. It felt annoying to cite the same book over and over again when trying to summarize its key points.