How to break your data

And how to get good at fixing it

Jul 09, 2023

You never forget the first time you brick your laptop.

A long time ago, I was a die-hard Linux desktop convert. Sometime in the mid-aughties, I got handed an Ubuntu CD (it might have even been Warty Warthog), and I spent the next 10 years gleefully rummaging under the hood of my operating system. At one point, I had a full-on Hackintosh machine (GNOME skin, wallpaper, icons, bootloader, the works) because a real MacBook was not even close to affordable.

I had everything on my trusty old Asus, except the ability to put the laptop to sleep when closing the lid. You know, minor stuff. 😛 So I went to the back corners of the internet, and found a hack that required flashing the BIOS. The BIOS is this tiny but important part of computers that lets the operating system control a subset of hardware resources (e.g. physical buttons) … and tells your computer how to find and load the disk containing your operating system.

In other words, messing up your BIOS = bad. A fubared BIOS means your laptop can’t load any operating system because it can’t start up any boot device (CD-ROM, USB or an internal drive).

It’s kind of like losing your car keys when you live in a car-only town. Bad.

Anyway, as you might expect, I bricked the laptop on my very first attempt at flashing the BIOS. Whomp whomp 🎺

That was probably the most panic I’ve experienced in front of a laptop. Definitely top 3. I had saved up all this cash to buy the machine, I hadn’t had it for very long, and it was powering not only my college homework but also a side gig I had at the time. Bad. Bad. Bad.

The point of this story isn’t that I fixed it. I did. Nor is it about how I did it — I actually can’t tell you how I did it, because I don’t remember anymore.

The point of this story is that I know I can do it again (brick it and then fix it hehehe). I can figure out how to do it again because in the process of breaking my laptop I learned lots about basic communication between software and hardware. And I know I can figure out how to do it again because I succeeded before.

You never forget the first time you brick your laptop. You may forget how you fixed it, but you will never forget that you did, in fact, fix it.

This is the part where I bring the metaphor back home to data.

Throughout my career, I’ve heard a variety of reasons that folks aren’t comfortable working with data independently. My favorite question to ask: “What are you afraid is going to happen?” The answer is usually something like “What if I accidentally drop a production table, or worse, a production database?” or “What if I pull the wrong number and get the findings totally wrong?”

Raise your hand if you have had either of those things happen to you. ✋

Raise your hand if you have had both of those things happen to you. ✋✋✋

Exactly.

Something “bad” is going to happen when someone starts exploring data more deeply on their own, wading more than knee-deep in that proverbial lakehouse. It just will. The more someone gets into the weeds of working with, shaping and interpreting data, the more likely they are to make a mistake on the level of bricking that laptop.

My spicy thought for the week: maybe what folks need from us data humans to feel safe working with data — to self-serve, if you will — isn’t a perfectly manicured data catalog and CI tests that catch every possible error someone can make. These things are important, but they are also labor intensive, impossible to get perfect, and require lots of maintenance time.

What folks need is to feel safer failing. That means knowing that dropping production is bad, but not that bad, because you’ve got backups and version control, so restoring your data model to its previous state is relatively trivial. It also means knowing that they’re likely to come across data or produce charts that aren’t right at some point along their journey, and knowing how to gut-check their findings against well-known measurements and sources of truth, or that someone is on standby to help review their code.
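
To make the gut-check idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the function name `gut_check`, the 5% default tolerance, and the revenue figures are all made up for this example, and your own source of truth and acceptable drift will differ.

```python
def gut_check(value, reference, tolerance=0.05):
    """Return True when `value` is within a relative `tolerance` of `reference`.

    `reference` stands in for a trusted, well-known number (hypothetical here),
    such as a figure that another team already reports.
    """
    if reference == 0:
        # Relative difference is undefined at zero, so fall back to an absolute check.
        return abs(value) <= tolerance
    return abs(value - reference) / abs(reference) <= tolerance


# Hypothetical numbers: a figure pulled from a new query vs. a trusted report.
my_query_revenue = 1_040_000
trusted_revenue = 1_000_000

if gut_check(my_query_revenue, trusted_revenue):
    print("Close enough to the trusted number. Carry on.")
else:
    print("This is far from the trusted number. Ask someone to review the query.")
```

The point is not the arithmetic. It is that a cheap comparison against a number everyone already trusts turns “what if I got it totally wrong?” into something you can check before you share a chart.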

Maybe what folks need from their data team to enable self-service isn’t perfection in data modeling and documentation. Maybe it is easy avenues to recover from failure, and the self-efficacy to keep making mistakes.

Cool stuff from the past week(ish)

I always enjoy seeing purple people references pop up 💜. This week I found one in Carlo Carandang’s post: Building complete and responsible AI systems: the importance of the purple unicorn role. I had fun reading this and imagining what needs to change in the MLOps landscape to make more purple AI unicorns a reality. Is it that we should build more pathways for technical folks who work on AI to get closer to the business? Or is it that we need to build tooling that allows business folks to iterate more easily and safely within an existing AI framework? You could argue that this second thing is happening today with the ChatGPT API — you can now iterate on a number of applications with many guardrails built in. All you have to do now is throw some API credits at it. 💸💸
TIL about Density Based Clustering Validation from Avi Chawla. I’ve come to really appreciate the bite-sized bits of data science this blog offers, and the easy-to-grok examples with visuals and sample code.
Have you checked out Tristan’s episode on the Joe Reis Show yet? It’s got some fresh takes you haven’t heard from him before!
This comment from Jasenka on one of John Cutler’s game theory explorations of organizational behavior. First read the first two posts in the series, then the last post, and revel in the reality of the interaction John describes. We have all been there at some point, watching the many layers of an organization water down the essence of a problem. Then read Jasenka’s comment and tell me if you take the same thing away from it that I did.

What I took away from it: 1) a reminder of how useful systems thinking is, but more importantly 2) the importance of defining common measures of reality in that system. The pattern Jasenka describes only works because there are common measures that everyone in the company (likely industry!) agrees on as being useful indicators of success.
Let’s say for an engineering team of a SaaS company that’s something like service availability. I believe that the thing that breaks the cycle of watered-down communication in a hierarchy is an understanding of inputs to that metric. Those inputs are not “X person shipped some bad code”. Those inputs are things like: the amount of time a team spends keeping the service available relative to the amount of time the business actually allocates to this versus feature work; or the amount of time it takes an average engineering team to identify and recover from an error in a deployment. A shared understanding of those inputs at higher levels of the organization, and the ability to give feedback in the same terms from the bottom up, is IMO what helps break this cycle. But I’ve just written far more words than Jasenka did.

What do you think?

John Wessel

Jul 9, 2023

Great post! I remember the first time I broke a production database. I caused a pretty nasty deadlock and the entire production app stopped working for the better part of an hour. It was a great learning experience, and I’m glad I had enough access to be able to actually break it.

I was also reminded of this section of The Lean Startup.

“We routinely asked new engineers to make a change to the production environment on their first day. For engineers trained in traditional development methods, this was often frightening. They would ask, "What will happen to me if I accidentally disrupt or stop the production process?" In their previous jobs, that was a mistake that could get them fired. At IMVU we told new hires, "If our production process is so fragile that you can break it on your very first day of work, shame on us for making it so easy to do so." If they did manage to break it, we immediately would have them lead the effort to fix the problem as well as the effort to prevent the next person...”

I also keep these ideas in mind when working with new team members.

The Analytics Engineering Roundup
