Why it's hard to ask "good" questions
In this issue I spend some time connecting my love of wakeboarding to exactly why it’s so hard to learn to ask the “right” data questions.
Also in this issue, I’ve made an effort to cite some new voices that I haven’t referenced before in the Roundup :) Most articles are less than a month old, but not everything was released exactly in the past week:
Petrica Leuca: A way to ensure auditability in data processing
Marie Lefevre: Not all data requests are urgent
Mehdio Ouazza: What Open Source Can Do For Your Data Career
Enjoy the issue!
-Anna
PS: Have you seen the Coalesce agenda that was released a few days ago? It’s going to be 🔥🔥🔥. I’m looking forward to seeing you there, wherever your “there” is this year.
Why it's hard to ask "good" questions
“Lean back into the water.”
“Pull your knees in towards your chest.”
“Keep your arms straight.”
“Let the boat pull you up before you try to stand.”
“Use your core, not your arms.”
Most people will tell you that the hardest part of wakeboarding (or wakesurfing or waterskiing — pick your poison!) is getting up on the board for the very first time. Most people wipe out badly at least a handful of times before they get it right. Some folks give up before they ever make it up.
But then once you’ve gotten up successfully just one time, you’re likely to get up nearly every time without thinking too much about it.
Why is that?
I’ve been thinking a lot about that this weekend (it’s over 100 degrees out on the Sacramento river delta in California, and the water metaphors are practically begging to be turned into a data think piece… so here we are).
Backing into frameworks isn’t the same as applying them when you are learning
The instructions you get when you’re first learning to wakeboard are more or less the same regardless of who is teaching you. And yet getting your body to do what your brain already abstractly understands is an entirely individual and varying experience.
I think it’s because, once you’ve figured it out, it’s easy to back into a neat little framework. Just do these 5 things, and you’ll get up every time.
But while you’re there, sitting in the water, clutching the rope that connects you to the boat, and asking yourself exactly why you thought spending your afternoon falling on your face into the water seemed like a fun activity… the framework only gets you so far.
“You kind of just have to feel it,” everyone says, as you dive headfirst into the water yet again.
What if asking questions of data is the same?
“Feeling” your way around Analytics
We tell ourselves, our new data team recruits, and the stakeholders we work with across a company, that analytics is a craft. And so it is.
Have you ever watched an analyst work? It’s quite fascinating.
It looks a lot more like slowly chiseling a marble sculpture than building a house.
“The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material.”
-Michelangelo
First, when answering questions, timelines are usually imprecise and answers are rarely directly related to the question — if you’ll forgive my mixing metaphors, Michelangelo’s sculpture could take a couple of days, or it could take several months. It could be a man, or a woman, or a deity with wings. It’s hard to know exactly before one begins chiseling.
“Sure, that’s easy, I can pull that number for you real quick” invariably turns into several days going down a rabbit hole. It’s common to come back with an entirely different answer to a different flavor of the original question. That doesn’t make one a bad or a slow analyst. Quite the opposite. The folks most celebrated on an analytics team are usually the folks who start off working on a problem by saying “Now wait just a minute…”.
Second, when you’re working with a talented analytics team, you’ll often hear the phrase, “Something here looks off”. Other variations that mean the same thing: “Are you sure about that number?”, “This isn’t at all what I expected to see”, “Let me look over my code a couple more times before I merge it in”. This isn’t said in doubt of oneself, or of a colleague’s ability to produce a competent and insightful result. It’s said because the trained eye can see that something isn’t yet fully there, but can’t yet say what it is that’s missing. There’s still superfluous material on the sculpture.
How do you teach someone to chisel a sculpture? Do you ask them to decide ahead of time what it will look like? Where they will put the head or the arms? Or do you find that as they start chipping away at the rock, the final product begins to drift, evolve, and take its own shape as they uncover more structure inside the rock they couldn’t observe before?
I think this is why most people learn how to do analytics properly after they’ve become an analyst.
To get to an insight (that is, to learn the answer to something being asked of you for the very first time), you have to ask a question, yes. But that question rarely perfectly captures the shape of the insight you’ll walk away with. After you chisel away at your data, you will uncover more structure that you couldn’t observe before, and then come back with a variation on your starting place.
Becoming an Analyst is less about the tools you use to get to an insight, the language you write most often in (Python, SQL, R, …), or a series of mandatory steps to take to arrive at an outcome. It’s more about embracing the fact that the data is going to pull you forward, sometimes in a very different direction than you thought.
It’s about:
leaning back;
letting the data pull you, the way the boat does;
keeping your analytical toolkit full of essentials, like descriptive statistics and simple visualizations to explore data, the way you’d keep your body position tight;
and then, once you can feel that you’re ready, answering the question you see emerging in front of you, the way you’d stand up on the board;
and trusting that feeling.
The first time you do this successfully, and come up with something new and interesting, it will feel like you’ve been repeatedly slamming your face into the water.
But the second time you do this, you won’t even realize that you’re doing it anymore.
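Those toolkit essentials don’t need to be elaborate. A purely hypothetical first pass at a new dataset, using nothing but Python’s standard library (the column name and numbers here are made up for illustration):

```python
import statistics

# A made-up sample: daily order counts pulled for a stakeholder question.
daily_orders = [112, 98, 105, 440, 101, 97, 110]

# Descriptive statistics: the first chisel strokes on a new dataset.
print(f"mean:   {statistics.mean(daily_orders):.1f}")
print(f"median: {statistics.median(daily_orders)}")
print(f"stdev:  {statistics.stdev(daily_orders):.1f}")

# Even a crude text "visualization" is often enough to spot the outlier
# that turns "I can pull that number real quick" into a rabbit hole.
for day, n in enumerate(daily_orders):
    print(f"day {day}: {'#' * (n // 20)}")
```

Note how the mean and the median disagree badly here: that gap, not any one number, is usually the first hint that there’s still superfluous material on the sculpture.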
Elsewhere on the internet…
Petrica Leuca writes about ensuring auditability in data processing, a topic very dear to my heart, and one I don’t know that we spend enough time on in this newsletter.
Petrica gives us a set of questions to answer, a framework for the types of information we need to store in order to always be able to answer them about a specific data set, and some sample code that brilliantly illustrates why each type of information is invaluable for understanding what is happening with your data.
Even if you don’t work in a sector that is heavily audited, if you work in a public company or plan to become a public company, you’re going to want to read the advice in this post very carefully and start thinking about how to track this metadata.
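Petrica’s actual framework and sample code are in the post; as a purely hypothetical sketch of the general idea, recording audit metadata alongside each batch you load might look something like this (all field and function names below are made up, not from the article):

```python
from datetime import datetime, timezone

def load_with_audit(rows, source):
    """Attach audit metadata to a batch of rows so we can later answer:
    where did this data come from, when did it arrive, and how much of it
    was there? (Hypothetical sketch; field names are illustrative.)"""
    return {
        "source": source,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "rows": rows,
    }

batch = load_with_audit([{"order_id": 1}, {"order_id": 2}], source="orders_api")
print(batch["source"], batch["row_count"])
```

The point isn’t the specific fields; it’s that this metadata is captured at load time, automatically, so that when an auditor (or a suspicious analyst) asks about a number months later, the answer is already sitting there.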
Shiny… is now available for Python, Dario Radečić tells us. I’ve always been a big fan of the Shiny project, alongside many other cool R ecosystem packages. An equivalent just didn’t exist in Python that let you deploy an application easily and share it with others, without writing a lot of boilerplate. Dash, the primary Python equivalent, was extremely Pythonic in that it offered a very flexible but highly verbose way to build a data application. Now we just need a Python version of the tidyverse, and we’re off to the races!
In Not all data requests are urgent, Marie Lefevre reminds us to ask “Why?”, and then ask “Why?” again. Why are you asking this question? Why are you doing X thing that has led you to ask this question? Or, to put it more simply (and anyone who’s worked with me probably hears this from me a lot): “What is the problem you are looking to solve?” Marie also works through an example of an interaction using this approach 💜 This is another one of those analytics skills that’s kind of like chiseling away at a dataset — after you do it once successfully, you’ll keep doing it without thinking. So go ahead and try it, and then try it again until you’re no longer conscious that you’re doing it.
Finally, a rare treat: an article that speaks to contributing to open source for data professionals. Mehdio Ouazza writes about What Open Source Can Do For Your Data Career. His advice is on point, and I encourage you to just read his well-written and concise piece. The one other thing I’ll add is this:
The core of dbt is open source, as are dbt product docs, dbt adapters and dbt packages.
👉 We’re going to be hacking on all of these as part of the Coalesce hackathon and Hacktoberfest 👈
Come join us ;)
👋 Until next time!