Discover more from The Analytics Engineering Roundup
Adopting a truth-seeking stance.
Why it's so hard to tell the truth and how Taylor Murphy tries anyway.
My first issue of the new year! It’s nice to get back into it. Before diving into the topic at hand, though, I’d really appreciate it if you could fill out this feedback form on Season 1 of the Analytics Engineering Podcast. We wrapped Season 1 in early December and are about to start recording Season 2; we’d love to hear both positive and negative feedback. There’s a spot for guest suggestions—would love those too!
Thanks for your help :D
My last issue prompted some truly fantastic responses. My heart sang when I got the below DM from Taylor Murphy. Taylor swears “this was written in between making sure my child eats his snack and doesn’t choke” and, if that is true, his presence of mind as a parent far exceeds my own.
I’ll share his note below in its full form because I couldn’t ask for a better place to continue the conversation I started before. I will say, though, that this issue isn’t going to make a lot of sense if you didn’t read the last one. I ended it by saying this:
The place I want and need to go next is the internal mindset that I’ve cultivated over the course of my career so as to follow my curiosity while quieting my self-interest, and how it feels a bit like mindfulness / meditation when I do it well. How I think it makes me a better strategist, leader, and data professional. I think this is something that a lot of senior data folks do well but don’t know how to talk about.
Well, Taylor does know how to talk about it. Here he is, published with permission.
Your questions brought up some interesting thoughts for me. Nearly a decade ago when I was going through my divorce I made a very conscious decision to value truth and honesty over most everything. Part of this came from reading Sam Harris’ book “Lying” but much of it was the result of deep introspection over every aspect of my life (losing my religion, my relationship, and my academic ambitions simultaneously). The result of this introspection led me to being extremely comfortable being completely alone; I knew that even if everything washed away I’d still be “ok.” This is when the simplest form of “tenure” started for me - I accepted myself fully and was able to build layers on top confident that if they all washed away I’d still be okay.
Fast forward to today and I have an even deeper level of “tenure” with Meltano. My commitment to truth and honesty hasn’t changed but I now have something new that I haven’t experienced in a business context previously: power. I’m on the leadership team and have significant say in Meltano’s direction, culture, and future. At GitLab I had a limited amount of power as a leader of the data team, but it was subsumed under the weight of a hyper-growth organization and everything that comes with that. With Meltano, it’s the first time I can really affect the course of an organization with my philosophy and power - it’s an application of my tenure beyond myself and my immediate family.
What’s quite fun about this is that I’m now able to create an environment where other people have the power to tell the truth as well. Given my overall identity and skillset, I feel fairly confident that I’d be able to find a job and make a living no matter what happens. With that confidence, I can now work to create an environment where others that may not have the same privilege can bring the same truth and honesty to bear in everything they do. Paying it forward, if you will.
Truth and honesty are so essential so we don’t fool ourselves. That’s the one thing I’ve relearned constantly while at Meltano—it’s just so easy to fool yourself. I did it in grad school, I’ve done it at Meltano, and I’m sure I’ll continue to do it until I die. But working to find the truth, primarily via the scientific method, has been the only way I know to get back to reality.
Looking back at your questions now, I realized I never touched on “Do you ever notice yourself being compromised by the structure you operate in?” I definitely have in the past and have chalked it up to a vague “politics”. Thinking about Meltano specifically, I’d say that earlier this year I was compromised a bit as I looked too much to others as I figured out this new Product role for myself. Coming into 2022 I have a deep sense of confidence in my ability to figure things out - I know what I can expect from my colleagues on the leadership team and I’m getting better at asking for help or reaching out to people in and out of the organization when I come across a problem. Focusing on Product stuff for most of the year has forced me to constantly reevaluate assumptions and validate everything - both with myself and with the product/company. I’m now more likely to be compromised when I’m mentally unhealthy (tired, distracted, depressed, overwhelmed) than by anything external.
This has also made me think about why I’m so enamored with the data profession, and in particular data engineering. Data is a representation of reality - it’s how to connect the real world to the digital. Given that I place the highest value on truth and honesty (meaning understanding reality as best as possible), I want data to be as truthful and honest as possible. This means it needs to be of the highest quality possible. And the systems that generate, process, and consume the data need to be truth tellers in themselves. For me, everything that’s too far removed from the data starts to feel too abstract and disconnected from “the truth”. Thinking even more, this product role I’m in is perfect for me because I’m building a system that’s building a tool that enables people to build better truth systems (so many layers).
There’s just so much good in there. I honestly don’t want to pick it apart and comment on every little piece because I think it stands on its own; I’ve now read this message like half a dozen times and find new things in it each time. I will say that I deeply resonate with a lot of Taylor’s journey and mindset and could not have done a more authentic, human job of writing about my own experiences. I’m deeply grateful, Taylor, that you shared (and were open to me publishing).
Here’s a summary of the conversation to-date:
Truth is inherently “political”—the definition of “truth” has real impacts on the world around you.
As a data professional, it’s impossible to completely stand outside your own context, and your affiliations make it hard to do your own job (tell the truth).
It’s possible to stand further outside your context / cultivate impartiality with a series of organizational and personal strategies (org structure, personal tenure, values), but this is inevitably imperfect given our human cognitive biases.
The other approach that I haven’t gone deep into here (it’s not my personal approach and so I’m not the right one to advocate for it!) is very different. The below is from a long Slack conversation with the one and only Jillian Corkin:
That's my response for how you escape our passions, bias and irrationalities. Let's stop pretending we can behave otherwise (or that it would even be a good pursuit to try) and instead focus on how we can observe and experiment with building systems to make our biases visible — which ones are healthy — why? — which ones are toxic — why?
Let's make the invisible visible
There’s a lot to recommend this approach, but I find myself being very resistant to the idea that it is impossible for humans (or groups of humans) to achieve a level of impartiality. Maybe this is idealistic of me, and Jillian’s stance is the only realistic one. 🤷
If anyone wants to take up the pen and write more on this, how you’ve implemented this in practice, or anything related, I’d love to share it here as a continuation of this conversation.
Elsewhere on the internet…
🧮 The CTO of ThoughtSpot has a fantastic post out about metrics. It’s meaningfully more descriptive and accessible than anything Drew or I have written on the topic; I continue to be grateful to others in the space for writing such fantastic pieces so that I can point others to them to explain our product roadmap :P
Two points before you read it:
It is so validating to see the six types of metrics Amit outlined here; we’ve explicitly designed dbt metrics to account for 100% of these use cases. This is a long-term gripe of mine with the Looker query model; the solutions Looker has employed (merge results and table calculations) are…suboptimal.
Amit references current metrics functionality launched in 1.0.0 but not the announcements from Coalesce. Good news—we’re well on our way towards building Amit’s “Possibility #3: Encapsulate both the semantic and query generation layers”! This is, IMO, the only version that is long-term interesting. A layer that simply describes the semantic concepts but doesn’t translate them into queries leaves open the door to a tremendous amount of ambiguity and fundamentally doesn’t solve the problem.
Based on this post it looks like ThoughtSpot is currently only imagining partially integrating with the dbt metrics layer. Amit, we’d be excited to talk about getting you involved in the launch of the Real Deal! Ping me if interested :)
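To make the "semantic plus query generation" point a bit more concrete, here is a minimal sketch in Python. To be clear, this is not dbt's metrics implementation or API: the `Metric` class, the `compile_metric` function, and the `orders` table with its columns are all hypothetical names invented for illustration. The idea is only that a definition by itself leaves every consuming tool to write its own SQL, while a layer that also compiles the query produces one answer for everyone.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metric:
    """Semantic definition only: says what the metric means, not how to query it.

    Hypothetical structure for illustration; not dbt's metric spec.
    """
    name: str
    table: str
    expression: str   # a SQL aggregate, e.g. "sum(amount)"
    time_column: str


def compile_metric(metric: Metric, grain: str, dimensions: Sequence[str] = ()) -> str:
    """Query generation: turn a definition plus a request into a single SQL string."""
    allowed_grains = {"day", "week", "month", "quarter", "year"}
    if grain not in allowed_grains:
        raise ValueError(f"unsupported grain: {grain}")

    period = f"date_trunc('{grain}', {metric.time_column})"
    select_cols = [
        f"{period} as period",
        *dimensions,
        f"{metric.expression} as {metric.name}",
    ]
    # Group by every selected column except the final aggregate, by position.
    group_by = ", ".join(str(i) for i in range(1, len(select_cols)))
    return (
        f"select {', '.join(select_cols)}\n"
        f"from {metric.table}\n"
        f"group by {group_by}"
    )


# Hypothetical metric over an assumed "orders" table.
revenue = Metric(
    name="revenue",
    table="orders",
    expression="sum(amount)",
    time_column="ordered_at",
)

print(compile_metric(revenue, grain="month", dimensions=("region",)))
```

With only the `Metric` object, two BI tools could still disagree on how to truncate dates or what to group by. Once the same layer owns `compile_metric`, that ambiguity has one place to live.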
🤝 Mode has transitioned CEOs! I like the part about independence:
(…) That puts in front of Mode a big opportunity: to be the independent analytics platform for the modern data stack. This directly benefits you, our customers: we won’t push you to buy another product in our portfolio and we won’t spend our development cycles integrating with the rest of our portfolio. Instead, we can focus all of our efforts on making Mode the best way to do analysis (…)
Really agree with this—independence has huge strategic benefits; it’s something that customers do and should care about.
🤣 This made me laugh a lot (from Reddit).
🏢 Also from Reddit, this post gives a fantastic perspective on why, at least for some data engineers, FAANG companies aren’t actually the professional growth opportunity that many would assume. It’s a short and highly worthwhile read; I’ll quote my favorite bit below:
It's the Wrong Kind of Scale
I think one of the appeals to working as a DE in FAANG is that there is just so much data! The idea of working with petabytes of data brings thoughts of how to work at such a large scale, and it all sounds really exciting. That was certainly the case for me. The problem, though, is that this has all pretty much been solved in FAANG, and it's being solved by SWEs, not DEs. Distributed computing, hyper-efficient query engines, load balancing, etc are all implemented by SWEs, and so "working at scale" means implementing basic common sense in your SQL queries so that you're not going over the 5GB memory limit on any given node. I much prefer "breadth" over "depth" when it comes to scale. I'd much rather work with a large variety of data types, solving a large variety of problems. FAANG doesn't provide this. At least not in my experience.
🤟 David Jayatillake talks interoperability in his new newsletter:
one piece of the stack must stay open source to stop the whole stack fragmenting: the interoperability layer/framework
🔥 Adam Boscarino of Devoted Health wrote about the team’s experiences one year into their dbt journey:
Over the last year, dbt has become a key piece of the data platform at Devoted and lived up to our wildest hopes and dreams. We have gone from a single proof-of-concept to 1,100+ models. It has fully replaced our previous internal SQL+Jinja tooling and powers almost all of our data transformations.
The meat of the post is in the innovations the Devoted Health team has made on top of dbt Core, including CLI extensions, CI/CD tooling, and a unit test framework. Adam and team—if you have any plans to share any of this code, please let me know so that I can promote it!! 🙏
Thinh Ha writes “10 reasons why you are not ready to adopt data mesh,” but it’s not as spicy as the title might make it seem. The way I read this post is: implementing data mesh thinking is a meaningful organizational shift that requires you to be ready both technically and organizationally. If you’re not truly ready to commit what is required to be successful, you shouldn’t go down the path. I agree. The thing the author doesn’t say that I also believe to be true: decentralizing ownership of data is not going to be optional over some time frame. If you’re not ready today, the questions are:
When are you going to be ready?
How much ground are you going to be losing to your competitors in the meantime?