We need to talk about the data analyst
Featuring guest author: Erica Louie! Also in this issue: the technical pay gap, phasing out the SQL interview and the impact of increasing gun violence.
We’ve talked before in this newsletter about amplifying more practitioner voices. As promised a couple of weeks ago, today’s issue is brought to you by our very own Head of Data — Erica Louie!
And speaking of practitioner voices — have you submitted your Coalesce pitch yet? :) Proposals are due this Tuesday, the 31st. Tuesday is also the last day to grab early bird tickets for the in-person experience.
Without further ado, enjoy the issue!
Hi, I’m Erica “ric/riccy” Louie, Head of Data at dbt Labs. Two weeks ago, I wrote what I presumed was an uncontroversial tweet regarding the importance of the Data Analyst. A dozen notifications and multiple attempts to unbundle handfuls of nested threads later, it became apparent that the “fall” of the data (scientist) analyst in the eyes of the data community as the World’s Sexiest Job (and yet, also the most boring person) was the tip of a very complex and interwoven iceberg.
With the rise of the analytics engineer, a shadow was cast over the data analyst. We can argue that job titles are hand-wavy: just do the thing and give yourself whatever title you want on LinkedIn. But titles give us structure; they offer us a direct path into our responsibilities, how these responsibilities impact the company, and, of course, how much we’re paid.
In this week’s issue, we’re diving into why we should embrace and value the data analyst with the same vigor and vivacity as we have the analytics engineer, told through the lens of my experience building the data team at dbt Labs.
Also in this issue:
The technical pay gap by Benn Stancil
We should phase the SQL interview out by Randy Au
My first hires on the dbt Labs data team were two analytics engineers, one embedded on Product and the other on Marketing. Their first 6 months were spent building the data foundation for their respective teams. We needed dbt models and high-level dashboards that didn’t exist before.
However, we reached a snag.
Now that we had a strong foundation of dbt models, our stakeholders (understandably) wanted to dive into bigger questions and analyses. We were attempting to build new infrastructure to support new initiatives or new entities in the business while also balancing analysis/investigation work. I’ll never forget one of the analytics engineers on my team saying in passing:
“I thought analytics engineers’ responsibilities were around infrastructure and some analysis. This deeper investigation work feels like a dedicated analyst would be more effective at it.”
I was building the team under the assumption that an analytics engineer should have both infrastructure and analytical skills. I still believe that to be true to some degree today, but I also learned that an analytics engineer’s responsibilities cannot realistically be both building and maintaining infrastructure while also running forecast analyses and A/B testing.
Now that the internal data team at dbt Labs is a year old, it’s become increasingly difficult to continue maintaining/building net-new data infrastructure (e.g. new dbt models, an experimentation program, the data activation layer) while also having time to dive deeper into analyses from the infrastructure we’ve amassed over the year. And if a data team cannot draw insights from the models they’ve created, then they aren’t contributing value to the company.
This is a tale as old as time: the more questions you can answer with data, the (exponentially) more questions you receive in return, which are often more complex and harder to pin down an answer to. We’re a team of analytics engineers; we’re great at building modular and scalable infrastructure, but often struggle with complex analyses. As time went on, we realized how vital a data analyst is to fill this gap.
Collaboration between roles
Our data team is semi-decentralized. This means we’re a single team, but team members are embedded in dedicated business functions. Just as we have an analytics engineer on core business functions, we’re adding a data analyst as an investigatory/analytical partner. We’re taking a bet that the data analyst and the analytics engineer should have a symbiotic working relationship where they can both utilize each other’s strengths. Below is an example of what this looks like in practice:
Scenario: Marketing is working on an initiative to bring in new campaigns and they want to build reporting to highlight key metrics while iterating on their campaign strategies.
All stakeholders will work in an async doc + live conversations to discuss the scope of the work, the metrics they care about, the wireframe of the dashboard, and what actions they want to take with insights drawn from launched campaigns.
Both the analytics engineer and data analyst will play a role in suggesting/reviewing performance metrics, flagging any caveats with available data, and scoping the project within one-week sprints.
The analytics engineer will write the dbt models and perhaps backend code in the BI layer, along with documenting new models and writing up the analytical framework if needed. They will note any caveats of the data throughout the process and consult their fellow data analyst.
The data analyst will create the dashboard, go deep on the analysis, and write up the explanatory narrative. They will run an analysis on the campaign performance metrics while consulting their fellow analytics engineer. And even after the initial analysis delivery, the data analyst will continue to track the metrics in the campaign dashboard.
Note: All deliverables should have mutual sign-off. This means reviewing PRs and signing off on the explanatory narrative post-dashboard/analysis delivery.
Both are equally important roles. Both are impactful in their own right. And both rely on the other.
Living in the shadow of Silicon Valley’s suffixes
As I interview folks for analytics engineer roles, candidates are often data analysts who feel that, in order to get a sizable pay increase, they should switch roles. But when I ask them, “What do you enjoy the most about your role?” their responses usually align with:
Working directly with stakeholders
Leading an analysis that leads to a shift in a team’s strategy
Forging into the unknown with data and answering big problems
I often worry that we’ve put analytics engineering on a pedestal and, via the law of equivalent exchange, begun undervaluing the data analyst. Benn Stancil’s article, The technical pay gap, touches on a more blatantly controversial topic: data analysts should actually be paid more than analytics engineers.
The work analysts do, especially the non-technical, interpersonal parts, is valuable and exhausting. If the natural tilt of Silicon Valley encourages us to pay analytics engineers more, we’ll pull analysts, including those who are uniquely talented in that role, to move into a different one for higher pay and more prestige.
As mentioned in the beginning, the fall of the data analyst is the tip of a very complex and interwoven iceberg within the data industry. Some of the issues below the surface that come to mind (though there are surely many more):
Inconsistent standardization of titles that muddy the responsibilities of the roles, the skills required, and career ladders
The tech industry’s trend to value technical skills (i.e. the “engineer” suffix) over valuing impact
Data teams often struggle to measure impact. If we decide to weigh salaries on the role’s impact on the company, how do we measure that?
Our team is welcoming our first data analyst in two weeks and I’m looking forward to sharing how this partnership unfolds, including the wins, the pitfalls, and what we’ve learned :)
Other morning coffee readings ☕
We should phase the SQL interview out
by Randy Au
In this article, Randy raises an interesting argument: we should phase out the SQL interview and replace it with a task centered around relational data. He argues that SQL dialects vary (e.g. the subtle but annoying differences of date_trunc('datepart', field) vs date_trunc(field, datepart), or window function syntaxes: to ignore null or to not ignore null) and that if candidates are asked to live code, they spend more time translating between dialects than getting to the value of the task itself. As long as they understand how to work with relational data, they can easily learn SQL.
The skill that I think we’re actually looking for when testing with SQL is “the ability to work with relational data”. We want to see data sets joined together, rows filtered and aggregated. We’d like to see some understanding about designing tables and normalizing data to reduce (unnecessary) duplication of information. Maybe if you need to check for deeper understanding, you’d like to see more complex things like self joins, joins using inequalities.
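To make “working with relational data” concrete, here is a minimal sketch of the kind of task Randy describes: two tables joined, rows filtered, and results aggregated. It is an illustration only, not something from Randy’s article. The campaigns/signups tables, their columns, and the sample rows are all hypothetical, and it runs on SQLite through Python’s built-in sqlite3 module. SQLite has no date_trunc at all, so the month bucket uses strftime instead, which is one more small example of the dialect differences mentioned above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE signups (
            id INTEGER PRIMARY KEY,
            campaign_id INTEGER NOT NULL REFERENCES campaigns(id),
            signed_up_at TEXT NOT NULL,   -- ISO date string
            confirmed INTEGER NOT NULL    -- 1 = confirmed, 0 = not confirmed
        );
        -- Hypothetical sample rows, for illustration only.
        INSERT INTO campaigns VALUES (1, 'webinar'), (2, 'newsletter');
        INSERT INTO signups VALUES
            (1, 1, '2022-04-03', 1),
            (2, 1, '2022-04-19', 1),
            (3, 1, '2022-05-02', 0),
            (4, 2, '2022-05-11', 1),
            (5, 2, '2022-05-27', 1);
        """
    )
    return conn


def monthly_signups_by_campaign(conn):
    """Join, filter, and aggregate: confirmed signups per campaign per month."""
    query = """
        SELECT c.name AS campaign,
               strftime('%Y-%m', s.signed_up_at) AS signup_month,
               COUNT(*) AS signups
        FROM signups AS s
        JOIN campaigns AS c ON c.id = s.campaign_id   -- join the two data sets
        WHERE s.confirmed = 1                         -- filter rows
        GROUP BY c.name, signup_month                 -- aggregate
        ORDER BY c.name, signup_month
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in monthly_signups_by_campaign(build_demo_db()):
        print(row)
```

A candidate who can reason their way to the join, the filter, and the grouping here has shown the underlying skill, whichever dialect’s date function they happen to remember.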
When we began hiring our first data analyst, I think the hardest part was determining what the technical assessment should consist of. Should they know SQL? Python? R? What if they’re able to effectively forecast in Excel?
I believe writing (and publicly posting) the 30/60/90 and long-term success for the role makes a huge difference when hiring. You can ask yourself: what do they need to accomplish within this time period, and what technical skills will they have to be comfortable with to accomplish these goals?
I’m very excited about this new series by Bobby Pinero and the Equals team. We’re often caught up in reading opinions or playbooks on how we utilize platforms to solve data problems, but rarely do we hear about the journey (i.e. the business problems and context, the process, the pitfalls) and the impact of the solutions. If data practitioners are storytellers, then why don’t we talk about our team’s journeys and decisions in the form of stories?
To take a snippet from the first episode with Ray Ko:
Ray started our chat with a simple story. During his time at Facebook, he and his team woke to find that account creations had plummeted. …After diving into the data they found the cause: email confirmations. Users signed up, but their confirmation email never arrived so their account was never fully created. It turned out that a few major email providers started throttling Facebook, causing a significant number of emails to never arrive…They fixed that specific issue and sign-ups returned to normal, but this helped Ray and team understand something they’d never considered - the importance of email delivery to Facebook’s growth.
When I read an article, I often crave the thought process and chronological series of events. I want to hear about the context, the problems, and ideal solutions agnostic of tooling. In the end, we’re all trying to solve business problems as quickly as possible. And while following playbooks and tooling helps expedite the process, learning how others approach these problems (e.g. their thinking process) is arguably more advantageous and offers a chance for more innovation beyond available tooling.
Petr’s description of the data team at the beginning of the article closely mirrored how the data team at dbt Labs is growing (next time just @ us, Petr!). What I love about this article is how universal this problem is when scaling data and how well the outcome lends itself to self-service analytics. In his proposed structure, Petr builds off of Stephen Bailey’s Towards Modular Data Products Coalesce 2021 talk.
If we want to follow the Data-as-a-Product principle, Petr argues, we should remove the noise of defining our data architecture around technologies and define it around organizational structure instead. That way, data and non-data folks can easily identify which models are most relevant to their business function.
This view allows the…team to overview their data product, regardless of technology layers (indicated in dotted lines within their data product). They could get more directed alerts, usage statistics, explicit dependencies, and the ability to hide the details
The data team at dbt Labs has been heads down this quarter thinking about entities, how they connect to each other, and, most importantly, how we represent and communicate the way these intersect and what is relevant to each team. I’m excited to hear more conversations around the semantic layer and how the data community continues to develop the ways we can better serve our stakeholders :)
Amidst the heartbreaking news in the past few days, I feel silly sitting at my desk writing this roundup when there are bigger and significantly more important conversations to be had. As we use data to tell the stories of our business with charts to evoke a feeling, I hope the charts in this article will do the same. They’re humbling, saddening, and incredibly frustrating.
Nikk references an article from the New England Journal of Medicine that examined the leading causes of death among children in the United States:
Between 2000 and 2020, the number of firearm-related deaths among children, adolescents, and young adults increased from 6998 (7.30 per 100,000 persons) to 10,186 (10.28 per 100,000 persons)…In 2000, motor vehicle–related injuries resulted in 13,049 deaths among young people (13.62 per 100,000 persons). Twenty years later, there has been a nearly 40% decrease, with 8234 motor vehicle traffic deaths (8.31 per 100,000 persons) recorded in 2020.