There is no right way to build a data team

Also in this issue: what your SQL tells you about your organization, confusing motion and progress, and what software engineers can learn from data humans

Jan 08, 2023

It’s January, a new year, and hopefully everyone is feeling well rested after the holiday season and ready to take on some spicy data topics!

This week I’m covering:

SQL tells a human story, by Laura Ellis
Don’t confuse motion with progress, by Sean Byrnes
Should software teams start learning from analytics engineers?, by Petr Janda

But first, a quick think piece inspired by the joyful season of annual planning 🤓

-Anna

There is no right way to build a data team

We spend a fair bit of time in this newsletter covering conversations about running your data team ‘the right way’: like a product team, perhaps, or based on constraints, or maybe even in full-on proactive mode driving the business forward.

What if everybody is right? Even the humans to whom we are making these arguments?

Anna’s theory of everything #267 (no, not really. I’m not counting. Are you?)

The design of your data team today is a reflection of the current challenges of your business. And that thing that you feel you absolutely must do to evolve your data team and be successful — it’s a window into the next phase of growth your organization is heading towards.

To illustrate my point, I’m going to use a tidy management framework from 1972¹ by Larry E. Greiner:

The essence of Greiner’s argument isn’t that there is a ‘right way’ to grow an organization at a certain size. It is that no approach to growth is immune from an eventual organizational crisis, and you can largely predict the upcoming crisis based on observing the present philosophy the organization is adopting as it grows. This happens to be quite handy for data team leaders ;)

Let’s pick a mid-stage example that might sound familiar to this audience: delegation/control.

Stage of Growth: delegation.

In this phase, a hypothetical organization (let’s call them the Prata Partners) has just emerged from a crisis of autonomy precipitated by the inability of a small number of organization leaders to centrally manage the priorities of a fast growing workforce. The organization has successfully figured out how to delegate decision making to humans who are closer to the problem (either in a specific regional market, or within a department). Budgets are being assigned at this delegated level, and as long as the numbers work and the divisions are reporting success, things feel like they’re going pretty well.

❓How might you deploy a data team within the organization of our friends, the Prata Partners?

💡 You’re probably going to converge on some form of embedded team structure, with each division giving up part of its budget so that you can offer it an in-house data expert in exchange, who helps accelerate that part of the business. “No budget, no data” might become your mantra for the foreseeable future.

❓What problems are likely to pop up for you, the data function leader, if you take this approach?

You’re likely going to find it more difficult to invest in platform improvements that will benefit everyone, but are unlikely to be funded by any one division alone.
If you didn’t succeed in convincing all divisions to give you budget to hire data humans, you’ll likely see some divisions spinning up their own data resources, systems and tooling.
Data sprawl will start to propagate across the organization no matter what you do: multiple variations of metrics will abound, different frameworks will be used, results will be stored in random places and formats. Most annoyingly for you, the trained analytics engineer turned data leader, the bringer of order to chaos… divisions will be solving the same problems independently of one another, over and over again, unaware that others are asking the same questions and facing the same challenges.

Upcoming crisis stage: control

Now, check out what Mr. Greiner has to say about the next crisis stage our Prata Partners are about to face:

A serious problem eventually emerges, however, as top-level executives sense that they are losing control over a highly diversified field operation. Autonomous field managers prefer to run their own shows without coordinating plans, money, technology, and personnel with the rest of the organization. Freedom breeds a parochial attitude.

Sound familiar? :)

The “So what?”

It can be easy for data humans to forget that the problems we see in the way the organization uses data are mirror reflections of the broader challenges the organization is going through, or about to go through.

The really cool thing about being a data leader is that no matter how your team is organized, you are usually able to observe a large surface area of an organization through the experiences of your team. You’ll likely be one of the first humans to see across the divisions of Prata Partners and recognize the common threads in the challenges those divisions are going through.

This same position can also make a data leader feel like they’re swimming upstream because the things they are asking for and need to be successful (e.g. in the case of the Prata Partners data leader, some central platform resources), and the things that are most challenging for their teams (information sprawl and lack of coordination), are the very first taste of the next crisis stage the whole organization is going to go through. Put another way — the data team is the perpetual canary in your organization’s coal mine.

The bad version of this experience is a data team that is spending more of their time in crisis mode than the rest of the business, because they’re the first to feel its effects. If you have high turnover on your data team, it’s possible that perpetual crisis mode is what it feels like to be on the ground on that team.

The good news

There is, however, a good version of this. If you pay attention to what your data team is telling you, you’ll find this a fantastic signal for what’s coming up ahead for the rest of the business. And if you respond quickly with needed changes, it is a solid investment in making sure your data team is one step ahead of the problem and your resident chaos busters are walking in step with you to solve it when it does come up for the rest of the business.


Elsewhere on the internet…

Continuing with the theme of sociotechnical systems in this issue, Laura Ellis shows us how those scary 1,000-line SQL scripts that no one in the organization wants to mess with are really a reflection of broken organizational processes or communication elsewhere.

The one that most hit home for me: manually filtering out records that are “wrong” because they mess with an important metric or report.

What was the human exchange that made these lines of code come to be? How were these erroneous records discovered? Was there a reason that the system team did not clean the records at the source? Did the data team feel empowered to ask the system team to clean the records up?

Right in the feels, this one.

In Don’t Confuse Motion with Progress, Sean Byrnes gives one of the most real and accessible descriptions of what it means to drive impact in a business, regardless of your business function. This should be required reading in 2023.

It is never a good time to fall into the trap of thinking that being busy = making forward progress. But this year is likely to be particularly unforgiving of anyone stuck in this place for too long.

In times of greater economic pressure, it can be hard to know how much is enough, and when to stop. For this reason, I’m a big fan of career ladders that distinguish between expected behaviors at a given career level and the expected level of impact. It is immensely helpful to negotiate these expectations up front with your team as a manager, and with your manager as a direct report. Ask yourself, “How does our team/function evaluate our impact? What direct ways do we impact the business?” and push yourself to measure them in 2023.

It isn’t possible to always be successful at making forward progress with every bet one makes, or every project/program one works on. But make 2023 the year to get in the habit of evaluating how well you’re doing at making an impact on the business, if you want to become more consistent at making an impact over time.

Petr Janda shares some engineering a-ha moments from his new startup that came up because of his data background, and the resulting identity crisis that ensued. The a-ha moment involved adding data runtime tests to operational systems to improve reliability of the product.

When using analytics engineering techniques in software, I started wondering what job I was doing. Was it software, data, or analytics engineering?

A very banal version of this came up for me recently as well: I found myself explaining CTEs to a software engineer. Turns out, CTEs aren’t common practice in OLTP-style SQL writing, which makes a ton of sense, since you’re operating on one or a few records at a time at most. In that setting, it’s generally a bad idea to encode too much logic into a single query.

In analytics, however, CTEs are invaluable to simplify the necessary logic that goes into shaping your data models. When teaching folks with a software engineering background how to read, understand and modify analytics code, I start with the CTE. I’m always pleasantly surprised by how quickly folks who already know how to code pick up on the basic principles — “this with statement, that’s like defining a variable?” Yes! “The select * from table thing you do at the end of every data model, is that so you can easily print out any given CTE and debug it without rewriting the rest of the query?” Also yes!
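For readers who haven’t seen this pattern, here is a minimal sketch of what I mean, wrapped in Python’s built-in sqlite3 module so you can run it anywhere. Everything in it is made up for illustration: the orders table, its columns, the CTE names and the run_model helper are all assumptions, not anyone’s real data model. Each CTE behaves a bit like a named variable, and because the model ends in a bare select * from a single name, you can point that last line at any earlier CTE to inspect it.

```python
import sqlite3

# Hypothetical in-memory table standing in for a raw source table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders "
    "(order_id integer, customer_id integer, amount real, status text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, 10, 25.0, "completed"),
        (2, 10, 15.0, "completed"),
        (3, 11, 40.0, "cancelled"),
        (4, 12, 30.0, "completed"),
    ],
)

# Each CTE is a named, reusable step, much like a variable.
# The final select is the only line that changes when debugging.
MODEL_SQL = """
with

completed_orders as (
    select order_id, customer_id, amount
    from orders
    where status = 'completed'
),

customer_totals as (
    select
        customer_id,
        count(*) as order_count,
        sum(amount) as total_amount
    from completed_orders
    group by customer_id
),

final as (
    select customer_id, order_count, total_amount
    from customer_totals
)

select * from {cte} order by 1
"""

# Only these names may be substituted into the final select.
KNOWN_CTES = {"completed_orders", "customer_totals", "final"}


def run_model(cte="final"):
    """Run the model, selecting from the named CTE (defaults to the last one)."""
    if cte not in KNOWN_CTES:
        raise ValueError(f"unknown CTE: {cte}")
    return conn.execute(MODEL_SQL.format(cte=cte)).fetchall()


print(run_model())                    # the finished model
print(run_model("completed_orders"))  # peek at an intermediate step
```

Calling run_model("completed_orders") is the “print out any given CTE” trick: the rest of the query stays untouched, and only the name in the last line changes.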

There will be much more practices, processes, tools, and knowledge to share between teams building operational and analytics systems than we might realize today. Especially as practices around the engineering of large analytics systems mature, we should expect and be open to learnings that go both ways.
In the end, it’s all just engineering.

Hear hear!

¹ Before you balk at the age of the piece, consider its enduring relevance: Greiner’s descriptions of organizational growth and crisis phases need relatively minor modification for work practices, cultural standards and technology today.
