Discussion about this post

Alex

Hi Tristan. Interesting post and I am happy to see practitioners trying to tackle the full lifecycle of analytics.

There are many good parts to this, and for people new to the space, your piece summarizes many important aspects of the modern data stack. Since it is already a good preliminary summary, I will instead add some constructive criticism on a handful of areas where I think more explanation could be useful.

In the `Intro`, you mention four types of truth claims: descriptive, causal, predictive and prescriptive. What I think is important to note, and to spend more time on at some point (though today it evades much of the modern data stack), is the correlative nature of the last three. Causation (“if this occurs, then that will occur”), prediction (“given this, we will observe that”) and prescription (“if I do this, then that will occur”) are all correlative in nature. In particular - and in contrast to univariate descriptive statistics (“reporting”) - we are looking at the relationship between two or more variables. We know the answer to this lies in statistics/data science/machine learning/causal inference, but your piece does not really mention these.

When I think of “analytics”, I think of developing an understanding of the data. Certainly, this includes univariate, descriptive reporting. Business intelligence. Dashboards. This is where dbt shines. But I also think of “if-then” causality. “Why did this go up?” “Will that go up if we do this?” Many, many analytical questions are attempts to infer causality (though, in practice, this is quite difficult, so we settle for correlation, and thus the vast majority of statistics) - in other words, how a system works. What levers affect which outputs. As data practitioners, we are stuck with the data, but the analytical nature of our job is to statistically infer the “data generating process” (DGP) - that is, the shape of the system itself.

As a result, I think it’s a stretch to reach for the mantle of “analytics” when, so far, we’re really just talking about reporting. Displaying data is admittedly, in its own right, very difficult to do correctly/reliably/durably/comprehensibly/“auditably”/performantly/resiliently, and I think you emphasize more or less the right points here.

In your `Requirements of a mature analytics workflow`, some additional requirements - which maybe you are placing into existing categories - could include: Durability (what if someone deletes the staging table in Snowflake?), Semantics (Governance generally includes this, though your description mostly focuses instead on regulatory compliance), Discoverability (also under Governance), and Monitoring/Observability (perhaps this falls into Reliability, though monitoring is such an important space - Datadog, Splunk, New Relic, etc. - that I believe the keyword deserves a mention).

In the `Stakeholders of the ADLC`, I mostly agree with these, though I do think there is a “researcher” persona which is different from the others. Perhaps you are placing it under the analyst - which is reasonable - though in practice the skills of a researcher (statistics, econometrics, data science) overlap hardly at all with those of the typical data/business analyst, who instead needs business context, dashboard skills, PowerPoint/Excel, and maybe a bit of pandas. For the 3 roles as you have them, I personally differentiate them by what they are trying to accomplish: to assemble the data, to understand it, and finally, to act on it.

In the `Hats, not badges` section, the phrase that came to mind was “full-stack analytics” - i.e. we ought not to be strictly confined to our title’s narrow domain.

In the `ADLC model` section, I don’t necessarily agree that “analytical systems are software systems.” PowerPoint for example is an integral part of how we communicate our analyses to stakeholders. It is absolutely the case that an “analysis” today is memorialized in a 10-slide deck, and frankly, I think this is the best output for it. Perhaps it is debatable whether that should be the case, but I think it is unobjectionable that most analyses today are conveyed through presentation format. In fact, one distinction I find between a dashboard (live) and a PowerPoint (static) is precisely due to the fact that a PowerPoint is static: we are taking a stance (snapshot) at a particular point of time (of the data) and making a recommendation (today) because of it. Looking at a live dashboard does not achieve this; the narrative nature of a PowerPoint does.

Exploratory data analysis, in my experience, is also not really a software system. I fully anticipate that 90% of our research is throwaway work. So it is with all science. Of course the last 10% we do indeed memorialize as code and check into VCS, but the other 90% is not, and it does not adhere to most software principles. I often differentiate between “research code” and “software code”, and research code exists because it serves a useful, albeit transitory, purpose. I would say that “reporting systems” or “BI systems” are indeed software systems, but extending that claim to analytical systems more generally seems like a stretch.

In the `Discover and Analyze` section, you discuss `Discover` in detail (mostly a lot of governance items) but I feel like the `Analyze` section is lacking.

Overall, I think it is a good introductory post, although I imagine a comprehensive explanation of the “Analytics Development Lifecycle” would likely necessitate more of a book-length review. I look forward to reading more!

lewis34

Announcing the Analytics Development Lifecycle (ADLC), a structured approach to building and managing data solutions. For <a href="https://busimulatorultimate.com/">Bus Simulator Ultimate</a> Mod Apk, ADLC can streamline the process of collecting, analyzing, and visualizing gameplay data. It helps developers optimize features, track player behavior, and enhance the game by following a clear, iterative analytics workflow.

