Apr 25, 2022Liked by Tristan Handy

I think your thoughts on what-if analysis are spot on. Innovation in that area is still happening, even if it is on the fringes of the MDS ecosystem.

I have been tinkering with Summit (https://usesummit.com/)?

It is an event-based simulation tool for building out forecasts. Initial assumptions can be tweaked to model out what-if scenarios.

The company positions itself as more "whiteboard" over spreadsheet (https://summit.readme.io/docs/what-it-is-why).

The app even has input hooks for feeding in metrics and output hooks for bringing the forecasts back into a database (https://summit.readme.io/docs/model-output-api), although it could certainly benefit from a tighter integration.

Expand full comment
Apr 25, 2022Liked by Tristan Handy

Really enjoyed this post!

One piece of the imagined "great scenario analysis tool" I thought was missing was functionality to *easily combine events that have probability distributions*.

I think that the statistical wherewithal to do this is surprisingly rare in our collective analyst's toolkit.

For example - let's say two teams each have "new ARR" forecasts that come with Bear/Base/Bull scenarios, and they both tell you their Bull scenarios have just a 10% chance of being met or exceeded.

What's the probability of an "Overall Bull" scenario? (I.e. that new ARR >= the sum of both teams' Bull scenarios.)

Naively we might expect it's 10%^2, if the plans are independent. But this is not the case, since there are many ways we could get to the same overall new ARR. (E.g. a big overhit in one team, and a miss in the other). Plus in reality, the plans are not independent.

I had this question and put it to an actuary friend, for whom the answer was obvious - this type of question is complicated to solve analytically, but easy to solve with a (stochastic) simulation model.

I.e. we input the Bear/Base/Bull probabilities & ARR impacts for each team's plan, define the "correlation factor" between plans if needed, run N simulations, and receive an output in the form of a probability distribution of overall new ARR exceeding certain values.

So would love to throw "easy simulation modelling in an analytics context" into the mix here :)

Expand full comment

Thank you for the mention!

Expand full comment

Older anecdotes from my career:

(1) When I worked at Motivate (Bikeshare industry - using DBT + Looker) - first we built Tables and Looker models oriented towards different "parts of the business", e.g. Looker Explores like: Rides, Members, Sales, etc. Eventually, we supplemented those with an "overall business metrics" model, which was less oriented towards exploring interactively in Looker, and more purpose built for data dumps that people could then paste into Excel. We jammed all of the relevant aggregated data (metrics) into a lowest common denominator schema, something like: {Date Period, City/Program, Metric (Label), Metric (Value)}.

This was by no means perfect, but met a lot of the needs of those "business users". Note: I also don't like that designation - I was on both sides, both managing the transformation / modeling, but also doing data dumps and building spreadsheets for forecast / budgeting etc.

(2) Earlier in my career a company I worked at used SQL Server and an SAP Product. The way I remember that working was that scheduled jobs would run and pre-aggregate data from our warehouse tables into some common hierarchical dimensional schema, and that data was available to pull into Excel via an Add-In. The syntax / formulas were no so unlike those in your post!

I mention (1) just as confirmation / evidence - I agree with your take that the modern data stack helps more efficiently and elegantly explore "the facts" - "what happened" - but spreadsheets (and notebooks) are still the realm where that "what if" scenario analysis happens. The tools / tech described in (2) are interesting because of your description for what a modern spreadsheet interface could look like. It's interesting to see how certain solutions did exist pre-modern-data-stack (and some of this stuff is cyclical, maybe it will come around again). I don't have a coherent description of it, but there is probably something to be said about the shift from legacy data stack vs. modern data stack, where the practice of building conformed dimension cube tables became less necessary to produce reporting.

Expand full comment