8 Comments
Apr 25, 2022 · Liked by Tristan Handy

I think your thoughts on what-if analysis are spot on. Innovation in that area is still happening, even if it is on the fringes of the MDS ecosystem.

I have been tinkering with Summit (https://usesummit.com/).

It is an event-based simulation tool for building out forecasts. Initial assumptions can be tweaked to model out what-if scenarios.

The company positions itself as more "whiteboard" than spreadsheet (https://summit.readme.io/docs/what-it-is-why).

The app even has input hooks for feeding in metrics and output hooks for bringing the forecasts back into a database (https://summit.readme.io/docs/model-output-api), although it could certainly benefit from a tighter integration.

author

Whoah--was just checking Summit out. Seems *very* cool. Thanks for sharing!

Thanks, Ben! In terms of your wishlist, Tristan:

"A metric() function." - completely agree. Summit uses data references like so: ${recurring_revenue_202203} which injects metrics into your functions. We think of metrics as environmental variables for your model, essentially.

"A native understanding of actuals vs. forecasts." Yes, a Summit model inherently understands time, and will auto-progress to the next calendar month unless you declare a static/fixed time on the objects (events).

"A human-readable file format." Yep! Summit models can be represented as YAML (tentative), which can then be committed/diff'd/etc -- 'just like real code.' Which brings me to ...

"The ability to compile the entire model into a function ..." So we don't do the 'language of your choosing', but Summit is based on a DSL written on top of Python. It's called SEL (Summit Event Language), which means models _are_ code. By this summer, model developers will be able to publish an API specific to their model so others can treat their model as a lambda function (Summit models are idempotent, unlike spreadsheets).

Happy to share more anytime. We've been at this for about 2 years and couldn't be more aligned on the hole in the market and the vision of this post. :)

Apr 25, 2022 · Liked by Tristan Handy

Really enjoyed this post!

One piece of the imagined "great scenario analysis tool" I thought was missing was functionality to *easily combine events that have probability distributions*.

I think the statistical wherewithal to do this is surprisingly rare in our collective analysts' toolkit.

For example - let's say two teams each have "new ARR" forecasts that come with Bear/Base/Bull scenarios, and they both tell you their Bull scenarios have just a 10% chance of being met or exceeded.

What's the probability of an "Overall Bull" scenario? (I.e. that new ARR >= the sum of both teams' Bull scenarios.)

Naively we might expect it's 10%^2 = 1%, if the plans are independent. But this is not the case, since there are many ways we could get to the same overall new ARR (e.g. a big overhit in one team and a miss in the other). Plus, in reality, the plans are not independent.

I had this question and put it to an actuary friend, for whom the answer was obvious - this type of question is complicated to solve analytically, but easy to solve with a (stochastic) simulation model.

I.e. we input the Bear/Base/Bull probabilities & ARR impacts for each team's plan, define the "correlation factor" between plans if needed, run N simulations, and receive an output in the form of a probability distribution of overall new ARR exceeding certain values.
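
For the curious, here's a minimal Monte Carlo sketch of that idea in Python. All of the specifics (the ARR numbers, the assumption that each team's new ARR is roughly normal with Bull at the 90th percentile, and the correlation factor) are illustrative assumptions, not anyone's real plan:

```python
# Minimal Monte Carlo sketch for combining two teams' "new ARR" plans.
# Illustrative assumptions: each team's new ARR is roughly normal, calibrated
# so Base is the median and Bull is the 90th percentile (a 10% chance of
# being met or exceeded). All numbers below are made up.
import numpy as np

rng = np.random.default_rng(42)
n_sims = 100_000

teams = {
    "team_a": {"base": 1_000_000, "bull": 1_300_000},
    "team_b": {"base": 800_000, "bull": 1_100_000},
}
z90 = 1.2816  # standard-normal 90th percentile

means = np.array([t["base"] for t in teams.values()], dtype=float)
sds = np.array([(t["bull"] - t["base"]) / z90 for t in teams.values()])

rho = 0.3  # "correlation factor" between the two plans (0 = independent)
cov = np.array([
    [sds[0] ** 2,           rho * sds[0] * sds[1]],
    [rho * sds[0] * sds[1], sds[1] ** 2],
])

draws = rng.multivariate_normal(means, cov, size=n_sims)  # shape (n_sims, 2)
total_new_arr = draws.sum(axis=1)

overall_bull = sum(t["bull"] for t in teams.values())
p_overall_bull = (total_new_arr >= overall_bull).mean()
print(f"P(total new ARR >= combined Bull): {p_overall_bull:.1%}")
# Even with rho = 0 this comes out well above the naive 10% x 10% = 1%,
# because one team's overhit can cover the other team's miss.
```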

So would love to throw "easy simulation modelling in an analytics context" into the mix here :)

author

I love this--I've only done a bit of this type of modeling before and was curious about how to draw it into the mix, but I intentionally left it out so as not to get too far out over my skis. If you or others have links where folks can learn more about how to do this, that would be A++!

This seems like a good overview: https://www.vosesoftware.com/Monte-Carlo-simulation.php. For Excel there's also https://www.palisade.com/risk/

Thank you for the mention!

Older anecdotes from my career:

(1) When I worked at Motivate (bikeshare industry, using dbt + Looker), we first built tables and Looker models oriented towards different "parts of the business", e.g. Looker Explores like Rides, Members, Sales, etc. Eventually, we supplemented those with an "overall business metrics" model, which was less oriented towards exploring interactively in Looker and more purpose-built for data dumps that people could then paste into Excel. We jammed all of the relevant aggregated data (metrics) into a lowest-common-denominator schema, something like: {Date Period, City/Program, Metric (Label), Metric (Value)}.
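
To make that schema concrete, here's a small sketch in Python/pandas of what such a tall metrics table could look like, plus the pivot back to a spreadsheet-friendly layout; the column names and values are illustrative, not Motivate's actual schema:

```python
# Illustrative sketch of a "lowest common denominator" metrics table.
# Column names and values are made up for the example.
import pandas as pd

metrics = pd.DataFrame(
    [
        {"date_period": "2022-03", "program": "NYC",     "metric_label": "rides",   "metric_value": 120_000},
        {"date_period": "2022-03", "program": "NYC",     "metric_label": "members", "metric_value": 45_000},
        {"date_period": "2022-03", "program": "Chicago", "metric_label": "rides",   "metric_value": 80_000},
    ]
)

# Pivot to a wide layout suitable for pasting into a spreadsheet.
wide = metrics.pivot_table(
    index=["date_period", "program"],
    columns="metric_label",
    values="metric_value",
).reset_index()
print(wide)
```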

That approach was by no means perfect, but it met a lot of the needs of those "business users". Note: I also don't like that designation - I was on both sides, managing the transformation / modeling but also doing data dumps and building spreadsheets for forecasting / budgeting, etc.

(2) Earlier in my career, a company I worked at used SQL Server and an SAP product. The way I remember that working was that scheduled jobs would run and pre-aggregate data from our warehouse tables into some common hierarchical dimensional schema, and that data was available to pull into Excel via an add-in. The syntax / formulas were not so unlike those in your post!

I mention (1) just as confirmation / evidence - I agree with your take that the modern data stack helps more efficiently and elegantly explore "the facts" - "what happened" - but spreadsheets (and notebooks) are still the realm where that "what if" scenario analysis happens. The tools / tech described in (2) are interesting given your description of what a modern spreadsheet interface could look like. It's interesting to see how certain solutions did exist pre-modern-data-stack (and some of this stuff is cyclical, so maybe it will come around again). I don't have a coherent description of it, but there is probably something to be said about the shift from the legacy data stack to the modern data stack, where the practice of building conformed dimension cube tables became less necessary to produce reporting.
