Good reading, thanks. I feel instead of MDS we will be hearing more about Data Intelligence Platforms.
Thank you for this! It's time! I am rather enjoying the "companies put their heads down and focus on the fundamentals" phase.
:D :D
Interesting perspectives. I think everyone agrees that Snowflake and Databricks and, to a lesser extent, AWS, GCP and Azure have become the de facto data platforms for most companies.
I am curious to know your opinion on the direction the industry is likely to take among the following options:
a) Databricks and Snowflake buy more companies and integrate them, and customers start adopting everything from them, thereby destroying the industry and creating a Duopoly Data Stack rather than an MDS
b) The current state of needing 5-6 products in the MDS from various vendors continues as is in the majority of customer deployments
c) New startups like my current employer, The Modern Data Company, succeed in a big way by addressing shrinking budgets and skill sets with their MDS offering
d) The existing relatively successful single-product startups keep spreading their wings and start competing until the number of alternatives comes down
e) Something else?
isn't that the question! i don't think it's crazy to say that there is $100b or more in market cap that is dependent on this answer.
And the answer is?
Next up, Modern AI Stack. As we evolve toward domain-specific LLMs, we'll need a cloud stack that is optimized for LLM fine-tuning and RAG. We'll need data domains to fine-tune domain-specific LLMs. To reduce hallucinations, your data needs to be of high quality and carry semantic meaning. To reduce security risks, you need dynamic data access controls.
Business terms related to technology have a pretty short half-life. As soon as the Sales and Marketing teams get a hold of a catchy phrase, you can be sure that the definition is morphing into something less useful and descriptive.
Here is the blog from December 2023 that Rajesh mentioned where we said MDS is declining and an Intelligent Data Platform that infuses AI into MDS is rising - https://sanjmo.medium.com/unveiling-the-crystal-ball-2024-data-and-ai-trends-74164da31cf8
My big question in all this is actually around engineers / developers. Will they adopt MDS into their applications? For example - I hear rumors people are using Snowflake as a backend for full stack applications. Also - there do still seem to be some gaps in the analytics space that developers have figured out… like I wonder what the analytics version of a feature flag (like a LaunchDarkly for analytics) might be - or using IaC stuff like Terraform (I see some doing that - but not many). Will these disciplines wander together in terms of tools and ways of working or drift apart?
As a former investment banker turned software engineer, that anecdote illustrates why I changed careers 😅
What about the fact that the data stack increasingly serves critical functions beyond analytics, such as AI/ML and automation?
Ok THIS is I think the best argument _against_ everything I just said. A lot of the tech involved here powers analytics (and is primarily _bought_ for analytics use cases) but sometimes it powers non-analytics use cases. Will this be _more true in the future_? This has been predicted for 3-4 years now and it hasn't fully materialized, but I agree if it does that will require updating my priors.
Thanks, Tristan. My last job was leading the data org at an insurance company, where we put this into practice, with a lot of success. I think there's huge potential for companies with a lot of operational complexity to build automation using no-/low-code tooling on top of the global view of the business that the analytics engineers maintain, as opposed to building a lot of bespoke software to do this. This is probably especially applicable in businesses that operate at "human-scale" transaction volume, where it's OK for the system to have some latency and the volume of operations against the warehouse is reasonable.
I wish I had kept a copy of the presentation, but I did a talk at a Reuters Events conference on this topic last year.
Concept handling.. it's always in the concept handling [0]
Out with the old concepts, in with the new..
[0] https://slatestarcodex.com/2016/02/20/writing-advice/
!! love this. I have read SSC from time to time but never this post. Fantastic. Thanks for linking.
@Tristan, great post and I enjoyed reading it. I fully agree with the conclusion. Rising waves are what new businesses ride on. Lasting businesses are built sans waves. Having less attention might not be a bad thing. Like this post, Sanjeev Mohan's and my post on trends a few months back reached similar conclusions and was also based on ground evidence.
Modern Analytics Stack
What does "modern" mean? I think that's the most problematic word. At some point it meant "post-cloud" but now we're all post-cloud.
What about just "analytics stack"?
Interesting ETL to ELT comment. Many of us started doing this by 2006-7, long before moving to cloud. Also metadata-based auto code generation, before cloud vendors moved us backward to manual engineering!
Thanks for sharing. By focusing on the Analytics Stack, data teams can align better with user needs to drive analytical outcomes such as GenAI, AI and BI, assuming they have modular (easy to use) data capabilities - transform, compute, qualify etc. - to plug-n-play with. This has been the promise of cloud native capabilities, but the ease of use part still has a long way to go.
This creates three opportunities for existing or new data software companies - 1. collapse the stack to offer modular plug-n-play data capabilities, 2. focus on integrated analytics stack - GenAI, AI and BI, 3. do both 1 and 2.
What a great piece!