The history and future of the data ecosystem (w/ Lonne Jaffe)
Mainframes, relational databases, ETL, Hadoop, the cloud, and all of it
In this decades-spanning episode, Tristan talks with Lonne Jaffe, Managing Director at Insight Partners and former CEO of Syncsort (now Precisely), to trace the history of the data ecosystem, from its mainframe origins to its AI-infused future.
Lonne reflects on the evolution of ETL, the unexpected staying power of legacy tech, and why AI may finally erode the switching costs that have long protected incumbents. Their outlook for the coming era of AI and standards is bright.
Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.
Episode chapters
00:46 – Meet Lonne Jaffe: background & career journey
Lonne shares his career highlights from Insight Partners, Syncsort/Precisely, and IBM, including major acquisitions and tech focus areas.
04:20 – The origins of Syncsort & sorting in mainframes
Discussion on why sorting was a critical early problem in hierarchical databases and how early systems like IMS worked.
07:00 – M&A as innovation strategy
How Syncsort used inorganic growth to modernize its platform, including an early example of migrating data from IMS to DB2 without rewriting apps.
09:35 – Technical vs. strategic experience
Tristan probes Lonne's technical depth despite his business titles; Lonne shares his background in programming and a fun fact about juggling.
11:55 – Why this history matters
Tristan sets up the key question: what lessons from 1970s-2000s ETL tooling still shape the modern data stack?
13:00 – Proto-ETL: The real OGs
Lonne traces the origins of ETL to 1970s CDC, JCL, and early IBM tools. Prism Solutions in 1988 gets credit as the first real ETL startup.
15:40 – Rise of the ETL market (1990s)
From Prism to Informatica and DataStage: early-90s vendors brought visual development to what was once COBOL-heavy backend work.
18:00 – Why people offloaded Teradata to Hadoop
Exploring how cost, contention, and capacity drove ETL out of the warehouse and into Hadoop in the 2000s.
20:00 – Performance vs. price: Jevons Paradox in ETL
Why lower compute and storage costs led to more ETL, not less, and how parallelization changed the game.
22:30 – Evolution of data management suites
How ETL expanded into app-to-app integration, catalogs, metadata management, and why these bundles got bloated.
25:00 – Rise of data prep & self-service analytics
Tools like Kettle, Pentaho, and Tableau mirrored ETL for business users, spawning a whole "data prep" category.
27:30 – Clickstream, logs & big data chaos
How clickstream and log data changed the ETL landscape, and the hope (and letdown) of zero-copy analytics.
29:10 – Why is old software so sticky?
Tristan and Lonne explore the economics of switching costs, the illusion of freedom, and whether GenAI could break the lock-in.
33:30 – Are old tools actually… good?
Defending mainframes and 30-year-old databases like Caché. Sometimes the mature option is better, just not sexy.
36:00 – The new vs. the durable
Modern tools must prove themselves against decades of reliability and robustness in finance, healthcare, and compliance.
38:20 – GenAI in data: The early movers
Lonne highlights why companies like Atlan and dbt Labs are in the best position to win: distribution, trust, and product maturity.
41:00 – TAM and the Jevons Paradox, again
Revisiting how price drops expand TAM. Some categories vanish, others explode, depending on elasticity of demand.
43:15 – Unlocking new personas with LLMs
Structured data access for non-technical users is finally viable, but "it has to be right"; trust and quality remain the barrier.
46:00 – Real-world examples: dbt's MCP server win
Tristan shares how dbt's Metadata API became a catalog replacement for a traditional financial institution, an unplanned AI GTM success.
48:30 – Agents, not interfaces
New pattern: LLMs as agents interacting directly with infrastructure via APIs. Tool use is becoming table stakes for AI integration.
50:30 – Are LLMs birthright tools yet?
Discussion around adoption of ChatGPT Enterprise, Claude, etc. Lonne suggests adoption is accelerating fast, and the usage model matters.
52:00 – Looking ahead
The conversation ends with a reflection on GenAI's near future in data workflows, TAM expansion, and what the next episode might tackle.
Key takeaways from this episode
Tristan Handy: You've had a long career in tech. Maybe start by giving us the 30,000-foot view of what you've been up to over the last couple decades?
Lonne Jaffe: I've been at Insight Partners for about eight years now, working mostly on deep tech investments: AI infrastructure companies like Run AI and deci.ai, both acquired by Nvidia. I've also done work with data infrastructure companies like SingleStore. Before Insight, I was CEO of a portfolio company called Syncsort, now Precisely. It was founded in 1968.
Prior to that, I was at IBM for 13 years, working in middleware and mainframe technologies. Products like WebSphere, CICS, and TPF: foundational systems for enterprise computing.
Tristan Handy: And Syncsort's origin was in sorting, right? Literally sorting files?
Lonne Jaffe: Exactly. In the early days of computing, sorting was a huge part of what you did. Much of the data was hierarchical, stored in IMS, and had to be flattened into files to process. The algorithms were optimized to run in extremely resource-constrained environments.
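(For readers who have never had to treat sorting as a scarce resource: the classic technique these early sort utilities were built around is external merge sort. The sketch below is purely illustrative Python, not Syncsort's actual approach; it sorts a file that doesn't fit in memory by sorting bounded chunks, spilling them to disk as runs, and then merging the runs.)

```python
import heapq
import itertools
import os
import tempfile

def external_sort(input_path, output_path, lines_per_chunk=100_000):
    """Sort a text file too large for memory: sort bounded chunks,
    spill each to a temporary "run" file, then k-way merge the runs."""
    run_paths = []
    with open(input_path) as src:
        while True:
            chunk = list(itertools.islice(src, lines_per_chunk))
            if not chunk:
                break
            chunk.sort()  # only this bounded chunk is ever held in memory
            with tempfile.NamedTemporaryFile("w", delete=False, suffix=".run") as run:
                run.writelines(chunk)
                run_paths.append(run.name)

    # heapq.merge streams the sorted runs, holding only one line per run in memory.
    runs = [open(p) for p in run_paths]
    try:
        with open(output_path, "w") as out:
            out.writelines(heapq.merge(*runs))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```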
Tristan Handy: Fascinating. And I assume as compute and storage improved, the data integration landscape evolved?
Lonne Jaffe: Yes. We saw a move from hierarchical to relational databases, then toward ETL tools in the 80s and 90s. The first real ETL startup was probably Prism Solutions in 1988. Informatica and DataStage showed up in the early 90s, followed by Talend and others.
Tristan Handy: It seems like we got a whole bundle of tools over timeâETL, CDC, app integration, metadata, and so on.
Lonne Jaffe: Yes, often bundled together, even though data prep and app integration were treated separately. That persisted for longer than you'd expect. At Syncsort, we acquired a company with a "transparency" solution that allowed IMS applications to use data stored in DB2 without rewriting code, a clever way to manage switching costs.
Tristan Handy: Speaking of switching costs: why are these legacy tools so sticky?
Lonne Jaffe: Great question. In many cases, no customer loves the product. They'd switch in a heartbeat if it were easy. But rewriting jobs and ensuring reliability is a heavy lift, and the best-case outcome is a new system that merely replicates the old functionality. For many organizations, that's not worth the risk.
Tristan Handy: But if generative AI could reduce those switching costs?
Lonne Jaffe: That's the potential. Code generation, agents that explore and iterate: those could erode the moat that's protected these incumbents for decades. Not tomorrow, but it's a real possibility.
Tristan Handy: It also seems like some of these systems are more robust than people give them credit for.
Lonne Jaffe: Absolutely. Mainframes are I/O supercomputers. Products like InterSystems Caché, used by Epic, are incredibly performant. But new systems must match or exceed those capabilities in reliability and scale, which is a high bar.
Tristan Handy: As you look at the evolution of the modern data stack, how do you think about its impact on the market?
Lonne Jaffe: In the 2010s, we saw disaggregation: tools like Fivetran, dbt, and Snowflake each tackled a slice of the old enterprise bundle. But the TAM isn't infinite. Some categories may compress or vanish entirely if price drops aren't offset by new demand.
Tristan Handy: Do you think AI expands or compresses the data stack?
Lonne Jaffe: It depends. High elasticity of demand, as with dashboards or analytics, can drive massive TAM expansion. But some categories, like logo redesign or simple data movement, might get commoditized. For more complex workflows, AI agents accessing platforms like dbt or Atlan could dramatically increase value by automating common tasks and enabling new personas.
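(A quick way to see why elasticity is the hinge here: under a simple constant-elasticity demand curve, total spend scales with price^(1 + elasticity). The numbers below are hypothetical, a back-of-the-envelope illustration rather than anything from the conversation.)

```python
def spend_multiplier(price_ratio: float, elasticity: float) -> float:
    """Total spend multiplier under a constant-elasticity demand curve.

    Quantity scales with price**elasticity, so spend (price * quantity)
    scales with price**(1 + elasticity). price_ratio is new_price / old_price.
    """
    return price_ratio ** (1 + elasticity)

# Suppose compute/storage prices fall by half (price_ratio = 0.5).
# Elastic demand (|e| > 1): cheaper pipelines mean many more pipelines; spend doubles.
print(spend_multiplier(0.5, elasticity=-2.0))   # 2.0
# Inelastic demand (|e| < 1): the price drop isn't offset; the category shrinks ~29%.
print(spend_multiplier(0.5, elasticity=-0.5))   # ~0.71
```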
Tristan Handy: We've seen an example already: a customer replaced their data catalog with our dbt Cloud metadata server and AI interface.
Lonne Jaffe: That's telling. If AI interfaces can connect to tools like dbt and generate value (self-service, documentation, lineage), it changes the game. Especially for organizations already standardized on those platforms.
Tristan Handy: What's your view on how these AI interfaces get distributed?
Lonne Jaffe: ChatGPT Enterprise, Claude, and others are spreading fast. Eventually, you'll want those tools to search files, access internal metadata, and interact with your data stack, not just answer questions from the open web.
Tristan Handy: It makes a lot of sense. If AI is going to serve enterprise users, it needs access to the real data. Otherwise, it's just a toy.
Lonne Jaffe: Exactly. A model that can't query or verify against your actual environment won't be reliable. And data quality and observability, something dbt Cloud is already good at, become foundational.
This newsletter is sponsored by dbt Labs. Discover why more than 50,000 companies use dbt to accelerate their data development.