Ep 50: Roche’s Data Transformation Journey (w/ Yannick Misteli)
Moving to cloud to build better data infrastructure
[Join data practitioners worldwide at Coalesce 2023—built by data people, for data people. Register today.]
Yannick Misteli is the head of engineering for the go-to-market domain at Roche, a $250 billion multinational pharmaceutical and diagnostics company.
Roche was an early supporter of dbt Cloud, and Yannick helped move his team of 120+ engineers to a modern data stack. He always finds a way to push the boundaries to make a large company founded in 1896 incredibly modern and innovative. We wanted to know more about the "how" of the work—the people, process, and technology.
Read more about Roche's data journey here: https://docs.getdbt.com/blog/dbt-squared
Key takeaways from this episode:
What was the starting point of Roche and the go-to-market data stack when you showed up and where are you trying to head to?
Yannick: When I first started, I realized very quickly that we didn't have the infrastructure that was suitable to do all this cool stuff that we wanted to do. I think it was also a good time because there was an initial push to start doing things in the cloud. For me, it was really nice because I could bring in all the experience that I already had in the cloud so we could start to leverage that. What happened, as a result, was that we leveraged use cases as a driving force to construct the appropriate data and analytics platform.
We almost took it as an excuse to build the platform. Equally important, we also recognized the necessity of demonstrating business value.
Julia: Did you have a cloud data warehouse three and a half years ago?
Yannick: We started building that from scratch. For me, it was important to come up with a good strategy to start to solve those use cases, but at the same time build something that could scale.
That's also what the cloud is built for: to scale. That's what we did. To be honest, the ultimate objective was to establish a robust engineering foundation. It's having this foundation and also the people. It's like building a strong muscle that allows you to construct virtually everything.
It's not only the use cases or the platform, it's to build out that engineering muscle to solve for the future use cases that you want to solve.
Julia: What was the hardest part of moving your data to the cloud? Was it getting funded?
Yannick: Technically speaking, the toughest part was dealing with both the new systems that are already in the cloud and the legacy systems that are not, and figuring out how to bring them together.
This was the hardest part. We needed to find some shortcuts to get the data from legacy systems. I would say that was probably not so easy. Funding is something that will come naturally, which ties back to what I said before.
It's very important that you can continuously demonstrate the added value by picking the right use cases at the right time.
How do you structure your data team at Roche?
When I started to build out the team, I interviewed the first 40 engineers myself, and I paid a lot of attention to who I brought on board and who fit in. We have since grown to 150 engineers in the metrics organization.
On one side, we have the data engineers. Those guys are only responsible for bringing the data to the platform. They're very good with CDK, CloudFormation, and JDBC APIs. Their only job is to bring the data in. They don't necessarily understand the data, but they bring the data in.
We have data analysts or business analysts. They work very closely with the business to define the transformation rules, which are then handed over to the analytics engineers, who implement the data pipelines, in dbt, of course. We also have the information engineers. It's another capability. They work closely together. Some other people might call them data architects. They're also very important for building reusable data products.
How do you think about bringing new tools into your stack and making sure that the investment is worthwhile for the business?
Unfortunately, the appetite is too big. I always push back. When we talk about technology, my first question will always be, "What problem do you really want to solve here?" Give me all the business context to help you define the problem. There's this famous saying from Einstein: “If I had one hour to save the world, I would spend 55 minutes defining the problem and five minutes finding the solution.”
Unfortunately, tech people, including me, jump on the technology too quickly. There are also criteria that I use when we decide on a new technology or a new architecture. We ask, "How difficult is it to migrate away from that potential solution?"
One criterion is how hard it is to stop using that technology. The second criterion is how complicated it is to integrate with the rest of your ecosystem. Unfortunately, those two things are not considered enough. You go for the best in class or whatever, and then you end up with a lot of integration problems. Or you go for best in class, and then you realize it's extremely hard to migrate away from it. That's why I'm a bit hesitant to introduce new tools into the technology stack.
Looking 10 years out, what do you hope will be true for the data industry?
Going again with people, processes and technology. On the people side, I hope that in 10 years we will have more data literacy and a stronger engineering culture across companies. Also, I hope that there is no discussion about business versus IT anymore. Everybody is one team. Everybody has both an IT background and a business background.
On the process side, I hope that we continue the evolution from ETL to ELT to what people call zero ETL, where we don't need to move data around anymore and the transactional and the analytical systems get closer together.
On the technology part, I hope that we will have more open standards in 10 years. I'm a big fan of open table formats. I hope we're moving more in that direction, so that we have more interoperability and more options to do stuff with different technology stacks.
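For readers less familiar with the ETL-to-ELT shift mentioned above, here is a minimal sketch of the difference, not a description of Roche's actual pipelines. It uses Python's built-in sqlite3 module as a stand-in for a cloud data warehouse; the sample rows, table names, and function names are all hypothetical. In the ETL version the cleaning happens in application code before loading. In the ELT version the raw rows are loaded untouched and the cleaning is expressed as SQL inside the database, which is the kind of in-warehouse transformation that tools such as dbt manage. Zero ETL is not covered by this sketch.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a source system.
RAW_ORDERS = [
    ("o1", " 100.50 ", "ch"),
    ("o2", "20", "DE"),
    ("o3", " 7.25", "ch"),
]


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id TEXT, amount REAL, country TEXT)"
    )
    cleaned = [
        (order_id, float(amount.strip()), country.upper())
        for order_id, amount, country in RAW_ORDERS
    ]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load raw rows as they are, then transform with SQL in the database."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    # The transformation lives in the database as a view over the raw table.
    conn.execute(
        """
        CREATE VIEW orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(country) AS country
        FROM raw_orders
        """
    )


def totals_by_country(conn, table):
    """Sum order amounts per country from the given table or view."""
    cursor = conn.execute(
        f"SELECT country, SUM(amount) FROM {table} "
        "GROUP BY country ORDER BY country"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_etl(connection)
    run_elt(connection)
    print(totals_by_country(connection, "orders_etl"))
    print(totals_by_country(connection, "orders_elt"))
```

Both paths produce the same totals. The practical difference is that the ELT path keeps the raw data in the database, so transformation rules can be changed and re-run later without extracting from the source system again.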
Roche's headquarters is in Basel, not Zürich.