Wide Tables or Star Schemas? The Turing Award. Data Science @ Uber. Geo Experiments in Marketing. [DSR #180]
This week's best data science articles
Michael Kaminsky of Gradient Metrics takes us through a benchmark comparison of data architectures.
I’m so glad this post exists. I’ve witnessed (and been a part of) so many conversations about data modeling practices that have gotten almost religious: there are Kimball advocates who will clutch their star schema ERDs to the end. My belief is that there is plenty to recommend traditional star-schema-style modeling, but that modern data tech allows us much more flexibility in our design choices. Oftentimes, performance considerations outweigh the need for the tidiness of a good star schema.
And that’s where this post comes in. It does a solid job of benchmarking performance of the two common design patterns for data models and does so in the three leading data warehouse platforms. Turns out, it’s faster to denormalize the data into a single large table on each database platform.
This doesn’t necessarily mean that you should only make completely denormalized tables, but it should weigh into your design thinking.
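To make the two patterns concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not taken from the benchmark post: SQLite is not one of the warehouses that were tested, and the table names (fact_orders, dim_customer, dim_product, orders_wide) and the sample rows are hypothetical. It only shows that the same question can be answered with a join against a star schema or with a scan of a single pre-joined wide table; it says nothing about which is faster.

```python
import sqlite3

# Hypothetical toy schema: one fact table, two dimensions, and a wide
# table built by joining them once up front (the denormalized pattern).
SETUP_SQL = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_orders  (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_id  INTEGER,
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'east'), (2, 'west');
INSERT INTO dim_product  VALUES (10, 'books'), (20, 'games');
INSERT INTO fact_orders  VALUES (100, 1, 10, 12.5), (101, 1, 20, 30.0), (102, 2, 10, 7.5);

CREATE TABLE orders_wide AS
SELECT f.order_id, f.amount, c.region, p.category
FROM fact_orders f
JOIN dim_customer c USING (customer_id)
JOIN dim_product  p USING (product_id);
"""

# Star schema: the join happens at query time.
STAR_QUERY = """
SELECT c.region, SUM(f.amount)
FROM fact_orders f
JOIN dim_customer c USING (customer_id)
GROUP BY c.region
ORDER BY c.region
"""

# Wide table: the join was paid for once, when orders_wide was built.
WIDE_QUERY = """
SELECT region, SUM(amount)
FROM orders_wide
GROUP BY region
ORDER BY region
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP_SQL)

star = conn.execute(STAR_QUERY).fetchall()
wide = conn.execute(WIDE_QUERY).fetchall()
print(star == wide)  # both patterns return the same revenue by region
```

The trade-off is visible even at this size: the wide table repeats region and category on every order row, and it has to be rebuilt whenever a dimension changes.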
At The Economist, we take data visualisation seriously. Every week we publish around 40 charts across print, the website and our apps. With every single one, we try our best to visualise the numbers accurately and in a way that best supports the story. But sometimes we get it wrong. We can do better in future if we learn from our mistakes — and other people may be able to learn from them, too.
For their work on neural networks, Geoffrey Hinton, Yann LeCun and Yoshua Bengio will share $1 million for what many consider the Nobel Prize of computing.
Geoff Hinton looks like he sees the coming doom of our entire species every time I see a photo taken of him. Tell me I’m wrong.
That aside, I’m not sure that this is particularly surprising—these are names you’re likely well-familiar with already, and there couldn’t really be more popular attention on AI today than there already is. Still, I didn’t want you to be left out in this week’s water cooler conversations.
I’m a bit surprised that this topic doesn’t get covered more often: everyone loves A/B tests but there is very little knowledge shared in the community on geo experimentation. There are many instances where marketing use cases can only be solved by designing geo experiments! If you touch marketing analytics and don’t have this tool in your tool kit, this post is highly recommended.
We spoke to Data Science Director Fran Bell about machine learning at Uber and what she finds most challenging—and rewarding—about her work.
Fran leads Uber’s Data Science Platform team. I don’t often link to interviews, but I found this to be an unusually worthwhile combination of personal insights and company insights. Solid read.
Great benchmarks comparing the performance of different types of operations across file formats. This has recently become more relevant for me, as dbt now supports Spark and we are also finding more companies directly querying raw files in S3 via both Snowflake and Redshift Spectrum. Format matters a lot.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123