🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This week's best data science articles
This is a Very Good Post. If you’re already living in the dbt ecosystem, many of these topics won’t feel brand new to you (“data should be treated as code,” etc.), but the complete picture provided by the post is really fantastic. It hits on everything from explicit ownership of every data asset to clear SLAs to…well, a lot.
What’s perhaps most interesting to me in this whole wonderful post is what’s not said. Specifically: the problems that Uber is experiencing around data are the same effing problems that the rest of us are experiencing. Never mind that they have some of the highest-horsepower data talent in the world and have built incredible, industry-leading tooling in many categories… they’re still trying to get “the basics” right.
The hard problems of data today are fundamentally collaborative. They are about enabling arbitrarily large groups of humans to build up a shared body of knowledge and to interact with it confidently. This is as much about culture and process as it is about tooling, and Uber is in the mud trying to figure it out right along with the rest of us.
When we talk about getting better at programming, we often talk about testing, writing reusable code, design patterns, and readability.
All of those things are important. But in this blog post, I want to talk about a different way to get better at programming: learning how the systems you’re using work! This is the main way I approach getting better at programming.
Two Julia Evans posts in two issues! 🔥🔥
This is something I always focus on for the data analysts and analytics engineers that I train. Data people have a real tendency to want to get a thing done, and if “it works,” then “why it works” is less interesting (this is a generalization, of course, and thus lossy). One of my favorite interview questions for senior data hires is “why is Redshift faster for analytical queries than Postgres?” I’ve found that it so perfectly tests for this instinct to understand one’s underlying systems. Here are some other questions that highlight absolutely critical conceptual frameworks for any data person.
How fast are network, disk, and memory access relative to one another?
How does SQL translate into discrete physical processing operations that a database must undertake? How do changes in your SQL map to different operations?
What’s the difference between implementing the same functionality with a loop vs with a map function in Python?
Why is it harder to calculate a statistic across a cluster of computers vs. on a single computer?
If these are questions you can’t answer for yourself, they’re worth some Googling. For each, answer the question itself then follow that up by asking: “Given that, how should I write my code differently?”
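To make the last question concrete, here is a minimal sketch that simulates a cluster with plain Python lists; each inner list stands in for the rows held by one machine, and the data and helper names are hypothetical. A mean is decomposable: each node returns a tiny (sum, count) summary, and combining those summaries gives the exact global mean. A median is not: the per-node medians do not determine the global median, so the tempting shortcut gets the wrong answer, and the exact answer needs every row gathered in one place.

```python
from statistics import median

# Hypothetical data: each inner list stands in for the rows on one machine.
partitions = [
    [1, 2, 3],
    [4, 100, 200],
    [5, 6, 7, 8, 9],
]

def partial_mean(rows):
    """Per-node step: reduce the rows to a small (sum, count) summary."""
    return sum(rows), len(rows)

def combine_means(summaries):
    """Coordinator step: merging (sum, count) pairs yields the exact mean."""
    summaries = list(summaries)  # materialize so we can iterate twice
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count

def naive_median_of_medians(parts):
    """Tempting but wrong: take the median of the per-node medians."""
    return median(median(p) for p in parts)

def exact_median(parts):
    """Correct, but every row must be shipped to one place first."""
    return median(row for p in parts for row in p)

all_rows = [row for p in partitions for row in p]
distributed_mean = combine_means(partial_mean(p) for p in partitions)
single_machine_mean = sum(all_rows) / len(all_rows)

print(distributed_mean == single_machine_mean)  # True
print(naive_median_of_medians(partitions), exact_median(partitions))  # 7 6
```

Applying the “given that, how should I write my code differently?” step: prefer aggregates that break down into small per-node summaries (sums, counts, min/max), and when you need medians or other quantiles over large distributed data, expect either a costly reshuffle of rows or an approximate algorithm.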
Ok…I apologize in advance for linking to something that’s clearly Airbnb engineering recruitment propaganda (gosh those photos are good!). But this is a much-under-discussed topic and I think there’s actually quite a lot to learn here.
There are two human problems that come up _all the time_ in our field: 1) there are insufficient humans with relevant skills industry-wide, and 2) there is a real diversity problem. Apprenticeship programs have the potential to help with both.
There’s a lot to say here; I’ll hit a couple of quick bullets and then encourage you to read the rest for yourself:
An apprenticeship program is different from an internship program. The wording choice is intentional and I love it. Internships are a form of finishing school for already-credentialed candidates, whereas apprenticeships aim to forge mature tradespeople from amazing raw materials. Most companies have internship programs, not apprenticeship programs.
My belief is that most companies don’t do this because they don’t have the freedom or willingness to think long-term. A program like this is more an investment of time than of money: it will take a while for apprentices to become real contributors. But long-term thinking is the only way to build real moats for your business.
This is about software engineering but it is equally relevant for data. What would it take for your company to start an apprenticeship program?
It’s been, what, nine months since OpenAI launched GPT-3 and put access to it behind an API. We all saw the fun toy examples that people cranked out at the time, but ultimately this thing had to turn into real products. Ever wondered what people have been doing with it? (I had.) This post has you covered.
I find each of the three use cases highlighted to be really very interesting…and definitely just the beginning.
AI in a game? Been there, done that. AI to design a game? That’s new to me.
I found this article super fascinating. AI-as-design-aid is, IMO, a very interesting trend. We’re all familiar with AI-as-product at this point, but there are very few instances where AI is currently being used as a force multiplier for the creative process.
Contributing to open source codebases is often more achievable than you think, and it’s a fantastic way to build technical skills, roots in a community, and a sense of purpose (honestly!). In this post, Niall Woodward narrates his journey from first-time contributor to maintainer of a promising open source project and gives plenty of pointers to other folks hoping to follow the same path.
Thanks for writing this, Niall. Really great stuff. And thanks for your contributions!!
Thanks to our sponsor!
Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123