Generalists or Specialists? Transformer Models. Time to Migrate your PDTs. Designing Resilient KPIs. [DSR #178]
❤️ Want to support this project? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This week's best data science articles
Wow. Short, incredibly well-written, insightful. Here’s a brief excerpt, but it’s really worth the 4 minutes it takes to read the whole thing:
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
This post takes an important stance: it’s arguing that hiring a team of generalist data scientists is better than hiring a team of specialists. The author, head of algorithms at Stitch Fix and previously of Netflix, is obviously very qualified to make this case.
In the post, though, I think he alludes to one of the biggest arguments against it:
Finally, the full-stack data science model relies on the assumption of great people. They are not unicorns; they can be found as well as made. But they are in high demand and it will require competitive compensation, strong company values, and interesting work to attract and retain them. Be sure your company culture can support this.
First, only a fairly small number of companies can offer all of these things. Second, and more importantly, making a generalist data scientist is hard: it takes a lot of time and support from an experienced team. Along the way, it often makes sense for someone becoming a generalist to do “tours of duty” as a specialist in various areas.
I’d reframe this piece this way: as an individual data scientist, your goal should be to evolve into a generalist over time. You’ll likely command a higher salary, have more fulfilling experiences, etc. But if you’re building a team of data scientists, you’ll probably need to hire a diverse set of folks. Unless you’re leading a team at Stitch Fix or Netflix, in which case you can make decisions largely free of constraints 😉
Transformers are a type of neural network architecture that has been gaining popularity. OpenAI recently used transformers in its language models, and DeepMind used them in AlphaStar, its program that defeated a top professional StarCraft player.
Transformers were developed to solve the problem of sequence transduction, such as neural machine translation: any task that transforms an input sequence into an output sequence. This includes speech recognition, text-to-speech, and so on.
This article is impressive. Having never dug into attention models or transformers before, I learned a lot. It’s long, though, so you may want to save it for when you have time to digest it.
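The core operation inside a transformer is attention: each output position is a weighted average of “value” vectors, with weights given by a softmax over how well a “query” matches each “key”. Below is a minimal sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, in plain Python. It assumes a single head and omits the learned query/key/value projections, masking, and batching that real transformer layers use; the function names are our own, not from the article or any library.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query: softmax(q . K^T / sqrt(d_k)) @ V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights, sum to 1
        # Weighted average of the value vectors, component by component.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(out)  # the query matches the first key better, so the first value dominates
```

Because the weights always sum to 1, each output is a convex combination of the values; a query that matches all keys equally (e.g. an all-zero query) simply averages them.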
If you use Looker and dbt together, as more and more companies do, this post is a must-read. It’s by Dylan Baker, a member of the dbt community who has been using both products together since dbt’s earliest days. If you haven’t yet moved your existing Persistent Derived Tables (PDTs) over to dbt, Dylan lays out exactly why you should.
As the stewards of good data practice within a company, we data professionals are often asked to help set metrics, [and] there’s a whole meta-game around the process that we’re playing(…)
This topic, designing metrics that are less susceptible to “gaming”, is one I’ve never seen covered before. And it’s a very good one: it’s often the most important issue in setting metrics that will be used to assess the performance of an organization (and thereby of multiple individuals). Why? Goodhart’s Law, of course:
“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
This is an excellent read. Highly recommended.
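One widely used safeguard against Goodhart’s Law, which may or may not match the article’s own recommendations, is to pair a headline metric with a counter-metric that gets worse when the headline metric is gamed. The sketch below is purely illustrative: the support-team scenario, the `TeamWeek` fields, the 5-point tolerance, and all numbers are hypothetical assumptions, not data from the article.

```python
from dataclasses import dataclass


@dataclass
class TeamWeek:
    # Hypothetical weekly figures for a support team (illustration only).
    tickets_closed: int
    hours_worked: float
    tickets_reopened: int


def closure_rate(w: TeamWeek) -> float:
    # Headline metric: tickets closed per hour worked.
    return w.tickets_closed / w.hours_worked


def reopen_rate(w: TeamWeek) -> float:
    # Counter-metric: share of closed tickets that were reopened.
    return w.tickets_reopened / w.tickets_closed if w.tickets_closed else 0.0


def evaluate(before: TeamWeek, after: TeamWeek, max_reopen_increase: float = 0.05) -> str:
    """Flag a headline-metric gain that comes with a degraded counter-metric."""
    gained = closure_rate(after) > closure_rate(before)
    degraded = reopen_rate(after) - reopen_rate(before) > max_reopen_increase
    if gained and degraded:
        return "suspect"
    if gained:
        return "improved"
    return "no improvement"


baseline = TeamWeek(tickets_closed=100, hours_worked=40.0, tickets_reopened=5)
gamed = TeamWeek(tickets_closed=160, hours_worked=40.0, tickets_reopened=40)
print(evaluate(baseline, gamed))  # closing tickets without resolving them is flagged
```

The point of the pairing is that the cheapest way to move one number (closing tickets hastily) pushes the other number the wrong way, so pressure on the headline metric alone is no longer enough to make it look good.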
As always, Nathan Yau does an amazing job of highlighting the point he wants you to take away from a visualization. There’s a particular story he’s telling with this data, and it couldn’t be clearer: in just a few decades, online dating has gone from nothing to the #1 way we meet our romantic partners.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123