The State of AI. Data Engineering @ Airbnb. Agile Analytics. SQL Window Functions. [DSR #143]


Three Posts on AI/ML

I don’t do a ton of top-down, big-think pieces (you get bombarded by so many already!), but three on AI/ML from the past couple of weeks were just really excellent. Enjoy.

State of AI

In this report, we set out to capture a snapshot of the exponential progress in AI with a focus on developments in the past 12 months. Consider this report as a compilation of the most interesting things we’ve seen that seeks to trigger informed conversation about the state of AI and its implication for the future.

Really very excellent. It’s a 150-slide-long presentation, so either reserve plenty of time to go through it or flip through to the parts you’re particularly interested in. If you’re paying attention to industry developments you know some of this stuff, but you definitely don’t know it all. Exhaustive.


Ways to Think About Machine Learning

Andrew Ng has suggested that ML will be able to do anything you could do in less than one second. Talking about ML does tend to be a hunt for metaphors, but I prefer the metaphor that this gives you infinite interns, or, perhaps, infinite ten year olds.

Ben Evans, of Andreessen Horowitz, always has clarifying thoughts. In this post he keeps the reader focused on what ML will actually enable over the coming 10-20 years.


Rebooting AI – Postulates

This post lists 10 things the author thinks need to change about the current AI industry. It’s a well-thought-out and cohesive perspective, and will likely challenge some conventional wisdom you’ve absorbed. Here’s the first one to whet your whistle:

We are trapped by Turing’s definition of intelligence. In his famous formulation Turing confined intelligence as a solution to a verbal game played against humans. This in particular sets intelligence as a (1) solution to a game, and (2) puts human in the judgement position. This definition is extremely deceptive and has not served the field well. Dogs, monkeys, elephants and even rodents are very intelligent creatures but are not verbal and hence would fail the Turing test.

And, my favorite line:

…it is reality itself rather than a committee of humans that makes ultimate judgements on the intelligence of actors.


More Practical Fare

A Beginner’s Guide to Data Engineering — The Series Finale

Wow. This piece is a must-read. More than any other blog post that I’ve read in the recent past, this post gives outsiders a glimpse into what it’s like to work in data at a large company with a deep investment in data.

Companies like Airbnb, Spotify, and Uber have invested heavily in their own proprietary data tooling, which data scientists use to do things like build tables incrementally, backfill data, and run experiments. The author is from Airbnb, and the post focuses on the tooling that all data scientists at Airbnb have access to.

Having this type of infrastructure is a massive leg up for data scientists who want to make a big impact, and it’s a major reason why data jobs at these companies are so highly sought-after.


Agile Analytics, Part 3: The Adjustments

This is the finale to another series that I’ve followed closely. Part 1 outlined why Agile works great in an analytics context and Part 2 outlined why sometimes it’s not a perfect fit.

This final post talks about the adjustments that the author (the head of analytics at Harry’s) suggests to standard Scrum when applying it to analytics. The bullet points:

  • Time-bound spikes for research

  • Build in slack time for exploration

  • Acceptance Criteria includes “write the next story”

  • Peer-review instead of sprint-review

I’m definitely going to think hard about operationalizing some of these practices in our own Agile workflow.


SQL Window Functions to Pass a Data Analytics Interview

Alternate title: Window Functions Can Do More Than You Thought. There’s a lot of good stuff in this post, but my favorite was the trailing 4-week confidence interval. That’s right: you can calculate confidence intervals in analytic SQL!
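
To give a flavor of the idea, here is a minimal sketch of a trailing 4-week confidence interval built from window functions. It is not the query from the post: the `weekly_metric` table, its columns, the sample data, and the `py_sqrt` helper are all made up for illustration. It runs on Python's built-in `sqlite3` module (assuming a bundled SQLite of 3.25 or newer, which is when window functions arrived) and uses a normal approximation with z = 1.96, which is crude for only four points.

```python
import math
import sqlite3

# Hypothetical weekly metric as (week number, value); illustrative data only.
ROWS = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0), (5, 18.0), (6, 20.0)]

QUERY = """
WITH stats AS (
    SELECT
        week,
        COUNT(*)           OVER w AS n,
        AVG(value)         OVER w AS mean,
        AVG(value * value) OVER w AS mean_sq
    FROM weekly_metric
    -- current week plus the three before it = trailing 4 weeks
    WINDOW w AS (ORDER BY week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
),
se AS (
    SELECT
        week,
        mean,
        -- population variance = mean_sq - mean^2; dividing it by (n - 1)
        -- gives (sample variance / n), the squared standard error of the mean.
        -- MAX(..., 0.0) guards against tiny negative rounding errors.
        CASE WHEN n = 4
             THEN py_sqrt(MAX(mean_sq - mean * mean, 0.0) / (n - 1))
        END AS std_err
    FROM stats
)
SELECT
    week,
    mean,
    mean - :z * std_err AS ci_low,
    mean + :z * std_err AS ci_high
FROM se
ORDER BY week
"""


def trailing_ci(rows, z=1.96):
    """Return (week, mean, ci_low, ci_high) rows; CI is NULL until 4 weeks exist."""
    conn = sqlite3.connect(":memory:")
    # SQLite's built-in sqrt() is not always compiled in, so register our own.
    conn.create_function(
        "py_sqrt", 1, lambda x: None if x is None else math.sqrt(x)
    )
    conn.execute("CREATE TABLE weekly_metric (week INTEGER PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO weekly_metric VALUES (?, ?)", rows)
    result = conn.execute(QUERY, {"z": z}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in trailing_ci(ROWS):
        print(row)
```

In a warehouse with a built-in `STDDEV` window aggregate you would likely use that instead of the mean-of-squares trick; the sketch avoids it only because SQLite does not ship one.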


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

