Steam-powered ML. Jeff Dean. AI Job Impacts. Model size. Data org maturity. [DSR #208]

Short issue this week for the holiday! Hope all of you Americans out there had a happy Thanksgiving! 🦃

- Tristan

This week's best data science articles

We're still in the steam-powered days of machine learning

I have no desire to add anything to this post other than to say that it’s fantastic and you should absolutely read it. Vicki Boykis continues to put out fantastic work.

vicki.substack.com

Jeff Dean: The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

It’s not often Jeff Dean puts out new work. This paper is brand new, from a talk at a recent conference, and is jam-packed with interesting stuff if you care about the intersection of chip design and ML. I was particularly interested in the section Machine-Learning-Specialized Hardware, which was the best overview of the differences between a classic microprocessor and a TPU that I’ve read.

In related news: Microsoft is now offering Graphcore processors in Azure.

arxiv.org

Deep learning has a size problem

The size of some of the recently released language models is intense. This is problematic for two reasons:

First, it hinders democratization. If we believe in a world where millions of engineers are going to use deep learning to make every application and device better, we won’t get there with massive models that take large amounts of time and money to train.

Second, it restricts scale. There are probably less than 100 million processors in every public and private cloud in the world. But there are already 3 billion mobile phones, 12 billion IoT devices, and 150 billion micro-controllers out there. In the long term, it’s these small, low power devices that will consume the most deep learning, and massive models simply won’t be an option.

This is the best post I’ve read on model efficiency. It goes deep in certain tactical areas but remains extremely accessible at all points.

heartbeat.fritz.ai

Brookings: What Jobs are Affected by AI?

My feelings on this report by the Brookings Institution: ¯\_(ツ)_/¯

The main takeaway is that white-collar jobs are likely to be more affected than blue-collar jobs by the widespread deployment of AI, and they got there via a bunch of NLP work using a couple of different datasets. Here’s the big problem with the analysis though:

…the exposure measure employed here only suggests that in particular occupations some kind of impact can be expected, whether positive or negative.

The report is just saying that certain fields are more “AI-exposed” than others. For instance, software engineers are listed as being very highly AI-exposed. That seems quite obvious, given that software engineers literally…build AI systems. Other top areas listed also fall under “I could have just told you that without needing to do a bunch of NLP”.

I include this link here because it has certainly made the rounds in the last couple of weeks. It’s worth a scan just to be able to have the water cooler conversation.

www.brookings.edu

The Three Levels of Data Analysis: A Framework for Assessing Data Organization Maturity

There are three tiers of data analysis: reporting, insights, and prediction.

dbt community member Emilie Schario has an excellent post up on the GitLab blog about the stages of organizational maturity. As an industry, we’re still having a hard time getting reporting right.

about.gitlab.com

Thanks to our sponsors!

dbt: Your Entire Analytics Engineering Workflow

Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.

getdbt.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
