Data Science Roundup #71: The Best Job in America, Rules for Strong AI, and some Really Boring Stuff

140+ blogs and publications and 500+ posts distilled down to the best six. Do you like reading the Data Science Roundup? Please share with your network.


This week's best data science articles

Data Scientist: The Best Job in America, Again

The popular job site Glassdoor published its list of the 50 Best Jobs in America, and Data Scientist is again the no. 1 job in the US, with a job score of 4.8 out of 5, a $110,000 median base salary, and 4,000 job openings. What’s more, 5 of the top 10 US jobs are related to analytics, data engineering, and data science.

To those of us actively working in the space, this isn’t surprising: it’s clear that data is an amazing place to build a career right now. To those of you waiting to make the jump, there’s never been a better time.


Unlearning Descriptive Statistics

I spend my days largely doing descriptive statistics. Sometimes it’s really important to build a predictive model, but frequently what your data consumers actually need is some really well-thought-out descriptive statistics. Which is why I love this post. Here’s the intro:

Statistics professors tend to gloss over basic descriptive statistics because they want to spend as much time as possible on margins of error and t-tests and regression. Fair enough, but the result is that it’s easier to find a machine learning expert than someone who can talk about numbers. Forget what you think you know about descriptives and let me give you a whirlwind tour of the real stuff.


Data Science and DevOps: A Success Story

This is an excellent article. It actually has very little to do with devops; rather, it talks about the challenges of integrating a data science team effectively into a larger organization. Here is a wonderful observation that will seem obvious once you’ve read it:

…[data science] needs to be deeply integrated into the business processes in order to be effective as a decision making system. This is by far the biggest source for the troubles created by data science efforts. In order to successfully integrate data science, one needs to transform and modify the core business processes, which is a difficult task.

This is a must-read for both data scientists and any manager who interacts with a data science team.

CommAI: Evaluating the First Steps Towards a Useful General AI

With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In order to fill this gap, we propose here a set of concrete desiderata for general AI, together with a platform to test machines on how well they satisfy such desiderata, while keeping all further complexities to a minimum.

From the paper, their four desiderata:

  • Communication through natural language

  • Learning to learn

  • Feedback

  • Interface

This paper is very readable / scannable. If you are at all interested in the topic of general-purpose AI, this is a must-read.


Focus on: The Very Boring

Not everything written about data is exciting—often it’s the boring stuff that’s the most important. Here are two posts that you really should read, but probably won’t.

How Joins Work

The SQL join operation is one of the most powerful and commonly used SQL operations, but little attention is paid to how the internal SQL engine breaks down the tasks of join operations.

I’ll be the first to admit that this article is really quite boring (it got literally 0 recommends on Feedly), but let me just say that really, truly understanding the query planner and knowing how to read an explain plan are just so unbelievably important in being day-to-day effective. Invest the time to understand this stuff. This post is the best resource I’ve found on the topic.
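
To make "how the engine breaks down a join" a little more concrete, here is a minimal Python sketch of two classic strategies a query planner can choose between: a nested-loop join and a hash join. This is an illustration under my own assumptions, not code from the linked post; the function names and the sample users/orders rows are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row.

    Cost grows with len(left) * len(right).
    """
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other.

    Cost grows roughly with len(left) + len(right), plus the matches.
    """
    buckets = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


# Hypothetical sample rows, standing in for two small tables.
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Grace"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 1, "total": 12},
    {"user_id": 3, "total": 7},
]

joined = hash_join(users, orders, "user_id")
# Both strategies return the same rows; only the amount of work differs.
assert joined == nested_loop_join(users, orders, "user_id")
```

Same rows either way, very different amounts of work on large tables. That difference is what an explain plan is telling you when it names the join strategy it picked.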


Julia – A Fresh Approach to Numerical Computing

Julia first appeared in 2012 and has since become popular in academic environments. While its inclusion in the Jupyter project in 2014 (it’s the Ju- in Jupyter) marked a significant increase in awareness and adoption, Julia still isn’t particularly common in commercial environments.

This post, written by one of the co-creators of the language, makes the case that core elements of Julia’s design make it a superior choice for performance-intensive numerical computing applications. Read this post, and put Julia on your list of things to play around with. It’s maturing quickly.

Data viz of the week

Absolutely f***ing hideous, but wow, love the data.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
