To Self-Serve or Not to Self-Serve? DS in Production. Google Acquires Alooma. [DSR #175]

This week's best data science articles

The Problem With Hands-Off Analytics

I love this post. It’s by the Chief Analytics Officer @ Mode, Benn Stancil, whom I highly respect and frequently disagree with. I think I disagree with him here, but he’s highlighting a super-important topic:

Many analytics teams aspire to enable as much “self-serve” as possible - in other words, to remove themselves from as many decision-making processes as they can.

What’s the role of the analytics team: to conduct, or to facilitate, analytics? It is a 100% legitimate question, and it’s also one that is not discussed enough within individual companies or industry-wide.

My own personal belief is that 10-20 years ago “business users” were also data analysts. If they needed data to make their decisions, they got their hands on it and crunched it in Excel. This was just a part of the job of being a professional. It’s only in the past 5-10 or so years that “business user” has started to mean “someone who can’t self-serve”, and this change has come about due to the level of technical proficiency required to analyze data in the modern ecosystem. Excel is no longer enough, and so business users are no longer able to accomplish what had been a core part of their jobs.

My belief is that there is a shift underway to give control back to those users, and that the analyst’s primary job is not to do someone’s job for them but rather to enable them (via tooling and training) to do it themselves. There are certain extremely critical questions that it will always make sense to have a data analyst or data scientist analyze directly, but most answers to most questions need to be in the hands of the owners of those business functions.


Data Versioning

Great summary by the author:

Data science is hard to productionize, and one of the reasons it is hard is because it has so many moving parts. The notion of a “version” of a smart/AI/machine learning application has (at least) four possible axes on which it can drift. This poses a challenge in continuous delivery practices. These challenges can be addressed, but there are benefits and drawbacks to the various ways I’ve seen people try to address this in practice.

The post goes into the various approaches in a decent amount of detail. It’s a thorough yet accessible treatment of this important topic.


Google announces intent to acquire Alooma to simplify cloud migration

We’re announcing our intent to acquire Alooma, a leader in data migration, to help businesses streamline database migration in the cloud.

What is the next big thing in AI and ML?

If you’re following the industry closely, these trends will be fairly well known. If you only drop in from time to time, this is an excellent writeup of the past 12 months and what you should know. It’s not filled with benchmarks; it’s analysis.


Data Science Foundations: Know Your Data. Really, Really, Know It.

Know your data, where it comes from, what’s in it, what it means. It all starts from there.

If you haven’t spent six months with a dataset, you probably aren’t deeply familiar with it. How was it collected? What are the quirky business rules? Etc.

This is a rarely-discussed topic: it feels pedantic to tell someone that they should get to know their data better. But it is true. The first time you get to know a dataset on this level you will feel like a ballet dancer, lightly jumping from question to question with grace and mastery.


Why I abandoned online data courses for project-based learning

Finding motivation is harder than finding information.

This is a great post on why you should rely on MOOCs less and projects more. Completely agree.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

