11 Reasons Notebooks Suck. Tuning Redshift Performance. Code as Configuration. [DSR #153]

Sep 16, 2018

Do you have 3+ years of experience working in a modern data team? Fishtown Analytics is hiring experienced data analysts. Ping me.

- Tristan


This Week's Most Useful Posts

Top 14 Performance Tuning Tips for Amazon Redshift

This is the single best post I’ve read on Redshift optimization. It’s written by the co-founder of Intermix, a company that specializes in (you guessed it) helping companies optimize their Redshift performance. It covers everything from table design to workload management config to query optimization. Very comprehensive advice you won’t find elsewhere.

My only critique of this post is that it doesn’t mention my favorite trick for optimizing Redshift performance: migrating to Snowflake! 😉

www.intermix.io

The First Notebook War: 11 Reasons Notebooks Suck

Last week I learned about an interesting JupyterCon talk given by Joel Grus titled “I Don’t Like Notebooks”.

And thus begins the “2018 Notebook War”. Hillary Parker and Roger Peng both weighed in on Twitter, and Hadley Wickham called it “the spaces vs tabs of data science.” Touché.

Notebooks have alternately been lionized as the next great revolution in scientific communication and decried as a shit way to write code. Both views are probably true to some extent. Netflix just wrote about the extensive infrastructure they’ve built to support productionizing notebooks. Is this a good idea? The industry is still deciding how data analysis work should be conducted, and it’s fascinating to watch this conversation play out.

yihui.name

Code as Configuration

(…) the optimal pattern for collaboration relies on architecting and building systems where I (and the other data folks on my team) can write and deploy code / scripts without: a) needing to get that code approved by software engineers, b) having to deal with hosting or networking concerns, c) having to interface with non-familiar languages and paradigms.

Another great post by Michael Kaminsky. I absolutely, 100% agree with his viewpoint: engineers build frameworks, and analysts write code that runs within a framework and implements business logic. This lets each group do what it’s best at, what it has direct knowledge of, and what it’s most incentivized to do.

The catch? Building good frameworks is hard. Expect to see this pattern adopted more widely as frameworks get built, generalized to work across environments, and open sourced.
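To make the division of labor concrete, here is a minimal sketch under stated assumptions, not Kaminsky’s implementation: a tiny engineer-owned "framework" function (`run_models`, hypothetical) that materializes analyst-authored SQL as tables in an in-memory SQLite database. The model names, the `raw_orders` table, and the dict-of-SQL format are all invented for illustration; a real framework would also handle dependencies, environments, and deployment.

```python
import sqlite3
from typing import Dict

# Analyst-owned "configuration": business logic only, written as plain SELECTs.
# Model and table names here are hypothetical; dict order is run order.
ANALYST_MODELS: Dict[str, str] = {
    "paid_orders": "SELECT id, amount FROM raw_orders WHERE status = 'paid'",
    "revenue": "SELECT SUM(amount) AS total FROM paid_orders",
}


def run_models(conn: sqlite3.Connection, models: Dict[str, str]) -> None:
    """Engineer-owned framework: rebuild each model as a table, in order.

    Analysts never touch connections, DDL, or deployment; they only add SQL.
    """
    for name, select_sql in models.items():
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.execute(f'CREATE TABLE "{name}" AS {select_sql}')
    conn.commit()


def demo() -> float:
    """Load toy raw data, run the analyst models, and return total revenue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, 10.0, "paid"), (2, 5.0, "refunded"), (3, 7.5, "paid")],
    )
    run_models(conn, ANALYST_MODELS)
    (total,) = conn.execute("SELECT total FROM revenue").fetchone()
    conn.close()
    return total


if __name__ == "__main__":
    print(demo())  # 17.5: the refunded order is excluded
```

The point of the split: adding a new metric means adding one entry to `ANALYST_MODELS`, with no change to the framework code and no engineering review of hosting or networking.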

www.locallyoptimistic.com

Anatomy of an AI System

Or: “The Amazon Echo as an anatomical map of human labor, data and planetary resources.”

Just wow. Here’s how the authors set the context:

[in fulfilling a single Alexa request], a vast matrix of capacities is invoked: interlaced chains of resource extraction, human labor and algorithmic processing across networks of mining, logistics, distribution, prediction and optimization. The scale of this system is almost beyond human imagining. How can we begin to see it, to grasp its immensity and complexity as a connected form?

This one-of-a-kind microsite is perspective-changing. Click through to see what I mean.

anatomyof.ai

Google AI: Code-Free Probing of ML Models

Building effective machine learning (ML) systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better: How would changes to a datapoint affect my model’s prediction? Does it perform differently for various groups–for example, historically marginalized people? How diverse is the dataset I am testing my model on?

Today, we are launching the What-If Tool, a new feature of the open-source TensorBoard web application, which lets users analyze an ML model without writing code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.

This problem has gotten a lot of attention in recent years; it’s great to see Google investing resources in exploring solutions.
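The probing questions quoted above can also be asked in a few lines of code; the What-If Tool lets you explore the same kinds of questions visually. The sketch below is not the What-If Tool’s API. It is a plain-Python illustration under stated assumptions: a hypothetical hand-weighted logistic model, a counterfactual probe (`what_if`) that edits one feature of a datapoint and compares predictions, and a per-group accuracy slice (`accuracy_by_group`). All names, weights, and features are invented.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model: hand-picked logistic-regression weights, for illustration only.
WEIGHTS: Dict[str, float] = {"age": 0.04, "income_k": 0.02}
BIAS = -3.0


def predict(x: Dict) -> float:
    """Probability of the positive class; features not in WEIGHTS are ignored."""
    z = BIAS + sum(w * x[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(x: Dict, feature: str, new_value: float) -> Tuple[float, float]:
    """Counterfactual probe: prediction before and after editing one feature."""
    edited = dict(x)  # copy so the original datapoint is untouched
    edited[feature] = new_value
    return predict(x), predict(edited)


def accuracy_by_group(
    rows: List[Tuple[Dict, int]], group_key: str, threshold: float = 0.5
) -> Dict[str, float]:
    """Slice accuracy by a grouping column that the model itself never sees."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for features, label in rows:
        group = features[group_key]
        pred = 1 if predict(features) >= threshold else 0
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(pred == label)
    return {g: hits[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # How does raising income from 50k to 100k move this prediction?
    print(what_if({"age": 40, "income_k": 50}, "income_k", 100))
```

A gap between groups in `accuracy_by_group` is exactly the kind of signal that prompts the "does it perform differently for various groups?" question; the tool’s value is making such probes available without writing this scaffolding.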

ai.googleblog.com

"Storm Surge Like You've Never Experienced it Before"

Realistic Storm Surge Depicted in Weather Channel Forecast

The new Weather Channel storm surge visualization is really something—it’s a fascinating use case of how visualization can help people understand things that are outside their experience. A bar graph just wouldn’t have communicated the impact of the rising water.

Watch the video; it’s well worth the one minute.

flowingdata.com

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.

www.fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123

The Analytics Engineering Roundup
