Broken Neural Networks, Real-Time SQL, & the State of the Data Science Industry [DSR #102]

Typically I try to keep to 6-10 links, but this week I just kept coming across more stuff that I found worthwhile. No images, all links, light narrative. Have at it! :)

- Tristan

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

Neural Networks

My Neural Network isn't working! What should I do?

11 things you probably screwed up and how to fix them.


An Introduction to different Types of Convolutions in Deep Learning

Behind the “C” in “CNN”. If you’re not super-familiar with how the internals of image classifiers work, this is a useful intro.


How to train your own Object Detector with TensorFlow’s Object Detector API

Great start-to-finish walkthrough on TF. I hope you like raccoons.


What's New in SQL

Introducing KSQL: Open Source Streaming SQL for Apache Kafka

Introducing KSQL, a streaming SQL engine for Apache Kafka. KSQL provides a simple and completely interactive SQL interface for processing data in Kafka.

This is incredibly cool. SQL-based analytics lives in a batch-based world today. Moving towards SQL syntax for querying real-time data streams is a major development. If you could write SQL queries on data and get answers with < 1s latency, that would unlock a completely new set of capabilities for data analysts and scientists.

It remains to be seen exactly what the performance characteristics of KSQL are, but this is worth paying attention to.


`Outer Join on False`

This post got forwarded around internally at Fishtown Analytics this past week; it goes through a SQL design pattern that none of us had ever considered before. I love that after spending thousands and thousands of hours writing SQL, there are still plenty of unexplored ideas.


Data Journalism

Hurricane How-To

The NYTimes made a very impressive visualization of Harvey this past week. This post goes through the process, including a dead end or two, that the author went through to get to the final version. Quite a lot of work went into making the image, and on a tight deadline.


What we learned from three years of interviews with data journalists, web developers and interactive editors at leading digital newsrooms

Over the last three years, Storybench, a website from Northeastern University’s School of Journalism’s Media Innovation graduate program, has interviewed 72 data journalists, web developers, interactive graphics editors, and project managers from around the world to provide an “under the hood” look at the ingredients and best practices that go into today’s most compelling digital storytelling projects.


From the Unicorns

How StitchFix packs boxes more efficiently using machine learning

If you can control the layout of the warehouse and the routes chosen by workers, you can make some serious optimizations in efficiency.


Meet Michelangelo: Uber's Machine Learning Platform

Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and deep learning.


Everything Else

The Current State of Applied Data Science

The author, Ben Lorica, is one of the most plugged-in people in the field. This is the best “state of the industry” post I’ve read this year.


Becoming a 10x Data Scientist

Borrowing tips and tricks from software developers, learn how to create a more productive workflow on the journey to becoming a 10X Data Scientist.

So good! If there are things on this list that you’re not doing today, you should stop, question all of your priorities in life, and then change your workflow.


A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries

Cool piece of data journalism. Languages focused on web and mobile development are relatively overrepresented in developing economies and languages focused on data processing are overrepresented in wealthy economies.


Data Alone Isn’t Ground Truth

The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases. - Sir Josiah Charles Stamp, 1880-1941

An excellent reminder to treat all data with skepticism.


How I replicated an $86 million project in 57 lines of code

It turns out it’s quite easy to make an application that reads license plates and does a real-time registration lookup. There’s never been a better time to be writing software.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123