Broken Neural Networks, Real-Time SQL, & the State of the Data Science Industry [DSR #102]
Typically I try to keep to 6-10 links, but this week I just kept coming across more stuff that I found worthwhile. No images, all links, light narrative. Have at it! :)
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
11 things you probably screwed up and how to fix them.
Behind the “C” in “CNN”. If you’re not super-familiar with how the internals of image classifiers work, this is a useful intro.
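For readers who want the core idea in a few lines before clicking through: the "C" is a small grid of weights slid across the image, with a dot product taken at each position. The sketch below is a minimal, dependency-free illustration, not the linked post's code. It assumes stride 1, no padding, and a single channel, and it computes cross-correlation, which is what most deep learning libraries actually run under the name "convolution". The kernel values are hand-picked for illustration; a trained CNN learns them.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1, single-channel 2D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product of the kernel with the image patch under it.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge detector (hypothetical; a CNN would learn these weights).
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output "lights up" only in the column where the edge sits, which is the kind of local feature map the early layers of an image classifier produce.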
Great start-to-finish walkthrough on TensorFlow (TF). I hope you like raccoons.
What's New in SQL
Introducing KSQL, a streaming SQL engine for Apache Kafka. KSQL provides a simple and completely interactive SQL interface for processing data in Kafka.
This is incredibly cool. SQL-based analytics lives in a batch-based world today, so moving toward SQL syntax for querying real-time data streams is a major development. If you could write SQL queries against streaming data and get answers with sub-second latency, that would unlock a completely new set of capabilities for data analysts and scientists.
It remains to be seen exactly what KSQL's performance characteristics are, but this is worth paying attention to.
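To make "batch vs. streaming" concrete, here is a minimal sketch of the kind of continuous, windowed aggregation a streaming SQL engine evaluates: counting events per key in fixed, non-overlapping (tumbling) time windows and emitting each window's result as soon as it closes, instead of waiting for the whole dataset. This is plain Python written for illustration, not KSQL's implementation or API. It assumes events arrive in timestamp order and that `(timestamp_seconds, key)` tuples are a reasonable stand-in for Kafka records; the function name `tumbling_counts` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # hypothetical record shape: (timestamp in seconds, key)

def tumbling_counts(
    events: Iterable[Event], window_s: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping windows of window_s seconds.

    Assumes events arrive in timestamp order. A window's counts are yielded
    as soon as the first event of a later window arrives; windows with no
    events are simply skipped.
    """
    current_start = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_s  # start of the window this event falls in
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # close out the previous window
            counts = defaultdict(int)
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the final, still-open window

# Hypothetical click stream: (seconds since start, page)
stream = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
for window_start, per_key in tumbling_counts(stream, window_s=10):
    print(window_start, per_key)
```

Because `tumbling_counts` is a generator, it works on an unbounded iterator too: each result appears once its window closes, which is the property that makes low-latency answers possible. A real engine also has to handle late and out-of-order events, which this sketch deliberately ignores.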
This post got forwarded around internally at Fishtown Analytics this past week; it goes through a SQL design pattern that none of us had ever considered before. I love that after spending thousands and thousands of hours writing SQL, there are still plenty of unexplored ideas.
The New York Times made a very impressive visualization of Hurricane Harvey this past week. This post walks through the author's process for getting to the final version, including a dead end or two. A lot of work went into that image, and on a tight deadline.
Over the last three years, Storybench, a website from Northeastern University’s School of Journalism’s Media Innovation graduate program, has interviewed 72 data journalists, web developers, interactive graphics editors, and project managers from around the world to provide an “under the hood” look at the ingredients and best practices that go into today’s most compelling digital storytelling projects.
From the Unicorns
If you can control both the layout of the warehouse and the routes workers take, you can achieve serious efficiency gains.
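As a toy illustration of why route choice alone matters (not the method from the linked article, which may be quite different), the sketch below compares walking a hypothetical set of pick locations in the order they were listed against a greedy nearest-neighbor ordering. It assumes a grid warehouse where walking distance is approximated by Manhattan distance, a single depot at `(0, 0)`, and made-up coordinates. Nearest-neighbor is a simple heuristic, not an optimal route solver.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def manhattan(a: Point, b: Point) -> int:
    # Workers walk along aisles, so grid (L1) distance is a rough proxy.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def route_length(start: Point, stops: List[Point]) -> int:
    """Total distance walked from start through the stops in order and back."""
    path = [start] + stops + [start]
    return sum(manhattan(p, q) for p, q in zip(path, path[1:]))

def nearest_neighbor_route(start: Point, stops: List[Point]) -> List[Point]:
    """Greedy heuristic: always walk to the closest unvisited pick location."""
    remaining = list(stops)
    route: List[Point] = []
    here = start
    while remaining:
        nxt = min(remaining, key=lambda p: manhattan(here, p))
        remaining.remove(nxt)
        route.append(nxt)
        here = nxt
    return route

depot = (0, 0)
picks = [(9, 9), (1, 1), (8, 8), (2, 2)]  # hypothetical order, as listed
print(route_length(depot, picks))  # prints 64
print(route_length(depot, nearest_neighbor_route(depot, picks)))  # prints 36
```

Controlling the layout adds a second lever on top of this: placing frequently picked items close to the depot shrinks every route, whichever ordering heuristic is used.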
Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and deep learning.
The author, Ben Lorica, is one of the most plugged-in people in the field. This is the best “state of the industry” post I’ve read this year.
Borrowing tips and tricks from software developers, learn how to create a more productive workflow on the journey to becoming a 10X Data Scientist.
So good! If there are things on this list that you’re not doing today, you should stop, question all of your priorities in life, and then change your workflow.
Cool piece of data journalism. Languages focused on web and mobile development are relatively overrepresented in developing economies, while languages focused on data processing are overrepresented in wealthy economies.
The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases.
- Sir Josiah Charles Stamp (1880-1941)
An excellent reminder to treat all data with skepticism.
It turns out it’s quite easy to make an application that reads license plates and does a real-time registration lookup. There’s never been a better time to be writing software.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123