Broken Neural Networks, Real-Time SQL, & the State of the Data Science Industry [DSR #102]
Typically I try to keep to 6-10 links, but this week I just kept coming across more stuff that I found worthwhile. No images, all links, light narrative. Have at it! :)
- Tristan
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
Neural Networks
My Neural Network isn't working! What should I do?
11 things you probably screwed up and how to fix them.
An Introduction to different Types of Convolutions in Deep Learning
Behind the “C” in “CNN”. If you’re not super-familiar with how the internals of image classifiers work, this is a useful intro.
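If the idea of a convolution still feels abstract, here is a minimal pure-Python sketch of the core operation (not code from the linked article). The function name `conv2d`, the toy `image`, and the `edge_kernel` are all illustrative assumptions. Like most deep learning libraries, it slides the kernel without flipping it, which is technically cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the kernel against the patch whose top-left corner
            # sits at (i * stride, j * stride), then add everything up.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image": dark on the left, bright on the right (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-written 2x2 kernel that responds to dark-to-bright vertical edges.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is largest exactly where the brightness changes. In a trained CNN the kernel values are learned rather than written by hand.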
How to train your own Object Detector with TensorFlow’s Object Detector API
Great start-to-finish walkthrough on TF. I hope you like raccoons.
What's New in SQL
Introducing KSQL: Open Source Streaming SQL for Apache Kafka
Introducing KSQL, a streaming SQL engine for Apache Kafka. KSQL provides a simple and completely interactive SQL interface for processing data in Kafka.
This is incredibly cool. SQL-based analytics lives in a batch-based world today. Moving towards SQL syntax for querying real-time data streams is a major development. If you could write SQL queries on data and get answers with < 1s latency, that would unlock a completely new set of capabilities for data analysts and scientists.
It remains to be seen exactly what the performance characteristics of KSQL are, but this is worth paying attention to.
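To make the batch-versus-streaming distinction concrete, here is a rough Python sketch of what a continuous windowed query computes. This is not KSQL syntax and does not use any Kafka API; the function `tumbling_window_counts` and the `page_views` data are made up for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a fixed-size window closes.

    `events` is an iterable of (timestamp_seconds, key) pairs, assumed to be
    in timestamp order. Windows that receive no events are skipped.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete: emit its result right away
            # instead of waiting for a nightly batch job.
            yield current_start, dict(counts)
            current_start = window_start
            counts = Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the final, still-open window once the input ends.
        yield current_start, dict(counts)


# Hypothetical page-view stream: (seconds since start, page name).
page_views = [
    (0, "home"), (2, "pricing"), (4, "home"),
    (11, "home"), (12, "docs"),
    (25, "home"),
]

results = list(tumbling_window_counts(page_views, window_seconds=10))
for window_start, counts in results:
    print(window_start, counts)
```

A batch query would scan the whole table and return one answer. Here each window's answer is available as soon as the first event of the next window shows up, which is roughly the shift that streaming SQL promises.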
This post got forwarded around internally at Fishtown Analytics this past week; it goes through a SQL design pattern that none of us had ever considered before. I love that after spending thousands and thousands of hours writing SQL, there are still plenty of unexplored ideas.
Data Journalism
The NYTimes made a very impressive visualization of Harvey this past week. This post goes through the process, including a dead end or two, that the author went through to get to the final version. Quite a lot of work went into making the image, and on a tight timeline.
Over the last three years, Storybench, a website from Northeastern University’s School of Journalism’s Media Innovation graduate program, has interviewed 72 data journalists, web developers, interactive graphics editors, and project managers from around the world to provide an “under the hood” look at the ingredients and best practices that go into today’s most compelling digital storytelling projects.
From the Unicorns
How StitchFix packs boxes more efficiently using machine learning
If you can control the layout of the warehouse and the routes chosen by workers, you can make some serious optimizations in efficiency.
multithreaded.stitchfix.com
Meet Michelangelo: Uber's Machine Learning Platform
Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and deep learning.
Everything Else
The Current State of Applied Data Science
The author, Ben Lorica, is one of the most plugged-in people in the field. This is the best “state of the industry” post I’ve read this year.
Borrowing tips and tricks from software developers, learn how to create a more productive workflow on the journey to becoming a 10X Data Scientist.
So good! If there are things on this list that you’re not doing today, you should stop, question all of your priorities in life, and then change your workflow.
A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries
Cool piece of data journalism. Languages focused on web and mobile development are relatively overrepresented in developing economies and languages focused on data processing are overrepresented in wealthy economies.
The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases. - Sir Josiah Charles Stamp, 1880-1941
An excellent reminder to treat all data with skepticism.
How I replicated an $86 million project in 57 lines of code
It turns out it’s quite easy to make an application that reads license plates and does a real-time registration lookup. There’s never been a better time to be writing software.
medium.freecodecamp.org
Thanks to our sponsors!
Fishtown Analytics: Analytics Consulting for Startups
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Stitch: Simple, Powerful ETL Built for Developers
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123