Data Science Roundup #53: Analytics Tech, How to Interpret a Poll, and All the Algorithms!

The ideal analytics tech stack, building election models and the inherent subjectivity of polls, the problems with pixellation, detecting logos with deep learning, and a mega-roundup of algorithms. Enjoy! 😂 😂 😂

– Tristan

This week's best data science articles

What's your analytics tech stack?

Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.


Predicting The 2016 US Presidential Election

Election forecasting has become a big deal since Nate Silver’s success in 2008. Not to be outdone, many major publications now not only report on polling data but also have their own statisticians building proprietary election forecast models. This post walks through the details of how to implement such an election model, step by step. Fascinating.


We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.

This is the best article on polling I’ve ever read. From the article: “Pollsters usually make statistical adjustments to make sure that their sample represents the population. They usually do so by giving more weight to respondents from underrepresented groups.” Read this to learn what’s behind the polling numbers in the news.
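
The weighting idea in that quote can be sketched in a few lines of Python. This is only a minimal illustration of one common adjustment (post-stratification on a single variable), not the method any of the four pollsters used; the group names, population shares, and responses are made-up toy values.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_share(responses, weights, candidate):
    """Share of total weight held by respondents who picked `candidate`."""
    total = sum(weights)
    return sum(w for r, w in zip(responses, weights) if r == candidate) / total


# Hypothetical toy sample: young voters are half the population
# but only a quarter of the people who answered the poll.
groups = ["young", "old", "old", "old"]
responses = ["A", "B", "B", "A"]
population = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, population)
raw = responses.count("A") / len(responses)
adjusted = weighted_share(responses, weights, "A")

print(f"raw support for A: {raw:.3f}")
print(f"weighted support for A: {adjusted:.3f}")
```

The single young respondent counts for more after weighting, so the estimate for candidate A moves. Choosing which variables to weight on, and to what targets, is one of the judgment calls that can lead different pollsters to different results from the same raw data.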

Also in data in the mainstream press: you should really read this post about Skittles.


The Great Algorithm Tutorial Roundup

KDNuggets recently did a poll where they asked “Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?” The 844 respondents’ most often used algorithm? Regression, of course. Hard to beat it. This post is the follow-up article, where they walk through every top algorithm and provide amazing resources. Want to know more about PCA? Random forests? This is the post.


None of your pixelated or blurred information will stay safe on the internet

Pixellation is no longer an effective way to obscure visual information. Look at the image below, blurred with YouTube’s blur feature. Using deep learning, researchers were able to identify blurred faces like these at a shocking rate: “On an industry standard dataset where humans had 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times).” Wow.


Can you tell they're the same face?
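
The two figures in that quote (71%, or 83% "if allowed to guess five times") are usually called top-1 and top-5 accuracy. As a rough sketch of how such numbers are computed, assuming a model that returns a ranked list of guesses per image, here is a small Python example; the names and rankings are invented toy data, not the researchers' results.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of examples whose true label appears among the first k guesses."""
    hits = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth in guesses[:k]
    )
    return hits / len(true_labels)


# Hypothetical model output: for each blurred image, identities ranked
# from most to least likely.
ranked = [
    ["ann", "bob", "cat"],
    ["bob", "ann", "cat"],
    ["cat", "bob", "ann"],
    ["ann", "cat", "bob"],
]
truth = ["ann", "ann", "ann", "bob"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)

print(f"top-1 accuracy: {top1:.2f}")
print(f"top-2 accuracy: {top2:.2f}")
```

Allowing more guesses can only raise the score, which is why the five-guess figure in the article is higher than the single-guess one.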

Improving Brand Analytics with an Image Logo Detection Convolutional Neural Net in TensorFlow

What did you do for the final project in your data science boot camp? A recent Metis grad developed a DCNN that sorted images scraped from Instagram into two classes: those with, and those without, a Patagonia logo in them. This is an excellent walkthrough of the process the author went through, including links to resources like this TensorFlow transfer learning retraining script. Valuable.


Data viz of the week

Information-dense Kickstarter data viz. Click through for more great analysis.

Thanks to our sponsors!

Fishtown Analytics

Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.



Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123