Data Science Roundup #53: Analytics Tech, How to Interpret a Poll, and All the Algorithms!

Sep 25, 2016

The ideal analytics tech stack, building election models and the inherent subjectivity of polls, the problems with pixellation, detecting logos with deep learning, and a mega-roundup of algorithms. Enjoy! 😂 😂 😂

– Tristan

This week's best data science articles

What's your analytics tech stack?

Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.

blog.fishtownanalytics.com

Predicting The 2016 US Presidential Election

Election forecasting has become a big deal since Nate Silver’s success in 2008. Not to be outdone, many major publications now not only report on polling data but also have their own statisticians building proprietary election forecast models. This post walks through the details of how to implement such an election model, step by step. Fascinating.

www.probabilisticworld.com

We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results.

This is the best article on polling I’ve ever read. From the article: “Pollsters usually make statistical adjustments to make sure that their sample represents the population. They usually do so by giving more weight to respondents from underrepresented groups.” Read this to learn what’s behind the polling numbers in the news.
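To make the weighting idea in that quote concrete, here is a minimal sketch of one simple adjustment (post-stratification on a single variable). It is not the method any of the four pollsters used; the group names, sample, and population shares below are made up purely for illustration.

```python
from collections import Counter

def poststratify(respondents, population_shares, group_key="group"):
    """Return one weight per respondent so that the weighted share of each
    group in the sample matches its (assumed) share of the population."""
    counts = Counter(r[group_key] for r in respondents)
    n = len(respondents)
    weights = []
    for r in respondents:
        sample_share = counts[r[group_key]] / n
        # Underrepresented groups get weights above 1, overrepresented below 1.
        weights.append(population_shares[r[group_key]] / sample_share)
    return weights

def weighted_support(respondents, weights, answer_key="supports"):
    """Weighted fraction of respondents whose answer is True."""
    total = sum(weights)
    return sum(w for r, w in zip(respondents, weights) if r[answer_key]) / total

# Hypothetical sample: older respondents are overrepresented (8 of 10).
sample = (
    [{"group": "older", "supports": True}] * 3
    + [{"group": "older", "supports": False}] * 5
    + [{"group": "younger", "supports": True}] * 2
)
# Hypothetical population: the two groups are actually the same size.
population = {"older": 0.5, "younger": 0.5}

weights = poststratify(sample, population)
raw = sum(r["supports"] for r in sample) / len(sample)
adjusted = weighted_support(sample, weights)
print(f"raw support: {raw:.4f}, weighted support: {adjusted:.4f}")
```

The same raw responses give a different headline number once the weights are applied, and choosing which variables to weight on is one of the judgment calls that lets pollsters reach different results from identical data.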

Also in data in the mainstream press: you should really read this post about Skittles.

www.nytimes.com

The Great Algorithm Tutorial Roundup

KDNuggets recently did a poll where they asked “Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?” The 844 respondents’ most-used algorithm? Regression, of course. Hard to beat it. This post is the follow-up article, where they walk through every top algorithm and provide amazing resources. Want to know more about PCA? Random forests? This is the post.

www.kdnuggets.com

None of your pixelated or blurred information will stay safe on the internet

Pixellation is no longer an effective way to obscure visual information. Look at the image below, blurred with YouTube’s blur feature. Using deep learning, researchers were able to identify blurred faces like these at a shocking rate: “On an industry standard dataset where humans had 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times).” Wow.

qz.com

Can you tell they're the same face?

Improving Brand Analytics with an Image Logo Detection Convolutional Neural Net in TensorFlow

What did you do for the final project in your data science boot camp? A recent Metis grad developed a deep convolutional neural network (DCNN) that sorted images scraped from Instagram into two classes: those with, and those without, a Patagonia logo in them. This is an excellent walkthrough of the process the author went through, including links to resources like this TensorFlow transfer learning retraining script. Valuable.

maxmelnick.com

Data viz of the week

Information-dense Kickstarter data viz. Click through for more great analysis.

Thanks to our sponsors!

Fishtown Analytics

Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.

fishtownanalytics.com

Stitch

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123
