The ideal analytics tech stack, building election models and the inherent subjectivity of polls, the problems with pixellation, detecting logos with deep learning, and a mega-roundup of algorithms. Enjoy! 😂 😂 😂
This week's best data science articles
Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.
Election forecasting has become a big deal since Nate Silver’s success in 2008. Not to be outdone, many major publications now not only report on polling data, they have their own statisticians building proprietary election forecast models. This post walks through the details of how to implement such an election model, step by step. Fascinating.
This is the best article on polling I’ve ever read. From the article: “Pollsters usually make statistical adjustments to make sure that their sample represents the population. They usually do so by giving more weight to respondents from underrepresented groups.” Read this to learn what’s behind the polling numbers in the news.
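To make that weighting idea concrete, here is a minimal sketch of one simple approach: each respondent gets a weight equal to their group's share of the population divided by its share of the sample. The group names, population shares, and responses are all made up for illustration, and real pollsters typically use more elaborate schemes than this.

```python
from collections import Counter

def group_weights(sample_groups, population_shares):
    """One weight per respondent: population share / sample share of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_share(responses, weights, answer):
    """Weighted fraction of respondents who gave `answer`."""
    total = sum(weights)
    return sum(w for r, w in zip(responses, weights) if r == answer) / total

# Hypothetical poll: young respondents are 20% of the sample
# but (by assumption) 40% of the population.
groups = ["young"] * 2 + ["old"] * 8
responses = ["A", "A"] + ["A"] * 2 + ["B"] * 6
population = {"young": 0.4, "old": 0.6}

weights = group_weights(groups, population)
raw = responses.count("A") / len(responses)
adjusted = weighted_share(responses, weights, "A")

print(f"raw support for A: {raw:.2f}")
print(f"weighted support for A: {adjusted:.2f}")
```

In this toy sample the underrepresented group favors candidate A, so weighting it up moves A's support well above the raw figure. Choosing which groups to weight on, and what the "true" population shares are, is exactly where the subjectivity comes in.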
Also in data in the mainstream press: you should really read this post about Skittles.
KDNuggets recently did a poll where they asked “Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?” The 844 respondents’ most-used algorithm? Regression, of course. Hard to beat it. This post is the followup article, where they walk through every top algorithm and provide amazing resources. Want to know more about PCA? Random forests? This is the post.
Pixellation is no longer an effective way to obscure visual information. Look at the image below, blurred with YouTube’s blur feature. Using deep learning, researchers were able to identify blurred faces like these at a shocking rate: “On an industry standard dataset where humans had 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times).” Wow.
Can you tell they're the same face?
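The "allowed to guess five times" figure reads like what is usually called top-5 accuracy: a prediction counts as correct if the true identity appears anywhere in the model's five highest-ranked guesses. Here is a minimal sketch of that metric, assuming that interpretation. The names and rankings are invented, and this is not the researchers' evaluation code.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of examples whose true label is among the first k guesses."""
    hits = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth in guesses[:k]
    )
    return hits / len(true_labels)

# Hypothetical model output: guesses for each blurred face, best guess first.
ranked = [
    ["ann", "bob", "cat"],
    ["bob", "ann", "cat"],
    ["cat", "dan", "ann"],
    ["dan", "bob", "ann"],
]
truth = ["ann", "ann", "dan", "cat"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Accuracy can only rise as k grows, which is why the five-guess number is the higher of the two.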
What did you do for the final project in your data science boot camp? A recent Metis grad developed a DCNN that sorted images scraped from Instagram into two classes: those with, and those without, a Patagonia logo in them. This is an excellent walkthrough of the process the author went through, including links to resources like this TensorFlow transfer learning retraining script. Valuable.
Data viz of the week
Information-dense Kickstarter data viz. Click through for more great analysis.
Thanks to our sponsors!
Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123