Data Science Roundup #72: Data Viz, Intro to LDA, and Pizzagate!

140+ blogs and publications and 500+ posts distilled down to the best six. Do you ❤️️ the Data Science Roundup? Please share with your network.

Focus on: Visualization

The State of Information Visualization, 2017

Robert Kosara writes a brilliant post every year on the advances in data visualization over the prior year. This year’s post focuses on the visualization of uncertainty (a hot topic with the US elections), sketching & personal data, and trends in storytelling.

I’m amazed at how quickly the field of data visualization is moving. This gets surprisingly little attention amidst the constant barrage of ML milestones.


Hans Rosling: An Appreciation

Hans Rosling passed away this week.

I’ve been working with data since I was in high school, but had never been excited about that fact until 2006. In the early days of video podcasts, I was a subscriber to TED Talks and watched Hans Rosling give his famous talk. It was like nothing I’d ever seen. From the article:

That TED video has been watched over 11 million times. Eleven. million. times. A video of a geeky old professor talking about public health numbers!

I remember being blown away—Rosling had an amazing talent for both building brilliant visualizations and telling engaging stories with them. If you’ve never seen the video, take a moment to watch it. The article is an excellent summary of Rosling’s impact.

We’ll miss you.


Visualizing Time-Series Change

There are only a handful of primary ways to view time series data. This post does a great job of presenting your options and laying out intuitive guidelines to help you choose among them.

Short, impactful.


This week's best data science articles

Clustering Similar Stories Using LDA @ Flipboard

Flipboard, a popular news reading application, just released a “related stories” feature. From the article:

Although there are many sophisticated automatic clustering algorithms, story clustering is a non-trivial problem. Because each text document can contain any word from our vocabulary, most text document representations are extremely high-dimensional. In high-dimensional spaces, even basic clustering or similarity measures fail or are very slow.

This post on their engineering blog goes deep into the details of their implementation. Extremely useful.
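
To make the dimensionality point in the quote concrete, here is a minimal sketch of a bag-of-words representation. It is not Flipboard's implementation: the headlines, the `to_vector` and `cosine` helpers, and the whitespace tokenization are all assumptions invented for illustration.

```python
from collections import Counter
import math

# Toy headlines, invented for illustration (not Flipboard data).
docs = [
    "senate passes budget bill after long debate",
    "budget bill clears senate after debate",
    "local team wins championship in overtime",
]

# One dimension per distinct word across the whole corpus.
vocab = sorted({word for doc in docs for word in doc.split()})

def to_vector(doc):
    """Bag-of-words count vector over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

def cosine(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vectors = [to_vector(doc) for doc in docs]

# Fraction of zero entries across all document vectors.
sparsity = sum(v.count(0) for v in vectors) / (len(vectors) * len(vocab))

related = cosine(vectors[0], vectors[1])    # two budget stories
unrelated = cosine(vectors[0], vectors[2])  # budget story vs. sports story

print(f"dimensions: {len(vocab)}, sparsity: {sparsity:.2f}")
print(f"related: {related:.2f}, unrelated: {unrelated:.2f}")
```

Even with three short headlines, every document needs one dimension per vocabulary word and more than half the entries are zero. With a real vocabulary of tens of thousands of words the vectors get far longer and sparser, which is why a topic model such as LDA, which summarizes each document as a short vector of topic weights, is attractive for this kind of problem.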


Intro to Data Science for Academics

Are you coming from a graduate school background and looking to get into data science? This article was written for you. Many of the skill sets used in PhD programs are incredibly relevant in data science jobs, but adjusting to a different context can be challenging.


Pizzagate, or the curious incident of the researcher in response to people pointing out 150 errors in four of his papers

This piece by Andrew Gelman starts off by roasting a researcher who seems to commit just about every possible statistical and scientific sin. But it gets better:

I continue writing about this story because of the insight it gives into the inner workings of the famous self-correcting nature of science. The process of self correction is much more involved than people seem to realize. Sometimes people demand retractions, but as I’ve written before, I don’t see retraction as a serious solution for reform of poor research and publication practices, or as a way of cleaning the public record. The numbers just don’t add up: there are just too many hopelessly flawed papers, and retraction is done so rarely.

I am deeply interested in the social process of determining what the truth is. In both society and science today, it seems like we’re having some fundamental challenges agreeing on exactly how this should work.


Data viz of the week

Who's speaking in Middle Earth? Excellent interactivity.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123