Data Science Roundup #49: Text Summarization, Open Data, and Fantasy Maps!

Quick note: I’ll be attending Strata NYC from September 26-29 and would love to meet any readers who are also attending! Respond to this email so that we can meet up. Also: retweet this for the chance to win a free pass! Or get 20% off with this link.

See you there :)

- Tristan

This week's best data science articles

Text summarization with TensorFlow

Extractive summarization (finding the most important words in a document and using those as a summary) is easier than abstractive summarization (using different words that convey the overall meaning). Google’s recently released TF model represents their state-of-the-art work in abstractive summarization, and it’s impressive. Get the code here. Also, read the Fast Forward Labs take on it here.
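To make the extractive side of that distinction concrete, here is a minimal sketch. It is not Google's model or anything from the linked posts; it assumes a simple word-frequency heuristic, and the function name, stopword list, and scoring rule are all made up for illustration. It scores each sentence by the average document-wide frequency of its words and keeps the top scorers verbatim.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence splitting on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Cats sleep. Cats chase mice and cats climb trees. Dogs bark loudly."
print(extractive_summary(text, max_sentences=2))
```

Every word in the output already appears in the input, which is what makes this extractive. An abstractive model has to generate new wording, and that is the harder problem.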


Generating Fantasy Maps from Randomness

You have to try this. It’s an interesting concept, but the demo is just neat. Within this post, interactively, you start with randomness and then adjust several parameters, eventually ending up with a beautiful fantasy map. The three cities in my map are already vying for the throne.


The "Joel Test" for Data Science

Is your data science team high-functioning? Domino Data Lab recently updated Joel Spolsky’s 16-year-old test for software engineering teams and applied it to data science, and the results are worth paying attention to. This is short and sweet.


Visualizing Clusters of Clickbait Headlines Using Spark, Word2vec, and Plotly

Facebook recently announced that it will penalize posts that link to articles with clickbait headlines by limiting their exposure in the News Feed. This article walks through a basic but powerful clustering of headlines with some really fun results.


Making Kaggle the Home of Open Data

Kaggle initially released its open dataset functionality in January, and they promised at the time that they’d be releasing more in the future. Well, they recently opened the doors: now everyone can post open datasets to Kaggle, version them, collaborate on them, and analyze them, all in public. Very, very cool. I’m already imagining so many applications…


Five Python Tutorials You Just Have to Try

(Speaking of clickbait headlines..!) This post is a quick-hitter, with reviews of five excellent tutorials you may want to check out: Composing Music With Recurrent Neural Networks, Page dewarping using OpenCV, and…well, just click the link.


Data viz of the week

An absolutely gorgeous interactive visualization of wind.


Pay it forward!

I curate the Roundup on my nights and weekends because of the amazing support I get from readers. Know any data scientists who would enjoy reading? Please send them here (or forward this email). Thanks!

Thanks to our sponsors :D

Fishtown Analytics

Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.



Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123