Data Science Roundup #62: Reproducible Research @ Stripe, plus easy mapping viz, image recognition & more!
The coming age of image recognition. How to avoid getting overwhelmed as you learn. Reproducible research @ Stripe. Avoiding common statistical mistakes. Easy mapping in R. And the world’s biggest ML & AI resource guide.
A Thanksgiving ask: The Roundup is forwarded to hundreds of new people each week! If you’ve been sent this newsletter by a friend, do me a favor and sign up. It’s your subscriptions that keep The Data Science Roundup growing!
This week's best data science articles
This week’s “big think” article focuses on image recognition and its implications. “We should expect that every image ever taken can be searched or analyzed, and some kind of insight extracted, at massive scale. (…) When we can turn images into data, we’ll find lots of sets of images that we never really thought of as data before, and lots of problems that didn’t look like image recognition problems.”
A great, short Quora answer from an experienced data scientist. “Instead of adding everything that we stumble upon to our reading lists, I’d say that it makes more sense to be absolutely clear about personal goals first. Since there’s so much material out there, it’s become necessary to be a bit more selective when choosing learning material and exploring different tools. Of course, it sometimes feels like we are missing out on something, but I think that getting used to this feeling really helps to stay focussed and to make steady progress.”
The data team at Stripe has invested heavily in reproducibility, with great results. In this post, they share how their team publishes internal research that any member of the team, current or future, can then reproduce from scratch. Git, Jupyter, and internally built tools are all at the heart of this workflow.
This is a must-read. Data teams need to think of their outputs as research, and need to be focused on building high-quality mechanisms by which this research gets produced and maintained.
If you find yourself in an analytics role but aren’t heavy on stats, read this post. In it, the author provides guidance on the bare minimum statistics you need to know to produce reliable analytics. It covers significance testing, confidence intervals, and (my favorite!) how to deal with the multiple comparisons problem.
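To make the multiple comparisons problem concrete, here is a minimal Python sketch of two standard corrections, Bonferroni and Holm. It is not taken from the linked post: the function names and the p-values are hypothetical, chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest,
    relaxing the threshold at each step, and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value also fails
    return reject


# Hypothetical p-values from five tests run on the same data set.
p_values = [0.001, 0.012, 0.03, 0.04, 0.2]

naive = [p <= 0.05 for p in p_values]  # no correction at all
print(sum(naive), "significant without correction")
print(sum(bonferroni(p_values)), "significant with Bonferroni")
print(sum(holm(p_values)), "significant with Holm")
```

With these made-up numbers, four of the five tests look significant without correction, but only one survives Bonferroni and two survive Holm. Both methods control the chance of any false positive across the whole family of tests; Holm is never less powerful than Bonferroni.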
Producing high-quality mapping visualizations used to be hard, but at this point, if you’re visualizing data that has a spatial component to it and you’re not using a map, you’re doing it wrong. This article uses R’s ggmap to draw several different maps of gas price data in Germany, and each map takes 3-5 lines of code.
Um. Wow. This is a collection of every blog, every company, every person, and every conference focused on ML & AI. I can only imagine what a massive effort this was to pull together, and to my knowledge it’s the most extensive resource of its kind. I highly recommend browsing through; I’ve added a bunch of new resources to my regular feeds.
Data viz of the week
Introducing the "Troll Hair Chart". Great way to show many stacked time series.
Thanks to our sponsors!
Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123