Distributing Data Globally @ Facebook. Google Pixel 3. Don't Follow the Herd. [DSR #158]
This Week's Most Useful Posts
The author founded SharpestMinds, a platform to help data scientists get hired. The entire article is great, but here’s my favorite part:
Learn boring things. Other people aren’t doing this because no one likes boring things. But learning a proper Git flow, how to use Docker, how to build an app using Flask, and how to deploy models on AWS or Google Cloud, are skills that companies desperately want applicants to have, but that are under-appreciated by a solid majority of applicants.
If you like that, there’s plenty more good advice (and snark) in the full post.
Author Genevieve Hayes did an analysis using Stack Overflow’s survey data (Jupyter notebook here) to look at a bunch of measures for data scientists vs software engineers as a whole. By and large the distributions were similar—data scientists, like software engineers, are not exactly a diverse bunch. This is not exactly surprising.
The analysis leaves questions that I’d be very interested to learn more about. For example: my guess is that the self-described data analyst population is more diverse in both age and gender than the self-described data scientist population. Just a hunch, but the data is there to check. Anyone have time? I’d be happy to link to a follow-up piece in the coming weeks.
To the best of our knowledge, Akkio is the first dynamic locality management service for geo-distributed data store systems that migrates data at microshard granularity, offers strong consistency, and operates at Facebook scale.
…wow. This is like nothing I’ve ever read. It delves deep into Facebook’s solution for how to map a file to a datacenter, and the results are impressive:
(Akkio has resulted in a) 50 percent reduction of the corresponding WAN traffic and an approximately 50 percent reduction in perceived latency.
Facebook has written a paper on Akkio and will be presenting it at OSDI 2018.
Pete Warden has been writing great stuff recently. Here’s his latest:
When I talk to people about machine learning on phones and devices I often get asked “What’s the killer application?”. I have a lot of different answers, everything from voice interfaces to entirely new ways of using sensor data, but the one I’m most excited about in the near-term is compression. Despite being fairly well-known in the research community, this seems to surprise a lot of people, so I wanted to share some of my personal thoughts on why I see compression as so promising.
The whole post is short and fascinating. This is an entire area that I had spent literally no time thinking about, but it makes a ton of sense.
Most marketers don’t know how to develop websites, but they still use CMSes that allow them to make constructive edits to live ones. Similarly, this post proposes methods for getting data scientists and analysts to impact customers directly, while doing so safely.
It’s a great topic: data teams will have far more impact if their code can be more customer-facing.
Another FlowingData gem. In this post, Nathan takes a fairly straightforward population dataset, asks it 10 different questions, and visualizes the answer to each. Each viz is distinct and perfectly designed to answer that particular question.
In order to create great visualizations, you have to be clear about the question you’re asking.
This post does an amazing job of walking through the new Pixel 3’s “Super Res Zoom” feature, an AI-enabled zoom that analyzes multiple shots from a burst to come up with a single enhanced image. The post goes through the multiple techniques they use to account for different image imperfections and shows detailed examples (as above) of the improvement.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.