❤️ Want to support this project? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This week's best data science articles
I remember sitting in a board meeting in ~2014 where a VC said “Mobile is huge—we need to make sure that our dashboards look great on mobile.” Five years later, there are still very few dashboards that look great on mobile, and most of those that do are meticulously crafted by hand. In an increasingly mobile world, data is still something we largely consume at our desktops.
This post is phenomenal. It discusses the work required to make a data visualization work well in both desktop and mobile contexts. Sometimes the best answer is to produce an entirely different visualization to communicate the same data, as in the image above. Sometimes the best answer is to replace the dynamic version with a static image.
The author discusses a variety of strategies, sharing examples for each. I’ve never before seen a post that treats this topic with such clarity and depth. Maybe this is why most of our data viz is still desktop only:
Making a data visualization look good and be effective on both mobile and desktop is one of the most difficult aspects of my work.
Databricks’ announcement of a new file format, Delta, promises things like ACID transactions and data versioning. If you’re a Snowflake user, you’ll recognize both of these as major selling points of that platform, so it’s neat to see these features make their way into an open source context.
I’ve been spending more time in the Spark / Databricks ecosystem recently on a client project and have had some file format challenges that have made me develop a new appreciation for their criticality. Delta is already solving real problems for us on that project.
This may not be the absolute most riveting topic in the world, but I actually think this release could be quite important for the industry. We need to move away from data engineering jobs that have to rewrite an entire 100-MB Parquet file just to update a single row.
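To make that concrete, here’s a hedged sketch of the kind of thing Delta enables, written in Spark SQL syntax (the table and column names are made up for illustration):

```sql
-- Row-level update: Delta rewrites only the files containing matching
-- rows and records the change in its transaction log, instead of
-- forcing a job to rewrite the whole dataset.
UPDATE events SET status = 'resolved' WHERE event_id = 42;

-- Data versioning ("time travel"): query the table as it existed at
-- an earlier version of the transaction log.
SELECT * FROM events VERSION AS OF 3;
```

With plain Parquet, the first statement simply isn’t expressible — you’d have to read, modify, and rewrite the file yourself.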
This is a bad title—the post is an analysis of Indeed job postings and what skills hiring managers are looking for in Data Scientist, Data Engineer, and Machine Learning Engineer roles. What I found interesting is that there have been some clear changes since I began following the space: Python is much more sought-after than R in the data scientist job postings, for example.
Good data if you haven’t read an analysis like this in a little while.
The tweetstorm started by the tweet above is quite good—worth checking out if you’re using (or interested in using) Facebook’s Prophet forecasting library.
I love this post! I spend a ton of time building charts and I’ve certainly run into each of the topics that the author points out, yet I never even thought of them as problems to be solved. Here’s an example from a section on Scalability:
When you analyze and visualize data from nontrivial data sets, it happens surprisingly often that a categorical attribute has too many values to be visualized effectively. How do you deal with that? And of course the same is true for when you have too many data objects. For instance, how do you visualize 100 time series in a line chart?
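One common answer to the too-many-categories problem is to keep the top N values and collapse everything else into an “Other” bucket before charting. A minimal sketch (function name and defaults are my own, not from the post):

```python
from collections import Counter

def bucket_top_n(values, n=5, other_label="Other"):
    """Collapse a high-cardinality categorical into its top-n values
    plus a catch-all bucket, so a chart stays readable."""
    counts = Counter(values)
    top = {value for value, _ in counts.most_common(n)}
    return [v if v in top else other_label for v in values]

colors = ["red", "red", "blue", "blue", "teal", "mauve"]
print(bucket_top_n(colors, n=2))
# → ['red', 'red', 'blue', 'blue', 'Other', 'Other']
```

The same idea scales down a 100-series line chart: plot the top few series individually and aggregate the rest.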
Google has started the process of open sourcing ZetaSQL, a SQL front-end that consists of a parser and analyzer. It is designed to work with a variety of back ends (…)
The fact that ZetaSQL is the parser and analyzer behind the Standard SQL dialect of Google’s BigQuery is what makes this release interesting.
This is kind of a big deal: Google has open-sourced the C++ code that parses BigQuery SQL and will soon become the parser for other SQL products it operates as well. This allows other products in the ecosystem to not just be able to treat SQL as a chunk of opaque text, but to actually understand it. This could result in better SQL editor front-ends, better query auto-formatters, better SQL error-handling…and much more. Very very cool.
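To illustrate the difference between opaque text and understood SQL, here’s a deliberately toy sketch. A real front-end like ZetaSQL builds a full AST and resolves names against a catalog; this regex hack only hints at what even shallow structural awareness enables (e.g., listing the tables a query touches):

```python
import re

def referenced_tables(sql):
    # Toy extraction: grab identifiers that follow FROM / JOIN keywords.
    # This is nowhere near a real parser -- it ignores subqueries,
    # quoting, CTEs, etc. -- but it shows why tools want structure,
    # not just a string of SQL text.
    return re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql,
                      flags=re.IGNORECASE)

query = "SELECT a.x FROM orders a JOIN customers c ON a.cid = c.id"
print(referenced_tables(query))
# → ['orders', 'customers']
```

Editors, formatters, and linters built on a real parser get this information reliably, which is exactly why an open-source ZetaSQL matters.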
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
If you don't want these updates anymore, please unsubscribe here.
If you were forwarded this newsletter and you like it, you can subscribe here.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123