How Prophet Works. Mobile Data Viz. ZetaSQL. Delta Lake. [DSR #185]

This week's best data science articles

Techniques for Data Visualization on Mobile & Desktop

I remember sitting in a board meeting in ~2014 where a VC said “Mobile is huge—we need to make sure that our dashboards look great on mobile.” Five years later, there are still very few dashboards that look great on mobile, and most of those that do are meticulously crafted by hand. In an increasingly mobile world, data is still something we largely consume at our desktops.

This post is phenomenal. It discusses what it takes to make a data visualization work well in both desktop and mobile contexts. Sometimes the best answer is to produce an entirely different visualization to communicate the same data. Sometimes the best answer is to replace the dynamic version with a static image.

The author discusses a variety of strategies, sharing examples for each. I’ve never before seen a post that treats this topic with such clarity and depth. Maybe this is why most of our data viz is still desktop only:

Making a data visualization look good and be effective on both mobile and desktop is one of the most difficult aspects of my work.


Databricks: Open Sourcing Delta Lake

Databricks’ announcement of a new file format, Delta, promises things like ACID transactions and data versioning. If you’re a Snowflake user, you’ll recognize both as major selling points of that platform, so it’s neat to see these features make their way into an open source context.

I’ve been spending more time in the Spark / Databricks ecosystem recently on a client project, and the file format challenges I’ve run into there have given me a new appreciation for how critical file formats are. Delta is already solving real problems for us on that project.

This may not be the most riveting topic in the world, but I think this release could be quite important for the industry. We need to move away from data engineering jobs that have to rewrite an entire 100 MB Parquet file just to update a single row.


What Does an Ideal Data Scientist’s Profile Look Like?

This is a bad title—the post is an analysis of Indeed job postings and what skills hiring managers are looking for in Data Scientist, Data Engineer, and Machine Learning Engineer roles. What I found interesting is that there have been some clear changes since I began following the space: Python is much more sought-after than R in the data scientist job postings, for example.

Good data if you haven’t read an analysis like this in a little while.


Sean J. Taylor


📈Long thread on how Prophet works📈

- Instead of sharing slides I'm transcribing my talk to tweets with lots of GIFs :)
- Thanks to @_bletham_ for all his help.

1:30 PM - 30 Apr 2019

The tweetstorm started by the tweet above is quite good—worth checking out if you’re using (or interested in using) Facebook’s Prophet forecasting library.

Neglected (Yet Foundational) Concepts in the Pedagogy of Data Visualization

I love this post! I spend a ton of time building charts and I’ve certainly run into each of the topics that the author points out, yet I never even thought of them as problems to be solved. Here’s an example from a section on Scalability:

When you analyze and visualize data from nontrivial data sets, it happens surprisingly often that a categorical attribute has too many values to be visualized effectively. How do you deal with that? And of course the same is true for when you have too many data objects. For instance, how do you visualize 100 time series in a line chart?
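One common way to handle the first question in that quote is to keep only the most frequent categories and lump the rest into a single bucket before charting. The sketch below is my own illustration, not something taken from the linked post; the function name `lump_rare_categories`, the `top_n` cutoff, and the browser data are all hypothetical.

```python
from collections import Counter


def lump_rare_categories(values, top_n=5, other_label="Other"):
    """Keep the top_n most frequent categories; relabel the rest as other_label."""
    counts = Counter(values)
    keep = {category for category, _ in counts.most_common(top_n)}
    return [v if v in keep else other_label for v in values]


# Hypothetical data: a few large categories followed by a long tail.
browsers = (
    ["Chrome"] * 50
    + ["Safari"] * 30
    + ["Firefox"] * 10
    + ["Edge", "Opera", "Brave", "Vivaldi"]
)

lumped = lump_rare_categories(browsers, top_n=3)
summary = Counter(lumped)  # category -> row count after lumping
print(summary.most_common())
```

The trade-off is that the "Other" bucket hides detail, so the right cutoff depends on how much of the data falls into the long tail.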


ZetaSQL Parser & Analyzer Code Released

Google has started the process of open sourcing ZetaSQL, a SQL front-end that consists of a parser and analyzer. It is designed to work with a variety of back ends (…)

The fact that ZetaSQL is used as parser and analyzer for Google’s BigQuery’s Standard SQL dialect is what makes this release interesting.

This is kind of a big deal: Google has open-sourced the C++ code that parses BigQuery SQL, and that code will soon become the parser for other SQL products Google operates as well. This lets other products in the ecosystem actually understand SQL rather than treat it as a chunk of opaque text. This could result in better SQL editor front-ends, better query auto-formatters, better SQL error-handling…and much more. Very very cool.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
