Huge Jupyter Announcement. Getting Hired: What Most Junior Data Scientists Do Wrong. Things You Don't Need. [DSR #191]

This week's best data science articles

Jupyter: And voilà!

Voilà turns Jupyter notebooks into standalone web applications.

Voilà is Project Jupyter’s attempt to make notebooks work better for the “presentation to non-technical data consumers” part of the analysis lifecycle. I won’t attempt to summarize the release here. Suffice it to say that I think this is a big, important problem that has, to date, gone largely unsolved, and this development is an important one.

I believe that the eventual solution to dashboarding looks something like a flexible system where users can write “widgets” (charts, tables, interactive elements) in the language of their choosing, and then publish those widgets to a common dashboard where they can all interact (e.g. share the same dynamic parameters, respond to the same user interactions). A product like this could act as a common framework, allowing iteration and evolution within the framework instead of requiring users to constantly move from framework to framework in search of the next set of new features. I think Jupyter is a very natural place for something like this to come from, and it seems like the folks at Project Jupyter agree.
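To make the "common dashboard" idea a little more concrete, here is a minimal sketch in plain Python. It is not Voilà's or Jupyter's API: the `Dashboard` class, its `publish` and `set_param` methods, and the two example widgets are all hypothetical names invented for illustration. The only point is to show widgets reading from one shared set of parameters and re-rendering together when a parameter changes.

```python
from typing import Any, Callable, Dict, List

# A "widget" in this sketch is just a function that renders text
# from the dashboard's shared parameters (hypothetical design).
Widget = Callable[[Dict[str, Any]], str]


class Dashboard:
    """Hypothetical shared-parameter hub; not part of Voilà or Jupyter."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}
        self._widgets: List[Widget] = []

    def publish(self, widget: Widget) -> None:
        # Register a widget so it takes part in every re-render.
        self._widgets.append(widget)

    def set_param(self, name: str, value: Any) -> List[str]:
        # Changing one shared parameter re-renders all published widgets.
        self._params[name] = value
        return self.render()

    def render(self) -> List[str]:
        # Each widget gets its own copy of the shared parameters.
        return [widget(dict(self._params)) for widget in self._widgets]


def revenue_chart(params: Dict[str, Any]) -> str:
    return f"chart of revenue for {params.get('region', 'all regions')}"


def orders_table(params: Dict[str, Any]) -> str:
    return f"table of orders for {params.get('region', 'all regions')}"


dashboard = Dashboard()
dashboard.publish(revenue_chart)
dashboard.publish(orders_table)

# One user interaction (picking a region) updates both widgets.
outputs = dashboard.set_param("region", "EMEA")
for line in outputs:
    print(line)
```

In a real system the widgets could be written in different languages and rendered in a browser; the sketch only captures the shared-state part of the idea.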


Why you’re not a job-ready data scientist (yet)

…there aren’t actually that many reasons why applicants get turned down from data science roles, and there’s a lot you can do to cover those bases.

This is an amazing article to help make sure you’re ready for your first set of data science interviews. It’s a list of the top four reasons why junior data scientists get turned down for jobs; use it as a gut check and fill in gaps as needed.


What 70% of Data Science Learners Do Wrong

Corporate data science is still a new field. Many academics haven’t worked on real problems for real businesses yet. So they teach textbook algorithms in a way that’s separated from data and business context. This can be intellectually fun. But, students are mistaken if they assume these courses prepare them well to work as data scientists.

Short, solid read. Good practical advice if you’re making your way into the field.


You Probably Don't Need a Data Dictionary

Great article by Michael Kaminsky on an important / hot topic:

While efforts to build a data dictionary are often undertaken out of a zeal for documentation that we would normally applaud, in practice data dictionaries and data catalogs end up being a large maintenance burden for little actual value, and tend to very quickly become out of date. Instead of investing in building out traditional data dictionaries, we recommend a few different approaches for achieving the same goals in ways that are less burdensome to maintain and better serve the original objectives as well.

I am 100% ideologically aligned with this post. The one thing I’d add is that projects like Marquez and Amundsen are redefining what is possible when it comes to metadata management / discovery / curation. I believe this is a big trend and has the potential to upend what is currently considered the state-of-the-art BI experience.


You Don't Need Kafka

Also in the department of “things you don’t need”: Kafka! From Vicki Boykis’ amazing new newsletter Normcore Tech:

Most startups (and big companies) don’t need the tech stack they have.

She outlines the many sociological reasons for this, which boil down to a kind of tech FOMO and groupthink. This general dynamic is something I’ve absolutely observed in practice: tech companies love technology and want to use “cutting edge” stuff. But adding tech to your stack is expensive in terms of time and complexity. My favorite post on this topic is called Choose Boring Technology, and I highly recommend it; it has formed the basis of my thinking on this topic since it was written.

What technology are you using that you don’t actually need?


The BS-Industrial Complex of Phony A.I.

How hyping A.I. enriched investors, fooled the media, and confused the hell out of the rest of us

I’m not 100% confident that this link will still work when you get this newsletter; it points to a Google-cached version of the article. The author deleted the original (not entirely surprising given how controversial it was), but it was too good to miss.

The author used to work at Dynamic Yield, a company that was widely identified as an AI startup by the press. He outlines the slow process by which forces, both internal to a company and throughout the industry, conspired to push them to accept and embrace this identity: customers, investors, and the press all want to tell or be a part of an AI storyline while

the true geeks ignore the noise and build the future.

Strong agree.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
