Getting Hired. More Books! Data Governance. Big-O Complexity. [DSR #129]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

The Week's Most Useful Posts

Why so many Data Scientists are Leaving their Jobs


  1. Many people want to become data scientists, and data science jobs can be quite hard to get.

  2. Data scientist turnover is quite high.

How can those two facts be reconciled? The author believes (and I totally agree) that the problem is one of mismatched expectations, and the article goes deep into the dynamics that create that mismatch.

This post has really caught fire since it was published just a few days ago, with almost 5,000 “claps” on Medium as I write this. It has apparently hit a nerve :) Highly recommended.


Aspiring Data Scientists: Start to learn Statistics with these 6 books!

Last week’s post on the top 5 business books for data scientists got more clicks than anything I’ve ever linked to. Here’s another list, published on Hacker Noon this past week, this time focused on building your stats foundations. I’ve had these books recommended to me before; it’s time I actually sat down and read them.


What Getting A Job In Data Science Might Look Like

I love this post! Here are the first couple of lines:

I’ve read a number of articles stating how hard it was to get into Analytics and Data Science. This hasn’t been my experience(…)

That immediately caught my attention. Everyone likes to write about how hard it can be to get into data science: how many applications it takes, how many technologies one needs to know, how impersonal and flawed the interviewing process can be. And here’s someone saying “this wasn’t that hard”. So… what did the author do differently?

Turns out: she just made good decisions, did good work, and stuck with it. She got a master’s degree, built relationships with her professors, used those relationships to land her first job, did good work, learned a lot, and built from there. She repeatedly emphasizes the importance of core skills (Excel, SQL, Tableau, storytelling) over the more “exciting” skills she’s used (neural nets, Hive).


Data Governance and the Death of Schema on Read

Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.

Data governance is a topic that has traditionally been dominated by massive companies and high-priced vendor solutions that were not particularly innovative. But that’s changing: I’m seeing data practitioners at companies large and small start to really care about and innovate around data governance. This is for a couple of reasons, IMO:

  1. Practitioners truly want their business users to self-serve, and they realize that can’t realistically happen without good documentation.

  2. Regulations like GDPR will grind entire data organizations to a halt without real governance in place.

This post is a solid introduction to the topic and to how things have evolved over the past decade or so. Comcast is clearly doing some cool things (which they present at the end), but even if you’re just checking your analytics code into git and documenting it in markdown files, you’re already ahead of 80% of organizations.
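The post itself doesn’t include code, and Comcast’s real system is far more elaborate. As a toy illustration of the core idea behind “schema on write”, here is a minimal Python sketch: register a schema with a description for every field, reject non-conforming records at ingestion time (rather than discovering problems at query time, as with schema on read), and reuse the same metadata as documentation. All names here (`Schema`, `Dataset`, the `orders` fields) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical registered schema: field name -> (Python type, description)."""
    name: str
    fields: dict


@dataclass
class Dataset:
    schema: Schema
    rows: list = field(default_factory=list)

    def write(self, record: dict) -> None:
        # Schema on write: validate at ingestion time instead of at query time
        expected = set(self.schema.fields)
        if set(record) != expected:
            raise ValueError(
                f"{self.schema.name}: expected fields {sorted(expected)}, got {sorted(record)}"
            )
        for name, (typ, _description) in self.schema.fields.items():
            if not isinstance(record[name], typ):
                raise TypeError(f"{self.schema.name}.{name}: expected {typ.__name__}")
        self.rows.append(record)

    def describe(self) -> str:
        # The same metadata doubles as documentation for self-serve users
        return "\n".join(
            f"{name} ({typ.__name__}): {description}"
            for name, (typ, description) in self.schema.fields.items()
        )


orders = Dataset(Schema("orders", {
    "order_id": (int, "Unique order identifier"),
    "amount_usd": (float, "Order total in US dollars"),
}))
orders.write({"order_id": 1, "amount_usd": 19.99})
print(orders.describe())
```

The design point is that validation and documentation come from one source of truth, so the docs can’t silently drift away from the data.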


Probabilistic Filters By Example: Cuckoo Filter and Bloom Filters

This post made the front page of Hacker News this past week. Bloom filters were a new topic for me—definitely a gap in my own knowledge—and I’m betting they will be for many of you as well.

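If you haven’t heard of probabilistic filters, this is a must-read. They have many uses, such as cheaply checking whether a key might be in a huge set before paying for an expensive lookup.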
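To make the idea concrete, here is a minimal Python sketch of a Bloom filter (cuckoo filters, which store short fingerprints in hash-table buckets, work differently and aren’t shown). It assumes the standard sizing formulas and derives the k bit positions by double hashing a single SHA-256 digest; that hashing scheme is one common choice, not necessarily the one the linked post uses.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash positions per item."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes
        n, p = expected_items, false_positive_rate
        self.m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all k positions from two 64-bit slices of one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd so it is never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set the bit at each of the item's k positions
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.m, bf.k)                   # 9586 7
print(bf.might_contain("banana"))   # True: Bloom filters have no false negatives
print(bf.might_contain("durian"))   # almost certainly False, but could be a false positive
```

The trade-off: about 9.6 bits per item buys a roughly 1% false-positive rate no matter how large the items themselves are, but you can never delete an item or get an exact “yes”.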


Google Brain: Can Agents Learn Inside of their own Dreams?

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment.

This is a full online paper, complete with lots of interactive elements that illustrate its points. I found the whole thing fascinating; here’s my favorite part:

…in our initial experiments, we noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment never shoots a single fireball. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.

Or, in simpler terms: the AI discovered cheat codes.


Linear Algebra for Deep Learning

Need to brush up on your matrix multiplication? This post presents a straightforward refresher, including concepts, examples, and diagrams.
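As a quick refresher in code, here is a plain-Python sketch of matrix multiplication (in practice you’d use NumPy’s `@` operator). Each entry of the result is the dot product of a row of the first matrix with a column of the second, which is also what a dense neural-network layer computes for its inputs and weights.

```python
def matmul(a, b):
    """Multiply an (n x m) matrix a by an (m x p) matrix b, both given as lists of rows."""
    n, m = len(a), len(a[0])
    if len(b) != m:
        raise ValueError("inner dimensions must match: a is n x m, so b must be m x p")
    p = len(b[0])
    # Entry (i, j) is the dot product of row i of a and column j of b
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


# A 2x3 matrix times a 3x2 matrix gives a 2x2 matrix
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul(A, B))  # [[58, 64], [139, 154]]
```

For example, the top-left entry is 1·7 + 2·9 + 3·11 = 58.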


What Every Statistician Should Know About Computer Science

Are you familiar with Big O notation for algorithm complexity? How about P ≠ NP?

No? Read this. Short and sweet.
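To see Big O in action, here is a small Python sketch (not from the linked article) with two ways to check a list for duplicates: comparing every pair, which is O(n²), versus one pass with a hash set, which is O(n) expected time at the cost of O(n) extra memory. Exact timings will vary by machine; what matters is that doubling n should roughly quadruple the first function’s runtime and only roughly double the second’s.

```python
import random
import time


def has_duplicates_quadratic(items):
    # Compare every pair: n * (n - 1) / 2 comparisons in the worst case, so O(n^2)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # One pass with a hash set: O(n) expected time, O(n) extra memory
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False


# Worst case for both: no duplicates, so every element must be examined
for n in (500, 1_000, 2_000):
    data = list(range(n))
    random.shuffle(data)
    for fn in (has_duplicates_quadratic, has_duplicates_linear):
        start = time.perf_counter()
        fn(data)
        print(f"{fn.__name__:26} n={n:5}: {time.perf_counter() - start:.4f}s")
```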


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


Powered by Revue

915 Spring Garden St., Suite 500, Philadelphia, PA 19123