Getting Hired. More Books! Data Governance. Big-O Complexity. [DSR #129]
The Week's Most Useful Posts
Many people want to become data scientists, and data science jobs can be quite hard to get.
Data scientist turnover is quite high.
How do you reconcile those two facts? This author believes, and I totally agree, that the problem is one of mismatched expectations. This article goes deep into the dynamics that cause this.
This post has really caught fire since its publication just a few days ago, with almost 5,000 “claps” on Medium at the time of my writing this. Apparently it has hit a nerve :) Highly recommended.
Last week’s post on the top 5 business books for data scientists got more clicks than anything I’ve ever linked to. Here’s another one published on Hacker Noon this past week, this time focused on building your stats foundations. I’ve had these books recommended to me before; it’s time I actually sat down and read them.
I love this post! Here’s the first couple of lines:
I’ve read a number of articles stating how hard it was to get into Analytics and Data Science. This hasn’t been my experience(…)
That immediately caught my attention: everyone likes to write about how hard it can be to get into data science, how many applications it takes, how many technologies one needs to know, how impersonal and flawed the interviewing process can be. And here’s someone saying “this wasn’t that hard”. So…what did the author do differently?
Turns out: she just made good decisions, did good work, and stuck with it. She got a master’s degree, built relationships with professors, used them to get a first job, did good work, learned a lot, and built from there. She repeatedly emphasizes the importance of core skills—Excel, SQL, Tableau, storytelling—over the more “exciting” skills she’s used (neural nets, Hive).
Comcast’s system of storing schemas and metadata enables data scientists to find, understand, and join data of interest.
Data governance is a topic that has traditionally been dominated by massive companies and high-priced vendor solutions that were not particularly innovative. But that’s changing: I’m seeing data practitioners at companies large and small start to really care about and innovate around data governance. This is for a couple of reasons, IMO:
Practitioners truly want their business users to self-serve, and they realize that can’t realistically happen without good documentation.
Regulations like GDPR will grind entire data organizations to a halt without real governance in place.
This post is a solid introduction to the topic and how things have evolved over the past decade or so. Comcast is clearly doing some cool things (which they present at the end), but even if you’re just checking your analytic code into git and accompanying it with markdown files, you’re already ahead of 80% of organizations.
This post made the front page of Hacker News this past week. Bloom filters were a new topic for me—definitely a gap in my own knowledge—and I’m betting they will be for many of you as well.
If you’ve not heard of probabilistic filters, this is a must-read. So many uses.
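If you'd like to see the idea in code before clicking through, here's a minimal sketch of a Bloom filter in Python. It isn't taken from the linked post: the class name, the bit-array size, the number of hashes, and the trick of salting SHA-256 to get several hash functions are all my own assumptions, chosen for readability rather than speed.

```python
import hashlib


class BloomFilter:
    """A toy Bloom filter: a bit array plus k hash functions.

    Illustrative only; the sizes and the hashing scheme are assumptions.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting one SHA-256 hash with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        # Set every bit this item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely never added".
        # True means "probably added" (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for word in ["redshift", "snowflake", "looker"]:
    seen.add(word)

print(seen.might_contain("redshift"))
```

The key property: anything you added always comes back as "probably there", while a "no" is always trustworthy. You trade a small false-positive rate for a structure that never stores the items themselves.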
We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment.
This is a full online paper, complete with lots of interactive elements that illustrate the points that it’s making. I found the entire read fascinating; here’s my favorite part:
…in our initial experiments, we noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment never shoots a single fireball. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.
Or, in simpler terms: the AI discovered cheat codes.
Need to brush up on your matrix multiplication? This post presents a straightforward refresher, including concepts, examples, and diagrams.
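As a quick companion, here's a small sketch of the row-times-column rule in plain Python. It's my own illustration, not code from the linked post, and the function name `matmul` is just a label I picked; in real work you'd likely reach for a numerical library instead.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    Entry (i, j) of the result is the dot product of row i of `a`
    with column j of `b`.
    """
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = matmul(a, b)
print(product)  # [[19, 22], [43, 50]]
```

For example, the top-left entry is 1*5 + 2*7 = 19: the first row of `a` paired with the first column of `b`.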
Are you familiar with the concept of Big O algorithm complexity? How about P ≠ NP?
No? Read this. Short and sweet.
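To make Big O a bit more concrete, here's a small sketch (mine, not from the linked post) that counts how many comparisons a linear search, which is O(n), and a binary search, which is O(log n), need to find the last item in a sorted list. The function names and the list size are arbitrary choices for illustration.

```python
def linear_search_steps(items, target):
    """Count comparisons for a front-to-back scan: O(n)."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count comparisons when halving the search range each time: O(log n)."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


data = list(range(1_000_000))
linear = linear_search_steps(data, 999_999)
binary = binary_search_steps(data, 999_999)
print(linear, binary)
```

On a million sorted items, the linear scan needs a million comparisons in the worst case, while binary search needs at most 20. That gap is what the notation is describing.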
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123