Wow! Four years and 200 issues. 🥳 While the Data Science Roundup started as more of a marketing effort for my fledgling company, I continue to do it for a completely different reason: you, dear reader, hold me accountable.
The data ecosystem moves very quickly, and in my role as the CEO of Fishtown Analytics it’s critical that I have broad visibility into the entire space. But reading hundreds of headlines and dozens of articles every single week would be all too easy to deprioritize behind closing the next customer or building the next feature. It’s the 8,000 of you who make sure I put in the work. So thanks.
The contents of this newsletter are the things I find strategically valuable for me to know—they are the things that inform my model of the world. If you find them valuable, and if they impact your model as well, then so much the better.
Thanks for all of the support over the last four years. Here’s to another four.
This week's best data science articles
In my experience, every nontrivial machine learning project is eventually stitched together with bug-ridden and unmaintainable internal tools. These tools, often a patchwork of Jupyter Notebooks and Flask apps, are difficult to deploy, require reasoning about client-server architecture, and don’t integrate well with machine learning constructs like TensorFlow GPU sessions.
This new open-core company is founded by a who’s who of ML from GoogleX and Zoox and follows the “productizing internal tooling we built” playbook (which often produces fantastic results). The linked post is from the team; here’s the TechCrunch post about the launch.
My thoughts: Streamlit is not actually solving a data science problem; it’s solving a web development problem that data science teams have. This is an under-invested area; I could imagine lots of shitty internal tools being wiped away in favor of this.
Fantastic. From Julia Evans. All data teams should have this printed.
Oh wow. You’ve likely seen a career ladder before—roughly, a set of stages that employees are expected to progress through as they develop their careers. Most companies’ engineering-focused career ladders are…uninspired…shall we say. And companies that are world-class at developing engineering talent don’t tend to share theirs.
Which is what makes this particular release unique. I haven’t seen a company of Etsy’s caliber release their internal ladder publicly. It will be a tremendous resource for other companies attempting to build high-performance technical teams.
Share this with your CTO and VP Data.
As anyone who’s read the Roundup for any length of time knows, I believe that data analysts should work more like software engineers. And while we’re already watching that change percolate through the industry, there are plenty of practices that haven’t made the leap. Doing RFCs (Requests for Comment) is one of these: I’ve never seen a data team do an RFC in the way that a software engineering team would.
This is an excellent piece on how Squarespace improved their RFC process. As I read it I couldn’t help but think about the times when it would / would not be a good fit for data projects. I think there is plenty of potential applicability.
…[A]lways think about how to build a better product for users — think about users’ needs and experience and try to build the data model that will best serve those considerations.
Solid overview article on the topic. If you’re new to analytics engineering this is a great way to get exposed.
Chances are you’re not working in quantum computing given that there are, what, fewer than a thousand people on the planet who are. That said, I can’t help but be interested in quantum computing given the potentially massive implications and the level of progress in recent years. This post outlines how one Google team designed a novel ML-based algorithm for their quantum control systems (which run on classical computers).
There are widespread rumors that Google has achieved quantum supremacy. If this is a topic that you’ve been interested in before but haven’t caught up on recently, now is a good time to dive back in. Lots happening.
Thanks to our sponsors!
Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123