Engineering Career Paths. Peeking at Etsy. Uncertainty Estimates. CLI Apps. [DSR #157]
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This Week's Most Useful Posts
Brilliant post by Julia Evans: this gets to a topic that I’ve been thinking a lot about these days.
At Fishtown Analytics, we believe that analytics has become a subfield of software engineering. This might not have been the case 10-30 years ago, when analysis was conducted primarily in proprietary GUIs, but today most sophisticated analysis is conducted using code in open-source tools and languages. As a result, the jobs of modern data analysts, scientists, and engineers are fundamentally different from those of decades past, and that means they have different career trajectories.
But what is the career trajectory of a data analyst or scientist today? That is largely an unanswered question: we just don’t have enough years of experience with this new model to have a consensus answer. But the industry needs a good answer if it’s going to grow the next generation of analytical talent.
As always, the best place to look for inspiration is within software engineering—the field has had decades to think about this question. This post is one of the best I’ve seen on the topic. Senior engineers actually do different work than junior engineers, and it’s critical to recognize exactly what that work is and have shared expectations.
Do you know what your career path is? I’d be very curious to hear your thoughts: drop me a line.
In this post, we investigate (…) how to peek at experimental results early in order to increase the velocity of our decision-making without sacrificing the integrity of our results.
Have you ever run an A/B test? Have you ever seen the lines in your A/B testing tool cross over the threshold and wanted to immediately stop the test and declare victory? I have. It’s so tempting—you desperately want your hypothesis to be right!
Looking at the results before the experiment is over—peeking—can ruin the statistical basis of your experiment if you’re not thoughtful about how you do it. This article goes into how Etsy enables peeking (there are real reasons you might want to end an experiment early!). They’ve designed this into their internal A/B testing tool and show screenshots.
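To make the risk concrete, here is a minimal simulation sketch of why naive peeking is a problem. It is not Etsy's method, and every name and parameter in it (simulate_peeking, the 10% conversion rate, the 10 evenly spaced looks) is an assumption chosen for illustration. It runs A/A tests, where both arms share the same true rate, and compares how often a two-proportion z-test crosses the 5% threshold at any interim look versus only at the planned end.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms with n visitors each."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate_peeking(n_experiments=1000, n_per_arm=1000, looks=10,
                     rate=0.1, z_crit=1.96, seed=0):
    """Return (false positive rate with peeking, false positive rate at the end).

    Both arms convert at the same true rate, so every "significant"
    result is a false positive. All parameters are illustrative.
    """
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = 0
        crossed_at_any_look = False
        z = 0.0
        for look in range(1, looks + 1):
            # Collect the next batch of visitors for each arm.
            for _visitor in range(step):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            z = z_stat(conv_a, conv_b, look * step)
            if abs(z) > z_crit:
                # Someone who stops at the first crossing declares victory here.
                crossed_at_any_look = True
        peeking_hits += crossed_at_any_look
        final_hits += abs(z) > z_crit  # z now holds the final-look statistic
    return peeking_hits / n_experiments, final_hits / n_experiments


peek_rate, final_rate = simulate_peeking()
print(f"false positives if you stop at any crossing: {peek_rate:.1%}")
print(f"false positives if you only test at the end: {final_rate:.1%}")
```

The peeking rate can never be lower than the end-only rate, because a crossing at the final look also counts as a crossing at some look. The gap between the two numbers is the cost of stopping at the first crossing without adjusting the threshold.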
Very interesting—new to me.
Stop installing Tensorflow using pip! Use conda instead.
Hah. Short, very useful. 5-8x performance boost. Read the 2-minute article for more info.
I’m trying very hard to sound all calm and sangfroid with you right now, and I am, intellectually. But I’m as emotional as anyone else. People at AQR would laugh at me for trying to sound calm, because they get my emails: “Another frickin’ down day!” When we have a bad period, I want to figure out why, and I want an answer.
Clifford Asness runs AQR, a $226B quant fund, and AQR hasn’t had a good year. This interview focuses on a topic I find fascinating: the emotional challenges of following your model even when it’s having an off period. Even good forecasts are sometimes wrong, and during particularly tough periods it’s easy to question your model, your assumptions, your data.
Maybe an update is warranted. Maybe it’s just an off month. It’s a hard (and stressful) question to try to answer. At least you (probably!) don’t have $226B riding on it.
I never studied statistics and learned it kind of “backwards” through machine learning, so I consider myself more as a hacker who picked up statistics along the way. Earlier this year I had some basic knowledge of bootstrapping and confidence intervals, but along the way I had to pick up a whole arsenal of tricks going all the way to Monte Carlo methods and inverse Hessians. It seemed useful to share some of the methods I’ve used the most, so I wrote this post!
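For readers who haven't used the bootstrapping mentioned above, here is a minimal sketch of a percentile bootstrap confidence interval using only the Python standard library. It is not taken from the post: the function name bootstrap_ci, the sample data, and the defaults are all assumptions for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Resamples the data with replacement, recomputes the statistic each
    time, and reads the interval off the sorted estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up observations, purely for illustration.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The percentile method is the simplest variant. It can be unreliable for very small samples or heavily skewed statistics, so treat it as a starting point.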
At Heroku, we’ve come up with a methodology called the 12 factor app. It’s a set of principles designed to make great web applications that are easy to maintain. In that spirit, here are 12 CLI factors to keep in mind when building your next CLI application.
CLIs are the most common way for data engineers to build tooling for their internal teams. If you’re building a CLI app, this post is a must-read.
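As a rough sketch of what a well-behaved internal CLI can look like, here is a small argparse example. It does not reproduce the post's list of factors. It only illustrates a few widely followed conventions: built-in help, a version flag, named flags, errors on stderr, and a nonzero exit code on failure. The tool name rowcount, its flags, and the version string are hypothetical.

```python
import argparse
import sys


def build_parser():
    """Build the parser for a hypothetical internal tool named rowcount."""
    parser = argparse.ArgumentParser(
        prog="rowcount",
        description="Report on a table (hypothetical internal tool).",
    )
    # argparse adds -h/--help automatically; --version is opt-in.
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.1.0")
    parser.add_argument("--table", required=True,
                        help="name of the table to inspect")
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum rows to scan (default: %(default)s)")
    return parser


def main(argv=None):
    """Parse arguments, validate them, and return a process exit code."""
    args = build_parser().parse_args(argv)
    if args.limit <= 0:
        # Diagnostics go to stderr so stdout stays clean for piping.
        print("error: --limit must be positive", file=sys.stderr)
        return 2
    # Tab-separated output on stdout is easy for other tools to consume.
    print(f"{args.table}\t{args.limit}")
    return 0
```

In a real script you would end with sys.exit(main()) so the shell sees the exit code. Having main accept argv also makes the tool easy to exercise from tests, for example main(["--table", "orders"]).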
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123