A/B Testing @ Stack Overflow. Interviewing Data Scientists. Testing ML. [DSR #109]
Light week this week—there was a surprisingly short list of stuff I found worth bringing to your attention. A couple of good tidbits though!
Also: if you’re a dbt user and haven’t already upgraded to 0.9.0, do it. It’s big.
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
Julia Silge writes a nuanced, realistic take on A/B testing at Stack Overflow. My favorite paragraph comes at the end of the post:
Sometimes the results of an A/B test can be inconclusive, with no measurable difference between the baseline and new version, either positive or negative. What should we do then? Often we stay with the original version of our feature, but in some situations, we still decide to make a change to a new version, depending on other product considerations.
My experience running A/B tests is similar: the clear-cut cases are easy, but it's hard when you're running a business and the test outcome isn't definitive. This isn't covered in your stats textbook.
There are three key areas to screen for a data scientist. Most interviews only cover one.
This is the best single article I’ve read on conducting interviews for data science roles. This is an extremely hard role to hire for, and it takes a lot of work to get your interview process right. If you’re considering hiring in the coming months, this is a must-read.
ML code is really, really hard to test. But it’s worth it:
One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.
This article goes deep into techniques for writing unit tests on your algorithms. Highly recommended.
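To make the idea concrete, here's a minimal sketch (my own toy example, not code from the article) of the kind of test it advocates: after one training step, every parameter should actually change and the loss should actually drop. A frozen or disconnected weight is a classic silent ML bug that this catches.

```python
import unittest

def sgd_step(w, b, x, y, lr=0.1):
    """One gradient-descent step for a 1-D linear model under squared error."""
    pred = w * x + b
    grad_w = 2 * (pred - y) * x
    grad_b = 2 * (pred - y)
    return w - lr * grad_w, b - lr * grad_b

class TestTrainingStep(unittest.TestCase):
    def test_all_parameters_update(self):
        # If a parameter is accidentally frozen, training "runs" but it
        # never moves; asserting inequality after one step catches that.
        w, b = 0.5, 0.0
        new_w, new_b = sgd_step(w, b, x=2.0, y=3.0)
        self.assertNotEqual(w, new_w)
        self.assertNotEqual(b, new_b)

    def test_loss_decreases(self):
        w, b, x, y = 0.5, 0.0, 2.0, 3.0
        loss_before = (w * x + b - y) ** 2
        new_w, new_b = sgd_step(w, b, x, y)
        loss_after = (new_w * x + new_b - y) ** 2
        self.assertLess(loss_after, loss_before)
```

Run it with `python -m unittest`; the same pattern scales up to real frameworks, where you'd iterate over all trainable variables.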
Definitely worth the read, but if you want to save a click:
Taking the default loss function for granted
Using one algorithm/method for all problems
Forgetting about outliers
Not properly dealing with cyclical features
L1/L2 Regularization without standardization
Interpreting absolute value of coefficients from linear or logistic regression as feature importance
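The cyclical-features mistake in particular has a standard fix worth remembering: map the feature onto the unit circle with sin/cos so that, say, hour 23 and hour 0 end up adjacent rather than 23 units apart. A quick sketch (my own illustration, not from the article):

```python
import math

def encode_hour(hour: int) -> tuple:
    """Map hour-of-day onto the unit circle so 23:00 and 00:00 are neighbors."""
    angle = 2 * math.pi * hour / 24
    return (math.sin(angle), math.cos(angle))

def encoded_dist(a: int, b: int) -> float:
    """Distance between two hours in the encoded space."""
    return math.dist(encode_hour(a), encode_hour(b))

# Raw hours make 23 and 0 look maximally far apart (|23 - 0| = 23);
# in the encoded space they are one hour apart, closer than 1 vs 4:
assert encoded_dist(23, 0) < encoded_dist(1, 4)
```

The same trick applies to day-of-week, month, wind direction, or any feature that wraps around.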
This is an important piece of knowledge to have in your brain:
Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.
This fact steadily becomes worrisome as more and more systems are powered by artificial intelligence — and many of them are crucial for our safe and comfortable life. Banks, surveillance systems, ATMs, face recognition on your laptop — and very very soon, self-driving cars.
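The mechanics of the trick are easy to see even without a deep net. Here's a toy sketch in the spirit of the fast-gradient-sign attack, using a hand-picked logistic-regression classifier (the weights and inputs are made up for illustration; real attacks target real models with much smaller perturbations):

```python
import math

# Toy logistic-regression classifier with fixed, hypothetical weights.
W = [2.0, -3.0, 1.0]
B = 0.1

def predict_proba(x):
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1 / (1 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def adversarial(x, eps=0.8):
    """Nudge each feature by +/- eps in the direction that most changes the
    score. For a linear model, dz/dx_i = w_i, so stepping along -sign(w_i)
    pushes a confident class-1 prediction toward class 0."""
    step = eps if predict_proba(x) < 0.5 else -eps
    return [xi + step * sign(w) for xi, w in zip(x, W)]

x = [0.5, -0.2, 0.3]
p = predict_proba(x)             # confidently class 1 (~0.88)
p_adv = predict_proba(adversarial(x))  # flipped to class 0 (~0.06)
```

Deep networks are not linear, but they are locally linear enough that the same gradient-following perturbation works on them too, which is the core of the result the quote refers to.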
You probably use some variation of random() in at least one language. If you’ve never dug beneath the surface to figure out how those random numbers are generated, this is a fun (if mostly useless) read 😊
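To give a taste of what's beneath the surface: one of the oldest PRNG designs is the linear congruential generator, a one-line deterministic recurrence. A minimal sketch (constants are the classic Numerical Recipes choice; real libraries, like Python's Mersenne Twister, are far more sophisticated, but the principle is the same):

```python
class LCG:
    """Minimal linear congruential generator: x_{n+1} = (a*x_n + c) mod m.
    The entire "random" sequence is fully determined by the seed."""
    A, C, M = 1664525, 1013904223, 2**32

    def __init__(self, seed: int):
        self.state = seed % self.M

    def random(self) -> float:
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M  # float in [0, 1)

# Same seed -> identical sequence: pseudo-randomness is reproducible.
a, b = LCG(42), LCG(42)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]
```

That reproducibility is a feature, not a bug: it's why seeding your RNG makes experiments repeatable.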
Data Viz of the Week
"Congestion in London is driving people off the buses." Wow!
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
If you don't want these updates anymore, please unsubscribe here.
If you were forwarded this newsletter and you like it, you can subscribe here.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123