A/B Testing @ Stack Overflow. Interviewing Data Scientists. Testing ML. [DSR #109]

Light week this week: there was a surprisingly short list of stuff I found worth bringing to your attention. A couple of good tidbits, though!

Also: if you’re a dbt user and haven’t already upgraded to 0.9.0, do it. It’s big.

- Tristan

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

A/B Testing at Stack Overflow

Julia Silge writes a nuanced, realistic take on A/B testing at Stack Overflow. My favorite paragraph comes at the end of the post:

Sometimes the results of an A/B test can be inconclusive, with no measurable difference between the baseline and new version, either positive or negative. What should we do then? Often we stay with the original version of our feature, but in some situations, we still decide to make a change to a new version, depending on other product considerations.

My experiences in running A/B tests are similar: the clear-cut cases are easy, but it’s hard when you’re running a business and the test outcome isn’t definitive. This isn’t covered in your stats textbook.
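The clear-cut cases are the ones where a standard significance test gives a definite answer either way. Below is a minimal sketch that assumes a conversion-rate metric and a two-proportion z-test. That test is not necessarily what Stack Overflow uses, and the post describes their process on its own terms. The helper names and traffic numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Compare conversion rates of baseline (A) and variant (B).

    Returns the observed lift (p_b - p_a), a (1 - alpha) confidence interval
    for that lift (unpooled standard error), and a two-sided p-value
    (pooled standard error, i.e. assuming no difference under the null).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled proportion under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, ci, p_value

def verdict(ci):
    """Classify a confidence interval for the lift (B minus A)."""
    low, high = ci
    if low > 0:
        return "variant better"
    if high < 0:
        return "variant worse"
    return "inconclusive"

# Hypothetical traffic: 20,000 users per arm, baseline converts 1,000 of them
for conv_b in (1030, 1300):
    diff, ci, p = two_proportion_test(1000, 20000, conv_b, 20000)
    print(f"lift={diff:+.4f}  95% CI=({ci[0]:+.4f}, {ci[1]:+.4f})  p={p:.3f}  -> {verdict(ci)}")
```

With these made-up numbers, the first comparison's interval straddles zero. That is the inconclusive case from the excerpt: the statistics can tell you that the result is inconclusive, but not whether to ship.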

stackoverflow.blog

How to Job Interview a Data Scientist

There are three key areas to screen for a data scientist. Most interviews only cover one.

This is the best single article I’ve read on conducting interviews for data science roles. This is an extremely hard role to hire for, and it takes a lot of work to get your interview process right. If you’re considering hiring in the coming months, this is a must-read.

blog.modeanalytics.comShare

How to Unit Test Machine Learning Code

ML code is really, really hard to test. But it’s worth it:

One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.

This article goes deep into techniques for writing unit tests on your algorithms. Highly recommended.
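One common style of ML unit test is to run a single training step on a tiny fixed dataset and then assert that the parameters actually moved and the loss went down. These are not necessarily the exact checks from the article, which has its own examples. Here is a framework-free sketch using a hand-rolled logistic regression. All function and class names are hypothetical.

```python
import math
import unittest

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 for one example."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(weights, bias, X, y):
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(weights, bias, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)

def train_step(weights, bias, X, y, lr=0.1):
    """One full-batch gradient-descent step; returns new (weights, bias)."""
    n = len(X)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        err = predict(weights, bias, xi) - yi  # dLoss/dlogit for log loss
        for j, xij in enumerate(xi):
            grad_w[j] += err * xij / n
        grad_b += err / n
    new_w = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_w, bias - lr * grad_b

class TestTrainStep(unittest.TestCase):
    def setUp(self):
        # Tiny hand-made dataset: label is 1 when the first feature is positive
        self.X = [[1.0, 0.5], [2.0, -1.0], [-1.0, 0.3], [-2.0, 1.0]]
        self.y = [1, 1, 0, 0]
        self.w = [0.0, 0.0]
        self.b = 0.0

    def test_parameters_change(self):
        new_w, _ = train_step(self.w, self.b, self.X, self.y)
        self.assertNotEqual(new_w, self.w)

    def test_loss_decreases(self):
        before = log_loss(self.w, self.b, self.X, self.y)
        new_w, new_b = train_step(self.w, self.b, self.X, self.y)
        self.assertLess(log_loss(new_w, new_b, self.X, self.y), before)

    def test_predictions_are_probabilities(self):
        for xi in self.X:
            self.assertTrue(0.0 <= predict(self.w, self.b, xi) <= 1.0)
```

Run it with `python -m unittest` or pytest. Tests like these can catch silent bugs in milliseconds rather than after a long training run, for example a step that never updates the weights or an update with the wrong sign.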

medium.com

Top 6 errors novice machine learning engineers make

Definitely worth the read, but if you want to save a click:

  1. Taking the default loss function for granted

  2. Using one algorithm/method for all problems

  3. Forgetting about outliers

  4. Not properly dealing with cyclical features

  5. Using L1/L2 regularization without standardization

  6. Interpreting absolute value of coefficients from linear or logistic regression as feature importance
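Items 4 and 5 are easy to show in a few lines. Below is a minimal sketch in plain Python that assumes hour-of-day as the cyclical feature. It covers sine/cosine encoding for item 4 and z-score standardization before a regularized model for item 5. The helper names are hypothetical. In practice you would likely reach for your ML library's own scaler.

```python
import math
import statistics

def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour of day with period=24) onto the unit circle.

    Returns (sin, cos), so values at the end of the cycle sit next to values
    at the start: hour 23 ends up as close to hour 0 as hour 1 is.
    """
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def standardize(column):
    """Z-score one feature column (mean 0, population std 1).

    With every feature on the same scale, an L1/L2 penalty no longer depends
    on the units each feature happens to be measured in.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant feature carries no information; map it to all zeros
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

# Raw hours put 23 and 0 far apart; the encoding makes them neighbors
print(abs(23 - 0), math.dist(encode_cyclical(23, 24), encode_cyclical(0, 24)))
print(standardize([1.0, 2.0, 3.0]))
```

On the raw scale, 23:00 and midnight are 23 units apart. On the circle, they are exactly as close as midnight and 1:00.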

medium.com

How Adversarial Attacks Work

This is an important piece of knowledge to have in your brain:

Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.

This fact steadily becomes worrisome as more and more systems are powered by artificial intelligence — and many of them are crucial for our safe and comfortable life. Banks, surveillance systems, ATMs, face recognition on your laptop — and very very soon, self-driving cars.
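To make the idea concrete, here is one well-known gradient-based attack, the Fast Gradient Sign Method (FGSM). The article may cover others. The sketch below is a toy: it targets a logistic-regression model with made-up weights. Real attacks go after deep networks, and the recipe there is the same, except that the input gradient comes from backpropagation. The idea is to nudge each input feature by a small epsilon in whichever direction increases the model's loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def _sign(g):
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y_true, epsilon):
    """Fast Gradient Sign Method against a logistic-regression classifier.

    For log loss, the gradient of the loss with respect to the *input* x is
    (p - y_true) * w. Each feature moves by epsilon in the sign of that
    gradient, i.e. in the direction that increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y_true) * wi for wi in w]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]

# Hypothetical model weights and input
w, b = [1.0, -2.0, 0.5], 0.0
x = [0.2, -0.1, 0.4]
x_adv = fgsm(w, b, x, y_true=1, epsilon=0.2)
print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

With these numbers, no feature moves by more than 0.2, yet the predicted probability drops below 0.5 and the class flips from 1 to 0.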

blog.xix.ai

How Computers Make Random Numbers

You probably use some variation of random() in at least one language. If you’ve never dug beneath the surface to figure out how those random numbers were generated, this is a fun (if mostly useless) read 😊
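For a taste of what's under the hood, consider one of the oldest pseudo-random schemes, the linear congruential generator (LCG). It is not necessarily what the article walks through, and most languages' built-in random() uses something stronger (Python's random module, for example, uses the Mersenne Twister). Below is a minimal sketch. The class name is hypothetical, and the constants are the commonly published Numerical Recipes parameters.

```python
class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.

    Defaults are the widely published Numerical Recipes constants
    (a = 1664525, c = 1013904223, m = 2**32). Fine for illustration,
    NOT suitable for cryptography or serious simulation work.
    """

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_int(self):
        """Advance the state and return it as an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self):
        """Float in [0, 1), similar in spirit to a language's random()."""
        return self.next_int() / self.m

gen = LCG(seed=0)
print(gen.next_int())  # → 1013904223
print([gen.random() for _ in range(3)])  # three floats in [0, 1)
```

The same seed always produces the same sequence. That determinism is why these numbers are called pseudo-random.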

medium.com

Data Viz of the Week

"Congestion in London is driving people off the buses." Wow!

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123