Hiring. Getting Hired. The History of Postgres. Deep Learning for Coders. Kubeflow. [DSR #171]

Jan 27, 2019

This week's best data science articles

The engineering team at Artsy has thought deeply about how they craft their interview process, and they’ve downplayed traditional approaches:

Recent trends in hiring are white-boarding sessions, trivia questions, and hours of take-home assignments. At Artsy, we don’t use any of these. We often get asked why not - and how we assess technical skill without them.

Interviews for data positions typically suck. I think it’s often just a lack of time, creativity, and focus—having candidates solve problems on a whiteboard is very easy (for the interviewer). If you’re in a position where you hire data talent, I highly recommend this post, as well as Laszlo Bock’s book Work Rules!.

artsy.github.io

Quality Over Quantity: Building the Perfect Data Science Project

The author works at SharpestMinds, a data science mentorship program. His advice draws on a large sample from the real world.

From what I’ve seen, a large fraction of aspiring data scientists get stuck in a spiral that involves building more and more data science projects with tools like sklearn and pandas, each featuring only incremental improvements over the last.

What I want to explore here is a way to break out of that spiral, by using some key lessons we’ve learned from seeing dozens of mentees leverage their projects into job interviews and ultimately, into offers.

If you’re in the portfolio-building process right now, this is a must-read.

towardsdatascience.com

Looking Back at Postgres

Postgres has been around for a long time, and it has formed the core of many analytical databases that you likely use today: Netezza, Greenplum, Aster Data, ParAccel (and thus Redshift), and CitusDB. The history of the project goes back to UC Berkeley and Michael Stonebraker.

You don’t need to know anything from this article in order to do your job, but Postgres is the air that we all breathe today. It’s useful to know your history!

Don’t let the arXiv link dissuade you—it’s an easy read.

arxiv.org

Scaling Jupyter notebooks with Kubernetes and Tensorflow

One of the most common hurdles with developing AI and deep learning models is to design data pipelines that can operate at scale and in real-time. (…) In this article, you will explore how you can leverage Kubernetes, Tensorflow and Kubeflow to scale your models without having to worry about scaling the infrastructure.

This article introduced me to an open source project called Kubeflow. From the docs:

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.

I haven’t dug into this yet, but I plan to over the coming week. If you have used Kubeflow before, I’d love to hear about your experiences.

learnk8s.io

fast.ai — Practical Deep Learning for Coders 2019

Launching today, the 2019 edition of Practical Deep Learning for Coders, the third iteration of the course, is 100% new material, including applications that have never been covered by an introductory deep learning course before (…)

fast.ai has become known as the MOOC to get an intro to deep learning. You need pre-existing Python experience but little else. If “intro to deep learning” is on your 2019 bucket list, check this out.

www.fast.ai

rstudio::conf 2019 Takeaways

rstudio::conf 2019 just wrapped. This post is a great summary of the event, gathering it into three core themes:

Shiny improvements (super-cool to see)
R in production
Data science skill growth

I’m particularly interested in Shiny’s development. I don’t use it day to day, but I do find it to be a novel tool in the analyst’s toolkit. I’m very glad to hear that it’s continuing to mature.

medium.com

Thanks to our sponsors!

Mode Studio: A Free Toolkit for Every Analyst

Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it’s free forever. Connect data from anywhere and analyze with your preferred language. Build custom visualizations or use our out-of-the-box charts.

mode.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
