Hiring. Getting Hired. The History of Postgres. Deep Learning for Coders. Kubeflow. [DSR #171]
This week's best data science articles
The engineering team at Artsy has thought deeply about how they craft their interviewing process, and they’ve downplayed traditional approaches:
Recent trends in hiring are white-boarding sessions, trivia questions, and hours of take-home assignments. At Artsy, we don’t use any of these. We often get asked why not - and how we assess technical skill without them.
Interviews for data positions typically suck. I think it’s often just a lack of time, creativity, and focus—having candidates solve problems on a whiteboard is very easy (for the interviewer). If you’re in a position where you hire data talent, I highly recommend this post, as well as Laszlo Bock’s book Work Rules!.
Quality Over Quantity: Building the Perfect Data Science Project
The author works at SharpestMinds, a data science mentorship program. His advice draws on a large sample from the real world.
From what I’ve seen, a large fraction of aspiring data scientists get stuck in a spiral that involves building more and more data science projects with tools like sklearn and pandas, each featuring only incremental improvements over the last.
What I want to explore here is a way to break out of that spiral, by using some key lessons we’ve learned from seeing dozens of mentees leverage their projects into job interviews and ultimately, into offers.
If you’re in the portfolio-building process right now, this is a must-read.
towardsdatascience.com
Postgres has been around for a long time, and it has formed the core of many analytical databases that you likely use today: Netezza, Greenplum, Aster Data, ParAccel (and thus Redshift), CitusDB. The history of the project goes back to UC Berkeley and Michael Stonebraker.
You don’t need to know anything from this article in order to do your job, but Postgres is the air that we all breathe today. It’s useful to know your history!
Don’t let the arXiv link dissuade you—it’s an easy read.
Scaling Jupyter notebooks with Kubernetes and Tensorflow
One of the most common hurdles with developing AI and deep learning models is to design data pipelines that can operate at scale and in real-time. (…) In this article, you will explore how you can leverage Kubernetes, Tensorflow and Kubeflow to scale your models without having to worry about scaling the infrastructure.
This article actually introduced me to an open source project called Kubeflow. From the docs:
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
I haven’t dug into this yet, but I plan to over the coming week. If you have used Kubeflow before, I’d love to hear about your experiences.
fast.ai — Practical Deep Learning for Coders 2019
Launching today, the 2019 edition of Practical Deep Learning for Coders, the third iteration of the course, is 100% new material, including applications that have never been covered by an introductory deep learning course before (…)
fast.ai has become known as the MOOC to get an intro to deep learning. You need pre-existing Python experience but little else. If “intro to deep learning” is on your 2019 bucket list, check this out.
rstudio::conf 2019 just wrapped. This post is a great summary of the event, gathering it into three core themes:
Shiny improvements (super-cool to see)
R in production
Data science skill growth
I’m particularly interested in Shiny’s development. I don’t use it day-to-day, but I really do find it to be a novel tool in the analyst’s toolkit. I’m very glad to hear that it’s continuing to mature.
Thanks to our sponsors!
Mode Studio: A Free Toolkit for Every Analyst
Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it’s free forever. Connect data from anywhere and analyze with your preferred language. Build custom visualizations or use our out-of-the-box charts.
Stitch: Simple, Powerful ETL Built for Developers
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.