Data Science Roundup #76: Google buys Kaggle, AutoML, and 3 Great Posts on Hiring!

Are you looking for a job (or hiring for one)? Make sure not to miss the three great posts below on hiring and getting hired. Also, Fishtown Analytics is looking for a data scientist!

- Tristan

Referred by a friend? Sign up here!

Two Posts You Can't Miss

The Current State of Automated Machine Learning

This post will provide a brief explanation of AutoML, argue for its justification and adoption, present a pair of contemporary tools for its pursuit, and discuss AutoML’s anticipated future and direction.

Randy Olson, whose research focuses on hyperparameter optimization, says the following:

In the near future, I see automated machine learning (AutoML) taking over the machine learning model-building process: once a data set is in a (relatively) clean format, the AutoML system will be able to design and optimize a machine learning pipeline faster than 99% of the humans out there.

There was a lot of new information in this post for me—I hadn’t realized how far along some of this R&D had gotten. Highly recommended if you’re not already familiar with this world.


Google confirms its acquisition of data science community Kaggle

Google today said it is acquiring Kaggle, an online service that hosts data science and machine learning competitions…

There are a lot of reasons that Google, whose future increasingly depends on being the leader in AI, would want to buy the site that hosts the largest community of data scientists in the world. So far, sources state that there are no major plans to change aspects of the community (including its name). Kaggle’s CEO seems excited about new resources:

Making Google Cloud technology available to our community will allow us to offer access to powerful infrastructure, scalable training and deployment services and the ability to store and query large data sets.


This Week's Top Posts

The Most Underutilized Function in SQL

The first post I’ve written in a long time.

In this post I’m going to show you two uses for md5() that make it one of the most powerful tools in my SQL kit.
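
The blurb above doesn't say which two uses the post covers, so the following is only a guess at the flavor: one common use of an MD5 hash in analytics work is building a single surrogate key from several columns. This is a minimal Python sketch of that idea using the standard library's hashlib, not the post's SQL. The helper name surrogate_key, the "|" separator, and the sample values are all assumptions made for illustration.

```python
import hashlib


def surrogate_key(*parts, sep="|"):
    """Hypothetical helper: hash several column values into one MD5 hex key."""
    # Treat None like an empty string, similar to coalescing NULLs in SQL.
    joined = sep.join("" if p is None else str(p) for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Identical inputs always produce the same key; any change produces a new one.
key_a = surrogate_key("acme", 42, "2017-03-12")
key_b = surrogate_key("acme", 42, "2017-03-12")
key_c = surrogate_key("acme", 43, "2017-03-12")

print(key_a == key_b)  # same inputs, same key
print(key_a == key_c)  # different inputs, different key
```

The separator matters: without it, the pairs ("ab", "c") and ("a", "bc") would be joined into the same string and hash to the same key.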



Machine-Learning Algorithm Predicts Laboratory Earthquakes

Earthquakes have long been considered one of the best examples of phenomena that are difficult (or impossible) to predict, but:

The breakthrough has astonished geologists and raises the possibility that real earthquake prediction could be next.


How to change careers and become a data scientist - one quant's experience

This post is packed with tons of great advice. My favorite: “Do whatever you can to move to the Bay Area!” People outside the Bay Area often don’t want to admit it, but this advice is spot-on.

My first year in San Francisco was a period of intense learning for me: I attended tons of meetups, completed several online courses, participated in numerous workshops and conferences, learned a lot by working at a data-focused start-up, and most importantly met scores of people who I was able to ask questions of. I completely under-estimated how amazing it is to be able to interact regularly with the people who are building the tools and technology that excite me most.

Hiring a data scientist

This is the most comprehensive guide I’ve read for how to hire a data scientist. It delves into job descriptions, take-home tasks, interview questions, and how to attract diverse talent. If you’re thinking about growing your team, this is a must-read.


Some Reflections on Being Turned Down for a Lot of Data Science Jobs

There are a ton of reasons why you might have gotten turned down for a job, and many of them have more to do with the company than with you. Short, insightful.


Introducing Similarity Search at Flickr

Flickr certainly isn’t the first to implement image similarity search, but I haven’t seen Google, Facebook, or Apple publish quite as much about their respective approaches.

A Simple Trending Products Recommendation Engine in Python

One engineer’s journey towards making his product recommendations less boring.


Stopping GAN Violence: Generative Unadversarial Networks

This is…special:

While the costs of human violence have attracted a great deal of attention from the research community, the effects of the network-on-network (NoN) violence popularised by Generative Adversarial Networks have yet to be addressed. In this work, we quantify the financial, social, spiritual, cultural, grammatical and dermatological impact of this aggression and address the issue by proposing a more peaceful approach which we term Generative Unadversarial Networks (GUNs).


Data viz of the week

Non-trivial visualization challenge handled elegantly.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Growth

Fishtown Analytics works with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123