Getting Hired. Data Scientists at Airbnb. ML in Bigquery. Tufte-Worthy Charts in R. [DSR #146]
Getting Your First Data Science Job
I just started my new job at Airbnb as a data scientist a month ago, and I still feel that I’m too lucky to be here.
You’ve read versions of this post before: how I got my first data science job. Most of these posts focus on the practical aspects: what to study, what books to read, etc. This post is different. It focuses on the emotional experience of trying to break into the industry. And the stakes for this author are high: she was down to $600 in her bank account and her US work authorization was running out.
If you’re in the process right now, this post might be just what you need.
Most data science portfolios suck. Which is not that surprising: most humans aren’t highly motivated type-A strivers and creating an amazing portfolio takes a lot of work for an uncertain reward. But if you’re absolutely obsessed with data science, it’ll shine through in your public work. Having a great portfolio is the single best way to cut through the resume noise.
This is the most detailed post I’ve seen on how to put together a great data science portfolio. Get on it.
This Week's Most Useful Posts
The head of data science at Airbnb describes how the team, and the functional areas within it, have evolved over the lifetime of the company. The short version: Airbnb now has three teams within Data Science: Analytics, Algorithms, and Inference. They set clear expectations with the rest of the org on what members of these teams are experts in and how they can help the business.
This is not brand new territory, but it’s another step towards the industry creating some much-needed clarity around the term data scientist.
Today we’re announcing BigQuery ML, a capability inside BigQuery that allows data scientists and analysts to build and deploy machine learning models on massive structured or semi-structured datasets. BigQuery ML is a set of simple SQL language extensions which enables users to utilize popular ML capabilities, performing predictive analytics like forecasting sales and creating customer segmentations right at the source, where they already store their data.
I’m really fascinated by this. My belief is that more and more types of data processing will be made accessible via SQL language extensions in this way. The modern cloud analytic databases are incredibly powerful and more and more companies are dumping their data in to allow interactive analysis for growing analyst teams whose toolsets are primarily driven by SQL. Those analysts want to use ML algorithms, but frequently the piping of data from a SQL environment to a Python environment and back again is too much of a barrier. Plus, all the data transport is inefficient.
This product launch is interesting in and of itself, but more so because of what it portends for the future. My bet is that in a year this feature supports more than linear and logistic regression.
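To make "simple SQL language extensions" a bit more concrete, here's a minimal sketch of what training and scoring a model might look like. The helper functions are my own illustration (not part of any Google library), and the dataset, table, and column names are hypothetical. The statements follow the CREATE MODEL and ML.PREDICT syntax as I understand it, so check the current BigQuery ML reference before running anything. The code only builds the SQL strings; submitting them to BigQuery is left out.

```python
# Sketch only: builds BigQuery ML statements as strings; it does not call BigQuery.
# All dataset, table, and column names used below are hypothetical.

# The two model types mentioned in the launch commentary above.
SUPPORTED_MODEL_TYPES = {"linear_reg", "logistic_reg"}


def build_create_model_sql(model_name, label_column, source_table,
                           model_type="linear_reg"):
    """Return a CREATE MODEL statement that trains on every column of source_table."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', "
        f"input_label_cols=['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def build_predict_sql(model_name, source_table):
    """Return a query that scores the rows of source_table with a trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, TABLE `{source_table}`)"


# Hypothetical usage: forecast revenue from a sales history table.
print(build_create_model_sql("mydataset.sales_model", "revenue",
                             "mydataset.sales_history"))
print(build_predict_sql("mydataset.sales_model", "mydataset.new_sales"))
```

The point is that both training and prediction stay inside the warehouse as ordinary queries, which is exactly what removes the SQL-to-Python round trip.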
Also from Google:
Today we’re launching Seedbank, a place to discover interactive machine learning examples which you can run from your browser, no set-up required.
The integration with Colab is cool—go from browsing a directory of models to a working notebook in a single click. This is a great tool to explore areas that you haven’t yet had the chance to work with yourself.
Elijah Meeks 1) proves that it is possible to create well-designed charts with pure R, and 2) shows (via survey) that practitioners largely don’t spend the time required to do this. My favorite bit:
You can learn a new dimensional reduction technique, which has concrete steps and measurable ways that it helps with your practice, or you can read about information design and come away with more questions than answers. It’s hard to evaluate and hard to reward good design in a field that doesn’t value it, and so it’s difficult to ask professionals to invest in it.
I agree that it is an incentives problem. Internal analytics is typically a cost center, which encourages a “get it done” approach, not a design-centric one.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.