Google AdaNet: Automating Ensembles. Growing ICs into Managers @ Lyft. Building Fast Models @ Taboola. [DSR #160]
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This week's best data science articles
Today, we’re excited to share AdaNet, a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet builds on our recent reinforcement learning and evolutionary-based AutoML efforts to be fast and flexible while providing learning guarantees. Importantly, AdaNet provides a general framework for not only learning a neural network architecture, but also for learning to ensemble to obtain even better models.
Google is investing heavily in automated model development, which makes all-too-much sense given their desire to sell as much TPU time as possible. These ensembles often require significantly more compute horsepower to develop and train than simpler models developed by hand.
The war between AWS, GCP, and Azure is one of the biggest competitions in tech today, and Google’s biggest edge is in ML. Expect them to continue to push on this hard.
(Sponsored by Domino Data Lab–thanks for supporting the Roundup!)
In this post, Ricky Chachra, Research Science Manager at Lyft, provides insight for companies looking to home-grow their promising individual contributors (ICs) into effective managers. He reflects on his journey at Lyft, where he started as a data/research scientist and transitioned into a science management role.
This is an important topic. The author, one of the original data scientists at Lyft, shares how manager / IC relationships hit a 1:15 ratio at one point—hardly a ratio that allows for any real interaction or growth on either side. The industry as a whole needs more managers to develop the constant influx of new talent, and the only way to create them is to grow them from ICs.
Uber open source project leads give updates on seven of our projects, all of which will be showcased at the upcoming Uber Open Summit 2018.
Uber is turning up the intensity of its open source efforts—they’re featuring seven projects at their first annual Uber Open Summit event. Get the preview of each project in this post.
Horovod, in particular, seems quite mature: it has contributions from Amazon and IBM and is already deployed to solve massive problems in production.
A portrait created using an AI program has fetched $435,000 in auction at Christie’s, blowing the expected price of $7,000 to $10,000 out of the water. It’s the first auction for an AI-generated portrait, sold to an anonymous bidder. It signals “the arrival of AI art on the world auction stage,” Christie’s said.
That’s…about all there is to know unless you want to hop into the debate about crediting original authors of open source code. I’m a bit shocked at the price tag given the infinitely-reproducible nature of the work. Someone please explain this to me.
If you’re not familiar with Taboola, they serve a tremendous number of impressions around the web in the “You might also like…” sections at the bottom of your favorite articles. In this post, a recommendations engineer talks about how they engineered a sophisticated model to return predictions in less than 200ms:
Sometimes using state of the art models can be problematic due to their computational demands. By caching intermediate results (embeddings) we were able to overcome this challenge, and still enjoy state of the art results.
Model prediction performance is a great topic, and one rarely written about. If you’ve seen interesting stuff written on this topic, shoot me a link.
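To make the caching idea concrete, here is a minimal sketch of the general technique, not Taboola's actual system. It assumes the expensive part of the model is computing an item embedding and that scoring is a cheap dot product against a user vector. The names `embed_item`, `score`, `rank`, and `ENCODER_CALLS` are all hypothetical, and the hash-based "encoder" is only a stand-in for a real model's forward pass.

```python
import hashlib
from functools import lru_cache

EMBEDDING_DIM = 4

# Hypothetical counter so we can see how often the "expensive" step runs.
ENCODER_CALLS = {"count": 0}


@lru_cache(maxsize=10_000)
def embed_item(item_id: str) -> tuple:
    """Stand-in for an expensive encoder pass (assumption, not a real model).

    The result is cached per item_id, so repeat requests skip this work.
    """
    ENCODER_CALLS["count"] += 1
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return tuple(b / 255.0 for b in digest[:EMBEDDING_DIM])


def score(user_vector, item_id):
    """Cheap per-request step: dot product against the cached embedding."""
    item_vector = embed_item(item_id)
    return sum(u * v for u, v in zip(user_vector, item_vector))


def rank(user_vector, item_ids):
    """Return item ids ordered from highest to lowest score."""
    return sorted(item_ids, key=lambda i: score(user_vector, i), reverse=True)
```

Calling `rank` twice over the same items runs the stand-in encoder only once per item; the second call is served from the cache. In a real serving system the cache would more likely live in an external store and need an invalidation policy, which this sketch leaves out.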
It’s been about a year and a half since I joined Automattic as a remote data scientist. This is the longest I’ve been in one position since finishing my PhD in 2012. In this post, I briefly discuss some of the top pluses and minuses of remote work, based on my experience so far.
If you’re an experienced data scientist, working remotely should definitely be on the menu. It obviously doesn’t work for everyone (and every company!) but, as this post presents, it has serious strengths to recommend it.
Potentially the most valuable part of the article is the link to the list of remote companies. If you’re hunting, this list is golden. That said, I personally wouldn’t recommend remote work to less experienced data professionals.
This episode of the Data Engineering Podcast was excellent. I’ve been listening to the podcast for the past six months or so and I always learn something from the no-nonsense, fairly technical conversations that take place on it.
This is by far my favorite episode to date. Matthew Seal is involved in deploying tens of thousands of notebooks at Netflix. I’ve covered Netflix’s investment in notebooks before, and this conversation digs deeper into that topic. My favorite part was Matthew’s thoughts on where the notebook ecosystem is currently weak and where extensions are needed. There were insights in there that could spawn meaningful projects.
Thanks to our sponsors!
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.