Software 2.0. Five Essential ML Algorithms. Data Science @ Booking.com. TensorFlow Lite. [DSR #112]
I’m doing a webinar with the folks at Mode Analytics called Future-Proof your Analytics Stack. Should be a good time—would love to have you there!
Enjoy this week’s roundup :)
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
Andrej Karpathy, previously of OpenAI and now Director of AI at Tesla, is trying to have his Marc Andreessen moment, wherein he creates the mental model for the coming decade in tech. Decide for yourself:
The “classical stack” of Software 1.0 is what we’re all familiar with — it is written in languages such as Python, C++, etc. It consists of explicit instructions to the computer written by a programmer. By writing each line of code, the programmer is identifying a specific point in program space with some desirable behavior. In contrast, Software 2.0 is written in neural network weights.
Pete Warden (Google) adds in his own thoughts in an excellent follow-on post:
I know this will all sound like more deep learning hype, and if I wasn’t in the position of seeing the process happening every day I’d find it hard to swallow too, but this is real. Bill Gates is supposed to have said “Most people overestimate what they can do in one year and underestimate what they can do in ten years”, and this is how I feel about the replacement of traditional software with deep learning. There will be a long ramp-up as knowledge diffuses through the developer community, but in ten years I predict most software jobs won’t involve programming.
These posts are both must-reads.
Machine learning as a field had been around for a long time before deep neural networks took over the scene. Here is a list of the algorithms you need to know, so you can tackle any problem that comes your way.
Great behind-the-scenes look at a sophisticated data org:
Booking.com has 120+ data scientists and the community is growing bigger every day. Every one of us has a very different profile, background and working preference. For some, it’s their first job after PhD, whereas others come with a lot of work experience; some are Bayesian, some are Frequentist; some like R, others prefer Python; some strongly vouch for out-of-core learning (Vowpal Wabbit), while others prefer distributed computing using Spark and H2O. This diversity allows for continuous growth and learning from each other.
Eugene Wei was an analyst at Amazon.com back when the company was still called Amazon.com. In this post, he talks about his experiences producing the “Analytics Package”, the core piece of BI distributed to leaders at Amazon in 1997. It’s a fun trip down memory lane, and it also offers excellent practical advice on how to optimize chart formatting for readability.
If only everyone thought this hard about their Excel charts.
Zocdoc now automatically detects your insurance information when you take a picture of your card. This post walks through their process of building the feature:
We were able to develop a proof of concept (and justify the effort) very quickly, with the availability of cloud-based GPU servers, pre-trained models, and open sourced architectures and code.
Very useful if you’re considering developing a product feature that relies on deep learning.
…as the adoption of machine learning models has grown exponentially over the last few years, so has the need to deploy them on mobile and embedded devices. TensorFlow Lite enables low-latency inference of on-device machine learning models.
Last week, Geoffrey Hinton and his team published two papers that introduced a completely new type of neural network based on so-called capsules. In addition to that, the team published an algorithm, called dynamic routing between capsules, that allows to train such a network.
For everyone in the deep learning community, this is huge news, and for several reasons. First of all, Hinton is one of the founders of deep learning and an inventor of numerous models and algorithms that are widely used today. Secondly, these papers introduce something completely new, and this is very exciting because it will most likely stimulate additional wave of research and very cool applications.
This is really very cool. Pairing is such a useful way to transmit knowledge and collaborate on challenging problems, but it’s always been hard for remote teams. We use Atom for most of our coding internally and I’m excited to put Teletype to work.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.