Machine Learning Cheatsheet! Change Data Capture @ Airbnb. Devops. Two Fascinating Visualizations. [DSR #150]
Focus on: Machine Learning
I’m super-happy with this find! I came across it in a random Google search this past week and couldn’t believe my luck. It’s a living ReadTheDocs site explaining ML concepts. The table of contents is extensive, and content already exists for many of the sections, but it’s definitely a work in progress. The project is in active development, and the authors welcome PRs.
How can we understand the highly non-convex loss function associated with deep neural networks? Why does stochastic gradient descent even converge?
With thousands of papers submitted to ICML this year, it’s impossible for normals like me to hope to stay on top of current research without kind souls writing summary posts. This is an excellent one, summarizing a recent ICML talk.
Want to go deeper into the math? Activation functions, loss functions, vectorization, and backpropagation, all explained in detail. Accessible yet substantive.
Other Stuff I Loved
Web-scale companies have been solving the problem of consistent state management across microservice architectures for years now. (Kafka originally grew out of this need from within LinkedIn!) But each attempt builds on the efforts that came before it, and as such, gets more and more fully-featured.
This post outlines Airbnb’s approach to change data capture, called SpinalTap. It is impressive, to say the least. The article is excellent, so I’ll let them describe it there. The thing I was most impressed by was the work they’ve done on continuous data validation.
This is far beyond what most other companies have in place. Airbnb has open-sourced several of the components (link at the bottom of the article).
Going from development notebook to production implementation is still one of the biggest problems that most data scientists face. Many just don’t have experience working in modern software development environments. This post is a good index of things you should at least be familiar with before hoping to get a line of code into production.
I don’t 100% agree with all of the recommendations, mostly because it’s now possible to deploy trained models as API endpoints, or to run them on AWS Lambda / Google Cloud Functions. Both of these can get around the “our production code is in Java” problem. Even so, this writeup is a good overview.
The visualization above highlights GEOS FP model output for aerosols on August 23, 2018. On that day, huge plumes of smoke drifted over North America and Africa, three different tropical cyclones churned in the Pacific Ocean, and large clouds of dust blew over deserts in Africa and Asia. The storms are visible within giant swirls of sea salt aerosol (blue), which winds loft into the air as part of sea spray.
Most visualizations result from a few lines of code or a couple of mouse clicks. It’s impressive what can be done with real time and expertise! Video below.
Check out the author’s Twitter feed for more great work.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.