TensorFlow Lite. General-Purpose Language Models. Excel(!?). SQL Data Pipelines. [DSR #177]
This week's best data science articles
Pete Warden is on the TensorFlow Lite team at Google and has written a lot of great material on the intersection of IoT and deep learning. This post is a transcription of a conference talk in which he demos a speech recognition model, built with TensorFlow Lite, that takes up a whopping 20KB of RAM and detects the word “yes”.
This space is still very early, but it’s tremendously exciting to see what’s happening in it. Most of the industry is still thinking about plugged-in, cloud-connected applications of data science, but that’s going to evolve…
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training.
Stunning results. And the OpenAI team clearly thinks so as well: this is the first time that they haven’t actually released their work along with the announcement, citing concerns about malicious usage.
If you read nothing else, read the unicorn article written by the model. Very impressive.
I love this post. It’s a useful walkthrough of the many options you have for configuring a GPU-enabled training environment in the cloud. My favorite part was the rundown of the various services (including startups Paperspace, FloydHub, and Lambda Labs) that give you point-and-click access to an environment.
I now work under the paradigm of “Do not move data to code, move code to your data”. Python moves your data to the code while SQL acts on it in place.
Yep. This is the reason to express ETL workloads in SQL, and it’s one that many folks still don’t understand. The author saw a 14x performance improvement moving some simple data transformations from Pandas to Postgres… that number would’ve been much bigger on a real analytical database. Use Python when it’s actually needed.
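To make the “move code to your data” idea concrete, here is a minimal sketch, not the article author’s code. It assumes a hypothetical `orders` table and uses Python’s built-in SQLite as a stand-in for Postgres; the two functions compute the same per-customer totals, but the first pulls every row into the Python process while the second lets the database aggregate in place and return only the summary rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    # 1,000 illustrative rows spread across 5 customers.
    rows = [(i % 5, float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_in_python(conn):
    """Move data to code: fetch every row, then aggregate in Python."""
    totals = {}
    cursor = conn.execute("SELECT customer_id, amount FROM orders")
    for customer_id, amount in cursor:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_in_sql(conn):
    """Move code to data: the database aggregates and returns 5 rows."""
    query = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    return dict(conn.execute(query).fetchall())


conn = build_demo_db()
print(totals_in_python(conn) == totals_in_sql(conn))
```

With an in-memory database the difference is small; the gap the commentary describes comes from a real database server, where the first approach also has to ship every row over the network before any work happens.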
Wow, would this have been helpful to 20-year-old me. Watch the short animated GIF in the article and you’ll get all you need to know about the new feature.
I can’t believe I’m linking to something about Excel, but it actually is quite useful…
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.