New Deep Learning Architectures, GPU Databases, Histograms, & more! 👀 [DSR #99]
Two Posts You Can't Miss
An Intuitive Guide to Deep Network Architectures
I got pretty tired of reading “guides to deep learning” a while ago, but am always on the lookout for ones that bring something new to the table. This is the first rundown I’ve seen on the major advances in network architecture over the past couple of years. Very digestible, very interesting. Even if you’re not going to put this to work tomorrow, highly recommended.
Image Augmentation for Deep Learning using Keras and Histogram Equalization
To combat the high expense of collecting thousands of training images, image augmentation has been developed to generate training data from an existing dataset. Image augmentation is the process of taking images that are already in a training dataset and manipulating them to create many altered versions of the same image. This both provides more images to train on and helps expose our classifier to a wider variety of lighting and coloring situations, making it more robust.
This is the single best post I’ve seen on the topic of image pre-processing, an increasingly critical skill in a wide range of use cases. The writeup and code for histogram equalization were particularly cool.
Whether or not you work with image data today, this is a must-read.
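For readers who want a feel for the idea before clicking through, here is a minimal sketch of augmentation with histogram equalization. It is not the post's Keras code: it uses plain Python lists so it runs anywhere, and the names `equalize_histogram`, `flip_horizontal`, `augment`, and the tiny sample image are all made up for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale pixel values (0-255)


def equalize_histogram(image: Image) -> Image:
    """Remap intensities so their cumulative distribution spans 0-255."""
    # Count how many pixels fall on each of the 256 intensity levels.
    counts = [0] * 256
    for row in image:
        for pixel in row:
            counts[pixel] += 1

    # Build the cumulative distribution function (CDF) over intensity levels.
    cdf = []
    running = 0
    for count in counts:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        return []  # empty image: nothing to equalize
    cdf_min = next(value for value in cdf if value > 0)
    if total == cdf_min:
        return [row[:] for row in image]  # single intensity: leave unchanged

    # Lookup table: darkest level present maps to 0, brightest to 255.
    lut = [max(0, round((value - cdf_min) / (total - cdf_min) * 255)) for value in cdf]
    return [[lut[pixel] for pixel in row] for row in image]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def augment(image: Image) -> List[Image]:
    """Return the original plus three altered versions of the same image."""
    flipped = flip_horizontal(image)
    return [image, flipped, equalize_histogram(image), equalize_histogram(flipped)]


# Hypothetical low-contrast 3x3 image: every pixel sits between 100 and 140.
low_contrast = [[100, 110, 120], [110, 120, 130], [120, 130, 140]]
variants = augment(low_contrast)
```

One image becomes four training examples, and the equalized copies stretch the narrow 100 to 140 range across the full 0 to 255 scale, which is the kind of lighting variation the post is after.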
This Week's Top Posts
Histograms are a way to summarize a numeric variable. They use counts to aggregate similar values together and show you the overall distribution. However, they can be sensitive to parameter choices! We’re going to take you step by step through the considerations with lots of data visualizations.
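To make the point about parameter sensitivity concrete, here is a small sketch that bins the same values two ways. The function name `histogram_counts` and the sample data are invented for illustration and do not come from the linked post.

```python
from typing import List


def histogram_counts(values: List[float], num_bins: int) -> List[int]:
    """Count values in num_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up data with two clusters and a gap between them.
values = [0, 1, 1, 2, 2, 2, 3, 9, 10, 10, 10, 11, 11, 12]

coarse = histogram_counts(values, 2)  # two wide bins
fine = histogram_counts(values, 6)    # six narrow bins
```

With two bins the counts come out even and the data looks flat; with six bins the empty middle shows up and the two clusters become obvious. Same data, different story, purely because of the bin count.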
Hype or Not? Some Perspective on OpenAI’s DotA 2 Bot
The OpenAI article I linked last week churned up quite a storm in the geek community, where overlap in interests between gaming and AI is high. Apparently several pros were able to beat the bot consistently within six hours of its release. Here’s a lengthy thread on Hacker News about the topic.
OpenAI’s accomplishment is still impressive, but its work in this type of real-time, collaborative, informationally obscured environment is at a very early stage.
Currently, the three primary cloud analytic database platforms (Redshift / Snowflake / BigQuery) use CPUs. Other data-intensive applications have made the switch to GPUs to take advantage of their superior parallel processing, but this change is only beginning in the world of analytic databases.
Several companies have begun to play in this space; my hope is that the tech gets incorporated into an offering from AWS or GCP. This represents a real opportunity to decrease query response times.
The Data Journalism Awards are the first international awards recognizing outstanding work in the field of data journalism worldwide.
Much of this work is really stunning. Especially worth a look is the WSJ piece on lyrical styles in Hamilton.
www.datajournalismawards.org
The Top 100 Medium Writers on AI
My aim with this research is to allow me to quickly find the most relevant and appreciated articles [on AI], so that I can improve my knowledge about the subject, without having to read 100 articles to find the 4–5 of them that are interesting…
This is a solid piece of data journalism, an interesting new open data set to play with, and a great index of content to peruse if you’re just getting into the space. Chris Dixon’s posts, in particular, are foundational.
Data viz of the week
It’s always satisfying when you find the very best version of a thing. This entire site is the most informative presentation I’ve ever seen of global arms trade information. Click around—it’s worth it.
Thanks to our sponsors!
Fishtown Analytics: Analytics Consulting for Startups
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Stitch: Simple, Powerful ETL Built for Developers
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123