Data Science Roundup #48: Databases, Scalability, and Shipping Routes

This week is a deep dive into databases, programming, and scalability. Plus, a useful color palette tool and an impressive visualization of global shipping. Enjoy!

This week's best data science articles

In Search of Database Nirvana

This is a seriously intense discussion of the implementation details of databases, breaking down the functionality provided by both the query engine and storage engine. While much of this goes far beyond what you need to know to do your job, it’s incredibly helpful to understand what’s going on under the hood in a database. This is a long post, but well worth it.

www.oreilly.com

Scalable data science with R

With R, you’ll immediately run into problems when your dataset size exceeds the memory size on your local machine. At that point, you have three options: scale up, scale out, or use R as an abstraction layer. This post walks you through the decision.

www.oreilly.com

Filtering inappropriate content with the Cloud Vision API

Want to incorporate content filtering in your new app? Now it’s easy. Google’s Cloud Vision API can detect inappropriate content in images using the same machine learning models that power Google SafeSearch. Another very hard problem solved and packaged up as an API.

cloud.google.com

Python Packaging Is Good Now

If you use Python, you almost certainly use pip. But did you know the history behind Python package management? As someone who joined the Python community relatively recently, I found this an interesting history lesson. For others, it’ll be a fun walk down memory lane :)

glyph.twistedmatrix.com

Teaching Python and R to Work Together

Ever run into a package that exists in R but not in Python? Or vice versa? This article goes through a package development technique that is becoming increasingly common for major packages: write the underlying implementation in C and then develop language bindings in both Python and R. This article won’t make you a C developer, but it’s a useful technique to understand.

civisanalytics.com
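
To make the "implement in C, bind from each language" idea concrete, here is a minimal sketch of the Python half. It is not the article's code: it assumes a POSIX system (Linux or macOS), uses Python's standard-library ctypes module as the binding mechanism, and borrows the C standard library's strlen as a stand-in for a real C implementation. The wrapper name c_strlen is made up for illustration.

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library may return None on some systems;
# on POSIX, CDLL(None) then falls back to symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
_strlen = libc.strlen
_strlen.argtypes = [ctypes.c_char_p]
_strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical Python-facing wrapper around the C function.

    Returns the length in bytes of the UTF-8 encoding of `text`,
    as computed by C's strlen.
    """
    return _strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

An R binding would wrap the same compiled function through R's own foreign function interface, which is why both languages end up sharing one implementation.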

Colorgorical - Easy Color Palettes

This tool is an easy way for non-designers to come up with color palettes. It gives you a ton of configuration options and outputs color values. Easy. Now please, stop making all of your charts in the same colors that Excel chose as defaults in 1997.

vrl.cs.brown.edu

Data viz of the week

Click on the image for an interactive world shipping map. Very impressive.

Thanks to our sponsors!

Fishtown Analytics

Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.

fishtownanalytics.com

Stitch

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
