Data Science Roundup #63: 5 Brand New AWS Data Products, plus Intel's AI Efforts & more!

This week features a deep dive into recent announcements made by Amazon: don’t miss these new products. Also: building the data team @ Monzo, automated vs. learned features, Intel’s commitment to AI, and some sweet Westworld analytics to prep you for the season finale.

If you’ve been sent this newsletter by a friend, do me a favor and sign up. It’s your subscriptions that keep The Data Science Roundup growing!

Thanks :D

- Tristan

Spotlight on AWS

AWS re:Invent 2016 took place this past week and—wow—there are a lot of new products that have the potential to impact the way we all work with data. Here are just the highlights:

  • Amazon Athena is a direct competitor to Google BigQuery—a query engine as a service. Will it actually be as compelling as BQ? Unclear.

  • AWS Glue is an ETL service that extracts data from any JDBC-enabled source, has a transformation engine, and loads into AWS destinations. Crowded space, but always useful to have another option.

  • AWS Batch lets you “easily and efficiently run hundreds of thousands of batch computing jobs”.

  • Amazon Rekognition is image recognition out of the box, built on the same tech Amazon uses for its own products. The use cases proposed at the bottom of this page demonstrate just how useful this could be.

  • Amazon Polly turns text into speech. Building a data product that predicts conversational responses? Feed those responses into Polly and you now have an IVR system.

Algorithms are increasingly being abstracted into cloud services, giving data scientists access to ever more powerful tools without needing to understand how each individual model was tuned. This leverage is critical: it acts as a multiplier for what each individual data scientist can accomplish.

Also not to be missed: How Google is Challenging AWS. The cloud platform wars are the hottest front of competition in tech right now.

This week's best data science articles

Laying the Foundation for a Data Team @ Monzo

Data Science Roundup reader Dimitri Masin runs the data team @ Monzo, a growth-stage consumer banking startup. This post details several core features of the Monzo data team, diving particularly deeply into their data infrastructure. If you’re interested in seeing what a mature data infrastructure looks like, this is a great read.

Oh, and they’re hiring. Link at the bottom of the post.


Feature Engineering is Just Easier

Great post on the choice between manual and automatic feature engineering. “In the upper stratosphere of academic and industrial machine learning, deep learning has almost entirely taken over, but it’s no accident that the field is dominated by a few large companies, and almost everyone involved has a PhD from one of a handful of programs.” But, “sometimes, feature engineering is just the correct, economical choice.”


Intel Unveils Strategy for State-of-the-Art Artificial Intelligence

Intel is all-in on AI: expect the battle between Intel and Nvidia to heat up. Here are some choice quotes:

Intel aims to deliver up to 100x reduction in the time to train a deep learning model over the next three years compared to GPU solutions.

Intel sees AI transforming the way businesses operate and how people engage with the world.

The more investment dollars flow into this space, the faster the future becomes the present.

Westworld Data Visualization

For me, Westworld began as a curiosity but quickly became an obsession. Fortunately, there are some folks at Mode Analytics who are even more obsessed than I am: at a recent hackathon, they scraped scripts from each episode, hand-tagged each line with the appropriate character, and came up with some great analysis. I think I’m finally prepared to watch the season finale.


Data viz of the week

Amazing pairing of narrative storyline with custom viz.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, powerful ETL built for developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.



915 Spring Garden St., Suite 500, Philadelphia, PA 19123