GPU Databases(!!). Analytics Engineering. ML Project Management. NLP. [DSR #172]

This week's best data science articles

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine

Uber just released an open-source, GPU-based, streaming, in-memory analytical database. This could be a very big deal.

The technology in the BI ecosystem has been rebuilt over the past 7 years, since the advent of the cloud-based MPP database. If AresDB or a similar technology gained traction, it could lead to a fundamental shift in tooling and in user expectations. People have talked about deploying GPUs in analytical databases before, but to my knowledge this is the first example of a scale-out deployment.


The Analytics Engineer

Michael Kaminsky of Locally Optimistic is proposing a new role on the data team: the Analytics Engineer.

The analytics engineer sits at the intersection of the skill sets of data scientists, analysts, and data engineers. They bring a formal and rigorous software engineering practice to the efforts of analysts and data scientists, and they bring an analytical and business-outcomes mindset to the efforts of data engineering. It’s their job to build tools and infrastructure to support the efforts of the analytics and data team as a whole.

I wholeheartedly agree. If you are (or want to be!) an analytics engineer, I highly recommend you join 1,300 other folks like you in dbt Slack.


Data Scientist: A Hot Job That Pays Well

Since December 2013, data science postings have rocketed 344% – more than quadrupling. Houston, San Francisco offer the best salaries for data scientists.

Some good quick stats on the growth of the field.


Why are Machine Learning Projects so Hard to Manage?

I’ve watched lots of companies attempt to deploy machine learning — some succeed wildly and some fail spectacularly. One constant is that machine learning teams have a hard time setting goals and setting expectations. Why is this?

Couldn’t be more dead-on. The author not only describes the challenges but also provides useful antidotes at the end of the post.


Google AI Blog: Transformer-XL: Unleashing the Potential of Attention Models

Transformer-XL obtains new state-of-the-art (SoTA) results on a variety of major language modeling (LM) benchmarks, including character-level and word-level tasks on both long and short sequences.

Short read—walks through a new NLP technique in a (fairly!) accessible way.


Thanks to our sponsors!

Mode Studio: A Free Toolkit for Every Analyst

Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it’s free forever. Connect data from anywhere and analyze with your preferred language. Build custom visualizations or use our out-of-the-box charts.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
