Junior Analyst >> Senior Analyst. Optimizing hyper-parameters. Programming ML. GPUs in Databases. [DSR #130]

A bit shorter than usual! Honestly the past week was a bit quiet, so I kept this issue focused on just the good stuff.


- Tristan

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

The Week's Most Useful Posts

One Analyst’s Guide for going from Good to Great

Once you’ve mastered the basics (SQL, Excel and your business intelligence tool), it can be really hard to figure out how to keep improving your analytics skills. For many analysts this leads to frustrating periods where you know you need to level up, but aren’t sure exactly what to focus on next.

If you’re a hotshot junior analyst and you want help breaking through this skills plateau, this guide is for you.

Data Science Roundup reader Jason Ganz just wrote a massive treatise on his process of leveling up as a senior analyst at a mid-stage SaaS startup. The lessons in this post range from technical skills to team dynamics to managing the politics of your organization, and it’s as good a guide as I’ve ever seen to managing that critical part of your professional development.

If you have 5 seconds, I’d really appreciate it if you could click through and give the post a couple of 👏👏 on Medium. Thanks!


Andrew Ng


Get a free draft copy of my book on how to structure Machine Learning projects: https://t.co/TgvKPa033r I’d started this before but got distracted building Deep Learning Specialization; I’m now rebooting this. Sign up to get free chapters as they’re released!

1:28 PM - 4 Apr 2018

On Machine Learning and Programming Languages

I just came across this post from a couple of months ago, and wow. It’s from the folks who make Julia, and they’ve spent plenty of time thinking about the intersection of ML and programming languages. The authors believe that:

  1. ML “libraries” like TensorFlow are better thought of as actual programming languages,

  2. current ML languages constrain the types of workloads that can actually be expressed, and

  3. language design will need to take significant steps forward for the field to accomplish its aspirations.

Here’s the final paragraph:

Can we build systems that treat numerics, derivatives and parallelism as first-class features, without sacrificing traditional programming ideas and wisdom? This is the foundational question which languages over the coming decade will have to answer.


Hyper-parameters in action!

This is the first of a series of posts aiming at presenting in a clear, concise and as much visual as possible fashion, some of the fundamental moving parts of training a neural network.

This is an awesome post. If you’re not really sure how to choose your learning rate or mini-batch size, this is for you. If you’re not sure what either of those terms means, this is a very accessible introduction.
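If you'd like a feel for those two knobs before clicking through, here is a minimal sketch of mini-batch gradient descent on a toy problem. It is not taken from the linked post: the `train` function, the toy data and the true slope of 3.0 are all assumptions made up for illustration, and it uses only the Python standard library.

```python
import random


def train(learning_rate, batch_size, epochs=50, seed=0):
    """Fit y = w * x with mini-batch gradient descent and return the learned w."""
    rng = random.Random(seed)
    # Toy, noise-free data. The true slope of 3.0 is an assumption for this sketch.
    xs = [i / 10 for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0
    data = [(x, 3.0 * x) for x in xs]
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # The learning rate scales how far each step moves w.
            w -= learning_rate * grad
    return w


small_steps = train(learning_rate=0.1, batch_size=4)   # settles near 3.0
huge_steps = train(learning_rate=1.0, batch_size=20)   # overshoots and diverges
```

On this toy problem, the smaller learning rate settles on the true slope, while the larger one overshoots further with every step. The mini-batch size controls how many examples are averaged into each gradient estimate, and therefore how many updates happen per pass over the data.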


Does GPU Hardware Help Database Workloads?

I’ve covered GPUs in analytic databases here in the past. This post, by a senior PM at Oracle, explains that we’re not seeing GPUs in analytic databases because the workloads simply don’t align that well:

The huge number of parallel computation engines provided by these devices excel at accelerating tasks that require large numbers of computations on small amounts of data. GPUs are extremely effective for Blockchain applications because these require billions of computations on a few megabytes of data. GPUs are great for deep learning since these perform repeated computational loops on megabytes to gigabytes of data. Analytics typically perform a small number of simple calculations on large amounts of data, often hundreds of gigabytes to petabytes of data.
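One way to make that argument concrete is to compare computations per byte of data for each kind of workload. The sketch below is back-of-the-envelope arithmetic only: the workload sizes are hypothetical numbers chosen to match the rough orders of magnitude in the quote, not figures from the article, and `ops_per_byte` is a helper invented for this illustration.

```python
MB = 10**6
GB = 10**9


def ops_per_byte(operations, data_bytes):
    """Computations performed per byte of data touched."""
    return operations / data_bytes


# Hypothetical hashing-style job: billions of computations on a few megabytes.
hashing = ops_per_byte(operations=5 * 10**9, data_bytes=5 * MB)

# Hypothetical analytic scan: 2 simple operations per 8-byte value over 500 GB.
scan_bytes = 500 * GB
scan = ops_per_byte(operations=2 * (scan_bytes // 8), data_bytes=scan_bytes)

print(f"hashing: {hashing:g} ops/byte, scan: {scan:g} ops/byte")
```

With these made-up sizes the hashing job does about 1,000 computations per byte and the scan about 0.25, which is the gap the quote is pointing at: the scan is limited by moving data, not by raw compute.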

The article goes much deeper. Fascinating read and instructive on the future of this space.


Top 20 Deep Learning Papers, 2018 Edition

This list ranks papers by citation, making it a pretty solid indicator of a paper’s importance in the field. Unsurprisingly, Yann LeCun, Yoshua Bengio, and Geoff Hinton’s 2015 paper is the most-cited, with more than double the citations of its closest rival.

It’s hard to know the exact dynamics that cause this, but it is notable that 9 out of the 10 most-cited papers were published in 2015. Was 2015 a particularly prolific year, or does it just take that long for the next round of research to make its way into subsequent citations?


Data Viz of the Week

This recent Economist graphic caught my eye. Its point: The US economy is strong today, but states’ revenue is struggling. Succinct view of a complex dataset.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.


915 Spring Garden St., Suite 500, Philadelphia, PA 19123