10 Statistical Techniques You Need to Know. Full Stack Data Science. Migration. [DSR #111]

Nov 12, 2017

Feature Visualization: How Neural Networks Build Up Their Understanding of Images

There is a growing sense that neural networks need to be interpretable to humans. The field of neural network interpretability has formed in response to these concerns. As it matures, two major threads of research have begun to coalesce: feature visualization and attribution. This article focusses on feature visualization.

This is a fascinating read with some great images. It illustrates just how early we are in understanding the behavior of neural networks, and the cutting-edge research that’s going on to push us forward.

distill.pub

The 10 Statistical Techniques Data Scientists Need to Master

While having a strong coding ability is important, data science isn’t all about software engineering. I personally know too many software engineers looking to transition into data science who blindly apply machine learning frameworks such as TensorFlow or Apache Spark to their data without a thorough understanding of the statistical theories behind them.

Couldn’t agree more. This post is both an excellent index of techniques and a very readable introduction to each of them.

Highly recommended.

towardsdatascience.com

6 Books Every Data Scientist Should Keep Nearby

Solid list. I need to read Andrew Ng’s book.

www.kdnuggets.com

The 7 Kinds of Data Visualization People

Data visualization practitioners are a motley group, and while no two may look exactly alike, they all fall into one of 7 distinct categories.

This is very amusing. You will identify with at least one of the categories and will likely have made fun of people who come from several others.

Sorry for three listicles in a row.

medium.com

From Data to Deployment: Full Stack Data Science at Indeed

In this talk, we walked through an actual Indeed data science full-stack model building process: labeling data, performing analysis, generating features, building the model, validating the model, building infrastructure, deploying the model, and monitoring the solution.

engineering.indeedblog.com

Analyze the Migration of Scientific Researchers

This is both an excellent example of visualization and a critical piece of work for thinking about the future of data science and AI. The places where researchers migrate will become the centers of gravity for the industry.

towardsdatascience.com

Salesforce Research: Fully-Parallel Text Generation for Neural Machine Translation

So far all text generation models based on neural networks and deep learning have had the same, surprisingly human, limitation: like us, they can only produce language word by word or even letter by letter. Today Salesforce is announcing a neural machine translation system that can overcome this limitation, producing translations an entire sentence at a time in a fully parallel way. This means up to 10x lower user wait time, with similar translation quality to the best available word-by-word models.

An impressive example of rebuilding an entire algorithm bottom-up to be parallelized. This is a major step forwards in a very hot research area.

einstein.ai

The 10 Essential Rules of Dimensional Modeling

This article is really old (2009!), but I’m including it as a shout-out to an awesome conversation that happened over in dbt’s Slack. A dozen-ish people weighed in on their data modeling practices and on whether they strictly adhere to Kimball (most people) or think that Kimball contains useful concepts but needs an update (me).

If you do any work in a data warehouse, you need to be familiar, at the very least, with Kimball dimensional modeling concepts. If you’re interested in talking with other people obsessed with this stuff, join us in the #modeling Slack channel :)

www.kimballgroup.com

Mapping street-level air quality across California

Google is experimenting with collecting air quality data from its Street View cars. What other map layers can we hope for? As more of this data becomes API-accessible, use cases abound.

blog.google

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
