Managing a Data Science Team. Python in Shiny. The Fall of RNN and LSTM. [DSR #132]
❤️ Want to support us? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
The Week's Most Useful Posts
Oh snap. This is absolutely the must-read of the week—really great stuff. I’ll let the author intro the post himself:
In 2014, I joined a small team at Schibsted Media Group as the 6th Data Scientist in the organisation. Since then, I’ve worked on many data science initiatives in an organisation that now houses 40+ Data Scientists. In this post, I’ll go through some of the things I’ve learned over the last four years — first as Data Scientist and then as Data Science Manager.
This post follows the example of Robert Chang and his excellent “Doing Data Science at Twitter” — an article that I found hugely valuable when I first read it back in 2015. The objective of my own contribution is to provide equally useful reflections for Data Scientists and Data Science Managers around the world.
This is super-cool. From RStudio:
We are pleased to announce the reticulate package, a comprehensive set of tools for interoperability between Python and R. The package includes facilities for:
Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).
Flexible binding to different versions of Python including virtual environments and Conda environments.
My favorite usage so far: building Shiny apps that include Python!
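To make the "sourcing Python scripts" item a little more concrete, here is a minimal sketch of the kind of plain Python file you might hand to reticulate. The file name `summarize.py` and the function `summarize` are hypothetical, made up for illustration. Assuming reticulate is installed, calling `reticulate::source_python("summarize.py")` from R should make the function available in the R session, with the returned dict typically arriving as a named list.

```python
# summarize.py (hypothetical file name, used only for illustration)

def summarize(values):
    """Return basic summary statistics for a sequence of numbers."""
    values = list(values)  # accept any iterable, e.g. a converted R vector
    if not values:
        raise ValueError("summarize() needs at least one value")
    count = len(values)
    return {
        "n": count,
        "mean": sum(values) / count,
        "min": min(values),
        "max": max(values),
    }
```

Nothing in the file is reticulate-specific, which is most of the appeal: the same script can be run and tested from Python on its own.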
The Indeed data science team did some analysis on their user behavior data for people who searched on “data science”. The graphic above summarizes the findings very nicely: roles in data divide into five buckets, and searchers seem to use the term “data science” as an entry point to all of them.
The role breakdown maps to my own anecdotal experiences about how cutting-edge teams are organizing themselves. I personally question the BI Developer vs. BI Analyst dichotomy—I’m not sure if long-term those need to be different people.
Note: The original title of this post, “There’s No Such Thing as a Data Scientist” is super-clickbait-y and doesn’t reflect the core insights of the post itself. FTFY.
Wes McKinney builds the future. He’s the creator of pandas, and has spent the past several years working on Apache Arrow. This article is a bit about his journey, but more about what he’s building next: Ursa Labs. In short, Ursa Labs’ goal is to build an entire data science ecosystem on top of Arrow. Pretty freaking ambitious.
I’ll leave it to him to explain the details. I find Wes’s vision of the future to be very compelling. Apparently Hadley Wickham does too; he’s signed on as a technical advisor.
There are a million “tips for beginners” posts, and I almost never link to them. This one is really excellent, though, and I thought it was worth sharing. My favorites: “Build something, anything” and “Contribute to open source”. Foundational.
Whether you’re currently learning Python, SQL, Scala, or TensorFlow, this article is equally useful.
Attention based networks are used more and more by Google, Facebook, Salesforce, to name a few. All these companies have replaced RNN and variants for attention based models, and it is just the beginning. RNN have the days counted in all applications, because they require more resources to train and run than attention-based models.
This post on Towards Data Science made the rounds this week. I hadn’t heard of attention-based networks before, but this post on WildML does a good job of explaining them:
Attention Mechanisms in Neural Networks are (very) loosely based on the visual attention mechanism found in humans. Human visual attention is well-studied and while there exist different models, all of them essentially come down to being able to focus on a certain region of an image with “high resolution” while perceiving the surrounding image in “low resolution”, and then adjusting the focal point over time.
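The "high resolution here, low resolution there" idea can be sketched in a few lines of Python. This is a toy illustration of soft attention using dot-product scoring, not the method from either linked post; the regions, keys, values, and query below are all invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Score each region against the query, then blend the values by weight."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, context

# Three hypothetical image regions: a key describing each one, and its value.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # what the model is "looking for" right now

weights, context = attend(query, keys, values)
print(weights)  # the first region gets most of the weight
print(context)  # a blend of the values, dominated by the first region
```

Changing `query` shifts the weights to a different region, which is the "adjusting the focal point over time" part of the description.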
This is an interesting topic. I’ll keep a lookout for future posts; send anything my way that you come across.
I wouldn’t normally link to this post, as it’s a “big think” piece on AI, society, etc. (I typically try to stay a little closer to the ground.) But it’s gotten 10k claps on Medium and has clearly hit a nerve.
I 100% agree with the author’s point: the public AI conversation is overly focused on human-level general intelligence and misses other, more pragmatic stories that are potentially just as important.
This online tool shows how your color palette will look to people with different types of color blindness. Useful: run your own palettes through it.
Thanks to our sponsors!
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123