How to Become a Data Scientist. Plus, DOTA, Subways, Python as Poetry & more! [DSR #98]

So much good stuff this week! Enjoy! 😊

- Tristan

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

Two Posts You Can't Miss

How to Become a Data Scientist

This is the best “how to become a data scientist” post I’ve ever read (and I have an entire section of my Pocket just dedicated to this micro-genre of posts!). There are three reasons why it’s so brilliant:

  1. The author is a recruiter, not a data scientist, and is focused solely on hiring for data science positions. They talk to dozens of data scientists a day, and the post reflects this broad view of the landscape.

  2. The post features interviews with practitioners in a wide range of roles, giving an up-close-and-personal look at the breadth of roles available.

  3. The post focuses on the motivations, not just the capabilities, of the best data scientists. This is so critically important. I love this quote from Dylan Hogg, Head of Data Science at The Search Party:

Regardless of education or experience, there’s something more fundamental, which is your nature of curiosity, determination and tenacity. There are so many times when you hit a problem: perhaps the algorithm isn’t performing in the way it needs to, or perhaps the technology is being a pain. Either way, you can study machine learning algorithms or software engineering best practice, but if you’re not really determined, you’re going to give up and not get through it.

If you or anyone you know is currently hoping to get into the field, this is a must-read.

medium.com

OpenAI: Bot Wins Dota 2 1v1

We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules. The bot learned the game from scratch by self-play, and does not use imitation learning or tree search. This is a step towards building AI systems which accomplish well-defined goals in messy, complicated situations involving real humans.

I’ll admit that I’ve probably spent more hours than I should have playing Dota, so this really hits home for me. Whereas Chess and Go are turn-based and therefore involve discrete choices, Dota is real-time: the entire game is one continuous stream of decisions. Real-time decision-making is obviously a critical step towards OpenAI’s goal of AGI.

OpenAI’s bot is now the best in the world in a 1v1 setting, but Dota’s competitive scene is mostly focused on 5v5 teams. The next step is training a group of bots to perform at that level. The results have the potential to be truly fascinating: will bot teams play similarly to human teams? My bet is no.

While OpenAI is working on Dota 2, DeepMind is working on StarCraft II. It’s a good time to be a fan of real-time strategy games.

blog.openai.com

This Week's Top Posts

Cargo Cult Data Science

This is the best article I’ve read in a while on organizational behavior and data science. Very highly recommended.

Data science is best viewed as a form of company culture, rather than a set of technologies. However, many firms will try to create that company culture by acquiring data-science technology, rather than working on their culture.

blog.richardweiss.org

Craft Your Python Like Poetry

Code readability is a much bigger deal than many data scientists realize:

Python code is more like poetry than prose. Poets and Python programmers don’t wrap lines once they hit an arbitrary length; they wrap lines when they make sense for readability and beauty.

Especially important if you’re working on a team.
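To make the idea concrete, here is a minimal sketch of wrapping a line by meaning rather than by length. The function name `top_words` and the sample sentence are hypothetical and not taken from the linked post; the point is only that each clause of the comprehension gets its own line.

```python
from collections import Counter


def top_words(text, limit=3):
    """Return the most common alphabetic words in text (hypothetical example)."""
    # Wrapped where the logic changes, not where a column limit happens to fall:
    # one line for the value, one for the loop, one for the filter.
    words = [
        word.lower()
        for word in text.split()
        if word.isalpha()
    ]
    return Counter(words).most_common(limit)


print(top_words("the cat and the hat and the bat", limit=2))
```

The same comprehension fits on a single line, but a teammate skimming it would have to find the loop and the filter by eye.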

treyhunner.com

10 Significant Visualization Developments: January to June 2017

The Oscars of data viz. This season’s winners are impressive.

www.visualisingdata.com

What New York Subway Stations Actually Look Like

Subway stations’ complex tunnel systems are a mystery even to most regular riders. Architect Candy Chan’s new X-ray maps demystify the paths in and around them.

A unique, very cool mapping concept.

www.citylab.com

Facebook: Transitioning Entirely to Neural Machine Translation

…we recently switched from using phrase-based machine translation models to neural networks to power all of our backend translation systems, which account for more than 2,000 translation directions and 4.5 billion translations each day.

State-of-the-art work on translation at scale. Worth a read even if you’re not working directly with NLP, just to stay up to date on what is now achievable.

code.facebook.com

Dots vs. polygons: How I Choose the Right Visualization

If you’re mapping geographical data, would you use a dot density, choropleth, hexbin, or heatmap chart? I hadn’t thought much about this particular set of viz choices before reading this article and learned a lot from it. The author is a designer at Mapbox and likely spends more time thinking about mapping visualizations than almost anyone in the world.

blog.mapbox.com

Data Science: Challenges and Directions

This paper proposes an updated answer to the question “What is data science?” that focuses on its interdisciplinary nature:

Data science is a new trans-disciplinary field that builds on and synthesizes a number of relevant disciplines and bodies of knowledge, including statistics, informatics, computing, communication, management, and sociology, to study data following “data science thinking”.

It’s a fascinating but dense read. The results of this research will trickle out elsewhere…

cacm.acm.org

Data viz of the week

Because sometimes the point is to have some fun :)

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.

fishtownanalytics.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123