NumPy, Pandas Walkthroughs, Plus a Googler's Practical Guide to Analytics 📊 📈

This week’s issue focuses on Python, with detailed walkthroughs of Pandas and NumPy. Also included: three articles on applied data science, including one Googler’s practical guide to analytics (very good).

Department of Pay-It-Forward

I edit the roundup on my nights and weekends. If you enjoy it, please do me a favor and take a moment to share. Many thanks! 🙏

- Tristan

Department of Python

NumPy Tutorial: Data analysis with Python

NumPy is the grandfather of Python analysis libraries, providing many of the internals that other libraries like pandas and matplotlib make use of. While you may end up calling pd and plot more often than np, what you’re getting is often np under the hood. NumPy is a critical package for data science in Python-based environments, and this walkthrough is the best I’ve seen.
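
To make the "np under the hood" point concrete, here is a minimal sketch. The numbers and column names are made up for illustration and do not come from the tutorial. It shows a numeric pandas column handing back a plain NumPy array, and a NumPy function operating directly on pandas objects.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, purely for illustration.
df = pd.DataFrame({"visits": [120, 95, 143], "signups": [12, 7, 20]})

# A numeric pandas column can be handed back as a plain NumPy array.
visits = df["visits"].to_numpy()
print(type(visits))

# NumPy functions accept pandas objects directly; the result here
# comes back as a pandas Series with the original index.
conversion = np.divide(df["signups"], df["visits"])
print(conversion.round(3))

# The same statistic computed through pandas and through NumPy.
mean_via_pandas = df["visits"].mean()
mean_via_numpy = np.mean(visits)
print(mean_via_pandas, mean_via_numpy)
```

The two means should agree up to floating-point rounding, which is a decent hint that the same array machinery is doing the work in both cases.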


Pandas Tutorial: DataFrames in Python

Pandas is the key library for data manipulation in Python, and the Pandas dataframe, modeled on the R dataframe, is the key data structure. This post is a welcome find, as Pandas documentation is notoriously subpar and can make learning the library a real headache. Make sure to read the section on reshaping your data with pivot(), stack(), and melt().
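
For a taste of what that reshaping section covers, here is a small sketch on an invented dataset. The month, channel, and signups names are assumptions of mine, not taken from the post. It goes from long format to wide with pivot(), then back to long two different ways with stack() and melt().

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, channel) pair.
long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "channel": ["email", "search", "email", "search"],
    "signups": [10, 25, 14, 30],
})

# pivot(): long -> wide, one row per month and one column per channel.
wide = long_df.pivot(index="month", columns="channel", values="signups")
print(wide)

# stack(): fold the channel columns back into the index, producing a
# Series keyed by (month, channel).
stacked = wide.stack()
print(stacked)

# melt(): wide -> long again, this time as ordinary columns.
melted = wide.reset_index().melt(
    id_vars="month", var_name="channel", value_name="signups"
)
print(melted)
```

stack() keeps the reshaped keys in the index, while melt() returns them as regular columns, so which one is handier tends to depend on what you plan to do next.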


A Few Questions About Open Source

One of the best parts of working in Python is just how big the community is. But when you’re making technology choices in your job, how do you know when it’s reasonable to rely on an open source package? This post helps you think through the tradeoffs in adding that new dependency.

While we’re talking open source, I love this recent post about organizing commercial open source communities. This is a topic that is only getting more important.


Department of Real-Life Applications

Practical advice for analysis of large, complex data sets

This post, originally written for internal consumption at Google, is absolute gold. In it, the author lays out specific technical, procedural, and social guidance for how to go about conducting analytics. My favorite quote: “Credibility is the key social value for any data scientist.” If you spend your days doing analytics, this is a must-read.


Ten Ways Your Data Project is Going to Fail

This reads a bit like a cranky old man telling all of the young whippersnappers what f***ups they are, but the advice is dead-on. My favorite part: “Assume all data is garbage unless it’s been used before.” Highly recommended.


A Day in the Life of a Data Engineer

Your ability to be effective as a data scientist is intimately connected to the work of the data engineers who support you, and it pays to understand what they do. This post is an excellent look behind the scenes at the day-to-day responsibilities, technologies used, and mindset of a data engineer on a world-class data team.


Data viz of the week

% of white people. Simple, effective, instructive.

Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, powerful ETL built for developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123