This week’s issue focuses on Python, with detailed walkthroughs of Pandas and NumPy. Also included: three articles on applied data science, including one Googler’s practical guide to analytics (very good).
Department of Pay-It-Forward
I edit the roundup in my nights and weekends—if you enjoy it, please do me a favor and take time to share. Many thanks! 🙏
Department of Python
NumPy is the grandfather of Python analysis libraries, providing many of the internals that other libraries like pandas and matplotlib make use of. While you may end up calling pd and plot more often than np, what you’re getting is often np under the hood. NumPy is a critical package for data science in Python-based environments, and this walkthrough is the best I’ve seen.
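To make the "np under the hood" point concrete, here is a minimal sketch (not taken from the linked walkthrough). The `prices` series and its values are made up for illustration; the only assumption is a reasonably recent pandas and NumPy install.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a small pandas Series of floats.
prices = pd.Series([1.0, 4.0, 9.0], name="price")

# The values behind a numeric Series are exposed as a NumPy array.
values = prices.to_numpy()
is_ndarray = isinstance(values, np.ndarray)

# NumPy functions can be applied to a Series directly;
# the result comes back as a Series with the same index.
roots = np.sqrt(prices)

print(type(values).__name__, is_ndarray)
print(roots.tolist())
```

The takeaway is simply that time spent learning NumPy carries over: pandas objects can be handed to most NumPy functions as-is.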
Pandas is the key library for data manipulation in Python, and the Pandas dataframe, modeled on the R data frame, is the key data structure. This post is a welcome find, as Pandas documentation is notoriously subpar and can make learning the library a real headache. Make sure to read the section on reshaping your data with pivot(), stack(), and melt().
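If you want a feel for those three reshaping calls before reading the post, here is a rough sketch. The `long_df` table, its column names, and its numbers are all invented for illustration and do not come from the linked article.

```python
import pandas as pd

# Hypothetical "long" table: one row per (city, year) observation.
long_df = pd.DataFrame({
    "city": ["Philadelphia", "Philadelphia", "Boston", "Boston"],
    "year": [2016, 2017, 2016, 2017],
    "sales": [10, 12, 7, 9],
})

# pivot(): long -> wide, one row per city and one column per year.
wide = long_df.pivot(index="city", columns="year", values="sales")

# stack(): fold the year columns back into the index,
# giving a Series indexed by (city, year).
stacked = wide.stack()

# melt(): wide -> long again, with explicit names for the new columns.
melted = wide.reset_index().melt(
    id_vars="city", var_name="year", value_name="sales"
)

print(wide)
print(stacked)
print(melted)
```

Roughly speaking, pivot() spreads values out into columns, while stack() and melt() both gather columns back into rows; stack() works through the index, and melt() works with ordinary columns.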
One of the best parts of working in Python is just how big the community is. But when you’re making technology choices in your job, how do you know when it’s reasonable to rely on an open source package? This post helps you think through the tradeoffs in adding that new dependency.
While we’re talking open source, I love this recent post about organizing commercial open source communities. This is a topic that is only getting more important.
Department of Real-Life Applications
This post, originally written for internal consumption at Google, is absolute gold. In it, the author lays out specific technical, procedural, and social guidance for how to go about conducting analytics. My favorite quote: “Credibility is the key social value for any data scientist.” If you spend your days doing analytics, this is a must-read.
This reads a bit like a cranky old man telling all of the young whippersnappers what f***ups they are, but the advice is dead-on. My favorite part: “Assume all data is garbage unless it’s been used before.” Highly recommended.
Your ability to be effective as a data scientist is intimately connected to the work of the data engineers who support you, and it pays to understand what they do. This post is an excellent look behind the scenes at the day-to-day responsibilities, technologies, and mindset of a data engineer on a world-class data team.
Data viz of the week
% of white people. Simple, effective, instructive.
Thanks to our sponsors!
Fishtown Analytics works with venture-funded startups to implement Redshift, BigQuery, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
915 Spring Garden St., Suite 500, Philadelphia, PA 19123