Marketing Analytics @ Instagram. PhD: Yes or No? Knowledge Dissemination @ Uber. [DSR #151]

This Week's Most Useful Posts

What You Need to Know Before Considering a PhD

Question: I’m an undergrad student passionate about machine learning, and I feel a bit of pressure to get a PhD. Would it maybe make more sense to go into industry for a couple years and then consider going back to school? Any advice you have would be greatly appreciated.

The author, herself a math PhD, is quite negative on the prospect of you, an aspiring data scientist, getting a PhD. PhDs remain very common in data science, so conventional wisdom tends to push junior practitioners towards feeling that they need the stamp in order to get the job they want. This post provides an incredibly important counter-perspective.

I’ve personally considered this decision as well and decided to go down the “no” branch. Happy to talk about my reasoning with anyone who happens to be thinking through their own decision today.

A Day in the Life of a Marketing Analytics Professional

Chris Dowsett, Head of Marketing Analytics at Instagram, talks about a day in his life. It’s a textbook example of a high-impact use of data within marketing: using data to direct people and dollars towards what works.

Analytics professionals are frequently nervous about working in marketing because of its historical association with drudgery: downloading CSVs, creating reports in Excel, rinse, repeat. But this isn’t what the job looks like today at best-in-class companies: modern marketing is one of the most fruitful and challenging domains for the application of data analysis.


Databook: Turning Big Data into Knowledge with Metadata at Uber

Databook, Uber’s in-house platform for surfacing and managing contextual metadata, makes dataset discovery and exploration easier for teams across the company.

This is insanely cool. One belief we’ve formed recently from talking to hundreds of dbt users is that data discovery and knowledge dissemination have become major problems at the most data-forward organizations. The core problem is that with thousands of tables available in a data warehouse, it becomes harder and harder for data consumers to know exactly where to go for information and to understand its provenance.

This post from Uber goes deep into Databook, their in-house solution for this problem. At the moment it doesn’t seem like Uber is planning on open sourcing Databook, but it’s still a fascinating example of what the most data-forward organizations are investing in.


On Software Tooling

A company is not data-driven by using a BI tool.

A data-science team is not agile by using a ticketing system.

Good reminder. Short read.


SQL Interview Questions for Data Analysts

The most feared part of the data analyst/scientist hiring process is the technical screening. Here are 3 SQL interview questions to practice.

Or, if you’re hiring, here are 3 example interview questions to give to your candidates.

Really, I wouldn’t recommend using exactly these questions, but I do highly recommend giving technical screens for analyst positions. Data scientist positions frequently get hit by massive tech screens, but analyst positions get almost none. This should be fixed: analysts need a baseline level of technical aptitude before starting on day 1, and 100% of those skills can be learned online via excellent free classes today. The tech screen is almost more about motivation than anything: if the candidate cares about the position, they can learn the skills required to pass the screen in a small number of hours.

We do this for 100% of our applicants.


The Coolest Things I Learned at JupyterCon

I’m freshly back from JupyterCon in NY and still feeling the bubbly optimism that comes with bringing all you’ve learned at a conference back to your office. In that spirit, I wanted to share some of the coolest and most interesting things I learned with you all.

Awesome conference review. Tons of good tidbits, plenty of links to learn more.


Research Computing Times #1 – August 2018

This brand new series is published by long-time blogger Mike Croucher, Head of Research Computing at the University of Leeds. My favorite post he links to in the first issue is titled “Botched code causes seven-year scientific argument”:

Long story short, two groups were investigating what happens when you super-freeze water. They disagreed and much shouting happened for 7 years. There was a bug in the code of one group.

Heh, wow. What an advertisement for the importance of reproducibility. If your colleagues can’t assure themselves of the quality of your work, they’re unlikely to use it.

Stay tuned to Mike’s blog for future issues.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to build analytics teams. Whether you’re looking to get analytics off the ground after your Series A or need support scaling, let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
