Delivering Analytics on Time. WePay Waltz. DS/DE Collaboration. Data Science is Boring(??) [DSR #198]
❤️ Want to support this project? Forward this email to three friends!
🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.
This week's best data science articles
Deliver Your Analytics Projects on Time
Something I don’t hear talked about enough in the data space is delivering work on time. Software engineers spend a tremendous amount of energy attempting to predict and control delivery timelines; why don’t data professionals do the same? There seems to be a consensus that predicting the time required for a data project is inherently impossible because the work is fundamentally an act of discovery.
I think this is false. I’ve personally delivered hundreds of analytics sprints, and I’ve trained others to do the same. We deliver almost all of these sprints (95%+) on time. I think that setting clear timelines for the delivery of analytics work is critical in building the data team into a trusted advisor for business stakeholders, and in this post I share our thinking on how we do it. The core of our approach is:
Eliminate as much uncertainty as possible before writing stories.
This is not rocket science—you can easily apply it in your org.
WePay | Waltz: A Distributed Write-Ahead Log
We are happy to announce the open source release of Waltz. (…) Waltz is what we describe as a write-ahead log. This recorded log is neither the output of a change-data-capture from a database nor a secondary output from an application. It is the primary information of the system state transition. This is different from a typical transaction system built around a database system where the database is the source of truth. In the new model, the log is the source of truth (the primary information), and the database is derived from the log (the secondary information).
This is similar to Kafka in concept, but Waltz provides different guarantees and can therefore be used for different applications:
Waltz is similar to existing log systems like Kafka in that it accepts / persists / propagates transaction data produced / consumed by many services. However, unlike other systems, Waltz provides a machinery that facilitates a serializable consistency in distributed applications.
This just came out. Could be a big deal.
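To make the "log as source of truth" idea concrete, here's a toy sketch in Python — this is not Waltz's actual API, just an illustration of the pattern the announcement describes: every state change is first appended to an ordered log (the primary information), and the "database" is merely a view derived by replaying that log (the secondary information).

```python
# Toy illustration of the log-as-source-of-truth pattern (not Waltz's API).
# The log holds ordered transaction records; state is derived by replay.

class WriteAheadLog:
    def __init__(self):
        self.entries = []  # the primary information: an ordered list of transactions

    def append(self, txn):
        self.entries.append(txn)

def apply(state, txn):
    """Fold one transaction record into the derived state."""
    op, account, amount = txn
    balances = dict(state)
    if op == "deposit":
        balances[account] = balances.get(account, 0) + amount
    elif op == "withdraw":
        balances[account] = balances.get(account, 0) - amount
    return balances

def replay(log):
    """The 'database' is fully reconstructible from the log alone."""
    state = {}
    for txn in log.entries:
        state = apply(state, txn)
    return state

log = WriteAheadLog()
log.append(("deposit", "alice", 100))
log.append(("withdraw", "alice", 30))
log.append(("deposit", "bob", 50))

print(replay(log))  # {'alice': 70, 'bob': 50}
```

Because the log is the authority, any downstream store (a cache, a search index, a reporting database) can be rebuilt by replaying it from the beginning — which is the inversion of the usual database-first model the WePay post describes.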
OpenAI | Emergent Tool Use from Multi-Agent Interaction
You should absolutely watch all of the GIFs in this post. Really fascinating, and quite entertaining. I do not think I would’ve thought of the “box-surfing” strategy—kudos, agents.
Artificial Intelligence Podcast | Lex Fridman
This podcast started last year, but I’m just now coming across it. The host is a research scientist @ MIT, and every episode features a guest you’ll likely have heard of (I’m a bit surprised just how consistently famous these folks are!). It’s queued up for my commutes for the coming week. Give it a try and let me know how you like it!
The Data Tooling Market in 2019
An in-depth overview of the data tooling market by an investor in the space. If you’re new to the data tooling ecosystem, this is a great place to get started.
An engineer’s perspective on engineering and data science collaboration for data products
At Coursera, we’ve built data products whose missions range from facilitating better content discovery to scaling learner interventions to benchmarking learners’ performance of various skills. Each data product is a collaboration among product leaders, business leaders, data scientists, and engineers. Effective data products need effective collaborations between data scientists and engineers.
I think this is one of the hottest areas in all of data: getting the various members of the data team (analysts, engineers, and scientists) collaborating together. And this is one of the best posts I’ve seen at encouraging that collaboration.
Most young Data Scientists expect to spend most of their time tinkering with and building fancy ML models or presenting ground-breaking business insights with colorful visualizations. Sure, these are still part of the job.
But as enterprises have become more educated, they focus more on real operational value. This means enterprises want to deploy more ML systems; they care less about how many new models or fancy dashboards they have. As a result, Data Scientists are asked to do the non-ML work. This drives the boredom.
I agree with this. I would expand on it a bit, though, to say that all jobs will seem boring if you expected you’d only have to do the fun parts. There’s plenty of drudgery in even the best jobs, and I’m saying that as someone with a pretty cool one ;)
We’re not doing junior data scientists any favors if we create unrealistic expectations for them (and there is now an entire commercial industry incentivized to do just that).
Thanks to our sponsors!
dbt: Your Entire Analytics Engineering Workflow
Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.
Stitch: Simple, Powerful ETL Built for Developers
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
If you don't want these updates anymore, please unsubscribe here.
If you were forwarded this newsletter and you like it, you can subscribe here.
Powered by Revue
915 Spring Garden St., Suite 500, Philadelphia, PA 19123