BigQuery Omni. WFH --> Self-Service Analytics. Data Science Management. The Rise of DataOps. [DSR #230]


This week's best data science articles

BigQuery Omni for multi-cloud data analytics

BigQuery Omni, powered by Anthos, lets you analyze data in Google Cloud, as well as AWS and Azure (coming soon). It’s multi-cloud data analytics for the modern age.

Wowowow. This is a big deal…IMO it’s one of the most meaningful product announcements in the modern data stack in the past year. As a practitioner, one of the things I find most frustrating is when the choice of cloud provider dictates the choice of tooling. I’ve frequently worked with clients who decided against BigQuery because of their existing AWS investments, and that is often a completely understandable (if unfortunate) decision. Omni changes that.

We’ve seen (via anonymous dbt usage stats) BigQuery gain meaningful market share over the past 12 months; my guess is that it’s one of GCP’s most strategic services. I’ll be watching those graphs in the coming months…I wouldn’t be surprised if Omni accelerates that existing trend.

While we’re talking about BQ Omni, Stephen at RedMonk has very smart things to say. Is this an indication of GCP’s larger strategy as it competes with AWS and Azure? I would welcome that. I’m not into walled gardens that use egress charges as a defensive moat.

cloud.google.com

How WFH Has Reshaped Our Work Behaviors

The part of this article that I’m most interested in is here:

One of the major goals of the Data & Analytics team, when it was founded in 2016, was to “democratize data at BetterCloud.” We started a long journey of extracting data from all the applications we use into a centralized database, and then layering Tableau dashboards on top to allow employees in different verticals to fish for themselves.

Tableau usage data indicates a strong upward trend in usage, but remote work has accelerated the trend. Our thought is that people are now unable to tap their neighbor on their shoulder to ask questions, and instead would have to message them and wait for a response. People will try to be self-sufficient and solve problems on their own. To that end, we saw a 67% increase in the number of monthly users from January to April 2020.

This is incredibly cool. It mirrors my gut instinct on the topic, based on what I’ve observed growing a company from 15 to 40 team members during Covid-enforced WFH. Shoulder-tapping and informal communication are no longer an appropriate solution when all shoulders are geographically distributed. This is a very good thing: more self-service analytics is absolutely the future.

As with other technology trends the pandemic is accelerating, we’re seeing a decade of progress in a few months.

www.bettercloud.com

Katie Bauer (@imightbemary)

What is the difference between an engineering manager and a data science manager? It's a question I find myself ruminating over almost constantly. There's tons of good thinking and writing about eng management out there, but I don't find that it always translates to the DS world.

1:58 PM - 5 Jul 2020

^^ amazing thread, highly recommended reading.

The Rise of DataOps (from the ashes of Data Governance)

…I don’t even know how to summarize this. It’s a very “long view” post: it zooms way out and compares software engineering with data analysis. Its core argument is that source control management is the fundamental transition that lets a practice go from hobby to profession, because source control provides reproducibility, the fundamental requirement of any engineering discipline.

That description does not, however, do the post justice. You should just…read it. 🙏🙏

towardsdatascience.com

The machine learning community has a toxicity problem

:(

Reddit thread, lots of comments, initial post is fantastic.

I hate to say it, but much of this feels to me like the all-too-predictable result of a still-nascent academic discipline that has become so important that national governments see it as a Cold War-style strategic priority, and that shapes the market caps of the only trillion-dollar companies on the planet. Many academic norms are just that, norms, and they’re not necessarily designed to resist this level of external pressure.

I’m not suggesting that we should just throw our hands up and give up on these issues, but we should not be surprised that they exist, either. Recognizing their structural nature is an important part of coming to solutions.

The comments were mostly focused on the problems, not on solutions. If anyone has insightful solution-oriented thoughts here (or can point me to other writing) I’d love that.

www.reddit.com

Snorkel is a fundamentally new interface to ML without hand-labeled training data

The directness of rules with the flexibility of ML. Rule-based systems have long been used in industry for certain tasks—as an input, individual rules have the desirable property of being direct and interpretable. However, rules can also be brittle, and lack the robustness, flexibility, and sheer power of ML approaches. With Snorkel Flow, you get the best of both worlds: rules (and other interpretable resources) as inputs, and powerful ML models that generalize beyond these rules as the output.

This is a very interesting project out of the Stanford AI Lab. I was involved in building rules engine systems in the early 2000s and know both how powerful and how brittle they can be. The idea that you could start with a rules engine and then feed its output into a neural net seems like both a) a good idea, and b) a good description of how our own brains work.

Follow this.
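The core idea (rules as inputs, training labels as outputs) is usually called weak supervision, and it can be sketched in a few lines of plain Python. Hand-written rules, which Snorkel calls labeling functions, each vote on an example or abstain, and the combined votes become noisy labels that a downstream model can learn from. This is a hedged illustration, not Snorkel's API: the spam task, the rule names (`lf_contains_link`, `lf_mentions_free`, `lf_short_message`) and the label constants are all hypothetical, and the sketch swaps Snorkel's learned label model for a plain majority vote.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Label values are this sketch's own convention, not Snorkel's API.
ABSTAIN = -1
NOT_SPAM = 0
SPAM = 1


# Hypothetical rules ("labeling functions"): each one votes or abstains.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> int:
    # Assumption for illustration: very short messages are casual replies.
    return NOT_SPAM if len(text.split()) < 5 else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_link,
    lf_mentions_free,
    lf_short_message,
]


def weak_label(text: str) -> Optional[int]:
    """Combine rule votes by simple majority (a stand-in for a learned label model).

    Returns None when every rule abstains or the vote is tied.
    """
    votes = Counter(lf(text) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)  # abstentions don't count as votes
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the example unlabeled
    return ranked[0][0]


def build_training_set(texts: List[str]) -> List[Tuple[str, int]]:
    """Keep only examples the rules could label; a downstream ML model would train on these."""
    labeled = []
    for text in texts:
        label = weak_label(text)
        if label is not None:
            labeled.append((text, label))
    return labeled


if __name__ == "__main__":
    unlabeled = [
        "Claim your FREE prize at http://example.com",
        "thanks, see you soon",
        "Free lunch today",
    ]
    for text, label in build_training_set(unlabeled):
        print(label, text)
```

The payoff the Snorkel team describes comes from the next step: instead of counting every rule equally, its label model estimates how much to trust each rule, and the resulting labels train a model that can generalize to examples no rule covers.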

www.snorkel.ai

Thanks to our sponsors!

dbt: Your Entire Analytics Engineering Workflow

Analytics engineering is the data transformation work that happens between loading data into your warehouse and analyzing it. dbt allows anyone comfortable with SQL to own that workflow.

getdbt.com

Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.

www.stitchdata.com

By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123