Code and No-Code: Having it Both Ways
Why we should refuse to choose one or the other and why civilization hangs in the balance.
Episode 2 of Season 2 of the podcast just dropped, and it’s potentially my favorite episode yet. In it, Julia interviews our colleague Natty about the history of analytical databases, diving deep into HTAP databases that we covered recently. It’s a fascinating topic and Natty’s knowledge of it is absolutely encyclopedic.
Enjoy the issue!
- Tristan
We should be more open to no-code done well.
Sarah Krasnik gets 🌶️ this week in her post No Code is the Future:
Do we really despise no-code, or are we just frustrated at its limitations?
While I don’t agree with the title, I’m not sure that Sarah does either based on the contents of the post itself. I think, maybe, the actual strong statement is more like: No code will inevitably be a part of the future.
The value in the piece is that it asks us—the early adopters, the will-learn-anything-if-it-makes-us-more-effective types—to come down off our high horses and accept the world as it is. If we say that the only appropriate way to participate in certain activities is to do so via writing code, then we are inherently excluding the majority of humanity. While there is an absolute, undeniable trend over time for more humans to write code, this is a gradual process that will play out over decades, even generations. Is it possible that future generations will look back at our rates of coding knowledge like we look at this chart?
I don’t know. Maybe coding isn’t as critical a life skill as reading, but it’s also totally possible to make the counter-argument. Literacy clearly wasn’t that important for most humans in 1500, and I’m sure with a time machine you’d find plenty of intelligent humans laboring in fields and workshops who would’ve been confused as to why you thought they should know how to read.
Regardless: let’s just stipulate that
not everyone is going to learn to code tomorrow, and
reducing barriers to participation is always better.
With this framing, the focus then shifts from “code vs. no-code” to a more interesting set of questions:
strengths and weaknesses of code vs. no-code (when to prefer which),
how to deliver great no-code experiences that preserve the best parts of code workflows (especially PRs, CI/CD), and
how to integrate these two experiences together such that practitioners using both modalities can collaborate.
Language vs. Emojis
“Code” vs. “no-code” is not really the right framing for the strengths-and-weaknesses conversation. The right framing, IMO, is the choice between using alphabetic languages and using hieroglyphics (emojis!) to express an idea.
In the English language (just to take the one I’m most familiar with), there are 26 letters. These 26 letters can be combined together using at least somewhat consistent phonetic rules to create words. Words can then be combined together to build sentences using the rules of grammar. Sentences can be combined into paragraphs and paragraphs into texts using rhetoric, argumentation, narrative, poetry, etc.
The entire process starts with those 26 letters, and the meaning we create from them is entirely combinatorial and rules-based. For simplicity, let’s just imagine that any letter can follow any other letter in sequence. In practice this doesn’t actually obey English phonetic rules, but it makes the math easier, so let’s go with it. With this assumption, the number of potential N-letter words one could make with 26 letters is 26^N, so there are potentially 141,000,000,000,000 10-letter English words.
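To sanity-check the arithmetic, the count takes only a couple of lines of Python:

```python
# Under the simplifying assumption that any letter can follow any
# other, the number of possible N-letter strings over a 26-letter
# alphabet is 26**N.
def possible_words(alphabet_size: int, length: int) -> int:
    return alphabet_size ** length

print(f"{possible_words(26, 10):,}")  # 141,167,095,653,376
```

That’s the ~141 trillion quoted above, from just 26 starting symbols.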
And all of this is permissionless—anyone can create a new word. Apparently Shakespeare coined some 1,700 new English words! All you have to do is start using the word and teach others what it means—if they find it useful, they will start using it too! Someday the Oxford English Dictionary will add it (that’s when you know you’ve hit it big time), but the OED doesn’t define what is and is not a word; it’s just an index of the most commonly-used ones.
This system is quite literally what has enabled humans to create civilization. Everything we’ve built as a species, we’ve done so thanks to our ability to flexibly and permissionlessly communicate in this way.
Take emojis, on the other hand. As of this writing, there are apparently 3,633 emojis approved by Unicode and supported on modern devices. 3,633 is a lot more than 26! But they aren’t combinatorial in the same way—emojis represent what they represent, and if you want to represent a new concept, you have to lobby the Unicode Consortium, a centralized standards-setting body, to get your new emoji included.
The benefit of emojis is extremely clear: it’s way more natural and faster to say 👍 than “acknowledged” (more human, too!). I genuinely enjoy their addition to the standard lexicon. But you can’t create civilization on the back of emojis—they just aren’t expressive enough.
Let’s bring it back to code and user interfaces. To code is to use language—in exactly the same way that English is combinatorial and permissionless, our primary programming languages are as well. You can use the language as it is, or you can change it to suit your needs. You can arbitrarily combine unrelated ideas that their originators never realized could be combined. You are in control, and your expressiveness is bounded only by your own creativity.
User interfaces, by definition, are constraining. They make the most common and straightforward ideas (like “👍”) extremely easy and ergonomic to express. But the inherent tradeoff in user interfaces is that they limit expressiveness to achieve this goal. That is ok! Just like emojis are useful, user interfaces are useful.
The real benefit of hieroglyphics comes when they are combined with alphabetic language! This is what we have arrived at today in our electronic communication: we use an alphabetic language as the primary mode of information transfer and sprinkle in emojis to convey specific ideas.
Combining Code and No-Code
So…how do we actually do this in data-land? This is actually something we’ve thought about for a long time. Here’s the simplest example of why.
Imagine you’re using dbt Docs. You look up a model, you learn whatever about it you were looking for, case closed. Then you realize that the description of one of the important fields in that table is missing. Being a good data citizen, you’d like to update it. What’s the right experience you should go through to do so?
1. Typing into the dbt Docs UI that you’re already in and hitting “submit”, or
2. pulling up your terminal, cloning master, branching, finding the right YAML file, updating it, committing your changes, and submitting a PR?
Friction matters. If you say that the only appropriate answer is #2, you’ll find that a) fewer people are able to do this behavior at all, and b) even people who are capable of this behavior choose to do it far less frequently. And when it comes to updating a shared information asset like this, you want as much crowdsourced participation as possible.
So: we must find a way for the answer to be #1, because otherwise we are confining ourselves to a future where the data catalog is always out of date. The question is: how do we do this without sacrificing the good stuff that code gets us from a CI/CD and governance perspective?
I think the question kind of answers itself, and it generalizes well to the entire data ecosystem: you have to create no-code experiences that read and write code. In the example above, when the user hits “submit”, what must happen behind the scenes is that a file in a repo gets updated and a PR gets automatically created for someone to review. IMO this is the only way to unify code and no-code workflows, and it must become the way these systems work together.
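As a concrete sketch of what “submit” might do behind the scenes: the UI’s job reduces to rewriting one line in a schema YAML file and packaging the change as a PR. All names here (`update_description`, `pr_payload`, the branch-naming scheme) are hypothetical illustrations, not a real dbt or git-provider API:

```python
import re
import textwrap

# Hypothetical sketch of the "submit" button's job: rewrite one
# description in a schema.yml and describe the PR a tool would open.
# Function names and the payload shape are illustrative assumptions,
# not a real dbt or git-provider API.

def update_description(schema_yml: str, column: str, new_desc: str) -> str:
    """Replace the description line directly under a named column."""
    pattern = re.compile(
        rf'(- name: {re.escape(column)}\n\s*description: )"[^"]*"'
    )
    return pattern.sub(rf'\g<1>"{new_desc}"', schema_yml)

def pr_payload(column: str) -> dict:
    """Branch name and title for the auto-created pull request."""
    return {
        "branch": f"docs/update-{column}-description",
        "title": f"Update description for `{column}`",
    }

schema = textwrap.dedent("""\
    models:
      - name: orders
        columns:
          - name: order_total
            description: ""
""")

updated = update_description(schema, "order_total", "Total order value in USD")
payload = pr_payload("order_total")
print(updated)
print(payload["branch"])
```

A real implementation would commit the updated file on a new branch and call the git provider’s pull-request API, but the key point stands: the artifact behind the button is still a reviewable code diff.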
Here are the requirements I believe a great no-code solution must meet:
No-code systems have to, behind the scenes, actually write code. They have to anticipate that there will be two main modalities: code written by their tools, and code written by hand. Both modalities must be a part of the experience of using the product, even if the product itself doesn’t contain a code editor. The language ergonomics are a part of the product.
No-code products must store their resulting code as files in a filesystem. Git, CI/CD, PRs, and all developer automation rely on code living in files, not in the traditional data stores that one would typically build a SaaS product on top of.
No-code products must integrate with git providers as the primary source of truth for the current state of the codebase.
Code written by humans and code written by no-code tools have to be able to live side-by-side in the same repo. Better yet, human- and system-written code should each be editable by the other modality; the relationship cannot be “one-way”.
Code written by the no-code experience has to be readable by humans so that PRs can be meaningfully reviewed.
Few no-code tools are built like this today. Notably, Looker is! Looker’s charts can fully sync back and forth with code, and it’s magical. I’m excited about a future where this is commonplace—I’d love to get the good stuff no-code promises without making untenable tradeoffs.
To go on the record: I’m fully supportive of the idea that many parts of the dbt codebase could be usefully edited by no-code solutions. I’d love to build infrastructure to help folks build these experiences in line with the principles I outline above!
From elsewhere on the internet…
❄️ Benn discusses Snowflake’s $800m purchase of Streamlit in just a fantastic post. IMO (and I have 0 insider knowledge here) this read of the tea leaves is spot on—it’s the only way you can get to that kind of price tag.
My personal opinion is that the building of one or more “data app stores” is inevitable, and the dynamics are very similar to the iOS / Android mobile ecosystem. This is how you deliver computing to hundreds of millions or billions of humans—you create a technical and economic framework that allows rapid iteration by large groups of developers. I’ll pause there because Benn’s post does the bull case for app stores more justice than I could in a summary.
I want to zoom in on exactly one thing in the article. If it is the cloud data platforms that build the data app stores, this is a much more closed, more balkanized approach than the web frameworks (Rails, Django) approach discussed in the post. Rails does not tie you to any particular commercial relationship, and I cannot imagine a Snowflake app development framework that allows you to run your app on BigQuery. As I’ve written about before, I think this is a future that members of the community should be concerned about; I would prefer to see this layer emerge in a cross-platform way. Seems like at least one community member agrees:
Tightly coupling the app store with the cloud data platform has a ton of advantages and makes obvious sense for a lot of stakeholders (including customers) in the short term. Which direction this ultimately goes will be determined by the number of competitors in the compute layer over the long term, though: if there are 1-2-3 big Cloud Data Platforms in the same way that the whole world runs on iOS and Android, you could imagine developers targeting multiple distinct app stores. If there are 4+, this becomes untenable. This may ultimately be decided by market structure.
📈 Stefan at Vortexa wrote up a great review of their BI tool selection process. I don’t link to a lot of posts like this because, well…there are a lot of them. But this post was unique in its focus area: the team wanted to select an open source BI tool that it would host itself and so compared Superset, Redash, and Metabase. Fantastic resource—I personally have very limited experience in self-hosting these tools and learned a lot.
🤔 Colin Zima, ex-VP Product @ Looker, teases what he’s working on next:
At Looker, we saw how organizations can be transformed by making data accessible to more users. We learned what building a great company looked like, and participated in the birth of the modern data stack. But sometimes success creates challenges - there was always a little more ambition in Looker around the corner that we never got to tackle. Could we bridge all of Looker’s power and depth with the beauty and agility of the best consumerized SaaS? We believe that great software can make technical users 10x faster and get out of the way, and still provide the building blocks to create scale later.
There is an opportunity for a new paradigm that can provide direct value to individuals and small teams at a reasonable cost, but grow in complexity and power alongside the organization. A platform that can be the fastest way to ask the first 100 questions while optimizing cost and performance over time. A platform that is simple and intuitive, but extensible, customizable, and pushes work to the right partner in the ecosystem.
Call me curious! Stay tuned for more from this high-powered insider team, which also includes Jamie Davidson (who also sat in Looker’s VP Product seat for several years) and Chris Merrick, ex-CTO of Stitch.
❓: how do you get your data team members to focus on the important-but-small-and-often-annoying tasks that never seem to get done?
✅: “doc days”! Fantastic post by Emily Thompson.
💬 Ananth kindly responded to the questions I asked in my most recent issue! I think his post may have generated more questions for me than it answered though :P I’m going to take the conversation offline and will say more here if and when I feel like I have more interesting stuff to say on the one-DAG-vs-two-DAGs question.
> I think that this is a future that members of the community should be concerned about; I would prefer to see this layer emerge in a cross-platform way.
This is definitely one of the things I am concerned about. I'm talking to a lot of developers of data apps, and while I generally agree that Snowflake is the innovator here, you miss out on a lot of the market by tying yourself exclusively to them (or Google, AWS, Azure, etc.)
At MessageGears, our approach has been to just connect and run all operations via JDBC, but even Google/BigQuery has made that difficult because they don't support an official JDBC driver (we opted to integrate with their APIs instead).
We should be trying to encourage them to use a consistent framework, but I'm worried it might be too late at this point.