Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

panproto

panproto is a toolchain for evolving schemas: defining them, migrating data across versions, translating between protocols (JSON Schema, Protobuf, GraphQL, Avro, ATProto, SQL DDL, and many more), versioning the whole thing the way git versions source, and round-tripping data through bidirectional transforms whose laws are mechanically checked.

This documentation is organised in four quadrants, following the Diataxis framework. Pick the one that matches what you are trying to do.

Tutorials

Learning by doing. Pick this if you are new and want a guided sit-down with a working example at the end. No prior knowledge of category theory or schema theory is assumed.

How-to guides

Recipes for specific tasks: defining a schema in your language of choice, building a migration, wiring breaking-change detection into CI, querying instances, parsing full ASTs, bridging panproto’s version control to git. Pick this when you know what you want to do and need the steps.

Reference

Authoritative listings: every CLI subcommand, the SDK surfaces for Rust, TypeScript, and Python, the protocol catalogue, the expression-language builtins, the lens combinator algebra, the panproto.toml schema. Tables and signatures, no exposition.

Explanation

Why the system is shaped the way it is. What schemas, migrations, lenses, and merges mean under the hood. The denotational semantics of panproto’s three DSLs, and a precise account of which categorical properties the implementation mechanically verifies. Most pages here are accessible to working developers; the semantics/ cluster is the place where the math is load-bearing.

Where to start

You are…Start at
New to panprotoYour first schema
Adding panproto to an existing projectInstall
Looking up a CLI flagCLI reference
Wondering what schemas-as-theories meansSchemas as theories
Building a custom protocolBuild a custom protocol
Setting up CI gatesBreaking-change gate

Tutorials

Tutorials walk through a complete example end-to-end. They assume no prior knowledge of panproto and no category-theoretic background. Each one ends with a working artifact you can run and modify.

If a tutorial mentions a concept by name without explaining it (a lens, a protocol, a migration), it links forward to the explanation page that defines it. You do not need to follow those links to finish the tutorial.

TutorialWhat you buildTime
Your first schemaA typed schema for a small data model in TypeScript, validated against a protocol~10 min
Your first migrationA migration that adds a field, renames another, and lifts existing data forward~15 min
Schema version control basicsA schema repository with branches and a merge~20 min
Cross-protocol translationA JSON Schema converted into a Protobuf definition with data round-tripped through both~20 min

Your first schema

You will define a schema for a small data model (users with names and ages), validate it against the atproto protocol, and load some data through it. About ten minutes.

By the end you will have: a working panproto setup, a schema you wrote, an instance of that schema parsed from a JSON file, and a sense of how the four pieces (protocol, schema, instance, validation) fit together.

We use atproto because it is the most fully-built-out protocol in the current registry; the same code shape applies to any of the protocols listed by Panproto.listProtocols().

No prior knowledge of category theory or schema theory is assumed. We use ordinary words for everything; if you want the formal treatment of any concept, the explanation chapters are linked at the end.

Setup

Pick a language. The walkthrough uses TypeScript; the Python and Rust versions are at the bottom.

mkdir my-first-schema && cd my-first-schema
npm init -y
npm install @panproto/core

Step 1: load a protocol

Create src/main.ts:

import { Panproto } from '@panproto/core';

const p = await Panproto.init();
const proto = p.protocol('atproto');

console.log('protocol:', proto.name);

Run it: npx tsx src/main.ts. You see protocol: atproto (using Protocol.name). The protocol object knows how to validate, parse, and emit schemas in its native form; it is the starting point for building schemas in this language.

Step 2: build a schema

Add to src/main.ts:

const schema = proto.schema()
  .vertex('user', 'object')
  .vertex('user.name', 'string')
  .vertex('user.age', 'integer')
  .edge('user', 'user.name', 'prop', { name: 'name' })
  .edge('user', 'user.age', 'prop', { name: 'age' })
  .required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
  .build();

console.log(
  'vertices:', Object.keys(schema.vertices).length,
  'edges:', schema.edges.length,
);

.vertex() declares a vertex (a kind, e.g. an object or a leaf string). .edge() declares an edge (a field, item, or variant). This schema says: a user is an object with a required string name and an optional integer age.

.build() validates the construction: required edges are present, every reference targets an existing vertex, the protocol’s equations are satisfied. If anything is wrong, you get an error here, before any data is touched. The returned BuiltSchema carries .vertices (an id -> Vertex record), .edges (an array), and .protocol.

Step 3: parse and validate data

Create data/sample.json:

{ "name": "Alice", "age": 30 }

Add to src/main.ts:

import { readFileSync } from 'node:fs';

const bytes = readFileSync('data/sample.json');
const instance = p.parseJson(schema, bytes);
console.log('parsed:', new TextDecoder().decode(p.toJson(schema, instance)));

const validation = instance.validate();
console.log('valid?', validation.isValid, validation.errors);

Run it. You see the parsed record echoed back and valid? true []. Panproto.parseJson(schema, bytes) returns an Instance by walking the JSON against the schema graph (a non-integer age, or JSON not matching the schema’s shape, raises during this call). Panproto.toJson(schema, instance) serialises it back out. instance.validate() runs the separate required-fields/constraints pass.

Step 4: catch a violation

Edit data/sample.json to remove name:

{ "age": 30 }

Run again. The parse still succeeds (parsing only walks the structure that is present), but validation.isValid is now false and validation.errors carries MissingRequiredEdge { ..., edge: "name (prop)" }. Required-field enforcement is performed by instance.validate(), not by parseJson itself; run both whenever you want a hard check.

What you built

Three things:

  1. A reference to a protocol (atproto).
  2. A schema (a graph of vertices and edges) within that protocol.
  3. Instances (data) parsed and validated against the schema.

This same pattern works for every protocol panproto supports. Swap 'atproto' for any other name in the built-in registry (Panproto.listProtocols() lists them), and the rest of the code is identical.

Next

Python version

import panproto

proto = panproto.get_builtin_protocol("atproto")

b = proto.schema()
b.vertex("user", "object")
b.vertex("user.name", "string")
b.vertex("user.age", "integer")
b.edge("user", "user.name", "prop", "name")
b.edge("user", "user.age", "prop", "age")
schema = b.build()

io = panproto.IoRegistry()
with open("data/sample.json", "rb") as f:
    instance = io.parse("atproto", schema, f.read())
print(instance.to_dict())

The Python builder uses statement-by-statement mutation (each .vertex() and .edge() mutates in place and returns None); chain syntax does not work. Parsing data through a protocol’s codec goes through IoRegistry().parse(protocol, schema, bytes). The full list of built-in protocols is panproto.list_builtin_protocols().

Rust version

panproto-core is a re-export facade over the sub-crates; there is no single Panproto entry-point struct. You compose the same flow directly from the sub-crates: build a Schema via panproto_schema::SchemaBuilder, validate it against a protocol from panproto_protocols, parse instances via panproto_inst::parse_json. The shape:

use panproto_core::{protocols, schema::SchemaBuilder, inst};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let proto = protocols::atproto::protocol();

    let schema = SchemaBuilder::new(&proto)
        .vertex("user", "object", Some("app.example.user"))?
        .vertex("user:name", "string", None)?
        .vertex("user:age",  "integer", None)?
        .edge("user", "user:name", "prop", Some("name"))?
        .edge("user", "user:age",  "prop", Some("age"))?
        .entry("user")
        .build()?;

    let bytes = std::fs::read("data/sample.json")?;
    let json: serde_json::Value = serde_json::from_slice(&bytes)?;
    let instance = inst::parse_json(&schema, "user", &json)?;
    println!("{instance:?}");
    Ok(())
}

Method signatures track the underlying crates rather than a fluent facade; consult docs.rs/panproto-schema and docs.rs/panproto-inst for current arguments.

Your first migration

You will take the user schema from Your first schema and evolve it to v2 by renaming age to years, then lift existing v1 records to v2 shape. About fifteen minutes.

By the end you will have: a v2 schema, a migration from v1 to v2 with explicit edge mappings, a compatibility report, an existence check that passes, and v2-shape data lifted from your v1 records with a verified round-trip.

Prerequisites

Completed Your first schema. The same my-first-schema/ project.

Step 1: write both schemas in one file

src/migration.ts:

import { Panproto } from '@panproto/core';

const p = await Panproto.init();
const proto = p.protocol('atproto');

const v1 = proto.schema()
  .vertex('user', 'object')
  .vertex('user.name', 'string')
  .vertex('user.age',  'integer')
  .edge('user', 'user.name', 'prop', { name: 'name' })
  .edge('user', 'user.age',  'prop', { name: 'age' })
  .required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
  .build();

const v2 = proto.schema()
  .vertex('user', 'object')
  .vertex('user.name',  'string')
  .vertex('user.years', 'integer')
  .edge('user', 'user.name',  'prop', { name: 'name' })
  .edge('user', 'user.years', 'prop', { name: 'years' })
  .required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
  .build();

The only change: age is renamed to years (a structural rename of both vertex id and edge name). We deliberately keep this minimal: adding a new required field with no v1 source would make lift produce records that do not satisfy v2, which the existence check would flag in Step 4.

Step 2: declare the migration

const builder = p.migration(v1, v2)
  .map('user',      'user')
  .map('user.name', 'user.name')
  .map('user.age',  'user.years')
  .mapEdge(
    { src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' },
    { src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' },
  )
  .mapEdge(
    { src: 'user', tgt: 'user.age',  kind: 'prop', name: 'age'  },
    { src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
  );

const mig = builder.compile();

MigrationBuilder is fluent: .map(srcVertex, tgtVertex) aligns vertices and .mapEdge(srcEdge, tgtEdge) aligns edges. Vertex mappings alone are not enough: without the edge maps the lift drops every field and you get back {}. .compile() returns a CompiledMigration exposing .lift(), .get(), and .put() (a migration is a lens). For value-level transforms that compute one field from another, reach for the lens DSL or panproto-lens-dsl from the SDK.

Step 3: classify the change

const report = p.diffFull(v1, v2).classify(proto);
console.log('compatible?', report.isCompatible);
console.log('breaking changes:', report.breakingChanges);
console.log('non-breaking changes:', report.nonBreakingChanges);

Panproto.diffFull(old, new) returns a FullDiffReport; calling .classify(protocol) returns a CompatReport with three booleans (isCompatible, isBackwardCompatible, isBreaking) and two arrays (breakingChanges, nonBreakingChanges). For human-readable output call report.toText(); for a machine-readable summary report.toJson().

For this rename, isCompatible is false: the diff classifies the removal of user.age/edge age as breaking, even though the migration we declared explicitly carries the field forward under a new name. The classifier looks only at the structural diff, not at the migration mapping, so a “rename” between schemas registers as a removal plus an addition. Whether that is a problem for your downstream consumers depends on whether they have been told about the migration.

Step 4: check before you lift

const existence = p.checkExistence(v1, v2, builder);
console.log('existence valid?', existence.valid);
console.log('existence errors:', JSON.stringify(existence.errors, null, 2));
if (!existence.valid) {
  throw new Error('existence check failed');
}

Panproto.checkExistence(src, tgt, builder) runs the existence-condition test: for every required v2 field, is the mapping populated from a v1 source? It returns an ExistenceReport with valid: boolean and errors: ExistenceError[]. Each error is a serde-tagged object (for example { "RequiredFieldMissing": { "vertex": "user", "field": "name" } }), so inspect them as structured data rather than expecting a message string. Inspect valid before lifting; an invalid report means the lift will produce data that does not satisfy the v2 schema.

Step 5: lift the data

import { writeFileSync } from 'node:fs';

const v1Records = [
  { name: 'Alice', age: 30 },
  { name: 'Bob',   age: 42 },
];

const v2Records = v1Records.map((r) => mig.liftJson(r, 'user'));
console.log('v2:', v2Records);
writeFileSync('data/v2.jsonl', v2Records.map((r) => JSON.stringify(r)).join('\n'));

mig.liftJson(record, rootVertex) round-trips a JSON-shaped record through the migration: the wrapper parses it as an instance of the source schema rooted at rootVertex, lifts it through the edge mapping, and emits the target shape as a JS object. (The lower-level mig.lift() exists too, but it expects already-encoded Instance bytes rather than JSON-native records.) The complement (the data discarded by the forward direction) is captured on the get/put path; see Step 6.

Step 6: confirm round-trip

const original = v1Records[0];
const { view, complement } = mig.getJson(original, 'user');
const back = mig.putJson(view, complement, 'user');
console.log('view:', view);
console.log('round-trip:', back);

mig.getJson(record, rootVertex) returns { view, complement }; mig.putJson(view, complement, rootVertex) returns the restored record. The complement carries the data the v2 shape does not see; together, get and put satisfy the round-trip laws. (The lower-level mig.get() / mig.put() exist for already-encoded Instance bytes; the *Json wrappers handle the JSON-native path.) The restored record’s field ordering is not preserved (object keys come out alphabetised), so structural equality, not JSON-string equality, is the right round-trip predicate.

What you built

A migration that is type-checked, classified, existence-checked, and reversible via the get/put pair. None of those properties was hand-asserted; each was checked by the panproto tooling.

Next

Schema version control basics

You will turn a small TypeScript source file into a panproto repository, commit two versions of the schema parsed from that file, branch off a feature, and merge it back. About twenty minutes.

By the end you will have: a .panproto/ directory tracking schema history, two commits on main, a feature branch with a third commit merged back, and an attempted conflict you can inspect.

Prerequisites

The schema CLI installed (Install the CLI). This tutorial does not depend on the SDK; it stays at the CLI throughout.

A note on schema input

schema add accepts three kinds of input:

  • a source file in a language the parser registry knows (TypeScript, Rust, Python, etc.), parsed via tree-sitter into a full-AST schema;
  • a directory, parsed as a project into a unified schema; or
  • a .json file whose content is a serialised panproto-schema::Schema struct.

The hand-authored Schema JSON format is non-trivial (it’s the in-memory representation, not a pretty document language), so most workflows use the first two. This tutorial uses the source-file path.

Step 1: initialise

mkdir vcs-tutorial/src -p && cd vcs-tutorial
schema init
ls .panproto/

A .panproto/ directory appears, structurally similar to .git/. It has objects/, refs/, HEAD, and a small config.

Step 2: commit v1

Save the v1 user shape as src/user.ts:

export interface User {
  name: string;
  age: number;
}

Stage and commit:

schema add src/user.ts
schema commit -m "v1 user schema"
schema log --oneline

schema add runs the TypeScript tree-sitter parser, extracts a schema from the AST (vertices for the interface and its property types, edges for the properties, plus a fine layer of byte-position constraints used for round-tripping), and stages it. schema log --oneline now shows one commit with a blake3 hash.

Step 3: commit v2

Edit src/user.ts to rename age and add email:

export interface User {
  name: string;
  years: number;
  email: string;
}

Stage and commit:

schema add src/user.ts
schema diff --staged
schema commit -m "v2: rename age to years, add email"
schema log --oneline

schema diff --staged shows the structural changes (a renamed property, a new property, plus the tree-sitter byte-position constraint deltas). Two commits now.

Step 4: branch

schema branch feature/handle
schema checkout feature/handle

You are on a new branch sharing history with main up to the v2 commit. Add a handle field:

export interface User {
  name: string;
  years: number;
  email: string;
  handle: string;
}

Then:

schema add src/user.ts
schema commit -m "add handle"
schema log --oneline

You see the linear history with the new commit on feature/handle.

Step 5: merge

schema checkout main
schema merge feature/handle
schema log --oneline

Since main is an ancestor of feature/handle (no conflicting work was done on main in parallel), the merge is a fast-forward and produces no merge commit. The output ends with Merge successful. and reports the merged schema’s vertex/edge counts. For a non-trivial three-way merge, pass --no-ff to force a merge commit.

Step 6: provoke a conflict

For a clean conflict demonstration, start fresh in a new repo so the conflicting branches diverge from the same commit:

cd .. && mkdir vcs-conflict/src -p && cd vcs-conflict
schema init
cat > src/user.ts <<'EOF'
export interface User {
  name: string;
  age: number;
}
EOF
schema add src/user.ts && schema commit -m "v1"

# Create both branches at v1 (each branch starts at HEAD).
schema branch feature/handle-str
schema branch feature/handle-int

schema checkout feature/handle-str
# edit src/user.ts: add  handle: string;
schema add src/user.ts && schema commit -m "handle as string"
schema checkout main
schema merge feature/handle-str    # fast-forward, succeeds

schema checkout feature/handle-int
# edit src/user.ts: add  handle: number;
schema add src/user.ts && schema commit -m "handle as number"
schema checkout main
schema merge feature/handle-int

The second merge fails with conflict objects. For the type clash above you see entries like:

Merge produced 3 conflict(s):
  BothAddedConstraintDifferently {
    vertex_id: "src/user.ts::User::$2::$11",
    sort: "literal-value",
    ours_value: "string",
    theirs_value: "number"
  }
  ...
Error:   × merge failed with 3 conflict(s)

Each conflict is a structural object reported by the pushout construction, not a textual three-way merge marker. The variants you can hit include BothAddedConstraintDifferently (used here, when two branches set the same constraint to different values) and UniversalFactorizationFailure (raised by panproto-vcs’s pushout when no universal merge object exists at all). To resolve, choose one branch’s shape, edit the source file, schema add, schema commit.

What you built

A schema history that you can navigate, branch, and merge with the same affordances as a git history, but where the merge operation is a precise structural construction rather than a three-way text merge.

Next

Cross-protocol translation

You will translate the user schema from one shape to another using an explicit field-by-field mapping, then verify the translation reverses cleanly. About twenty minutes.

By the end you will have: two user schemas with different vertex and edge names expressing the same model, a CompiledMigration connecting them, a record converted from one shape to the other, and the reverse-direction restoration verified.

Prerequisites

Completed Schema version control basics. The my-first-schema/ project with @panproto/core installed.

State of the art

True cross-protocol translation (a JSON Schema document automatically converted into a Protobuf .proto) requires both schemas to be expressible against a single composed theory that the migration generator can align them in. Today that means either:

  • both schemas in the same protocol with an explicit edge mapping (demonstrated here), or
  • both schemas in a custom composed theory authored via the theory DSL (see Composing protocols by colimit), or
  • a hand-authored lens in the lens DSL bridging the two (see Write a lens (DSL)).

The CLI’s schema lens subcommands currently resolve --protocol against the built-in atproto only; everything multi-protocol lives in the SDK. The auto-aligning Panproto.lens(from, to) works at the WASM lens level but returns raw WInstance graphs rather than JS-native records, so it is not the right tool for a tutorial. We drive the translation from Panproto.migration(from, to) instead: its liftJson/getJson/putJson wrappers handle JSON encoding/decoding for you.

Step 1: build the source schema

Create src/cross.ts:

import { Panproto } from '@panproto/core';

const p = await Panproto.init();
const atproto = p.protocol('atproto');

const source = atproto.schema()
  .vertex('user',       'object')
  .vertex('user.name',  'string')
  .vertex('user.email', 'string')
  .vertex('user.years', 'integer')
  .edge('user', 'user.name',  'prop', { name: 'name' })
  .edge('user', 'user.email', 'prop', { name: 'email' })
  .edge('user', 'user.years', 'prop', { name: 'years' })
  .build();

Step 2: build the target schema

A second atproto schema for the same user model with different field names (display_name instead of name, email_address instead of email):

const target = atproto.schema()
  .vertex('user',              'object')
  .vertex('user.display_name', 'string')
  .vertex('user.email_address','string')
  .vertex('user.years',        'integer')
  .edge('user', 'user.display_name',  'prop', { name: 'display_name' })
  .edge('user', 'user.email_address', 'prop', { name: 'email_address' })
  .edge('user', 'user.years',         'prop', { name: 'years' })
  .build();

The shapes differ structurally (different vertex ids, different edge names) but represent the same information.

Step 3: declare the migration

const mig = p.migration(source, target)
  .map('user',       'user')
  .map('user.name',  'user.display_name')
  .map('user.email', 'user.email_address')
  .map('user.years', 'user.years')
  .mapEdge(
    { src: 'user', tgt: 'user.name',  kind: 'prop', name: 'name' },
    { src: 'user', tgt: 'user.display_name', kind: 'prop', name: 'display_name' },
  )
  .mapEdge(
    { src: 'user', tgt: 'user.email', kind: 'prop', name: 'email' },
    { src: 'user', tgt: 'user.email_address', kind: 'prop', name: 'email_address' },
  )
  .mapEdge(
    { src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
    { src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
  )
  .compile();

.map(srcVertex, tgtVertex) aligns vertices; .mapEdge(srcEdge, tgtEdge) aligns edges. You need both: vertex-only mappings produce a migration that drops every field on lift. For larger schemas, Panproto.lens(from, to) and LensHandle.autoGenerate will infer many of these alignments via name similarity and structural priors, but the resulting LensHandle operates on opaque WInstance bytes rather than JS-native records.

Step 4: convert a record

const alice = { name: 'Alice', email: 'alice@example.com', years: 30 };
const converted = mig.liftJson(alice, 'user');
console.log('converted:', converted);

mig.liftJson(record, rootVertex) round-trips JSON through the migration: it parses the input against source, lifts it through the edge mapping, and emits the result in target’s shape as a plain JS object. You see:

{ display_name: 'Alice', email_address: 'alice@example.com', years: 30 }

Step 5: round-trip via get/put

const { view, complement } = mig.getJson(alice, 'user');
console.log('view:', view);
const back = mig.putJson(view, complement, 'user');
console.log('back:', back);

mig.getJson(record, rootVertex) returns { view, complement }. view is the target-shape projection (same as liftJson’s output); complement carries the encoding state the forward projection sets aside. mig.putJson(view, complement, rootVertex) reverses the translation, restoring the source-shape record:

view: { display_name: 'Alice', email_address: 'alice@example.com', years: 30 }
back: { email: 'alice@example.com', name: 'Alice', years: 30 }

The restored record’s field ordering is not preserved (object keys come out alphabetised), so structural equality, not JSON-string equality, is the right round-trip predicate.

Step 6: dispose

mig[Symbol.dispose]();

CompiledMigration holds a WASM-side resource. Call [Symbol.dispose]() explicitly or use a using declaration so it is released at scope exit:

using mig = p.migration(source, target). /* ... */ .compile();

What you built

Two schemas expressing the same model with different field names, an explicit field-by-field migration between them, a forward conversion of a record, and a verified reverse-direction restoration. The same pattern works for any pair of schemas in a single protocol; for genuinely cross-protocol pairs (e.g. JSON Schema ↔ Protobuf), express both against a composed theory authored via the theory DSL, or author the bridge in the lens DSL.

See also

Next

Where to go from here

You have walked through the four core flows of panproto: defining schemas, evolving them via migrations, version-controlling the history, and translating between protocols. From here:

  • The how-to guides cover specific workflows in depth (CI, lenses, format-preserving codecs).
  • The reference quadrant is the lookup for everything: CLI, SDKs, protocols, expression language, lens combinators, configuration.
  • The explanation quadrant is for understanding why the system is shaped the way it is.

How-to guides

How-to pages are recipes. Each one assumes you know what you are trying to do and need the steps to do it. Every page follows the same skeleton:

  1. Prerequisites. What must be installed and what state your project must be in.
  2. The task. The minimum sequence of commands or code to accomplish the goal.
  3. Verification. How to confirm the task succeeded.
  4. Common mistakes. Failure modes and how to recognise them.
  5. See also. Adjacent how-tos, the relevant reference page, and the explanation page that covers the underlying model.

If you want to understand why a step works, follow the link to the explanation quadrant. If you want a complete walk-through with no prior context, start with a tutorial.

Index

Setup

Schemas and migrations

Working with data

Version control

Translation and integration

CI

Install panproto

panproto ships in four distributions, one per surface. Pick the one that matches what you are building.

SurfacePagePackage
Command line (schema)Install the CLIpanproto-cli (Homebrew, shell installer, cargo install)
Rust applicationInstall the Rust SDKpanproto-core (crates.io)
TypeScript / JavaScript applicationInstall the TypeScript SDK@panproto/core (npm)
Python applicationInstall the Python SDKpanproto (PyPI)

The CLI and the language SDKs are independent: you can install any subset.

See also

Install the CLI

The CLI is a single binary called schema. It is the entry point for the panproto-cli crate.

Prerequisites

A POSIX shell on macOS or Linux, or PowerShell on Windows. No language toolchain is required; pre-built binaries are published with every release.

Install

Homebrew (macOS, Linux)

brew install panproto/tap/panproto-cli

Shell installer (macOS, Linux, WSL)

curl --proto '=https' -LsSf https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.sh | sh

PowerShell installer (Windows)

powershell -ExecutionPolicy ByPass -c "irm https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.ps1 | iex"

From source

cargo install panproto-cli

Requires a Rust toolchain (1.85 or newer).

Verification

schema --version

prints the installed version. The full subcommand list is at Reference: CLI, or schema --help.

Common mistakes

  • Installing through cargo install without an up-to-date toolchain. panproto requires Rust 1.85+.
  • Mixing the Homebrew install with a from-source install on the same machine: only one schema ends up first on PATH.

See also

Install the Rust SDK

Prerequisites

A Rust toolchain at edition 2024 (toolchain 1.85+).

Install

# Cargo.toml
[dependencies]
panproto-core = "0.49"

For specific feature flags (full-parse, project, git, llvm, jit, tree-sitter), see Reference: Rust SDK.

Verification

use panproto_core::protocols::atproto;
use panproto_core::schema::SchemaBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let proto = atproto::protocol();
    let schema = SchemaBuilder::new(&proto)
        .vertex("root", "record", Some("app.example.root"))?
        .entry("root")
        .build()?;
    println!("built {} vertex(es)", schema.vertices.len());
    Ok(())
}

cargo run builds and links against the panproto facade.

Common mistakes

  • Pinning a stale toolchain. panproto-core requires edition 2024.
  • Depending on lower-level crates (panproto-schema, panproto-mig, …) directly without a strong reason. The facade re-exports the canonical surface; reach past it only when you need an internal API.

See also

Install the TypeScript SDK

Prerequisites

Node 20 or newer. A package manager (npm, pnpm, or yarn). A bundler with WASM support if targeting the browser (Vite, Rollup, esbuild, webpack 5+).

Install

npm install @panproto/core
# or
pnpm add @panproto/core
# or
yarn add @panproto/core

Verification

import { Panproto } from '@panproto/core';

const p = await Panproto.init();
console.log(p.listProtocols());

Panproto.init() loads the WASM module and returns the top-level handle. Calling p.listProtocols() enumerates the built-in protocols and confirms the binding is wired up.

Common mistakes

  • Forgetting to await Panproto.init(). The WASM load is asynchronous; using the handle before init resolves throws.
  • Bundler that does not understand .wasm imports. Vite handles this out of the box; older webpack configurations may need a loader.
  • Running under Node 18 or earlier. WebAssembly bigint coercion under those versions is incomplete; some integer operations will throw.

See also

Install the Python SDK

Prerequisites

Python 3.13 or newer. A virtual environment is recommended.

Install

pip install panproto

The wheel includes native PyO3 bindings (no WASM) and eleven core tree-sitter grammars. Additional grammar packs are installed separately:

pip install panproto-grammars-functional   # Haskell, OCaml, Elm, Erlang, Elixir, ...
pip install panproto-grammars-web          # HTML, CSS, Vue, Svelte, ...
pip install panproto-grammars-all          # umbrella package

The full table of packs is in Reference: Python SDK.

Verification

import panproto

print(panproto.list_builtin_protocols()[:3])

The native module loads at import time (no async wrapper, unlike the TypeScript SDK). Listing a few of the built-in protocols confirms the linkage. The full top-level surface is in Reference: Python SDK.

Common mistakes

  • Running under Python < 3.13. The wheel uses 3.13-only typing constructs in its public API.
  • Installing the deprecated pure-Python WASM SDK in parallel. The native PyO3 wheel supersedes it; remove the older package.
  • Importing companion grammar packs before panproto. The packs auto-register on import; the order matters only when both packs and panproto are imported in the same module.

See also

Define a schema

A schema in panproto is a graph of vertices and edges, validated against a protocol’s GAT. You can build one from any of the four surfaces, with the same outcome.

SurfacePage
schema CLIFrom the CLI
TypeScript SDKFrom TypeScript
Python SDKFrom Python
Rust SDKFrom Rust

The CLI is the recommended starting point if you have an existing schema file (JSON Schema, Protobuf, ATProto Lexicon, …) you want to load and inspect. The language SDKs are the recommended starting point if you are building schemas programmatically inside an application.

See also

Define a schema from the CLI

Prerequisites

The schema binary installed (Install the CLI). A schema file in a supported protocol, or a protocol name to scaffold an empty one.

The task

Validate an existing schema

schema validate --protocol atproto path/to/schema.json

Exits zero on success. On failure, prints the failing equation or constraint with the offending vertex and edge.

Scaffold from an existing schema

schema scaffold --protocol atproto schemas/post.json

scaffold runs free-model construction over the given schema and prints sample term assignments to stdout. Use --json for machine-readable output, --depth and --max-terms to bound the search. (To start from nothing, write a one-vertex schema by hand and scaffold against that.)

Inspect

schema diff schemas/post-v1.json schemas/post-v2.json

diff reports vertex and edge changes between two schemas. Inside a panproto repository, schema show <ref> resolves a commit, schema, or migration object and prints its contents.

Verification

After validation, run:

schema verify --protocol atproto path/to/schema.json

verify checks that the schema satisfies every equation in the protocol’s theory, not just the constraints you wrote. A pass guarantees the schema is well-formed in the categorical sense.

Common mistakes

  • Forgetting --protocol. Many subcommands require an explicit protocol; auto-detection happens only at the project level when a panproto.toml is present.
  • Running schema validate when you mean schema check (the latter checks a migration, not a schema).

See also

Define a schema from TypeScript

Prerequisites

@panproto/core installed (Install the TypeScript SDK).

The task

import { Panproto } from '@panproto/core';

const p = await Panproto.init();
const proto = p.protocol('atproto');

const schema = proto.schema()
  .vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
  .vertex('post:body', 'object')
  .vertex('post:body.text', 'string')
  .edge('post', 'post:body', 'record-schema')
  .edge('post:body', 'post:body.text', 'prop', { name: 'text' })
  .build();

p.protocol(name) loads the named protocol’s theory. proto.schema() returns a SchemaBuilder. Each .vertex() call adds a vertex (record kind) with optional constraints; each .edge() call adds an edge (field, item, or variant) between vertices. .build() validates the constructed schema against the protocol’s theory and returns a Schema handle.

Verification

const result = schema.validate(proto);
if (!result.isValid) throw new Error(JSON.stringify(result.issues));

validate(protocol) returns a ValidationResult containing any issues. An empty issue list confirms the schema satisfies the protocol’s edge rules and obj-kinds.

Common mistakes

  • Calling .build() before all required edges are added. The validation runs eagerly; missing required structure raises immediately.
  • Re-using a builder after .build(). Builders are single-shot; create a new one for each schema.
  • Treating the returned Schema handle as a plain object. It is an opaque handle into the WASM heap; pass it to subsequent SDK calls, do not introspect it directly.

See also

Define a schema from Python

Prerequisites

panproto installed (Install the Python SDK).

The task

import panproto

proto = panproto.get_builtin_protocol("atproto")

b = proto.schema()
b.vertex("user", "object", "app.example.user")
b.vertex("user:name", "string")
b.vertex("user:age", "integer")
b.edge("user", "user:name", "prop", "name")
b.edge("user", "user:age", "prop", "age")
schema = b.build()

panproto.get_builtin_protocol(name) returns the named protocol; .vertex(id, kind, nsid=None) and .edge(src, tgt, kind, name=None) each mutate the SchemaBuilder in place (returning None), and .build() validates and returns a Schema. The TypeScript SDK exposes the same operations as a chainable surface; the Python binding does not.

Call panproto.list_builtin_protocols() to see every registered protocol name (atproto, openapi, cddl, geojson, …).

Verification

issues = schema.validate(proto)
assert not issues, issues

Schema.validate(protocol) returns a list of validation messages; an empty list means the schema is well-formed.

Common mistakes

  • Chaining the builder calls. The Python SchemaBuilder.vertex(...) / edge(...) / constraint(...) methods mutate in place and return None; hold the builder in a variable and mutate it statement-by-statement, then call .build().
  • Using a Python dict where the SDK expects a Schema handle. Conversion is deliberate; to materialise an Instance from bytes against a built Schema, use panproto.IoRegistry().parse(protocol, schema, data).

See also

Define a schema from Rust

Prerequisites

panproto-core in your Cargo.toml (Install the Rust SDK).

The task

use panproto_core::protocols::atproto;
use panproto_core::schema::SchemaBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let proto = atproto::protocol();

    let schema = SchemaBuilder::new(&proto)
        .vertex("user", "object", Some("app.example.user"))?
        .vertex("user:name", "string", None)?
        .vertex("user:age", "integer", None)?
        .edge("user", "user:name", "prop", Some("name"))?
        .edge("user", "user:age", "prop", Some("age"))?
        .entry("user")
        .build()?;

    println!("{} vertices, {} edges", schema.vertices.len(), schema.edges.len());
    Ok(())
}

SchemaBuilder::new(&protocol) constructs the builder; each vertex and edge call validates against the protocol’s edge rules and obj-kind list. entry declares basepoint vertices for instance rooting. build returns a Schema (or a SchemaError).

Verification

use panproto_core::schema::{SchemaBuilder, validate};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = panproto_core::protocols::atproto::protocol();
let schema = SchemaBuilder::new(&proto)
    .vertex("user", "object", Some("app.example.user"))?
    .entry("user")
    .build()?;
let errors = validate(&schema, &proto);
assert!(errors.is_empty(), "validation errors: {errors:?}");
Ok(()) }

validate returns a Vec<ValidationError>. The error carries the failing equation and the offending vertex or edge as structured fields.

Common mistakes

  • Reaching past panproto-core to lower-level crates without a reason. The facade re-exports the canonical surface; do not depend on panproto-schema directly unless you need an internal API.
  • Holding Schema across an await. Handles are not Send by default; clone them or restructure the call.

See also

Build a migration

A migration is a structured map between two schemas plus, optionally, value-level transforms applied during lift. This page covers building one from the CLI and from the SDKs.

Prerequisites

Two schemas in the same protocol (or compatible protocols). The schema CLI installed, or one of the language SDKs.

The task

From the CLI

Given schemas/v1.json and schemas/v2.json, plus a mapping file migrations/v1-to-v2.json:

schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migrations/v1-to-v2.json

check runs the existence check: which fields in v2 require which fields in v1, and is every required input present. Exits zero if the migration is well-defined.

To also type-check at the GAT level (equivalent to a separate schema typecheck):

schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migrations/v1-to-v2.json --typecheck

For schema-level diff classification, generate a lens between the two schemas:

schema lens generate --protocol atproto schemas/v1.json schemas/v2.json --save lens.json
schema diff schemas/v1.json schemas/v2.json --lens --save lens.json

To migrate data, use the VCS-driven path: commit v1 and v2 to a panproto repository, then run schema data migrate <data-dir> against the working tree (see Schema VCS data versioning).

From the SDKs

const mig = p
  .migration(srcSchema, tgtSchema)
  .map('user', 'user')
  .compile();

const { data: forward } = mig.lift(oldRecord);   // forward
const { view, complement } = mig.get(oldRecord); // forward, retaining complement
mig.put(view, complement);                       // backward

p.checkExistence(src, tgt, builder) runs the same existence check as the CLI. Python and Rust SDKs use the same shape with language-idiomatic naming.

Verification

schema check exits zero if the migration is well-defined (existence conditions hold). For diff classification, use panproto.diff_and_classify(old, new, protocol) in Python, or panproto_check::diff(old, new) followed by panproto_check::classify(&diff, &protocol) in Rust. In TypeScript the equivalent is Panproto.diffFull(old, new).classify(protocol), which returns a CompatReport with one of:

ClassificationMeaning
fully-compatibleOld data lifts unchanged; the migration is a refinement.
backward-compatibleOld data lifts via a value-level transform.
breakingSome old records cannot be lifted; CI should reject.

For wiring this into CI, see Breaking-change gate.

Common mistakes

  • Skipping --typecheck for non-trivial migrations. Existence checking does not catch GAT-level type errors; the --typecheck flag does.
  • Treating a breaking classification as a warning. CI should reject by default; merging a breaking migration without an explicit acknowledgement is the most common cause of data corruption in production.
  • Lifting data before the check passes. Lift can produce invalid output if the migration is not well-defined.

See also

Apply field transforms

A field transform is a value-level expression applied during migration: a way to compute the new field’s value from the old data. Transforms are written in the expression language.

Prerequisites

A migration mapping between two schemas. The expression language reference for available builtins.

The task

Inline a transform in a mapping file

The mapping JSON consumed by schema check is a serialized Migration. Field-level value transforms attach as expression resolvers indexed by the (src_vertex, tgt_vertex) pair they bridge:

{
  "vertex_map": {
    "user": "user",
    "user:first": "user:given_name"
  },
  "edge_map": [],
  "hyper_edge_map": {},
  "label_map": [],
  "resolver": [],
  "hyper_resolver": [],
  "expr_resolvers": [
    [["user:full_name", "user:full_name"],
     "\\r -> concat(r.first, \" \", r.last)"]
  ]
}

Each expr_resolvers entry is [[src_vertex, tgt_vertex], expression]. The expression is parsed by panproto-expr-parser and applied during lift (old to new). Backward direction is supplied by the lens/protolens layer rather than the mapping file: pair an ApplyExpr field transform with its inverse on the corresponding Protolens step, or annotate a coercion on the schema and let the migration compiler emit both directions.

From the SDKs

The TypeScript and Python SDKs do not yet expose per-field transforms on the MigrationBuilder. To compose a migration with a value-level rewrite, build a ProtolensChain directly (combinators::rename_field, elementary::apply_expr, …) and call compile_migration on it:

import panproto

# `panproto.rename_field`, `add_field`, `remove_field`, `hoist_field`,
# and `pipeline` each return a `ProtolensChain`. Serialize and apply via
# `chain.to_json()` and the lens APIs.
chain = panproto.rename_field("user", "full_name", "full_name", "name")
print(chain.to_json())

To attach a value-level expression resolver to a Migration between two specific schemas, write the mapping JSON by hand (see above) and call panproto.compile_migration(migration, src_schema, tgt_schema).

Verification

schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migration.json --typecheck

check --typecheck ensures the transforms type-check against the source and target schemas. Property tests in CI then verify the lens laws on sampled data.

Common mistakes

  • Expecting expr_resolvers to supply both directions. The mapping carries forward-only expressions; the backward leg comes from the lens/protolens layer (pair an ApplyExpr field transform with its inverse on the corresponding protolens step, or annotate a coercion on the schema and let the migration compiler emit both directions).
  • Using IO or random functions in the expression. The language is bounded-pure; non-deterministic builtins are not exposed.
  • Letting the budget exceed. Long string operations on large records can hit the step budget. Expressions that hit the budget raise ExprError::StepLimitExceeded at runtime.

See also

Use lenses

Every migration is a lens. The lens API gives you direct access to the bidirectional transform: get lifts data forward, put projects new data back to the old shape, complement records what get discarded so put can restore it.

Prerequisites

A migration (Build a migration) or a hand-written lens via the lens DSL.

The task

A CompiledMigration is itself a lens; reach for LensHandle only when you want a free-standing protolens chain.

const { view, complement } = mig.get(oldRecord);

const editedView = { ...view, age: view.age + 1 };
const { data: updatedOld } = mig.put(editedView, complement);

mig.get returns the forward view together with the complement (the data discarded by get); mig.put consumes them and returns a LiftResult { data, ... } reconstructed with the edit applied. The round-trip laws guarantee this is well-defined.

To compose two compiled migrations sequentially:

const composed = p.compose(mig_ab, mig_bc);

To compose two free-standing protolens chains:

const composedChain = p.composeLenses(chainAB, chainBC);

Both are methods on Panproto; composition fails (throws) if the intermediate schemas do not chain.

Verification

const result = lens.checkLaws(instanceBytes);
console.log(result.holds, result.violation);

// For individual laws:
const getput = lens.checkGetPut(instanceBytes);
const putget = lens.checkPutGet(instanceBytes);

LensHandle.checkLaws(instance) returns a LawCheckResult { holds, violation } covering GetPut and PutGet together. checkGetPut and checkPutGet test each law individually; the Rust property tests in panproto-lens cover PutPut as well, exercised continuously in CI.

Common mistakes

  • Calling put with a complement from a different source. Complement::compose will refuse with ComplementFingerprintMismatch. Recompute the complement from the current source rather than reusing one.
  • Reading get and then mutating the source before calling put. The complement is computed against the source as it was at get time; if you mutate the source, the complement is stale.
  • Composing lenses whose intermediate schemas are isomorphic but not equal. The structural-equality check on protolens_composable will reject; rebuild one of the lenses against the other’s schema.

See also

Use protolenses

A protolens is a schema-parameterised lens family: one declaration that produces a lens for any schema satisfying a precondition. Use protolenses when you want to apply the same transform across a fleet of related schemas without writing one lens per schema.

Prerequisites

The Rust SDK (panproto-lens::protolens) or the lens DSL with parametric declarations enabled.

The task

Declare

A Protolens packages a precondition (a TheoryConstraint on the source theory) with a TheoryTransform. Build elementary protolenses via the elementary helpers, or compose them into a ProtolensChain:

use panproto_lens::protolens::{ProtolensChain, combinators};

fn main() {
let rename_legacy_id: ProtolensChain = combinators::rename_field(
    "user", "user:legacy_id", "legacy_id", "id",
);
let _ = rename_legacy_id;
}

The chain captures a precondition on the source theory and a sequence of transforms. It does not yet know which concrete schema it will run against.

Apply (fused)

use panproto_lens::protolens::{ProtolensChain, combinators};
use panproto_core::schema::{Protocol, Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let rename_legacy_id: ProtolensChain = combinators::rename_field("user", "user:legacy_id", "legacy_id", "id");
let protocol: Protocol = panproto_core::protocols::atproto::protocol();
let user_schema: Schema = SchemaBuilder::new(&protocol).vertex("user", "object", None)?.entry("user").build()?;
let post_schema: Schema = user_schema.clone();
let lens_for_users = rename_legacy_id.instantiate(&user_schema, &protocol)?;
let lens_for_posts = rename_legacy_id.instantiate(&post_schema, &protocol)?;
let _ = (lens_for_users, lens_for_posts);
Ok(()) }

Each call produces a concrete Lens against a specific schema, with the migration metadata preserved as a single fused morphism.

Apply (sequential)

use panproto_lens::protolens::{ProtolensChain, combinators};
use panproto_core::schema::{Protocol, Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let rename_legacy_id: ProtolensChain = combinators::rename_field("user", "user:legacy_id", "legacy_id", "id");
let protocol: Protocol = panproto_core::protocols::atproto::protocol();
let base_schema: Schema = SchemaBuilder::new(&protocol).vertex("user", "object", None)?.entry("user").build()?;
let stepwise = rename_legacy_id.instantiate_sequential(&base_schema, &protocol)?;
let _ = stepwise;
Ok(()) }

Sequential instantiation yields one lens per step; useful in property tests when each intermediate state needs to be inspected.

Compose

use panproto_lens::protolens::{Protolens, vertical_compose, elementary};

fn main() -> Result<(), Box<dyn std::error::Error>> {
let first:  Protolens = elementary::drop_sort("old_field");
let second: Protolens = elementary::drop_sort("legacy_id");
let composed = vertical_compose(&first, &second)?;
let _ = composed;
Ok(()) }

vertical_compose requires the target endofunctor of first to structurally match the source endofunctor of second. A mismatch returns LensError.

Verification

After instantiate returns a Lens, exercise the round-trip laws on representative data via Lens::get / Lens::put (or use the higher-level lens-law harness in panproto_lens::laws). Property tests in crates/panproto-lens/tests/ are the canonical examples.

Common mistakes

  • Composing protolenses whose intermediate schemas only happen to look the same. protolens_composable enforces structural equality, not name equality; build one against the other to be safe.
  • Reaching for sequential instantiation in production. Use fused (instantiate) by default; sequential exists for inspection and tests.
  • Treating preconditions as pure documentation. The precondition is checked at instantiation time; a schema that does not satisfy it raises LensError::ProtolensError with a message listing the unmet constraints (use Protolens::check_applicability first if you want to surface the reasons separately).

See also

Use dependent optics

A dependent optic is an optic-kind chosen by the schema edge it is applied at: a prop edge yields a Lens, an item edge yields a Traversal, a variant edge yields a Prism. Dependent optics let you write transforms that work uniformly across all three edge kinds, with the optic kind unified at instantiation.

Prerequisites

The Rust SDK. The panproto-lens crate (re-exported from panproto-core::lens).

The task

Apply a scoped transform

#![allow(unused)]
fn main() {
use panproto_lens::protolens::elementary;
use panproto_lens::optic::OpticKind;

// An elementary step is itself a Protolens whose optic kind is derived
// from the edge kind at the focus.
let inner = elementary::rename_edge_name("post", "tags", "tags", "labels");
let scoped = elementary::scoped("post:tags", inner);

match scoped.optic_kind() {
    OpticKind::Lens      => { /* prop edge */ }
    OpticKind::Traversal => { /* item edge */ }
    OpticKind::Prism     => { /* variant edge */ }
    other => panic!("unexpected optic kind: {other:?}"),
}
}

elementary::scoped wraps an inner protolens at the given focus vertex. At the instance level the optic kind follows from the edge kind connecting the parent to the focus vertex: prop -> Lens, item -> Traversal, variant -> Prism.

Field-level combinators

The panproto_lens::protolens::combinators module exposes higher-level chains assembled from elementary steps; for instance combinators::rename_field(parent, field, old_name, new_name) returns a ProtolensChain that renames a JSON property key. To traverse an item edge and apply an inner transform per element, use elementary::scoped (or the combinators::map_items helper) targeting the array element’s vertex; the result is a Traversal under the optic-kind classifier.

Verification

The lens laws apply uniformly across all three optic kinds. Instantiate the protolens into a Lens against a concrete schema, then call panproto_lens::laws::check_laws(&lens, &instance) (or check_get_put / check_put_get individually) on representative data; each returns Result<(), LawViolation>.

Common mistakes

  • Hand-coding the optic kind. The point of dependent optics is that the kind follows from the edge; if you find yourself branching on it manually, read it off Protolens::optic_kind() (or ProtolensChain::composed_optic_kind() for chains) and let the schema decide.
  • Applying a prop-style combinator at an item edge. The carrier optic at an item edge is a Traversal, so refine_scoped_optic composes the inner Lens with Traversal to yield Traversal: the round-trip laws still hold, but the result is a multi-focus traversal rather than a single-focus lens, which is rarely what the call site meant.

See also

Write lenses in the lens DSL

The lens DSL is a declarative way to specify a lens between two schemas. Specs are written in Nickel, JSON, or YAML and compile to the lens combinator algebra.

Prerequisites

A pair of schemas to bridge. The schema CLI or panproto-lens-dsl crate.

The task

Write the spec (Nickel)

# lenses/user-v1-to-v2.ncl
{
  id = "user.v1-to-v2",
  description = "Rename name fields and add display_name",
  steps = [
    { rename_field = { from = "first_name", to = "given_name" } },
    { rename_field = { from = "last_name",  to = "family_name" } },
    { add_field = {
        name = "display_name",
        default = "",
        expr = "Concat(record.given_name, \" \", record.family_name)",
      } },
  ],
}

Each step is a single-key object. The key picks the variant; the value carries its parameters. The DSL applies steps left-to-right against the source schema, producing a target schema and a CompiledLens between them. Full step grammar: crates/panproto-lens-dsl/src/document.rs.

Compile a chain

schema lens does not consume .ncl/.json/.yaml lens documents directly through the CLI. The DSL is compiled by the panproto-lens-dsl library; the CLI works on the compiled ProtolensChain it produces. To go from a pair of schemas to a chain via the CLI, use:

schema lens generate \
  --protocol atproto \
  schemas/user-v1.json \
  schemas/user-v2.json \
  --save lenses/user-v1-to-v2.json

This auto-derives a chain. To author a chain by hand, write the lens DSL and call panproto_lens_dsl::load and panproto_lens_dsl::compile from Rust (or the equivalent SDK calls).

Apply

schema lens apply --protocol atproto lenses/user-v1-to-v2.json data/users.json

Inspect

schema lens inspect --protocol atproto lenses/user-v1-to-v2.json

Prints the combinator chain. --protocol is required.

Compile from Python

ProtolensChain exposes loaders that consume a lens-DSL document and compile it directly to a chain anchored at the named body vertex of the source schema:

import panproto

chain = panproto.ProtolensChain.from_dsl_path(
    "lenses/user-v1-to-v2.ncl",
    body_vertex="record:body",
)

# Or from a string:
chain = panproto.ProtolensChain.from_dsl_json(json_source, "record:body")
chain = panproto.ProtolensChain.from_dsl_yaml(yaml_source, "record:body")
chain = panproto.ProtolensChain.from_dsl_nickel(nickel_source, "record:body")

from_dsl_path dispatches on file extension (.ncl / .json / .yaml / .yml). For Nickel, an optional import_paths argument extends the import-resolution lookup so user-defined modules can be referenced.

Verification

# Check the chain is applicable to every schema in a directory.
schema lens check --protocol atproto lenses/user-v1-to-v2.json schemas/

# Verify the lens laws on test data.
schema lens verify --protocol atproto data/users.json schemas/user-v2.json

lens check reports applicability without instantiating; lens verify checks the round-trip laws on actual data.

Common mistakes

  • Step ordering. The DSL deliberately exposes ordering; some sequences commute and produce the same result, others do not. When in doubt, run schema lens check with high sample counts.
  • Forgetting backward on a field_transform step. Without it, the step is half a lens; compilation rejects.
  • Writing schemas inline. The DSL expects path references; embedding a schema as a literal works for small examples but loses VCS tracking.

See also

Query instances

A panproto instance is a graph of records. Queries filter and project that graph using the expression language.

Prerequisites

A schema and an instance loaded against it. The expression-language reference for predicates.

The task

Filter and project

executeQuery is exported from @panproto/core. Its signature is executeQuery(query, instance, wasm) (query first, instance second, WASM module third). The query keys are anchor, predicate (an Expr object, not a string), projection (string array), groupBy, limit, and path.

import { executeQuery, parseExpr, ExprBuilder } from '@panproto/core';

const recent = executeQuery(
  {
    anchor: 'post',
    predicate: parseExpr('post.created_at > "2024-01-01"'),
    projection: ['id', 'title'],
  },
  instance,
  panproto._wasm,
);

The predicate is evaluated against each matched node; projection selects the fields included in each QueryMatch.

Following edges

Use the path field to traverse from the anchor before predicate matching:

const userPosts = executeQuery(
  { anchor: 'user', path: ['authored'], projection: ['title'] },
  instance,
  panproto._wasm,
);

To filter on edge presence, build a predicate with the instance-aware builtins (Edge, Children, HasEdge, EdgeCount) using ExprBuilder or parseExpr.

Verification

Predicates are type-checked against the schema before evaluation; an ill-typed predicate raises before any data is touched.

Common mistakes

  • Using Edge, Children, etc. in the standard evaluator. They return Null outside an instance environment. The query API sets the environment correctly; reaching for the bare evaluator does not.
  • Hitting the step budget on large records. Heavy string operations on long fields will exceed the budget; either narrow the predicate or raise the budget at the call site.

See also

Convert data between formats

panproto’s parse/migrate/emit pipeline converts data between any two protocols whose theories overlap. The intermediate is a panproto-inst instance graph; you do not need to write a transcoder per format pair.

Prerequisites

The schema CLI installed, or the SDK in your language. The source and target protocol names.

The task

Single file or directory (within one protocol)

schema data convert --protocol atproto \
  --from schemas/user-v1.json --to schemas/user-v2.json \
  data/users.json -o data/users-v2.json

<DATA> is a positional file or directory. --from and --to are schema paths (within the named protocol), not protocol names. Add --direction backward to push data the other way along the lens; add --defaults k=v,... to supply complement defaults.

For batch conversion, point <DATA> at a directory.

Across protocols

Cross-protocol conversion goes through a composed theory; see Translate across protocols. schema data convert is intra-protocol only.

From the SDKs

const out = await p.convert(jsonBytes, {
  from: srcSchema,
  to: tgtSchema,
});

Panproto.convert(data, opts) auto-generates a lens between the two BuiltSchema arguments, applies it forward, and returns the converted record. Pass defaults in opts to supply complement defaults for fields the source does not determine.

Verification

To verify round-trip fidelity, generate a chain explicitly and run schema lens verify:

schema lens generate --protocol atproto schemas/user-v1.json schemas/user-v2.json --save chain.json
schema lens verify --protocol atproto data/users.json schemas/user-v2.json

Lens verification on test data exercises the three round-trip laws (GetPut, PutGet, PutPut); a pass means the chain is loss-free for the sampled records.

Common mistakes

  • Assuming all schema pairs are loss-free. They are not. Run schema lens verify after conversion to exercise the round-trip laws on representative data, and run schema check --src --tgt --mapping ... --typecheck to classify the diff as fully compatible, backward compatible, or breaking.
  • Skipping lens verification in CI. Without it, silent loss is possible.

See also

Round-trip with format preservation

When you parse a JSON, YAML, TOML, XML, or CSV file and emit it back without changes, panproto can guarantee emit(parse(bytes)) == bytes byte-for-byte. This requires the format-preserving codec, which uses tree-sitter grammars and a CST complement to capture whitespace, comments, and ordering.

This page covers the structured-data path (JSON / YAML / TOML / XML / CSV). For source-code grammars (Rust, Python, Stan, BUGS, and 257 more), the emit path is emit_pretty, which derives spacing and dispatch from grammar.json structurally and exercises a per-protocol verification tier. See Parse full ASTs and Source-code emission. The two systems are independent and serve different formats; choose one based on what you are parsing.

Prerequisites

A panproto build with the tree-sitter feature flag enabled on panproto-core (or directly on panproto-io). The CLI ships with tree-sitter enabled by default.

The task

The format-preserving round-trip is exposed via schema parse emit, which parses a file and emits it back in one step:

schema parse emit config.yaml > config.roundtripped.yaml
diff config.yaml config.roundtripped.yaml

The diff is empty when the codec preserves the input. (For programmatic use, parse and emit are exposed separately by the SDK; see below.)

In Rust:

use panproto_core::io::unified_codec::UnifiedCodec;
use panproto_core::schema::{Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = panproto_core::protocols::atproto::protocol();
let schema: Schema = SchemaBuilder::new(&proto).vertex("root", "object", None)?.entry("root").build()?;
let bytes: &[u8] = b"";
let codec = UnifiedCodec::yaml("atproto")?;
let (instance, complement) = codec.parse_wtype_preserving(&schema, bytes)?;
let out = codec.emit_wtype_preserving(&schema, &instance, &complement)?;
assert_eq!(out, bytes);
Ok(()) }

The complement carries the CST data that the schema does not see. emit_wtype_preserving reconstructs the byte-for-byte original from (instance, complement). Constructors exist for each supported format: UnifiedCodec::json, xml, yaml, toml, csv, and tsv.

Verification

The byte equality is the verification. Property tests in CI check emit(parse(b)) == b against a corpus of real-world JSON, YAML, TOML, XML, and CSV files.

Common mistakes

  • Modifying the instance without modifying the complement. If you edit a value in the instance, the complement still records the old whitespace around the old value; the round-trip will preserve the old layout around the new value.
  • Mixing format-preserving codec output with non-preserving codec input. The two pipelines are separate; choose one consistently.

See also

Parse full ASTs

panproto can parse source code in 261 languages via tree-sitter and treat the full AST as a schema instance. The resulting instance can be queried, diffed, migrated, and version-controlled like any other schema.

Prerequisites

The schema CLI, or the Rust SDK with the full-parse feature enabled. For the Python SDK, the relevant grammar pack (the wheel ships eleven core languages; install panproto-grammars-functional, -web, -systems, etc. for more).

The task

Single file

schema parse file src/main.rs > main.ast.json

schema parse file <PATH> writes the AST as a JSON instance against an auto-derived GAT theory for the language to stdout. Redirect to a file as needed.

Whole project

schema parse project . > project.ast.json

Walks every recognised file in the project (default .), parses each with the appropriate grammar, and writes a single instance covering the whole project to stdout.

Round-trip a single file

schema parse emit src/main.rs

Parses then emits, useful for confirming a clean round-trip through the format-preserving codec.

From Rust

use panproto_core::parse::ParserRegistry;

fn main() -> Result<(), Box<dyn std::error::Error>> {
let registry = ParserRegistry::new();
let schema = registry.parse_with_protocol(
    "rust",
    std::fs::read("src/main.rs")?.as_slice(),
    "src/main.rs",
)?;
let _ = schema;
Ok(()) }

panproto_core::parse is the re-export of panproto-parse. ParserRegistry::new() populates with every grammar enabled at build time; for a specific file path, registry.parse_file(path, content) auto-detects the language by extension.

From Python

import panproto

reg = panproto.AstParserRegistry()
schema = reg.parse_with_protocol("python", source_bytes, "src/app.py")

Companion grammar packs install additional languages: pip install panproto-grammars-functional, -web, -systems, etc.

Read anonymous-token field values

A tree-sitter rule of the form field('<name>', choice('+', '-', '*', '/')) attaches a field name to an unnamed token alternative. The walker captures the matched token’s text as a field:<name> constraint on the parent vertex; Schema::field_text is the supported accessor:

schema = reg.parse_with_protocol("qvr", b"let y = log(x)", "demo.qvr")
let_call = next(v.id for v in schema.vertices if v.kind == "let_call")
schema.field_text(let_call, "func")   # -> "log"

The Rust equivalent is Schema::field_text(vertex_id, name) -> Option<&str>. Named-node field children continue to surface as edges; this accessor is specifically for the anonymous-token field case.

Override a registered grammar at runtime

Grammar authors iterating on a grammar’s parser.c / grammar.json / node-types.json outside the panproto release cadence can swap in a freshly-compiled grammar mid-process. Compile the grammar via tree-sitter build, load the resulting shared library with ctypes, and pass the integer address of the tree_sitter_<name> symbol to override_grammar:

import ctypes
import panproto

lib = ctypes.CDLL("./build/qvr.dylib")
language_ptr = ctypes.cast(lib.tree_sitter_qvr, ctypes.c_void_p).value

reg = panproto.AstParserRegistry()
reg.override_grammar(
    name="qvr",
    extensions=["qvr"],
    language_ptr=language_ptr,
    node_types=open("./grammars/qvr/src/node-types.json", "rb").read(),
    grammar_json=open("./grammars/qvr/src/grammar.json", "rb").read(),
)
schema = reg.parse_with_protocol("qvr", source_bytes, "demo.qvr")  # uses the new grammar

If a parser is already registered under name, it is dropped first (along with any extension mappings). Cannot run while a ParseEmitLens produced by reg.lens(...) is alive: drop outstanding lens handles, or construct a fresh registry, first. The byte payloads are leaked into 'static storage on the Rust side (intended for dev-time work, not production).

Going the other way

The schema you get back from parse_with_protocol carries a complete layout fibre: byte spans, the whitespace between every pair of adjacent tokens, and discriminators recording which CHOICE alternative the parser took at each branch point. The emitter consumes those constraints to render bytes back. A schema you build by hand from SchemaBuilder carries none of them; emit_pretty_with_protocol falls back to a grammar walk driven by the structural acceptance predicate, a layered cassette system, and per-position interstitial scoring (see Source-code emission for the mechanics). The grammar walk produces structurally valid output for the verified set and best-effort output for the rest.

For generators that build a schema from scratch and want to render it to source bytes, see Decorate an abstract schema. The decorate operation takes an AbstractSchema (the hand-built half), attaches the layout fibre via a grammar walk, and returns a DecoratedSchema the emitter can render byte-for-byte.

Verifying the emitter for a protocol

Before relying on emit_pretty_with_protocol in a downstream pipeline, ask the registry which tier the protocol falls into:

use panproto_parse::{EmitVerificationStatus, ParserRegistry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
let reg = ParserRegistry::new();
match reg.emit_verification_status("python") {
    EmitVerificationStatus::Verified => { /* round-trips its full corpus */ }
    EmitVerificationStatus::Generic  => { /* registered, but unverified */ }
    EmitVerificationStatus::Unsupported => { /* not registered */ }
}
Ok(()) }

The 255 protocols currently in the Verified set are listed in crates/panproto-parse/src/registry.rs under VERIFIED_EMIT_PROTOCOLS. A protocol earns the tier by round-tripping its grammar author’s full test/corpus/ under the strict emit_corpus_audit oracle (emit(parse(emit(s))) == emit(s) plus vertex-kind and edge-shape multiset preservation on every entry), or by being pinned to a quivers transpile backend test.

Verification

Tree-sitter parsing is total: every byte sequence parses into some AST, with error nodes inserted around unparseable spans. The verified guarantee is a round-trip up to the vertex-kind and edge-shape multiset: emit(parse(bytes)) re-parses to the same abstract syntax tree, which check_parse_emit in panproto_parse::parse_emit_lens asserts. Interstitial preservation makes that emit reproduce the original whitespace byte-for-byte for most inputs, but some legitimately reformat (json re-indents arrays), so byte equality is not promised universally. schema parse emit <file> is the smoke test.

Common mistakes

  • Treating the AST as the source of truth for non-syntactic information. Type information, name resolution, control flow are not modelled by the auto-derived theories.
  • Assuming language coverage. The 261-language list is in crates/panproto-grammars/. Languages not in the list have no parser.

See also

Decorate an abstract schema

When you parse source code into a schema, the parse walker attaches layout data to every vertex: byte spans, the whitespace between adjacent tokens, and a discriminator recording which CHOICE alternative tree-sitter took. The emitter consumes those constraints to render source bytes back. A schema you build by hand from SchemaBuilder has none of them, so emit_pretty_with_protocol falls back to a grammar walk that may pick the wrong alternative or render nothing at all.

decorate fills the gap. Given an abstract schema (vertex kinds, child_of edges, leaf literal-value constraints) and a registered grammar, it attaches the full layout fibre and returns a DecoratedSchema the emitter can render.

Prerequisites

The Rust SDK with the full-parse feature, or the CLI. Python bindings are forthcoming.

The task

Build the abstract schema

use panproto_core::schema::{Protocol, SchemaBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p)
    .vertex("$0", "record", None)?
    .vertex("$1", "object", None)?
    .edge("$0", "$1", "record-schema", None)?
    .vertex("$2", "string", None)?
    .edge("$1", "$2", "prop", Some("title"))?
    .constraint("$2", "literal-value", "hello")
    .build_abstract()?;
Ok(()) }

build_abstract checks that no layout-fibre constraint was added during construction (no start-byte, no interstitial-N, no chose-alt-*) and returns an AbstractSchema. If a layout sort slipped in, you get SchemaError::LayoutConstraintsOnAbstractBuild; use build_decorated if a decorated schema was the intent.

Decorate

use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let decorated = reg.decorate("lilypond", &abstract_schema, &policy)?;
let _ = decorated;
Ok(()) }

decorate runs emit_pretty_with_policy to render the abstract schema to canonical bytes under the policy, then re-parses those bytes. The re-parse attaches the complete layout fibre: every start-byte, every end-byte, every interstitial-N, plus the chose-alt-fingerprint and chose-alt-child-kinds discriminators that pin down which CHOICE alternative the parser took.

Render straight to bytes

If all you want is the rendered source, skip the re-parse:

use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let bytes = reg.pretty_with_protocol("lilypond", &abstract_schema, &policy)?;
let _ = bytes;
Ok(()) }

pretty_with_protocol honours every field of the policy in the output: separator, newline, indent_width, line_break_after, and the indent open/close token sets. Two different policies render the same abstract schema to different bytes.

Customise the policy

use panproto_core::parse::LayoutPolicy;
use panproto_core::parse::ParserRegistry;
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy {
    indent_width: 4,
    separator: "  ".into(),
    newline: "\r\n".into(),
    ..LayoutPolicy::default()
};
let bytes = reg.pretty_with_protocol("lilypond", &abstract_schema, &policy)?;
let _ = bytes;
Ok(()) }

LayoutPolicy is an alias for the de-novo emitter’s FormatPolicy; the put direction of the parse / emit lens and the emitter use the same configuration type.

Strip layout back down

use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let decorated = reg.decorate("lilypond", &abstract_schema, &policy)?;
let stripped = decorated.forget_layout();   // -> AbstractSchema
let _ = stripped;
Ok(()) }

forget_layout is the schema-level forgetful U: it drops every constraint whose sort is in the layout fibre (per panproto_gat::is_layout_sort) and returns an AbstractSchema. The section law forget_layout(decorate(a)) ≅ a holds up to vertex-id renaming and kind/edge-multiset equivalence, which is the granularity panproto’s round-trip law machinery already uses.

Verification

The section-law smoke test in crates/panproto-parse/tests/decorate_section_law.rs walks a sample source through parse → forget_layout → decorate → forget_layout for every grammar with a parse fixture and asserts that both the vertex-kind multiset and the edge-shape multiset match before and after. Run it with:

cargo test -p panproto-parse --test decorate_section_law \
    --features lang-json,lang-lilypond

The matching test for policy fidelity (pretty_with_protocol_honours_policy) renders the same abstract schema under two distinct policies and asserts the output bytes differ in exactly the way the policy prescribes (CRLF vs LF newline, two-space vs single-space separator, four-space vs zero indent).

Common mistakes

  • Wrapping a parsed schema as an AbstractSchema and expecting decorate to keep its vertex IDs. The parse walker invents fresh IDs; the section law holds up to multiset equivalence, not pointwise. If you need the parse-side IDs preserved, work with the DecoratedSchema directly.
  • Passing an AbstractSchema built against one protocol into decorate for a different protocol. The protocol-match guard rejects the call with ParseError::SchemaConstruction; build the schema against the right protocol or look up the parser by the schema’s own protocol().
  • Reaching for DecoratedSchema::wrap_unchecked on a hand-built schema and expecting emit_pretty_with_protocol to round-trip through byte-position arithmetic. The wrapping is a type-level assertion the constructor cannot verify; an empty layout fibre means the emitter falls back to a grammar walk, which is well-defined but uses the default FormatPolicy, not whatever policy you’d have passed to decorate.
  • Calling decorate on a protocol that returns EmitVerificationStatus::Generic. decorate runs emit_pretty internally, so its output inherits whatever fidelity the emitter has on that grammar. Check ParserRegistry::emit_verification_status(protocol) first; if the result is Generic, the round-trip kind multiset still satisfies the section law, but byte-for-byte stability across re-emits is not guaranteed.

See also

Build a custom protocol

A custom protocol is a new GAT plus a parser/emitter pair, registered with panproto so it can participate in the same diff/migrate/version-control workflow as the built-ins.

Prerequisites

Familiarity with Schemas as theories and Composing protocols by colimit. The Rust toolchain.

The task

Declare the theory

The fastest path is the theory DSL (panproto-theory-dsl):

{
  name = "MyProto",
  components = ["ThGraph", "ThConstraint", "ThNamed"],
  extensions = {
    sorts = ["Permission"],
    ops = [
      { name = "perm_of", in = ["Edge"], out = "Permission" },
    ],
    eqns = [],
  },
}

components lists the building-block theories to compose by colimit. extensions adds sorts, operations, and equations on top of the colimit.

For finer control, declare the theory directly in Rust with the class! and inductive! macros from panproto-gat-macros.

From Python, the same DSL document loads via Theory.from_nickel(source), Theory.from_yaml(source), Theory.from_json(source), or Theory.from_path(path) (dispatches by extension). The loaders accept the theory, class, and inductive body variants; multi-body documents (morphism, composition, protocol, bundle) belong in panproto-theory-dsl::load_and_compile directly. For incremental authoring, panproto.TheoryBuilder mirrors class! in a chainable form. Round-trip the flat Theory shape via to_json / to_yaml paired with from_dict_json / from_dict_yaml.

Implement parser and emitter

Each protocol provides a Parser: Bytes -> Schema and an Emitter: Schema -> Bytes. Implement both in a new submodule of panproto-protocols. See existing modules (serialization::avro, data_schema::cddl, web_document::atproto) for canonical structure.

Register

Each protocol module exposes protocol() -> Protocol and register_theories(&mut HashMap<String, Theory, _>):

// crates/panproto-protocols/src/my_proto.rs
use std::collections::HashMap;
use panproto_gat::Theory;
use panproto_schema::Protocol;

use crate::theories;

pub fn protocol() -> Protocol {
    Protocol {
        name: "my_proto".into(),
        schema_theory: "ThMyProtoSchema".into(),
        instance_theory: "ThMyProtoInstance".into(),
        ..Protocol::default()
    }
}

pub fn register_theories<S: ::std::hash::BuildHasher>(
    registry: &mut HashMap<String, Theory, S>,
) {
    theories::register_constrained_multigraph_wtype(
        registry,
        "ThMyProtoSchema",
        "ThMyProtoInstance",
    );
}

Pick the theories::register_* helper that matches your theory shape (register_constrained_multigraph_wtype, register_typed_graph_wtype, register_hypergraph_functor, …). Colimit failures during registration panic with a named intermediate step; that is a build-time bug to fix in the theory composition. To expose my_proto to the language SDKs and CLI, add it to the lookup table in crates/panproto-py/src/protocols.rs and the lookup_builtin_protocol table in crates/panproto-wasm/src/api/helpers.rs.

Verification

cargo test -p panproto-protocols my_proto

The standard property-test suite for protocols verifies parse/emit round-trip, schema validation against the theory, and migration existence between two scaffolded schemas.

Common mistakes

  • Declaring extensions before the colimit components are correct. Extensions interact with the colimit structure; if a building-block step is wrong, the extension may not even reach registration.
  • Skipping the parser/emitter. Without both, the protocol cannot participate in schema convert or schema validate workflows.

See also

Schema version control

panproto-vcs is git for schemas: an immutable DAG of schema, migration, lens, and data objects with branches, tags, merges, and a content-addressed store. The CLI verbs match git: init, add, commit, branch, merge, log, diff.

The four pages here cover the practical workflows.

PagePurpose
Init and commitStart a repository, stage and commit changes.
Branch and mergeDiverge a feature branch, merge it back via pushout.
Version data alongside schemasCarry data instances in commits and lift them through migrations automatically.
Bridge to gitRun panproto-vcs alongside git, or as a custom git remote.

See also

Init and commit

Prerequisites

The schema CLI installed. A directory containing schema files (or a fresh directory you are about to populate).

The task

cd my-schemas/
schema init                 # create .panproto/
schema add user.json
schema commit -m "initial user schema"
schema log                  # show history

init creates a .panproto/ directory analogous to .git/. add stages a schema; commit creates a content-addressed snapshot. The behaviour matches git for the parts that overlap.

To inspect the current state:

schema status
schema diff               # diff working tree vs HEAD
schema show <commit-hash>

Verification

schema log --oneline

prints the commit history. Each commit shows its blake3 hash, schema hash, message, and author.

Common mistakes

  • Forgetting to schema add before schema commit. Like git, the staging area is explicit; commits only include staged changes.
  • Editing inside .panproto/objects/ directly. The store is content-addressed; manual edits break the hash invariants.

See also

Branch and merge

Prerequisites

A panproto repository with at least one commit (Init and commit).

The task

schema branch feature/add-handle
schema checkout feature/add-handle
# edit the schema, add a `handle` field
schema add user.json
schema commit -m "add handle"

schema checkout main
schema merge feature/add-handle

merge runs the pushout construction over the schema graphs at HEAD, the merging branch’s HEAD, and their common ancestor. The result is the smallest schema containing both branches’ changes.

If the pushout’s universal property fails (because the two branches’ changes contradict on a shared substructure), merge raises UniversalFactorizationFailure and produces a conflict object instead of an unsafe result.

To resolve a conflict:

schema status              # lists conflict objects
schema show <conflict>     # inspect the contradiction
# edit the conflict descriptor
schema add <conflict>
schema commit -m "resolve conflict"

Verification

schema log --graph

shows the DAG with the merge commit visible. The merge commit has two parents (HEAD and the merged branch’s HEAD).

Common mistakes

  • Editing schemas while on the wrong branch. schema status always shows the current branch; check it before editing.
  • Treating conflict objects as text. They are structured descriptions of an irreconcilable categorical equation; resolution is by selecting one side or extending the schema, not by editing JSON in place.

See also

Version data alongside schemas

panproto-vcs commits can carry data instances. When a branch’s schema migrates, the data carried by its commits is automatically lifted forward by the migration’s lens. This makes data evolution part of the schema history rather than a parallel concern.

Prerequisites

A panproto repository with at least one schema and a corresponding data instance.

The task

Data is staged together with its schema via schema add --data:

schema add user.json --data records/
schema commit -m "v1 schema and seed data"

# Evolve the schema and re-stage with the same data directory.
schema add user-v2.json --data records/
schema commit -m "v2 schema"

# Sync the working data directory to the latest schema via the auto-derived chain.
schema data sync records/

schema add --data <DATA> stages a data directory alongside the schema. schema data sync lifts the on-disk data forward through any schema migrations between the recorded commit and the target ref (default HEAD). To preview without writing:

schema data migrate records/ --dry-run

schema data migrate runs the migration between two specific commits (default parent..HEAD); add --coverage to print statistics. Inspect data status:

schema status --data records/
schema data status records/

To extract historical data, check out the commit:

schema checkout <commit>
# read records/

Verification

schema data status records/

reports staleness relative to the current schema. A clean status means the data conforms.

schema data migrate --coverage records/ prints how much of the data was actually transformed by the lift, surfacing partial migrations.

Common mistakes

  • Editing data inside the store directly. Like schemas, data objects are content-addressed.
  • Skipping data when committing schema changes. If you commit a v2 schema without ever staging v1 data, there is nothing for the lens to lift; this is fine, but the v2 commit will have no data even though v1 might.
  • History rewrites (rebase, amend) on a branch carrying data. The rewrite must lift the data through the new history; this is automatic, but re-run schema data status records/ afterwards to confirm the on-disk data is in sync.

See also

Bridge to git

panproto-vcs is independent of git, but most projects host their source in git. The bridge lets a panproto-vcs repository live alongside (or inside) a git one, and lets git remotes serve panproto histories.

Prerequisites

A git repository (the host project) and either a .panproto/ directory inside it or a sibling panproto repository.

The task

Sidecar mode

The simplest setup: keep .panproto/ next to .git/. Both are tracked by their own tools; commits to either are independent.

git add .panproto/
git commit -m "snapshot panproto state"

panproto-vcs’s content-addressed objects are deterministic, so storing them in git works (no merge conflicts inside .panproto/objects/).

Bidirectional translation

schema git import reads a git repository’s history and produces a corresponding panproto-vcs DAG; schema git export does the reverse:

schema git import path/to/git-repo HEAD
schema git export path/to/output-git-repo --repo .

The import range can be any git revspec (HEAD, main, HEAD~10..HEAD). The export takes the current panproto repository and writes a git mirror.

git-remote helper

The panproto-git-remote crate registers a custom git remote helper that exposes a panproto-vcs repository to git clients (so git clone panproto://path/to/repo works). The helper is separate from schema git; install the binary and configure git accordingly. See crates/panproto-git-remote.

Merge bridging

A git merge that touches .panproto/objects/ will not preserve panproto’s structural merge guarantees, since git merges the bytes. The supported pattern is to do schema-level work in panproto and then schema git export the result. There is no automatic git-merge-to-pushout translation in the CLI; the merge must originate in schema merge to get the universal-property-checked pushout.

Verification

schema status
git status

Both should be clean. A clean panproto status with a dirty git status is normal (you may have untracked files git knows about that panproto does not). The reverse is not.

Common mistakes

  • Three-way text-merging .panproto/objects/ files. The store is content-addressed; the bytes are correct or wrong, never partially. Resolve via schema rebase or by re-running the merge inside panproto with schema merge, then re-export to git with schema git export.
  • Mixing the two modes. Choose sidecar or remote-bridge per project; mixing creates ambiguity about which DAG is the source of truth.

See also

Translate across protocols

Cross-protocol translation moves schemas between schema languages: JSON Schema to Protobuf, ATProto Lexicon to GraphQL SDL, and so on. Translation is a migration whose source and target theories belong to different protocols.

Prerequisites

Two protocols whose theories overlap on enough building-block theories that a colimit-mediated translation exists. The CLI or one of the SDKs.

The task

Cross-protocol translation runs through the colimit-of-theories construction in panproto-protocols. There is no single CLI subcommand that takes a source-protocol schema and emits a target-protocol schema directly; instead, the workflow is:

  1. Compose the theory. Use the theory DSL to declare a composition theory whose colimit covers both protocols, or rely on the built-in compositions for the protocol pair you want.
  2. Generate a lens between schemas in the composed theory. schema lens generate produces the chain; both schemas must be expressed against the composed theory.
  3. Apply the lens to convert data.

The theory DSL exposes a small library of named building-block theories via builtin_resolver(): ThGraph, ThConstraint, ThMulti, ThWType, ThMeta, ThSimpleGraph, ThHypergraph, ThInterface, ThFunctor, ThFlat, ThGraphInstance. Whole-protocol theories (the per-protocol GATs that JSON Schema, Protobuf, ATProto, etc. compile to) are constructed in Rust inside panproto-protocols, not through the DSL.

The DSL composition shape is compose = { result, bases, steps }, with each step a ColimitStepSpec { left, right, shared_sorts, shared_ops? }. A toy composition over building blocks:

{
  id = "dev.example.constrained-multigraph",
  description = "Compose ThGraph, ThConstraint, and ThMulti by identifying their Vertex and Edge sorts",
  compose = {
    result = "ConstrainedMultigraph",
    bases = ["ThGraph", "ThConstraint", "ThMulti"],
    steps = [
      { left = "ThGraph", right = "ThConstraint", shared_sorts = ["Vertex", "Edge"] },
      # Reference the prior step's intermediate result by its generated
      # name (`step_<i>`); the final intermediate is renamed to `result`.
      { left = "step_0", right = "ThMulti", shared_sorts = ["Vertex", "Edge"] },
    ],
  },
}

Compile it:

schema theory compile theories/constrained-multigraph.ncl

For cross-protocol translation between two existing built-in protocols (say, JSON Schema and Protobuf), the composition is in Rust inside crates/panproto-protocols. Once both protocols are registered, the actual CLI flow is:

# Generate a chain between two schemas, one in each protocol; both schemas
# must be expressible against the same registered protocol theory.
schema lens generate --protocol <protocol> \
  schemas/user-a.json schemas/user-b.json \
  --save lenses/a-to-b.json

# Apply.
schema lens apply --protocol <protocol> lenses/a-to-b.json data/user.json

For data conversion within a single protocol’s schema fleet (different schemas, same protocol), use schema data convert:

schema data convert --protocol atproto \
  --from schemas/user-v1.json --to schemas/user-v2.json \
  data/users/

Verification

schema lens verify --protocol <name> <data> <schema> checks the round-trip laws on the converted data. A clean run means the chain is loss-free for the given samples.

Common mistakes

  • Reaching for a one-shot CLI conversion. The colimit composition step is essential for cross-protocol work; without it, schema lens generate has no shared theory to align the schemas against.
  • Translating between protocols with non-overlapping required structure. The lens auto-generation will be partial, and apply will fail on records using the source-only structure. Distant protocols (say, FHIR to MongoDB) may require a hand-written chain on top of the auto-derived skeleton.
  • Assuming auto-derived translations preserve every constraint. Constraints expressible in one theory but not the other are dropped; schema lens verify flags this.

See also

Continuous integration

CI for panproto projects covers two questions: did this change validate, and did this change break compatibility?

PagePurpose
Breaking-change gateBlock PRs that introduce a breaking schema change unless explicitly acknowledged.
GitHub ActionsDrop-in workflows for schema validation and breaking-change detection.
Pre-commit hooksRun schema validation before each local commit.

See also

Breaking-change gate

A breaking-change gate is a CI step that fails the build when a PR introduces a schema change panproto classifies as breaking. It runs against the diff between the PR’s schema and the schema at main.

Prerequisites

A panproto repository under git. A CI system that can run shell commands.

The task

schema check enforces existence conditions; it does not classify. Use schema diff (which reports structural changes) plus schema lens generate (which fails when no chain exists) and combine into a gate:

# In your CI script.
git fetch origin main
git show origin/main:schemas/user.json > /tmp/user-base.json

# Existence + GAT-level type check.
schema check \
  --src /tmp/user-base.json \
  --tgt schemas/user.json \
  --mapping migrations/user.json \
  --typecheck

# Generate a chain; non-zero exit means the change is breaking
# (no auto-generated lens covers it).
schema lens generate --protocol atproto /tmp/user-base.json schemas/user.json --save /tmp/chain.json

Either step’s non-zero exit fails the build. To allow an explicit override, gate on a commit-message marker or a PR label:

if git log -1 --format=%B | grep -q '\[breaking-change-acknowledged\]'; then
  schema check ... --typecheck || true
  schema lens generate ... || true
else
  schema check ... --typecheck
  schema lens generate ...
fi

For richer classification, use the SDK: in Python, panproto.diff_and_classify(old, new, protocol) returns a CompatReport; in Rust, call panproto_check::diff(...) followed by panproto_check::classify(&diff, &protocol); in TypeScript, panproto.diffFull(old, new).classify(protocol) returns the same report. Each carries a classification (fully_compatible, backward_compatible, or breaking) along with the offending elements.

Verification

Open a PR that adds a backward-compatible field; the gate passes. Open a PR that drops a required field; the gate fails. Add [breaking-change-acknowledged] to the commit message; the gate passes with a warning.

Common mistakes

  • Skipping --typecheck. Without it, the gate misses GAT-level type errors that are also breaking changes.
  • Comparing against the wrong base. The gate must compare against the merge base of the PR and main, not against main’s tip; otherwise force-pushes to main make every PR look breaking.

See also

GitHub Actions

A drop-in workflow for schema validation and breaking-change detection on every pull request.

Prerequisites

A panproto repository on GitHub. The panproto-cli Homebrew tap, shell installer, or cargo install available on Linux runners.

The task

# .github/workflows/panproto.yml
name: panproto

on:
  pull_request:
    paths:
      - 'schemas/**'
      - 'panproto.toml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0           # need history for the diff base

      - name: Install schema
        run: |
          curl --proto '=https' -LsSf \
            https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.sh | sh
          echo "$HOME/.cargo/bin" >> "$GITHUB_PATH"

      - name: Validate
        run: |
          for f in schemas/*.json; do
            schema validate --protocol atproto "$f"
          done

      - name: Breaking-change gate
        run: |
          base=$(git merge-base origin/${{ github.base_ref }} HEAD)
          git show $base:schemas/user.json > /tmp/base.json
          schema check --src /tmp/base.json --tgt schemas/user.json \
            --mapping migrations/user.json --typecheck
          schema lens generate --protocol atproto /tmp/base.json schemas/user.json \
            --save /tmp/chain.json

The --protocol flag is required for every per-file schema validate. There is no --project flag; per-file iteration is the supported pattern.

Two jobs: validation and the breaking-change gate. Validation fails on a malformed schema; the gate fails on a breaking diff.

Verification

Push a PR and watch the workflow run. The validation step exits zero on a clean schema; the gate prints the classification and exits non-zero on breaking.

Common mistakes

  • Omitting fetch-depth: 0. Shallow clones make the merge-base lookup fail; the gate then runs against the wrong base.
  • Pinning panproto-cli to a stale version. Use the latest installer; the protocol catalogue and existence-check rules evolve.

See also

Pre-commit hooks

A pre-commit hook runs schema validation before each git commit, so you catch malformed schemas before they enter the history. Optionally, it can also warn about breaking changes.

Prerequisites

The schema CLI installed. A git repository.

The task

Plain git hook

# .git/hooks/pre-commit
#!/usr/bin/env bash
set -euo pipefail

changed=$(git diff --cached --name-only --diff-filter=ACM | grep -E '^schemas/.*\.json$' || true)
[ -z "$changed" ] && exit 0

for f in $changed; do
  schema validate --protocol atproto "$f"
done

# Optional breaking-change warning: try to auto-generate a chain
# between the staged version and the upstream copy. Failure suggests
# the change is breaking.
for f in $changed; do
  base_blob=$(git show :@{u}:"$f" 2>/dev/null) || continue
  echo "$base_blob" > /tmp/base.json
  if ! schema lens generate --protocol atproto /tmp/base.json "$f" --save /tmp/chain.json 2>/dev/null; then
    echo "warning: $f introduces a breaking change (commit anyway? Ctrl-C to abort)"
  fi
done

chmod +x .git/hooks/pre-commit installs it.

With pre-commit framework

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: schema-validate
        name: panproto schema validate
        entry: schema validate --protocol atproto
        language: system
        files: '^schemas/.*\.json$'

The hook receives the staged file paths as positional arguments; the --protocol flag is required.

pre-commit install activates it.

Verification

Stage a malformed schema and try to commit; the hook rejects. Fix the schema and commit again; the hook passes.

Common mistakes

  • Hook stalls on missing schema binary. Wrap the invocation in a command -v schema || exit 0 to fall back gracefully on machines without the CLI.
  • Validating every file on every commit. The script above scopes to staged schemas/*.json only; broader scopes are noisy.

See also

Reference

Reference pages are authoritative lookups: signatures, flags, tables, and grammars. They contain no exposition. If you want to understand the model behind these signatures, follow the per-page link to the corresponding explanation chapter.

PageWhat it lists
CLIEvery schema subcommand, with usage, options, and examples. Generated from the clap derive tree; never stale.
Rust SDKCrate selection, feature flags, public re-exports, and the canonical entry points. Links into docs.rs for full type signatures.
TypeScript SDKPackage layout, initialization, the high-level facade, and the handle-based API. Links into TypeDoc.
Python SDKPyO3 native module surface, top-level re-exports, and the companion grammar packs. Links into the mkdocs reference site.
Protocol catalogueEvery protocol panproto recognises, with its theory composition and supported operations.
Expression languageSurface grammar, builtins by category, and the type signatures of each builtin.
Lens combinatorsThe combinator algebra exposed by panproto-lens, organised by optic kind.
ConfigurationThe panproto.toml manifest schema.
Crate mapThe 38 panproto-* crates in the workspace, with one-line descriptions and dependency direction.

CLI reference

Every schema subcommand, with its full --help text. This page is regenerated from the live binary by xtask/src/bin/gen-cli-docs.rs; edit the CLI, not the page.

To regenerate locally:

cargo run -p xtask --bin gen-cli-docs

CI runs the same command and fails if the result differs from what is checked in.

For the model that the commands operate on, see Schemas as theories, Migrations as morphisms, and Schema version control semantics.

schema

Schematic version control: schema migration toolkit based on generalized algebraic theories

Usage: schema [OPTIONS] <COMMAND>

Commands:
  validate      Validate a schema against a protocol
  check         Check existence conditions for a migration between two schemas
  scaffold      Generate minimal test data from a protocol theory using free model construction
  normalize     Simplify a schema by merging equivalent elements
  typecheck     Type-check a migration between two schemas at the GAT level
  verify        Verify that a schema satisfies its protocol theory's equations
  init          Initialize a new panproto repository
  add           Stage a schema for the next commit
  commit        Create a new commit from staged changes
  status        Show repository status
  log           Show commit history
  diff          Diff two schemas or show staged changes
  show          Inspect a commit, schema, or migration object
  branch        Create, list, or delete branches
  tag           Create, list, or delete tags
  checkout      Switch to a branch or commit
  merge         Merge a branch into the current branch
  rebase        Replay current branch onto another
  cherry-pick   Apply a single commit's migration to the current branch
  reset         Move HEAD / unstage / restore
  stash         Save or restore working state
  reflog        Show ref mutation history
  bisect        Binary search for the commit that introduced a breaking change
  blame         Show which commit introduced a schema element
  lift          Apply a migration to a record, transforming it from source to target schema
  integrate     Integrate two schemas by computing their pushout
  auto-migrate  Automatically discover a migration between two schemas
  gc            Garbage collect unreachable objects
  expr          Evaluate, type-check, or interactively explore GAT expressions
  enrich        Add, list, or remove schema enrichments (defaults, coercions, mergers, policies)
  remote        Add, list, or remove remote repositories
  push          Push schemas to a remote repository
  pull          Pull schemas from a remote repository
  fetch         Fetch schemas from a remote repository
  clone         Clone a remote repository
  data          Data operations: migrate, convert, sync, and status
  theory        Theory DSL operations: define theories, morphisms, and protocols from data files
  lens          Bidirectional lens operations
  parse         Parse source files into full-AST schemas via tree-sitter
  git           Import/export between git repositories and panproto-vcs
  help          Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help
  -V, --version  Print version

schema validate

Validate a schema against a protocol

Usage: schema validate [OPTIONS] --protocol <PROTOCOL> <SCHEMA>

Arguments:
  <SCHEMA>  Path to the schema JSON file

Options:
      --protocol <PROTOCOL>  The protocol name (e.g., "atproto")
  -v, --verbose              Enable verbose output
  -h, --help                 Print help

schema check

Check existence conditions for a migration between two schemas

Usage: schema check [OPTIONS] --src <SRC> --tgt <TGT> --mapping <MAPPING>

Options:
      --src <SRC>          Path to the source schema JSON file
  -v, --verbose            Enable verbose output
      --tgt <TGT>          Path to the target schema JSON file
      --mapping <MAPPING>  Path to the migration mapping JSON file
      --typecheck          Also type-check the migration morphism at the GAT level
  -h, --help               Print help

schema scaffold

Generate minimal test data from a protocol theory using free model construction

Usage: schema scaffold [OPTIONS] --protocol <PROTOCOL> <SCHEMA>

Arguments:
  <SCHEMA>  Path to the schema JSON file

Options:
      --protocol <PROTOCOL>    The protocol name (e.g., "atproto")
  -v, --verbose                Enable verbose output
      --depth <DEPTH>          Maximum term generation depth (default: 3) [default: 3]
      --max-terms <MAX_TERMS>  Maximum terms per sort (default: 1000) [default: 1000]
      --json                   Output as JSON
  -h, --help                   Print help

schema normalize

Simplify a schema by merging equivalent elements

Usage: schema normalize [OPTIONS] --protocol <PROTOCOL> <SCHEMA>

Arguments:
  <SCHEMA>  Path to the schema JSON file

Options:
      --protocol <PROTOCOL>         The protocol name (e.g., "atproto")
  -v, --verbose                     Enable verbose output
      --identify <IDENTIFICATIONS>  Pairs of elements to identify, as "A=B"
      --json                        Output as JSON
  -h, --help                        Print help

schema typecheck

Type-check a migration between two schemas at the GAT level

Usage: schema typecheck [OPTIONS] --src <SRC> --tgt <TGT> --migration <MIGRATION>

Options:
      --src <SRC>              Path to the source schema JSON file
  -v, --verbose                Enable verbose output
      --tgt <TGT>              Path to the target schema JSON file
      --migration <MIGRATION>  Path to the migration mapping JSON file
  -h, --help                   Print help

schema verify

Verify that a schema satisfies its protocol theory's equations

Usage: schema verify [OPTIONS] --protocol <PROTOCOL> <SCHEMA>

Arguments:
  <SCHEMA>  Path to the schema JSON file

Options:
      --protocol <PROTOCOL>
          The protocol name (e.g., "atproto")
  -v, --verbose
          Enable verbose output
      --max-assignments <MAX_ASSIGNMENTS>
          Maximum assignments to check per equation (default: 10000) [default: 10000]
  -h, --help
          Print help

schema init

Initialize a new panproto repository

Usage: schema init [OPTIONS] [PATH]

Arguments:
  [PATH]  Directory to initialize (defaults to current dir) [default: .]

Options:
  -b, --initial-branch <INITIAL_BRANCH>  Use the given name for the initial branch
  -v, --verbose                          Enable verbose output
  -h, --help                             Print help

schema add

Stage a schema for the next commit

Usage: schema add [OPTIONS] <SCHEMA>

Arguments:
  <SCHEMA>  Path to the schema JSON file

Options:
  -n, --dry-run      Show what would be staged without actually staging
  -v, --verbose      Enable verbose output
  -f, --force        Force staging even if validation fails
      --data <DATA>  Stage data files alongside the schema
  -h, --help         Print help

schema commit

Create a new commit from staged changes

Usage: schema commit [OPTIONS] --message <MESSAGE>

Options:
  -m, --message <MESSAGE>  Commit message
  -v, --verbose            Enable verbose output
      --author <AUTHOR>    Author name [default: anonymous]
      --amend              Amend the previous commit instead of creating a new one
      --allow-empty        Allow creating a commit with no changes
      --skip-verify        Skip GAT equation verification
  -h, --help               Print help

schema status

Show repository status

Usage: schema status [OPTIONS]

Options:
  -s, --short        Show output in short format
  -v, --verbose      Enable verbose output
      --porcelain    Show output in machine-readable format
  -b, --branch       Show branch information
      --data <DATA>  Show data staleness for files in this directory
  -h, --help         Print help

schema log

Show commit history

Usage: schema log [OPTIONS]

Options:
  -n, --limit <LIMIT>    Maximum number of commits to show
  -v, --verbose          Enable verbose output
      --oneline          Show each commit on a single line
      --graph            Show an ASCII graph of the branch structure
      --all              Show all branches, not just the current one
      --format <FORMAT>  Pretty-print commits using a format string
      --author <AUTHOR>  Filter commits by author
      --grep <GREP>      Filter commits whose message matches a pattern
      --data             Show data and complement IDs in commit history
  -h, --help             Print help

schema diff

Diff two schemas or show staged changes

Usage: schema diff [OPTIONS] [OLD] [NEW]

Arguments:
  [OLD]  Path to the old schema (or first ref)
  [NEW]  Path to the new schema (or second ref)

Options:
      --stat            Show a diffstat summary
  -v, --verbose         Enable verbose output
      --name-only       Show only names of changed elements
      --name-status     Show names and status (A/D/M) of changed elements
      --staged          Diff the staged schema against HEAD
      --detect-renames  Detect likely renames between schemas
      --theory          Show theory-level diff (sorts, operations, equations)
      --lens            Also generate a protolens chain between the schemas
      --save <SAVE>     Save the protolens chain to a file (requires --lens)
      --optic-kind      Show the optic classification of the diff
  -h, --help            Print help

schema show

Inspect a commit, schema, or migration object

Usage: schema show [OPTIONS] <TARGET>

Arguments:
  <TARGET>  Ref name or object ID

Options:
      --format <FORMAT>  Pretty-print using a format string
  -v, --verbose          Enable verbose output
      --stat             Show a diffstat summary for commits
  -h, --help             Print help

schema branch

Create, list, or delete branches

Usage: schema branch [OPTIONS] [NAME]

Arguments:
  [NAME]  Branch name to create. Lists branches if omitted

Options:
  -d, --delete         Delete the branch
  -D                   Force-delete the branch even if not fully merged
  -f, --force          Force overwrite if branch already exists
  -m, --move <RENAME>  Rename a branch (value is the new name)
  -v, --verbose        Show commit info for each branch
  -a, --all            List both local and remote-tracking branches
  -h, --help           Print help

schema tag

Create, list, or delete tags

Usage: schema tag [OPTIONS] [NAME]

Arguments:
  [NAME]  Tag name to create. Lists tags if omitted

Options:
  -d, --delete             Delete the tag
  -v, --verbose            Enable verbose output
  -a, --annotate           Create an annotated tag
  -m, --message <MESSAGE>  Tag message (implies --annotate)
  -l, --list               List tags matching a pattern
  -f, --force              Force-replace an existing tag
  -h, --help               Print help

schema checkout

Switch to a branch or commit

Usage: schema checkout [OPTIONS] <TARGET>

Arguments:
  <TARGET>  Branch name or commit ID

Options:
  -b                       Create a new branch with the given name at HEAD and switch to it
  -v, --verbose            Enable verbose output
      --detach             Detach HEAD at the target commit
      --migrate <MIGRATE>  Migrate data in this directory to match the target branch's schema
  -h, --help               Print help

schema merge

Merge a branch into the current branch

Usage: schema merge [OPTIONS] [BRANCH]

Arguments:
  [BRANCH]  Branch to merge

Options:
      --author <AUTHOR>    Author name [default: anonymous]
      --no-commit          Perform the merge but do not commit
      --ff-only            Refuse to merge unless fast-forward is possible
      --no-ff              Create a merge commit even for fast-forward merges
      --squash             Squash the branch into a single change set
      --abort              Abort an in-progress merge
  -m, --message <MESSAGE>  Custom merge commit message
  -v, --verbose            Show pullback-based overlap detection details
      --migrate <MIGRATE>  Migrate data in this directory through the merge
  -h, --help               Print help

schema rebase

Replay current branch onto another

Usage: schema rebase [OPTIONS] [ONTO]

Arguments:
  [ONTO]  Branch or commit to rebase onto

Options:
      --author <AUTHOR>  Author name [default: anonymous]
  -v, --verbose          Enable verbose output
      --abort            Abort the current rebase operation
      --cont             Continue a paused rebase after resolving conflicts
  -h, --help             Print help

schema cherry-pick

Apply a single commit's migration to the current branch

Usage: schema cherry-pick [OPTIONS] [COMMIT]

Arguments:
  [COMMIT]  Commit ID to cherry-pick

Options:
      --author <AUTHOR>  Author name [default: anonymous]
  -v, --verbose          Enable verbose output
  -n, --no-commit        Apply the change without committing
  -x                     Append "(cherry picked from commit ...)" to the message
      --abort            Abort the current cherry-pick operation
  -h, --help             Print help

schema reset

Move HEAD / unstage / restore

Usage: schema reset [OPTIONS] <TARGET>

Arguments:
  <TARGET>  Target ref or commit ID

Options:
      --soft             Soft reset: move HEAD only, keep staged and working changes
  -v, --verbose          Enable verbose output
      --hard             Hard reset: move HEAD, discard all changes
      --author <AUTHOR>  Author name [default: anonymous]
  -h, --help             Print help

schema stash

Save or restore working state

Usage: schema stash [OPTIONS] <COMMAND>

Commands:
  push   Save the current staged schema
  pop    Restore the most recent stash
  list   List all stash entries
  drop   Drop the most recent stash
  apply  Apply a stash entry without removing it
  show   Show the contents of a stash entry
  clear  Remove all stash entries
  help   Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash push

Save the current staged schema

Usage: schema stash push [OPTIONS]

Options:
  -m, --message <MESSAGE>  Optional stash message
  -v, --verbose            Enable verbose output
      --author <AUTHOR>    Author name [default: anonymous]
  -h, --help               Print help

schema stash pop

Restore the most recent stash

Usage: schema stash pop [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash list

List all stash entries

Usage: schema stash list [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash drop

Drop the most recent stash

Usage: schema stash drop [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash apply

Apply a stash entry without removing it

Usage: schema stash apply [OPTIONS] [INDEX]

Arguments:
  [INDEX]  Stash index to apply [default: 0]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash show

Show the contents of a stash entry

Usage: schema stash show [OPTIONS] [INDEX]

Arguments:
  [INDEX]  Stash index to inspect [default: 0]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema stash clear

Remove all stash entries

Usage: schema stash clear [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema reflog

Show ref mutation history

Usage: schema reflog [OPTIONS] [REF_NAME]

Arguments:
  [REF_NAME]  Ref name (defaults to HEAD) [default: HEAD]

Options:
  -n, --limit <LIMIT>  Maximum entries to show
  -v, --verbose        Enable verbose output
      --all            Show reflogs for all refs
  -h, --help           Print help

schema bisect

Binary search for the commit that introduced a breaking change

Usage: schema bisect [OPTIONS] <GOOD> <BAD>

Arguments:
  <GOOD>  Known good commit
  <BAD>   Known bad commit

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema blame

Show which commit introduced a schema element

Usage: schema blame [OPTIONS] --element-type <ELEMENT_TYPE> <ELEMENT_ID>

Arguments:
  <ELEMENT_ID>  Element identifier (vertex ID, edge `"src->tgt"`, or `"vertex_id:sort"`)

Options:
      --element-type <ELEMENT_TYPE>  Element type: vertex, edge, or constraint
  -v, --verbose                      Enable verbose output
      --reverse                      Walk history from the first commit forward
  -h, --help                         Print help

schema lift

Apply a migration to a record, transforming it from source to target schema

Usage: schema lift [OPTIONS] --migration <MIGRATION> --src-schema <SRC_SCHEMA> --tgt-schema <TGT_SCHEMA> <RECORD>

Arguments:
  <RECORD>  Path to the record JSON file

Options:
      --migration <MIGRATION>          Path to the migration mapping JSON file
  -v, --verbose                        Enable verbose output
      --src-schema <SRC_SCHEMA>        Path to the source schema JSON file
      --tgt-schema <TGT_SCHEMA>        Path to the target schema JSON file
      --direction <DIRECTION>          Migration direction: restrict (default, `Delta_F`), sigma (`Sigma_F`), or pi (`Pi_F`) [default: restrict]
      --instance-type <INSTANCE_TYPE>  Instance type: wtype (default) or functor [default: wtype]
  -h, --help                           Print help

schema integrate

Integrate two schemas by computing their pushout

Usage: schema integrate [OPTIONS] <LEFT> <RIGHT>

Arguments:
  <LEFT>   Path to the left schema JSON file
  <RIGHT>  Path to the right schema JSON file

Options:
      --auto-overlap  Automatically discover the overlap between schemas
  -v, --verbose       Enable verbose output
      --json          Output the integrated schema as JSON
  -h, --help          Print help

schema auto-migrate

Automatically discover a migration between two schemas

Usage: schema auto-migrate [OPTIONS] <OLD> <NEW>

Arguments:
  <OLD>  Path to the old/source schema JSON file
  <NEW>  Path to the new/target schema JSON file

Options:
      --monic    Require injective (one-to-one) vertex mapping
  -v, --verbose  Enable verbose output
      --json     Output the migration as JSON
  -h, --help     Print help

schema gc

Garbage collect unreachable objects

Usage: schema gc [OPTIONS]

Options:
      --dry-run  Show what would be deleted without actually deleting
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr

Evaluate, type-check, or interactively explore GAT expressions

Usage: schema expr [OPTIONS] <COMMAND>

Commands:
  gat-eval   Evaluate a JSON-encoded GAT term from a file
  gat-check  Type-check a JSON-encoded GAT term from a file
  repl       Interactive expression REPL
  parse      Parse a Haskell-style expression and print its AST
  eval       Parse and evaluate a Haskell-style expression, printing the result
  fmt        Parse an expression and pretty-print it back in canonical form
  check      Parse an expression and report any syntax errors
  help       Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr gat-eval

Evaluate a JSON-encoded GAT term from a file

Usage: schema expr gat-eval [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the JSON file containing a GAT term

Options:
      --env <ENV>  Path to a JSON file with variable bindings
  -v, --verbose    Enable verbose output
  -h, --help       Print help

schema expr gat-check

Type-check a JSON-encoded GAT term from a file

Usage: schema expr gat-check [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the JSON file containing term, theory, and context

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr repl

Interactive expression REPL

Usage: schema expr repl [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr parse

Parse a Haskell-style expression and print its AST

Usage: schema expr parse [OPTIONS] <SOURCE>

Arguments:
  <SOURCE>  Expression source text

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr eval

Parse and evaluate a Haskell-style expression, printing the result

Usage: schema expr eval [OPTIONS] <SOURCE>

Arguments:
  <SOURCE>  Expression source text

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr fmt

Parse an expression and pretty-print it back in canonical form

Usage: schema expr fmt [OPTIONS] <SOURCE>

Arguments:
  <SOURCE>  Expression source text

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema expr check

Parse an expression and report any syntax errors

Usage: schema expr check [OPTIONS] <SOURCE>

Arguments:
  <SOURCE>  Expression source text

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema enrich

Add, list, or remove schema enrichments (defaults, coercions, mergers, policies)

Usage: schema enrich [OPTIONS] <COMMAND>

Commands:
  add-default   Add a default value expression to a vertex
  add-coercion  Add a coercion expression between two vertex kinds
  add-merger    Add a merger expression to a vertex
  add-policy    Add a conflict policy to a vertex
  list          List all enrichments on the HEAD schema
  remove        Remove an enrichment by name
  help          Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema enrich add-default

Add a default value expression to a vertex

Usage: schema enrich add-default [OPTIONS] --expr <EXPR> <VERTEX>

Arguments:
  <VERTEX>  Vertex name

Options:
      --expr <EXPR>  Default value as JSON
  -v, --verbose      Enable verbose output
  -h, --help         Print help

schema enrich add-coercion

Add a coercion expression between two vertex kinds

Usage: schema enrich add-coercion [OPTIONS] --expr <EXPR> <FROM> <TO>

Arguments:
  <FROM>  Source vertex kind
  <TO>    Target vertex kind

Options:
      --expr <EXPR>  Coercion expression as JSON
  -v, --verbose      Enable verbose output
  -h, --help         Print help

schema enrich add-merger

Add a merger expression to a vertex

Usage: schema enrich add-merger [OPTIONS] --expr <EXPR> <VERTEX>

Arguments:
  <VERTEX>  Vertex name

Options:
      --expr <EXPR>  Merger specification as JSON
  -v, --verbose      Enable verbose output
  -h, --help         Print help

schema enrich add-policy

Add a conflict policy to a vertex

Usage: schema enrich add-policy [OPTIONS] --strategy <STRATEGY> <VERTEX>

Arguments:
  <VERTEX>  Vertex name

Options:
      --strategy <STRATEGY>  Conflict resolution strategy name
  -v, --verbose              Enable verbose output
  -h, --help                 Print help

schema enrich list

List all enrichments on the HEAD schema

Usage: schema enrich list [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema enrich remove

Remove an enrichment by name

Usage: schema enrich remove [OPTIONS] <NAME>

Arguments:
  <NAME>  Enrichment name or vertex name to remove enrichments from

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema remote

Add, list, or remove remote repositories

Usage: schema remote [OPTIONS] <COMMAND>

Commands:
  add     Register a new remote
  remove  Remove a remote
  list    List configured remotes
  help    Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema remote add

Register a new remote

Usage: schema remote add [OPTIONS] <NAME> <URL>

Arguments:
  <NAME>  Remote name
  <URL>   Remote URL

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema remote remove

Remove a remote

Usage: schema remote remove [OPTIONS] <NAME>

Arguments:
  <NAME>  Remote name to remove

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema remote list

List configured remotes

Usage: schema remote list [OPTIONS]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema push

Push schemas to a remote repository

Usage: schema push [OPTIONS] [REMOTE] [BRANCH]

Arguments:
  [REMOTE]  Remote name
  [BRANCH]  Branch to push

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema pull

Pull schemas from a remote repository

Usage: schema pull [OPTIONS] [REMOTE] [BRANCH]

Arguments:
  [REMOTE]  Remote name
  [BRANCH]  Branch to pull

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema fetch

Fetch schemas from a remote repository

Usage: schema fetch [OPTIONS] [REMOTE]

Arguments:
  [REMOTE]  Remote name

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema clone

Clone a remote repository

Usage: schema clone [OPTIONS] <URL> [PATH]

Arguments:
  <URL>   Repository URL
  [PATH]  Local path

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema data

Data operations: migrate, convert, sync, and status

Usage: schema data [OPTIONS] <COMMAND>

Commands:
  migrate  Migrate data to match the current schema version
  convert  Convert data between schemas
  sync     Sync data to match a target schema version via VCS
  status   Report data staleness relative to the current schema version
  help     Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema data migrate

Migrate data to match the current schema version

Usage: schema data migrate [OPTIONS] <DATA>

Arguments:
  <DATA>  Data directory containing JSON files

Options:
      --protocol <PROTOCOL>  Protocol name (inferred from HEAD commit if omitted)
  -v, --verbose              Enable verbose output
      --range <RANGE>        Migrate between specific commits (default: parent..HEAD)
      --dry-run              Preview without modifying files
  -o, --output <OUTPUT>      Output directory (default: overwrite in place)
      --backward             Migrate backward (requires stored complement)
      --coverage             Apply migration and print coverage statistics
  -h, --help                 Print help

schema data convert

Convert data between schemas

Usage: schema data convert [OPTIONS] --protocol <PROTOCOL> <DATA>

Arguments:
  <DATA>  Data file or directory of JSON files

Options:
      --from <FROM>            Source schema
  -v, --verbose                Enable verbose output
      --to <TO>                Target schema
      --protocol <PROTOCOL>    Protocol name
      --chain <CHAIN>          Pre-built protolens chain JSON (alternative to --from/--to)
  -o, --output <OUTPUT>        Output file or directory
      --direction <DIRECTION>  Direction: "forward" or "backward" [default: forward]
      --defaults <DEFAULTS>    Default values as key=value pairs
  -h, --help                   Print help

schema data sync

Sync data to match a target schema version via VCS

Usage: schema data sync [OPTIONS] <DATA_DIR>

Arguments:
  <DATA_DIR>  Data directory

Options:
      --edits            Store an edit log object in the VCS
  -v, --verbose          Enable verbose output
      --target <TARGET>  Target ref (default: HEAD)
  -h, --help             Print help

schema data status

Report data staleness relative to the current schema version

Usage: schema data status [OPTIONS] <DATA_DIR>

Arguments:
  <DATA_DIR>  Data directory

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory

Theory DSL operations: define theories, morphisms, and protocols from data files

Usage: schema theory [OPTIONS] <COMMAND>

Commands:
  validate             Validate a theory document (load + typecheck)
  compile              Compile a theory document and print results
  compile-dir          Compile all theory documents in a directory
  check-morphism       Validate a morphism document
  recompose            Replay a composition and print the resulting theory
  check-coercion-laws  Run sample-based coercion law checks over every directed equation in a theory document. Exits non-zero when any declared coercion class is falsified by a sample
  repl                 Interactive theory REPL with syntax highlighting and history
  help                 Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory validate

Validate a theory document (load + typecheck)

Usage: schema theory validate [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the theory document file (.ncl, .json, .yaml)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory compile

Compile a theory document and print results

Usage: schema theory compile [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the theory document file

Options:
      --json     Output as JSON
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory compile-dir

Compile all theory documents in a directory

Usage: schema theory compile-dir [OPTIONS] <DIR>

Arguments:
  <DIR>  Path to the directory

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory check-morphism

Validate a morphism document

Usage: schema theory check-morphism [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the morphism document file

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory recompose

Replay a composition and print the resulting theory

Usage: schema theory recompose [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the composition document file

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema theory check-coercion-laws

Run sample-based coercion law checks over every directed equation in a theory document. Exits non-zero when any declared coercion class is falsified by a sample

Usage: schema theory check-coercion-laws [OPTIONS] <FILE>

Arguments:
  <FILE>  Path to the theory document file

Options:
      --json                 Output the full report as JSON
  -v, --verbose              Enable verbose output
      --var-name <VAR_NAME>  Name under which each sample is bound in the evaluation environment. Defaults to `"x"`; override when a theory's equations bind a different free variable so the checker does not surface "unbound variable" errors on every sample [default: x]
  -h, --help                 Print help

schema theory repl

Interactive theory REPL with syntax highlighting and history

Usage: schema theory repl [OPTIONS]

Options:
      --load <PATH>  Theory documents to load on startup. Same shape accepted by `:load` inside the REPL
  -v, --verbose      Enable verbose output
  -h, --help         Print help

schema lens

Bidirectional lens operations

Usage: schema lens [OPTIONS] <COMMAND>

Commands:
  generate  Generate a lens between two schemas
  apply     Apply a saved lens chain to data
  compose   Compose two protolens chains or schemas
  verify    Verify lens laws on test data
  inspect   Inspect a saved protolens chain
  check     Check applicability of a chain against schemas in a directory
  lift      Lift a chain along a theory morphism
  help      Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema lens generate

Generate a lens between two schemas

Usage: schema lens generate [OPTIONS] --protocol <PROTOCOL> <OLD> <NEW>

Arguments:
  <OLD>
          Path to the old/source schema

  <NEW>
          Path to the new/target schema

Options:
      --protocol <PROTOCOL>
          Protocol name

  -v, --verbose
          Enable verbose output

      --json
          Output as JSON

      --chain
          Output a reusable protolens chain (JSON to stdout)

      --try-overlap
          Try overlap-based alignment when direct morphism fails

      --save <SAVE>
          Save the generated protolens chain to a file

      --defaults <DEFAULTS>
          Default values as key=value pairs

      --fuse
          Fuse multi-step chain into single protolens

      --requirements
          Show complement requirements (defaults/data needed)

      --hints <HINTS>
          Path to a JSON hints file for guided auto-lens generation

      --stringency <TIER>
          Stringency tier governing which alignment strategies run.
          
          Accepted case-insensitively for parity with the Python and WASM bindings, both of which trim and lowercase their input.
          
          `strict`: only kind-exact name equality. `balanced`: alias dictionary + tight token similarity (default). `lenient`: span-search and structural priors. `exploratory`: lossy retraction witnesses.

          Possible values:
          - strict:      Kind-exact, edge-name-pruned CSP search; total morphism only
          - balanced:    Adds alias dictionary and tight token-similarity priors (default)
          - lenient:     Adds span-search and structural priors
          - exploratory: Adds lossy retraction witnesses

      --top-n <N>
          Emit up to N ranked candidate lenses instead of the single best one. Output format switches to a JSON array when combined with `--json` or `--chain`
          
          [default: 1]

      --explain
          Print per-step explanations (and confidences) for each emitted candidate

  -h, --help
          Print help (see a summary with '-h')

schema lens apply

Apply a saved lens chain to data

Usage: schema lens apply [OPTIONS] --protocol <PROTOCOL> <CHAIN> <DATA>

Arguments:
  <CHAIN>  Path to the protolens chain JSON
  <DATA>   Path to the data file

Options:
      --protocol <PROTOCOL>      Protocol name
  -v, --verbose                  Enable verbose output
      --direction <DIRECTION>    Direction: "forward" or "backward" [default: forward]
      --complement <COMPLEMENT>  Complement data for backward apply
      --schema <SCHEMA>          Schema for chain instantiation
  -h, --help                     Print help

schema lens compose

Compose two protolens chains or schemas

Usage: schema lens compose [OPTIONS] --protocol <PROTOCOL> <CHAIN1> <CHAIN2>

Arguments:
  <CHAIN1>  First chain or schema file
  <CHAIN2>  Second chain or schema file

Options:
      --protocol <PROTOCOL>  Protocol name
  -v, --verbose              Enable verbose output
      --json                 Output as JSON
      --chain                Output in chain format
  -h, --help                 Print help

schema lens verify

Verify lens laws on test data

Usage: schema lens verify [OPTIONS] --protocol <PROTOCOL> <DATA> [SCHEMA]

Arguments:
  <DATA>    Path to test data file
  [SCHEMA]  Schema file (second schema is optional)

Options:
      --protocol <PROTOCOL>  Protocol name
  -v, --verbose              Enable verbose output
  -h, --help                 Print help

schema lens inspect

Inspect a saved protolens chain

Usage: schema lens inspect [OPTIONS] --protocol <PROTOCOL> <CHAIN>

Arguments:
  <CHAIN>  Path to the protolens chain JSON

Options:
      --protocol <PROTOCOL>  Protocol name
  -v, --verbose              Enable verbose output
  -h, --help                 Print help

schema lens check

Check applicability of a chain against schemas in a directory

Usage: schema lens check [OPTIONS] --protocol <PROTOCOL> <CHAIN> <SCHEMAS_DIR>

Arguments:
  <CHAIN>        Path to the protolens chain JSON
  <SCHEMAS_DIR>  Directory containing schema JSON files

Options:
      --protocol <PROTOCOL>  Protocol name
  -v, --verbose              Enable verbose output
      --dry-run              Report only, do not instantiate
  -h, --help                 Print help

schema lens lift

Lift a chain along a theory morphism

Usage: schema lens lift [OPTIONS] <CHAIN> <MORPHISM>

Arguments:
  <CHAIN>     Path to the protolens chain JSON
  <MORPHISM>  Path to the theory morphism JSON

Options:
      --json     Output as JSON
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema parse

Parse source files into full-AST schemas via tree-sitter

Usage: schema parse [OPTIONS] <COMMAND>

Commands:
  file     Parse a single source file into a full-AST schema
  project  Parse all files in a directory into a unified project schema
  emit     Parse a file and emit it back to source (round-trip test)
  help     Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema parse file

Parse a single source file into a full-AST schema

Usage: schema parse file [OPTIONS] <PATH>

Arguments:
  <PATH>  Path to the source file

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema parse project

Parse all files in a directory into a unified project schema

Usage: schema parse project [OPTIONS] [PATH]

Arguments:
  [PATH]  Path to the project directory [default: .]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema parse emit

Parse a file and emit it back to source (round-trip test)

Usage: schema parse emit [OPTIONS] <PATH>

Arguments:
  <PATH>  Path to the source file

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema git

Import/export between git repositories and panproto-vcs

Usage: schema git [OPTIONS] <COMMAND>

Commands:
  import  Import a git repository's history into panproto-vcs
  export  Export panproto-vcs history to a git repository
  help    Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema git import

Import a git repository's history into panproto-vcs

Usage: schema git import [OPTIONS] <REPO> [REVSPEC]

Arguments:
  <REPO>     Path to the git repository
  [REVSPEC]  Git revspec (e.g. "HEAD", "main", "HEAD~10..HEAD") [default: HEAD]

Options:
  -v, --verbose  Enable verbose output
  -h, --help     Print help

schema git export

Export panproto-vcs history to a git repository

Usage: schema git export [OPTIONS] <DEST>

Arguments:
  <DEST>  Destination path for the git repository

Options:
      --repo <REPO>  Path to the panproto repository (default: current directory) [default: .]
  -v, --verbose      Enable verbose output
  -h, --help         Print help

Rust SDK reference

The Rust surface of panproto is the panproto-core facade. Add it to your Cargo.toml:

[dependencies]
panproto-core = "0.49"

Full type signatures, constructors, and method documentation live on docs.rs:

Feature flags

FeatureEffect
full-parsePulls in panproto-parse and tree-sitter-based AST parsing.
projectPulls in panproto-project for multi-file project assembly.
gitPulls in panproto-git for the git bridge.
llvmEnables LLVM-backed lowering via panproto-llvm.
jitEnables JIT-compiled migration via panproto-jit.
tree-sitterEnables tree-sitter-based format-preserving parsing for built-in protocols (forwards to panproto-io/tree-sitter).

The default feature set re-exports the always-on crates: panproto-gat, panproto-schema, panproto-inst, panproto-mig, panproto-lens, panproto-check, panproto-protocols, panproto-io, and panproto-vcs. The feature flags above pull in the optional crates on top.

Sub-crate lookup

For lower-level work, depend on individual crates rather than the facade. The crate map lists every workspace member with a one-line description and a link to its docs.rs page.

TaskCrate
Define a GAT in Rustpanproto-gat, panproto-gat-macros
Validate a schema against a protocolpanproto-schema, panproto-protocols
Parse data and produce an instancepanproto-io, panproto-inst
Build and apply a migrationpanproto-mig
Construct or compose lensespanproto-lens
Write lenses in the lens DSLpanproto-lens-dsl
Use the expression languagepanproto-expr, panproto-expr-parser
Version-control schemas and datapanproto-vcs, panproto-git
Parse full ASTs across 261 languagespanproto-parse
Decorate an abstract schema with a layout fibrepanproto-parse (ParserRegistry::decorate, ParserRegistry::pretty_with_protocol)
Distinguish abstract and decorated schemas at the type levelpanproto-schema (AbstractSchema, DecoratedSchema, LayoutWitness)

See also

TypeScript SDK reference

The TypeScript SDK is published as @panproto/core. It wraps the WASM build of panproto with a fluent, handle-based API.

Installation

npm install @panproto/core
# or
pnpm add @panproto/core
# or
yarn add @panproto/core

Node 20+ is required. Browser builds are supported via the same package; a bundler with WASM support is needed.

Initialization

import { Panproto } from '@panproto/core';

const p = await Panproto.init();

Panproto.init() loads the WASM module, sets up the slab allocator behind the boundary, and returns the top-level handle. All subsequent calls go through it.

Surface

The SDK exposes (selected; see bindings/typescript/src/index.ts for the full list):

Object / functionPurpose
PanprotoTop-level handle. init(), protocol(name), migration(src, tgt), parseJson(schema, json), toJson(schema, instance), convert(data, { from, to, defaults? }), compose(m1, m2), composeLenses(l1, l2), checkExistence(src, tgt, builder), diffFull(old, new).
ProtocolA loaded protocol. .name, .schema() returns a SchemaBuilder.
SchemaBuilderFluent builder. .vertex(name, kind), .edge(src, tgt, kind, opts), .build() returns BuiltSchema.
BuiltSchemaA built schema. .vertices, .edges, .protocol.
InstanceA parsed data record. .toJson(), .validate().
MigrationBuilderBuilder. .map(srcVertex, tgtVertex), .mapEdge(srcEdge, tgtEdge), .resolve(...), .compile() returns CompiledMigration.
CompiledMigrationA migration that is a lens. .lift(record) returns LiftResult { data, _rawBytes? }; .get(record) returns GetResult { view, complement }; .put(view, complement) returns LiftResult.
LensHandleA free-standing protolens chain. .get(bytes), .put(view, complement), .checkLaws(instance) returns LawCheckResult { holds, violation }; .checkGetPut, .checkPutGet for individual laws; .toJson().
FullDiffReport / CompatReportReturned by Panproto.diffFull(old, new). Call .classify(protocol) on the diff to get a CompatReport with classification of fully-compatible / backward-compatible / breaking.
executeQuery(query, instance, wasm)Standalone query function. The query is InstanceQuery { anchor, predicate?, projection?, path?, groupBy?, limit? }; the predicate is an Expr object, not a source string.
parseExpr, evalExpr, formatExpr, ExprBuilderExpression-language entry points.
IoRegistryMulti-protocol parse/emit registry.
Repositorypanproto-vcs repository handle.
DataSetHandleData-versioning handle.

Full API documentation, including every method signature and parameter, lives in the TypeDoc output:

The package source is at bindings/typescript/.

Boundary characteristics

The SDK is a thin layer over panproto-wasm. Data crossing the boundary is encoded with MessagePack, and live values on the Rust side are referenced from JavaScript through opaque integer handles allocated from a slab. Object identity is preserved across calls, so storing handles in JavaScript data structures is safe.

See also

Python SDK reference

The Python SDK is published as panproto on PyPI. It uses native PyO3 bindings, not WASM.

Installation

pip install panproto

Python 3.13 or newer is required. The wheel ships with eleven core tree-sitter grammars (Python, JavaScript, TypeScript, Java, C#, C++, PHP, Bash, C, Go, Rust). Additional grammar packs are available as separately-installable companions:

PackLanguages
panproto-grammars-functionalHaskell, OCaml, Elm, Gleam, Erlang, Elixir, PureScript, F#, Clojure, Scheme, Racket
panproto-grammars-webHTML, CSS, SCSS, Vue, Svelte, Astro, …
panproto-grammars-systemsZig, Nim, V, Crystal, …
panproto-grammars-jvmScala, Kotlin, Groovy, …
panproto-grammars-scriptingLua, Perl, R, Julia, …
panproto-grammars-dataSQL dialects, GraphQL, JSON variants, …
panproto-grammars-devopsTerraform, Dockerfile, Helm, …
panproto-grammars-mobileSwift, Objective-C, Dart, …
panproto-grammars-musicSuperCollider, LilyPond, ABC, Csound, ChucK, Glicol, Tidal, Strudel
panproto-grammars-allUmbrella package containing every pack above.

Each pack auto-registers its grammars with panproto.AstParserRegistry() on import.

Top-level surface

import panproto

proto = panproto.get_builtin_protocol("atproto")
b = proto.schema()
b.vertex("post", "record", "app.bsky.feed.post")
b.vertex("post:body", "object")
b.edge("post", "post:body", "record-schema")
schema = b.build()

There is no Panproto umbrella class; the entry points are free functions on the panproto module. The full re-export list (errors, schema types, protocol registry, migration, check, instance, I/O, lens, GAT, expression-language, VCS, parse, project, git bridge) is in bindings/python/src/panproto/__init__.py. Selected entry points:

SurfaceEntry point
Protocol registryget_builtin_protocol(name), list_builtin_protocols(), define_protocol(...)
Schema constructionProtocol.schema() returns a SchemaBuilder. Each .vertex(id, kind) / .edge(src, tgt, kind, name=None) / .constraint(vid, sort, value) mutates the builder in place and returns None; call .build() on the final builder. Chain syntax is a TypeScript-only convenience; Python is statement-by-statement.
MigrationMigrationBuilder, compile_migration, compose_migrations, invert_migration, pipeline
Checkdiff_and_classify, diff_schemas, check_existence, check_coverage
LensLens, ProtolensChain (with from_dsl_json / from_dsl_yaml / from_dsl_nickel / from_dsl_path loaders), auto_generate_lens, auto_generate_lens_candidates
GATTheory (with from_json / from_yaml / from_nickel / from_path DSL loaders and to_yaml / from_dict_yaml for the flat-shape round-trip), TheoryBuilder, Model, colimit_theories, free_model, migrate_model, check_model, check_morphism
SchemaSchema.constraints_for(vertex_id) lists every constraint; Schema.field_text(vertex_id, field_name) reads the text of a tree-sitter field('<name>', anonymous-token) child
Expression languageExpr, parse_expr, pretty_print_expr
VCSRepository, VcsRepository, BisectState
Parseparse_source_file, available_grammars, ParseEmitLens, AstParserRegistry() (with .override_grammar(name, extensions, language_ptr, node_types, ...) for dev-time grammar swapping)
ProjectProjectBuilder, parse_project, build_project

Full API reference, including every method signature, lives at the dedicated mkdocs site:

The package source is at bindings/python/.

Type stubs

The package ships _native.pyi so that signatures are visible to type checkers (mypy, pyright). Stub signatures are kept in lockstep with the PyO3 runtime by CI.

See also

Protocol catalogue

A protocol in panproto is a schema language: Avro, CDDL, OpenAPI, ATProto Lexicons, Parquet, FHIR, a Kubernetes CRD. Each one is defined by a pair of GATs (a schema theory and an instance theory) composed by colimit from reusable building-block theories. Every protocol provides both a parser (native format → Schema) and an emitter (Schema → native format), so panproto can round-trip data through any pair.

For the model behind these registrations, see Schemas as theories and Composing protocols by colimit.

Categories

The built-in protocols are organised by category in panproto-protocols. Each category is a Rust submodule.

CategoryModuleProtocols
Serialization and IDLsserializationAvro, FlatBuffers, ASN.1, Bond, MessagePack Schema
Data schemadata_schemaCDDL, BSON
API specificationsapiOpenAPI, AsyncAPI, RAML, JSON:API
DatabasedatabaseMongoDB, Cassandra, DynamoDB, Neo4j, Redis
Web and documentweb_documentATProto Lexicons, DOCX, ODF
Data sciencedata_scienceParquet, Arrow, DataFrame schemas
DomaindomainGeoJSON, FHIR, RSS/Atom, vCard/iCal, EDI X12, SWIFT MT
ConfigurationconfigKubernetes CRDs, CloudFormation, Ansible
Linguistic annotationannotationAMR, bead, BRAT, Concrete, CoNLL-U, Decomp/UDS, ELAN, FoLiA, FOVEA, ISO-Space, LAF/GrAF, NAF, NIF, PAULA/Salt, TEI XML, TimeML, UCCA, UIMA/CAS, W3C Web Annotation
Raw fileraw_fileNon-code files (README, LICENSE, images)

The authoritative list is in the panproto-protocols source tree. Each submodule’s register_* function documents the building-block theories it composes.

Registration shape

A protocol registration is a sequence of theory colimits applied in a determined order. For example, the constrained-multigraph-with-W-types theory used by MessagePack Schema (and several other protocols) is built as colimit(colimit(ThGraph, ThConstraint; Vertex), ThMulti; Vertex, Edge), with ThWType as the instance theory. If any colimit step fails, registration panics with a message naming the failing intermediate step. This is intentional: a registration failure is a build-time bug in the theory composition, not user input that can fail at runtime.

Source-of-truth

FormatSource
Built-in protocol listcrates/panproto-protocols/src/lib.rs
Building-block theoriescrates/panproto-protocols/src/theories.rs
Tree-sitter grammar list (261 languages)crates/panproto-grammars/

Defining a new protocol

To add a custom protocol, see Build a custom protocol. The minimal recipe is: declare schema and instance GATs (Rust or via the theory DSL), register a parser and emitter, and add a registration call to the relevant submodule.

Source-code grammars and emit verification

In addition to the schema-language protocols catalogued above, panproto ships 261 tree-sitter grammars under crates/panproto-grammars/. Each grammar registers a tree-sitter Language plus its node-types.json AST signature; the resulting parser walks source code into a full-AST schema, and emit_pretty renders the schema back to bytes via the structural pipeline described in Source-code emission.

The emitter’s correctness varies by grammar; ParserRegistry::emit_verification_status reports which tier each protocol falls into:

TierMeaningCurrently
VerifiedEvery entry of the grammar author’s own test/corpus/ round-trips under the strict emit_corpus_audit oracle (byte fixed point plus vertex-kind and edge-shape multiset preservation), or the protocol is pinned by a quivers backend testthe 255 names in VERIFIED_EMIT_PROTOCOLS
GenericRegistered grammar; emit uses the generic dispatch + universal cassette path; no test asserts correctnessthe remaining vendored grammars not yet in the verified set
UnsupportedNo grammar.json vendored, or protocol not registeredgrammars whose upstream did not ship grammar.json

Downstream tooling (notably quivers’s transpile backends) should call this API upfront and refuse emit on protocols that return Generic or Unsupported. The full list of verified protocols is maintained as VERIFIED_EMIT_PROTOCOLS in crates/panproto-parse/src/registry.rs.

See also

Expression-language reference

The panproto-expr language is pure, total, and bounded: every well-typed expression terminates within a fixed step budget, no IO is permitted, and evaluation is deterministic. It is used inside migrations to describe field-level transforms, and inside queries to describe predicates and projections.

For the model behind the language, see Expression language: denotational semantics.

Surface syntax

The Haskell-style surface syntax supports literals, variables, lambdas, application, let, pattern matching, and list comprehensions. The full grammar lives in panproto-expr-parser. Inside JSON or YAML migration files, expressions appear as strings; the parser is invoked on read.

Types

TypeDescription
Int64-bit signed integer with checked arithmetic. Overflow is an error, not a wrap.
FloatIEEE 754 64-bit float.
StrUTF-8 string.
BoolBoolean.
List aHeterogeneous list (the type parameter is the element constraint when known).
RecordMap from string keys to values.
NullSingleton type for the absence of a value.
AnyTop type used in builtins that accept any input.

Builtins

The 59 built-in operations are in panproto_expr::BuiltinOp, grouped by family in crates/panproto-expr/src/builtin.rs.

Arithmetic

Add, Sub, Mul, Div, Mod, Neg, Abs, Floor, Ceil, Round. Div and Mod raise DivisionByZero on a zero divisor. Integer arithmetic uses checked operations; overflow raises Overflow.

Comparison

Eq, Neq, Lt, Lte, Gt, Gte. Returns Bool.

Boolean

And, Or, Not. Short-circuit evaluation.

String

Concat, Len, Slice, Upper, Lower, Trim, Split, Join, Replace, Contains.

List

Map, Filter, Fold, FlatMap, Append, Head, Tail, Reverse, Length. Head and Tail raise on the empty list.

Record

MergeRecords, Keys, Values, HasField.

Utility

DefaultVal, Clamp, TruncateStr.

Type coercion

IntToFloat, FloatToInt, IntToStr, FloatToStr, StrToInt, StrToFloat. The *ToInt and *ToFloat parses raise ParseError on invalid input.

Type inspection

TypeOf, IsNull, IsList.

Graph traversal (instance-aware)

Edge, Children, HasEdge, EdgeCount, Anchor. These return Null in the standard evaluator. To use them, evaluate in an instance environment via panproto_inst::instance_env.

Errors

The full list of ExprError variants is in crates/panproto-expr/src/error.rs.

VariantCause
StepLimitExceeded(u64)Evaluation exceeded the configured step budget (EvalConfig::max_steps, default 100,000).
DepthExceeded(u32)Evaluation exceeded the configured recursion depth.
UnboundVariable(String)Referenced a variable not bound in the environment.
TypeError { expected, got }Operand had an incompatible type.
ArityMismatch { op, expected, got }Builtin called with the wrong number of arguments.
IndexOutOfBounds { index, len }List index out of range.
FieldNotFound(String)Record field missing.
NonExhaustiveMatchNo pattern arm matched the scrutinee.
DivisionByZeroDiv or Mod with a zero divisor.
ListLengthExceeded(usize)List operation exceeded the configured maximum length.
ParseError { value, target_type }A *ToInt / *ToFloat coercion failed to parse its input.
NotAFunctionApplication of a non-function value.
OverflowChecked integer arithmetic overflowed.
FloatNotRepresentable(String)Float value (NaN, infinity, out-of-range) cannot be represented as an integer.

Authoritative source

Builtin enum: crates/panproto-expr/src/expr.rs. Implementation: crates/panproto-expr/src/builtin.rs. Evaluator: crates/panproto-expr/src/eval.rs.

See also

Lens combinator reference

A lens in panproto is a triple of functions over a source S, a view V, and a complement C:

get        : S -> V
put        : S × V × C -> S
complement : S -> C

Every constructor in panproto-lens produces a lens whose round-trip laws (GetPut, PutGet, PutPut) are verified by the property tests in panproto-lens/src/laws.rs. The complement carries the data discarded by get so that put can restore it.

For the model behind these combinators, see Lenses and round-trip laws and Lens DSL: denotational semantics.

Optic kinds

Each protolens or protolens chain is classified into an OpticKind. The classification is structural: it follows from the shape of the underlying TheoryTransform, which in turn reflects the schema edge a combinator is applied at.

OpticKindTypical schema edgeComplement
Isobijective relabeling (e.g. edge rename)Unit
Lensprop (single-value projection)dropped data
Prismsum / variant injectionvariant tag
AffineLens composed with Prism(variant tag, dropped data)
Traversalitem (collection, multi-focus)per-position complements

Composition follows the standard optics lattice: Iso is the identity, Traversal is absorbing, Lens + Prism collapses to Affine. The constructor functions for these kinds live in protolens::elementary (atomic steps: add_edge, drop_edge, rename_edge_name, …) and protolens::combinators (composite builders: pipeline, map_items, …). Browse the panproto_lens module index for full signatures.

Combinator families

FamilyModulePurpose
Asymmetric lensesasymmetricClassical S → V transforms with put.
Symmetric lensessymmetricA ↔ B transforms with shared complement.
CompositioncomposeSequential and parallel composition of lenses.
OpticsopticOpticKind enum (Iso, Lens, Prism, Affine, Traversal) and the composition lattice.
FibrationfibrationThe Grothendieck fibration of lenses over schemas.
ProtolensprotolensSchema-parameterized lens families with vertical and sequential composition.
Enrichment registryenrichment_registryCross-crate LayoutEnricher trait and registry for schema-level enrichment fibres (e.g. the parse / decorate / emit lens; see Layout enrichment).
LawslawsProperty-test harness for the three lens laws.

Complement composition

Complement::compose is a partial commutative monoid:

  • It returns Result<Complement, LensError>.
  • It refuses to merge complements whose source-schema fingerprints disagree (ComplementFingerprintMismatch).
  • It refuses to merge complements that disagree on a key (ComplementConflict).

A pre-flight check is available as Complement::is_compatible. The full discussion is in Lenses and round-trip laws.

Protolens composition

Protolenses are natural transformations between schema endofunctors. protolens_composable requires structural equality of the intermediate endofunctor (same precondition, same transform) before vertical_compose will run; otherwise CompositionMismatch. Two instantiation modes are available:

ModeFunctionUse
FusedinstantiateSingle morphism preserving migration metadata. Default for production.
Sequentialinstantiate_sequentialStep-by-step folding through intermediate schemas. Used by property tests to inspect each step.

Both satisfy the lens laws.

See also

Configuration reference

The project manifest is panproto.toml at the project root. It is read by panproto-project::load_config and consumed by every CLI subcommand that operates on a project rather than a single file.

Top-level shape

[workspace]
name = "my-project"
exclude = ["build/**", "**/.cache"]

[[package]]
name = "user-api"
path = "schemas/user"
protocol = "json-schema"

[[package]]
name = "user-events"
path = "schemas/events"
protocol = "avro"

Sections

[workspace]

FieldTypeRequiredDescription
namestringyesWorkspace identifier; used for VCS commits and emitted artifacts.
excludelist of glob patternsnoGlobs to exclude from project assembly. Evaluated relative to the manifest directory.

[[package]]

One entry per schema package. Packages are assembled into a single project by schema coproduct.

FieldTypeRequiredDescription
namestringyesPackage identifier, unique within the workspace.
pathpathyesDirectory containing the package’s schemas, relative to the manifest.
protocolstringnoProtocol name (json-schema, atproto, protobuf, …). When omitted, the protocol is inferred from file extensions and contents.

Generating a manifest

schema init [PATH] initialises a panproto repository in the given directory (default .). The repository’s bookkeeping lives in .panproto/; the user-authored manifest is panproto.toml and is created by hand.

For programmatic creation, the panproto-project crate exposes generate_config and serialize_config.

Authoritative source

Manifest schema: crates/panproto-project/src/config.rs.

See also

Crate map

The panproto-* crates in the workspace, with one-line descriptions and dependency direction. The full dependency graph appears in explanation/architecture. Source for any crate lives at crates/<name>/ in the repository.

Core

CrateDescription
panproto-gatGeneralized algebraic theories: sorts, operations, equations, and the colimit machinery that composes them.
panproto-gat-macrosclass! and inductive! macros for declaring GATs in Rust source.
panproto-schemaSchema representation: vertices, edges, the schema graph, validation. Hosts the AbstractSchema / DecoratedSchema typed-newtype distinction and the layout-fibre forgetful U (Schema::forget_layout).
panproto-instInstances: data records over a schema, including Value::List and the typed value lattice.
panproto-migMigration engine: morphisms between schema theories, restrict and lift.
panproto-lensBidirectional lenses: get/put/complement, the three round-trip laws, fibration over schemas, optic kinds, protolenses, the cross-crate enrichment_registry for layout and other schema-level fibres.
panproto-lens-dslDeclarative lens DSL with Nickel, JSON, and YAML surfaces.
panproto-theory-dslDeclarative theory DSL for defining custom protocols.
panproto-checkBreaking-change detection: classifies schema diffs as fully compatible, backward compatible, or breaking.
panproto-protocolsBuilt-in protocol definitions composed via theory colimits.
panproto-exprPure, total expression language: terms, types, environment evaluation.
panproto-expr-parserHaskell-style surface syntax parser for panproto-expr.

I/O and parsing

CrateDescription
panproto-ioParse/emit codecs that bridge native formats (JSON, Avro, Protobuf, …) to panproto-inst.
panproto-parseTree-sitter full-AST parsing across 261 languages, with interstitial preservation. Hosts the put-direction (decorate) of the parse / emit lens and the LayoutEnricher adapter installed in panproto-lens’s enrichment registry.
panproto-grammarsPre-compiled tree-sitter grammars used by panproto-parse.
panproto-grammars-allUmbrella grammar companion pack for the Python wheel.
panproto-grammars-functionalFunctional-language grammar pack.
panproto-grammars-webWeb-language grammar pack.
panproto-grammars-systemsSystems-language grammar pack.
panproto-grammars-jvmJVM-language grammar pack.
panproto-grammars-scriptingScripting-language grammar pack.
panproto-grammars-dataData-language grammar pack.
panproto-grammars-devopsDevOps-language grammar pack.
panproto-grammars-mobileMobile-language grammar pack.
panproto-grammars-musicMusic-language grammar pack (SuperCollider, LilyPond, ABC, Csound, ChucK, Glicol, Tidal, Strudel).
panproto-projectMulti-file project assembly via schema coproduct, manifest loading.

Version control

CrateDescription
panproto-vcsSchematic version control: DAG, refs, commit/branch/merge, pushout-based merge with universal-property verification, data versioning.
panproto-gitBidirectional bridge between panproto-vcs and git.
panproto-git-remoteCustom git remote helper for cloning panproto repositories over git.
panproto-xrpcXRPC client for cospan-node VCS operations.

Bindings and surfaces

CrateDescription
panproto-corePublic re-export facade. The single dependency a downstream Rust user needs.
panproto-cliThe schema binary.
panproto-wasmWebAssembly bindings; consumed by the TypeScript SDK.
panproto-pyNative Python bindings via PyO3.
panproto-cC ABI for non-Rust language bindings (Haskell first).
panproto-replREPL engine for theories, terms, and morphisms.

Acceleration

CrateDescription
panproto-jitLLVM JIT compilation for accelerated migrations.
panproto-llvmLLVM IR protocol definition and lowering.

Repository tasks

CrateDescription
xtaskRepository tasks (not published). Includes gen-cli-docs.

See also

  • Architecture for the dependency direction and layering.
  • The repository Cargo.toml for authoritative workspace membership.

Explanation

Explanation pages cover why the system is the way it is. They are essays, not recipes. Each one is built in two tiers:

  • An In plain terms opener (3-5 sentences) that names the concept and gives the working-developer analogue. You can read the entire quadrant at this level and come away with a working mental model.
  • A formal section that names the underlying construction and points into the reference and semantics pages where it is precisely defined.

The pages here, in increasing order of formality:

PageTier
What panproto solvesPlain
Schemas as theoriesPlain, with one formal section
Migrations as morphismsPlain, with one formal section
Lenses and round-trip lawsPlain, with the three laws stated formally
Layout enrichmentPlain, with the section law stated formally
Composing protocols by colimitPlain, with one formal section
Schema version control semanticsPlain, with one formal section
What panproto verifiesCatalogue of mechanically-checked properties
ArchitectureCrate dependency graph and the layering that holds the system together

The denotational semantics cluster is separate and load-bearing. It is the place where the implementation is pinned to a precise mathematical specification: the expression language, the lens DSL, the theory DSL, the pushout-based merge, and protolens composition. Each of those pages still opens with a plain-terms section, but the body is dense.

What panproto solves

In plain terms

Whenever you change the shape of your data: add a column, rename a field, split one record into two, retire an enum value, switch from JSON Schema to Protobuf, you have to write migration code to bring existing data forward. That code is tedious, error-prone, and easy to get subtly wrong in ways that only surface in production.

panproto reads two versions of a schema, computes the difference, and produces the migration code for you. It also flags whether the change is backward-compatible or breaking, version-controls the schema history with git-style commands, and lets the same diff/migrate/version-control workflow run across schema languages: JSON Schema, Protobuf, GraphQL, ATProto Lexicons, SQL DDL, Avro, and 45 others.

The thing that makes this work uniformly across all those languages is that panproto treats every schema as an instance of a common structure. The shared structure is what the rest of the explanation quadrant unpacks.

The three concrete jobs

panproto exists to do three jobs that today are usually done by hand:

  1. Diff and classify schema changes. Given two versions of a schema, identify what was added, removed, renamed, or had its type changed; classify each change as fully compatible, backward compatible, or breaking; and surface the result in a way CI can gate on.
  2. Generate the migration code. Produce a bidirectional transform between the two versions: a function that lifts old records to the new shape, paired with a function that pushes new records back to the old shape without losing data. The pair is checked to satisfy round-trip laws, so the migration cannot silently corrupt data.
  3. Version-control schemas as first-class objects. Treat the schema history like a git history: commits, branches, merges, diffs, blame, tags. Merge conflicts at the schema level are resolved by a precise, well-defined operation rather than a manual three-way text merge.

What it does not solve

panproto does not write your application code, validate your runtime behaviour beyond what the schema describes, or replace your database. It is a layer between schema files and the data they describe. The output is migration code and a diff classification, not infrastructure.

See also

Schemas as theories

In plain terms

A schema is a description of what shapes your data can take: which fields exist, which types they hold, which combinations are allowed. JSON Schema, Protobuf, GraphQL, SQL DDL, ATProto Lexicons all do this, each in their own syntax.

panproto treats a schema as an algebraic theory: a set of basic kinds of thing (records, fields, types, references), a set of operations on them (add a field, set a constraint, link two records), and a set of equations describing when two ways of building the same shape are equivalent. The theory is a precise mathematical object; an actual JSON Schema file or Protobuf .proto file is a presentation of one.

The reason this matters is that once you have schemas as theories, you can talk about morphisms between them with the same precision. A morphism is a structured map from one theory to another: a translation that respects every operation. Migrations between schema versions, translations between schema languages, and the merging of two divergent schema branches are all morphisms. The whole pipeline downstream of the schema treats them uniformly.

The model

A schema is an instance of a generalized algebraic theory (GAT). A GAT consists of:

  • Sorts. The basic kinds of thing the theory talks about: vertices, edges, types, constraints.
  • Operations. Constructors that produce sorts from other sorts: vertex of a given kind, edge between two vertices, prop edge with a name, item edge for collection membership, variant edge for sum-typed alternatives.
  • Equations. Equalities between operation chains that the theory imposes: composition is associative; the identity edge is a no-op; certain constraints are equivalent rewrites of others.

Each protocol panproto recognises is a GAT plus a parser/emitter pair. JSON Schema is one such GAT; Protobuf is another. The shared building-block theories (graph, multigraph, constraint, W-type, …) are composed by colimit to produce each protocol’s specific theory. See Composing protocols by colimit.

A schema in a protocol is a model of that protocol’s theory: a concrete choice of vertices, edges, and constraint values that satisfies the theory’s equations. An instance is a model of the schema’s instance theory: actual data records that conform to the schema.

This separation, schema-theory versus instance-theory, runs through the entire system. Schemas have their own GAT; instances have their own GAT, parameterised over a schema; migrations are morphisms between schema theories; lift is the operation that takes records over the source schema and produces records over the target.

What is not modelled

  • Performance characteristics (a schema with 10,000 vertices is a valid model just like one with 5).
  • Wire-format minutiae beyond what the schema theory describes (whitespace, comment syntax, ordering conventions).
  • Application-level semantic constraints not expressible in the protocol’s theory.

The first two are handled by the format-preserving codec layer. The third is by design: panproto is precise about exactly what it can guarantee.

The schemas-as-theories framing belongs to a lineage that runs from Cartmell’s generalised algebraic theories through Spivak’s functorial data model and the algebraic-databases programme of Schultz and Wisnesky to Patterson’s ACSets and Lu’s recent multi-model unification. panproto presents each protocol as its own GAT and takes a colimit in the category of GAT presentations; ACSets and the Spivak-Wisnesky tradition fix a single meta-theory and parameterise schemas inside it. See Related work for the full discussion.

See also

Migrations as morphisms

In plain terms

A migration is what gets you from version 3 of your schema to version 4, with all your existing version-3 data carried forward into version-4 shape. Hand-written migrations are usually a script that walks each record, renames fields, fills in defaults, drops what is no longer there.

panproto represents a migration as a structured map between the two schemas: for every vertex (record kind) in the new schema, where does it come from in the old one; for every edge (field, item, variant), how is it derived. The structured map is called a morphism. Once you have a morphism, two operations follow:

  • Restrict moves the morphism backwards: it tells you what part of the old schema is needed to produce a given part of the new one.
  • Lift moves data forwards: it takes a record that conforms to the old schema and produces a record that conforms to the new one, using the morphism to know what to put where.

Lift is the operation you usually want; restrict is the operation panproto uses internally to figure out which old fields are required for the migration to succeed. Both are total functions on the things they apply to; if a migration cannot be lifted (because some required input is missing), panproto-check says so up front rather than failing partway through.

The formal picture

Schemas live in a category whose objects are schema theories and whose morphisms are theory morphisms. A migration from schema to schema is a morphism in this category. The migration engine is split into two functors:

  • Restrict : pulls a -instance back to an -instance along . Used to check existence conditions: which -records does require to be present?
  • Lift : pushes an -instance forward to a -instance along . Used to actually migrate data.

The pair forms an adjunction in the categories of instances: lift is left adjoint to restrict. (This follows the convention in Spivak & Wisnesky (2015), where is the dependent sum over the fibres of and is its right adjoint by base change.) The adjunction is the structure that lets panproto check, before any data moves, whether a migration is well-defined.

A migration may also include a value-level transform: not just where a field comes from, but how its value is computed. These are written in the expression language and applied during lift.

Three classes of migration

ClassDiff classificationExampleLift behaviour
Fully compatibleRefinementAdd an optional field with a default.Total; old records lift unchanged.
Backward compatibleInclusionAdd a required field whose value is computed from existing fields.Total; old records lift via the value-level transform.
BreakingNeitherRemove a required field with no recovery.Partial; some old records cannot be lifted. panproto-check flags this.

panproto-check runs the existence check on a migration without applying it, so CI can gate on the classification before merge. See Breaking-change gate.

What this gives you

  • You write the migration map (what goes where), not the migration script (how to execute it). The script is generated from the map and the schema theories.
  • You get a precise classification of whether the change is breaking, before any data is touched.
  • You get a bidirectional artifact: the lift forward is paired with a put backward, so the migration is also a lens. Round-trip laws apply.
  • Migrations compose. If and , then is a migration whose lift agrees with applying ’s lift then ’s. The migration history of a schema is a chain of these compositions.

See also

Lenses and round-trip laws

In plain terms

A lens is a pair of functions for moving between two shapes of data: one for going forward (often called get), one for going backward (often called put). The going-backward function is what makes a lens a lens rather than a one-way transform. It lets you take an edited new-shape record, push the edit back into the old-shape record, and recover whatever data the new shape did not have room for.

The data the new shape does not have room for has to live somewhere during the round trip. In a panproto lens, it lives in a third value called the complement. You can think of it as a sidecar that holds whatever get discarded, so that put can put it back.

A lens is lawful when it satisfies three round-trip identities. Roughly: getting then putting unchanged data is a no-op; putting then getting recovers exactly what you put in; and putting twice in a row is the same as putting just the second value. panproto verifies these three laws by property-based testing on every lens combinator. A lens that fails any of them is rejected.

The reason lenses matter for panproto: every migration is a lens. The lift function is the get (forward) and the put function is the backward direction. Together they let you migrate data forward, edit the new data, and push it back to the old shape if you ever need to.

The triple

A lens between source S and view V with complement C is three functions:

get        : S -> V
put        : S × V × C -> S
complement : S -> C

The complement function records what get is about to throw away; the put function uses the complement to reconstruct the parts of S that V does not determine.

The three laws

For every lawful lens:

  1. GetPut. . Applying get to extract a view, then putting that view back without changes, returns the original source.
  2. PutGet. . Putting a new view in returns that view when you read it back.
  3. PutPut. . Two consecutive puts to the same complement collapse to the second put.

PutPut is the third law. It is checked by panproto_lens::laws::check_put_put against every lens combinator, with random perturbations of the second view generated to ensure the law holds across the full input space, not just at fixed points.

Complement composition

When two lenses are composed, their complements need to combine. Complement::compose is a partial commutative monoid:

  • It returns Result<Complement, LensError>.
  • Two complements compose only if they share the same source-schema fingerprint. Otherwise composition fails with ComplementFingerprintMismatch.
  • For overlapping keys, the two complements must agree on the value. Disagreement fails with ComplementConflict.

This is the part of the lens machinery that prevents silent data loss. Earlier versions of panproto merged complements with a “first writer wins” rule; that rule could swallow disagreements between two paths through a lens diagram. The partial-monoid rule makes any such disagreement a hard error. Pre-flight check: Complement::is_compatible.

Where lenses come from

You almost never write a lens from scratch. They are produced by:

  • Migration compilation. Every migration morphism compiles to a lens whose get is lift and whose put is the backward transform.
  • The lens DSL (panproto-lens-dsl). A declarative spec in Nickel, JSON, or YAML compiles to the lens combinator algebra.
  • Protolenses (panproto-lens::protolens). Schema-parameterized lens families whose instantiations cover whole fleets of related schemas at once.

Schema-level lenses: layout as an enrichment

There is a second flavour of lens that lives one level up. parse / emit is not a WInstance lens; it is a relation between bytes and schemas, with parse recording the source layout into a fibre of constraints over each vertex and emit_pretty consuming that fibre to render bytes back. The relation has the shape of a lens at the schema level: stripping the layout fibre is the get, attaching it via a grammar walk is the put.

panproto names this construction explicitly. The EnrichmentKind::Layout tag in panproto-gat classifies the constraint sorts that make up the fibre. TheoryTransform::StripEnrichment and TheoryTransform::AddEnrichment are the two directions at the protolens level; their schema-level interpretation runs in apply_theory_transform_to_schema (strip drops the fibre constraints; add dispatches to a registered synthesis driver). The ComplementConstructor::Enrichment variant names the fibre and the driver in the complement vocabulary.

The schema-level lens does not plug into the WInstance-level get / put pair the way an elementary protolens does. The byte-level operational entry points live in panproto-parse: ParserRegistry::decorate for the put direction, ParserRegistry::parse_with_protocol for the get. The protolens captures the schema-level relationship those byte-level operations sit over; it composes with elementary protolenses for chain-law reasoning but is not the implementation of decorate or parse. See Layout enrichment for the full treatment.

The asymmetric get/put/create triple is from Foster et al. (2007), with totality and well-behavedness laws stated there. The first-class complement comes from the symmetric and edit-lens line of Hofmann et al. (2011) and Hofmann et al. (2012), with the edit-lens module structure giving the partial-monoid merge that this chapter relies on. The dispatch from edge kind to optic kind (Lens, Prism, Affine, Traversal) is licensed by the profunctor-optics theorem of Pickering et al. (2017), restated categorically by Clarke et al. (2024). The migration-as-lens-graph idiom is from Litt et al. (2020). See Related work for the full discussion.

See also

Layout enrichment

In plain terms

When panproto parses source code, it does two things at once. It records the structure of the program (which nodes are children of which, what kinds they are) and it records the layout of the source (where each token started in the byte stream, what whitespace sat between adjacent tokens, which alternative the parser took at each branch point). The structure is what most downstream operations care about. The layout is what the emitter needs to render bytes back.

The two parts live together in one schema, but they are separable: you can strip the layout and have a perfectly good abstract description of the program, or you can keep it and round-trip back to source bytes. decorate is the operation that takes the abstract description and rebuilds the layout on top.

The reason this matters: any tool that wants to generate source code from a panproto schema (a code refactorer, a reverse bridge from a domain model to a target language, a migration that materialises a new file) needs to produce a schema with the layout fibre attached. Before decorate, generators had to manually populate the layout constraints: disguised string concatenation, brittle to grammar revisions. With decorate, the operation is mechanical: write the abstract content, hand it to a grammar, get a schema the emitter can render byte-for-byte.

This page is the categorical framing: the parse / emit pair as a Grothendieck-style fibration, the section law that ties decorate and forget_layout together, and the cross-crate registry that makes the parse-side machinery available to the lens layer. For the operational mechanics of the emitter itself (how it derives spacing, dispatches CHOICE alternatives, and resolves external tokens through the cassette system), see Source-code emission.

The forgetful U and its section

Take a schema S produced by parsing some source bytes. Every vertex carries some constraints. Split them into two groups: the layout constraints (start-byte, end-byte, interstitial-N, chose-alt-fingerprint, chose-alt-child-kinds) and everything else (vertex kind, edges, literal-value, anonymous-token field:*, plus any protocol-defined sorts). The first group is parser-only metadata; the second is the abstract content of the program.

Stripping the first group is a function:

forget_layout : DecoratedSchema → AbstractSchema

Going the other way is harder. Given just the abstract content, you have to choose whitespace, choose which CHOICE alternative to dispatch through, and synthesise the byte spans. decorate is one canonical choice:

decorate : AbstractSchema × LayoutPolicy → DecoratedSchema

The two satisfy the section law:

where ≅_kind means equal up to vertex-id renaming and the kind / edge multiset.1 You can think of decorate as a one-sided inverse of forget_layout, picking a canonical representative of the parse-preimage at every abstract schema.

The pair (forget_layout, decorate) is the schema-level version of the parse / emit pair. parsing is bytes → DecoratedSchema; emitting is DecoratedSchema → bytes. Composing both gives the round-trip:

The image of pretty lands in the parse-preimage of its input modulo the same kind / edge multiset equivalence. This is the round-trip law that ships with the existing ParseEmitLens machinery, lifted to the typed-newtype level.

Layout as a Grothendieck-style enrichment

Across the broader panproto type system, layout is one of several enrichments a schema can carry over its abstract base. The framing is the same one panproto already uses for coercions, with a Grothendieck fibration over the abstract schema and the layout data living in the fibre.

The base of the fibration is the abstract schema. The fibre over each vertex is the layout data that vertex carries. The total space is the decorated schema. forget_layout is the cartesian projection back to the base. decorate is a section of that projection, picking out one chosen layout-data assignment for each vertex.

The EnrichmentKind enum in panproto-gat is the classifying tag. Today there is one variant, Layout. The infrastructure is shaped so other enrichments (provenance witnesses, refinement evidence, anything else that extends the schema without changing its underlying GAT theory) can slot in alongside without re-engineering the lens framework.

Two pieces of machinery name this fibration directly:

  • TheoryTransform::StripEnrichment(kind) and TheoryTransform::AddEnrichment { kind, enricher, policy } describe the two directions at the protolens level. Their schema-level effect is to remove or attach the fibre constraints; their theory-level effect is identity (the underlying GAT is the same in both source and target).
  • ComplementConstructor::Enrichment { kind, enricher } names a registered synthesis driver in the complement vocabulary. This is metadata for the lens framework’s chain-law reasoning; the driver itself lives behind the LayoutEnricher trait in panproto-lens::enrichment_registry, populated by panproto-parse at ParserRegistry::new time so the lens crate stays grammar-agnostic.

The cross-crate registration mechanism

The schema-level synthesis for Layout needs a grammar walker, and grammar walkers live in panproto-parse. The lens crate cannot depend on the parse crate (the dependency arrow would invert), so the bridge is a thin global registry of LayoutEnricher implementations keyed by (EnrichmentKind, enricher_name).

When ParserRegistry::register accepts a parser, it installs that parser into the global registry under its protocol name. A subsequent Protolens::instantiate against a parse_emit_protolens("lilypond", policy) dispatches the AddEnrichment { kind: Layout, enricher: "lilypond", .. } arm of apply_theory_transform_to_schema to the registered driver. The driver runs emit_pretty + parse against the input and returns the result.

The registry is process-global, single-keyed, and tolerant of re-registration (the last writer for a (kind, name) pair wins). Lock poisoning is recovered transparently because the critical sections do not invoke user code.

What the protolens does and does not do

parse_emit_protolens(grammar, policy) returns a Protolens whose source endofunctor strips the layout fibre, whose target endofunctor adds it back via the registered driver, and whose complement constructor names the fibre. It composes with every other protolens in panproto-lens (vertical and horizontal composition both work) and contributes to chain-law reasoning.

What it does not do is plug into the WInstance-level get / put pair the way an elementary protolens does. The lens framework’s Complement struct holds WInstance-level discarded data (dropped nodes, dropped arcs, contraction choices). It has no field for a per-vertex constraint fibre, and Protolens::instantiate produces a Lens whose source and target schemas differ in vertex IDs because the synthesis driver invents fresh ones. The schema-level shuffling happens in apply_theory_transform_to_schema; the operational byte-level entry points are on ParserRegistry.

That asymmetry is intentional. Parse and emit are conversions between bytes and schema-typed data structures; they are not WInstance lenses. The protolens captures the schema-level relationship that those byte-level operations sit over, and the operational API is where the byte-level work happens.

Two threads bear directly on the parse-emit fibration. The lens-based ancestry runs from Foster et al. (2007) through Bohannon et al. (2008)’s resourceful lenses (dictionary skeletons, quasi-obliviousness) and Foster et al. (2008)’s quotient lenses (lenses modulo equivalences, canonizers via lquot and rquot), with Lutterkort (2008) as the closest framework-level analogue. The grammar-based ancestry is Zhu et al. (2015)’s BiYacc, whose reflective printer takes both the AST and the original concrete string. de & Visser (2012)’s token-stream-and-origin-tracking algorithm is the closest match to the byte-level reconstruction strategy panproto uses inside the layout enrichment driver. See Related work for the full discussion.

See also


  1. The kind / edge multiset granularity, rather than pointwise vertex-id equality, is the standard granularity throughout panproto’s round-trip law machinery. The parse walker invents fresh vertex IDs at every call; insisting on pointwise identity would make even parse ∘ emit_pretty ∘ parse ill-typed. The multiset is the natural invariant: vertex kinds and edge-shape signatures are preserved by every layer of the parse / emit pipeline, and the section law is stated at the same level.

Source-code emission

In plain terms

panproto reads source code in 261 languages by sitting on top of tree-sitter, which has a community-maintained grammar for each one. The reverse direction, writing source code back out from a schema, is harder. tree-sitter grammars are written for parsing; they record the production rules the parser uses but not the spacing, indentation, or alternative-disambiguation choices an emitter needs. panproto’s source-code emitter, emit_pretty, closes that gap by deriving emission behaviour structurally from the same grammar.json the parser already ships.

The output of every emit is a function of three things: the schema, the production grammar, and a small per-language cassette that supplies defaults for the cases grammar.json cannot describe. Everything else (token roles, bracket pairs, spacing rules, indent triggers, CHOICE dispatch) is derived at construction time from the grammar’s own structure. There is no per-language emitter code.

A separate format-preserving codec covers structured-data formats (JSON, YAML, TOML, XML, CSV) and guarantees byte-for-byte round-trip. The two systems are independent; this page is about the tree-sitter source-code path.

The two halves of the gap

grammar.json lists every rule (SEQ, CHOICE, REPEAT, OPTIONAL, FIELD, ALIAS, SYMBOL, STRING, PATTERN, IMMEDIATE_TOKEN, PREC, TOKEN, RESERVED) the parser walks. It does not record:

  1. Whitespace and layout. Whether a + b or a+b; whether if (x) { ... } indents its body; whether def f(): x = 1 is one line or two. Tree-sitter strips whitespace during parse; nothing in grammar.json says how to put it back.
  2. CHOICE disambiguation for synthesised schemas. When two CHOICE alternatives both admit the same children (binary_expression CHOICE[+, -, *, /, ...] over the same operand kinds), the parser picks via its parse-table state. A schema built from scratch (no parse history) has no record of which alt was taken.

emit_pretty handles the first half structurally and the second half via a layered fallback hierarchy.

Grammar-derived token roles

Every STRING literal in a rule body is classified by its structural role at Grammar::from_bytes time. There are eight roles, classified per (rule, position) pair, not per token text:

RoleIdentification rule
BracketOpenFirst STRING in a SEQ whose first / last STRINGs satisfy the matched-pair predicate
BracketCloseLast STRING of the same SEQ
SeparatorFirst STRING in a REPEAT body’s inner SEQ
KeywordSTRING matching [a-zA-Z_][a-zA-Z0-9_]*
OperatorNon-alphanumeric STRING between content members inside a CHOICE alternative
ConnectorNon-alphanumeric STRING between content members in a standalone SEQ (a structural connector such as . or ::, not an algebraic operator); emits no surrounding space
TerminalText from a leaf vertex’s literal-value constraint
ImmediateA token the grammar wraps in IMMEDIATE_TOKEN, glued to its neighbour with no intervening whitespace

The matched-pair predicate identifies (), [], {}, <>, begin/end, do/done, |...|, <<>>, ${}, ⟨⟩, and every other bracket-like construct from grammar structure alone. There is no fixed character set: a per-SEQ check that the first and last STRINGs are different (or same-text with at least one wrapped in IMMEDIATE_TOKEN, e.g. regex /.../), with non-STRING content between them, identifies the pair.

The role-pair lookup table (needs_space_by_role) takes two adjacent token roles and returns whether to emit a space. No token text is inspected by the layout pass; spacing decisions are entirely role-driven. The table is small (one fixed function) and the same for every grammar.

Indent triggers

A bracket-open token triggers indentation when the content between open and close contains REPEAT or REPEAT1 (recursively through CHOICE / SEQ wrappers). This is the structural witness of “block-level content the source author would have indented”. Function bodies, statement lists, struct fields, namespace contents all match; inline constructs like type parameter lists, function arguments, and string interpolation do not, even though they may use the same bracket pair characters in other rules.

Word-like bracket pairs (function/end, if/end, module/end, etc., the matched-pair shape with alphanumeric delimiters) always trigger indentation: they only appear in block constructs across the 261 vendored grammars.

CHOICE dispatch

At emit time, pick_choice_with_cursor selects which CHOICE alternative to walk for a given vertex. The pipeline composes several structural filters; each rejects alternatives that cannot be the right pick given the cursor’s unconsumed edges and the schema’s constraint witnesses.

The categorical core is the acceptance predicate, accepts_first_edge(prod, edge_field, target_kind), an inductive function over the production tree:

ProductionAcceptance rule
STRING / PATTERN / BLANKreject (consumes no edges)
SYMBOL X (concrete)edge_field == "child_of" AND target_kind ⊑ X
SYMBOL X (hidden / supertype)accepts(X.rule, edge)
ALIAS{c, named:true, value:V}edge_field == "child_of" AND target_kind == V
FIELD{name, content}edge_field == name AND content.yield admits target_kind
SEQ[m1, m2, ...]accepts(m1, edge) OR (m1 ε-able AND accepts(SEQ[m2..], edge))
CHOICE[a1, a2, ...]any of accepts(ai, edge)
OPTIONAL / REPEAT / REPEAT1 / wrappersaccepts(inner, edge)

accepts_first_edge fuses four previously-separate ad-hoc checks (FIELD-name matching, SYMBOL subtype dispatch, ALIAS rewrite, yield-set admission) into one rule.

Three discriminators layer on top when more than one alternative passes accepts_first_edge:

  1. Token-set restriction. An alt’s FIELD body of shape ALIAS{CHOICE[STRING_1, STRING_2, ...], value: V} constrains the field child’s literal to that set. If the cursor’s field-named edge target carries a literal outside the set, the alt is rejected. This is what disambiguates Go call_expression (alt 0 has function: ALIAS{CHOICE["new","make"], value:"identifier"}, valid only when the function name is literally new or make).
  2. Alias-source filter. An alt’s FIELD body of shape ALIAS{SYMBOL X, named: true, value: _} requires the cursor’s field-named edge target to carry a pre-alias-symbol constraint equal to X. The walker records pre-alias-symbol from tree_sitter::Node::grammar_name() whenever it differs from kind(); this is the only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 actually exposes.
  3. Positional interstitial scoring. When the above filters leave more than one candidate, alts are scored against the slice of recorded interstitials from the current cursor position forward. The cursor’s consumed count gives the slice offset, so a trailing CHOICE-with-BLANK at position N sees only the interstitial at gap N, not the comma separators from earlier REPEAT iterations.

On yield-set ties after every filter, tree-sitter precedence (PREC annotations on alternatives) breaks the tie. This honours the grammar author’s explicit disambiguation rule.

The subtype closure

accepts_first_edge consults a precomputed subtype closure K ⊑ Y (“a vertex of kind K can appear where the grammar says SYMBOL Y”). The closure is computed once at Grammar::from_bytes time:

  1. Walk every hidden rule (_-prefixed) and supertype rule’s body to identify which concrete kinds satisfy each dispatch point.
  2. Walk every named ALIAS (ALIAS{c, named:true, value:V}) and record that the value V is satisfied by every kind reachable from c.
  3. Close the relation under composition: a Tarjan SCC over the dispatchable-only subgraph (hidden / supertype names form the nodes; satisfying edges are the transitions), producing an exact O(V + E) transitive closure without any iteration cap. The closure is keyed by concrete kind name, so accepts_first_edge’s lookup is O(1).

Cassettes

The output of every external scanner token must come from somewhere. tree-sitter’s external scanners are C code that produces tokens whose text varies at runtime; grammar.json declares the token name but not the text. Four resolution tiers cascade:

  1. Anonymous ALIAS. ALIAS{SYMBOL ext, named:false, value: V} declares that the external token emits exactly V. Read directly from grammar structure.
  2. CHOICE equivalence. CHOICE[SYMBOL ext, STRING s] declares that the external token is equivalent to the literal s. Read directly from grammar structure.
  3. CstComplement. For tokens whose text was captured at parse time, the schema’s literal-value constraint on the vertex carries the actual text. Used for heredoc bodies, raw-string content, regex patterns, escape sequences.
  4. Cassette default. When none of the above applies, a per-grammar GrammarCassette supplies a default.

The cassette layer itself composes two parts:

  • Universal layer (common_external_default). A closed table of name-pattern → default-text mappings derived from a structural audit of every vendored grammar. Recognises layout markers (_concat, _no_space, _brace_start), immediate-position markers (_immediate_*), error sentinels (_error_*, error_sentinel), automatic-semicolon family (_automatic_semicolon, _optional_semi), generic string delimiters (string_start, string_end), heredoc / raw-string content (returns ""; actual text comes from literal-value), and the descendant-operator family from CSS-like grammars.
  • Per-grammar override (GrammarCassette::external_token_default). A small per-grammar implementation that overrides specific tokens. The lookup composes per-grammar first, universal fallback second, via resolve_external_token.

The per-grammar overrides exist only where the language needs an override on top of the universal layer. The default empty cassette (used for ~230 of the 261 grammars) delegates entirely to the universal layer.

IMMEDIATE_TOKEN as a layout marker

tree-sitter’s IMMEDIATE_TOKEN is the grammar’s explicit signal that the wrapped token must have no preceding whitespace. emit_pretty lifts this to a single NoSpace marker emitted at the unique structural site where IMMEDIATE_TOKEN is declared:

  • When the production walker enters Production::ImmediateToken, emit NoSpace.
  • When emit_vertex enters a vertex whose rule body is IMMEDIATE_TOKEN(...) at the head, emit NoSpace before any content the leaf shortcut or rule-body walk produces.

The layout pass reads the marker; downstream code does not need to inspect production shapes to recover the property. This is what makes regex literals like /abc/g round-trip tight on both delimiters: regex_pattern’s rule body starts with IMMEDIATE_TOKEN, and the trailing / is wrapped in IMMEDIATE_TOKEN in the regex SEQ.

The verification tier API

emit_pretty is structurally sound for any grammar with a vendored grammar.json, but structural soundness is not the same as an output that round-trips. The fixed-point law emit_pretty(parse(emit_pretty(s))) == emit_pretty(s) is the actual correctness witness, and it has to be exercised per grammar. ParserRegistry::emit_verification_status reports which tier a protocol falls into:

StatusMeaning
VerifiedThe protocol round-trips under the strict oracle described below; downstream tooling may trust its emit
GenericThe protocol is registered and the generic dispatch path applies, but no test asserts emit correctness; output is structurally derived and likely correct, yet unverified
UnsupportedThe protocol is not registered, or its grammar lacks the vendored grammar.json that emit_pretty requires

Downstream tooling (notably quivers and other transpile pipelines) calls this API upfront and refuses emit on any backend returning Generic or Unsupported. There are no runtime warnings; the silence-versus-noise tradeoff falls on the caller.

A protocol earns Verified on one of two bases. The strong basis is corpus verification: every entry in the grammar author’s own test/corpus/ round-trips under the full oracle, which the strict emit_corpus_audit test checks over the whole corpus rather than one hand-written sample. The oracle is three conjuncts on each corpus entry s. Writing e1 for emit_pretty(parse(s)) and e2 for emit_pretty(parse(e1)), it requires e1 == e2 (emit reaches a fixed point), kind_multiset(parse(s)) == kind_multiset(parse(e1)) (no vertex kind is gained or lost), and the edge multisets equal likewise (no edge is gained or lost). Notice what the oracle does not require: e1 == s. A verified protocol may legitimately reformat, as json re-indents arrays and go applies gofmt spacing; what it may not do is alter the abstract syntax tree or fail to reach a fixed point. The second basis is backend verification: a quivers transpile backend (python, stan, bugs, jags, julia, scheme, javascript) covered by dedicated emit regression tests over the construct surface quivers actually emits, with full corpus pass tracked as follow-on work.

The current Verified set covers 255 protocols, kept in sorted order in VERIFIED_EMIT_PROTOCOLS in crates/panproto-parse/src/registry.rs. Adding a protocol requires the corpus audit to pass for it (or a backend regression test to cover it), then appending the name to that array. An earlier expansion to 149 protocols on single hand-written samples was reverted after the corpus audit showed most failed their own grammar’s test corpus: one sample is not a sufficient witness.

Limitations

Three classes of input fall outside what the structural pipeline can recover:

  • By-construction CHOICEs with no constraint signal. When a schema is built from scratch (no parse history) and a CHOICE is over multiple alternatives that all admit the same children (CHOICE[STRING "model", STRING "data"] for a BUGS block keyword), nothing in the abstract content picks. The universal cassette + per-grammar cassette provide deterministic defaults, but the choice is opinionated and may not match the source author’s intent. Decorate an abstract schema is the canonical workflow.
  • Heredoc / raw-string synthesis. The universal cassette returns "" for heredoc / raw-string content because the actual text is parse-time-dependent. Synthesised vertices with such kinds but no captured literal-value will emit empty. Quivers and similar synthesis pipelines should populate literal-value explicitly.
  • Operator precedence in synthesised expressions. No precedence-driven parenthesisation pass exists. Parsed schemas preserve original parens via interstitial constraints, but synthesised binary_expression schemas may emit ambiguously and re-parse to a different tree. A precedence pass on the schema before emit would close this gap; deferred.

The deeper limitation, structurally: tree-sitter’s parse-table state (the actual record of which CHOICE alternative the parser took at each step) is not exposed through the C API. The only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 surfaces is grammar_name() (the pre-alias SYMBOL name), which is consumed via pre-alias-symbol. A future upstream tree-sitter change exposing production / reduce IDs would let the emitter trace the parse exactly, eliminating the heuristic tiers entirely.

See also

Composing protocols by colimit

In plain terms

Every schema language has a lot in common with every other. They all have records, fields, references between records, ways to constrain what values a field can take, and ways to express that something is one of a fixed set of alternatives. Building each protocol’s schema model from scratch would mean reimplementing those shared pieces dozens of times.

panproto solves this by defining each shared piece as a small theory, then combining them. The theory of graphs (vertices and edges) is one piece. The theory of constraints (predicates on edge values) is another. The theory of multigraphs (graphs that allow multiple parallel edges between the same pair of vertices) is a third. Each protocol is built by gluing these pieces together at the points where they share structure.

The gluing operation is called a colimit. Conceptually, it takes several theories plus a description of how their shared parts identify, and produces a single combined theory whose vertices and edges are the union of the inputs, with the shared parts collapsed. The result is a new theory that has all the structure of the inputs and respects all their equations.

This is why adding a new protocol to panproto is mostly a matter of declaring which building-block theories it uses and how they fit together: the colimit construction does the rest.

The construction

A protocol’s schema theory is built as the colimit of a diagram of building-block theories. The diagram is a small category whose objects are theories and whose morphisms are theory inclusions describing the shared structure.

For example, the typed-multigraph-with-W-types theory used by JSON Schema is constructed as:

ThGraph + ThConstraint + ThMulti + ThWType

where each + is a pushout (a binary colimit) along the shared sort Vertex. The result has:

  • The vertices, edges, and identity laws from ThGraph.
  • The constraint sort and predicate operations from ThConstraint.
  • The multi-edge labelling from ThMulti.
  • The instance-level W-type structure (tree-shaped data anchored at schema vertices) from ThWType.

If a colimit step fails, registration panics with a message naming the failing intermediate step (colimit ThGraph + ThConstraint over ThVertex failed: ...). This is intentional: a failed registration is a build-time bug in the theory composition, not user input.

Why colimits, specifically

The colimit construction has three properties that matter for panproto:

  1. Universality. The colimit is the minimal theory containing all the inputs and respecting their shared structure. No spurious extra equations are introduced.
  2. Existence checking. The construction is mechanical and can fail predictably. If two building blocks have incompatible equations on a shared sort, the colimit step fails at registration time.
  3. Functoriality of migration. Because protocols are colimits, a migration between two protocols can be defined componentwise on the building blocks; the colimit assembles the components into a single protocol-level migration.

The merge operation in schema version control is also a colimit (specifically a pushout); the same machinery powers both.

Reusable building blocks

The shared theory library lives in crates/panproto-protocols/src/theories.rs. The major pieces:

TheoryPurpose
ThGraphVertices and edges, with source and target operations.
ThConstraintVertex-attached constraints (a dependent Constraint(v: Vertex) sort).
ThMultiParallel edges distinguished by edge labels.
ThWTypeRecursive type constructors (W-types) at the instance level.
ThMetaMetadata edges on instance nodes.

The library also exposes higher-level pieces built by composing these (ThSimpleGraph, ThHypergraph, ThInterface, ThFunctor, ThFlat, ThGraphInstance) for protocols that want to start from a richer base.

A protocol’s registration function is a recipe for combining these. To define a new protocol, see Build a custom protocol.

Cross-protocol translation has two mature precedents. The data-exchange line (Fagin, Kolaitis, Miller, and Popa on universal solutions and the chase (Fagin et al. 2005); Fagin, Kolaitis, Popa, and Tan on second-order tgds as a closed-under-composition class (Fagin et al. 2005)) sets the logical machinery: a universal solution is a colimit; composition needs a Skolemised second-order extension; cores pick out the canonical materialisation. The categorical version is the CQL line of Schultz & Wisnesky (2017) and Schultz et al. (2017), where a schema mapping is a functor, universal solutions are colimits in the category of instances, and data integration is a pushout. The engineering-IR line (Apache Calcite (Begoli et al. 2018), Substrait, Apache Arrow, MLIR) supplies the hub-and-spoke adapter pattern. panproto’s colimit and pushout_by_name run the same construction over GAT-presented wire-format schemas. See Related work for the full discussion.

See also

Schema version control semantics

In plain terms

panproto-vcs is git, but for schemas. It tracks a history of schemas the way git tracks a history of source files: commits, branches, tags, merges, diffs, blame. The CLI verbs are the same (init, add, commit, branch, merge, log, diff).

Two things make it different from git applied to the schema files themselves:

  1. The diff and merge operate on the schema, not the text. schema diff does not show you a unified diff of the JSON; it shows you what changed structurally: which vertices were added, which edges renamed, which constraints tightened. Merge does not three-way-merge the bytes; it merges the schema graph at the structural level, so you cannot end up with a syntactically valid but semantically broken schema after a merge.
  2. Data and migrations are versioned alongside the schemas. Every commit records a schema snapshot and (optionally) the data instances that conformed to it; migrations between schemas are stored as their own content-addressed objects, paired with the complements needed to invert them. Branches diverge with their data; merges reconcile both.

The merge operation is the place where this gets interesting. Three-way text merge fails when both sides edit the same line. The schema-level analogue is two branches that both add a field with the same name but different types. panproto-vcs has a precise, well-defined operation for resolving this: the schemas are pushed out along their common ancestor. The result is the smallest schema containing both branches’ additions, with the conflict surfaced as an explicit refinement constraint that the user resolves.

The DAG

panproto-vcs is structured exactly like git: a content-addressed DAG of immutable objects.

ObjectWhat it holds
FileSchema / SchemaTree / FlatSchemaA schema at a point in time, in per-file, tree, or migration-endpoint form.
MigrationA morphism between two schemas, identified by their object IDs.
ComplementThe complement data needed to invert a data migration.
DataSetA set of instances conforming to a specific schema.
CstComplementThe format-preserving CST data for byte-identical reconstruction.
Protocol / Theory / TheoryMorphism / Expr / EditLogSupporting objects referenced by commits and migrations.
CommitA pointer to a schema, an optional pointer to data, a parent commit list, an author, a message.
TagAn annotated tag object pointing to another object.
BranchA mutable reference to a commit; lives under .panproto/refs/heads/.

Every object is content-addressed with a blake3 hash of its canonical serialisation. Refs (branches under refs/heads/, tags under refs/tags/) live under .panproto/refs/. Objects live under .panproto/objects/. The structural similarity to .git/ is intentional: the existing mental model transfers.

Merge as pushout

A three-way merge in git is: take base , ours , theirs , and produce a result that contains the changes from relative to and the changes from relative to . When the changes overlap on the same line, conflict.

The schema analogue: , , are schemas; and are both descendants of . The merge result is the pushout of and along :

        B ------> O
        |         |
        |         |
        v         v
        T ------> M

The pushout is the unique smallest schema containing both and and respecting their shared structure from . “Unique smallest” is made precise by a universal property: any other schema that also contains and admits a unique morphism from to .

panproto-vcs does not just compute the pushout: it verifies the universal property. vcs::merge::verify_pushout_universal checks that the merge result mediates uniquely from any alternative cocone, returning the mediator vertex map. If the universal-property check fails, the merge raises PushoutError::UniversalFactorizationFailure rather than producing a wrong result.

For the formal pushout construction, the cocone definition, and exactly what is checked, see Pushouts and merge.

Conflicts

A merge conflict arises when the pushout would introduce an inconsistency: two branches add a field with the same name but incompatible types, or one branch removes a vertex the other branch still references. Conflicts are reported as explicit objects (rather than text markers) and resolved by editing the conflict descriptor.

Data versioning

Commits can carry data instances. When a branch’s schema migrates, the data carried by its commits is automatically lifted forward by the migration’s lens. Branches can therefore diverge in both schema and data; merging both kinds of divergence in one operation is what schema merge does.

A consequence: history rewriting (rebase, amend) on a branch carrying data must lift the data through the rewritten history. panproto-vcs does this; the data is not a passive blob.

Two threads sit directly behind panproto-vcs. The categorical-VCS lineage (Mimram and Di Giusto on patches as morphisms with merge as pushout (Mimram and Giusto 2013), Angiuli and colleagues’ homotopical patch theory (Angiuli et al. 2014), Roundy’s Darcs (Roundy 2005)) supplies the “merge is the pushout of the divergent patches against the common ancestor” semantics and the diagnosis of conflicts as failures of the pushout to exist. The schema-evolution lineage (Curino, Moon, and Zaniolo’s PRISM workbench (Curino et al. 2008) and Litt, van Hardenberg, and Henry’s Cambria [Litt et al. (2020); Litt et al. (2021)]) supplies the engineering vocabulary: schema-modification operators with forward and backward mappings, quasi-inverses for the operators that lose information, and a directed graph of schema versions connected by lenses. panproto-vcs is the four-artifact unification of these lines, with the protocol theory, schema, data, and lens complement committed together into a single content-addressed DAG. See Related work for the full discussion.

See also

What panproto verifies

panproto’s correctness rests on a small set of properties that are mechanically checked. Some are verified at compile time (panic on failure during protocol registration); some are verified at runtime when the operation is invoked; some are verified by property-based tests in CI. This page is the catalogue.

If a property is in this list, the implementation enforces it. If you can construct a counterexample, that is a bug.

PropertyWhere checkedFailure modeSource
Protocol registration produces a valid theoryCompile-time (panic at registration)Named intermediate colimit step in panic messagepanproto-protocols/src/theories.rs
Schema validates against its protocolRuntimeschema validate exits non-zero with the failing equationpanproto-schema
Migration existence conditions holdRuntime, before any data is movedschema check exits non-zero, naming the missing inputpanproto-check
Migration type-checks at the GAT levelRuntime, on demand via --typecheckschema check --typecheck exits non-zero, naming the offending sort or operationpanproto-mig
Lens GetPut law: put(s, get(s), complement(s)) = sCI property tests, every lens combinatorProperty-test failure with shrunk counterexamplepanproto-lens/src/laws.rs
Lens PutGet law: get(put(s, v, c)) = vCI property tests, every lens combinatorProperty-test failure with shrunk counterexamplepanproto-lens/src/laws.rs
Lens PutPut law: put(put(s, v₁, c), v₂, c) = put(s, v₂, c)CI property tests, every lens combinatorProperty-test failure with shrunk counterexamplepanproto-lens/src/laws.rs
Source-code emit round-trips its grammar’s full corpus (emit(parse(emit(s))) == emit(s) plus vertex-kind and edge-shape multiset preservation, on every corpus entry, for every protocol in the verified set)CI test (emit_corpus_audit)Test panic naming the protocol and first divergent corpus entrypanproto-parse/tests/emit_corpus_audit.rs
Complement composition compatibilityRuntime, on Complement::composeLensError::ComplementFingerprintMismatchpanproto-lens/src/asymmetric.rs
Complement composition agreementRuntime, on Complement::composeLensError::ComplementConflict (with offending key)panproto-lens/src/asymmetric.rs
Protolens composition: structural equality of the intermediate endofunctorRuntime, on vertical_composeLensError::CompositionMismatchpanproto-lens/src/protolens.rs
Pushout cocone commutativityRuntime, on every colimit constructionReturned as part of ColimitResultpanproto-gat/src/colimit.rs
Pushout universal property: every alternative cocone factors uniquely through the pushoutRuntime, on demand via verify_universalEquationNotPreservedpanproto-gat/src/colimit.rs
Schema merge universal propertyRuntime, on vcs::merge::verify_pushout_universalPushoutError::UniversalFactorizationFailurepanproto-vcs/src/merge.rs
Expression evaluation totality (within step budget)Runtime, on every evaluationExprError::StepLimitExceededpanproto-expr/src/eval.rs
Expression arithmetic overflow checkRuntime, on every arithmetic opExprError::Overflowpanproto-expr/src/builtin.rs
Expression division by zero checkRuntime, on Div/ModExprError::DivisionByZeropanproto-expr/src/builtin.rs

Source-code emit coverage

The source-code emitter (emit_pretty) is verified against the full upstream test/corpus/ of 255 of the 261 vendored tree-sitter grammars: every corpus entry round-trips under the strict oracle described in the row above. The remaining six are blocked upstream, not by the emitter:

  • comment, todotxt, wolfram model their content as opaque free-text spans, so the grammar gives the emitter no structure to reconstruct and the captured text is dropped on emit (a corruption the char-multiset detector flags).
  • less is compiled against an older tree-sitter ABI than the 0.26 runtime loads, so its parser yields only error nodes; there is nothing to round-trip until the grammar is re-vendored.
  • move has no let-binding production in the vendored grammar, so real source already parses to an error tree on the way in; this is a parse-layer defect, not an emit one.
  • test parses tree-sitter’s own corpus format, whose === and --- delimiters collide with the corpus reader, so it cannot be exercised inside the harness.

The six are the irreducible residual under the current grammars and runtime; closing any of them needs an upstream grammar fix, an ABI re-vendor, or a harness change rather than emitter work.

What is not verified

The following properties are not mechanically checked and should not be assumed:

  • Performance characteristics. The implementation does not guarantee any particular complexity bound on lens composition, colimit construction, or migration application.
  • Round-trip stability of value-level transforms across data with information loss. A migration that drops a field cannot round-trip the dropped data; the lens laws apply only to the surviving structure.
  • Equivalence of two protocols with isomorphic theories but different parsers. Two protocols whose theories are the same up to isomorphism are still distinct from panproto’s perspective.
  • Application-level invariants not expressible in the schema theory. “Email addresses must contain @” is checked only if the schema actually carries a constraint expressing it.

See also

Architecture

panproto is a layered Rust workspace. The bottom layer is the GAT engine; everything else builds on it. The top layer is the surfaces (CLI, WASM, PyO3, C ABI) that downstream users actually call. Each layer depends only on layers below it; the dependency graph is acyclic.

Layering

graph TD
    subgraph "Surfaces"
        CLI[panproto-cli<br/>schema binary]
        WASM[panproto-wasm]
        PY[panproto-py]
        C[panproto-c]
        REPL[panproto-repl]
    end

    subgraph "Facade"
        CORE[panproto-core]
    end

    subgraph "Higher operations"
        VCS[panproto-vcs]
        GIT[panproto-git]
        XRPC[panproto-xrpc]
        GITREMOTE[panproto-git-remote]
        PROJ[panproto-project]
        CHECK[panproto-check]
    end

    subgraph "Codegen"
        LLVM[panproto-llvm]
        JIT[panproto-jit]
    end

    subgraph "Pipeline"
        MIG[panproto-mig]
        LENS[panproto-lens]
        LENSDSL[panproto-lens-dsl]
        IO[panproto-io]
        PARSE[panproto-parse]
        GRAMMARS[panproto-grammars<br/>+ grammars-{all,web,data,jvm,<br/>scripting,systems,functional,<br/>devops,mobile,music}]
    end

    subgraph "Protocols and theories"
        PROTOS[panproto-protocols]
        THEORYDSL[panproto-theory-dsl]
        EXPR[panproto-expr]
        EXPRPARSER[panproto-expr-parser]
    end

    subgraph "Foundation"
        GAT[panproto-gat]
        GATMACROS[panproto-gat-macros]
        SCHEMA[panproto-schema]
        INST[panproto-inst]
    end

    CLI --> CORE
    WASM --> CORE
    PY --> CORE
    C --> CORE
    REPL --> CORE

    CORE --> VCS
    CORE --> GIT
    CORE --> PROJ
    CORE --> CHECK
    CORE --> MIG
    CORE --> LENS
    CORE --> IO
    CORE --> PARSE
    CORE --> LLVM
    CORE --> JIT

    VCS --> MIG
    GIT --> VCS
    XRPC --> VCS
    XRPC --> SCHEMA
    GITREMOTE --> VCS
    GITREMOTE --> GIT
    GITREMOTE --> XRPC
    PROJ --> SCHEMA
    CHECK --> MIG
    MIG --> LENS
    MIG --> SCHEMA
    LENS --> SCHEMA
    LENSDSL --> LENS
    IO --> INST
    PARSE --> INST
    PARSE --> LENS
    PARSE --> GRAMMARS

    LLVM --> PROTOS
    LLVM --> SCHEMA
    JIT --> EXPR

    LENS --> EXPR
    MIG --> EXPR
    EXPRPARSER --> EXPR
    PROTOS --> SCHEMA
    PROTOS --> INST
    THEORYDSL --> GAT
    GATMACROS --> GAT

    SCHEMA --> GAT
    INST --> GAT
    EXPR --> GAT

The diagram shows the direction of dependency, not every individual edge. The full set is in the workspace Cargo.toml. The ten panproto-grammars-* pack crates are grouped together in the diagram: each is a leaf re-exporting a subset of the tree-sitter grammars under feature flags, consumed transitively by panproto-parse.

One arrow worth pointing out: panproto-parse depends on panproto-lens, not the other way around. The two crates meet through the enrichment_registry module in panproto-lens, a thin trait-and-registry pair the lens crate exposes for downstream crates to populate. panproto-parse installs an adapter for every parser it accepts so that protolens machinery in panproto-lens can dispatch grammar-driven enrichment synthesis without depending on tree-sitter. The mechanism is documented in Layout enrichment; the registry pattern keeps the lens crate grammar-agnostic and the dependency direction acyclic.

The boundaries

Three places in the system are boundary layers in the sense that they convert between panproto’s internal representation and an external one. They are deliberately thin and concentrated in single crates so they can be audited independently.

WASM boundary

panproto-wasm is the only crate that knows about wasm-bindgen. It exposes panproto-core to JavaScript through a slab of opaque integer handles, with MessagePack encoding for data crossing the boundary. The TypeScript SDK (@panproto/core) is a thin wrapper over the WASM exports; it adds the fluent builder ergonomics but no logic.

Python boundary

panproto-py is the only crate that knows about PyO3. It exposes panproto-core to Python with native bindings (no WASM). The Python wheel ships eleven core tree-sitter grammars; the rest are in companion panproto-grammars-* packs.

C boundary

panproto-c is the only crate that knows about C ABI. It exposes a stable C interface used by the Haskell binding (and any other non-Rust language). CBOR is the over-the-boundary format here.

The generated CLI reference

The schema binary’s --help text is the source of truth for the CLI surface. The reference/cli page is regenerated by an xtask (see xtask/src/bin/gen-cli-docs.rs) and CI fails if the page is out of date. This is the only generated file in the docs site.

Versioning

All workspace crates are versioned in lockstep at the workspace level (workspace.package.version). A release of panproto-core 0.46.0 implies every panproto-* crate at 0.46.0. The TypeScript SDK and Python SDK follow the same version. See Release process for cadence.

See also

Related work

In plain terms

panproto sits at the intersection of five older research lines: schemas as algebraic theories, bidirectional lenses, format-preserving parsing, structured version control, and cross-protocol data exchange. Each of those lines has its own canonical papers, its own working systems, and its own vocabulary. This chapter walks each one in turn, names the works that shaped panproto’s design, and says where panproto’s contribution differs.

The chapter is a capstone. Each individual idea is treated in depth in the topical explanation chapters; the goal here is the relationship between panproto and the literature, in one place.

1. Schemas as theories

The idea that a database schema is an algebraic object, and that data conforming to a schema is a model of that object, is mature. The line panproto follows most closely begins with Cartmell’s generalised algebraic theories in the 1970s and continues through David Spivak’s functorial data model in the 2010s, the CQL line developed by Schultz and Wisnesky, the ACSets framework built by Patterson, Lynch, and Fairbanks, and a recent thread of multi-model unification work led by Jiaheng Lu. panproto’s panproto-gat and panproto-protocols crates are an instance of this lineage rather than a departure from it.

Cartmell (1986) introduced generalised algebraic theories in a thesis (1978) and the closely related 1986 paper in the Annals of Pure and Applied Logic. GATs extend many-sorted equational logic with dependent sorts: the sort of a term may depend on the value of another term, so a theory can express constructions like “an edge with vertex parameters” without leaving the algebraic-theory framework. This is the formal apparatus panproto uses; the comment block at the top of crates/panproto-gat/src/lib.rs cites Cartmell directly. The Sort, Operation, Equation, and Theory types in panproto-gat cover Cartmell’s four ingredient classes (sorts/types, operator symbols, axioms, theories). The feature that makes GATs generalised rather than ordinary many-sorted equational logic is dependent sorts, the sort of a term may itself depend on terms; the type-equality / term-equality axiom split (which Cartmell treats as two separate axiom kinds) is a consequence of dependent sorts, since type-equality becomes nontrivial in their presence. panproto collapses both axiom kinds into one Equation constructor with a discriminating tag. GATs sit one rung above many-sorted equational logic and one rung below full dependent type theory, with enough expressive power to handle dependent record-style schemas without inheriting decidability problems from the higher rung.

Spivak (2012) is the headline categorical-database paper. A schema is a finitely-presented category, an instance is a Set-valued functor on it, and a translation between schemas is a functor of categories. The contribution is the equivalence between schemas-as-categories and the category of small categories: every theorem about small categories transfers to schemas. From a translation , Spivak constructs three migration functors. The pullback transports D-instances back to C-instances contravariantly; the right pushforward performs joins; the left pushforward performs Skolemized unions. The triple is an adjoint chain. panproto’s migrate operation in panproto-mig::lift and the three Σ/Δ/Π modes of CQL-style migration are direct descendants of these constructions, although panproto works with GAT presentations rather than free categories with path equivalences.

Schultz & Wisnesky (2017) is the more directly load-bearing precedent for panproto’s design. Where Spivak treats schemas as categories, Schultz and Wisnesky treat schemas as multi-sorted equational theories extending a fixed type-side: sorts are entities, function symbols are foreign keys and attributes, equations are integrity constraints. An instance is itself a theory that extends the schema by adjoining 0-ary constant symbols and ground equations; its denotation is the term model, which is initial among models of the extended theory. A schema mapping is a morphism of theories that is the identity on the type side. The same Σ⊣Δ⊣Π adjoint triple appears, but now derived from substitution along a theory morphism rather than precomposition with a functor. The TAC version of the algebraic-databases programme (Schultz et al. 2017) develops the mathematical theory at length; the JFP article is the implementer-facing presentation. Schultz and Wisnesky also push the design pattern further than the older paper does, using pushouts of schemas and instances as a global-as-view design pattern for data integration, with the integrated schema literally the pushout of source schemas. This is the construction that panproto’s Composing protocols by colimit imports wholesale and turns into the colimit and pushout_by_name functions in panproto-gat::colimit. The choice to make schemas theories rather than categories, with the type-side fixed by a meta-theory, is also panproto’s choice; the ThWType, ThGraph, ThConstraint, ThMulti, and ThMeta building blocks in crates/panproto-protocols/src/theories.rs play the role Schultz and Wisnesky’s type-side plays.

Patterson et al. (2022) sit in the same family but optimise for a different setting. An attributed C-set is a functor from a fixed finitely-presented category together with attribute data attached via a slice over a chosen C-set. The paper unifies graphs, data frames, Petri nets, and wiring diagrams in one framework and gives a Julia implementation in Catlab.jl that is performance-competitive with hand-written domain-specific data structures. Limits and colimits are computed pointwise, which makes pullback, pushout, and limit-based queries cheap. panproto and the ACSets work agree about the goal (one categorical framework, many concrete data shapes) but differ in mechanics: Patterson, Lynch, and Fairbanks define a schema as a small category equipped with a partition map separating combinatorial sorts from attribute-type sorts, then build the attributed C-set category as a slice in the C-set topos (Theorem 2 of the paper) and prove functoriality in both the schema and the attribute-type structure . panproto fixes a different meta-theory (the category of GAT presentations rather than the category of finite-product categories) and takes colimits there. The two approaches are most easily compared as instances of the same idea (a categorical schema with a parameterised instance), with ACSets giving an unusually clean implementation story for scientific computing and panproto giving an unusually direct path to cross-protocol schema engineering.

Lu (2025) is, as of 2025, the most direct adjacent claim of “categorical framework that covers several data models at once.” Lu unifies relational, XML, and property-graph data via a thin set category: a thin category with three object kinds (entity, attribute, relationship), where pullback realises the join operator and is also used to characterise multivalued dependencies, pushout realises the connected-component construction of graph data, and two reduced representations (1RR and 2RR) recover Boyce-Codd/XML normal form and fourth normal form respectively. The work is competitive with panproto on the unification axis but covers a different set of formats (relational, XML, and property graph rather than the wire-format protocols panproto targets) and stays close to the relational-DB tradition with its emphasis on normal forms and dependency closures. The honest framing for panproto is that the unification ambition is now common [Lu (2025); Patterson et al. (2022); Schultz & Wisnesky (2017)]; what panproto adds is coverage of the schema languages developers actually ship against (Avro, OpenAPI, ATProto, FHIR, the eighteen linguistic-annotation formats listed in crates/panproto-protocols/src/) and a colimit-of-presentations methodology rather than a fixed meta-theory.

Two older lines deserve named mentions even though they do not directly drive panproto’s code. The theory-colimit construction itself is older than the institutions framework: Burstall and Goguen introduced “putting theories together” via pushouts in (1977) (IJCAI 1977), developed the construction as the semantics of the Clear specification language (Burstall and Goguen 1980), and later abstracted it institution-theoretically (Goguen and Burstall 1992), with the modern textbook treatment in Sannella & Tarlecki (2012). An institution packages signatures, sentences, models, and a satisfaction relation; theory presentations in any institution form a category with colimits, computable in the signature category, and pushouts of theories then merge specifications. Concrete institutions have been worked out for equational logic, conditional equational logic, Horn-clause logic, first-order logic, and (more recently) dependent type theories; a GAT institution has not been published under that name to our knowledge, though organising GAT presentations as an institution is straightforward (signatures = GAT signatures, sentences = type and term equations, models = standard GAT models, satisfaction = the usual semantic relation). panproto’s pushout_by_name is the Burstall & Goguen (1977) move at the GAT level: pushout of GAT presentations along a shared sub-presentation. The relationship is that panproto is one instance of the theory-colimit construction, organising GAT presentations of wire-format schema languages into a category and pushing them out there. Spivak and Kent’s (2012) olog paper is the categorical-knowledge-representation cousin. An olog is a finite-limit, finite-colimit sketch with English-labelled boxes and arrows, intended as a user-facing way of authoring categorical schemas. The intuition that a schema in any of these formalisms is a category-shaped object with labels in natural language is the olog intuition; the difference is that panproto enforces the labels through the GAT type system and uses tree-sitter for the layout layer rather than freehand authoring.

Against this background, panproto’s specific contribution comes from three moves. First, presenting each industrial schema language as its own GAT and taking a colimit in the category of GAT presentations to obtain a polyglot schema theory carries the Burstall & Goguen (1977) theory-colimit construction (later abstracted institution-theoretically in Goguen & Burstall (1992) and given textbook treatment in Sannella & Tarlecki (2012)) over to wire formats. Spivak’s (2012) framework, the Schultz & Wisnesky (2017) line (CQL), the ACSets work of Patterson et al. (2022) (Catlab.jl), and Lu (2025) each fix a single meta-theory (finitely-presented categories, multi-sorted equational logic, attributed C-sets, thin set categories) and vary schemas inside it; panproto varies the theory presentation and composes presentations. Second, panproto’s crates/panproto-protocols/src/atproto.rs is the first formal categorical treatment of ATProto Lexicon. Third, the combination of GAT-presented schemas with format-preserving CST complements (Section 3) and four-artifact pushout merge (Section 4) runs the categorical machinery against the byte-level representation that the precedents do not address.

2. Bidirectional lenses and the round-trip laws

Lenses are pairs of functions and satisfying laws that make updating the small piece and propagating back to behave coherently. The modern form is from Foster et al. (2007), and the family has since grown to cover relational lenses (Bohannon et al. 2006), symmetric and edit lenses [Hofmann et al. (2011); Hofmann et al. (2012)], profunctor optics [Pickering et al. (2017); Clarke et al. (2024)], and generic-deriving libraries that automatically dispatch optic kind (Lens, Prism, Traversal) from a structural classification of focus. panproto’s panproto-lens crate, and especially its Protolens type, sit inside this family.

Foster et al. (2007) set the asymmetric template: a lens is a pair between source and view (partial functions and ), with totality, well-behavedness (GetPut, PutGet), and the stronger PutPut law for the “very well-behaved” subclass. Creation of a fresh source from a view alone is handled by extending the universe with a missing-element marker so that put a Ω defines the creation case. The Foster paper does not stop at the algebra: it builds a combinator library (id, composition, xfork, filter, prune, focus, rename, map, wmap, copy, conditional combinators). Bohannon and colleagues’ (2006) relational lenses then port the discipline from trees to relations. Their select, join_dl, and drop primitives have typing rules that require the schema’s functional dependencies to be in tree form (the attributes appearing in the FDs split into pairwise disjoint sets whose dependency graph is a forest), so that an internal record revision operation (used to define the putback of each combinator) can re-write tuples to restore the FDs after a view update. The combination gives a small calculus in which projection-, join-, and union-views admit lawful updaters, which is essentially what panproto’s ThConstraint building block in crates/panproto-protocols/src/theories.rs captures at the schema-language level: foreign-key constraints become equations in the schema theory, and any lens that respects those equations propagates them.

The closest direct precedent for “optic kind determined by edge type” is the profunctor-optics work that licenses it. Pickering and colleagues (2017) represent each optic as a polymorphic function , where the constraint class on the profunctor selects the optic kind: gives lenses (because first lifts a transformer on the component to a transformer on the pair), gives prisms (because left lifts to the sum), and gives traversals. Profunctor optics compose by ordinary function composition; concrete representations do not. The paper’s Figure 5 is a diamond of inclusions (Adapter into Lens and into Prism, Lens and Prism into Traversal) that drops out of which constraints are conjoined. The categorical update by Clarke et al. (2024) unifies the framework further: an optic is the coend , varying the two monoidal actions and recovers fifteen optic families in a single table (adapter, lens, monoidal lens, algebraic lens, monadic lens, linear lens, prism, coalgebraic prism, grate, glass, affine traversal, traversal, kaleidoscope, setter, fold), and the construction extends to mixed and -enriched settings. panproto’s OpticKind enumeration in crates/panproto-lens/src/optic.rs and the prop / variant / item edge classification in panproto-gat::sort implement the dispatch as a runtime classifier over a cross-protocol IR; the contribution is not the dispatch itself but that the indexing object is panproto’s schema graph rather than a single host-language ADT or a single-protocol schema.

Hofmann et al. (2011) introduced the symmetric variant of the lens laws. A symmetric lens carries a complement object that captures information about not present in , plus a pair that propagates in both directions through the complement. The symmetric category has a symmetric monoidal tensor product but Theorem 5.1 of the paper proves it has neither categorical products nor categorical sums. Hofmann et al. (2012) pushes the framework further by replacing whole-state transformations with edits: each object is a module pairing a set with a monoid of edits and a partial monoid action, and a lens is a pair of stateful monoid homomorphisms threading a complement set under a consistency relation . Theorem 7.1 of the edit-lens paper proves a one-to-one correspondence between equivalence classes of edit lenses over canonical “differ” modules (whose edit monoid is overwrite-only) and the state-based symmetric lenses of Hofmann et al. (2011). This is what licenses panproto treating the complement as a first-class versioned artifact: an edit-lens-style complement is exactly the thing the schematic VCS commits alongside data and schema. Pre-flight check Complement::is_compatible, the Object::Complement and Object::CstComplement variants in panproto-vcs::object, and the partial-monoid merge rule on the complement (see Lenses and round-trip laws) all sit on this foundation.

Litt and colleagues’ (2020) Cambria essay is the engineering precedent for “schema migration as a graph of composable lenses” and was directly influential on the panproto migration story. The PaPoC 2021 paper that grew out of the essay (Schema Evolution in Distributed Systems with Edit Lenses) makes the connection to Hofmann et al. (2012) explicit: a schema migration is an edit lens, replays of past edits compose through the lens, and divergent branches reconcile through the symmetric structure. panproto’s panproto-mig::compose function composes morphisms in essentially the Cambria way; the difference is that the migration is decided in the GAT category rather than over JSON-schema lenses alone, and the four-artifact bundle (protocol theory, schema, instance data, lens complement) is versioned together in panproto-vcs.

Pacheco & Cunha (2011) give the algebraic counterpart: an equational calculus over point-free lenses in which the standard point-free laws (id, -assoc, -functor, -fusion, distl-nat, -nat) lift to the bidirectional setting, and bidirectional fold and unfold satisfy uniqueness so that fold-fusion can be applied directly at the lens level. They show that the lens category is not as well-behaved as (no categorical products, sums, or exponentials, only the corresponding bi-functors and projections-with-defaults) and mechanise the calculus in the pointless-lenses and pointless-rewrite Haskell libraries. Pacheco et al. (2012) carries the same idea over to delta lenses, where the change-propagation account replaces whole-state updates. panproto’s symbolic simplification of ProtolensChain in crates/panproto-lens/src/protolens.rs extends the same move: where the Pacheco line operates on XML schemas and inductive data types in Haskell, panproto’s chains span protocol theories joined by coercion morphisms across the 50-protocol catalogue.

The categorical foundations of lenses have been deepened considerably since Foster et al. (2007). Diskin and colleagues’ (2011) delta-lens line gives the change-propagation account that panproto’s data-migration story implicitly relies on. The dependent-optics and categorical-cybernetics line (Riley’s Categories of Optics; Spivak’s Generalized Lens Categories via Functors; Capucci, Gavranović, Hedges, and Rischel on cybernetic systems; Niu and Spivak on polynomial functors) generalises the framework to indexed and parameterised optic families and to dependent types. The contrast with the recent dependent-optics literature is that panproto ships a runtime for prop/variant/item dispatch over a cross-protocol IR, applying the existing categorical machinery to the 50-protocol catalogue.

3. Format-preserving parse and emit

The byte-equal round-trip law has been the goal of a long line of lossless-CST work: rust-analyzer’s rowan, Microsoft Roslyn, Instagram’s LibCST, the Prettier and Biome formatter trees, and a research line around Spoofax and Stratego. Each of these targets one language. Two threads from the bidirectional-transformations literature give the more general design: lens-based config editors (Augeas, Boomerang) and grammar-based bidirectional pretty-printers (BiYacc and its cousins). panproto’s ParseEmitLens (in crates/panproto-parse/src/parse_emit_lens.rs) is positioned between the two.

Lutterkort (2008) is the closest analogue to panproto’s approach at a framework level. Augeas treats every config file as an XPath-like tree backed by the file bytes, and ships hand-written lenses (in a small ML-style DSL) that map file ↔ tree. The lens primitives (key, label, seq, store, del) act on regular-expression matches; the combinators (subtree, concatenation, union, iteration) compose them with a Boomerang-derived typechecker that enforces unambiguous concatenation, union, and iteration over regular languages. Augeas’s grammar formalism is restricted to regular languages (the paper’s Section 6 lists removing this restriction as future work, since it precludes context-free formats like httpd.conf), so it handles /etc/hosts, /etc/passwd, and the like, but not nested-context formats. panproto’s ParseEmitLens reuses the same lens factorisation but lifts the parser layer to tree-sitter (or one of the structured format codecs in crates/panproto-io), so the regular-language ceiling is gone.

The Boomerang line is the algebraic core that Augeas builds on. Foster et al. (2007) give the original combinators for bidirectional tree transformations; Bohannon et al. (2008) adds resourceful lenses with key and match primitives so that the put direction can align reorderable chunks by key rather than by position. They prove that resourceful lenses are quasi-oblivious: a lens is quasi-oblivious with respect to an equivalence if (the EquivPut law), which is the condition that lets the put direction respect reorderings as if they were equalities. Foster et al. (2008) then generalises to quotient lenses: lenses well-behaved modulo programmer-specified equivalences and , with canonizers (canonize / choose) inserted anywhere in a chain via lquot and rquot operators. The quotient-lens typechecker uses a coarse Identity / Any approximation that scales to the 4200-line Boomerang lens for the UniProtKB/SwissProt protein-sequence format reported in Section 8 of the quotient-lens paper. panproto’s complement carries the same role as the dictionary skeleton in resourceful lenses and the canonizer state in quotient lenses: a sidecar that records what get discarded so that put can put it back in a position-and-key-preserving way. The ParseEmitLens does not yet expose lquot/rquot to users (the canonizer story for parse/emit is implicit in the layout-policy machinery), but the schema-level constraint vocabulary already carries the equivalence relations the canonizers operate on.

The grammar-based thread is the other ancestor. Zhu et al. (2015)’s BiYacc shows that a single DSL program can denote both a parser and a reflective printer: a putback that takes both the abstract syntax tree and the original concrete string and produces an updated concrete string that preserves syntactic characteristics (extra parentheses, whitespace, encoded negations) of the original. BiYacc programs satisfy GetPut and PutGet, and compile to BiFluX for execution. panproto does not compile from a bidirectional grammar; it inherits the grammar from tree-sitter and supplies the bidirectional layer separately, but the contract is the same one BiYacc proves: parse ∘ emit is the identity on the canonical schema, and emit ∘ parse preserves layout where the lens complement records it. Brunsfeld (2018) is the underlying parser-generator citation; the panproto contribution is the lens factorisation on top of 261 grammars, not the parser technology itself.

Two adjacent threads belong in the same neighbourhood. de & Visser (2012) give a token-stream-and-origin-tracking algorithm for layout preservation under AST refactorings: each AST node points into the token stream, the ConstructText reconstruction algorithm (abbreviated CT in proofs) reuses original text for unchanged subterms and pretty-prints only newly constructed ones, and a DIFF over the new list against origin tags handles inserted and deleted children. They prove correctness () and a maximal-layout-preservation theorem, and round out the system with heuristic comment-binding patterns (Preceding(1), Preceding(2), Succeeding(1..3)). Their Section 7.5 notes that the lens framework loses the fine-grained relation between subterms and text fragments needed for layout preservation, and singles this out as the obstacle the reflective-printer trick in BiYacc responds to. panproto sits closer to the de Jonge–Visser side: the complement records exactly the byte-position and interstitial information the AST omits, and emit_pretty reuses it when it is present and falls back to a layout policy when it is not. Pombrio & Krishnamurthi (2014) takes the same lens framing in a different domain: their CONFECTION system desugars and resugars syntactic-sugar pattern rules, Section 6.1 frames the desugar/unexpand pair explicitly as a lens, and Theorem 1 in Section 6.1.2 proves PutGet iff a disjointness condition on the rule list holds. The resugaring properties are Emulation and Abstraction (Theorems 3, 4, 5) plus Coverage (informal; the paper explicitly says Coverage is not formalised and can only be strived for in practice), and they translate directly to what panproto’s layout-enrichment fibration is doing one level up: stripping a constraint sort is the desugaring step, and re-attaching it via a registered synthesis driver is the resugaring step.

4. Schema versioning as structured merge

Two distinct precedents shape panproto’s VCS layer. The categorical-VCS line owns the merge-as-pushout semantics. Mimram & Giusto (2013) gave the formal account; Pijul is the working CLI implementation; Angiuli and colleagues’ (2014) homotopical patch theory recasts the same construction inside homotopy type theory. In all three, files (or patches between files) are objects and morphisms in a category, and a three-way merge is the pushout of the patches against the common ancestor. Conflicts are failures of the pushout to be defined, or of the universal property to factor. Roundy’s (2005) Darcs is the engineering ancestor that motivated this thread.

Mimram & Giusto (2013) fix the simplest interesting case: a file is a function from a length- index set into an alphabet of lines, and a patch is an injective increasing partial function on the index sets that preserves labels wherever it is defined (undefined positions are deletions; positions in the codomain outside the image are insertions). The category of files and patches is the free monoidal category on insertion generators and deletion generators subject to (deleting an inserted line is identity); the subcategory of total injective patches is the free monoidal category on the insertion generators alone. Concurrent edits give a span , and the natural definition of “merging” is the pushout of this span. The central observation is that is not finitely cocomplete: the patches that insert different lines at the same position have no pushout in , which is the categorical face of a textual conflict. Theorem 15 of the paper describes the free finite conservative cocompletion of the insertion-only subcategory as the category of finite sets equipped with a transitive relation and morphisms that are functions respecting the relation; the labelled and partial-function generalisation (Theorem 20, p. 12) adds labels and replaces functions with partial functions, giving the full cocompletion of . Pushouts in are computed as disjoint unions quotiented by the span equivalence and equipped with the transitive closure of the inherited relations; cycles can arise during cocompletion (Example 18 of the paper exhibits one explicitly), which is how the framework records “the user must pick between two orderings of a conflicting line.” The single-file insertion-only construction generalises to insertion-and-deletion (Theorem 20, partial functions instead of total ones) and is left in the paper as a target for richer base categories (multi-file, XML trees, event-structure repositories). panproto’s merge.rs does not implement the partial-function version literally, but the move it makes is the same one: detect that the naive pushout fails in the schema-theory category , cocomplete by adjoining a conflict descriptor that records the divergent additions as parallel arrows, and resolve by choosing a section.

Angiuli, Morehouse, Licata, and Harper (2014) make the same construction but inside homotopy type theory, where the answers come out cleaner because the groupoid laws are free. A patch theory in their setting is a higher inductive type whose points are repository contexts, whose 1-paths are patches, and whose 2-paths are patch laws. Identity, inverse, and composition come from the path operations refl, !, and ; the groupoid laws (associativity, units, inverses cancel) are automatic because the identity type is already a -groupoid. A model of a patch theory is then a function out of the HIT into a universe of sets and bijections, and functorial semantics makes the model preserve patch laws automatically: every program that consumes the HIT must respect the equations between paths. The paper’s three running examples (an integer counter that is literally the circle; a fixed-length document with a swap-lines patch and an indep / noop law saying edits to independent lines commute; an add/remove patch theory indexed by histories of compositions) culminate in a merge operation that takes a span of diverging patches and returns a cospan completing it into a commuting square (Section 3.2). Their merge is more flexible than the Mimram–Di Giusto pushout in one direction (it accepts arbitrary divergent patches, with a “merge undoes both” fallback when nothing better exists) and more restrictive in another (it requires inverses to live inside the theory, which the authors flag in Section 7 as “conceptually questionable and practically problematic”). Their workaround is to index contexts by patch histories and define merge over History rather than over the underlying file (Section 6.4), which is structurally what panproto-vcs already does when it commits the lens complement alongside the schema: the complement carries the history information that a pure file-level patch category cannot. The homotopical framing also makes one of panproto’s design choices explicit. The Mimram–Di Giusto allows “cyclic” objects to represent conflicts; the Angiuli et al. patch-with-laws example uses a 2-path to equate the two orderings rather than equipping the merge result with a cyclic order. panproto sits between the two: a vertex-set conflict is reified as a conflict object the user resolves, but a permutation of independent additions does not produce a new object because the underlying GAT theory already quotients by the appropriate equations.

The schema-and-data VCS line owns the engineering of “a Merkle DAG of commits with schema + data inside each commit.” TerminusDB, Dolt, and the Noms prolly-tree lineage are working examples; Merkle (1988) is the hash-tree primitive they all rely on, and the operational git tradition that Chacon & Straub (2014) document is the immediate engineering precedent. None of these systems frames merge categorically; they treat schema conflicts operationally (cell-by-cell, table-by-table) and have no notion of a lens complement attached to a commit.

Curino, Moon, and Zaniolo’s PRISM workbench (2008) is the load-bearing precedent at the operator level. PRISM defines eleven Schema Modification Operators (SMOs): CREATE TABLE, DROP TABLE, RENAME TABLE, COPY TABLE, MERGE TABLE, PARTITION TABLE, DECOMPOSE TABLE, JOIN TABLE, ADD COLUMN, DROP COLUMN, RENAME COLUMN (plus NOP), each translated to a pair of Disjunctive Embedded Dependencies (DEDs): forward DEDs that describe the migration of data from the old to the new schema, and backward DEDs that describe the reverse. The DED pairs feed a chase-and-backchase query rewriter (Deutsch and Tannen’s MARS engine) so that legacy queries written against an old schema version are automatically translated to run against the new one. Section 4.2 of the paper lists six operators with a perfect unique inverse (RENAME COLUMN, RENAME TABLE, PARTITION TABLE, CREATE TABLE, ADD COLUMN, plus NOP) and Table 3 marks JOIN TABLE and DECOMPOSE TABLE as perfect-inverse-only under additional conditions (“yes/no”); the remaining operators (DROP TABLE, COPY TABLE, MERGE TABLE, DROP COLUMN) require a quasi-inverse in the sense of Fagin, Kolaitis, Popa, and Tan: the inverse is not the strict mapping inverse but the “best one can do to recover ground instances.” Figure 3 places each operator on a redundancy / information-preservation grid that tells the database administrator what they are getting up front. PRISM was stress-tested against the full 170-version, 4.5-year schema-evolution history of MediaWiki/Wikipedia and reported 97.2% of evolution steps fully automated; on the 66 most common query templates from the Wikipedia query log, 74% of queries (worst-case across the version timeline) were automatically supported, compared to 84% under manual rewriting. A separate 27-query subset of those templates was used for the runtime-performance comparison in Figure 6 of the paper. panproto inherits two diagnostic ideas directly. First, the PRISM observation that schema change without a record-level lift story is incomplete survives intact: panproto’s panproto-mig::existence check is the equivalent gate, refusing to certify a migration whose data side is not constructible. Second, the quasi-inverse / perfect-inverse partition is the precedent for the CoercionClass lattice (Iso / Retraction / Projection / Opaque) in crates/panproto-gat/src/sort.rs: each class is a precise statement about which kind of inverse a lens carries, the same diagnostic move PRISM makes at the SQL-operator level. Where panproto goes elsewhere is the embedding into a Merkle DAG with categorical merge: PRISM treats each schema-evolution step in isolation and rewrites queries through a chain of DEDs; panproto commits each step into a content-addressed DAG and merges divergent branches by pushout, with the SMO-style information-preservation report as a side output rather than the central artifact.

Litt and colleagues’ (2020) Project Cambria essay, and the PaPoC 2021 paper it grew into (Litt et al. 2021), are the immediate lens-based precedent. Cambria’s data model is a directed graph in which nodes are schema versions and edges are bidirectional lenses; a document tagged with its writing schema is translated to a reading schema by composing the lenses along the shortest path through the graph. The essay credits Hofmann, Pierce, and Wagner’s edit lenses (Hofmann et al. 2012) as the direct inspiration; the PaPoC paper extends the construction with the framing that JSON-patch operations are the edits a Cambria lens propagates. The system was integrated with the Automerge CRDT to demonstrate read-time, per-replica translation in a peer-to-peer setting. panproto’s panproto-vcs::merge builds on the same idea but takes a step the Cambria graph does not: the schema-version graph is promoted from a navigation aid (find a path of lenses) to a content-addressed DAG with pushout-based three-way merge, and the lens complement, the schema, the protocol theory, and the data instances are committed together as one Merkle object rather than living in separate artifacts. The Cambria authors’ own list of open problems (recursive schemas, cross-document migrations, data augmentation, lens repositories) reads as a roadmap that panproto’s GAT-and-CST machinery is set up to answer, although none of those answers should be claimed without further work.

panproto’s contribution is the four-artifact unification of these two lines. A panproto-vcs::CommitObject (in crates/panproto-vcs/src/object.rs) carries a protocol theory ID, schema ID, data set IDs, lens complement IDs, and CST complement IDs together. Three-way structural merge (crates/panproto-vcs/src/merge.rs) detects overlap via pullback and promotes it to pushout in the underlying theory category, with verify_pushout_universal checking the real universal property at the vertex level. Vertex-level factorisation is verified; edge-level factorisation is deferred. No prior system places protocol theory, schema, instance data, and lens complement into one DAG with pushout-based merge semantics; the combination is what panproto-vcs contributes on top of the two precedents.

A second contrast worth recording is with Shapiro and colleagues’ (2011) conflict-free replicated data types. CRDTs and structured-merge VCS solve nearby problems by very different means: CRDTs guarantee strong eventual convergence by restricting the algebra of allowed operations; panproto-vcs allows arbitrary schema operations and decides merge by category-theoretic universal property. The two are complementary rather than competing; cf. Shapiro et al. (2011) for the canonical statement of the CRDT side, and the Cambria integration with Automerge (Litt et al. 2021) for one way the two layers can compose.

5. Cross-protocol translation and data exchange

Translation between schema languages has two well-developed precedents: the data-exchange tradition that runs from Fagin, Kolaitis, Miller, and Popa’s ICDT 2003 paper through Fagin, Kolaitis, Popa, and Tan and the Clio project, and the engineering-IR tradition that runs from Apache Calcite through Substrait, Apache Arrow, MLIR, and ONNX. panproto sits between them: it inherits the universal-solution and second-order-tgd machinery from the data-exchange line and the hub-and-spoke adapter architecture from the IR line, and turns both into one pipeline over GAT-presented schemas.

Fagin et al. (2005) establish the core of a universal solution as the canonical materialised target instance. A data-exchange setting is a quadruple of source schema, target schema, source-to-target tuple-generating dependencies (tgds) , and target tgds and equality-generating dependencies (egds). A universal solution for a source instance is a target instance from which every other solution receives a homomorphism; universal solutions exist iff solutions exist, and one can be computed by the chase when is weakly acyclic (Theorem 2.6, originally from the 2003 paper). Universal solutions are not unique up to isomorphism: Example 3.1 exhibits an infinite family of nonisomorphic universal solutions of different sizes for one source instance. The paper picks out the core of (the smallest substructure that is also a homomorphic image of ) as the unique smallest universal solution up to isomorphism and proves that it is preserved under chase steps for tgds and egds (Proposition 3.4). Computing the core of an arbitrary structure is intractable: Theorem 4.2 shows CORE IDENTIFICATION is DP-complete, even on undirected graphs. Inside data exchange the picture is cleaner: Theorem 5.2 (greedy algorithm) gives a polynomial-time algorithm for computing the core when is a set of tgds and is a set of egds, Theorem 5.9 (blocks algorithm, ) extends to the case where the source is unavailable (relevant when Clio has materialised a universal solution and discarded the source), and Theorem 5.15 (Algorithm 5.14) carries the blocks algorithm to the egd case. The core is also the right answer for certain-answer semantics: Proposition 6.4 proves that for existential queries , equals the certain answers on universal solutions, and the core is the unique universal solution (up to isomorphism) with this property for every conjunctive query with inequalities . The CoercionClass lattice (Iso / Retraction / Projection / Opaque) in crates/panproto-gat/src/sort.rs is the Fagin-style invariant transplanted to the schema-presentation level: each class records what kind of homomorphism a cross-protocol coercion carries, which directly determines whether the lifted instance is universal, sub-universal, or projection-only at the target.

Fagin et al. (2005) answer the next question: what is the right language for composing schema mappings? A composition takes the source-to-target relation defined by two successive mappings and asks whether it can be defined by a single mapping in the same logic. Proposition 3.4 of the paper exhibits a composition definable by an infinite set of source-to-target tgds but by no finite set; Proposition 3.5 strengthens this to a composition undefinable by any (finite or infinite) set of source-to-target tgds; Theorem 3.6 sharpens further with an explicit construction whose composition query is NP-complete (by reduction from 3-COLORABILITY) and is not definable in , hence not in least-fixed-point logic. The paper introduces second-order tgds (SO tgds): existential second-order formulas with Skolem function symbols and equalities between terms, satisfying a safety condition that makes them domain-independent. Theorem 4.6 proves that SO tgds are closed under composition with an explicit composition algorithm (introduce a second-order existential for each existential first-order in ; substitute into ; collect implications). Section 5 extends the chase to SO tgds and proves the universal-solution / polynomial-time-certain-answers results carry through (Proposition 5.2, Corollary 5.3). The takeaway for panproto: the right object to compose across protocols is not the underlying schema mapping but a Skolemised second-order generalisation of it, with explicit function symbols standing for the choices the existential quantifiers make. panproto’s ProtolensChain in crates/panproto-lens/src/protolens.rs carries exactly this structure: the lens chain is a sequence of mappings with explicit complement objects standing in for the Skolem choices, composed by the chain combinator with the same closure property the SO tgd composition theorem provides.

The categorical version of the same construction is in the CQL line. Schultz & Wisnesky (2017) (CQL; previously called AQL) presents a schema mapping as a functor between schemas-as-theories, defines the adjoint triple of data migration functors, computes universal solutions as colimits in the category of instances, and gives a uniform pushout-based design pattern for data integration. Schultz et al. (2017) is the long-form mathematical treatment. The CQL implementation is the closest existing analogue of what panproto’s cross-protocol pipeline does, and the construction panproto uses for panproto-gat::colimit and pushout_by_name is structurally the same one CQL implements for relational schemas; panproto carries the construction to GAT-presented wire-format schemas and to instance data that lives natively in those formats (JSON, CBOR, Protobuf, Avro, FHIR) rather than relational tuples.

Apache Calcite (Begoli et al. 2018) is the engineering precedent for hub-and-spoke query-language IRs. Calcite’s architecture is a modular query optimiser (rules, metadata providers, two planner engines: a cost-based Volcano-style dynamic-programming engine and an exhaustive rule-application engine), a relational-algebra core, and an adapter architecture in which each backend (Cassandra, Druid, Elasticsearch, MongoDB, Pig, Spark, Splunk, JDBC dialects) supplies a SchemaFactory, a Schema, and a calling convention trait that the optimiser uses to push down operators into the backend’s native query language. The paper reports Calcite is embedded in Apache Drill, Hive, Solr, Phoenix, Kylin, Apex, Flink, Samza, Storm, MapD, Lingual, and Qubole Quark (Table 1), with several hundred optimisation rules in the rule set. panproto’s protocol-colimit machinery is the same hub-and-spoke shape applied to schemas rather than queries: each protocol crate in crates/panproto-protocols/ plays the role of a Calcite adapter, the universal protocol-graph plays the role of Calcite’s logical relational algebra, and panproto-mig::hom_search is the analogue of Calcite’s rule-application engine. The substantive difference is that Calcite’s IR is SQL-typed relational algebra (with full column types including complex ARRAY, MAP, and MULTISET constructors), augmented with pluggable traits that record physical properties such as ordering, partitioning, and the calling convention used to drive the optimiser, and the adapter contracts are written by hand; panproto’s IR is a GAT-presented schema theory, and the adapter contracts are derived by colimit from the per-protocol theory presentations. Substrait and MLIR are the same hub-and-spoke pattern in other domains: Substrait defines a cross-engine query-plan IR with producers and consumers across DuckDB, Velox, Apache Arrow, and others; MLIR provides a dialect framework for compiler IRs spanning machine learning, parallel computation, and hardware synthesis. Apache Arrow plays the role of the in-memory columnar data layout that several of these systems use as the physical exchange format.

panproto’s cross-protocol pipeline is an instance of the data-exchange/IR construction with one substantive addition: the schema, the instance data, and the lens complement are all transported together along the chain. The pipeline composes pushouts in panproto-gat::colimit, hom-search migrations in panproto-mig::hom_search, and coverage reports in panproto-mig::coverage; the CoercionClass lattice records translation loss at the schema level the way Fagin’s universal-solution / core distinction records it at the instance level; the four-artifact bundle (protocol theory, schema, instance, complement) carries the data exchange forward in a single step that a Calcite adapter or a Substrait plan cannot represent because they have no notion of a complement object.

panproto ships 50 protocols today, drawn from wire formats (Protobuf, Avro, Thrift, FlatBuffers, Cap’n Proto), JSON family (JSON Schema, OpenAPI, AsyncAPI, CloudEvents, JSON-LD, GeoJSON), graph and RDF (RDF/Turtle, SPARQL, GraphQL), columnar (Parquet, Arrow, ORC), document and configuration (XML, YAML, TOML, CUE, Dhall, Nickel), domain-specific (FHIR, ATProto Lexicon, GTFS, OpenAPI Routes), and linguistic-annotation (CoNLL-U, Universal Dependencies, PropBank, FrameNet, TimeML, etc.). Each is presented as a GAT inside crates/panproto-protocols/, and the polyglot schema theory is the colimit of those presentations in . The compositional construction (pushout of two presentations derived migration coverage report) is what Calcite gives you for SQL dialects and what CQL gives you for relational schemas. Doing it across 50 industrial wire-format schema languages with one categorical engine is, as an engineering artifact, what panproto’s cross-protocol layer adds to the literature.

What panproto adds

The five sections above place panproto in five mature research lines. Across those lines, panproto delivers the following:

  • A polyglot schema theory constructed as a colimit in the category of GAT presentations, with each of 50 industrial wire-format schema languages (Protobuf, Avro, Thrift, FlatBuffers, Cap’n Proto, JSON Schema, OpenAPI, AsyncAPI, CloudEvents, JSON-LD, GeoJSON, GraphQL, SPARQL, RDF/Turtle, Parquet, Arrow, ORC, XML, YAML, TOML, CUE, Dhall, Nickel, FHIR, ATProto Lexicon, GTFS, and the linguistic-annotation family) presented as its own GAT and composed with the rest. This is the Goguen & Burstall (1992) theory-colimit move transplanted from logics to wire formats, run at industrial scale rather than as a single worked example.
  • A universal CST factorisation across 261 tree-sitter grammars plus the major data-format codecs, with the formatting residue committed as an explicit lens complement. No prior system carries layout as a first-class versioned artifact across this breadth of languages.
  • A four-artifact pushout merge over a content-addressed DAG: protocol theory, schema, instance data, and lens complement are merged together in one operation that verifies the categorical universal property at the vertex level. The Mimram–Di Giusto, Pijul, and homotopical-patch-theory line gives the pushout-as-merge semantics for files; the PRISM line gives schema-modification operators; the Cambria line gives schema-version graphs with edit lenses; panproto unifies the three with shared content-addressing and shared merge semantics.
  • The first formal categorical semantics for ATProto Lexicon, produced as one entry in the protocol catalogue rather than as a standalone study.
  • An end-to-end pipeline (parse → decorate → lens → migrate → emit) that runs the categorical machinery against real bytes: GAT colimits for protocol composition, hom-search for migration derivation, profunctor-optics dispatch for runtime lens application, CST complements for layout preservation, and pushout merge for version control. Each stage is one named precedent from the literature; running them as one composable engine against a 50-protocol catalogue is the engineering contribution.

The topical explanation chapters elsewhere in this book work out each stage in detail.

Denotational semantics

This cluster pins panproto’s three DSLs and two structural constructions to a precise mathematical specification. Each page opens with an “In plain terms” section and then proceeds through a six-step skeleton:

  1. Surface syntax. BNF for what a user types.
  2. Abstract syntax. The Rust enum the parser produces.
  3. Semantic domain. The mathematical universe the syntax interprets into.
  4. Interpretation function. A semantic function defined by structural recursion on the abstract syntax, in the Scott-Strachey idiom. Typing is presented separately as an inductive relation ; evaluation is not a relation.
  5. Soundness. Statement of what the implementation guarantees, and which property tests or runtime checks enforce it.
  6. What is intentionally not modelled. The boundary of the formal account.

The pages in turn:

PageWhat it pins down
Shared notationThe judgement forms, environments, and meta-notation used across the others. Read this first.
Expression languagepanproto-expr: terms, types, total evaluation under a step budget.
Lens DSLpanproto-lens-dsl: the lens triple (get, put, complement), the three round-trip laws, complement composition as a partial commutative monoid.
Theory DSLpanproto-theory-dsl: GAT presentations, sort/operation/equation judgements, the colimit interpretation.
Pushouts and mergeThe pushout construction in the category of GATs, the universal property, and what the implementation verifies.
Protolens compositionProtolenses as natural transformations between schema endofunctors, the structural-equality criterion for composition, sequential vs fused instantiation.
REPL command languagepanproto-repl: state model, command interpretation, and the bare-term typecheck path.

The cluster is meant to be read by anyone who wants to know exactly what panproto guarantees. Familiarity with category theory helps but is not required: every page restates its formal content in plain terms before invoking it.

Shared notation

This page defines the meta-notation reused across the other denotational-semantics pages. Read this first.

In plain terms

We use two complementary notations:

  • Inference rules, of the form , define a relation (typically a typing relation ) inductively as the smallest relation closed under the rules.
  • Semantic equations, of the form , define a function by structural recursion on the syntax. The double brackets emphasise that we are speaking of meaning, not surface text.

The DSL pages in this cluster present the type system as inference rules and the value semantics as semantic equations: the standard denotational idiom, in the sense of Scott and Strachey. Where evaluation can fail to terminate, we lift the value domain to so that is total.

A context (also called an environment) is a finite map from names to data. Typing contexts map variable names to types; evaluation contexts map variable names to values; theory contexts map sort names to their definitions.

Environments

Three environment kinds appear repeatedly:

  • : a typing environment mapping variable names to types. Notation: extends with a fresh binding .
  • : a value environment mapping variable names to values. Notation: extends with .
  • : a theory context mapping sort names to their definitions in a GAT. Used in the theory DSL and in the pushout construction.

A judgement of the form asserts that under typing context , the expression has type . We do not use a separate evaluation judgement in this cluster: evaluation is presented as the semantic function rather than as a relation.

Inference rules

An inference rule has the form

Each premise and the conclusion are judgements. The rule asserts that whenever the premises hold, the conclusion follows. A derivation is a tree of rule applications whose leaves are axioms (rules with no premises) and whose root is the judgement being proved.

Semantic functions

The semantic function for a syntactic category is written where is the semantic domain. The subscript is omitted when context determines the category.

Common semantic functions:

  • : the set of values of type .
  • : the meaning of expression under value environment , where adjoins a bottom element to handle non-terminating or budget-exceeded computations.
  • : the meaning of a lens as a triple of functions.
  • : the category-with-families generated by a GAT presentation.

The bottom element

In the expression language, evaluation is total within a step budget but the budget can be exceeded. We model this by adjoining a bottom element to the value domain: . Reaching corresponds to ExprError::StepLimitExceeded at runtime.

Equality

Equality between values, types, and theories is definitional unless otherwise stated. Equality between schemas is structural up to alpha-renaming of bound names. Equality between morphisms is judgemental: when and have the same source, the same target, and agree on every input.

See also

  • Expression language for the first user of these conventions.
  • The other pages in this cluster all build on these conventions.

Expression language: denotational semantics

In plain terms

The expression language panproto-expr is what you use to describe field-level transforms inside a migration (“the new full_name field is the old first plus a space plus the old last”) and predicates inside a query (“only records where created_at > '2024-01-01'”). Everything in the language is pure: no IO, no mutation, no clock reads, no random numbers. Two expressions that look the same always do the same thing. Every expression terminates within a fixed number of evaluation steps.

This page describes what those properties mean and what the evaluator actually computes.

Surface syntax

The Haskell-style surface, parsed by panproto-expr-parser:

expr  ::= literal
        | ident
        | expr expr                    -- application
        | "\\" ident "->" expr         -- lambda
        | "let" ident "=" expr "in" expr
        | "if" expr "then" expr "else" expr
        | "case" expr "of" alts        -- pattern match
        | builtin "(" expr,... ")"     -- builtin application
        | expr "." ident               -- field access
        | expr "[" expr "]"            -- index
literal ::= int | float | str | bool | "null"
        | "[" expr,... "]"             -- list
        | "{" ident ":" expr,... "}"   -- record

The parser desugars if c then a else b to a Match over a boolean scrutinee with two arms, and case e of ... to a Match directly. There is no separate If or Case node in the abstract syntax.

Abstract syntax

pub enum Expr {
    Var(Arc<str>),
    Lam(Arc<str>, Box<Self>),
    App(Box<Self>, Box<Self>),
    Lit(Literal),
    Record(Vec<(Arc<str>, Self)>),
    List(Vec<Self>),
    Field(Box<Self>, Arc<str>),
    Index(Box<Self>, Box<Self>),
    Match { scrutinee: Box<Self>, arms: Vec<(Pattern, Self)> },
    Let { name: Arc<str>, value: Box<Self>, body: Box<Self> },
    Builtin(BuiltinOp, Vec<Self>),
}

Match covers both if/then/else and case/of; pattern matching is the only branching primitive. Each arm pairs a Pattern (whose variants include Wildcard, Var, Lit, Record, List, and Constructor) with a body expression. The full enum lives at crates/panproto-expr/src/expr.rs.

Type system

The type-formation grammar:

A typing context is a finite map from variable names to types. The typing relation is defined inductively by the usual rules; selected:

Builtin signatures have type schemes given in reference/expression-language; each Builtin(op, \overline{e}) rule plugs in ’s scheme and checks that the arguments match.

Semantic domain

Let be the recursive sum

interpreting Null as the singleton and the function space as partial continuous maps. Lift to to adjoin a bottom for divergence under the step budget. Environments live in , and ranked environments in to track the remaining step budget.

Semantic function

The denotational semantics is the family

defined by structural recursion on . Write and for the budget decrement. The equations:

The budget rule fires before any equation if the remaining steps are zero; otherwise the relevant equation applies and the recursive sub-denotations are evaluated with the budget decremented. Operationally this is EvalState::tick in crates/panproto-expr/src/eval.rs; when is returned the implementation surfaces ExprError::StepLimitExceeded(max_steps).

The auxiliary is the standard pattern-match search: try each in order, attempting to unify against the scrutinee value; on the first success bind the pattern variables into and evaluate ; on exhaustion raise NonExhaustiveMatch. The auxiliary is the partial function defined in crates/panproto-expr/src/builtin.rs.

Builtin side conditions

The builtins listed under reference/expression-language are individually total or partial. The partial ones return a non- error rather than :

  • Div, Mod with zero divisor: DivisionByZero.
  • Integer arithmetic overflow: Overflow (i64::checked_*).
  • *ToInt / *ToFloat on unparseable input: ParseError.
  • List index out of bounds: IndexOutOfBounds; list operations past the configured maximum: ListLengthExceeded.
  • Record access on a missing field: FieldNotFound.
  • Match exhaustion: NonExhaustiveMatch.
  • App of a non-function value: NotAFunction.

Errors are distinct from . models resource exhaustion (StepLimitExceeded and DepthExceeded); errors model defined failure and propagate as Err(ExprError) from the implementation.

Soundness

The evaluator satisfies:

  • Type preservation. If and and with , then .
  • Totality (within the budget). For every well-typed and well-typed , terminates in finitely many steps with either a value or .
  • Determinism. If and then .

Type preservation is enforced by the type-checker in crates/panproto-expr/src/typecheck.rs and by builtin signatures rejecting ill-typed arguments. Totality follows from the budget rule. Determinism follows from the absence of mutation and IO.

What is intentionally not modelled

  • Performance. Two expressions can have the same denotation but very different cost. The semantics fixes only what is computed, not how much it costs.
  • Step-budget tuning. The budget is a parameter set at the outermost call. The semantics treats it as fixed; the language itself does not expose it.
  • Floating-point determinism across architectures. Float operations follow IEEE 754, but bit-level reproducibility across hardware is not guaranteed.

See also

Lens DSL: denotational semantics

In plain terms

A lens DSL spec is a recipe for building a bidirectional transform between two schemas. You declare which fields map to which (with optional value-level expressions), and the compiler produces a triple of functions that satisfies three round-trip laws by construction. This page pins down what the spec compiles to and what “satisfies the laws” means.

Surface syntax

The Nickel surface (canonical authoring form). JSON and YAML surfaces are isomorphic via serde.

{
  id = "user.v3-to-v4",
  description = "Rename `name` and replace `age` with `years`",
  source = "dev.example.user.v3",
  target = "dev.example.user.v4",
  steps = [
    { rename_field = { old = "name", new = "display_name" } },
    { remove_field = "age" },
    { add_field = { name = "years", kind = "integer", default = 0, expr = "old.age" } },
  ],
}

Each step is a single-key object whose key selects the variant. The full step grammar is in crates/panproto-lens-dsl/src/document.rs.

Abstract syntax

pub struct LensDocument {
    pub id: String,
    pub description: String,
    pub source: String,
    pub target: String,

    // Body: exactly one of the four variants is present.
    pub steps:   Option<Vec<Step>>,
    pub rules:   Option<Vec<Rule>>,
    pub compose: Option<ComposeSpec>,
    pub auto:    Option<AutoSpec>,

    // Rule-variant metadata.
    pub passthrough: Option<Passthrough>,
    pub invertible:  Option<bool>,

    // Protocol-specific extension metadata.
    pub extensions: HashMap<String, serde_json::Value>,
}

pub enum Step {
    // High-level field combinators
    RemoveField { remove_field: String },
    RenameField { rename_field: RenameSpec },
    AddField    { add_field: AddFieldSpec },

    // Value-level transforms
    ApplyExpr    { apply_expr: ApplyExprSpec },
    ComputeField { compute_field: ComputeFieldSpec },

    // Structural combinators
    HoistField { hoist_field: HoistSpec },
    NestField  { nest_field: NestSpec },
    Scoped     { scoped: ScopedSpec },
    Pullback   { pullback: PullbackSpec },

    // Sort-level coercions and merges
    CoerceSort { coerce_sort: CoerceSortSpec },
    MergeSorts { merge_sorts: MergeSortsSpec },

    // Elementary theory operations
    AddSort      { add_sort: AddSortSpec },
    DropSort     { drop_sort: String },
    RenameSort   { rename_sort: RenameSpec },
    AddOp        { add_op: AddOpSpec },
    DropOp       { drop_op: String },
    RenameOp     { rename_op: RenameSpec },
    AddEquation  { add_equation: EquationSpec },
    DropEquation { drop_equation: String },
}

The top-level type is LensDocument, not LensSpec. The document carries source and target NSID fields naming the two schemas, and exactly one body variant (steps, rules, compose, or auto). The resolver loads the source schema and the compiler applies the body to it, then checks that the result matches the named target. See panproto_lens_dsl::compile.

Semantic domain

For schemas , and complement type , the set of lenses on is

with elements written as triples . The implementation in panproto-lens represents this triple by the Lens struct; the categorical model is the standard asymmetric-lens construction of Foster et al. (2007) with complements after Litt et al. (2020). The semantic function

takes a document and a source schema and returns a concrete lens. (The target schema and the complement type are determined by the document and the source.)

The three laws

A lens is lawful iff for all and :

panproto_lens::laws::check_get_put, check_put_get, and check_put_put are property-test runners that sample , , , from the schema’s value space and assert each equation.

Semantic equations

For each step constructor, is a function , where a is a schema-parameterised lens (see Protolens composition). The document-level semantics is the left-to-right composition of the per-step semantics, applied to the source schema:

where and . The composition operator is the sequential lens composition

For a representative subset of steps:

where is the trivial complement (a singleton) and on the right-hand side of is the expression-language semantics. The remaining 15 step constructors follow the same pattern; the implementation of each lives under crates/panproto-lens-dsl/src/steps.

Composition between adjacent step semantics is gated at construction time by protolens_composable: see Protolens composition. Compilation in the implementation is handled by panproto_lens_dsl::compile, which produces a CompiledLens carrying the ProtolensChain corresponding to along with the value-level FieldTransforms extracted from any steps.

Complement composition

Sequential composition of lenses requires composing their complements. Complement::compose is a partial commutative monoid:

  • Identity. The empty complement satisfies .
  • Commutativity. When defined, .
  • Associativity. When defined, .
  • Partiality: is defined iff:
    • and have the same source-schema fingerprint (otherwise ComplementFingerprintMismatch); and
    • For every key in both, (otherwise ComplementConflict with the offending key).

Pre-flight predicate: Complement::is_compatible(c1, c2).

The fingerprint is a 64-bit hash of the source schema (computed with the standard library’s DefaultHasher in panproto_lens::asymmetric::schema_fingerprint), so complements computed against syntactically distinct but structurally equal schemas are still compatible.

Soundness

The compilation function preserves lawfulness: if every step compiles to a lawful lens (which the combinator algebra guarantees), the composed result is lawful. Property tests in crates/panproto-lens/src/laws.rs verify each combinator against random inputs sampled from the schema’s value space.

What is intentionally not modelled

  • Lossy migrations as full lenses. A migration that drops information cannot satisfy GetPut. The DSL allows DropField, but the resulting object is a partial lens; the laws hold only on the surviving structure, and CI tests skip the GetPut law for steps annotated as lossy.
  • Time complexity of put. Some combinators have linear put cost in the size of the source; the semantics fixes the value, not the cost.
  • Equivalence of two distinct DSL specs that compile to the same lens. The DSL deliberately exposes step ordering even when steps commute; canonicalisation is left to the user.

See also

Theory DSL: denotational semantics

In plain terms

The theory DSL is what you use to declare a new protocol’s schema language. You write down the basic kinds of thing the protocol talks about (its sorts), the constructors that build them (its operations), and the equations they satisfy. The DSL compiles to a generalized algebraic theory (GAT) that the rest of panproto consumes uniformly.

This page pins down what a GAT presentation is, what category it generates, and what counts as a valid presentation.

Surface syntax

The Nickel surface (canonical authoring form). JSON and YAML surfaces are isomorphic via serde. Every document carries an id, a description, and exactly one body variant (theory, morphism, composition, protocol, bundle, class, instance, or inductive type).

A bare theory body:

{
  id = "dev.example.thgraph",
  description = "Directed multigraph with identity edges",
  theory = "ThGraph",
  sorts = [ { name = "Vertex" }, { name = "Edge" } ],
  ops = [
    { name = "src", inputs = [{ name = "e", sort = "Edge" }], output = "Vertex" },
    { name = "tgt", inputs = [{ name = "e", sort = "Edge" }], output = "Vertex" },
    { name = "id",  inputs = [{ name = "v", sort = "Vertex" }], output = "Edge" },
  ],
  equations = [
    { name = "src-id", lhs = "src(id(v))", rhs = "v", context = [{ name = "v", sort = "Vertex" }] },
    { name = "tgt-id", lhs = "tgt(id(v))", rhs = "v", context = [{ name = "v", sort = "Vertex" }] },
  ],
}

The full grammar is in crates/panproto-theory-dsl/src/document.rs.

Abstract syntax

pub struct TheoryDocument {
    pub id: String,
    pub description: String,
    pub body: TheoryBody,
}

pub enum TheoryBody {
    Theory(TheorySpec),
    Morphism(MorphismSpec),
    Composition(CompositionBody),
    Protocol(Box<ProtocolSpec>),
    Bundle(Box<BundleSpec>),
    Class(ClassSpec),
    Instance(InstanceSpec),
    Inductive(InductiveSpec),
}

pub struct TheorySpec {
    pub theory: String,                       // theory name
    pub extends: Vec<String>,                 // parent theories
    pub imports: Vec<ImportSpec>,             // imports with optional aliases
    pub sorts: Vec<SortSpec>,                 // sort declarations (with dependent params)
    pub ops: Vec<OpSpec>,                     // operation declarations
    pub equations: Vec<EquationSpec>,         // judgemental equalities
    pub directed_equations: Vec<DirectedEqSpec>,  // rewrite rules
    pub policies: Vec<PolicySpec>,            // conflict policies
}

The DSL document compiles to panproto_gat::Theory (and, for non-Theory bodies, to TheoryMorphism or Protocol). The intermediate surface types (TheorySpec, OpSpec, etc.) are deserialisation targets, not the categorical objects; the categorical objects are the GAT types.

Sort, operation, and equation judgements

A sort is well-formed in a theory context when it appears in the sort list:

An operation is well-typed when its inputs are well-typed sorts and its output is a well-typed sort dependent on the inputs:

An equation is well-formed when both sides type-check at the same sort under the same context:

Semantic domain

The semantic universe is , the (2-)category of categories with families (Cartmell-style: contexts as objects, substitutions as morphisms, types and terms forming a presheaf over substitutions). The denotational semantics is the functor

defined by case on the document body (with the bundle case mapping to a tuple of CwFs). For the bare theory case:

where is the initial CwF satisfying the sort, operation, and equation declarations of . Initiality is the semantic content of the GAT construction: for every CwF that interprets ’s sorts and operations and validates its equations, there is a unique structure-preserving functor . The GAT presentation framework is due to Cartmell (Cartmell (1986)), and the category-with-families packaging of dependent type theory is Dybjer’s (Dybjer (1996)); the two compose to give the initial-model semantics used here.

A schema in panproto is a model of in the CwF of finite sets and functions: equivalently, a CwF morphism .

Semantic equations for the body cases

where is the iterated pushout described under Pushouts and merge and the auxiliary class-to-theory and inductive-to-theory are the desugarings implemented in panproto-theory-dsl/src/compile_class.rs and compile_inductive.rs.

The interpretation of the composition body factors through the colimit construction in : , exploiting the fact that preserves the relevant colimits.

Soundness and registration

A protocol registration is the construction of a colimit diagram. If any pushout step in the diagram fails to satisfy the universal property (because two equations contradict on a shared sort), registration panics with a message naming the failing intermediate step:

panic: colimit ThGraph + ThConstraint over ThVertex failed:
       equation `src(id(v)) = v` contradicts `src(id(v)) = source(v)`

This is a build-time bug in the theory composition; the panic is intentional. See crates/panproto-protocols/src/theories.rs.

What is intentionally not modelled

  • Higher dimensions. The DSL is for 1-categorical GATs only. We do not model homotopical structure, 2-cells, or coherent equations.
  • Infinitary signatures. Operations have finite arity. There is no way to declare an operation that takes an unbounded list of arguments.
  • Term rewriting decidability. Equation orientation and confluence are the user’s responsibility; the colimit construction does not check for a complete rewriting system.

See also

Pushouts and merge

In plain terms

When two branches of a schema repository diverge from a common ancestor and you want to merge them, you need a precise rule for combining their changes. panproto’s rule is the pushout construction from category theory.

A pushout is the smallest object that contains two given objects and respects the way they share a common subobject. For schemas, “smallest” means containing exactly the union of the two branches’ changes, with their shared subschema identified rather than duplicated. The construction has a universal property: any other schema that also contains both branches admits a unique morphism from the pushout. That uniqueness is what makes the merge result the answer rather than an answer.

panproto-vcs does not just compute the pushout. It also verifies the universal property at runtime: it constructs an alternative cocone and checks that the unique mediator exists. If the check fails, the merge raises an error rather than producing a wrong result.

The categorical setup

Let be the category of GAT presentations and theory morphisms. A pushout in is defined as follows. Given:

  • An object (the base or shared ancestor),
  • Two morphisms and (the changes on each branch),

the pushout is an object together with morphisms and satisfying the cocone condition and the universal property: for any other object and morphisms , with , there exists a unique morphism such that and .

The pushout square:

The universal property says that for any alternative cocone , the mediator exists and is unique:

with and .

Construction

The pushout is constructed as the disjoint union of the sorts of and , quotiented by the equivalence relation that identifies with for every sort , with the operations and equations from and assembled accordingly.

The implementation is in crates/panproto-gat/src/colimit.rs. The result type is ColimitResult, which carries:

  • The pushout object .
  • The two inclusion morphisms , .
  • A method verify_universal to check the universal property against an alternative cocone.

Helper accessors merge_mediator_assignments and pushout_by_name expose the inclusions for downstream use.

The verified universal property

ColimitResult::verify_universal takes an alternative cocone and computes the unique mediator . It then checks that and . If either equation fails, the check returns EquationNotPreserved carrying the offending sort or operation.

In the VCS layer, vcs::merge::verify_pushout_universal runs the check against a constructed alternative cocone derived from the merge candidates. Failure raises PushoutError::UniversalFactorizationFailure and the merge does not produce a result.

This is the new behaviour introduced in a7fb636 (the correctness pass). Previously, panproto-vcs computed the pushout but did not verify it. The merge could produce an object satisfying cocone commutativity (the pushout square commutes) without satisfying the universal property (without being the minimal such object). Cocone commutativity is necessary for a pushout but not sufficient. Verifying the universal property is what makes the result the pushout rather than a cocone.

What this guarantees

  • Determinism. Two repositories with the same base and the same branch changes produce the same merge result, up to isomorphism.
  • Minimality. No spurious sorts, operations, or equations are introduced.
  • Safety. A failing universal-property check raises an error rather than silently producing a wrong merge.

What is intentionally not modelled

  • Conflict resolution policy. When the pushout would introduce a contradiction (two branches add fields with the same name but incompatible types), panproto raises a conflict object for the user to resolve. The resolution policy is up to the user; the pushout construction does not invent compromises.
  • Three-way merges with non-pushout common ancestors. If the branches’ divergence cannot be expressed as a span (for instance, if the ancestor was rewritten by a rebase), the merge falls back to an interactive resolution; the formal pushout is not defined.
  • Merge time complexity. The construction is polynomial in the size of the inputs but no specific bound is guaranteed.

See also

Protolens composition

In plain terms

A protolens is a lens recipe parameterised over a schema rather than a fixed pair of schemas. Where a lens lives between two specific schemas and , a protolens is a rule that says “for any schema satisfying some precondition, here is a lens from to a derived schema .” Applying a protolens to a fleet of related schemas is one operation; you do not write one lens per schema.

Composing protolenses requires more care than composing plain lenses. Two protolenses can be glued together end-to-end only when the schema produced by the first is precisely the schema consumed by the second. “Precisely” here is structural: the intermediate schemas have to agree as endofunctors on theories, not just on names. This page pins down that condition.

Semantic domain

A protolens is a natural transformation between schema endofunctors. Concretely:

where and are functors on the category of schemas , and for each schema , is a lens

satisfying the lens laws (see Lens DSL).

The naturality condition is: for every schema morphism , the square

commutes: . Applying the protolens then transporting along gives the same result as transporting then applying the protolens.

Composition

Two protolenses and compose vertically into pointwise, by :

The composition is well-defined only when the intermediate functor of matches the source functor of . In panproto-lens, this match is checked by protolens_composable:

pub fn protolens_composable(eta: &Protolens, theta: &Protolens) -> bool {
    matches!(theta.source.transform, TheoryTransform::Identity)
        || theory_endofunctor_equiv(&eta.target, &theta.source)
}

A Protolens carries its source and target TheoryEndofunctors as public fields. The equality theory_endofunctor_equiv is structural: the endofunctors agree iff their preconditions and their transforms agree, ignoring the human-readable name field. A trivial special case: when the source of is the identity functor (i.e., is the identity), the match is automatic regardless of ’s target.

vertical_compose enforces the check at construction time and returns LensError::CompositionMismatch on failure, naming the offending intermediate functor. This catches an entire class of bug at the type-construction stage rather than at instantiation time.

Sequential vs fused instantiation

When a chain of composable protolenses is applied to a base schema , two instantiation strategies are exposed:

  • Fused instantiation (instantiate): construct the composed protolens first, then apply it to once. Produces a single morphism with the migration metadata preserved as one chain.
  • Sequential instantiation (instantiate_sequential): apply to to get , apply to to get , and so on. Produces a list of morphisms.

Both satisfy the lens laws. Fused is the production default because it preserves migration metadata as a single object; sequential is exposed for property tests that need to inspect each intermediate schema.

Soundness

The composition operation preserves naturality: if and are natural transformations and protolens_composable(P, Q) holds, then is a natural transformation. The pointwise lens laws follow from the laws on each and .

The structural-equality check on intermediate functors is the necessary condition for the composite to be well-defined. Without it, vertical composition could be invoked with mismatched functors and the result would silently fail naturality on some schemas. The check makes such composition a runtime error at construction time rather than a semantic bug at application time.

What is intentionally not modelled

  • Horizontal composition of protolenses. Naturality also supports horizontal composition (whiskering), but the implementation does not currently expose it. Adding it would require a notion of natural transformation between protolens-shaped functors, which is not yet defined in the codebase.
  • Identity-sourced protolens equivalence beyond structural. Two protolenses that compute the same transform via different intermediate forms are treated as distinct.
  • Performance characteristics of fused vs sequential. The choice is semantic-equivalent; fused is preferred for metadata reasons, not performance.

See also

REPL command language: denotational semantics

In plain terms

The REPL is the interactive surface for inspecting theories and terms: load a theory document, switch the active theory, ask for the type of a term, normalize a term under the directed equations, enumerate the free model. Every interactive command is a small operation on a stateful environment that holds the loaded theories, the loaded morphisms, and a pointer to the currently active theory.

This page pins down what each command does, what state it touches, and what the bare-term-typecheck path means.

Surface syntax

line     ::= command | term
command  ::= ":" cmd args
cmd      ::= "load" | "theories" | "use" | "sorts" | "ops"
           | "type" | "normalize" | "model" | "instance"
           | "quit" | "q"
args     ::= /* command-specific, see table below */

A line that does not begin with : is treated as a term and routed through the typecheck path under the active theory. Comments and blank lines are ignored. The grammar lives in crates/panproto-repl/src/lib.rs.

Abstract syntax

pub enum ReplCommand {
    Load(PathBuf),
    Theories,
    Use(String),
    Sorts,
    Ops,
    TypeOf(String),
    Normalize(String),
    Model(Option<u32>),
    Instance { class: String, target: String, bindings: Vec<(String, String)> },
    Quit,
}

pub enum ReplLine {
    Command(ReplCommand),
    Term(String),
}

(The actual implementation does not export this enum; commands are dispatched directly from string parsing in Repl::handle_command. The shape above is the denotational model.)

Semantic domain

The REPL state is

The semantic codomain is the set of user-visible effects (printed string, typed-term result, error, quit signal). The denotational semantics is the pair of functions

where is state-preserving: .

Semantic equations

Write for a state and for the obvious update. The equations:

The bare-term path:

When (no active theory), all -dependent equations short-circuit to . When any auxiliary (compile, typecheck_term, normalize, free_model, compile_instance) returns an error, the outcome is and the state is unchanged.

Repl::handle_command and Repl::handle_term_typecheck in crates/panproto-repl/src/lib.rs implement these equations pointwise.

Soundness

The REPL is a thin orchestration layer over the GAT engine and the theory DSL compiler. It introduces no new failure modes; every error it reports comes from one of:

  • panproto_theory_dsl::LoadError (during :load or :instance)
  • panproto_gat::GatError (during :type, :normalize, :model)
  • A REPL-level UnknownCommand / UnknownTheory for command-shape errors

State updates are atomic per line: a failed compile in :load rolls back the partial insertions before returning the error, so the post-state is either fully updated or unchanged.

The bare-term path is total in the technical sense: every input string either parses and produces a TypeOf or Error outcome; the REPL does not deadlock or spin.

What is intentionally not modelled

  • Multi-line input. Commands and terms are single-line. Continuations are the user’s responsibility (concatenate into a single line before submission).
  • Macro expansion. There is no :define or :macro; the REPL is a pure inspection interface, not a programming environment.
  • Concurrent state. A Repl instance is single-threaded. The shape above does not model concurrent line submission.
  • Persistent history. The REPL relies on rustyline for history; the persistence model is rustyline’s, not panproto’s.

See also

Glossary

Each entry gives the formal definition first and a one-sentence intuition second.

Abstract schema

A Schema whose constraint set contains no sort in the layout enrichment fibre: no start-byte, end-byte, interstitial-N, chose-alt-fingerprint, or chose-alt-child-kinds. The Rust newtype is panproto_schema::AbstractSchema. Intuition: the schema you would build by hand with SchemaBuilder before any parser or decorate has attached layout data.

Decorated schema

A Schema carrying a complete layout enrichment fibre. The Rust newtype is panproto_schema::DecoratedSchema. Constructed by ParserRegistry::parse_with_protocol, by ParserRegistry::decorate, or by explicit wrapping via DecoratedSchema::wrap_unchecked. Intuition: the schema you get back from parsing source code, with every byte position and inter-token whitespace recorded.

Decorate

The function (AbstractSchema, LayoutPolicy) → DecoratedSchema that attaches a layout fibre to an abstract schema by running emit_pretty_with_policy to produce canonical bytes and re-parsing those bytes. The result satisfies forget_layout(decorate(a, p)) ≅ a up to vertex-id renaming and kind / edge multiset equivalence. Intuition: the section of the schema-level forgetful U; the put-direction of the parse / emit lens at the schema level.

Forget layout

The function Schema → Schema (or DecoratedSchema → AbstractSchema in typed form) that drops every constraint whose sort belongs to the layout enrichment fibre. Implemented as Schema::forget_layout, Schema::forget_layout_in_place, and DecoratedSchema::forget_layout. Intuition: the schema-level forgetful functor stripping parser-only metadata to leave the abstract content.

Layout enrichment / Layout fibre

The family of constraint sorts (start-byte, end-byte, interstitial-N for any N, chose-alt-fingerprint, chose-alt-child-kinds) that attach byte-position and parser-discriminator data to vertices of a parsed schema. Classified by panproto_gat::EnrichmentKind::Layout and identified by the panproto_gat::is_layout_sort predicate. Intuition: the parser-only metadata the emitter needs to render bytes back; everything parse adds that SchemaBuilder does not produce by hand.

Layout policy

The configuration object passed to decorate and pretty_with_protocol controlling whitespace, indentation, separators, newline conventions, and the line-break / indent-open / indent-close token sets that the put direction of the parse / emit lens uses. Aliased to panproto_parse::emit_pretty::FormatPolicy; the wire-serialisable projection is panproto_gat::LayoutPolicySpec. Intuition: the put-direction complement of the parse / emit lens, namely what whitespace and CHOICE-alternative defaults to apply when parsing is not there to dictate them.

Layout enricher

A trait implementation registered in panproto-lens::enrichment_registry that materialises a layout fibre on a schema. The one in-tree implementation, panproto_parse::decorate::ParserLayoutEnricher, runs emit_pretty_with_policy + parse to recover the fibre. Intuition: the cross-crate bridge that lets panproto-lens dispatch enrichment synthesis without depending on tree-sitter.

Parse / emit lens

The lens between byte sequences and decorated schemas. The get direction is parse; the put direction is emit_pretty. The complement (the data the schema does not pin down) is the byte-position layout fibre. Verified at the schema-level retraction law granularity by panproto_parse::parse_emit_lens::check_emit_parse and check_parse_emit. Intuition: the lens whose get reads source code into a schema and whose put writes a schema back to source code.

Parse / decorate / emit lens

The schema-level version of the parse / emit lens, with the byte step skipped. The get direction is forget_layout : DecoratedSchema → AbstractSchema; the put direction is decorate : AbstractSchema → DecoratedSchema. The section law forget_layout ∘ decorate ≅ id holds up to kind / edge multiset equivalence. Intuition: the lens between abstract and decorated schemas, parameterised by a LayoutPolicy.

Grammar cassette

A per-language implementation of GrammarCassette supplying default text for external scanner tokens that grammar.json cannot describe (variable-text delimiters, layout markers, scanner-state markers). Composed with the universal pattern table common_external_default via resolve_external_token: per-grammar override first, universal layer as fallback. Intuition: the small per-language patch sitting on top of the grammar-derived emit pipeline, supplying text for tokens whose actual content grammar.json alone cannot pin down.

Token role

Structural classification of every STRING literal in a grammar rule, derived from the literal’s position in the production body. Eight variants of panproto_parse::emit_pretty::TokenRole: BracketOpen, BracketClose, Separator, Keyword, Operator, Connector (a non-algebraic structural connector such as . or ::), Terminal (text from a leaf vertex’s literal-value), and Immediate (a token the grammar wraps in IMMEDIATE_TOKEN, glued to its neighbour with no whitespace). Computed once at Grammar::from_bytes time and stored as the per-rule token_roles map; consumed by the layout pass via the needs_space_by_role table. Intuition: what the emitter uses instead of inspecting the token text; every spacing decision follows from the role pair, not from any character set.

Acceptance predicate

The inductive function accepts_first_edge(production, edge_field, target_kind) over the production tree that decides whether a given alternative is structurally compatible with the cursor’s first unconsumed edge. Fuses FIELD-name matching, SYMBOL subtype dispatch, ALIAS rewrite, and yield-set admission into a single categorical rule. Implemented in panproto-parse::emit_pretty::accepts_first_edge. Intuition: the categorical core of CHOICE dispatch; the predicate the emitter consults before any heuristic tiebreaker.

Pre-alias symbol

The walker-recorded pre-alias-symbol constraint capturing tree_sitter::Node::grammar_name() (the SYMBOL name as it appears in the rule body before ALIAS { value: V } rewriting). Only recorded when it differs from the post-alias kind(). Consumed by alt_satisfies_pre_alias_constraints as the alias-source discriminator: an alt with a named ALIAS over a SYMBOL is structurally compatible iff the cursor edge’s pre-alias-symbol matches that SYMBOL. Intuition: the only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 surfaces through its C API.

Emit verification status

The programmatic tier reported by ParserRegistry::emit_verification_status classifying every protocol as Verified (every entry of the grammar author’s own test/corpus/ round-trips under the strict emit_corpus_audit oracle, or the protocol is pinned by a quivers backend test), Generic (registered with vendored grammar.json, no test asserts emit correctness), or Unsupported (no grammar, emit will fail). The verified set is the 255 names in VERIFIED_EMIT_PROTOCOLS. Downstream tooling calls this upfront to refuse emit on protocols whose correctness has not been exercised. Intuition: panproto’s own honesty signal about which protocols its test suite verifies for round-trip correctness.

Fixed-point law (emit)

The correctness witness for source-code emission: emit(parse(emit(s))) == emit(s). Asserted per-protocol by <lang>_emit_is_fixed_point regression tests in crates/panproto-parse/tests/emit_pretty_regressions.rs, and enforced over every grammar author’s full test/corpus/ by the strict emit_corpus_audit gate, which conjoins this fixed point with kind- and edge-multiset preservation. Stronger than the section law (which holds at the kind / edge multiset level); equality is byte-for-byte after the first emit. Intuition: the emitter has reached a fixed point of the parse / emit cycle, which is what guarantees that downstream re-parsing pipelines remain stable.

Section law

For the parse / decorate / emit lens at protocol under policy:

The equivalence is up to vertex-id renaming and the vertex-kind / edge-shape multiset. Verified for every grammar with a parse fixture in crates/panproto-parse/tests/decorate_section_law.rs. Intuition: decorating an abstract schema and stripping the layout back returns the same abstract content modulo the fresh IDs the parser invents.

See also

For longer treatments: Source-code emission, Schemas as theories, Lenses and round-trip laws, Layout enrichment.

Bibliography

Angiuli, Carlo, Edward Morehouse, Daniel R. Licata, and Robert Harper. 2014. “Homotopical Patch Theory.” In “Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP 2014.” Special issue, Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP 2014, 243–56. https://doi.org/10.1145/2692915.2628158.
Begoli, Edmon, Jesús Camacho-Rodr\́iguez, Julian Hyde, Michael J. Mior, and Daniel Lemire. 2018. “Apache Calcite: A Foundational Framework for Optimized Query Processing over Heterogeneous Data Sources.” In “Proceedings of the 2018 ACM SIGMOD International Conference on Management of Data (SIGMOD 2018).” Special issue, Proceedings of the 2018 ACM SIGMOD International Conference on Management of Data (SIGMOD 2018), 221–30. https://doi.org/10.1145/3183713.3190662.
Bohannon, Aaron, Benjamin C. Pierce, and Jeffrey A. Vaughan. 2006. “Relational Lenses: A Language for Updatable Views.” In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 2006, edited by Stijn Vansummeren, Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 2006. ACM. https://doi.org/10.1145/1142351.1142399.
Bohannon, Aaron, J. Nathan Foster, Benjamin C. Pierce, Alexandre Pilkiewicz, and Alan Schmitt. 2008. “Boomerang: Resourceful Lenses for String Data.” In “Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008.” Special issue, Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, 407–19. https://doi.org/10.1145/1328438.1328487.
Brunsfeld, Max. 2018. “Tree-Sitter: A New Parsing System for Programming Tools.”. https://www.thestrangeloop.com/2018/tree-sitter%E2%80%94a-new-parsing-system-for-programming-tools.html.
Burstall, Rod M., and Joseph A. Goguen. 1977. “Putting Theories Together to Make Specifications.” In “Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI 1977).” Special issue, Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI 1977), 1045–58. https://dl.acm.org/doi/10.5555/1622943.1622999.
Burstall, Rod M., and Joseph A. Goguen. 1980. “The Semantics of Clear, A Specification Language.” In “Abstract Software Specifications, 1979 Copenhagen Winter School.” Special issue, Abstract Software Specifications, 1979 Copenhagen Winter School, Lecture Notes in Computer Science, vol. 86 : 292–332. https://doi.org/10.1007/3-540-10007-5_41.
Cartmell, John. 1978. “Generalised Algebraic Theories and Contextual Categories.” D.Phil. thesis. https://ncatlab.org/nlab/files/Cartmell-Thesis.pdf.
Cartmell, John. 1986. “Generalised Algebraic Theories and Contextual Categories.” Annals of Pure and Applied Logic 32 : 209–43. https://doi.org/10.1016/0168-0072(86)90053-9.
Chacon, Scott, and Ben Straub. 2014. Pro Git. 2nd ed. Apress. https://doi.org/10.1007/978-1-4842-0076-6.
Clarke, Bryce, Derek Elkins, Jeremy Gibbons, et al. 2024. “Profunctor Optics, A Categorical Update.” Compositionality 6 : 1. https://doi.org/10.32408/compositionality-6-1.
Curino, Carlo A., Hyun J. Moon, and Carlo Zaniolo. 2008. “Graceful Database Schema Evolution: The PRISM Workbench.” In “Proceedings of the VLDB Endowment, Vol. 1, No. 1 (VLDB 2008).” Special issue, Proceedings of the VLDB Endowment, Vol. 1, No. 1 (VLDB 2008), 761–72. https://doi.org/10.14778/1453856.1453939.
Diskin, Zinovy, Yingfei Xiong, and Krzysztof Czarnecki. 2011. “From State- to Delta-Based Bidirectional Model Transformations: The Asymmetric Case.” Journal of Object Technology 10 : 6:1–25. https://doi.org/10.5381/jot.2011.10.1.a6.
Dybjer, Peter. 1996. “Internal Type Theory.” In Types for Proofs and Programs, TYPES 1995, edited by Stefano Berardi and Mario Coppo, vol. 1158 of Types for Proofs and Programs, TYPES 1995. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-61780-9\_66.
Fagin, Ronald, Phokion G. Kolaitis, and Lucian Popa. 2005. “Data Exchange: Getting to the Core.” 30 (1): 174–210. https://doi.org/10.1145/1061318.1061323.
Fagin, Ronald, Phokion G. Kolaitis, Lucian Popa, and Wang-Chiew Tan. 2005. “Composing Schema Mappings: Second-Order Dependencies to the Rescue.” ACM Transactions on Database Systems 30 (4): 994–1055. https://doi.org/10.1145/1114244.1114249.
Foster, J. Nathan, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. 2007. “Combinators for Bidirectional Tree Transformations: A Linguistic Approach to the View-Update Problem.” ACM Transactions on Programming Languages and Systems 29 (3): 17. https://doi.org/10.1145/1232420.1232424.
Foster, J. Nathan, Alexandre Pilkiewicz, and Benjamin C. Pierce. 2008. “Quotient Lenses.” In “Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, ICFP 2008.” Special issue, Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, ICFP 2008, 383–96. https://doi.org/10.1145/1411204.1411257.
Goguen, Joseph A., and Rod M. Burstall. 1992. “Institutions: Abstract Model Theory for Specification and Programming.” Journal of the ACM 39 (1): 95–146. https://doi.org/10.1145/147508.147524.
Hofmann, Martin, Benjamin C. Pierce, and Daniel Wagner. 2011. “Symmetric Lenses.” In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, edited by Thomas Ball and Mooly Sagiv, Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011. ACM. https://doi.org/10.1145/1926385.1926428.
Hofmann, Martin, Benjamin C. Pierce, and Daniel Wagner. 2012. “Edit Lenses.” SIGPLAN Notices 47 (1): 495–508. https://doi.org/10.1145/2103656.2103715.
Litt, Geoffrey, Peter van Hardenberg, and Orion Henry. 2020. “Project Cambria: Translate Your Data with Lenses.”. https://www.inkandswitch.com/cambria/.
Litt, Geoffrey, Peter van Hardenberg, and Orion Henry. 2021. “Schema Evolution in Distributed Systems with Edit Lenses.” In “Proceedings of the 8th Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC 2021).” Special issue, Proceedings of the 8th Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC 2021),. https://doi.org/10.1145/3447865.3457963.
Lu, Jiaheng. 2025. A Categorical Unification for Multi-Model Data: Part I Categorical Model and Normal Forms. https://arxiv.org/abs/2502.19131.
Lutterkort, David. 2008. “AUGEAS—a Configuration API.” In “Proceedings of the Linux Symposium.” Special issue, Proceedings of the Linux Symposium 2 : 47–56. https://www.kernel.org/doc/ols/2008/ols2008v2-pages-47-56.pdf.
Merkle, Ralph C. 1988. “A Digital Signature Based on a Conventional Encryption Function.” In Advances in Cryptology: CRYPTO 1987, edited by Carl Pomerance, vol. 293 of Advances in Cryptology: CRYPTO 1987. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-48184-2\_32.
Mimram, Samuel, and Cinzia Di Giusto. 2013. “A Categorical Theory of Patches.” Electronic Notes in Theoretical Computer Science 298 : 283–307. https://doi.org/10.1016/j.entcs.2013.09.018.
Pacheco, Hugo, and Alcino Cunha. 2011. “Calculating with Lenses: Optimising Bidirectional Transformations.” In “Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM 2011.” Special issue, Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM 2011, 91–100. https://doi.org/10.1145/1929501.1929520.
Pacheco, Hugo, Alcino Cunha, and Zhenjiang Hu. 2012. “Delta Lenses over Inductive Types.” Electronic Communications of the EASST 49. https://doi.org/10.14279/tuj.eceasst.49.713.
Patterson, Evan, Owen Lynch, and James Fairbanks. 2022. “Categorical Data Structures for Technical Computing.” Compositionality 4 (5). https://doi.org/10.32408/compositionality-4-5.
Pickering, Matthew, Jeremy Gibbons, and Nicolas Wu. 2017. “Profunctor Optics: Modular Data Accessors.” The Art, Science, And Engineering of Programming 1 (2): 7. https://doi.org/10.22152/programming-journal.org/2017/1/7.
Pombrio, Justin, and Shriram Krishnamurthi. 2014. “Resugaring: Lifting Evaluation Sequences Through Syntactic Sugar.” In “Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014.” Special issue, Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, 361–71. https://doi.org/10.1145/2594291.2594319.
Roundy, David. 2005. “Darcs: Distributed Version Management in Haskell.” In “Proceedings of the 2005 ACM SIGPLAN Workshop on Haskell (Haskell '05).” Special issue, Proceedings of the 2005 ACM SIGPLAN Workshop on Haskell (Haskell '05), 1–4. https://doi.org/10.1145/1088348.1088349.
Sannella, Donald, and Andrzej Tarlecki. 2012. Foundations of Algebraic Specification and Formal Software Development. Monographs in Theoretical Computer Science. An EATCS Series. Springer. https://doi.org/10.1007/978-3-642-17336-3.
Schultz, Patrick, and Ryan Wisnesky. 2017. “Algebraic Data Integration.” Journal of Functional Programming 27 : e24. https://doi.org/10.1017/S0956796817000168.
Schultz, Patrick, David I. Spivak, Christina Vasilakopoulou, and Ryan Wisnesky. 2017. “Algebraic Databases.” Theory and Applications of Categories 32 (16): 547–619. http://www.tac.mta.ca/tac/volumes/32/16/32-16.pdf.
Shapiro, Marc, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. “Conflict-Free Replicated Data Types.” In “Stabilization, Safety, And Security of Distributed Systems (SSS 2011).” Special issue, Stabilization, Safety, And Security of Distributed Systems (SSS 2011), Lecture Notes in Computer Science, vol. 6976 : 386–400. https://doi.org/10.1007/978-3-642-24550-3_29.
Spivak, David I., and Robert E. Kent. 2012. “Ologs: A Categorical Framework for Knowledge Representation.” PLOS ONE 7 (1): e24274. https://doi.org/10.1371/journal.pone.0024274.
Spivak, David I. 2012. “Functorial Data Migration.” Information and Computation 217 : 31–51. https://doi.org/10.1016/j.ic.2012.05.001.
Spivak, David I., and Ryan Wisnesky. 2015. “Relational Foundations for Functorial Data Migration.” In Proceedings of the 15th Symposium on Database Programming Languages, DBPL 2015, edited by James Cheney and Thomas Neumann, Proceedings of the 15th Symposium on Database Programming Languages, DBPL 2015. ACM. https://doi.org/10.1145/2815072.2815075.
Zhu, Zirun, Hsiang-Shang Ko, Pedro Martins, João Saraiva, and Zhenjiang Hu. 2015. “BiYacc: Roll Your Parser and Reflective Printer into One.” In Proceedings of the 4th International Workshop on Bidirectional Transformations, Bx 2015, edited by Alcino Cunha and Ekkart Kindler, vol. 1396 of Proceedings of the 4th International Workshop on Bidirectional Transformations, Bx 2015. CEUR Workshop Proceedings. https://ceur-ws.org/Vol-1396/p43-zhu.pdf.
Jonge, Maartje de, and Eelco Visser. 2012. “An Algorithm for Layout Preservation in Refactoring Transformations.” In Software Language Engineering, SLE 2011, edited by Anthony Sloane and Uwe Aßmann, vol. 6940 of Software Language Engineering, SLE 2011. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-28830-2_3.