panproto
panproto is a toolchain for evolving schemas: defining them, migrating data across versions, translating between protocols (JSON Schema, Protobuf, GraphQL, Avro, ATProto, SQL DDL, and many more), versioning the whole thing the way git versions source, and round-tripping data through bidirectional transforms whose laws are mechanically checked.
This documentation is organised in four quadrants, following the Diataxis framework. Pick the one that matches what you are trying to do.
Tutorials
Learning by doing. Pick this if you are new and want a guided sit-down with a working example at the end. No prior knowledge of category theory or schema theory is assumed.
How-to guides
Recipes for specific tasks: defining a schema in your language of choice, building a migration, wiring breaking-change detection into CI, querying instances, parsing full ASTs, bridging panproto’s version control to git. Pick this when you know what you want to do and need the steps.
Reference
Authoritative listings: every CLI subcommand, the SDK surfaces for Rust, TypeScript, and Python, the protocol catalogue, the expression-language builtins, the lens combinator algebra, the panproto.toml schema. Tables and signatures, no exposition.
Explanation
Why the system is shaped the way it is. What schemas, migrations, lenses, and merges mean under the hood. The denotational semantics of panproto’s three DSLs, and a precise account of which categorical properties the implementation mechanically verifies. Most pages here are accessible to working developers; the semantics/ cluster is the place where the math is load-bearing.
Where to start
| You are… | Start at |
|---|---|
| New to panproto | Your first schema |
| Adding panproto to an existing project | Install |
| Looking up a CLI flag | CLI reference |
| Wondering what schemas-as-theories means | Schemas as theories |
| Building a custom protocol | Build a custom protocol |
| Setting up CI gates | Breaking-change gate |
Tutorials
Tutorials walk through a complete example end-to-end. They assume no prior knowledge of panproto and no category-theoretic background. Each one ends with a working artifact you can run and modify.
If a tutorial mentions a concept by name without explaining it (a lens, a protocol, a migration), it links forward to the explanation page that defines it. You do not need to follow those links to finish the tutorial.
| Tutorial | What you build | Time |
|---|---|---|
| Your first schema | A typed schema for a small data model in TypeScript, validated against a protocol | ~10 min |
| Your first migration | A migration that adds a field, renames another, and lifts existing data forward | ~15 min |
| Schema version control basics | A schema repository with branches and a merge | ~20 min |
| Cross-protocol translation | A JSON Schema converted into a Protobuf definition with data round-tripped through both | ~20 min |
Your first schema
You will define a schema for a small data model (users with names and ages), validate it against the atproto protocol, and load some data through it. About ten minutes.
By the end you will have: a working panproto setup, a schema you wrote, an instance of that schema parsed from a JSON file, and a sense of how the four pieces (protocol, schema, instance, validation) fit together.
We use atproto because it is the most fully-built-out protocol in the current registry; the same code shape applies to any of the protocols listed by Panproto.listProtocols().
No prior knowledge of category theory or schema theory is assumed. We use ordinary words for everything; if you want the formal treatment of any concept, the explanation chapters are linked at the end.
Setup
Pick a language. The walkthrough uses TypeScript; the Python and Rust versions are at the bottom.
mkdir my-first-schema && cd my-first-schema
npm init -y
npm install @panproto/core
Step 1: load a protocol
Create src/main.ts:
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
const proto = p.protocol('atproto');
console.log('protocol:', proto.name);
Run it: npx tsx src/main.ts. You see protocol: atproto (using Protocol.name). The protocol object knows how to validate, parse, and emit schemas in its native form; it is the starting point for building schemas in this language.
Step 2: build a schema
Add to src/main.ts:
const schema = proto.schema()
.vertex('user', 'object')
.vertex('user.name', 'string')
.vertex('user.age', 'integer')
.edge('user', 'user.name', 'prop', { name: 'name' })
.edge('user', 'user.age', 'prop', { name: 'age' })
.required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
.build();
console.log(
'vertices:', Object.keys(schema.vertices).length,
'edges:', schema.edges.length,
);
.vertex() declares a vertex (a kind, e.g. an object or a leaf string). .edge() declares an edge (a field, item, or variant). This schema says: a user is an object with a required string name and an optional integer age.
.build() validates the construction: required edges are present, every reference targets an existing vertex, the protocol’s equations are satisfied. If anything is wrong, you get an error here, before any data is touched. The returned BuiltSchema carries .vertices (an id -> Vertex record), .edges (an array), and .protocol.
Step 3: parse and validate data
Create data/sample.json:
{ "name": "Alice", "age": 30 }
Add to src/main.ts:
import { readFileSync } from 'node:fs';
const bytes = readFileSync('data/sample.json');
const instance = p.parseJson(schema, bytes);
console.log('parsed:', new TextDecoder().decode(p.toJson(schema, instance)));
const validation = instance.validate();
console.log('valid?', validation.isValid, validation.errors);
Run it. You see the parsed record echoed back and valid? true []. Panproto.parseJson(schema, bytes) returns an Instance by walking the JSON against the schema graph (a non-integer age, or JSON not matching the schema’s shape, raises during this call). Panproto.toJson(schema, instance) serialises it back out. instance.validate() runs the separate required-fields/constraints pass.
Step 4: catch a violation
Edit data/sample.json to remove name:
{ "age": 30 }
Run again. The parse still succeeds (parsing only walks the structure that is present), but validation.isValid is now false and validation.errors carries MissingRequiredEdge { ..., edge: "name (prop)" }. Required-field enforcement is performed by instance.validate(), not by parseJson itself; run both whenever you want a hard check.
What you built
Three things:
- A reference to a protocol (
atproto). - A schema (a graph of vertices and edges) within that protocol.
- Instances (data) parsed and validated against the schema.
This same pattern works for every protocol panproto supports. Swap 'atproto' for any other name in the built-in registry (Panproto.listProtocols() lists them), and the rest of the code is identical.
Next
- Your first migration takes the same
userschema, evolves it to v2, and lifts the existing data forward. - The plain-terms explanation of what schemas are is at Schemas as theories.
- The reference for the SDK surface you used is at Reference: TypeScript SDK.
Python version
import panproto
proto = panproto.get_builtin_protocol("atproto")
b = proto.schema()
b.vertex("user", "object")
b.vertex("user.name", "string")
b.vertex("user.age", "integer")
b.edge("user", "user.name", "prop", "name")
b.edge("user", "user.age", "prop", "age")
schema = b.build()
io = panproto.IoRegistry()
with open("data/sample.json", "rb") as f:
instance = io.parse("atproto", schema, f.read())
print(instance.to_dict())
The Python builder uses statement-by-statement mutation (each .vertex() and .edge() mutates in place and returns None); chain syntax does not work. Parsing data through a protocol’s codec goes through IoRegistry().parse(protocol, schema, bytes). The full list of built-in protocols is panproto.list_builtin_protocols().
Rust version
panproto-core is a re-export facade over the sub-crates; there is no single Panproto entry-point struct. You compose the same flow directly from the sub-crates: build a Schema via panproto_schema::SchemaBuilder, validate it against a protocol from panproto_protocols, parse instances via panproto_inst::parse_json. The shape:
use panproto_core::{protocols, schema::SchemaBuilder, inst};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = protocols::atproto::protocol();
let schema = SchemaBuilder::new(&proto)
.vertex("user", "object", Some("app.example.user"))?
.vertex("user:name", "string", None)?
.vertex("user:age", "integer", None)?
.edge("user", "user:name", "prop", Some("name"))?
.edge("user", "user:age", "prop", Some("age"))?
.entry("user")
.build()?;
let bytes = std::fs::read("data/sample.json")?;
let json: serde_json::Value = serde_json::from_slice(&bytes)?;
let instance = inst::parse_json(&schema, "user", &json)?;
println!("{instance:?}");
Ok(())
}
Method signatures track the underlying crates rather than a fluent facade; consult docs.rs/panproto-schema and docs.rs/panproto-inst for current arguments.
Your first migration
You will take the user schema from Your first schema and evolve it to v2 by renaming age to years, then lift existing v1 records to v2 shape. About fifteen minutes.
By the end you will have: a v2 schema, a migration from v1 to v2 with explicit edge mappings, a compatibility report, an existence check that passes, and v2-shape data lifted from your v1 records with a verified round-trip.
Prerequisites
Completed Your first schema. The same my-first-schema/ project.
Step 1: write both schemas in one file
src/migration.ts:
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
const proto = p.protocol('atproto');
const v1 = proto.schema()
.vertex('user', 'object')
.vertex('user.name', 'string')
.vertex('user.age', 'integer')
.edge('user', 'user.name', 'prop', { name: 'name' })
.edge('user', 'user.age', 'prop', { name: 'age' })
.required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
.build();
const v2 = proto.schema()
.vertex('user', 'object')
.vertex('user.name', 'string')
.vertex('user.years', 'integer')
.edge('user', 'user.name', 'prop', { name: 'name' })
.edge('user', 'user.years', 'prop', { name: 'years' })
.required('user', [{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' }])
.build();
The only change: age is renamed to years (a structural rename of both vertex id and edge name). We deliberately keep this minimal: adding a new required field with no v1 source would make lift produce records that do not satisfy v2, which the existence check would flag in Step 4.
Step 2: declare the migration
const builder = p.migration(v1, v2)
.map('user', 'user')
.map('user.name', 'user.name')
.map('user.age', 'user.years')
.mapEdge(
{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' },
{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' },
)
.mapEdge(
{ src: 'user', tgt: 'user.age', kind: 'prop', name: 'age' },
{ src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
);
const mig = builder.compile();
MigrationBuilder is fluent: .map(srcVertex, tgtVertex) aligns vertices and .mapEdge(srcEdge, tgtEdge) aligns edges. Vertex mappings alone are not enough: without the edge maps the lift drops every field and you get back {}. .compile() returns a CompiledMigration exposing .lift(), .get(), and .put() (a migration is a lens). For value-level transforms that compute one field from another, reach for the lens DSL or panproto-lens-dsl from the SDK.
Step 3: classify the change
const report = p.diffFull(v1, v2).classify(proto);
console.log('compatible?', report.isCompatible);
console.log('breaking changes:', report.breakingChanges);
console.log('non-breaking changes:', report.nonBreakingChanges);
Panproto.diffFull(old, new) returns a FullDiffReport; calling .classify(protocol) returns a CompatReport with three booleans (isCompatible, isBackwardCompatible, isBreaking) and two arrays (breakingChanges, nonBreakingChanges). For human-readable output call report.toText(); for a machine-readable summary report.toJson().
For this rename, isCompatible is false: the diff classifies the removal of user.age/edge age as breaking, even though the migration we declared explicitly carries the field forward under a new name. The classifier looks only at the structural diff, not at the migration mapping, so a “rename” between schemas registers as a removal plus an addition. Whether that is a problem for your downstream consumers depends on whether they have been told about the migration.
Step 4: check before you lift
const existence = p.checkExistence(v1, v2, builder);
console.log('existence valid?', existence.valid);
console.log('existence errors:', JSON.stringify(existence.errors, null, 2));
if (!existence.valid) {
throw new Error('existence check failed');
}
Panproto.checkExistence(src, tgt, builder) runs the existence-condition test: for every required v2 field, is the mapping populated from a v1 source? It returns an ExistenceReport with valid: boolean and errors: ExistenceError[]. Each error is a serde-tagged object (for example { "RequiredFieldMissing": { "vertex": "user", "field": "name" } }), so inspect them as structured data rather than expecting a message string. Inspect valid before lifting; an invalid report means the lift will produce data that does not satisfy the v2 schema.
Step 5: lift the data
import { writeFileSync } from 'node:fs';
const v1Records = [
{ name: 'Alice', age: 30 },
{ name: 'Bob', age: 42 },
];
const v2Records = v1Records.map((r) => mig.liftJson(r, 'user'));
console.log('v2:', v2Records);
writeFileSync('data/v2.jsonl', v2Records.map((r) => JSON.stringify(r)).join('\n'));
mig.liftJson(record, rootVertex) round-trips a JSON-shaped record through the migration: the wrapper parses it as an instance of the source schema rooted at rootVertex, lifts it through the edge mapping, and emits the target shape as a JS object. (The lower-level mig.lift() exists too, but it expects already-encoded Instance bytes rather than JSON-native records.) The complement (the data discarded by the forward direction) is captured on the get/put path; see Step 6.
Step 6: confirm round-trip
const original = v1Records[0];
const { view, complement } = mig.getJson(original, 'user');
const back = mig.putJson(view, complement, 'user');
console.log('view:', view);
console.log('round-trip:', back);
mig.getJson(record, rootVertex) returns { view, complement }; mig.putJson(view, complement, rootVertex) returns the restored record. The complement carries the data the v2 shape does not see; together, get and put satisfy the round-trip laws. (The lower-level mig.get() / mig.put() exist for already-encoded Instance bytes; the *Json wrappers handle the JSON-native path.) The restored record’s field ordering is not preserved (object keys come out alphabetised), so structural equality, not JSON-string equality, is the right round-trip predicate.
What you built
A migration that is type-checked, classified, existence-checked, and reversible via the get/put pair. None of those properties was hand-asserted; each was checked by the panproto tooling.
Next
- Schema version control basics makes the v1/v2 history first-class with commits and branches.
- The plain-terms explanation of migrations is at Migrations as morphisms.
- For when you want to script field transforms beyond a one-liner: Apply field transforms and Reference: expression language.
Schema version control basics
You will turn a small TypeScript source file into a panproto repository, commit two versions of the schema parsed from that file, branch off a feature, and merge it back. About twenty minutes.
By the end you will have: a .panproto/ directory tracking schema history, two commits on main, a feature branch with a third commit merged back, and an attempted conflict you can inspect.
Prerequisites
The schema CLI installed (Install the CLI). This tutorial does not depend on the SDK; it stays at the CLI throughout.
A note on schema input
schema add accepts three kinds of input:
- a source file in a language the parser registry knows (TypeScript, Rust, Python, etc.), parsed via tree-sitter into a full-AST schema;
- a directory, parsed as a project into a unified schema; or
- a
.jsonfile whose content is a serialisedpanproto-schema::Schemastruct.
The hand-authored Schema JSON format is non-trivial (it’s the in-memory representation, not a pretty document language), so most workflows use the first two. This tutorial uses the source-file path.
Step 1: initialise
mkdir vcs-tutorial/src -p && cd vcs-tutorial
schema init
ls .panproto/
A .panproto/ directory appears, structurally similar to .git/. It has objects/, refs/, HEAD, and a small config.
Step 2: commit v1
Save the v1 user shape as src/user.ts:
export interface User {
name: string;
age: number;
}
Stage and commit:
schema add src/user.ts
schema commit -m "v1 user schema"
schema log --oneline
schema add runs the TypeScript tree-sitter parser, extracts a schema from the AST (vertices for the interface and its property types, edges for the properties, plus a fine layer of byte-position constraints used for round-tripping), and stages it. schema log --oneline now shows one commit with a blake3 hash.
Step 3: commit v2
Edit src/user.ts to rename age and add email:
export interface User {
name: string;
years: number;
email: string;
}
Stage and commit:
schema add src/user.ts
schema diff --staged
schema commit -m "v2: rename age to years, add email"
schema log --oneline
schema diff --staged shows the structural changes (a renamed property, a new property, plus the tree-sitter byte-position constraint deltas). Two commits now.
Step 4: branch
schema branch feature/handle
schema checkout feature/handle
You are on a new branch sharing history with main up to the v2 commit. Add a handle field:
export interface User {
name: string;
years: number;
email: string;
handle: string;
}
Then:
schema add src/user.ts
schema commit -m "add handle"
schema log --oneline
You see the linear history with the new commit on feature/handle.
Step 5: merge
schema checkout main
schema merge feature/handle
schema log --oneline
Since main is an ancestor of feature/handle (no conflicting work was done on main in parallel), the merge is a fast-forward and produces no merge commit. The output ends with Merge successful. and reports the merged schema’s vertex/edge counts. For a non-trivial three-way merge, pass --no-ff to force a merge commit.
Step 6: provoke a conflict
For a clean conflict demonstration, start fresh in a new repo so the conflicting branches diverge from the same commit:
cd .. && mkdir vcs-conflict/src -p && cd vcs-conflict
schema init
cat > src/user.ts <<'EOF'
export interface User {
name: string;
age: number;
}
EOF
schema add src/user.ts && schema commit -m "v1"
# Create both branches at v1 (each branch starts at HEAD).
schema branch feature/handle-str
schema branch feature/handle-int
schema checkout feature/handle-str
# edit src/user.ts: add handle: string;
schema add src/user.ts && schema commit -m "handle as string"
schema checkout main
schema merge feature/handle-str # fast-forward, succeeds
schema checkout feature/handle-int
# edit src/user.ts: add handle: number;
schema add src/user.ts && schema commit -m "handle as number"
schema checkout main
schema merge feature/handle-int
The second merge fails with conflict objects. For the type clash above you see entries like:
Merge produced 3 conflict(s):
BothAddedConstraintDifferently {
vertex_id: "src/user.ts::User::$2::$11",
sort: "literal-value",
ours_value: "string",
theirs_value: "number"
}
...
Error: × merge failed with 3 conflict(s)
Each conflict is a structural object reported by the pushout construction, not a textual three-way merge marker. The variants you can hit include BothAddedConstraintDifferently (used here, when two branches set the same constraint to different values) and UniversalFactorizationFailure (raised by panproto-vcs’s pushout when no universal merge object exists at all). To resolve, choose one branch’s shape, edit the source file, schema add, schema commit.
What you built
A schema history that you can navigate, branch, and merge with the same affordances as a git history, but where the merge operation is a precise structural construction rather than a three-way text merge.
Next
- Cross-protocol translation for translating the schema from JSON Schema to Protobuf.
- The plain-terms explanation of merge is at Schema version control semantics.
- The formal pushout construction is in Pushouts and merge.
Cross-protocol translation
You will translate the user schema from one shape to another using an explicit field-by-field mapping, then verify the translation reverses cleanly. About twenty minutes.
By the end you will have: two user schemas with different vertex and edge names expressing the same model, a CompiledMigration connecting them, a record converted from one shape to the other, and the reverse-direction restoration verified.
Prerequisites
Completed Schema version control basics. The my-first-schema/ project with @panproto/core installed.
State of the art
True cross-protocol translation (a JSON Schema document automatically converted into a Protobuf .proto) requires both schemas to be expressible against a single composed theory that the migration generator can align them in. Today that means either:
- both schemas in the same protocol with an explicit edge mapping (demonstrated here), or
- both schemas in a custom composed theory authored via the theory DSL (see Composing protocols by colimit), or
- a hand-authored lens in the lens DSL bridging the two (see Write a lens (DSL)).
The CLI’s schema lens subcommands currently resolve --protocol against the built-in atproto only; everything multi-protocol lives in the SDK. The auto-aligning Panproto.lens(from, to) works at the WASM lens level but returns raw WInstance graphs rather than JS-native records, so it is not the right tool for a tutorial. We drive the translation from Panproto.migration(from, to) instead: its liftJson/getJson/putJson wrappers handle JSON encoding/decoding for you.
Step 1: build the source schema
Create src/cross.ts:
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
const atproto = p.protocol('atproto');
const source = atproto.schema()
.vertex('user', 'object')
.vertex('user.name', 'string')
.vertex('user.email', 'string')
.vertex('user.years', 'integer')
.edge('user', 'user.name', 'prop', { name: 'name' })
.edge('user', 'user.email', 'prop', { name: 'email' })
.edge('user', 'user.years', 'prop', { name: 'years' })
.build();
Step 2: build the target schema
A second atproto schema for the same user model with different field names (display_name instead of name, email_address instead of email):
const target = atproto.schema()
.vertex('user', 'object')
.vertex('user.display_name', 'string')
.vertex('user.email_address','string')
.vertex('user.years', 'integer')
.edge('user', 'user.display_name', 'prop', { name: 'display_name' })
.edge('user', 'user.email_address', 'prop', { name: 'email_address' })
.edge('user', 'user.years', 'prop', { name: 'years' })
.build();
The shapes differ structurally (different vertex ids, different edge names) but represent the same information.
Step 3: declare the migration
const mig = p.migration(source, target)
.map('user', 'user')
.map('user.name', 'user.display_name')
.map('user.email', 'user.email_address')
.map('user.years', 'user.years')
.mapEdge(
{ src: 'user', tgt: 'user.name', kind: 'prop', name: 'name' },
{ src: 'user', tgt: 'user.display_name', kind: 'prop', name: 'display_name' },
)
.mapEdge(
{ src: 'user', tgt: 'user.email', kind: 'prop', name: 'email' },
{ src: 'user', tgt: 'user.email_address', kind: 'prop', name: 'email_address' },
)
.mapEdge(
{ src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
{ src: 'user', tgt: 'user.years', kind: 'prop', name: 'years' },
)
.compile();
.map(srcVertex, tgtVertex) aligns vertices; .mapEdge(srcEdge, tgtEdge) aligns edges. You need both: vertex-only mappings produce a migration that drops every field on lift. For larger schemas, Panproto.lens(from, to) and LensHandle.autoGenerate will infer many of these alignments via name similarity and structural priors, but the resulting LensHandle operates on opaque WInstance bytes rather than JS-native records.
Step 4: convert a record
const alice = { name: 'Alice', email: 'alice@example.com', years: 30 };
const converted = mig.liftJson(alice, 'user');
console.log('converted:', converted);
mig.liftJson(record, rootVertex) round-trips JSON through the migration: it parses the input against source, lifts it through the edge mapping, and emits the result in target’s shape as a plain JS object. You see:
{ display_name: 'Alice', email_address: 'alice@example.com', years: 30 }
Step 5: round-trip via get/put
const { view, complement } = mig.getJson(alice, 'user');
console.log('view:', view);
const back = mig.putJson(view, complement, 'user');
console.log('back:', back);
mig.getJson(record, rootVertex) returns { view, complement }. view is the target-shape projection (same as liftJson’s output); complement carries the encoding state the forward projection sets aside. mig.putJson(view, complement, rootVertex) reverses the translation, restoring the source-shape record:
view: { display_name: 'Alice', email_address: 'alice@example.com', years: 30 }
back: { email: 'alice@example.com', name: 'Alice', years: 30 }
The restored record’s field ordering is not preserved (object keys come out alphabetised), so structural equality, not JSON-string equality, is the right round-trip predicate.
Step 6: dispose
mig[Symbol.dispose]();
CompiledMigration holds a WASM-side resource. Call [Symbol.dispose]() explicitly or use a using declaration so it is released at scope exit:
using mig = p.migration(source, target). /* ... */ .compile();
What you built
Two schemas expressing the same model with different field names, an explicit field-by-field migration between them, a forward conversion of a record, and a verified reverse-direction restoration. The same pattern works for any pair of schemas in a single protocol; for genuinely cross-protocol pairs (e.g. JSON Schema ↔ Protobuf), express both against a composed theory authored via the theory DSL, or author the bridge in the lens DSL.
See also
- Translate across protocols for the operational how-to.
- Write a lens (DSL) for hand-authored translations.
- Composing protocols by colimit for the model.
- Theory DSL: denotational semantics.
Next
- The plain-terms explanation of cross-protocol translation is at Composing protocols by colimit.
- For non-trivial pairs of protocols, the auto-derived translation may be a starting point; Translate across protocols covers when to extend it by hand.
- For the formal account of how the colimit makes this possible: Pushouts and merge.
Where to go from here
You have walked through the four core flows of panproto: defining schemas, evolving them via migrations, version-controlling the history, and translating between protocols. From here:
- The how-to guides cover specific workflows in depth (CI, lenses, format-preserving codecs).
- The reference quadrant is the lookup for everything: CLI, SDKs, protocols, expression language, lens combinators, configuration.
- The explanation quadrant is for understanding why the system is shaped the way it is.
How-to guides
How-to pages are recipes. Each one assumes you know what you are trying to do and need the steps to do it. Every page follows the same skeleton:
- Prerequisites. What must be installed and what state your project must be in.
- The task. The minimum sequence of commands or code to accomplish the goal.
- Verification. How to confirm the task succeeded.
- Common mistakes. Failure modes and how to recognise them.
- See also. Adjacent how-tos, the relevant reference page, and the explanation page that covers the underlying model.
If you want to understand why a step works, follow the link to the explanation quadrant. If you want a complete walk-through with no prior context, start with a tutorial.
Index
Setup
Schemas and migrations
- Define a schema (CLI, TypeScript, Python, Rust)
- Build a migration
- Apply field transforms
- Use lenses, protolenses, dependent optics
- Write lenses in the lens DSL
- Build a custom protocol
Working with data
- Query instances
- Convert data between formats
- Round-trip with format preservation
- Parse full ASTs
- Decorate an abstract schema
Version control
Translation and integration
CI
Install panproto
panproto ships in four distributions, one per surface. Pick the one that matches what you are building.
| Surface | Page | Package |
|---|---|---|
Command line (schema) | Install the CLI | panproto-cli (Homebrew, shell installer, cargo install) |
| Rust application | Install the Rust SDK | panproto-core (crates.io) |
| TypeScript / JavaScript application | Install the TypeScript SDK | @panproto/core (npm) |
| Python application | Install the Python SDK | panproto (PyPI) |
The CLI and the language SDKs are independent: you can install any subset.
See also
- Reference: configuration for the
panproto.tomlmanifest. - Tutorial: your first schema once installed.
Install the CLI
The CLI is a single binary called schema. It is the entry point for the panproto-cli crate.
Prerequisites
A POSIX shell on macOS or Linux, or PowerShell on Windows. No language toolchain is required; pre-built binaries are published with every release.
Install
Homebrew (macOS, Linux)
brew install panproto/tap/panproto-cli
Shell installer (macOS, Linux, WSL)
curl --proto '=https' -LsSf https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.sh | sh
PowerShell installer (Windows)
powershell -ExecutionPolicy ByPass -c "irm https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.ps1 | iex"
From source
cargo install panproto-cli
Requires a Rust toolchain (1.85 or newer).
Verification
schema --version
prints the installed version. The full subcommand list is at Reference: CLI, or schema --help.
Common mistakes
- Installing through
cargo installwithout an up-to-date toolchain. panproto requires Rust 1.85+. - Mixing the Homebrew install with a from-source install on the same machine: only one
schemaends up first onPATH.
See also
Install the Rust SDK
Prerequisites
A Rust toolchain at edition 2024 (toolchain 1.85+).
Install
# Cargo.toml
[dependencies]
panproto-core = "0.49"
For specific feature flags (full-parse, project, git, llvm, jit, tree-sitter), see Reference: Rust SDK.
Verification
use panproto_core::protocols::atproto;
use panproto_core::schema::SchemaBuilder;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = atproto::protocol();
let schema = SchemaBuilder::new(&proto)
.vertex("root", "record", Some("app.example.root"))?
.entry("root")
.build()?;
println!("built {} vertex(es)", schema.vertices.len());
Ok(())
}
cargo run builds and links against the panproto facade.
Common mistakes
- Pinning a stale toolchain.
panproto-corerequires edition 2024. - Depending on lower-level crates (
panproto-schema,panproto-mig, …) directly without a strong reason. The facade re-exports the canonical surface; reach past it only when you need an internal API.
See also
- Reference: Rust SDK for feature flags and crate selection.
- Define a schema from Rust.
- Crate map.
Install the TypeScript SDK
Prerequisites
Node 20 or newer. A package manager (npm, pnpm, or yarn). A bundler with WASM support if targeting the browser (Vite, Rollup, esbuild, webpack 5+).
Install
npm install @panproto/core
# or
pnpm add @panproto/core
# or
yarn add @panproto/core
Verification
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
console.log(p.listProtocols());
Panproto.init() loads the WASM module and returns the top-level handle. Calling p.listProtocols() enumerates the built-in protocols and confirms the binding is wired up.
Common mistakes
- Forgetting to
await Panproto.init(). The WASM load is asynchronous; using the handle beforeinitresolves throws. - Bundler that does not understand
.wasmimports. Vite handles this out of the box; older webpack configurations may need a loader. - Running under Node 18 or earlier. WebAssembly bigint coercion under those versions is incomplete; some integer operations will throw.
See also
Install the Python SDK
Prerequisites
Python 3.13 or newer. A virtual environment is recommended.
Install
pip install panproto
The wheel includes native PyO3 bindings (no WASM) and eleven core tree-sitter grammars. Additional grammar packs are installed separately:
pip install panproto-grammars-functional # Haskell, OCaml, Elm, Erlang, Elixir, ...
pip install panproto-grammars-web # HTML, CSS, Vue, Svelte, ...
pip install panproto-grammars-all # umbrella package
The full table of packs is in Reference: Python SDK.
Verification
import panproto
print(panproto.list_builtin_protocols()[:3])
The native module loads at import time (no async wrapper, unlike the TypeScript SDK). Listing a few of the built-in protocols confirms the linkage. The full top-level surface is in Reference: Python SDK.
Common mistakes
- Running under Python < 3.13. The wheel uses 3.13-only typing constructs in its public API.
- Installing the deprecated pure-Python WASM SDK in parallel. The native PyO3 wheel supersedes it; remove the older package.
- Importing companion grammar packs before
panproto. The packs auto-register on import; the order matters only when both packs andpanprotoare imported in the same module.
See also
Define a schema
A schema in panproto is a graph of vertices and edges, validated against a protocol’s GAT. You can build one from any of the four surfaces, with the same outcome.
| Surface | Page |
|---|---|
schema CLI | From the CLI |
| TypeScript SDK | From TypeScript |
| Python SDK | From Python |
| Rust SDK | From Rust |
The CLI is the recommended starting point if you have an existing schema file (JSON Schema, Protobuf, ATProto Lexicon, …) you want to load and inspect. The language SDKs are the recommended starting point if you are building schemas programmatically inside an application.
See also
- Reference: protocol catalogue for the list of supported protocols.
- Schemas as theories for the model.
Define a schema from the CLI
Prerequisites
The schema binary installed (Install the CLI). A schema file in a supported protocol, or a protocol name to scaffold an empty one.
The task
Validate an existing schema
schema validate --protocol atproto path/to/schema.json
Exits zero on success. On failure, prints the failing equation or constraint with the offending vertex and edge.
Scaffold from an existing schema
schema scaffold --protocol atproto schemas/post.json
scaffold runs free-model construction over the given schema and prints sample term assignments to stdout. Use --json for machine-readable output, --depth and --max-terms to bound the search. (To start from nothing, write a one-vertex schema by hand and scaffold against that.)
Inspect
schema diff schemas/post-v1.json schemas/post-v2.json
diff reports vertex and edge changes between two schemas. Inside a panproto repository, schema show <ref> resolves a commit, schema, or migration object and prints its contents.
Verification
After validation, run:
schema verify --protocol atproto path/to/schema.json
verify checks that the schema satisfies every equation in the protocol’s theory, not just the constraints you wrote. A pass guarantees the schema is well-formed in the categorical sense.
Common mistakes
- Forgetting
--protocol. Many subcommands require an explicit protocol; auto-detection happens only at the project level when apanproto.tomlis present. - Running
schema validatewhen you meanschema check(the latter checks a migration, not a schema).
See also
- Reference: CLI for the full subcommand list.
- Tutorial: your first schema.
Define a schema from TypeScript
Prerequisites
@panproto/core installed (Install the TypeScript SDK).
The task
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
const proto = p.protocol('atproto');
const schema = proto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.build();
p.protocol(name) loads the named protocol’s theory. proto.schema() returns a SchemaBuilder. Each .vertex() call adds a vertex (record kind) with optional constraints; each .edge() call adds an edge (field, item, or variant) between vertices. .build() validates the constructed schema against the protocol’s theory and returns a Schema handle.
Verification
const result = schema.validate(proto);
if (!result.isValid) throw new Error(JSON.stringify(result.issues));
validate(protocol) returns a ValidationResult containing any issues. An empty issue list confirms the schema satisfies the protocol’s edge rules and obj-kinds.
Common mistakes
- Calling
.build()before all required edges are added. The validation runs eagerly; missing required structure raises immediately. - Re-using a builder after
.build(). Builders are single-shot; create a new one for each schema. - Treating the returned
Schemahandle as a plain object. It is an opaque handle into the WASM heap; pass it to subsequent SDK calls, do not introspect it directly.
See also
- Reference: TypeScript SDK.
- Build a migration for what to do with the schema next.
- Tutorial: your first schema.
Define a schema from Python
Prerequisites
panproto installed (Install the Python SDK).
The task
import panproto
proto = panproto.get_builtin_protocol("atproto")
b = proto.schema()
b.vertex("user", "object", "app.example.user")
b.vertex("user:name", "string")
b.vertex("user:age", "integer")
b.edge("user", "user:name", "prop", "name")
b.edge("user", "user:age", "prop", "age")
schema = b.build()
panproto.get_builtin_protocol(name) returns the named protocol; .vertex(id, kind, nsid=None) and .edge(src, tgt, kind, name=None) each mutate the SchemaBuilder in place (returning None), and .build() validates and returns a Schema. The TypeScript SDK exposes the same operations as a chainable surface; the Python binding does not.
Call panproto.list_builtin_protocols() to see every registered protocol name (atproto, openapi, cddl, geojson, …).
Verification
issues = schema.validate(proto)
assert not issues, issues
Schema.validate(protocol) returns a list of validation messages; an empty list means the schema is well-formed.
Common mistakes
- Chaining the builder calls. The Python
SchemaBuilder.vertex(...)/edge(...)/constraint(...)methods mutate in place and returnNone; hold the builder in a variable and mutate it statement-by-statement, then call.build(). - Using a Python dict where the SDK expects a
Schemahandle. Conversion is deliberate; to materialise anInstancefrom bytes against a builtSchema, usepanproto.IoRegistry().parse(protocol, schema, data).
See also
Define a schema from Rust
Prerequisites
panproto-core in your Cargo.toml (Install the Rust SDK).
The task
use panproto_core::protocols::atproto;
use panproto_core::schema::SchemaBuilder;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = atproto::protocol();
let schema = SchemaBuilder::new(&proto)
.vertex("user", "object", Some("app.example.user"))?
.vertex("user:name", "string", None)?
.vertex("user:age", "integer", None)?
.edge("user", "user:name", "prop", Some("name"))?
.edge("user", "user:age", "prop", Some("age"))?
.entry("user")
.build()?;
println!("{} vertices, {} edges", schema.vertices.len(), schema.edges.len());
Ok(())
}
SchemaBuilder::new(&protocol) constructs the builder; each vertex and edge call validates against the protocol’s edge rules and obj-kind list. entry declares basepoint vertices for instance rooting. build returns a Schema (or a SchemaError).
Verification
use panproto_core::schema::{SchemaBuilder, validate};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = panproto_core::protocols::atproto::protocol();
let schema = SchemaBuilder::new(&proto)
.vertex("user", "object", Some("app.example.user"))?
.entry("user")
.build()?;
let errors = validate(&schema, &proto);
assert!(errors.is_empty(), "validation errors: {errors:?}");
Ok(()) }
validate returns a Vec<ValidationError>. The error carries the failing equation and the offending vertex or edge as structured fields.
Common mistakes
- Reaching past
panproto-coreto lower-level crates without a reason. The facade re-exports the canonical surface; do not depend onpanproto-schemadirectly unless you need an internal API. - Holding
Schemaacross anawait. Handles are notSendby default; clone them or restructure the call.
See also
Build a migration
A migration is a structured map between two schemas plus, optionally, value-level transforms applied during lift. This page covers building one from the CLI and from the SDKs.
Prerequisites
Two schemas in the same protocol (or compatible protocols). The schema CLI installed, or one of the language SDKs.
The task
From the CLI
Given schemas/v1.json and schemas/v2.json, plus a mapping file migrations/v1-to-v2.json:
schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migrations/v1-to-v2.json
check runs the existence check: which fields in v2 require which fields in v1, and is every required input present. Exits zero if the migration is well-defined.
To also type-check at the GAT level (equivalent to a separate schema typecheck):
schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migrations/v1-to-v2.json --typecheck
For schema-level diff classification, generate a lens between the two schemas:
schema lens generate --protocol atproto schemas/v1.json schemas/v2.json --save lens.json
schema diff schemas/v1.json schemas/v2.json --lens --save lens.json
To migrate data, use the VCS-driven path: commit v1 and v2 to a panproto repository, then run schema data migrate <data-dir> against the working tree (see Schema VCS data versioning).
From the SDKs
const mig = p
.migration(srcSchema, tgtSchema)
.map('user', 'user')
.compile();
const { data: forward } = mig.lift(oldRecord); // forward
const { view, complement } = mig.get(oldRecord); // forward, retaining complement
mig.put(view, complement); // backward
p.checkExistence(src, tgt, builder) runs the same existence check as the CLI. Python and Rust SDKs use the same shape with language-idiomatic naming.
Verification
schema check exits zero if the migration is well-defined (existence conditions hold). For diff classification, use panproto.diff_and_classify(old, new, protocol) in Python, or panproto_check::diff(old, new) followed by panproto_check::classify(&diff, &protocol) in Rust. In TypeScript the equivalent is Panproto.diffFull(old, new).classify(protocol), which returns a CompatReport with one of:
| Classification | Meaning |
|---|---|
fully-compatible | Old data lifts unchanged; the migration is a refinement. |
backward-compatible | Old data lifts via a value-level transform. |
breaking | Some old records cannot be lifted; CI should reject. |
For wiring this into CI, see Breaking-change gate.
Common mistakes
- Skipping
--typecheckfor non-trivial migrations. Existence checking does not catch GAT-level type errors; the--typecheckflag does. - Treating a
breakingclassification as a warning. CI should reject by default; merging a breaking migration without an explicit acknowledgement is the most common cause of data corruption in production. - Lifting data before the check passes. Lift can produce invalid output if the migration is not well-defined.
See also
- Reference: CLI for the full subcommand list.
- Apply field transforms for value-level transforms.
- Migrations as morphisms for the model.
Apply field transforms
A field transform is a value-level expression applied during migration: a way to compute the new field’s value from the old data. Transforms are written in the expression language.
Prerequisites
A migration mapping between two schemas. The expression language reference for available builtins.
The task
Inline a transform in a mapping file
The mapping JSON consumed by schema check is a serialized Migration. Field-level value transforms attach as expression resolvers indexed by the (src_vertex, tgt_vertex) pair they bridge:
{
"vertex_map": {
"user": "user",
"user:first": "user:given_name"
},
"edge_map": [],
"hyper_edge_map": {},
"label_map": [],
"resolver": [],
"hyper_resolver": [],
"expr_resolvers": [
[["user:full_name", "user:full_name"],
"\\r -> concat(r.first, \" \", r.last)"]
]
}
Each expr_resolvers entry is [[src_vertex, tgt_vertex], expression]. The expression is parsed by panproto-expr-parser and applied during lift (old to new). Backward direction is supplied by the lens/protolens layer rather than the mapping file: pair an ApplyExpr field transform with its inverse on the corresponding Protolens step, or annotate a coercion on the schema and let the migration compiler emit both directions.
From the SDKs
The TypeScript and Python SDKs do not yet expose per-field transforms on the MigrationBuilder. To compose a migration with a value-level rewrite, build a ProtolensChain directly (combinators::rename_field, elementary::apply_expr, …) and call compile_migration on it:
import panproto
# `panproto.rename_field`, `add_field`, `remove_field`, `hoist_field`,
# and `pipeline` each return a `ProtolensChain`. Serialize and apply via
# `chain.to_json()` and the lens APIs.
chain = panproto.rename_field("user", "full_name", "full_name", "name")
print(chain.to_json())
To attach a value-level expression resolver to a Migration between two specific schemas, write the mapping JSON by hand (see above) and call panproto.compile_migration(migration, src_schema, tgt_schema).
Verification
schema check --src schemas/v1.json --tgt schemas/v2.json --mapping migration.json --typecheck
check --typecheck ensures the transforms type-check against the source and target schemas. Property tests in CI then verify the lens laws on sampled data.
Common mistakes
- Expecting
expr_resolversto supply both directions. The mapping carries forward-only expressions; the backward leg comes from the lens/protolens layer (pair anApplyExprfield transform with its inverse on the corresponding protolens step, or annotate a coercion on the schema and let the migration compiler emit both directions). - Using IO or random functions in the expression. The language is bounded-pure; non-deterministic builtins are not exposed.
- Letting the budget exceed. Long string operations on large records can hit the step budget. Expressions that hit the budget raise
ExprError::StepLimitExceededat runtime.
See also
- Reference: expression language for builtins and types.
- Build a migration for the surrounding workflow.
- Lenses and round-trip laws for why
backwardmatters.
Use lenses
Every migration is a lens. The lens API gives you direct access to the bidirectional transform: get lifts data forward, put projects new data back to the old shape, complement records what get discarded so put can restore it.
Prerequisites
A migration (Build a migration) or a hand-written lens via the lens DSL.
The task
A CompiledMigration is itself a lens; reach for LensHandle only when you want a free-standing protolens chain.
const { view, complement } = mig.get(oldRecord);
const editedView = { ...view, age: view.age + 1 };
const { data: updatedOld } = mig.put(editedView, complement);
mig.get returns the forward view together with the complement (the data discarded by get); mig.put consumes them and returns a LiftResult { data, ... } reconstructed with the edit applied. The round-trip laws guarantee this is well-defined.
To compose two compiled migrations sequentially:
const composed = p.compose(mig_ab, mig_bc);
To compose two free-standing protolens chains:
const composedChain = p.composeLenses(chainAB, chainBC);
Both are methods on Panproto; composition fails (throws) if the intermediate schemas do not chain.
Verification
const result = lens.checkLaws(instanceBytes);
console.log(result.holds, result.violation);
// For individual laws:
const getput = lens.checkGetPut(instanceBytes);
const putget = lens.checkPutGet(instanceBytes);
LensHandle.checkLaws(instance) returns a LawCheckResult { holds, violation } covering GetPut and PutGet together. checkGetPut and checkPutGet test each law individually; the Rust property tests in panproto-lens cover PutPut as well, exercised continuously in CI.
Common mistakes
- Calling
putwith a complement from a different source.Complement::composewill refuse withComplementFingerprintMismatch. Recompute the complement from the current source rather than reusing one. - Reading
getand then mutating the source before callingput. The complement is computed against the source as it was atgettime; if you mutate the source, the complement is stale. - Composing lenses whose intermediate schemas are isomorphic but not equal. The structural-equality check on
protolens_composablewill reject; rebuild one of the lenses against the other’s schema.
See also
- Reference: lens combinators.
- Lenses and round-trip laws.
- Lens DSL: denotational semantics.
- Use protolenses, Use dependent optics.
Use protolenses
A protolens is a schema-parameterised lens family: one declaration that produces a lens for any schema satisfying a precondition. Use protolenses when you want to apply the same transform across a fleet of related schemas without writing one lens per schema.
Prerequisites
The Rust SDK (panproto-lens::protolens) or the lens DSL with parametric declarations enabled.
The task
Declare
A Protolens packages a precondition (a TheoryConstraint on the source theory) with a TheoryTransform. Build elementary protolenses via the elementary helpers, or compose them into a ProtolensChain:
use panproto_lens::protolens::{ProtolensChain, combinators};
fn main() {
let rename_legacy_id: ProtolensChain = combinators::rename_field(
"user", "user:legacy_id", "legacy_id", "id",
);
let _ = rename_legacy_id;
}
The chain captures a precondition on the source theory and a sequence of transforms. It does not yet know which concrete schema it will run against.
Apply (fused)
use panproto_lens::protolens::{ProtolensChain, combinators};
use panproto_core::schema::{Protocol, Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let rename_legacy_id: ProtolensChain = combinators::rename_field("user", "user:legacy_id", "legacy_id", "id");
let protocol: Protocol = panproto_core::protocols::atproto::protocol();
let user_schema: Schema = SchemaBuilder::new(&protocol).vertex("user", "object", None)?.entry("user").build()?;
let post_schema: Schema = user_schema.clone();
let lens_for_users = rename_legacy_id.instantiate(&user_schema, &protocol)?;
let lens_for_posts = rename_legacy_id.instantiate(&post_schema, &protocol)?;
let _ = (lens_for_users, lens_for_posts);
Ok(()) }
Each call produces a concrete Lens against a specific schema, with the migration metadata preserved as a single fused morphism.
Apply (sequential)
use panproto_lens::protolens::{ProtolensChain, combinators};
use panproto_core::schema::{Protocol, Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let rename_legacy_id: ProtolensChain = combinators::rename_field("user", "user:legacy_id", "legacy_id", "id");
let protocol: Protocol = panproto_core::protocols::atproto::protocol();
let base_schema: Schema = SchemaBuilder::new(&protocol).vertex("user", "object", None)?.entry("user").build()?;
let stepwise = rename_legacy_id.instantiate_sequential(&base_schema, &protocol)?;
let _ = stepwise;
Ok(()) }
Sequential instantiation yields one lens per step; useful in property tests when each intermediate state needs to be inspected.
Compose
use panproto_lens::protolens::{Protolens, vertical_compose, elementary};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let first: Protolens = elementary::drop_sort("old_field");
let second: Protolens = elementary::drop_sort("legacy_id");
let composed = vertical_compose(&first, &second)?;
let _ = composed;
Ok(()) }
vertical_compose requires the target endofunctor of first to structurally match the source endofunctor of second. A mismatch returns LensError.
Verification
After instantiate returns a Lens, exercise the round-trip laws on representative data via Lens::get / Lens::put (or use the higher-level lens-law harness in panproto_lens::laws). Property tests in crates/panproto-lens/tests/ are the canonical examples.
Common mistakes
- Composing protolenses whose intermediate schemas only happen to look the same.
protolens_composableenforces structural equality, not name equality; build one against the other to be safe. - Reaching for sequential instantiation in production. Use fused (
instantiate) by default; sequential exists for inspection and tests. - Treating preconditions as pure documentation. The precondition is checked at instantiation time; a schema that does not satisfy it raises
LensError::ProtolensErrorwith a message listing the unmet constraints (useProtolens::check_applicabilityfirst if you want to surface the reasons separately).
See also
- Reference: lens combinators.
- Protolens composition for the formal model.
- Use lenses for the non-parametric case.
Use dependent optics
A dependent optic is an optic-kind chosen by the schema edge it is applied at: a prop edge yields a Lens, an item edge yields a Traversal, a variant edge yields a Prism. Dependent optics let you write transforms that work uniformly across all three edge kinds, with the optic kind unified at instantiation.
Prerequisites
The Rust SDK. The panproto-lens crate (re-exported from panproto-core::lens).
The task
Apply a scoped transform
#![allow(unused)]
fn main() {
use panproto_lens::protolens::elementary;
use panproto_lens::optic::OpticKind;
// An elementary step is itself a Protolens whose optic kind is derived
// from the edge kind at the focus.
let inner = elementary::rename_edge_name("post", "tags", "tags", "labels");
let scoped = elementary::scoped("post:tags", inner);
match scoped.optic_kind() {
OpticKind::Lens => { /* prop edge */ }
OpticKind::Traversal => { /* item edge */ }
OpticKind::Prism => { /* variant edge */ }
other => panic!("unexpected optic kind: {other:?}"),
}
}
elementary::scoped wraps an inner protolens at the given focus vertex. At the instance level the optic kind follows from the edge kind connecting the parent to the focus vertex: prop -> Lens, item -> Traversal, variant -> Prism.
Field-level combinators
The panproto_lens::protolens::combinators module exposes higher-level chains assembled from elementary steps; for instance combinators::rename_field(parent, field, old_name, new_name) returns a ProtolensChain that renames a JSON property key. To traverse an item edge and apply an inner transform per element, use elementary::scoped (or the combinators::map_items helper) targeting the array element’s vertex; the result is a Traversal under the optic-kind classifier.
Verification
The lens laws apply uniformly across all three optic kinds. Instantiate the protolens into a Lens against a concrete schema, then call panproto_lens::laws::check_laws(&lens, &instance) (or check_get_put / check_put_get individually) on representative data; each returns Result<(), LawViolation>.
Common mistakes
- Hand-coding the optic kind. The point of dependent optics is that the kind follows from the edge; if you find yourself branching on it manually, read it off
Protolens::optic_kind()(orProtolensChain::composed_optic_kind()for chains) and let the schema decide. - Applying a
prop-style combinator at anitemedge. The carrier optic at anitemedge is a Traversal, sorefine_scoped_opticcomposes the inner Lens with Traversal to yield Traversal: the round-trip laws still hold, but the result is a multi-focus traversal rather than a single-focus lens, which is rarely what the call site meant.
See also
Write lenses in the lens DSL
The lens DSL is a declarative way to specify a lens between two schemas. Specs are written in Nickel, JSON, or YAML and compile to the lens combinator algebra.
Prerequisites
A pair of schemas to bridge. The schema CLI or panproto-lens-dsl crate.
The task
Write the spec (Nickel)
# lenses/user-v1-to-v2.ncl
{
id = "user.v1-to-v2",
description = "Rename name fields and add display_name",
steps = [
{ rename_field = { from = "first_name", to = "given_name" } },
{ rename_field = { from = "last_name", to = "family_name" } },
{ add_field = {
name = "display_name",
default = "",
expr = "Concat(record.given_name, \" \", record.family_name)",
} },
],
}
Each step is a single-key object. The key picks the variant; the value carries its parameters. The DSL applies steps left-to-right against the source schema, producing a target schema and a CompiledLens between them. Full step grammar: crates/panproto-lens-dsl/src/document.rs.
Compile a chain
schema lens does not consume .ncl/.json/.yaml lens documents directly through the CLI. The DSL is compiled by the panproto-lens-dsl library; the CLI works on the compiled ProtolensChain it produces. To go from a pair of schemas to a chain via the CLI, use:
schema lens generate \
--protocol atproto \
schemas/user-v1.json \
schemas/user-v2.json \
--save lenses/user-v1-to-v2.json
This auto-derives a chain. To author a chain by hand, write the lens DSL and call panproto_lens_dsl::load and panproto_lens_dsl::compile from Rust (or the equivalent SDK calls).
Apply
schema lens apply --protocol atproto lenses/user-v1-to-v2.json data/users.json
Inspect
schema lens inspect --protocol atproto lenses/user-v1-to-v2.json
Prints the combinator chain. --protocol is required.
Compile from Python
ProtolensChain exposes loaders that consume a lens-DSL document and compile it directly to a chain anchored at the named body vertex of the source schema:
import panproto
chain = panproto.ProtolensChain.from_dsl_path(
"lenses/user-v1-to-v2.ncl",
body_vertex="record:body",
)
# Or from a string:
chain = panproto.ProtolensChain.from_dsl_json(json_source, "record:body")
chain = panproto.ProtolensChain.from_dsl_yaml(yaml_source, "record:body")
chain = panproto.ProtolensChain.from_dsl_nickel(nickel_source, "record:body")
from_dsl_path dispatches on file extension (.ncl / .json / .yaml / .yml). For Nickel, an optional import_paths argument extends the import-resolution lookup so user-defined modules can be referenced.
Verification
# Check the chain is applicable to every schema in a directory.
schema lens check --protocol atproto lenses/user-v1-to-v2.json schemas/
# Verify the lens laws on test data.
schema lens verify --protocol atproto data/users.json schemas/user-v2.json
lens check reports applicability without instantiating; lens verify checks the round-trip laws on actual data.
Common mistakes
- Step ordering. The DSL deliberately exposes ordering; some sequences commute and produce the same result, others do not. When in doubt, run
schema lens checkwith high sample counts. - Forgetting
backwardon afield_transformstep. Without it, the step is half a lens; compilation rejects. - Writing schemas inline. The DSL expects path references; embedding a schema as a literal works for small examples but loses VCS tracking.
See also
Query instances
A panproto instance is a graph of records. Queries filter and project that graph using the expression language.
Prerequisites
A schema and an instance loaded against it. The expression-language reference for predicates.
The task
Filter and project
executeQuery is exported from @panproto/core. Its signature is executeQuery(query, instance, wasm) (query first, instance second, WASM module third). The query keys are anchor, predicate (an Expr object, not a string), projection (string array), groupBy, limit, and path.
import { executeQuery, parseExpr, ExprBuilder } from '@panproto/core';
const recent = executeQuery(
{
anchor: 'post',
predicate: parseExpr('post.created_at > "2024-01-01"'),
projection: ['id', 'title'],
},
instance,
panproto._wasm,
);
The predicate is evaluated against each matched node; projection selects the fields included in each QueryMatch.
Following edges
Use the path field to traverse from the anchor before predicate matching:
const userPosts = executeQuery(
{ anchor: 'user', path: ['authored'], projection: ['title'] },
instance,
panproto._wasm,
);
To filter on edge presence, build a predicate with the instance-aware builtins (Edge, Children, HasEdge, EdgeCount) using ExprBuilder or parseExpr.
Verification
Predicates are type-checked against the schema before evaluation; an ill-typed predicate raises before any data is touched.
Common mistakes
- Using
Edge,Children, etc. in the standard evaluator. They returnNulloutside an instance environment. The query API sets the environment correctly; reaching for the bare evaluator does not. - Hitting the step budget on large records. Heavy string operations on long fields will exceed the budget; either narrow the predicate or raise the budget at the call site.
See also
Convert data between formats
panproto’s parse/migrate/emit pipeline converts data between any two protocols whose theories overlap. The intermediate is a panproto-inst instance graph; you do not need to write a transcoder per format pair.
Prerequisites
The schema CLI installed, or the SDK in your language. The source and target protocol names.
The task
Single file or directory (within one protocol)
schema data convert --protocol atproto \
--from schemas/user-v1.json --to schemas/user-v2.json \
data/users.json -o data/users-v2.json
<DATA> is a positional file or directory. --from and --to are schema paths (within the named protocol), not protocol names. Add --direction backward to push data the other way along the lens; add --defaults k=v,... to supply complement defaults.
For batch conversion, point <DATA> at a directory.
Across protocols
Cross-protocol conversion goes through a composed theory; see Translate across protocols. schema data convert is intra-protocol only.
From the SDKs
const out = await p.convert(jsonBytes, {
from: srcSchema,
to: tgtSchema,
});
Panproto.convert(data, opts) auto-generates a lens between the two BuiltSchema arguments, applies it forward, and returns the converted record. Pass defaults in opts to supply complement defaults for fields the source does not determine.
Verification
To verify round-trip fidelity, generate a chain explicitly and run schema lens verify:
schema lens generate --protocol atproto schemas/user-v1.json schemas/user-v2.json --save chain.json
schema lens verify --protocol atproto data/users.json schemas/user-v2.json
Lens verification on test data exercises the three round-trip laws (GetPut, PutGet, PutPut); a pass means the chain is loss-free for the sampled records.
Common mistakes
- Assuming all schema pairs are loss-free. They are not. Run
schema lens verifyafter conversion to exercise the round-trip laws on representative data, and runschema check --src --tgt --mapping ... --typecheckto classify the diff as fully compatible, backward compatible, or breaking. - Skipping lens verification in CI. Without it, silent loss is possible.
See also
Round-trip with format preservation
When you parse a JSON, YAML, TOML, XML, or CSV file and emit it back without changes, panproto can guarantee emit(parse(bytes)) == bytes byte-for-byte. This requires the format-preserving codec, which uses tree-sitter grammars and a CST complement to capture whitespace, comments, and ordering.
This page covers the structured-data path (JSON / YAML / TOML / XML / CSV). For source-code grammars (Rust, Python, Stan, BUGS, and 257 more), the emit path is emit_pretty, which derives spacing and dispatch from grammar.json structurally and exercises a per-protocol verification tier. See Parse full ASTs and Source-code emission. The two systems are independent and serve different formats; choose one based on what you are parsing.
Prerequisites
A panproto build with the tree-sitter feature flag enabled on panproto-core (or directly on panproto-io). The CLI ships with tree-sitter enabled by default.
The task
The format-preserving round-trip is exposed via schema parse emit, which parses a file and emits it back in one step:
schema parse emit config.yaml > config.roundtripped.yaml
diff config.yaml config.roundtripped.yaml
The diff is empty when the codec preserves the input. (For programmatic use, parse and emit are exposed separately by the SDK; see below.)
In Rust:
use panproto_core::io::unified_codec::UnifiedCodec;
use panproto_core::schema::{Schema, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let proto = panproto_core::protocols::atproto::protocol();
let schema: Schema = SchemaBuilder::new(&proto).vertex("root", "object", None)?.entry("root").build()?;
let bytes: &[u8] = b"";
let codec = UnifiedCodec::yaml("atproto")?;
let (instance, complement) = codec.parse_wtype_preserving(&schema, bytes)?;
let out = codec.emit_wtype_preserving(&schema, &instance, &complement)?;
assert_eq!(out, bytes);
Ok(()) }
The complement carries the CST data that the schema does not see. emit_wtype_preserving reconstructs the byte-for-byte original from (instance, complement). Constructors exist for each supported format: UnifiedCodec::json, xml, yaml, toml, csv, and tsv.
Verification
The byte equality is the verification. Property tests in CI check emit(parse(b)) == b against a corpus of real-world JSON, YAML, TOML, XML, and CSV files.
Common mistakes
- Modifying the instance without modifying the complement. If you edit a value in the instance, the complement still records the old whitespace around the old value; the round-trip will preserve the old layout around the new value.
- Mixing
format-preservingcodec output with non-preserving codec input. The two pipelines are separate; choose one consistently.
See also
- Reference: protocol catalogue.
- Convert data between formats.
- Parse full ASTs for tree-sitter parsing of source code.
Parse full ASTs
panproto can parse source code in 261 languages via tree-sitter and treat the full AST as a schema instance. The resulting instance can be queried, diffed, migrated, and version-controlled like any other schema.
Prerequisites
The schema CLI, or the Rust SDK with the full-parse feature enabled. For the Python SDK, the relevant grammar pack (the wheel ships eleven core languages; install panproto-grammars-functional, -web, -systems, etc. for more).
The task
Single file
schema parse file src/main.rs > main.ast.json
schema parse file <PATH> writes the AST as a JSON instance against an auto-derived GAT theory for the language to stdout. Redirect to a file as needed.
Whole project
schema parse project . > project.ast.json
Walks every recognised file in the project (default .), parses each with the appropriate grammar, and writes a single instance covering the whole project to stdout.
Round-trip a single file
schema parse emit src/main.rs
Parses then emits, useful for confirming a clean round-trip through the format-preserving codec.
From Rust
use panproto_core::parse::ParserRegistry;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let registry = ParserRegistry::new();
let schema = registry.parse_with_protocol(
"rust",
std::fs::read("src/main.rs")?.as_slice(),
"src/main.rs",
)?;
let _ = schema;
Ok(()) }
panproto_core::parse is the re-export of panproto-parse. ParserRegistry::new() populates with every grammar enabled at build time; for a specific file path, registry.parse_file(path, content) auto-detects the language by extension.
From Python
import panproto
reg = panproto.AstParserRegistry()
schema = reg.parse_with_protocol("python", source_bytes, "src/app.py")
Companion grammar packs install additional languages: pip install panproto-grammars-functional, -web, -systems, etc.
Read anonymous-token field values
A tree-sitter rule of the form field('<name>', choice('+', '-', '*', '/')) attaches a field name to an unnamed token alternative. The walker captures the matched token’s text as a field:<name> constraint on the parent vertex; Schema::field_text is the supported accessor:
schema = reg.parse_with_protocol("qvr", b"let y = log(x)", "demo.qvr")
let_call = next(v.id for v in schema.vertices if v.kind == "let_call")
schema.field_text(let_call, "func") # -> "log"
The Rust equivalent is Schema::field_text(vertex_id, name) -> Option<&str>. Named-node field children continue to surface as edges; this accessor is specifically for the anonymous-token field case.
Override a registered grammar at runtime
Grammar authors iterating on a grammar’s parser.c / grammar.json / node-types.json outside the panproto release cadence can swap in a freshly-compiled grammar mid-process. Compile the grammar via tree-sitter build, load the resulting shared library with ctypes, and pass the integer address of the tree_sitter_<name> symbol to override_grammar:
import ctypes
import panproto
lib = ctypes.CDLL("./build/qvr.dylib")
language_ptr = ctypes.cast(lib.tree_sitter_qvr, ctypes.c_void_p).value
reg = panproto.AstParserRegistry()
reg.override_grammar(
name="qvr",
extensions=["qvr"],
language_ptr=language_ptr,
node_types=open("./grammars/qvr/src/node-types.json", "rb").read(),
grammar_json=open("./grammars/qvr/src/grammar.json", "rb").read(),
)
schema = reg.parse_with_protocol("qvr", source_bytes, "demo.qvr") # uses the new grammar
If a parser is already registered under name, it is dropped first (along with any extension mappings). Cannot run while a ParseEmitLens produced by reg.lens(...) is alive: drop outstanding lens handles, or construct a fresh registry, first. The byte payloads are leaked into 'static storage on the Rust side (intended for dev-time work, not production).
Going the other way
The schema you get back from parse_with_protocol carries a complete layout fibre: byte spans, the whitespace between every pair of adjacent tokens, and discriminators recording which CHOICE alternative the parser took at each branch point. The emitter consumes those constraints to render bytes back. A schema you build by hand from SchemaBuilder carries none of them; emit_pretty_with_protocol falls back to a grammar walk driven by the structural acceptance predicate, a layered cassette system, and per-position interstitial scoring (see Source-code emission for the mechanics). The grammar walk produces structurally valid output for the verified set and best-effort output for the rest.
For generators that build a schema from scratch and want to render it to source bytes, see Decorate an abstract schema. The decorate operation takes an AbstractSchema (the hand-built half), attaches the layout fibre via a grammar walk, and returns a DecoratedSchema the emitter can render byte-for-byte.
Verifying the emitter for a protocol
Before relying on emit_pretty_with_protocol in a downstream pipeline, ask the registry which tier the protocol falls into:
use panproto_parse::{EmitVerificationStatus, ParserRegistry};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let reg = ParserRegistry::new();
match reg.emit_verification_status("python") {
EmitVerificationStatus::Verified => { /* round-trips its full corpus */ }
EmitVerificationStatus::Generic => { /* registered, but unverified */ }
EmitVerificationStatus::Unsupported => { /* not registered */ }
}
Ok(()) }
The 255 protocols currently in the Verified set are listed in crates/panproto-parse/src/registry.rs under VERIFIED_EMIT_PROTOCOLS. A protocol earns the tier by round-tripping its grammar author’s full test/corpus/ under the strict emit_corpus_audit oracle (emit(parse(emit(s))) == emit(s) plus vertex-kind and edge-shape multiset preservation on every entry), or by being pinned to a quivers transpile backend test.
Verification
Tree-sitter parsing is total: every byte sequence parses into some AST, with error nodes inserted around unparseable spans. The verified guarantee is a round-trip up to the vertex-kind and edge-shape multiset: emit(parse(bytes)) re-parses to the same abstract syntax tree, which check_parse_emit in panproto_parse::parse_emit_lens asserts. Interstitial preservation makes that emit reproduce the original whitespace byte-for-byte for most inputs, but some legitimately reformat (json re-indents arrays), so byte equality is not promised universally. schema parse emit <file> is the smoke test.
Common mistakes
- Treating the AST as the source of truth for non-syntactic information. Type information, name resolution, control flow are not modelled by the auto-derived theories.
- Assuming language coverage. The 261-language list is in
crates/panproto-grammars/. Languages not in the list have no parser.
See also
- Decorate an abstract schema for the put-direction of the parse / emit lens.
- Source-code emission for the emitter’s structural pipeline.
- Reference: protocol catalogue.
- Round-trip with format preservation.
- Layout enrichment for the categorical framing of the parse / decorate / emit pair.
Decorate an abstract schema
When you parse source code into a schema, the parse walker attaches layout data to every vertex: byte spans, the whitespace between adjacent tokens, and a discriminator recording which CHOICE alternative tree-sitter took. The emitter consumes those constraints to render source bytes back. A schema you build by hand from SchemaBuilder has none of them, so emit_pretty_with_protocol falls back to a grammar walk that may pick the wrong alternative or render nothing at all.
decorate fills the gap. Given an abstract schema (vertex kinds, child_of edges, leaf literal-value constraints) and a registered grammar, it attaches the full layout fibre and returns a DecoratedSchema the emitter can render.
Prerequisites
The Rust SDK with the full-parse feature, or the CLI. Python bindings are forthcoming.
The task
Build the abstract schema
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p)
.vertex("$0", "record", None)?
.vertex("$1", "object", None)?
.edge("$0", "$1", "record-schema", None)?
.vertex("$2", "string", None)?
.edge("$1", "$2", "prop", Some("title"))?
.constraint("$2", "literal-value", "hello")
.build_abstract()?;
Ok(()) }
build_abstract checks that no layout-fibre constraint was added during construction (no start-byte, no interstitial-N, no chose-alt-*) and returns an AbstractSchema. If a layout sort slipped in, you get SchemaError::LayoutConstraintsOnAbstractBuild; use build_decorated if a decorated schema was the intent.
Decorate
use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let decorated = reg.decorate("lilypond", &abstract_schema, &policy)?;
let _ = decorated;
Ok(()) }
decorate runs emit_pretty_with_policy to render the abstract schema to canonical bytes under the policy, then re-parses those bytes. The re-parse attaches the complete layout fibre: every start-byte, every end-byte, every interstitial-N, plus the chose-alt-fingerprint and chose-alt-child-kinds discriminators that pin down which CHOICE alternative the parser took.
Render straight to bytes
If all you want is the rendered source, skip the re-parse:
use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let bytes = reg.pretty_with_protocol("lilypond", &abstract_schema, &policy)?;
let _ = bytes;
Ok(()) }
pretty_with_protocol honours every field of the policy in the output: separator, newline, indent_width, line_break_after, and the indent open/close token sets. Two different policies render the same abstract schema to different bytes.
Customise the policy
use panproto_core::parse::LayoutPolicy;
use panproto_core::parse::ParserRegistry;
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy {
indent_width: 4,
separator: " ".into(),
newline: "\r\n".into(),
..LayoutPolicy::default()
};
let bytes = reg.pretty_with_protocol("lilypond", &abstract_schema, &policy)?;
let _ = bytes;
Ok(()) }
LayoutPolicy is an alias for the de-novo emitter’s FormatPolicy; the put direction of the parse / emit lens and the emitter use the same configuration type.
Strip layout back down
use panproto_core::parse::{LayoutPolicy, ParserRegistry};
use panproto_core::schema::{Protocol, SchemaBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let p: Protocol = panproto_core::protocols::atproto::protocol();
let abstract_schema = SchemaBuilder::new(&p).vertex("$0", "record", None)?.entry("$0").build_abstract()?;
let reg = ParserRegistry::new();
let policy = LayoutPolicy::default();
let decorated = reg.decorate("lilypond", &abstract_schema, &policy)?;
let stripped = decorated.forget_layout(); // -> AbstractSchema
let _ = stripped;
Ok(()) }
forget_layout is the schema-level forgetful U: it drops every constraint whose sort is in the layout fibre (per panproto_gat::is_layout_sort) and returns an AbstractSchema. The section law forget_layout(decorate(a)) ≅ a holds up to vertex-id renaming and kind/edge-multiset equivalence, which is the granularity panproto’s round-trip law machinery already uses.
Verification
The section-law smoke test in crates/panproto-parse/tests/decorate_section_law.rs walks a sample source through parse → forget_layout → decorate → forget_layout for every grammar with a parse fixture and asserts that both the vertex-kind multiset and the edge-shape multiset match before and after. Run it with:
cargo test -p panproto-parse --test decorate_section_law \
--features lang-json,lang-lilypond
The matching test for policy fidelity (pretty_with_protocol_honours_policy) renders the same abstract schema under two distinct policies and asserts the output bytes differ in exactly the way the policy prescribes (CRLF vs LF newline, two-space vs single-space separator, four-space vs zero indent).
Common mistakes
- Wrapping a parsed schema as an
AbstractSchemaand expectingdecorateto keep its vertex IDs. The parse walker invents fresh IDs; the section law holds up to multiset equivalence, not pointwise. If you need the parse-side IDs preserved, work with theDecoratedSchemadirectly. - Passing an
AbstractSchemabuilt against one protocol intodecoratefor a different protocol. The protocol-match guard rejects the call withParseError::SchemaConstruction; build the schema against the right protocol or look up the parser by the schema’s ownprotocol(). - Reaching for
DecoratedSchema::wrap_uncheckedon a hand-built schema and expectingemit_pretty_with_protocolto round-trip through byte-position arithmetic. The wrapping is a type-level assertion the constructor cannot verify; an empty layout fibre means the emitter falls back to a grammar walk, which is well-defined but uses the defaultFormatPolicy, not whatever policy you’d have passed todecorate. - Calling
decorateon a protocol that returnsEmitVerificationStatus::Generic.decoraterunsemit_prettyinternally, so its output inherits whatever fidelity the emitter has on that grammar. CheckParserRegistry::emit_verification_status(protocol)first; if the result isGeneric, the round-trip kind multiset still satisfies the section law, but byte-for-byte stability across re-emits is not guaranteed.
See also
- Source-code emission for what
emit_prettydoes internally during the decorate call. - Parse full ASTs for the get direction of the same lens.
- Round-trip with format preservation for the parallel codec story at the byte-position level.
- Reference: protocol catalogue for the registered grammars.
- Lenses and round-trip laws for the lens machinery the layout fibre rides on.
Build a custom protocol
A custom protocol is a new GAT plus a parser/emitter pair, registered with panproto so it can participate in the same diff/migrate/version-control workflow as the built-ins.
Prerequisites
Familiarity with Schemas as theories and Composing protocols by colimit. The Rust toolchain.
The task
Declare the theory
The fastest path is the theory DSL (panproto-theory-dsl):
{
name = "MyProto",
components = ["ThGraph", "ThConstraint", "ThNamed"],
extensions = {
sorts = ["Permission"],
ops = [
{ name = "perm_of", in = ["Edge"], out = "Permission" },
],
eqns = [],
},
}
components lists the building-block theories to compose by colimit. extensions adds sorts, operations, and equations on top of the colimit.
For finer control, declare the theory directly in Rust with the class! and inductive! macros from panproto-gat-macros.
From Python, the same DSL document loads via Theory.from_nickel(source), Theory.from_yaml(source), Theory.from_json(source), or Theory.from_path(path) (dispatches by extension). The loaders accept the theory, class, and inductive body variants; multi-body documents (morphism, composition, protocol, bundle) belong in panproto-theory-dsl::load_and_compile directly. For incremental authoring, panproto.TheoryBuilder mirrors class! in a chainable form. Round-trip the flat Theory shape via to_json / to_yaml paired with from_dict_json / from_dict_yaml.
Implement parser and emitter
Each protocol provides a Parser: Bytes -> Schema and an Emitter: Schema -> Bytes. Implement both in a new submodule of panproto-protocols. See existing modules (serialization::avro, data_schema::cddl, web_document::atproto) for canonical structure.
Register
Each protocol module exposes protocol() -> Protocol and register_theories(&mut HashMap<String, Theory, _>):
// crates/panproto-protocols/src/my_proto.rs
use std::collections::HashMap;
use panproto_gat::Theory;
use panproto_schema::Protocol;
use crate::theories;
pub fn protocol() -> Protocol {
Protocol {
name: "my_proto".into(),
schema_theory: "ThMyProtoSchema".into(),
instance_theory: "ThMyProtoInstance".into(),
..Protocol::default()
}
}
pub fn register_theories<S: ::std::hash::BuildHasher>(
registry: &mut HashMap<String, Theory, S>,
) {
theories::register_constrained_multigraph_wtype(
registry,
"ThMyProtoSchema",
"ThMyProtoInstance",
);
}
Pick the theories::register_* helper that matches your theory shape (register_constrained_multigraph_wtype, register_typed_graph_wtype, register_hypergraph_functor, …). Colimit failures during registration panic with a named intermediate step; that is a build-time bug to fix in the theory composition. To expose my_proto to the language SDKs and CLI, add it to the lookup table in crates/panproto-py/src/protocols.rs and the lookup_builtin_protocol table in crates/panproto-wasm/src/api/helpers.rs.
Verification
cargo test -p panproto-protocols my_proto
The standard property-test suite for protocols verifies parse/emit round-trip, schema validation against the theory, and migration existence between two scaffolded schemas.
Common mistakes
- Declaring extensions before the colimit components are correct. Extensions interact with the colimit structure; if a building-block step is wrong, the extension may not even reach registration.
- Skipping the parser/emitter. Without both, the protocol cannot participate in
schema convertorschema validateworkflows.
See also
Schema version control
panproto-vcs is git for schemas: an immutable DAG of schema, migration, lens, and data objects with branches, tags, merges, and a content-addressed store. The CLI verbs match git: init, add, commit, branch, merge, log, diff.
The four pages here cover the practical workflows.
| Page | Purpose |
|---|---|
| Init and commit | Start a repository, stage and commit changes. |
| Branch and merge | Diverge a feature branch, merge it back via pushout. |
| Version data alongside schemas | Carry data instances in commits and lift them through migrations automatically. |
| Bridge to git | Run panproto-vcs alongside git, or as a custom git remote. |
See also
- Schema version control semantics for the model.
- Pushouts and merge for the merge construction.
- Reference: CLI for the full subcommand list.
Init and commit
Prerequisites
The schema CLI installed. A directory containing schema files (or a fresh directory you are about to populate).
The task
cd my-schemas/
schema init # create .panproto/
schema add user.json
schema commit -m "initial user schema"
schema log # show history
init creates a .panproto/ directory analogous to .git/. add stages a schema; commit creates a content-addressed snapshot. The behaviour matches git for the parts that overlap.
To inspect the current state:
schema status
schema diff # diff working tree vs HEAD
schema show <commit-hash>
Verification
schema log --oneline
prints the commit history. Each commit shows its blake3 hash, schema hash, message, and author.
Common mistakes
- Forgetting to
schema addbeforeschema commit. Like git, the staging area is explicit; commits only include staged changes. - Editing inside
.panproto/objects/directly. The store is content-addressed; manual edits break the hash invariants.
See also
Branch and merge
Prerequisites
A panproto repository with at least one commit (Init and commit).
The task
schema branch feature/add-handle
schema checkout feature/add-handle
# edit the schema, add a `handle` field
schema add user.json
schema commit -m "add handle"
schema checkout main
schema merge feature/add-handle
merge runs the pushout construction over the schema graphs at HEAD, the merging branch’s HEAD, and their common ancestor. The result is the smallest schema containing both branches’ changes.
If the pushout’s universal property fails (because the two branches’ changes contradict on a shared substructure), merge raises UniversalFactorizationFailure and produces a conflict object instead of an unsafe result.
To resolve a conflict:
schema status # lists conflict objects
schema show <conflict> # inspect the contradiction
# edit the conflict descriptor
schema add <conflict>
schema commit -m "resolve conflict"
Verification
schema log --graph
shows the DAG with the merge commit visible. The merge commit has two parents (HEAD and the merged branch’s HEAD).
Common mistakes
- Editing schemas while on the wrong branch.
schema statusalways shows the current branch; check it before editing. - Treating conflict objects as text. They are structured descriptions of an irreconcilable categorical equation; resolution is by selecting one side or extending the schema, not by editing JSON in place.
See also
- Pushouts and merge for the merge construction.
- Schema version control semantics.
- Init and commit.
Version data alongside schemas
panproto-vcs commits can carry data instances. When a branch’s schema migrates, the data carried by its commits is automatically lifted forward by the migration’s lens. This makes data evolution part of the schema history rather than a parallel concern.
Prerequisites
A panproto repository with at least one schema and a corresponding data instance.
The task
Data is staged together with its schema via schema add --data:
schema add user.json --data records/
schema commit -m "v1 schema and seed data"
# Evolve the schema and re-stage with the same data directory.
schema add user-v2.json --data records/
schema commit -m "v2 schema"
# Sync the working data directory to the latest schema via the auto-derived chain.
schema data sync records/
schema add --data <DATA> stages a data directory alongside the schema. schema data sync lifts the on-disk data forward through any schema migrations between the recorded commit and the target ref (default HEAD). To preview without writing:
schema data migrate records/ --dry-run
schema data migrate runs the migration between two specific commits (default parent..HEAD); add --coverage to print statistics. Inspect data status:
schema status --data records/
schema data status records/
To extract historical data, check out the commit:
schema checkout <commit>
# read records/
Verification
schema data status records/
reports staleness relative to the current schema. A clean status means the data conforms.
schema data migrate --coverage records/ prints how much of the data was actually transformed by the lift, surfacing partial migrations.
Common mistakes
- Editing data inside the store directly. Like schemas, data objects are content-addressed.
- Skipping data when committing schema changes. If you commit a v2 schema without ever staging v1 data, there is nothing for the lens to lift; this is fine, but the v2 commit will have no data even though v1 might.
- History rewrites (rebase, amend) on a branch carrying data. The rewrite must lift the data through the new history; this is automatic, but re-run
schema data status records/afterwards to confirm the on-disk data is in sync.
See also
Bridge to git
panproto-vcs is independent of git, but most projects host their source in git. The bridge lets a panproto-vcs repository live alongside (or inside) a git one, and lets git remotes serve panproto histories.
Prerequisites
A git repository (the host project) and either a .panproto/ directory inside it or a sibling panproto repository.
The task
Sidecar mode
The simplest setup: keep .panproto/ next to .git/. Both are tracked by their own tools; commits to either are independent.
git add .panproto/
git commit -m "snapshot panproto state"
panproto-vcs’s content-addressed objects are deterministic, so storing them in git works (no merge conflicts inside .panproto/objects/).
Bidirectional translation
schema git import reads a git repository’s history and produces a corresponding panproto-vcs DAG; schema git export does the reverse:
schema git import path/to/git-repo HEAD
schema git export path/to/output-git-repo --repo .
The import range can be any git revspec (HEAD, main, HEAD~10..HEAD). The export takes the current panproto repository and writes a git mirror.
git-remote helper
The panproto-git-remote crate registers a custom git remote helper that exposes a panproto-vcs repository to git clients (so git clone panproto://path/to/repo works). The helper is separate from schema git; install the binary and configure git accordingly. See crates/panproto-git-remote.
Merge bridging
A git merge that touches .panproto/objects/ will not preserve panproto’s structural merge guarantees, since git merges the bytes. The supported pattern is to do schema-level work in panproto and then schema git export the result. There is no automatic git-merge-to-pushout translation in the CLI; the merge must originate in schema merge to get the universal-property-checked pushout.
Verification
schema status
git status
Both should be clean. A clean panproto status with a dirty git status is normal (you may have untracked files git knows about that panproto does not). The reverse is not.
Common mistakes
- Three-way text-merging
.panproto/objects/files. The store is content-addressed; the bytes are correct or wrong, never partially. Resolve viaschema rebaseor by re-running the merge inside panproto withschema merge, then re-export to git withschema git export. - Mixing the two modes. Choose sidecar or remote-bridge per project; mixing creates ambiguity about which DAG is the source of truth.
See also
Translate across protocols
Cross-protocol translation moves schemas between schema languages: JSON Schema to Protobuf, ATProto Lexicon to GraphQL SDL, and so on. Translation is a migration whose source and target theories belong to different protocols.
Prerequisites
Two protocols whose theories overlap on enough building-block theories that a colimit-mediated translation exists. The CLI or one of the SDKs.
The task
Cross-protocol translation runs through the colimit-of-theories construction in panproto-protocols. There is no single CLI subcommand that takes a source-protocol schema and emits a target-protocol schema directly; instead, the workflow is:
- Compose the theory. Use the theory DSL to declare a composition theory whose colimit covers both protocols, or rely on the built-in compositions for the protocol pair you want.
- Generate a lens between schemas in the composed theory.
schema lens generateproduces the chain; both schemas must be expressed against the composed theory. - Apply the lens to convert data.
The theory DSL exposes a small library of named building-block theories via builtin_resolver(): ThGraph, ThConstraint, ThMulti, ThWType, ThMeta, ThSimpleGraph, ThHypergraph, ThInterface, ThFunctor, ThFlat, ThGraphInstance. Whole-protocol theories (the per-protocol GATs that JSON Schema, Protobuf, ATProto, etc. compile to) are constructed in Rust inside panproto-protocols, not through the DSL.
The DSL composition shape is compose = { result, bases, steps }, with each step a ColimitStepSpec { left, right, shared_sorts, shared_ops? }. A toy composition over building blocks:
{
id = "dev.example.constrained-multigraph",
description = "Compose ThGraph, ThConstraint, and ThMulti by identifying their Vertex and Edge sorts",
compose = {
result = "ConstrainedMultigraph",
bases = ["ThGraph", "ThConstraint", "ThMulti"],
steps = [
{ left = "ThGraph", right = "ThConstraint", shared_sorts = ["Vertex", "Edge"] },
# Reference the prior step's intermediate result by its generated
# name (`step_<i>`); the final intermediate is renamed to `result`.
{ left = "step_0", right = "ThMulti", shared_sorts = ["Vertex", "Edge"] },
],
},
}
Compile it:
schema theory compile theories/constrained-multigraph.ncl
For cross-protocol translation between two existing built-in protocols (say, JSON Schema and Protobuf), the composition is in Rust inside crates/panproto-protocols. Once both protocols are registered, the actual CLI flow is:
# Generate a chain between two schemas, one in each protocol; both schemas
# must be expressible against the same registered protocol theory.
schema lens generate --protocol <protocol> \
schemas/user-a.json schemas/user-b.json \
--save lenses/a-to-b.json
# Apply.
schema lens apply --protocol <protocol> lenses/a-to-b.json data/user.json
For data conversion within a single protocol’s schema fleet (different schemas, same protocol), use schema data convert:
schema data convert --protocol atproto \
--from schemas/user-v1.json --to schemas/user-v2.json \
data/users/
Verification
schema lens verify --protocol <name> <data> <schema> checks the round-trip laws on the converted data. A clean run means the chain is loss-free for the given samples.
Common mistakes
- Reaching for a one-shot CLI conversion. The colimit composition step is essential for cross-protocol work; without it,
schema lens generatehas no shared theory to align the schemas against. - Translating between protocols with non-overlapping required structure. The lens auto-generation will be partial, and
applywill fail on records using the source-only structure. Distant protocols (say, FHIR to MongoDB) may require a hand-written chain on top of the auto-derived skeleton. - Assuming auto-derived translations preserve every constraint. Constraints expressible in one theory but not the other are dropped;
schema lens verifyflags this.
See also
Continuous integration
CI for panproto projects covers two questions: did this change validate, and did this change break compatibility?
| Page | Purpose |
|---|---|
| Breaking-change gate | Block PRs that introduce a breaking schema change unless explicitly acknowledged. |
| GitHub Actions | Drop-in workflows for schema validation and breaking-change detection. |
| Pre-commit hooks | Run schema validation before each local commit. |
See also
Breaking-change gate
A breaking-change gate is a CI step that fails the build when a PR introduces a schema change panproto classifies as breaking. It runs against the diff between the PR’s schema and the schema at main.
Prerequisites
A panproto repository under git. A CI system that can run shell commands.
The task
schema check enforces existence conditions; it does not classify. Use schema diff (which reports structural changes) plus schema lens generate (which fails when no chain exists) and combine into a gate:
# In your CI script.
git fetch origin main
git show origin/main:schemas/user.json > /tmp/user-base.json
# Existence + GAT-level type check.
schema check \
--src /tmp/user-base.json \
--tgt schemas/user.json \
--mapping migrations/user.json \
--typecheck
# Generate a chain; non-zero exit means the change is breaking
# (no auto-generated lens covers it).
schema lens generate --protocol atproto /tmp/user-base.json schemas/user.json --save /tmp/chain.json
Either step’s non-zero exit fails the build. To allow an explicit override, gate on a commit-message marker or a PR label:
if git log -1 --format=%B | grep -q '\[breaking-change-acknowledged\]'; then
schema check ... --typecheck || true
schema lens generate ... || true
else
schema check ... --typecheck
schema lens generate ...
fi
For richer classification, use the SDK: in Python, panproto.diff_and_classify(old, new, protocol) returns a CompatReport; in Rust, call panproto_check::diff(...) followed by panproto_check::classify(&diff, &protocol); in TypeScript, panproto.diffFull(old, new).classify(protocol) returns the same report. Each carries a classification (fully_compatible, backward_compatible, or breaking) along with the offending elements.
Verification
Open a PR that adds a backward-compatible field; the gate passes. Open a PR that drops a required field; the gate fails. Add [breaking-change-acknowledged] to the commit message; the gate passes with a warning.
Common mistakes
- Skipping
--typecheck. Without it, the gate misses GAT-level type errors that are also breaking changes. - Comparing against the wrong base. The gate must compare against the merge base of the PR and
main, not againstmain’s tip; otherwise force-pushes tomainmake every PR look breaking.
See also
- GitHub Actions for a drop-in workflow.
- Pre-commit hooks for catching breakage before push.
- Build a migration.
GitHub Actions
A drop-in workflow for schema validation and breaking-change detection on every pull request.
Prerequisites
A panproto repository on GitHub. The panproto-cli Homebrew tap, shell installer, or cargo install available on Linux runners.
The task
# .github/workflows/panproto.yml
name: panproto
on:
pull_request:
paths:
- 'schemas/**'
- 'panproto.toml'
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # need history for the diff base
- name: Install schema
run: |
curl --proto '=https' -LsSf \
https://github.com/panproto/panproto/releases/latest/download/panproto-cli-installer.sh | sh
echo "$HOME/.cargo/bin" >> "$GITHUB_PATH"
- name: Validate
run: |
for f in schemas/*.json; do
schema validate --protocol atproto "$f"
done
- name: Breaking-change gate
run: |
base=$(git merge-base origin/${{ github.base_ref }} HEAD)
git show $base:schemas/user.json > /tmp/base.json
schema check --src /tmp/base.json --tgt schemas/user.json \
--mapping migrations/user.json --typecheck
schema lens generate --protocol atproto /tmp/base.json schemas/user.json \
--save /tmp/chain.json
The --protocol flag is required for every per-file schema validate. There is no --project flag; per-file iteration is the supported pattern.
Two jobs: validation and the breaking-change gate. Validation fails on a malformed schema; the gate fails on a breaking diff.
Verification
Push a PR and watch the workflow run. The validation step exits zero on a clean schema; the gate prints the classification and exits non-zero on breaking.
Common mistakes
- Omitting
fetch-depth: 0. Shallow clones make the merge-base lookup fail; the gate then runs against the wrong base. - Pinning
panproto-clito a stale version. Use the latest installer; the protocol catalogue and existence-check rules evolve.
See also
- Breaking-change gate for the underlying mechanic.
- Pre-commit hooks for the local-side equivalent.
Pre-commit hooks
A pre-commit hook runs schema validation before each git commit, so you catch malformed schemas before they enter the history. Optionally, it can also warn about breaking changes.
Prerequisites
The schema CLI installed. A git repository.
The task
Plain git hook
# .git/hooks/pre-commit
#!/usr/bin/env bash
set -euo pipefail
changed=$(git diff --cached --name-only --diff-filter=ACM | grep -E '^schemas/.*\.json$' || true)
[ -z "$changed" ] && exit 0
for f in $changed; do
schema validate --protocol atproto "$f"
done
# Optional breaking-change warning: try to auto-generate a chain
# between the staged version and the upstream copy. Failure suggests
# the change is breaking.
for f in $changed; do
base_blob=$(git show :@{u}:"$f" 2>/dev/null) || continue
echo "$base_blob" > /tmp/base.json
if ! schema lens generate --protocol atproto /tmp/base.json "$f" --save /tmp/chain.json 2>/dev/null; then
echo "warning: $f introduces a breaking change (commit anyway? Ctrl-C to abort)"
fi
done
chmod +x .git/hooks/pre-commit installs it.
With pre-commit framework
# .pre-commit-config.yaml
repos:
- repo: local
hooks:
- id: schema-validate
name: panproto schema validate
entry: schema validate --protocol atproto
language: system
files: '^schemas/.*\.json$'
The hook receives the staged file paths as positional arguments; the --protocol flag is required.
pre-commit install activates it.
Verification
Stage a malformed schema and try to commit; the hook rejects. Fix the schema and commit again; the hook passes.
Common mistakes
- Hook stalls on missing
schemabinary. Wrap the invocation in acommand -v schema || exit 0to fall back gracefully on machines without the CLI. - Validating every file on every commit. The script above scopes to staged
schemas/*.jsononly; broader scopes are noisy.
See also
- Breaking-change gate for the CI-side equivalent.
- Reference: CLI.
Reference
Reference pages are authoritative lookups: signatures, flags, tables, and grammars. They contain no exposition. If you want to understand the model behind these signatures, follow the per-page link to the corresponding explanation chapter.
| Page | What it lists |
|---|---|
| CLI | Every schema subcommand, with usage, options, and examples. Generated from the clap derive tree; never stale. |
| Rust SDK | Crate selection, feature flags, public re-exports, and the canonical entry points. Links into docs.rs for full type signatures. |
| TypeScript SDK | Package layout, initialization, the high-level facade, and the handle-based API. Links into TypeDoc. |
| Python SDK | PyO3 native module surface, top-level re-exports, and the companion grammar packs. Links into the mkdocs reference site. |
| Protocol catalogue | Every protocol panproto recognises, with its theory composition and supported operations. |
| Expression language | Surface grammar, builtins by category, and the type signatures of each builtin. |
| Lens combinators | The combinator algebra exposed by panproto-lens, organised by optic kind. |
| Configuration | The panproto.toml manifest schema. |
| Crate map | The 38 panproto-* crates in the workspace, with one-line descriptions and dependency direction. |
CLI reference
Every schema subcommand, with its full --help text. This page is regenerated
from the live binary by xtask/src/bin/gen-cli-docs.rs; edit the CLI, not the
page.
To regenerate locally:
cargo run -p xtask --bin gen-cli-docs
CI runs the same command and fails if the result differs from what is checked in.
For the model that the commands operate on, see Schemas as theories, Migrations as morphisms, and Schema version control semantics.
schema
Schematic version control: schema migration toolkit based on generalized algebraic theories
Usage: schema [OPTIONS] <COMMAND>
Commands:
validate Validate a schema against a protocol
check Check existence conditions for a migration between two schemas
scaffold Generate minimal test data from a protocol theory using free model construction
normalize Simplify a schema by merging equivalent elements
typecheck Type-check a migration between two schemas at the GAT level
verify Verify that a schema satisfies its protocol theory's equations
init Initialize a new panproto repository
add Stage a schema for the next commit
commit Create a new commit from staged changes
status Show repository status
log Show commit history
diff Diff two schemas or show staged changes
show Inspect a commit, schema, or migration object
branch Create, list, or delete branches
tag Create, list, or delete tags
checkout Switch to a branch or commit
merge Merge a branch into the current branch
rebase Replay current branch onto another
cherry-pick Apply a single commit's migration to the current branch
reset Move HEAD / unstage / restore
stash Save or restore working state
reflog Show ref mutation history
bisect Binary search for the commit that introduced a breaking change
blame Show which commit introduced a schema element
lift Apply a migration to a record, transforming it from source to target schema
integrate Integrate two schemas by computing their pushout
auto-migrate Automatically discover a migration between two schemas
gc Garbage collect unreachable objects
expr Evaluate, type-check, or interactively explore GAT expressions
enrich Add, list, or remove schema enrichments (defaults, coercions, mergers, policies)
remote Add, list, or remove remote repositories
push Push schemas to a remote repository
pull Pull schemas from a remote repository
fetch Fetch schemas from a remote repository
clone Clone a remote repository
data Data operations: migrate, convert, sync, and status
theory Theory DSL operations: define theories, morphisms, and protocols from data files
lens Bidirectional lens operations
parse Parse source files into full-AST schemas via tree-sitter
git Import/export between git repositories and panproto-vcs
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
-V, --version Print version
schema validate
Validate a schema against a protocol
Usage: schema validate [OPTIONS] --protocol <PROTOCOL> <SCHEMA>
Arguments:
<SCHEMA> Path to the schema JSON file
Options:
--protocol <PROTOCOL> The protocol name (e.g., "atproto")
-v, --verbose Enable verbose output
-h, --help Print help
schema check
Check existence conditions for a migration between two schemas
Usage: schema check [OPTIONS] --src <SRC> --tgt <TGT> --mapping <MAPPING>
Options:
--src <SRC> Path to the source schema JSON file
-v, --verbose Enable verbose output
--tgt <TGT> Path to the target schema JSON file
--mapping <MAPPING> Path to the migration mapping JSON file
--typecheck Also type-check the migration morphism at the GAT level
-h, --help Print help
schema scaffold
Generate minimal test data from a protocol theory using free model construction
Usage: schema scaffold [OPTIONS] --protocol <PROTOCOL> <SCHEMA>
Arguments:
<SCHEMA> Path to the schema JSON file
Options:
--protocol <PROTOCOL> The protocol name (e.g., "atproto")
-v, --verbose Enable verbose output
--depth <DEPTH> Maximum term generation depth (default: 3) [default: 3]
--max-terms <MAX_TERMS> Maximum terms per sort (default: 1000) [default: 1000]
--json Output as JSON
-h, --help Print help
schema normalize
Simplify a schema by merging equivalent elements
Usage: schema normalize [OPTIONS] --protocol <PROTOCOL> <SCHEMA>
Arguments:
<SCHEMA> Path to the schema JSON file
Options:
--protocol <PROTOCOL> The protocol name (e.g., "atproto")
-v, --verbose Enable verbose output
--identify <IDENTIFICATIONS> Pairs of elements to identify, as "A=B"
--json Output as JSON
-h, --help Print help
schema typecheck
Type-check a migration between two schemas at the GAT level
Usage: schema typecheck [OPTIONS] --src <SRC> --tgt <TGT> --migration <MIGRATION>
Options:
--src <SRC> Path to the source schema JSON file
-v, --verbose Enable verbose output
--tgt <TGT> Path to the target schema JSON file
--migration <MIGRATION> Path to the migration mapping JSON file
-h, --help Print help
schema verify
Verify that a schema satisfies its protocol theory's equations
Usage: schema verify [OPTIONS] --protocol <PROTOCOL> <SCHEMA>
Arguments:
<SCHEMA> Path to the schema JSON file
Options:
--protocol <PROTOCOL>
The protocol name (e.g., "atproto")
-v, --verbose
Enable verbose output
--max-assignments <MAX_ASSIGNMENTS>
Maximum assignments to check per equation (default: 10000) [default: 10000]
-h, --help
Print help
schema init
Initialize a new panproto repository
Usage: schema init [OPTIONS] [PATH]
Arguments:
[PATH] Directory to initialize (defaults to current dir) [default: .]
Options:
-b, --initial-branch <INITIAL_BRANCH> Use the given name for the initial branch
-v, --verbose Enable verbose output
-h, --help Print help
schema add
Stage a schema for the next commit
Usage: schema add [OPTIONS] <SCHEMA>
Arguments:
<SCHEMA> Path to the schema JSON file
Options:
-n, --dry-run Show what would be staged without actually staging
-v, --verbose Enable verbose output
-f, --force Force staging even if validation fails
--data <DATA> Stage data files alongside the schema
-h, --help Print help
schema commit
Create a new commit from staged changes
Usage: schema commit [OPTIONS] --message <MESSAGE>
Options:
-m, --message <MESSAGE> Commit message
-v, --verbose Enable verbose output
--author <AUTHOR> Author name [default: anonymous]
--amend Amend the previous commit instead of creating a new one
--allow-empty Allow creating a commit with no changes
--skip-verify Skip GAT equation verification
-h, --help Print help
schema status
Show repository status
Usage: schema status [OPTIONS]
Options:
-s, --short Show output in short format
-v, --verbose Enable verbose output
--porcelain Show output in machine-readable format
-b, --branch Show branch information
--data <DATA> Show data staleness for files in this directory
-h, --help Print help
schema log
Show commit history
Usage: schema log [OPTIONS]
Options:
-n, --limit <LIMIT> Maximum number of commits to show
-v, --verbose Enable verbose output
--oneline Show each commit on a single line
--graph Show an ASCII graph of the branch structure
--all Show all branches, not just the current one
--format <FORMAT> Pretty-print commits using a format string
--author <AUTHOR> Filter commits by author
--grep <GREP> Filter commits whose message matches a pattern
--data Show data and complement IDs in commit history
-h, --help Print help
schema diff
Diff two schemas or show staged changes
Usage: schema diff [OPTIONS] [OLD] [NEW]
Arguments:
[OLD] Path to the old schema (or first ref)
[NEW] Path to the new schema (or second ref)
Options:
--stat Show a diffstat summary
-v, --verbose Enable verbose output
--name-only Show only names of changed elements
--name-status Show names and status (A/D/M) of changed elements
--staged Diff the staged schema against HEAD
--detect-renames Detect likely renames between schemas
--theory Show theory-level diff (sorts, operations, equations)
--lens Also generate a protolens chain between the schemas
--save <SAVE> Save the protolens chain to a file (requires --lens)
--optic-kind Show the optic classification of the diff
-h, --help Print help
schema show
Inspect a commit, schema, or migration object
Usage: schema show [OPTIONS] <TARGET>
Arguments:
<TARGET> Ref name or object ID
Options:
--format <FORMAT> Pretty-print using a format string
-v, --verbose Enable verbose output
--stat Show a diffstat summary for commits
-h, --help Print help
schema branch
Create, list, or delete branches
Usage: schema branch [OPTIONS] [NAME]
Arguments:
[NAME] Branch name to create. Lists branches if omitted
Options:
-d, --delete Delete the branch
-D Force-delete the branch even if not fully merged
-f, --force Force overwrite if branch already exists
-m, --move <RENAME> Rename a branch (value is the new name)
-v, --verbose Show commit info for each branch
-a, --all List both local and remote-tracking branches
-h, --help Print help
schema tag
Create, list, or delete tags
Usage: schema tag [OPTIONS] [NAME]
Arguments:
[NAME] Tag name to create. Lists tags if omitted
Options:
-d, --delete Delete the tag
-v, --verbose Enable verbose output
-a, --annotate Create an annotated tag
-m, --message <MESSAGE> Tag message (implies --annotate)
-l, --list List tags matching a pattern
-f, --force Force-replace an existing tag
-h, --help Print help
schema checkout
Switch to a branch or commit
Usage: schema checkout [OPTIONS] <TARGET>
Arguments:
<TARGET> Branch name or commit ID
Options:
-b Create a new branch with the given name at HEAD and switch to it
-v, --verbose Enable verbose output
--detach Detach HEAD at the target commit
--migrate <MIGRATE> Migrate data in this directory to match the target branch's schema
-h, --help Print help
schema merge
Merge a branch into the current branch
Usage: schema merge [OPTIONS] [BRANCH]
Arguments:
[BRANCH] Branch to merge
Options:
--author <AUTHOR> Author name [default: anonymous]
--no-commit Perform the merge but do not commit
--ff-only Refuse to merge unless fast-forward is possible
--no-ff Create a merge commit even for fast-forward merges
--squash Squash the branch into a single change set
--abort Abort an in-progress merge
-m, --message <MESSAGE> Custom merge commit message
-v, --verbose Show pullback-based overlap detection details
--migrate <MIGRATE> Migrate data in this directory through the merge
-h, --help Print help
schema rebase
Replay current branch onto another
Usage: schema rebase [OPTIONS] [ONTO]
Arguments:
[ONTO] Branch or commit to rebase onto
Options:
--author <AUTHOR> Author name [default: anonymous]
-v, --verbose Enable verbose output
--abort Abort the current rebase operation
--cont Continue a paused rebase after resolving conflicts
-h, --help Print help
schema cherry-pick
Apply a single commit's migration to the current branch
Usage: schema cherry-pick [OPTIONS] [COMMIT]
Arguments:
[COMMIT] Commit ID to cherry-pick
Options:
--author <AUTHOR> Author name [default: anonymous]
-v, --verbose Enable verbose output
-n, --no-commit Apply the change without committing
-x Append "(cherry picked from commit ...)" to the message
--abort Abort the current cherry-pick operation
-h, --help Print help
schema reset
Move HEAD / unstage / restore
Usage: schema reset [OPTIONS] <TARGET>
Arguments:
<TARGET> Target ref or commit ID
Options:
--soft Soft reset: move HEAD only, keep staged and working changes
-v, --verbose Enable verbose output
--hard Hard reset: move HEAD, discard all changes
--author <AUTHOR> Author name [default: anonymous]
-h, --help Print help
schema stash
Save or restore working state
Usage: schema stash [OPTIONS] <COMMAND>
Commands:
push Save the current staged schema
pop Restore the most recent stash
list List all stash entries
drop Drop the most recent stash
apply Apply a stash entry without removing it
show Show the contents of a stash entry
clear Remove all stash entries
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash push
Save the current staged schema
Usage: schema stash push [OPTIONS]
Options:
-m, --message <MESSAGE> Optional stash message
-v, --verbose Enable verbose output
--author <AUTHOR> Author name [default: anonymous]
-h, --help Print help
schema stash pop
Restore the most recent stash
Usage: schema stash pop [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash list
List all stash entries
Usage: schema stash list [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash drop
Drop the most recent stash
Usage: schema stash drop [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash apply
Apply a stash entry without removing it
Usage: schema stash apply [OPTIONS] [INDEX]
Arguments:
[INDEX] Stash index to apply [default: 0]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash show
Show the contents of a stash entry
Usage: schema stash show [OPTIONS] [INDEX]
Arguments:
[INDEX] Stash index to inspect [default: 0]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema stash clear
Remove all stash entries
Usage: schema stash clear [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema reflog
Show ref mutation history
Usage: schema reflog [OPTIONS] [REF_NAME]
Arguments:
[REF_NAME] Ref name (defaults to HEAD) [default: HEAD]
Options:
-n, --limit <LIMIT> Maximum entries to show
-v, --verbose Enable verbose output
--all Show reflogs for all refs
-h, --help Print help
schema bisect
Binary search for the commit that introduced a breaking change
Usage: schema bisect [OPTIONS] <GOOD> <BAD>
Arguments:
<GOOD> Known good commit
<BAD> Known bad commit
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema blame
Show which commit introduced a schema element
Usage: schema blame [OPTIONS] --element-type <ELEMENT_TYPE> <ELEMENT_ID>
Arguments:
<ELEMENT_ID> Element identifier (vertex ID, edge `"src->tgt"`, or `"vertex_id:sort"`)
Options:
--element-type <ELEMENT_TYPE> Element type: vertex, edge, or constraint
-v, --verbose Enable verbose output
--reverse Walk history from the first commit forward
-h, --help Print help
schema lift
Apply a migration to a record, transforming it from source to target schema
Usage: schema lift [OPTIONS] --migration <MIGRATION> --src-schema <SRC_SCHEMA> --tgt-schema <TGT_SCHEMA> <RECORD>
Arguments:
<RECORD> Path to the record JSON file
Options:
--migration <MIGRATION> Path to the migration mapping JSON file
-v, --verbose Enable verbose output
--src-schema <SRC_SCHEMA> Path to the source schema JSON file
--tgt-schema <TGT_SCHEMA> Path to the target schema JSON file
--direction <DIRECTION> Migration direction: restrict (default, `Delta_F`), sigma (`Sigma_F`), or pi (`Pi_F`) [default: restrict]
--instance-type <INSTANCE_TYPE> Instance type: wtype (default) or functor [default: wtype]
-h, --help Print help
schema integrate
Integrate two schemas by computing their pushout
Usage: schema integrate [OPTIONS] <LEFT> <RIGHT>
Arguments:
<LEFT> Path to the left schema JSON file
<RIGHT> Path to the right schema JSON file
Options:
--auto-overlap Automatically discover the overlap between schemas
-v, --verbose Enable verbose output
--json Output the integrated schema as JSON
-h, --help Print help
schema auto-migrate
Automatically discover a migration between two schemas
Usage: schema auto-migrate [OPTIONS] <OLD> <NEW>
Arguments:
<OLD> Path to the old/source schema JSON file
<NEW> Path to the new/target schema JSON file
Options:
--monic Require injective (one-to-one) vertex mapping
-v, --verbose Enable verbose output
--json Output the migration as JSON
-h, --help Print help
schema gc
Garbage collect unreachable objects
Usage: schema gc [OPTIONS]
Options:
--dry-run Show what would be deleted without actually deleting
-v, --verbose Enable verbose output
-h, --help Print help
schema expr
Evaluate, type-check, or interactively explore GAT expressions
Usage: schema expr [OPTIONS] <COMMAND>
Commands:
gat-eval Evaluate a JSON-encoded GAT term from a file
gat-check Type-check a JSON-encoded GAT term from a file
repl Interactive expression REPL
parse Parse a Haskell-style expression and print its AST
eval Parse and evaluate a Haskell-style expression, printing the result
fmt Parse an expression and pretty-print it back in canonical form
check Parse an expression and report any syntax errors
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr gat-eval
Evaluate a JSON-encoded GAT term from a file
Usage: schema expr gat-eval [OPTIONS] <FILE>
Arguments:
<FILE> Path to the JSON file containing a GAT term
Options:
--env <ENV> Path to a JSON file with variable bindings
-v, --verbose Enable verbose output
-h, --help Print help
schema expr gat-check
Type-check a JSON-encoded GAT term from a file
Usage: schema expr gat-check [OPTIONS] <FILE>
Arguments:
<FILE> Path to the JSON file containing term, theory, and context
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr repl
Interactive expression REPL
Usage: schema expr repl [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr parse
Parse a Haskell-style expression and print its AST
Usage: schema expr parse [OPTIONS] <SOURCE>
Arguments:
<SOURCE> Expression source text
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr eval
Parse and evaluate a Haskell-style expression, printing the result
Usage: schema expr eval [OPTIONS] <SOURCE>
Arguments:
<SOURCE> Expression source text
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr fmt
Parse an expression and pretty-print it back in canonical form
Usage: schema expr fmt [OPTIONS] <SOURCE>
Arguments:
<SOURCE> Expression source text
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema expr check
Parse an expression and report any syntax errors
Usage: schema expr check [OPTIONS] <SOURCE>
Arguments:
<SOURCE> Expression source text
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich
Add, list, or remove schema enrichments (defaults, coercions, mergers, policies)
Usage: schema enrich [OPTIONS] <COMMAND>
Commands:
add-default Add a default value expression to a vertex
add-coercion Add a coercion expression between two vertex kinds
add-merger Add a merger expression to a vertex
add-policy Add a conflict policy to a vertex
list List all enrichments on the HEAD schema
remove Remove an enrichment by name
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich add-default
Add a default value expression to a vertex
Usage: schema enrich add-default [OPTIONS] --expr <EXPR> <VERTEX>
Arguments:
<VERTEX> Vertex name
Options:
--expr <EXPR> Default value as JSON
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich add-coercion
Add a coercion expression between two vertex kinds
Usage: schema enrich add-coercion [OPTIONS] --expr <EXPR> <FROM> <TO>
Arguments:
<FROM> Source vertex kind
<TO> Target vertex kind
Options:
--expr <EXPR> Coercion expression as JSON
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich add-merger
Add a merger expression to a vertex
Usage: schema enrich add-merger [OPTIONS] --expr <EXPR> <VERTEX>
Arguments:
<VERTEX> Vertex name
Options:
--expr <EXPR> Merger specification as JSON
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich add-policy
Add a conflict policy to a vertex
Usage: schema enrich add-policy [OPTIONS] --strategy <STRATEGY> <VERTEX>
Arguments:
<VERTEX> Vertex name
Options:
--strategy <STRATEGY> Conflict resolution strategy name
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich list
List all enrichments on the HEAD schema
Usage: schema enrich list [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema enrich remove
Remove an enrichment by name
Usage: schema enrich remove [OPTIONS] <NAME>
Arguments:
<NAME> Enrichment name or vertex name to remove enrichments from
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema remote
Add, list, or remove remote repositories
Usage: schema remote [OPTIONS] <COMMAND>
Commands:
add Register a new remote
remove Remove a remote
list List configured remotes
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema remote add
Register a new remote
Usage: schema remote add [OPTIONS] <NAME> <URL>
Arguments:
<NAME> Remote name
<URL> Remote URL
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema remote remove
Remove a remote
Usage: schema remote remove [OPTIONS] <NAME>
Arguments:
<NAME> Remote name to remove
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema remote list
List configured remotes
Usage: schema remote list [OPTIONS]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema push
Push schemas to a remote repository
Usage: schema push [OPTIONS] [REMOTE] [BRANCH]
Arguments:
[REMOTE] Remote name
[BRANCH] Branch to push
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema pull
Pull schemas from a remote repository
Usage: schema pull [OPTIONS] [REMOTE] [BRANCH]
Arguments:
[REMOTE] Remote name
[BRANCH] Branch to pull
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema fetch
Fetch schemas from a remote repository
Usage: schema fetch [OPTIONS] [REMOTE]
Arguments:
[REMOTE] Remote name
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema clone
Clone a remote repository
Usage: schema clone [OPTIONS] <URL> [PATH]
Arguments:
<URL> Repository URL
[PATH] Local path
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema data
Data operations: migrate, convert, sync, and status
Usage: schema data [OPTIONS] <COMMAND>
Commands:
migrate Migrate data to match the current schema version
convert Convert data between schemas
sync Sync data to match a target schema version via VCS
status Report data staleness relative to the current schema version
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema data migrate
Migrate data to match the current schema version
Usage: schema data migrate [OPTIONS] <DATA>
Arguments:
<DATA> Data directory containing JSON files
Options:
--protocol <PROTOCOL> Protocol name (inferred from HEAD commit if omitted)
-v, --verbose Enable verbose output
--range <RANGE> Migrate between specific commits (default: parent..HEAD)
--dry-run Preview without modifying files
-o, --output <OUTPUT> Output directory (default: overwrite in place)
--backward Migrate backward (requires stored complement)
--coverage Apply migration and print coverage statistics
-h, --help Print help
schema data convert
Convert data between schemas
Usage: schema data convert [OPTIONS] --protocol <PROTOCOL> <DATA>
Arguments:
<DATA> Data file or directory of JSON files
Options:
--from <FROM> Source schema
-v, --verbose Enable verbose output
--to <TO> Target schema
--protocol <PROTOCOL> Protocol name
--chain <CHAIN> Pre-built protolens chain JSON (alternative to --from/--to)
-o, --output <OUTPUT> Output file or directory
--direction <DIRECTION> Direction: "forward" or "backward" [default: forward]
--defaults <DEFAULTS> Default values as key=value pairs
-h, --help Print help
schema data sync
Sync data to match a target schema version via VCS
Usage: schema data sync [OPTIONS] <DATA_DIR>
Arguments:
<DATA_DIR> Data directory
Options:
--edits Store an edit log object in the VCS
-v, --verbose Enable verbose output
--target <TARGET> Target ref (default: HEAD)
-h, --help Print help
schema data status
Report data staleness relative to the current schema version
Usage: schema data status [OPTIONS] <DATA_DIR>
Arguments:
<DATA_DIR> Data directory
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory
Theory DSL operations: define theories, morphisms, and protocols from data files
Usage: schema theory [OPTIONS] <COMMAND>
Commands:
validate Validate a theory document (load + typecheck)
compile Compile a theory document and print results
compile-dir Compile all theory documents in a directory
check-morphism Validate a morphism document
recompose Replay a composition and print the resulting theory
check-coercion-laws Run sample-based coercion law checks over every directed equation in a theory document. Exits non-zero when any declared coercion class is falsified by a sample
repl Interactive theory REPL with syntax highlighting and history
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory validate
Validate a theory document (load + typecheck)
Usage: schema theory validate [OPTIONS] <FILE>
Arguments:
<FILE> Path to the theory document file (.ncl, .json, .yaml)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory compile
Compile a theory document and print results
Usage: schema theory compile [OPTIONS] <FILE>
Arguments:
<FILE> Path to the theory document file
Options:
--json Output as JSON
-v, --verbose Enable verbose output
-h, --help Print help
schema theory compile-dir
Compile all theory documents in a directory
Usage: schema theory compile-dir [OPTIONS] <DIR>
Arguments:
<DIR> Path to the directory
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory check-morphism
Validate a morphism document
Usage: schema theory check-morphism [OPTIONS] <FILE>
Arguments:
<FILE> Path to the morphism document file
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory recompose
Replay a composition and print the resulting theory
Usage: schema theory recompose [OPTIONS] <FILE>
Arguments:
<FILE> Path to the composition document file
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema theory check-coercion-laws
Run sample-based coercion law checks over every directed equation in a theory document. Exits non-zero when any declared coercion class is falsified by a sample
Usage: schema theory check-coercion-laws [OPTIONS] <FILE>
Arguments:
<FILE> Path to the theory document file
Options:
--json Output the full report as JSON
-v, --verbose Enable verbose output
--var-name <VAR_NAME> Name under which each sample is bound in the evaluation environment. Defaults to `"x"`; override when a theory's equations bind a different free variable so the checker does not surface "unbound variable" errors on every sample [default: x]
-h, --help Print help
schema theory repl
Interactive theory REPL with syntax highlighting and history
Usage: schema theory repl [OPTIONS]
Options:
--load <PATH> Theory documents to load on startup. Same shape accepted by `:load` inside the REPL
-v, --verbose Enable verbose output
-h, --help Print help
schema lens
Bidirectional lens operations
Usage: schema lens [OPTIONS] <COMMAND>
Commands:
generate Generate a lens between two schemas
apply Apply a saved lens chain to data
compose Compose two protolens chains or schemas
verify Verify lens laws on test data
inspect Inspect a saved protolens chain
check Check applicability of a chain against schemas in a directory
lift Lift a chain along a theory morphism
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema lens generate
Generate a lens between two schemas
Usage: schema lens generate [OPTIONS] --protocol <PROTOCOL> <OLD> <NEW>
Arguments:
<OLD>
Path to the old/source schema
<NEW>
Path to the new/target schema
Options:
--protocol <PROTOCOL>
Protocol name
-v, --verbose
Enable verbose output
--json
Output as JSON
--chain
Output a reusable protolens chain (JSON to stdout)
--try-overlap
Try overlap-based alignment when direct morphism fails
--save <SAVE>
Save the generated protolens chain to a file
--defaults <DEFAULTS>
Default values as key=value pairs
--fuse
Fuse multi-step chain into single protolens
--requirements
Show complement requirements (defaults/data needed)
--hints <HINTS>
Path to a JSON hints file for guided auto-lens generation
--stringency <TIER>
Stringency tier governing which alignment strategies run.
Accepted case-insensitively for parity with the Python and WASM bindings, both of which trim and lowercase their input.
`strict`: only kind-exact name equality. `balanced`: alias dictionary + tight token similarity (default). `lenient`: span-search and structural priors. `exploratory`: lossy retraction witnesses.
Possible values:
- strict: Kind-exact, edge-name-pruned CSP search; total morphism only
- balanced: Adds alias dictionary and tight token-similarity priors (default)
- lenient: Adds span-search and structural priors
- exploratory: Adds lossy retraction witnesses
--top-n <N>
Emit up to N ranked candidate lenses instead of the single best one. Output format switches to a JSON array when combined with `--json` or `--chain`
[default: 1]
--explain
Print per-step explanations (and confidences) for each emitted candidate
-h, --help
Print help (see a summary with '-h')
schema lens apply
Apply a saved lens chain to data
Usage: schema lens apply [OPTIONS] --protocol <PROTOCOL> <CHAIN> <DATA>
Arguments:
<CHAIN> Path to the protolens chain JSON
<DATA> Path to the data file
Options:
--protocol <PROTOCOL> Protocol name
-v, --verbose Enable verbose output
--direction <DIRECTION> Direction: "forward" or "backward" [default: forward]
--complement <COMPLEMENT> Complement data for backward apply
--schema <SCHEMA> Schema for chain instantiation
-h, --help Print help
schema lens compose
Compose two protolens chains or schemas
Usage: schema lens compose [OPTIONS] --protocol <PROTOCOL> <CHAIN1> <CHAIN2>
Arguments:
<CHAIN1> First chain or schema file
<CHAIN2> Second chain or schema file
Options:
--protocol <PROTOCOL> Protocol name
-v, --verbose Enable verbose output
--json Output as JSON
--chain Output in chain format
-h, --help Print help
schema lens verify
Verify lens laws on test data
Usage: schema lens verify [OPTIONS] --protocol <PROTOCOL> <DATA> [SCHEMA]
Arguments:
<DATA> Path to test data file
[SCHEMA] Schema file (second schema is optional)
Options:
--protocol <PROTOCOL> Protocol name
-v, --verbose Enable verbose output
-h, --help Print help
schema lens inspect
Inspect a saved protolens chain
Usage: schema lens inspect [OPTIONS] --protocol <PROTOCOL> <CHAIN>
Arguments:
<CHAIN> Path to the protolens chain JSON
Options:
--protocol <PROTOCOL> Protocol name
-v, --verbose Enable verbose output
-h, --help Print help
schema lens check
Check applicability of a chain against schemas in a directory
Usage: schema lens check [OPTIONS] --protocol <PROTOCOL> <CHAIN> <SCHEMAS_DIR>
Arguments:
<CHAIN> Path to the protolens chain JSON
<SCHEMAS_DIR> Directory containing schema JSON files
Options:
--protocol <PROTOCOL> Protocol name
-v, --verbose Enable verbose output
--dry-run Report only, do not instantiate
-h, --help Print help
schema lens lift
Lift a chain along a theory morphism
Usage: schema lens lift [OPTIONS] <CHAIN> <MORPHISM>
Arguments:
<CHAIN> Path to the protolens chain JSON
<MORPHISM> Path to the theory morphism JSON
Options:
--json Output as JSON
-v, --verbose Enable verbose output
-h, --help Print help
schema parse
Parse source files into full-AST schemas via tree-sitter
Usage: schema parse [OPTIONS] <COMMAND>
Commands:
file Parse a single source file into a full-AST schema
project Parse all files in a directory into a unified project schema
emit Parse a file and emit it back to source (round-trip test)
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema parse file
Parse a single source file into a full-AST schema
Usage: schema parse file [OPTIONS] <PATH>
Arguments:
<PATH> Path to the source file
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema parse project
Parse all files in a directory into a unified project schema
Usage: schema parse project [OPTIONS] [PATH]
Arguments:
[PATH] Path to the project directory [default: .]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema parse emit
Parse a file and emit it back to source (round-trip test)
Usage: schema parse emit [OPTIONS] <PATH>
Arguments:
<PATH> Path to the source file
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema git
Import/export between git repositories and panproto-vcs
Usage: schema git [OPTIONS] <COMMAND>
Commands:
import Import a git repository's history into panproto-vcs
export Export panproto-vcs history to a git repository
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema git import
Import a git repository's history into panproto-vcs
Usage: schema git import [OPTIONS] <REPO> [REVSPEC]
Arguments:
<REPO> Path to the git repository
[REVSPEC] Git revspec (e.g. "HEAD", "main", "HEAD~10..HEAD") [default: HEAD]
Options:
-v, --verbose Enable verbose output
-h, --help Print help
schema git export
Export panproto-vcs history to a git repository
Usage: schema git export [OPTIONS] <DEST>
Arguments:
<DEST> Destination path for the git repository
Options:
--repo <REPO> Path to the panproto repository (default: current directory) [default: .]
-v, --verbose Enable verbose output
-h, --help Print help
Rust SDK reference
The Rust surface of panproto is the panproto-core facade. Add it to your Cargo.toml:
[dependencies]
panproto-core = "0.49"
Full type signatures, constructors, and method documentation live on docs.rs:
Feature flags
| Feature | Effect |
|---|---|
full-parse | Pulls in panproto-parse and tree-sitter-based AST parsing. |
project | Pulls in panproto-project for multi-file project assembly. |
git | Pulls in panproto-git for the git bridge. |
llvm | Enables LLVM-backed lowering via panproto-llvm. |
jit | Enables JIT-compiled migration via panproto-jit. |
tree-sitter | Enables tree-sitter-based format-preserving parsing for built-in protocols (forwards to panproto-io/tree-sitter). |
The default feature set re-exports the always-on crates: panproto-gat, panproto-schema, panproto-inst, panproto-mig, panproto-lens, panproto-check, panproto-protocols, panproto-io, and panproto-vcs. The feature flags above pull in the optional crates on top.
Sub-crate lookup
For lower-level work, depend on individual crates rather than the facade. The crate map lists every workspace member with a one-line description and a link to its docs.rs page.
| Task | Crate |
|---|---|
| Define a GAT in Rust | panproto-gat, panproto-gat-macros |
| Validate a schema against a protocol | panproto-schema, panproto-protocols |
| Parse data and produce an instance | panproto-io, panproto-inst |
| Build and apply a migration | panproto-mig |
| Construct or compose lenses | panproto-lens |
| Write lenses in the lens DSL | panproto-lens-dsl |
| Use the expression language | panproto-expr, panproto-expr-parser |
| Version-control schemas and data | panproto-vcs, panproto-git |
| Parse full ASTs across 261 languages | panproto-parse |
| Decorate an abstract schema with a layout fibre | panproto-parse (ParserRegistry::decorate, ParserRegistry::pretty_with_protocol) |
| Distinguish abstract and decorated schemas at the type level | panproto-schema (AbstractSchema, DecoratedSchema, LayoutWitness) |
See also
- Install the Rust SDK for setup.
- Define a schema from Rust for the canonical entry point.
- Crate map for the complete workspace.
TypeScript SDK reference
The TypeScript SDK is published as @panproto/core. It wraps the WASM build of panproto with a fluent, handle-based API.
Installation
npm install @panproto/core
# or
pnpm add @panproto/core
# or
yarn add @panproto/core
Node 20+ is required. Browser builds are supported via the same package; a bundler with WASM support is needed.
Initialization
import { Panproto } from '@panproto/core';
const p = await Panproto.init();
Panproto.init() loads the WASM module, sets up the slab allocator behind the boundary, and returns the top-level handle. All subsequent calls go through it.
Surface
The SDK exposes (selected; see bindings/typescript/src/index.ts for the full list):
| Object / function | Purpose |
|---|---|
Panproto | Top-level handle. init(), protocol(name), migration(src, tgt), parseJson(schema, json), toJson(schema, instance), convert(data, { from, to, defaults? }), compose(m1, m2), composeLenses(l1, l2), checkExistence(src, tgt, builder), diffFull(old, new). |
Protocol | A loaded protocol. .name, .schema() returns a SchemaBuilder. |
SchemaBuilder | Fluent builder. .vertex(name, kind), .edge(src, tgt, kind, opts), .build() returns BuiltSchema. |
BuiltSchema | A built schema. .vertices, .edges, .protocol. |
Instance | A parsed data record. .toJson(), .validate(). |
MigrationBuilder | Builder. .map(srcVertex, tgtVertex), .mapEdge(srcEdge, tgtEdge), .resolve(...), .compile() returns CompiledMigration. |
CompiledMigration | A migration that is a lens. .lift(record) returns LiftResult { data, _rawBytes? }; .get(record) returns GetResult { view, complement }; .put(view, complement) returns LiftResult. |
LensHandle | A free-standing protolens chain. .get(bytes), .put(view, complement), .checkLaws(instance) returns LawCheckResult { holds, violation }; .checkGetPut, .checkPutGet for individual laws; .toJson(). |
FullDiffReport / CompatReport | Returned by Panproto.diffFull(old, new). Call .classify(protocol) on the diff to get a CompatReport with classification of fully-compatible / backward-compatible / breaking. |
executeQuery(query, instance, wasm) | Standalone query function. The query is InstanceQuery { anchor, predicate?, projection?, path?, groupBy?, limit? }; the predicate is an Expr object, not a source string. |
parseExpr, evalExpr, formatExpr, ExprBuilder | Expression-language entry points. |
IoRegistry | Multi-protocol parse/emit registry. |
Repository | panproto-vcs repository handle. |
DataSetHandle | Data-versioning handle. |
Full API documentation, including every method signature and parameter, lives in the TypeDoc output:
- TypeDoc reference for
@panproto/core(link to be wired up at publish time)
The package source is at bindings/typescript/.
Boundary characteristics
The SDK is a thin layer over panproto-wasm. Data crossing the boundary is encoded with MessagePack, and live values on the Rust side are referenced from JavaScript through opaque integer handles allocated from a slab. Object identity is preserved across calls, so storing handles in JavaScript data structures is safe.
See also
Python SDK reference
The Python SDK is published as panproto on PyPI. It uses native PyO3 bindings, not WASM.
Installation
pip install panproto
Python 3.13 or newer is required. The wheel ships with eleven core tree-sitter grammars (Python, JavaScript, TypeScript, Java, C#, C++, PHP, Bash, C, Go, Rust). Additional grammar packs are available as separately-installable companions:
| Pack | Languages |
|---|---|
panproto-grammars-functional | Haskell, OCaml, Elm, Gleam, Erlang, Elixir, PureScript, F#, Clojure, Scheme, Racket |
panproto-grammars-web | HTML, CSS, SCSS, Vue, Svelte, Astro, … |
panproto-grammars-systems | Zig, Nim, V, Crystal, … |
panproto-grammars-jvm | Scala, Kotlin, Groovy, … |
panproto-grammars-scripting | Lua, Perl, R, Julia, … |
panproto-grammars-data | SQL dialects, GraphQL, JSON variants, … |
panproto-grammars-devops | Terraform, Dockerfile, Helm, … |
panproto-grammars-mobile | Swift, Objective-C, Dart, … |
panproto-grammars-music | SuperCollider, LilyPond, ABC, Csound, ChucK, Glicol, Tidal, Strudel |
panproto-grammars-all | Umbrella package containing every pack above. |
Each pack auto-registers its grammars with panproto.AstParserRegistry() on import.
Top-level surface
import panproto
proto = panproto.get_builtin_protocol("atproto")
b = proto.schema()
b.vertex("post", "record", "app.bsky.feed.post")
b.vertex("post:body", "object")
b.edge("post", "post:body", "record-schema")
schema = b.build()
There is no Panproto umbrella class; the entry points are free functions on the panproto module. The full re-export list (errors, schema types, protocol registry, migration, check, instance, I/O, lens, GAT, expression-language, VCS, parse, project, git bridge) is in bindings/python/src/panproto/__init__.py. Selected entry points:
| Surface | Entry point |
|---|---|
| Protocol registry | get_builtin_protocol(name), list_builtin_protocols(), define_protocol(...) |
| Schema construction | Protocol.schema() returns a SchemaBuilder. Each .vertex(id, kind) / .edge(src, tgt, kind, name=None) / .constraint(vid, sort, value) mutates the builder in place and returns None; call .build() on the final builder. Chain syntax is a TypeScript-only convenience; Python is statement-by-statement. |
| Migration | MigrationBuilder, compile_migration, compose_migrations, invert_migration, pipeline |
| Check | diff_and_classify, diff_schemas, check_existence, check_coverage |
| Lens | Lens, ProtolensChain (with from_dsl_json / from_dsl_yaml / from_dsl_nickel / from_dsl_path loaders), auto_generate_lens, auto_generate_lens_candidates |
| GAT | Theory (with from_json / from_yaml / from_nickel / from_path DSL loaders and to_yaml / from_dict_yaml for the flat-shape round-trip), TheoryBuilder, Model, colimit_theories, free_model, migrate_model, check_model, check_morphism |
| Schema | Schema.constraints_for(vertex_id) lists every constraint; Schema.field_text(vertex_id, field_name) reads the text of a tree-sitter field('<name>', anonymous-token) child |
| Expression language | Expr, parse_expr, pretty_print_expr |
| VCS | Repository, VcsRepository, BisectState |
| Parse | parse_source_file, available_grammars, ParseEmitLens, AstParserRegistry() (with .override_grammar(name, extensions, language_ptr, node_types, ...) for dev-time grammar swapping) |
| Project | ProjectBuilder, parse_project, build_project |
Full API reference, including every method signature, lives at the dedicated mkdocs site:
- Python SDK reference (mkdocs)
The package source is at bindings/python/.
Type stubs
The package ships _native.pyi so that signatures are visible to type checkers (mypy, pyright). Stub signatures are kept in lockstep with the PyO3 runtime by CI.
See also
Protocol catalogue
A protocol in panproto is a schema language: Avro, CDDL, OpenAPI, ATProto Lexicons, Parquet, FHIR, a Kubernetes CRD. Each one is defined by a pair of GATs (a schema theory and an instance theory) composed by colimit from reusable building-block theories. Every protocol provides both a parser (native format → Schema) and an emitter (Schema → native format), so panproto can round-trip data through any pair.
For the model behind these registrations, see Schemas as theories and Composing protocols by colimit.
Categories
The built-in protocols are organised by category in panproto-protocols. Each category is a Rust submodule.
| Category | Module | Protocols |
|---|---|---|
| Serialization and IDLs | serialization | Avro, FlatBuffers, ASN.1, Bond, MessagePack Schema |
| Data schema | data_schema | CDDL, BSON |
| API specifications | api | OpenAPI, AsyncAPI, RAML, JSON:API |
| Database | database | MongoDB, Cassandra, DynamoDB, Neo4j, Redis |
| Web and document | web_document | ATProto Lexicons, DOCX, ODF |
| Data science | data_science | Parquet, Arrow, DataFrame schemas |
| Domain | domain | GeoJSON, FHIR, RSS/Atom, vCard/iCal, EDI X12, SWIFT MT |
| Configuration | config | Kubernetes CRDs, CloudFormation, Ansible |
| Linguistic annotation | annotation | AMR, bead, BRAT, Concrete, CoNLL-U, Decomp/UDS, ELAN, FoLiA, FOVEA, ISO-Space, LAF/GrAF, NAF, NIF, PAULA/Salt, TEI XML, TimeML, UCCA, UIMA/CAS, W3C Web Annotation |
| Raw file | raw_file | Non-code files (README, LICENSE, images) |
The authoritative list is in the panproto-protocols source tree. Each submodule’s register_* function documents the building-block theories it composes.
Registration shape
A protocol registration is a sequence of theory colimits applied in a determined order. For example, the constrained-multigraph-with-W-types theory used by MessagePack Schema (and several other protocols) is built as colimit(colimit(ThGraph, ThConstraint; Vertex), ThMulti; Vertex, Edge), with ThWType as the instance theory. If any colimit step fails, registration panics with a message naming the failing intermediate step. This is intentional: a registration failure is a build-time bug in the theory composition, not user input that can fail at runtime.
Source-of-truth
| Format | Source |
|---|---|
| Built-in protocol list | crates/panproto-protocols/src/lib.rs |
| Building-block theories | crates/panproto-protocols/src/theories.rs |
| Tree-sitter grammar list (261 languages) | crates/panproto-grammars/ |
Defining a new protocol
To add a custom protocol, see Build a custom protocol. The minimal recipe is: declare schema and instance GATs (Rust or via the theory DSL), register a parser and emitter, and add a registration call to the relevant submodule.
Source-code grammars and emit verification
In addition to the schema-language protocols catalogued above, panproto ships 261 tree-sitter grammars under crates/panproto-grammars/. Each grammar registers a tree-sitter Language plus its node-types.json AST signature; the resulting parser walks source code into a full-AST schema, and emit_pretty renders the schema back to bytes via the structural pipeline described in Source-code emission.
The emitter’s correctness varies by grammar; ParserRegistry::emit_verification_status reports which tier each protocol falls into:
| Tier | Meaning | Currently |
|---|---|---|
Verified | Every entry of the grammar author’s own test/corpus/ round-trips under the strict emit_corpus_audit oracle (byte fixed point plus vertex-kind and edge-shape multiset preservation), or the protocol is pinned by a quivers backend test | the 255 names in VERIFIED_EMIT_PROTOCOLS |
Generic | Registered grammar; emit uses the generic dispatch + universal cassette path; no test asserts correctness | the remaining vendored grammars not yet in the verified set |
Unsupported | No grammar.json vendored, or protocol not registered | grammars whose upstream did not ship grammar.json |
Downstream tooling (notably quivers’s transpile backends) should call this API upfront and refuse emit on protocols that return Generic or Unsupported. The full list of verified protocols is maintained as VERIFIED_EMIT_PROTOCOLS in crates/panproto-parse/src/registry.rs.
See also
- Source-code emission
- Schemas as theories
- Composing protocols by colimit
- Theory DSL: denotational semantics
Expression-language reference
The panproto-expr language is pure, total, and bounded: every well-typed expression terminates within a fixed step budget, no IO is permitted, and evaluation is deterministic. It is used inside migrations to describe field-level transforms, and inside queries to describe predicates and projections.
For the model behind the language, see Expression language: denotational semantics.
Surface syntax
The Haskell-style surface syntax supports literals, variables, lambdas, application, let, pattern matching, and list comprehensions. The full grammar lives in panproto-expr-parser. Inside JSON or YAML migration files, expressions appear as strings; the parser is invoked on read.
Types
| Type | Description |
|---|---|
Int | 64-bit signed integer with checked arithmetic. Overflow is an error, not a wrap. |
Float | IEEE 754 64-bit float. |
Str | UTF-8 string. |
Bool | Boolean. |
List a | Heterogeneous list (the type parameter is the element constraint when known). |
Record | Map from string keys to values. |
Null | Singleton type for the absence of a value. |
Any | Top type used in builtins that accept any input. |
Builtins
The 59 built-in operations are in panproto_expr::BuiltinOp, grouped by family in crates/panproto-expr/src/builtin.rs.
Arithmetic
Add, Sub, Mul, Div, Mod, Neg, Abs, Floor, Ceil, Round. Div and Mod raise DivisionByZero on a zero divisor. Integer arithmetic uses checked operations; overflow raises Overflow.
Comparison
Eq, Neq, Lt, Lte, Gt, Gte. Returns Bool.
Boolean
And, Or, Not. Short-circuit evaluation.
String
Concat, Len, Slice, Upper, Lower, Trim, Split, Join, Replace, Contains.
List
Map, Filter, Fold, FlatMap, Append, Head, Tail, Reverse, Length. Head and Tail raise on the empty list.
Record
MergeRecords, Keys, Values, HasField.
Utility
DefaultVal, Clamp, TruncateStr.
Type coercion
IntToFloat, FloatToInt, IntToStr, FloatToStr, StrToInt, StrToFloat. The *ToInt and *ToFloat parses raise ParseError on invalid input.
Type inspection
TypeOf, IsNull, IsList.
Graph traversal (instance-aware)
Edge, Children, HasEdge, EdgeCount, Anchor. These return Null in the standard evaluator. To use them, evaluate in an instance environment via panproto_inst::instance_env.
Errors
The full list of ExprError variants is in crates/panproto-expr/src/error.rs.
| Variant | Cause |
|---|---|
StepLimitExceeded(u64) | Evaluation exceeded the configured step budget (EvalConfig::max_steps, default 100,000). |
DepthExceeded(u32) | Evaluation exceeded the configured recursion depth. |
UnboundVariable(String) | Referenced a variable not bound in the environment. |
TypeError { expected, got } | Operand had an incompatible type. |
ArityMismatch { op, expected, got } | Builtin called with the wrong number of arguments. |
IndexOutOfBounds { index, len } | List index out of range. |
FieldNotFound(String) | Record field missing. |
NonExhaustiveMatch | No pattern arm matched the scrutinee. |
DivisionByZero | Div or Mod with a zero divisor. |
ListLengthExceeded(usize) | List operation exceeded the configured maximum length. |
ParseError { value, target_type } | A *ToInt / *ToFloat coercion failed to parse its input. |
NotAFunction | Application of a non-function value. |
Overflow | Checked integer arithmetic overflowed. |
FloatNotRepresentable(String) | Float value (NaN, infinity, out-of-range) cannot be represented as an integer. |
Authoritative source
Builtin enum: crates/panproto-expr/src/expr.rs. Implementation: crates/panproto-expr/src/builtin.rs. Evaluator: crates/panproto-expr/src/eval.rs.
See also
- Apply field transforms for using the language inside migrations.
- Query instances for using it inside queries.
- Expression language: denotational semantics for the formal model.
Lens combinator reference
A lens in panproto is a triple of functions over a source S, a view V, and a complement C:
get : S -> V
put : S × V × C -> S
complement : S -> C
Every constructor in panproto-lens produces a lens whose round-trip laws (GetPut, PutGet, PutPut) are verified by the property tests in panproto-lens/src/laws.rs. The complement carries the data discarded by get so that put can restore it.
For the model behind these combinators, see Lenses and round-trip laws and Lens DSL: denotational semantics.
Optic kinds
Each protolens or protolens chain is classified into an OpticKind. The classification is structural: it follows from the shape of the underlying TheoryTransform, which in turn reflects the schema edge a combinator is applied at.
OpticKind | Typical schema edge | Complement |
|---|---|---|
Iso | bijective relabeling (e.g. edge rename) | Unit |
Lens | prop (single-value projection) | dropped data |
Prism | sum / variant injection | variant tag |
Affine | Lens composed with Prism | (variant tag, dropped data) |
Traversal | item (collection, multi-focus) | per-position complements |
Composition follows the standard optics lattice: Iso is the identity, Traversal is absorbing, Lens + Prism collapses to Affine. The constructor functions for these kinds live in protolens::elementary (atomic steps: add_edge, drop_edge, rename_edge_name, …) and protolens::combinators (composite builders: pipeline, map_items, …). Browse the panproto_lens module index for full signatures.
Combinator families
| Family | Module | Purpose |
|---|---|---|
| Asymmetric lenses | asymmetric | Classical S → V transforms with put. |
| Symmetric lenses | symmetric | A ↔ B transforms with shared complement. |
| Composition | compose | Sequential and parallel composition of lenses. |
| Optics | optic | OpticKind enum (Iso, Lens, Prism, Affine, Traversal) and the composition lattice. |
| Fibration | fibration | The Grothendieck fibration of lenses over schemas. |
| Protolens | protolens | Schema-parameterized lens families with vertical and sequential composition. |
| Enrichment registry | enrichment_registry | Cross-crate LayoutEnricher trait and registry for schema-level enrichment fibres (e.g. the parse / decorate / emit lens; see Layout enrichment). |
| Laws | laws | Property-test harness for the three lens laws. |
Complement composition
Complement::compose is a partial commutative monoid:
- It returns
Result<Complement, LensError>. - It refuses to merge complements whose source-schema fingerprints disagree (
ComplementFingerprintMismatch). - It refuses to merge complements that disagree on a key (
ComplementConflict).
A pre-flight check is available as Complement::is_compatible. The full discussion is in Lenses and round-trip laws.
Protolens composition
Protolenses are natural transformations between schema endofunctors. protolens_composable requires structural equality of the intermediate endofunctor (same precondition, same transform) before vertical_compose will run; otherwise CompositionMismatch. Two instantiation modes are available:
| Mode | Function | Use |
|---|---|---|
| Fused | instantiate | Single morphism preserving migration metadata. Default for production. |
| Sequential | instantiate_sequential | Step-by-step folding through intermediate schemas. Used by property tests to inspect each step. |
Both satisfy the lens laws.
See also
- Use lenses, Use protolenses, Use dependent optics.
- Write lenses in the lens DSL.
- Lens DSL: denotational semantics.
- Protolens composition.
Configuration reference
The project manifest is panproto.toml at the project root. It is read by panproto-project::load_config and consumed by every CLI subcommand that operates on a project rather than a single file.
Top-level shape
[workspace]
name = "my-project"
exclude = ["build/**", "**/.cache"]
[[package]]
name = "user-api"
path = "schemas/user"
protocol = "json-schema"
[[package]]
name = "user-events"
path = "schemas/events"
protocol = "avro"
Sections
[workspace]
| Field | Type | Required | Description |
|---|---|---|---|
name | string | yes | Workspace identifier; used for VCS commits and emitted artifacts. |
exclude | list of glob patterns | no | Globs to exclude from project assembly. Evaluated relative to the manifest directory. |
[[package]]
One entry per schema package. Packages are assembled into a single project by schema coproduct.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | yes | Package identifier, unique within the workspace. |
path | path | yes | Directory containing the package’s schemas, relative to the manifest. |
protocol | string | no | Protocol name (json-schema, atproto, protobuf, …). When omitted, the protocol is inferred from file extensions and contents. |
Generating a manifest
schema init [PATH] initialises a panproto repository in the given directory (default .). The repository’s bookkeeping lives in .panproto/; the user-authored manifest is panproto.toml and is created by hand.
For programmatic creation, the panproto-project crate exposes generate_config and serialize_config.
Authoritative source
Manifest schema: crates/panproto-project/src/config.rs.
See also
- Define a schema from the CLI for the project-level workflow.
- Schema version control: init and commit.
Crate map
The panproto-* crates in the workspace, with one-line descriptions and dependency direction. The full dependency graph appears in explanation/architecture. Source for any crate lives at crates/<name>/ in the repository.
Core
| Crate | Description |
|---|---|
panproto-gat | Generalized algebraic theories: sorts, operations, equations, and the colimit machinery that composes them. |
panproto-gat-macros | class! and inductive! macros for declaring GATs in Rust source. |
panproto-schema | Schema representation: vertices, edges, the schema graph, validation. Hosts the AbstractSchema / DecoratedSchema typed-newtype distinction and the layout-fibre forgetful U (Schema::forget_layout). |
panproto-inst | Instances: data records over a schema, including Value::List and the typed value lattice. |
panproto-mig | Migration engine: morphisms between schema theories, restrict and lift. |
panproto-lens | Bidirectional lenses: get/put/complement, the three round-trip laws, fibration over schemas, optic kinds, protolenses, the cross-crate enrichment_registry for layout and other schema-level fibres. |
panproto-lens-dsl | Declarative lens DSL with Nickel, JSON, and YAML surfaces. |
panproto-theory-dsl | Declarative theory DSL for defining custom protocols. |
panproto-check | Breaking-change detection: classifies schema diffs as fully compatible, backward compatible, or breaking. |
panproto-protocols | Built-in protocol definitions composed via theory colimits. |
panproto-expr | Pure, total expression language: terms, types, environment evaluation. |
panproto-expr-parser | Haskell-style surface syntax parser for panproto-expr. |
I/O and parsing
| Crate | Description |
|---|---|
panproto-io | Parse/emit codecs that bridge native formats (JSON, Avro, Protobuf, …) to panproto-inst. |
panproto-parse | Tree-sitter full-AST parsing across 261 languages, with interstitial preservation. Hosts the put-direction (decorate) of the parse / emit lens and the LayoutEnricher adapter installed in panproto-lens’s enrichment registry. |
panproto-grammars | Pre-compiled tree-sitter grammars used by panproto-parse. |
panproto-grammars-all | Umbrella grammar companion pack for the Python wheel. |
panproto-grammars-functional | Functional-language grammar pack. |
panproto-grammars-web | Web-language grammar pack. |
panproto-grammars-systems | Systems-language grammar pack. |
panproto-grammars-jvm | JVM-language grammar pack. |
panproto-grammars-scripting | Scripting-language grammar pack. |
panproto-grammars-data | Data-language grammar pack. |
panproto-grammars-devops | DevOps-language grammar pack. |
panproto-grammars-mobile | Mobile-language grammar pack. |
panproto-grammars-music | Music-language grammar pack (SuperCollider, LilyPond, ABC, Csound, ChucK, Glicol, Tidal, Strudel). |
panproto-project | Multi-file project assembly via schema coproduct, manifest loading. |
Version control
| Crate | Description |
|---|---|
panproto-vcs | Schematic version control: DAG, refs, commit/branch/merge, pushout-based merge with universal-property verification, data versioning. |
panproto-git | Bidirectional bridge between panproto-vcs and git. |
panproto-git-remote | Custom git remote helper for cloning panproto repositories over git. |
panproto-xrpc | XRPC client for cospan-node VCS operations. |
Bindings and surfaces
| Crate | Description |
|---|---|
panproto-core | Public re-export facade. The single dependency a downstream Rust user needs. |
panproto-cli | The schema binary. |
panproto-wasm | WebAssembly bindings; consumed by the TypeScript SDK. |
panproto-py | Native Python bindings via PyO3. |
panproto-c | C ABI for non-Rust language bindings (Haskell first). |
panproto-repl | REPL engine for theories, terms, and morphisms. |
Acceleration
| Crate | Description |
|---|---|
panproto-jit | LLVM JIT compilation for accelerated migrations. |
panproto-llvm | LLVM IR protocol definition and lowering. |
Repository tasks
| Crate | Description |
|---|---|
xtask | Repository tasks (not published). Includes gen-cli-docs. |
See also
- Architecture for the dependency direction and layering.
- The repository
Cargo.tomlfor authoritative workspace membership.
Explanation
Explanation pages cover why the system is the way it is. They are essays, not recipes. Each one is built in two tiers:
- An In plain terms opener (3-5 sentences) that names the concept and gives the working-developer analogue. You can read the entire quadrant at this level and come away with a working mental model.
- A formal section that names the underlying construction and points into the reference and semantics pages where it is precisely defined.
The pages here, in increasing order of formality:
| Page | Tier |
|---|---|
| What panproto solves | Plain |
| Schemas as theories | Plain, with one formal section |
| Migrations as morphisms | Plain, with one formal section |
| Lenses and round-trip laws | Plain, with the three laws stated formally |
| Layout enrichment | Plain, with the section law stated formally |
| Composing protocols by colimit | Plain, with one formal section |
| Schema version control semantics | Plain, with one formal section |
| What panproto verifies | Catalogue of mechanically-checked properties |
| Architecture | Crate dependency graph and the layering that holds the system together |
The denotational semantics cluster is separate and load-bearing. It is the place where the implementation is pinned to a precise mathematical specification: the expression language, the lens DSL, the theory DSL, the pushout-based merge, and protolens composition. Each of those pages still opens with a plain-terms section, but the body is dense.
What panproto solves
In plain terms
Whenever you change the shape of your data: add a column, rename a field, split one record into two, retire an enum value, switch from JSON Schema to Protobuf, you have to write migration code to bring existing data forward. That code is tedious, error-prone, and easy to get subtly wrong in ways that only surface in production.
panproto reads two versions of a schema, computes the difference, and produces the migration code for you. It also flags whether the change is backward-compatible or breaking, version-controls the schema history with git-style commands, and lets the same diff/migrate/version-control workflow run across schema languages: JSON Schema, Protobuf, GraphQL, ATProto Lexicons, SQL DDL, Avro, and 45 others.
The thing that makes this work uniformly across all those languages is that panproto treats every schema as an instance of a common structure. The shared structure is what the rest of the explanation quadrant unpacks.
The three concrete jobs
panproto exists to do three jobs that today are usually done by hand:
- Diff and classify schema changes. Given two versions of a schema, identify what was added, removed, renamed, or had its type changed; classify each change as fully compatible, backward compatible, or breaking; and surface the result in a way CI can gate on.
- Generate the migration code. Produce a bidirectional transform between the two versions: a function that lifts old records to the new shape, paired with a function that pushes new records back to the old shape without losing data. The pair is checked to satisfy round-trip laws, so the migration cannot silently corrupt data.
- Version-control schemas as first-class objects. Treat the schema history like a git history: commits, branches, merges, diffs, blame, tags. Merge conflicts at the schema level are resolved by a precise, well-defined operation rather than a manual three-way text merge.
What it does not solve
panproto does not write your application code, validate your runtime behaviour beyond what the schema describes, or replace your database. It is a layer between schema files and the data they describe. The output is migration code and a diff classification, not infrastructure.
See also
- Schemas as theories for the structure that lets the same workflow apply across schema languages.
- Migrations as morphisms for the model behind generated migrations.
- Schema version control semantics for the merge story.
Schemas as theories
In plain terms
A schema is a description of what shapes your data can take: which fields exist, which types they hold, which combinations are allowed. JSON Schema, Protobuf, GraphQL, SQL DDL, ATProto Lexicons all do this, each in their own syntax.
panproto treats a schema as an algebraic theory: a set of basic kinds of thing (records, fields, types, references), a set of operations on them (add a field, set a constraint, link two records), and a set of equations describing when two ways of building the same shape are equivalent. The theory is a precise mathematical object; an actual JSON Schema file or Protobuf .proto file is a presentation of one.
The reason this matters is that once you have schemas as theories, you can talk about morphisms between them with the same precision. A morphism is a structured map from one theory to another: a translation that respects every operation. Migrations between schema versions, translations between schema languages, and the merging of two divergent schema branches are all morphisms. The whole pipeline downstream of the schema treats them uniformly.
The model
A schema is an instance of a generalized algebraic theory (GAT). A GAT consists of:
- Sorts. The basic kinds of thing the theory talks about: vertices, edges, types, constraints.
- Operations. Constructors that produce sorts from other sorts:
vertexof a given kind,edgebetween two vertices,propedge with a name,itemedge for collection membership,variantedge for sum-typed alternatives. - Equations. Equalities between operation chains that the theory imposes: composition is associative; the identity edge is a no-op; certain constraints are equivalent rewrites of others.
Each protocol panproto recognises is a GAT plus a parser/emitter pair. JSON Schema is one such GAT; Protobuf is another. The shared building-block theories (graph, multigraph, constraint, W-type, …) are composed by colimit to produce each protocol’s specific theory. See Composing protocols by colimit.
A schema in a protocol is a model of that protocol’s theory: a concrete choice of vertices, edges, and constraint values that satisfies the theory’s equations. An instance is a model of the schema’s instance theory: actual data records that conform to the schema.
This separation, schema-theory versus instance-theory, runs through the entire system. Schemas have their own GAT; instances have their own GAT, parameterised over a schema; migrations are morphisms between schema theories; lift is the operation that takes records over the source schema and produces records over the target.
What is not modelled
- Performance characteristics (a schema with 10,000 vertices is a valid model just like one with 5).
- Wire-format minutiae beyond what the schema theory describes (whitespace, comment syntax, ordering conventions).
- Application-level semantic constraints not expressible in the protocol’s theory.
The first two are handled by the format-preserving codec layer. The third is by design: panproto is precise about exactly what it can guarantee.
Related work
The schemas-as-theories framing belongs to a lineage that runs from Cartmell’s generalised algebraic theories through Spivak’s functorial data model and the algebraic-databases programme of Schultz and Wisnesky to Patterson’s ACSets and Lu’s recent multi-model unification. panproto presents each protocol as its own GAT and takes a colimit in the category of GAT presentations; ACSets and the Spivak-Wisnesky tradition fix a single meta-theory and parameterise schemas inside it. See Related work for the full discussion.
See also
- Composing protocols by colimit for how protocols are built from shared building-block theories.
- Migrations as morphisms for what morphism between schemas means concretely.
- Theory DSL: denotational semantics for the formal model of a GAT presentation.
- The book’s bibliography references Cartmell (1986) for the original GAT formulation and Spivak & Wisnesky (2015) for the functorial-data-migration application.
Migrations as morphisms
In plain terms
A migration is what gets you from version 3 of your schema to version 4, with all your existing version-3 data carried forward into version-4 shape. Hand-written migrations are usually a script that walks each record, renames fields, fills in defaults, drops what is no longer there.
panproto represents a migration as a structured map between the two schemas: for every vertex (record kind) in the new schema, where does it come from in the old one; for every edge (field, item, variant), how is it derived. The structured map is called a morphism. Once you have a morphism, two operations follow:
- Restrict moves the morphism backwards: it tells you what part of the old schema is needed to produce a given part of the new one.
- Lift moves data forwards: it takes a record that conforms to the old schema and produces a record that conforms to the new one, using the morphism to know what to put where.
Lift is the operation you usually want; restrict is the operation panproto uses internally to figure out which old fields are required for the migration to succeed. Both are total functions on the things they apply to; if a migration cannot be lifted (because some required input is missing), panproto-check says so up front rather than failing partway through.
The formal picture
Schemas live in a category whose objects are schema theories and whose morphisms are theory morphisms. A migration from schema to schema is a morphism in this category. The migration engine is split into two functors:
- Restrict : pulls a -instance back to an -instance along . Used to check existence conditions: which -records does require to be present?
- Lift : pushes an -instance forward to a -instance along . Used to actually migrate data.
The pair forms an adjunction in the categories of instances: lift is left adjoint to restrict. (This follows the convention in Spivak & Wisnesky (2015), where is the dependent sum over the fibres of and is its right adjoint by base change.) The adjunction is the structure that lets panproto check, before any data moves, whether a migration is well-defined.
A migration may also include a value-level transform: not just where a field comes from, but how its value is computed. These are written in the expression language and applied during lift.
Three classes of migration
| Class | Diff classification | Example | Lift behaviour |
|---|---|---|---|
| Fully compatible | Refinement | Add an optional field with a default. | Total; old records lift unchanged. |
| Backward compatible | Inclusion | Add a required field whose value is computed from existing fields. | Total; old records lift via the value-level transform. |
| Breaking | Neither | Remove a required field with no recovery. | Partial; some old records cannot be lifted. panproto-check flags this. |
panproto-check runs the existence check on a migration without applying it, so CI can gate on the classification before merge. See Breaking-change gate.
What this gives you
- You write the migration map (what goes where), not the migration script (how to execute it). The script is generated from the map and the schema theories.
- You get a precise classification of whether the change is breaking, before any data is touched.
- You get a bidirectional artifact: the lift forward is paired with a put backward, so the migration is also a lens. Round-trip laws apply.
- Migrations compose. If and , then is a migration whose lift agrees with applying ’s lift then ’s. The migration history of a schema is a chain of these compositions.
See also
- Lenses and round-trip laws for the bidirectional half.
- Apply field transforms for value-level transforms.
- What panproto verifies for the existence-check guarantee.
- Spivak & Wisnesky (2015) for the categorical data-migration framework this builds on.
Lenses and round-trip laws
In plain terms
A lens is a pair of functions for moving between two shapes of data: one for going forward (often called get), one for going backward (often called put). The going-backward function is what makes a lens a lens rather than a one-way transform. It lets you take an edited new-shape record, push the edit back into the old-shape record, and recover whatever data the new shape did not have room for.
The data the new shape does not have room for has to live somewhere during the round trip. In a panproto lens, it lives in a third value called the complement. You can think of it as a sidecar that holds whatever get discarded, so that put can put it back.
A lens is lawful when it satisfies three round-trip identities. Roughly: getting then putting unchanged data is a no-op; putting then getting recovers exactly what you put in; and putting twice in a row is the same as putting just the second value. panproto verifies these three laws by property-based testing on every lens combinator. A lens that fails any of them is rejected.
The reason lenses matter for panproto: every migration is a lens. The lift function is the get (forward) and the put function is the backward direction. Together they let you migrate data forward, edit the new data, and push it back to the old shape if you ever need to.
The triple
A lens between source S and view V with complement C is three functions:
get : S -> V
put : S × V × C -> S
complement : S -> C
The complement function records what get is about to throw away; the put function uses the complement to reconstruct the parts of S that V does not determine.
The three laws
For every lawful lens:
- GetPut. . Applying
getto extract a view, then putting that view back without changes, returns the original source. - PutGet. . Putting a new view in returns that view when you read it back.
- PutPut. . Two consecutive puts to the same complement collapse to the second put.
PutPut is the third law. It is checked by panproto_lens::laws::check_put_put against every lens combinator, with random perturbations of the second view generated to ensure the law holds across the full input space, not just at fixed points.
Complement composition
When two lenses are composed, their complements need to combine. Complement::compose is a partial commutative monoid:
- It returns
Result<Complement, LensError>. - Two complements compose only if they share the same source-schema fingerprint. Otherwise composition fails with
ComplementFingerprintMismatch. - For overlapping keys, the two complements must agree on the value. Disagreement fails with
ComplementConflict.
This is the part of the lens machinery that prevents silent data loss. Earlier versions of panproto merged complements with a “first writer wins” rule; that rule could swallow disagreements between two paths through a lens diagram. The partial-monoid rule makes any such disagreement a hard error. Pre-flight check: Complement::is_compatible.
Where lenses come from
You almost never write a lens from scratch. They are produced by:
- Migration compilation. Every migration morphism compiles to a lens whose
getis lift and whoseputis the backward transform. - The lens DSL (panproto-lens-dsl). A declarative spec in Nickel, JSON, or YAML compiles to the lens combinator algebra.
- Protolenses (panproto-lens::protolens). Schema-parameterized lens families whose instantiations cover whole fleets of related schemas at once.
Schema-level lenses: layout as an enrichment
There is a second flavour of lens that lives one level up. parse / emit is not a WInstance lens; it is a relation between bytes and schemas, with parse recording the source layout into a fibre of constraints over each vertex and emit_pretty consuming that fibre to render bytes back. The relation has the shape of a lens at the schema level: stripping the layout fibre is the get, attaching it via a grammar walk is the put.
panproto names this construction explicitly. The EnrichmentKind::Layout tag in panproto-gat classifies the constraint sorts that make up the fibre. TheoryTransform::StripEnrichment and TheoryTransform::AddEnrichment are the two directions at the protolens level; their schema-level interpretation runs in apply_theory_transform_to_schema (strip drops the fibre constraints; add dispatches to a registered synthesis driver). The ComplementConstructor::Enrichment variant names the fibre and the driver in the complement vocabulary.
The schema-level lens does not plug into the WInstance-level get / put pair the way an elementary protolens does. The byte-level operational entry points live in panproto-parse: ParserRegistry::decorate for the put direction, ParserRegistry::parse_with_protocol for the get. The protolens captures the schema-level relationship those byte-level operations sit over; it composes with elementary protolenses for chain-law reasoning but is not the implementation of decorate or parse. See Layout enrichment for the full treatment.
Related work
The asymmetric get/put/create triple is from Foster et al. (2007), with totality and well-behavedness laws stated there. The first-class complement comes from the symmetric and edit-lens line of Hofmann et al. (2011) and Hofmann et al. (2012), with the edit-lens module structure giving the partial-monoid merge that this chapter relies on. The dispatch from edge kind to optic kind (Lens, Prism, Affine, Traversal) is licensed by the profunctor-optics theorem of Pickering et al. (2017), restated categorically by Clarke et al. (2024). The migration-as-lens-graph idiom is from Litt et al. (2020). See Related work for the full discussion.
See also
- Lens DSL: denotational semantics for the formal lens model and the law statements.
- Protolens composition for vertical and sequential composition.
- Layout enrichment for the schema-level lens between abstract and decorated schemas.
- Lens combinator reference for the algebra.
Layout enrichment
In plain terms
When panproto parses source code, it does two things at once. It records the structure of the program (which nodes are children of which, what kinds they are) and it records the layout of the source (where each token started in the byte stream, what whitespace sat between adjacent tokens, which alternative the parser took at each branch point). The structure is what most downstream operations care about. The layout is what the emitter needs to render bytes back.
The two parts live together in one schema, but they are separable: you can strip the layout and have a perfectly good abstract description of the program, or you can keep it and round-trip back to source bytes. decorate is the operation that takes the abstract description and rebuilds the layout on top.
The reason this matters: any tool that wants to generate source code from a panproto schema (a code refactorer, a reverse bridge from a domain model to a target language, a migration that materialises a new file) needs to produce a schema with the layout fibre attached. Before decorate, generators had to manually populate the layout constraints: disguised string concatenation, brittle to grammar revisions. With decorate, the operation is mechanical: write the abstract content, hand it to a grammar, get a schema the emitter can render byte-for-byte.
This page is the categorical framing: the parse / emit pair as a Grothendieck-style fibration, the section law that ties decorate and forget_layout together, and the cross-crate registry that makes the parse-side machinery available to the lens layer. For the operational mechanics of the emitter itself (how it derives spacing, dispatches CHOICE alternatives, and resolves external tokens through the cassette system), see Source-code emission.
The forgetful U and its section
Take a schema S produced by parsing some source bytes. Every vertex carries some constraints. Split them into two groups: the layout constraints (start-byte, end-byte, interstitial-N, chose-alt-fingerprint, chose-alt-child-kinds) and everything else (vertex kind, edges, literal-value, anonymous-token field:*, plus any protocol-defined sorts). The first group is parser-only metadata; the second is the abstract content of the program.
Stripping the first group is a function:
forget_layout : DecoratedSchema → AbstractSchema
Going the other way is harder. Given just the abstract content, you have to choose whitespace, choose which CHOICE alternative to dispatch through, and synthesise the byte spans. decorate is one canonical choice:
decorate : AbstractSchema × LayoutPolicy → DecoratedSchema
The two satisfy the section law:
where ≅_kind means equal up to vertex-id renaming and the kind / edge multiset.1 You can think of decorate as a one-sided inverse of forget_layout, picking a canonical representative of the parse-preimage at every abstract schema.
The pair (forget_layout, decorate) is the schema-level version of the parse / emit pair. parsing is bytes → DecoratedSchema; emitting is DecoratedSchema → bytes. Composing both gives the round-trip:
The image of pretty lands in the parse-preimage of its input modulo the same kind / edge multiset equivalence. This is the round-trip law that ships with the existing ParseEmitLens machinery, lifted to the typed-newtype level.
Layout as a Grothendieck-style enrichment
Across the broader panproto type system, layout is one of several enrichments a schema can carry over its abstract base. The framing is the same one panproto already uses for coercions, with a Grothendieck fibration over the abstract schema and the layout data living in the fibre.
The base of the fibration is the abstract schema. The fibre over each vertex is the layout data that vertex carries. The total space is the decorated schema. forget_layout is the cartesian projection back to the base. decorate is a section of that projection, picking out one chosen layout-data assignment for each vertex.
The EnrichmentKind enum in panproto-gat is the classifying tag. Today there is one variant, Layout. The infrastructure is shaped so other enrichments (provenance witnesses, refinement evidence, anything else that extends the schema without changing its underlying GAT theory) can slot in alongside without re-engineering the lens framework.
Two pieces of machinery name this fibration directly:
TheoryTransform::StripEnrichment(kind)andTheoryTransform::AddEnrichment { kind, enricher, policy }describe the two directions at the protolens level. Their schema-level effect is to remove or attach the fibre constraints; their theory-level effect is identity (the underlying GAT is the same in both source and target).ComplementConstructor::Enrichment { kind, enricher }names a registered synthesis driver in the complement vocabulary. This is metadata for the lens framework’s chain-law reasoning; the driver itself lives behind theLayoutEnrichertrait inpanproto-lens::enrichment_registry, populated bypanproto-parseatParserRegistry::newtime so the lens crate stays grammar-agnostic.
The cross-crate registration mechanism
The schema-level synthesis for Layout needs a grammar walker, and grammar walkers live in panproto-parse. The lens crate cannot depend on the parse crate (the dependency arrow would invert), so the bridge is a thin global registry of LayoutEnricher implementations keyed by (EnrichmentKind, enricher_name).
When ParserRegistry::register accepts a parser, it installs that parser into the global registry under its protocol name. A subsequent Protolens::instantiate against a parse_emit_protolens("lilypond", policy) dispatches the AddEnrichment { kind: Layout, enricher: "lilypond", .. } arm of apply_theory_transform_to_schema to the registered driver. The driver runs emit_pretty + parse against the input and returns the result.
The registry is process-global, single-keyed, and tolerant of re-registration (the last writer for a (kind, name) pair wins). Lock poisoning is recovered transparently because the critical sections do not invoke user code.
What the protolens does and does not do
parse_emit_protolens(grammar, policy) returns a Protolens whose source endofunctor strips the layout fibre, whose target endofunctor adds it back via the registered driver, and whose complement constructor names the fibre. It composes with every other protolens in panproto-lens (vertical and horizontal composition both work) and contributes to chain-law reasoning.
What it does not do is plug into the WInstance-level get / put pair the way an elementary protolens does. The lens framework’s Complement struct holds WInstance-level discarded data (dropped nodes, dropped arcs, contraction choices). It has no field for a per-vertex constraint fibre, and Protolens::instantiate produces a Lens whose source and target schemas differ in vertex IDs because the synthesis driver invents fresh ones. The schema-level shuffling happens in apply_theory_transform_to_schema; the operational byte-level entry points are on ParserRegistry.
That asymmetry is intentional. Parse and emit are conversions between bytes and schema-typed data structures; they are not WInstance lenses. The protolens captures the schema-level relationship that those byte-level operations sit over, and the operational API is where the byte-level work happens.
Related work
Two threads bear directly on the parse-emit fibration. The lens-based ancestry runs from Foster et al. (2007) through Bohannon et al. (2008)’s resourceful lenses (dictionary skeletons, quasi-obliviousness) and Foster et al. (2008)’s quotient lenses (lenses modulo equivalences, canonizers via lquot and rquot), with Lutterkort (2008) as the closest framework-level analogue. The grammar-based ancestry is Zhu et al. (2015)’s BiYacc, whose reflective printer takes both the AST and the original concrete string. de & Visser (2012)’s token-stream-and-origin-tracking algorithm is the closest match to the byte-level reconstruction strategy panproto uses inside the layout enrichment driver. See Related work for the full discussion.
See also
- Source-code emission for the operational mechanics of
emit_pretty: token-role classification, CHOICE dispatch tiers, cassettes, and the verification tier API. - Decorate an abstract schema for the operational recipe.
- Lenses and round-trip laws for the lens machinery the enrichment rides on.
- Architecture for the crate-level dependency direction that makes the cross-crate registry necessary.
-
The kind / edge multiset granularity, rather than pointwise vertex-id equality, is the standard granularity throughout panproto’s round-trip law machinery. The parse walker invents fresh vertex IDs at every call; insisting on pointwise identity would make even
parse ∘ emit_pretty ∘ parseill-typed. The multiset is the natural invariant: vertex kinds and edge-shape signatures are preserved by every layer of the parse / emit pipeline, and the section law is stated at the same level. ↩
Source-code emission
In plain terms
panproto reads source code in 261 languages by sitting on top of tree-sitter, which has a community-maintained grammar for each one. The reverse direction, writing source code back out from a schema, is harder. tree-sitter grammars are written for parsing; they record the production rules the parser uses but not the spacing, indentation, or alternative-disambiguation choices an emitter needs. panproto’s source-code emitter, emit_pretty, closes that gap by deriving emission behaviour structurally from the same grammar.json the parser already ships.
The output of every emit is a function of three things: the schema, the production grammar, and a small per-language cassette that supplies defaults for the cases grammar.json cannot describe. Everything else (token roles, bracket pairs, spacing rules, indent triggers, CHOICE dispatch) is derived at construction time from the grammar’s own structure. There is no per-language emitter code.
A separate format-preserving codec covers structured-data formats (JSON, YAML, TOML, XML, CSV) and guarantees byte-for-byte round-trip. The two systems are independent; this page is about the tree-sitter source-code path.
The two halves of the gap
grammar.json lists every rule (SEQ, CHOICE, REPEAT, OPTIONAL, FIELD, ALIAS, SYMBOL, STRING, PATTERN, IMMEDIATE_TOKEN, PREC, TOKEN, RESERVED) the parser walks. It does not record:
- Whitespace and layout. Whether
a + bora+b; whetherif (x) { ... }indents its body; whetherdef f(): x = 1is one line or two. Tree-sitter strips whitespace during parse; nothing ingrammar.jsonsays how to put it back. - CHOICE disambiguation for synthesised schemas. When two CHOICE alternatives both admit the same children (
binary_expression CHOICE[+, -, *, /, ...]over the same operand kinds), the parser picks via its parse-table state. A schema built from scratch (no parse history) has no record of which alt was taken.
emit_pretty handles the first half structurally and the second half via a layered fallback hierarchy.
Grammar-derived token roles
Every STRING literal in a rule body is classified by its structural role at Grammar::from_bytes time. There are eight roles, classified per (rule, position) pair, not per token text:
| Role | Identification rule |
|---|---|
BracketOpen | First STRING in a SEQ whose first / last STRINGs satisfy the matched-pair predicate |
BracketClose | Last STRING of the same SEQ |
Separator | First STRING in a REPEAT body’s inner SEQ |
Keyword | STRING matching [a-zA-Z_][a-zA-Z0-9_]* |
Operator | Non-alphanumeric STRING between content members inside a CHOICE alternative |
Connector | Non-alphanumeric STRING between content members in a standalone SEQ (a structural connector such as . or ::, not an algebraic operator); emits no surrounding space |
Terminal | Text from a leaf vertex’s literal-value constraint |
Immediate | A token the grammar wraps in IMMEDIATE_TOKEN, glued to its neighbour with no intervening whitespace |
The matched-pair predicate identifies (), [], {}, <>, begin/end, do/done, |...|, <<>>, ${}, ⟨⟩, and every other bracket-like construct from grammar structure alone. There is no fixed character set: a per-SEQ check that the first and last STRINGs are different (or same-text with at least one wrapped in IMMEDIATE_TOKEN, e.g. regex /.../), with non-STRING content between them, identifies the pair.
The role-pair lookup table (needs_space_by_role) takes two adjacent token roles and returns whether to emit a space. No token text is inspected by the layout pass; spacing decisions are entirely role-driven. The table is small (one fixed function) and the same for every grammar.
Indent triggers
A bracket-open token triggers indentation when the content between open and close contains REPEAT or REPEAT1 (recursively through CHOICE / SEQ wrappers). This is the structural witness of “block-level content the source author would have indented”. Function bodies, statement lists, struct fields, namespace contents all match; inline constructs like type parameter lists, function arguments, and string interpolation do not, even though they may use the same bracket pair characters in other rules.
Word-like bracket pairs (function/end, if/end, module/end, etc., the matched-pair shape with alphanumeric delimiters) always trigger indentation: they only appear in block constructs across the 261 vendored grammars.
CHOICE dispatch
At emit time, pick_choice_with_cursor selects which CHOICE alternative to walk for a given vertex. The pipeline composes several structural filters; each rejects alternatives that cannot be the right pick given the cursor’s unconsumed edges and the schema’s constraint witnesses.
The categorical core is the acceptance predicate, accepts_first_edge(prod, edge_field, target_kind), an inductive function over the production tree:
| Production | Acceptance rule |
|---|---|
STRING / PATTERN / BLANK | reject (consumes no edges) |
SYMBOL X (concrete) | edge_field == "child_of" AND target_kind ⊑ X |
SYMBOL X (hidden / supertype) | accepts(X.rule, edge) |
ALIAS{c, named:true, value:V} | edge_field == "child_of" AND target_kind == V |
FIELD{name, content} | edge_field == name AND content.yield admits target_kind |
SEQ[m1, m2, ...] | accepts(m1, edge) OR (m1 ε-able AND accepts(SEQ[m2..], edge)) |
CHOICE[a1, a2, ...] | any of accepts(ai, edge) |
OPTIONAL / REPEAT / REPEAT1 / wrappers | accepts(inner, edge) |
accepts_first_edge fuses four previously-separate ad-hoc checks (FIELD-name matching, SYMBOL subtype dispatch, ALIAS rewrite, yield-set admission) into one rule.
Three discriminators layer on top when more than one alternative passes accepts_first_edge:
- Token-set restriction. An alt’s FIELD body of shape
ALIAS{CHOICE[STRING_1, STRING_2, ...], value: V}constrains the field child’s literal to that set. If the cursor’s field-named edge target carries a literal outside the set, the alt is rejected. This is what disambiguates Gocall_expression(alt 0 hasfunction: ALIAS{CHOICE["new","make"], value:"identifier"}, valid only when the function name is literallynewormake). - Alias-source filter. An alt’s FIELD body of shape
ALIAS{SYMBOL X, named: true, value: _}requires the cursor’s field-named edge target to carry apre-alias-symbolconstraint equal toX. The walker recordspre-alias-symbolfromtree_sitter::Node::grammar_name()whenever it differs fromkind(); this is the only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 actually exposes. - Positional interstitial scoring. When the above filters leave more than one candidate, alts are scored against the slice of recorded interstitials from the current cursor position forward. The cursor’s consumed count gives the slice offset, so a trailing CHOICE-with-BLANK at position N sees only the interstitial at gap N, not the comma separators from earlier REPEAT iterations.
On yield-set ties after every filter, tree-sitter precedence (PREC annotations on alternatives) breaks the tie. This honours the grammar author’s explicit disambiguation rule.
The subtype closure
accepts_first_edge consults a precomputed subtype closure K ⊑ Y (“a vertex of kind K can appear where the grammar says SYMBOL Y”). The closure is computed once at Grammar::from_bytes time:
- Walk every hidden rule (
_-prefixed) and supertype rule’s body to identify which concrete kinds satisfy each dispatch point. - Walk every named ALIAS (
ALIAS{c, named:true, value:V}) and record that the valueVis satisfied by every kind reachable fromc. - Close the relation under composition: a Tarjan SCC over the dispatchable-only subgraph (hidden / supertype names form the nodes; satisfying edges are the transitions), producing an exact
O(V + E)transitive closure without any iteration cap. The closure is keyed by concrete kind name, soaccepts_first_edge’s lookup isO(1).
Cassettes
The output of every external scanner token must come from somewhere. tree-sitter’s external scanners are C code that produces tokens whose text varies at runtime; grammar.json declares the token name but not the text. Four resolution tiers cascade:
- Anonymous ALIAS.
ALIAS{SYMBOL ext, named:false, value: V}declares that the external token emits exactlyV. Read directly from grammar structure. - CHOICE equivalence.
CHOICE[SYMBOL ext, STRING s]declares that the external token is equivalent to the literals. Read directly from grammar structure. - CstComplement. For tokens whose text was captured at parse time, the schema’s
literal-valueconstraint on the vertex carries the actual text. Used for heredoc bodies, raw-string content, regex patterns, escape sequences. - Cassette default. When none of the above applies, a per-grammar
GrammarCassettesupplies a default.
The cassette layer itself composes two parts:
- Universal layer (
common_external_default). A closed table of name-pattern → default-text mappings derived from a structural audit of every vendored grammar. Recognises layout markers (_concat,_no_space,_brace_start), immediate-position markers (_immediate_*), error sentinels (_error_*,error_sentinel), automatic-semicolon family (_automatic_semicolon,_optional_semi), generic string delimiters (string_start,string_end), heredoc / raw-string content (returns""; actual text comes fromliteral-value), and the descendant-operator family from CSS-like grammars. - Per-grammar override (
GrammarCassette::external_token_default). A small per-grammar implementation that overrides specific tokens. The lookup composes per-grammar first, universal fallback second, viaresolve_external_token.
The per-grammar overrides exist only where the language needs an override on top of the universal layer. The default empty cassette (used for ~230 of the 261 grammars) delegates entirely to the universal layer.
IMMEDIATE_TOKEN as a layout marker
tree-sitter’s IMMEDIATE_TOKEN is the grammar’s explicit signal that the wrapped token must have no preceding whitespace. emit_pretty lifts this to a single NoSpace marker emitted at the unique structural site where IMMEDIATE_TOKEN is declared:
- When the production walker enters
Production::ImmediateToken, emitNoSpace. - When
emit_vertexenters a vertex whose rule body isIMMEDIATE_TOKEN(...)at the head, emitNoSpacebefore any content the leaf shortcut or rule-body walk produces.
The layout pass reads the marker; downstream code does not need to inspect production shapes to recover the property. This is what makes regex literals like /abc/g round-trip tight on both delimiters: regex_pattern’s rule body starts with IMMEDIATE_TOKEN, and the trailing / is wrapped in IMMEDIATE_TOKEN in the regex SEQ.
The verification tier API
emit_pretty is structurally sound for any grammar with a vendored grammar.json, but structural soundness is not the same as an output that round-trips. The fixed-point law emit_pretty(parse(emit_pretty(s))) == emit_pretty(s) is the actual correctness witness, and it has to be exercised per grammar. ParserRegistry::emit_verification_status reports which tier a protocol falls into:
| Status | Meaning |
|---|---|
Verified | The protocol round-trips under the strict oracle described below; downstream tooling may trust its emit |
Generic | The protocol is registered and the generic dispatch path applies, but no test asserts emit correctness; output is structurally derived and likely correct, yet unverified |
Unsupported | The protocol is not registered, or its grammar lacks the vendored grammar.json that emit_pretty requires |
Downstream tooling (notably quivers and other transpile pipelines) calls this API upfront and refuses emit on any backend returning Generic or Unsupported. There are no runtime warnings; the silence-versus-noise tradeoff falls on the caller.
A protocol earns Verified on one of two bases. The strong basis is corpus verification: every entry in the grammar author’s own test/corpus/ round-trips under the full oracle, which the strict emit_corpus_audit test checks over the whole corpus rather than one hand-written sample. The oracle is three conjuncts on each corpus entry s. Writing e1 for emit_pretty(parse(s)) and e2 for emit_pretty(parse(e1)), it requires e1 == e2 (emit reaches a fixed point), kind_multiset(parse(s)) == kind_multiset(parse(e1)) (no vertex kind is gained or lost), and the edge multisets equal likewise (no edge is gained or lost). Notice what the oracle does not require: e1 == s. A verified protocol may legitimately reformat, as json re-indents arrays and go applies gofmt spacing; what it may not do is alter the abstract syntax tree or fail to reach a fixed point. The second basis is backend verification: a quivers transpile backend (python, stan, bugs, jags, julia, scheme, javascript) covered by dedicated emit regression tests over the construct surface quivers actually emits, with full corpus pass tracked as follow-on work.
The current Verified set covers 255 protocols, kept in sorted order in VERIFIED_EMIT_PROTOCOLS in crates/panproto-parse/src/registry.rs. Adding a protocol requires the corpus audit to pass for it (or a backend regression test to cover it), then appending the name to that array. An earlier expansion to 149 protocols on single hand-written samples was reverted after the corpus audit showed most failed their own grammar’s test corpus: one sample is not a sufficient witness.
Limitations
Three classes of input fall outside what the structural pipeline can recover:
- By-construction CHOICEs with no constraint signal. When a schema is built from scratch (no parse history) and a CHOICE is over multiple alternatives that all admit the same children (
CHOICE[STRING "model", STRING "data"]for a BUGS block keyword), nothing in the abstract content picks. The universal cassette + per-grammar cassette provide deterministic defaults, but the choice is opinionated and may not match the source author’s intent. Decorate an abstract schema is the canonical workflow. - Heredoc / raw-string synthesis. The universal cassette returns
""for heredoc / raw-string content because the actual text is parse-time-dependent. Synthesised vertices with such kinds but no capturedliteral-valuewill emit empty. Quivers and similar synthesis pipelines should populateliteral-valueexplicitly. - Operator precedence in synthesised expressions. No precedence-driven parenthesisation pass exists. Parsed schemas preserve original parens via interstitial constraints, but synthesised
binary_expressionschemas may emit ambiguously and re-parse to a different tree. A precedence pass on the schema before emit would close this gap; deferred.
The deeper limitation, structurally: tree-sitter’s parse-table state (the actual record of which CHOICE alternative the parser took at each step) is not exposed through the C API. The only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 surfaces is grammar_name() (the pre-alias SYMBOL name), which is consumed via pre-alias-symbol. A future upstream tree-sitter change exposing production / reduce IDs would let the emitter trace the parse exactly, eliminating the heuristic tiers entirely.
See also
- Decorate an abstract schema for the synthesise-then-render workflow.
- Layout enrichment for the categorical framing of the parse / emit relationship.
- Round-trip with format preservation for the structured-data codec (JSON / YAML / TOML / XML / CSV), which is byte-exact and orthogonal to this system.
- Parse full ASTs for the parser side.
- Reference: protocol catalogue for the list of supported languages.
Composing protocols by colimit
In plain terms
Every schema language has a lot in common with every other. They all have records, fields, references between records, ways to constrain what values a field can take, and ways to express that something is one of a fixed set of alternatives. Building each protocol’s schema model from scratch would mean reimplementing those shared pieces dozens of times.
panproto solves this by defining each shared piece as a small theory, then combining them. The theory of graphs (vertices and edges) is one piece. The theory of constraints (predicates on edge values) is another. The theory of multigraphs (graphs that allow multiple parallel edges between the same pair of vertices) is a third. Each protocol is built by gluing these pieces together at the points where they share structure.
The gluing operation is called a colimit. Conceptually, it takes several theories plus a description of how their shared parts identify, and produces a single combined theory whose vertices and edges are the union of the inputs, with the shared parts collapsed. The result is a new theory that has all the structure of the inputs and respects all their equations.
This is why adding a new protocol to panproto is mostly a matter of declaring which building-block theories it uses and how they fit together: the colimit construction does the rest.
The construction
A protocol’s schema theory is built as the colimit of a diagram of building-block theories. The diagram is a small category whose objects are theories and whose morphisms are theory inclusions describing the shared structure.
For example, the typed-multigraph-with-W-types theory used by JSON Schema is constructed as:
ThGraph + ThConstraint + ThMulti + ThWType
where each + is a pushout (a binary colimit) along the shared sort Vertex. The result has:
- The vertices, edges, and identity laws from
ThGraph. - The constraint sort and predicate operations from
ThConstraint. - The multi-edge labelling from
ThMulti. - The instance-level W-type structure (tree-shaped data anchored at schema vertices) from
ThWType.
If a colimit step fails, registration panics with a message naming the failing intermediate step (colimit ThGraph + ThConstraint over ThVertex failed: ...). This is intentional: a failed registration is a build-time bug in the theory composition, not user input.
Why colimits, specifically
The colimit construction has three properties that matter for panproto:
- Universality. The colimit is the minimal theory containing all the inputs and respecting their shared structure. No spurious extra equations are introduced.
- Existence checking. The construction is mechanical and can fail predictably. If two building blocks have incompatible equations on a shared sort, the colimit step fails at registration time.
- Functoriality of migration. Because protocols are colimits, a migration between two protocols can be defined componentwise on the building blocks; the colimit assembles the components into a single protocol-level migration.
The merge operation in schema version control is also a colimit (specifically a pushout); the same machinery powers both.
Reusable building blocks
The shared theory library lives in crates/panproto-protocols/src/theories.rs. The major pieces:
| Theory | Purpose |
|---|---|
ThGraph | Vertices and edges, with source and target operations. |
ThConstraint | Vertex-attached constraints (a dependent Constraint(v: Vertex) sort). |
ThMulti | Parallel edges distinguished by edge labels. |
ThWType | Recursive type constructors (W-types) at the instance level. |
ThMeta | Metadata edges on instance nodes. |
The library also exposes higher-level pieces built by composing these (ThSimpleGraph, ThHypergraph, ThInterface, ThFunctor, ThFlat, ThGraphInstance) for protocols that want to start from a richer base.
A protocol’s registration function is a recipe for combining these. To define a new protocol, see Build a custom protocol.
Related work
Cross-protocol translation has two mature precedents. The data-exchange line (Fagin, Kolaitis, Miller, and Popa on universal solutions and the chase (Fagin et al. 2005); Fagin, Kolaitis, Popa, and Tan on second-order tgds as a closed-under-composition class (Fagin et al. 2005)) sets the logical machinery: a universal solution is a colimit; composition needs a Skolemised second-order extension; cores pick out the canonical materialisation. The categorical version is the CQL line of Schultz & Wisnesky (2017) and Schultz et al. (2017), where a schema mapping is a functor, universal solutions are colimits in the category of instances, and data integration is a pushout. The engineering-IR line (Apache Calcite (Begoli et al. 2018), Substrait, Apache Arrow, MLIR) supplies the hub-and-spoke adapter pattern. panproto’s colimit and pushout_by_name run the same construction over GAT-presented wire-format schemas. See Related work for the full discussion.
See also
- Schemas as theories for what a single theory is.
- Pushouts and merge for the formal pushout construction and the universal property panproto verifies.
- Schema version control semantics for the use of pushouts in merge.
Schema version control semantics
In plain terms
panproto-vcs is git, but for schemas. It tracks a history of schemas the way git tracks a history of source files: commits, branches, tags, merges, diffs, blame. The CLI verbs are the same (init, add, commit, branch, merge, log, diff).
Two things make it different from git applied to the schema files themselves:
- The diff and merge operate on the schema, not the text.
schema diffdoes not show you a unified diff of the JSON; it shows you what changed structurally: which vertices were added, which edges renamed, which constraints tightened. Merge does not three-way-merge the bytes; it merges the schema graph at the structural level, so you cannot end up with a syntactically valid but semantically broken schema after a merge. - Data and migrations are versioned alongside the schemas. Every commit records a schema snapshot and (optionally) the data instances that conformed to it; migrations between schemas are stored as their own content-addressed objects, paired with the complements needed to invert them. Branches diverge with their data; merges reconcile both.
The merge operation is the place where this gets interesting. Three-way text merge fails when both sides edit the same line. The schema-level analogue is two branches that both add a field with the same name but different types. panproto-vcs has a precise, well-defined operation for resolving this: the schemas are pushed out along their common ancestor. The result is the smallest schema containing both branches’ additions, with the conflict surfaced as an explicit refinement constraint that the user resolves.
The DAG
panproto-vcs is structured exactly like git: a content-addressed DAG of immutable objects.
| Object | What it holds |
|---|---|
FileSchema / SchemaTree / FlatSchema | A schema at a point in time, in per-file, tree, or migration-endpoint form. |
Migration | A morphism between two schemas, identified by their object IDs. |
Complement | The complement data needed to invert a data migration. |
DataSet | A set of instances conforming to a specific schema. |
CstComplement | The format-preserving CST data for byte-identical reconstruction. |
Protocol / Theory / TheoryMorphism / Expr / EditLog | Supporting objects referenced by commits and migrations. |
Commit | A pointer to a schema, an optional pointer to data, a parent commit list, an author, a message. |
Tag | An annotated tag object pointing to another object. |
| Branch | A mutable reference to a commit; lives under .panproto/refs/heads/. |
Every object is content-addressed with a blake3 hash of its canonical serialisation. Refs (branches under refs/heads/, tags under refs/tags/) live under .panproto/refs/. Objects live under .panproto/objects/. The structural similarity to .git/ is intentional: the existing mental model transfers.
Merge as pushout
A three-way merge in git is: take base , ours , theirs , and produce a result that contains the changes from relative to and the changes from relative to . When the changes overlap on the same line, conflict.
The schema analogue: , , are schemas; and are both descendants of . The merge result is the pushout of and along :
B ------> O
| |
| |
v v
T ------> M
The pushout is the unique smallest schema containing both and and respecting their shared structure from . “Unique smallest” is made precise by a universal property: any other schema that also contains and admits a unique morphism from to .
panproto-vcs does not just compute the pushout: it verifies the universal property. vcs::merge::verify_pushout_universal checks that the merge result mediates uniquely from any alternative cocone, returning the mediator vertex map. If the universal-property check fails, the merge raises PushoutError::UniversalFactorizationFailure rather than producing a wrong result.
For the formal pushout construction, the cocone definition, and exactly what is checked, see Pushouts and merge.
Conflicts
A merge conflict arises when the pushout would introduce an inconsistency: two branches add a field with the same name but incompatible types, or one branch removes a vertex the other branch still references. Conflicts are reported as explicit objects (rather than text markers) and resolved by editing the conflict descriptor.
Data versioning
Commits can carry data instances. When a branch’s schema migrates, the data carried by its commits is automatically lifted forward by the migration’s lens. Branches can therefore diverge in both schema and data; merging both kinds of divergence in one operation is what schema merge does.
A consequence: history rewriting (rebase, amend) on a branch carrying data must lift the data through the rewritten history. panproto-vcs does this; the data is not a passive blob.
Related work
Two threads sit directly behind panproto-vcs. The categorical-VCS lineage (Mimram and Di Giusto on patches as morphisms with merge as pushout (Mimram and Giusto 2013), Angiuli and colleagues’ homotopical patch theory (Angiuli et al. 2014), Roundy’s Darcs (Roundy 2005)) supplies the “merge is the pushout of the divergent patches against the common ancestor” semantics and the diagnosis of conflicts as failures of the pushout to exist. The schema-evolution lineage (Curino, Moon, and Zaniolo’s PRISM workbench (Curino et al. 2008) and Litt, van Hardenberg, and Henry’s Cambria [Litt et al. (2020); Litt et al. (2021)]) supplies the engineering vocabulary: schema-modification operators with forward and backward mappings, quasi-inverses for the operators that lose information, and a directed graph of schema versions connected by lenses. panproto-vcs is the four-artifact unification of these lines, with the protocol theory, schema, data, and lens complement committed together into a single content-addressed DAG. See Related work for the full discussion.
See also
- Init and commit for the practical workflow.
- Branch and merge.
- Bridge to git for using panproto-vcs alongside git.
- Pushouts and merge for the formal model.
- What panproto verifies for the universal-property guarantee.
What panproto verifies
panproto’s correctness rests on a small set of properties that are mechanically checked. Some are verified at compile time (panic on failure during protocol registration); some are verified at runtime when the operation is invoked; some are verified by property-based tests in CI. This page is the catalogue.
If a property is in this list, the implementation enforces it. If you can construct a counterexample, that is a bug.
| Property | Where checked | Failure mode | Source |
|---|---|---|---|
| Protocol registration produces a valid theory | Compile-time (panic at registration) | Named intermediate colimit step in panic message | panproto-protocols/src/theories.rs |
| Schema validates against its protocol | Runtime | schema validate exits non-zero with the failing equation | panproto-schema |
| Migration existence conditions hold | Runtime, before any data is moved | schema check exits non-zero, naming the missing input | panproto-check |
| Migration type-checks at the GAT level | Runtime, on demand via --typecheck | schema check --typecheck exits non-zero, naming the offending sort or operation | panproto-mig |
Lens GetPut law: put(s, get(s), complement(s)) = s | CI property tests, every lens combinator | Property-test failure with shrunk counterexample | panproto-lens/src/laws.rs |
Lens PutGet law: get(put(s, v, c)) = v | CI property tests, every lens combinator | Property-test failure with shrunk counterexample | panproto-lens/src/laws.rs |
Lens PutPut law: put(put(s, v₁, c), v₂, c) = put(s, v₂, c) | CI property tests, every lens combinator | Property-test failure with shrunk counterexample | panproto-lens/src/laws.rs |
Source-code emit round-trips its grammar’s full corpus (emit(parse(emit(s))) == emit(s) plus vertex-kind and edge-shape multiset preservation, on every corpus entry, for every protocol in the verified set) | CI test (emit_corpus_audit) | Test panic naming the protocol and first divergent corpus entry | panproto-parse/tests/emit_corpus_audit.rs |
| Complement composition compatibility | Runtime, on Complement::compose | LensError::ComplementFingerprintMismatch | panproto-lens/src/asymmetric.rs |
| Complement composition agreement | Runtime, on Complement::compose | LensError::ComplementConflict (with offending key) | panproto-lens/src/asymmetric.rs |
| Protolens composition: structural equality of the intermediate endofunctor | Runtime, on vertical_compose | LensError::CompositionMismatch | panproto-lens/src/protolens.rs |
| Pushout cocone commutativity | Runtime, on every colimit construction | Returned as part of ColimitResult | panproto-gat/src/colimit.rs |
| Pushout universal property: every alternative cocone factors uniquely through the pushout | Runtime, on demand via verify_universal | EquationNotPreserved | panproto-gat/src/colimit.rs |
| Schema merge universal property | Runtime, on vcs::merge::verify_pushout_universal | PushoutError::UniversalFactorizationFailure | panproto-vcs/src/merge.rs |
| Expression evaluation totality (within step budget) | Runtime, on every evaluation | ExprError::StepLimitExceeded | panproto-expr/src/eval.rs |
| Expression arithmetic overflow check | Runtime, on every arithmetic op | ExprError::Overflow | panproto-expr/src/builtin.rs |
| Expression division by zero check | Runtime, on Div/Mod | ExprError::DivisionByZero | panproto-expr/src/builtin.rs |
Source-code emit coverage
The source-code emitter (emit_pretty) is verified against the full upstream test/corpus/ of 255 of the 261 vendored tree-sitter grammars: every corpus entry round-trips under the strict oracle described in the row above. The remaining six are blocked upstream, not by the emitter:
comment,todotxt,wolframmodel their content as opaque free-text spans, so the grammar gives the emitter no structure to reconstruct and the captured text is dropped on emit (a corruption the char-multiset detector flags).lessis compiled against an older tree-sitter ABI than the 0.26 runtime loads, so its parser yields only error nodes; there is nothing to round-trip until the grammar is re-vendored.movehas nolet-binding production in the vendored grammar, so real source already parses to an error tree on the way in; this is a parse-layer defect, not an emit one.testparses tree-sitter’s own corpus format, whose===and---delimiters collide with the corpus reader, so it cannot be exercised inside the harness.
The six are the irreducible residual under the current grammars and runtime; closing any of them needs an upstream grammar fix, an ABI re-vendor, or a harness change rather than emitter work.
What is not verified
The following properties are not mechanically checked and should not be assumed:
- Performance characteristics. The implementation does not guarantee any particular complexity bound on lens composition, colimit construction, or migration application.
- Round-trip stability of value-level transforms across data with information loss. A migration that drops a field cannot round-trip the dropped data; the lens laws apply only to the surviving structure.
- Equivalence of two protocols with isomorphic theories but different parsers. Two protocols whose theories are the same up to isomorphism are still distinct from panproto’s perspective.
- Application-level invariants not expressible in the schema theory. “Email addresses must contain
@” is checked only if the schema actually carries a constraint expressing it.
See also
- Schema version control semantics for the merge case.
- Lenses and round-trip laws for the lens case.
- Migrations as morphisms for the migration case.
- Pushouts and merge for the universal-property statement.
Architecture
panproto is a layered Rust workspace. The bottom layer is the GAT engine; everything else builds on it. The top layer is the surfaces (CLI, WASM, PyO3, C ABI) that downstream users actually call. Each layer depends only on layers below it; the dependency graph is acyclic.
Layering
graph TD
subgraph "Surfaces"
CLI[panproto-cli<br/>schema binary]
WASM[panproto-wasm]
PY[panproto-py]
C[panproto-c]
REPL[panproto-repl]
end
subgraph "Facade"
CORE[panproto-core]
end
subgraph "Higher operations"
VCS[panproto-vcs]
GIT[panproto-git]
XRPC[panproto-xrpc]
GITREMOTE[panproto-git-remote]
PROJ[panproto-project]
CHECK[panproto-check]
end
subgraph "Codegen"
LLVM[panproto-llvm]
JIT[panproto-jit]
end
subgraph "Pipeline"
MIG[panproto-mig]
LENS[panproto-lens]
LENSDSL[panproto-lens-dsl]
IO[panproto-io]
PARSE[panproto-parse]
GRAMMARS[panproto-grammars<br/>+ grammars-{all,web,data,jvm,<br/>scripting,systems,functional,<br/>devops,mobile,music}]
end
subgraph "Protocols and theories"
PROTOS[panproto-protocols]
THEORYDSL[panproto-theory-dsl]
EXPR[panproto-expr]
EXPRPARSER[panproto-expr-parser]
end
subgraph "Foundation"
GAT[panproto-gat]
GATMACROS[panproto-gat-macros]
SCHEMA[panproto-schema]
INST[panproto-inst]
end
CLI --> CORE
WASM --> CORE
PY --> CORE
C --> CORE
REPL --> CORE
CORE --> VCS
CORE --> GIT
CORE --> PROJ
CORE --> CHECK
CORE --> MIG
CORE --> LENS
CORE --> IO
CORE --> PARSE
CORE --> LLVM
CORE --> JIT
VCS --> MIG
GIT --> VCS
XRPC --> VCS
XRPC --> SCHEMA
GITREMOTE --> VCS
GITREMOTE --> GIT
GITREMOTE --> XRPC
PROJ --> SCHEMA
CHECK --> MIG
MIG --> LENS
MIG --> SCHEMA
LENS --> SCHEMA
LENSDSL --> LENS
IO --> INST
PARSE --> INST
PARSE --> LENS
PARSE --> GRAMMARS
LLVM --> PROTOS
LLVM --> SCHEMA
JIT --> EXPR
LENS --> EXPR
MIG --> EXPR
EXPRPARSER --> EXPR
PROTOS --> SCHEMA
PROTOS --> INST
THEORYDSL --> GAT
GATMACROS --> GAT
SCHEMA --> GAT
INST --> GAT
EXPR --> GAT
The diagram shows the direction of dependency, not every individual edge. The full set is in the workspace Cargo.toml. The ten panproto-grammars-* pack crates are grouped together in the diagram: each is a leaf re-exporting a subset of the tree-sitter grammars under feature flags, consumed transitively by panproto-parse.
One arrow worth pointing out: panproto-parse depends on panproto-lens, not the other way around. The two crates meet through the enrichment_registry module in panproto-lens, a thin trait-and-registry pair the lens crate exposes for downstream crates to populate. panproto-parse installs an adapter for every parser it accepts so that protolens machinery in panproto-lens can dispatch grammar-driven enrichment synthesis without depending on tree-sitter. The mechanism is documented in Layout enrichment; the registry pattern keeps the lens crate grammar-agnostic and the dependency direction acyclic.
The boundaries
Three places in the system are boundary layers in the sense that they convert between panproto’s internal representation and an external one. They are deliberately thin and concentrated in single crates so they can be audited independently.
WASM boundary
panproto-wasm is the only crate that knows about wasm-bindgen. It exposes panproto-core to JavaScript through a slab of opaque integer handles, with MessagePack encoding for data crossing the boundary. The TypeScript SDK (@panproto/core) is a thin wrapper over the WASM exports; it adds the fluent builder ergonomics but no logic.
Python boundary
panproto-py is the only crate that knows about PyO3. It exposes panproto-core to Python with native bindings (no WASM). The Python wheel ships eleven core tree-sitter grammars; the rest are in companion panproto-grammars-* packs.
C boundary
panproto-c is the only crate that knows about C ABI. It exposes a stable C interface used by the Haskell binding (and any other non-Rust language). CBOR is the over-the-boundary format here.
The generated CLI reference
The schema binary’s --help text is the source of truth for the CLI surface. The reference/cli page is regenerated by an xtask (see xtask/src/bin/gen-cli-docs.rs) and CI fails if the page is out of date. This is the only generated file in the docs site.
Versioning
All workspace crates are versioned in lockstep at the workspace level (workspace.package.version). A release of panproto-core 0.46.0 implies every panproto-* crate at 0.46.0. The TypeScript SDK and Python SDK follow the same version. See Release process for cadence.
See also
- Crate map for one-line descriptions of every crate.
- What panproto verifies for the per-crate properties checked.
- Composing protocols by colimit for the GAT-engine consumer pattern.
Related work
In plain terms
panproto sits at the intersection of five older research lines: schemas as algebraic theories, bidirectional lenses, format-preserving parsing, structured version control, and cross-protocol data exchange. Each of those lines has its own canonical papers, its own working systems, and its own vocabulary. This chapter walks each one in turn, names the works that shaped panproto’s design, and says where panproto’s contribution differs.
The chapter is a capstone. Each individual idea is treated in depth in the topical explanation chapters; the goal here is the relationship between panproto and the literature, in one place.
1. Schemas as theories
The idea that a database schema is an algebraic object, and that data conforming to a schema is a model of that object, is mature. The line panproto follows most closely begins with Cartmell’s generalised algebraic theories in the 1970s and continues through David Spivak’s functorial data model in the 2010s, the CQL line developed by Schultz and Wisnesky, the ACSets framework built by Patterson, Lynch, and Fairbanks, and a recent thread of multi-model unification work led by Jiaheng Lu. panproto’s panproto-gat and panproto-protocols crates are an instance of this lineage rather than a departure from it.
Cartmell (1986) introduced generalised algebraic theories in a thesis (1978) and the closely related 1986 paper in the Annals of Pure and Applied Logic. GATs extend many-sorted equational logic with dependent sorts: the sort of a term may depend on the value of another term, so a theory can express constructions like “an edge with vertex parameters” without leaving the algebraic-theory framework. This is the formal apparatus panproto uses; the comment block at the top of crates/panproto-gat/src/lib.rs cites Cartmell directly. The Sort, Operation, Equation, and Theory types in panproto-gat cover Cartmell’s four ingredient classes (sorts/types, operator symbols, axioms, theories). The feature that makes GATs generalised rather than ordinary many-sorted equational logic is dependent sorts, the sort of a term may itself depend on terms; the type-equality / term-equality axiom split (which Cartmell treats as two separate axiom kinds) is a consequence of dependent sorts, since type-equality becomes nontrivial in their presence. panproto collapses both axiom kinds into one Equation constructor with a discriminating tag. GATs sit one rung above many-sorted equational logic and one rung below full dependent type theory, with enough expressive power to handle dependent record-style schemas without inheriting decidability problems from the higher rung.
Spivak (2012) is the headline categorical-database paper. A schema is a finitely-presented category, an instance is a Set-valued functor on it, and a translation between schemas is a functor of categories. The contribution is the equivalence between schemas-as-categories and the category of small categories: every theorem about small categories transfers to schemas. From a translation , Spivak constructs three migration functors. The pullback transports D-instances back to C-instances contravariantly; the right pushforward performs joins; the left pushforward performs Skolemized unions. The triple is an adjoint chain. panproto’s migrate operation in panproto-mig::lift and the three Σ/Δ/Π modes of CQL-style migration are direct descendants of these constructions, although panproto works with GAT presentations rather than free categories with path equivalences.
Schultz & Wisnesky (2017) is the more directly load-bearing precedent for panproto’s design. Where Spivak treats schemas as categories, Schultz and Wisnesky treat schemas as multi-sorted equational theories extending a fixed type-side: sorts are entities, function symbols are foreign keys and attributes, equations are integrity constraints. An instance is itself a theory that extends the schema by adjoining 0-ary constant symbols and ground equations; its denotation is the term model, which is initial among models of the extended theory. A schema mapping is a morphism of theories that is the identity on the type side. The same Σ⊣Δ⊣Π adjoint triple appears, but now derived from substitution along a theory morphism rather than precomposition with a functor. The TAC version of the algebraic-databases programme (Schultz et al. 2017) develops the mathematical theory at length; the JFP article is the implementer-facing presentation. Schultz and Wisnesky also push the design pattern further than the older paper does, using pushouts of schemas and instances as a global-as-view design pattern for data integration, with the integrated schema literally the pushout of source schemas. This is the construction that panproto’s Composing protocols by colimit imports wholesale and turns into the colimit and pushout_by_name functions in panproto-gat::colimit. The choice to make schemas theories rather than categories, with the type-side fixed by a meta-theory, is also panproto’s choice; the ThWType, ThGraph, ThConstraint, ThMulti, and ThMeta building blocks in crates/panproto-protocols/src/theories.rs play the role Schultz and Wisnesky’s type-side plays.
Patterson et al. (2022) sit in the same family but optimise for a different setting. An attributed C-set is a functor from a fixed finitely-presented category together with attribute data attached via a slice over a chosen C-set. The paper unifies graphs, data frames, Petri nets, and wiring diagrams in one framework and gives a Julia implementation in Catlab.jl that is performance-competitive with hand-written domain-specific data structures. Limits and colimits are computed pointwise, which makes pullback, pushout, and limit-based queries cheap. panproto and the ACSets work agree about the goal (one categorical framework, many concrete data shapes) but differ in mechanics: Patterson, Lynch, and Fairbanks define a schema as a small category equipped with a partition map separating combinatorial sorts from attribute-type sorts, then build the attributed C-set category as a slice in the C-set topos (Theorem 2 of the paper) and prove functoriality in both the schema and the attribute-type structure . panproto fixes a different meta-theory (the category of GAT presentations rather than the category of finite-product categories) and takes colimits there. The two approaches are most easily compared as instances of the same idea (a categorical schema with a parameterised instance), with ACSets giving an unusually clean implementation story for scientific computing and panproto giving an unusually direct path to cross-protocol schema engineering.
Lu (2025) is, as of 2025, the most direct adjacent claim of “categorical framework that covers several data models at once.” Lu unifies relational, XML, and property-graph data via a thin set category: a thin category with three object kinds (entity, attribute, relationship), where pullback realises the join operator and is also used to characterise multivalued dependencies, pushout realises the connected-component construction of graph data, and two reduced representations (1RR and 2RR) recover Boyce-Codd/XML normal form and fourth normal form respectively. The work is competitive with panproto on the unification axis but covers a different set of formats (relational, XML, and property graph rather than the wire-format protocols panproto targets) and stays close to the relational-DB tradition with its emphasis on normal forms and dependency closures. The honest framing for panproto is that the unification ambition is now common [Lu (2025); Patterson et al. (2022); Schultz & Wisnesky (2017)]; what panproto adds is coverage of the schema languages developers actually ship against (Avro, OpenAPI, ATProto, FHIR, the eighteen linguistic-annotation formats listed in crates/panproto-protocols/src/) and a colimit-of-presentations methodology rather than a fixed meta-theory.
Two older lines deserve named mentions even though they do not directly drive panproto’s code. The theory-colimit construction itself is older than the institutions framework: Burstall and Goguen introduced “putting theories together” via pushouts in (1977) (IJCAI 1977), developed the construction as the semantics of the Clear specification language (Burstall and Goguen 1980), and later abstracted it institution-theoretically (Goguen and Burstall 1992), with the modern textbook treatment in Sannella & Tarlecki (2012). An institution packages signatures, sentences, models, and a satisfaction relation; theory presentations in any institution form a category with colimits, computable in the signature category, and pushouts of theories then merge specifications. Concrete institutions have been worked out for equational logic, conditional equational logic, Horn-clause logic, first-order logic, and (more recently) dependent type theories; a GAT institution has not been published under that name to our knowledge, though organising GAT presentations as an institution is straightforward (signatures = GAT signatures, sentences = type and term equations, models = standard GAT models, satisfaction = the usual semantic relation). panproto’s pushout_by_name is the Burstall & Goguen (1977) move at the GAT level: pushout of GAT presentations along a shared sub-presentation. The relationship is that panproto is one instance of the theory-colimit construction, organising GAT presentations of wire-format schema languages into a category and pushing them out there. Spivak and Kent’s (2012) olog paper is the categorical-knowledge-representation cousin. An olog is a finite-limit, finite-colimit sketch with English-labelled boxes and arrows, intended as a user-facing way of authoring categorical schemas. The intuition that a schema in any of these formalisms is a category-shaped object with labels in natural language is the olog intuition; the difference is that panproto enforces the labels through the GAT type system and uses tree-sitter for the layout layer rather than freehand authoring.
Against this background, panproto’s specific contribution comes from three moves. First, presenting each industrial schema language as its own GAT and taking a colimit in the category of GAT presentations to obtain a polyglot schema theory carries the Burstall & Goguen (1977) theory-colimit construction (later abstracted institution-theoretically in Goguen & Burstall (1992) and given textbook treatment in Sannella & Tarlecki (2012)) over to wire formats. Spivak’s (2012) framework, the Schultz & Wisnesky (2017) line (CQL), the ACSets work of Patterson et al. (2022) (Catlab.jl), and Lu (2025) each fix a single meta-theory (finitely-presented categories, multi-sorted equational logic, attributed C-sets, thin set categories) and vary schemas inside it; panproto varies the theory presentation and composes presentations. Second, panproto’s crates/panproto-protocols/src/atproto.rs is the first formal categorical treatment of ATProto Lexicon. Third, the combination of GAT-presented schemas with format-preserving CST complements (Section 3) and four-artifact pushout merge (Section 4) runs the categorical machinery against the byte-level representation that the precedents do not address.
2. Bidirectional lenses and the round-trip laws
Lenses are pairs of functions and satisfying laws that make updating the small piece and propagating back to behave coherently. The modern form is from Foster et al. (2007), and the family has since grown to cover relational lenses (Bohannon et al. 2006), symmetric and edit lenses [Hofmann et al. (2011); Hofmann et al. (2012)], profunctor optics [Pickering et al. (2017); Clarke et al. (2024)], and generic-deriving libraries that automatically dispatch optic kind (Lens, Prism, Traversal) from a structural classification of focus. panproto’s panproto-lens crate, and especially its Protolens type, sit inside this family.
Foster et al. (2007) set the asymmetric template: a lens is a pair between source and view (partial functions and ), with totality, well-behavedness (GetPut, PutGet), and the stronger PutPut law for the “very well-behaved” subclass. Creation of a fresh source from a view alone is handled by extending the universe with a missing-element marker so that put a Ω defines the creation case. The Foster paper does not stop at the algebra: it builds a combinator library (id, composition, xfork, filter, prune, focus, rename, map, wmap, copy, conditional combinators). Bohannon and colleagues’ (2006) relational lenses then port the discipline from trees to relations. Their select, join_dl, and drop primitives have typing rules that require the schema’s functional dependencies to be in tree form (the attributes appearing in the FDs split into pairwise disjoint sets whose dependency graph is a forest), so that an internal record revision operation (used to define the putback of each combinator) can re-write tuples to restore the FDs after a view update. The combination gives a small calculus in which projection-, join-, and union-views admit lawful updaters, which is essentially what panproto’s ThConstraint building block in crates/panproto-protocols/src/theories.rs captures at the schema-language level: foreign-key constraints become equations in the schema theory, and any lens that respects those equations propagates them.
The closest direct precedent for “optic kind determined by edge type” is the profunctor-optics work that licenses it. Pickering and colleagues (2017) represent each optic as a polymorphic function , where the constraint class on the profunctor selects the optic kind: gives lenses (because first lifts a transformer on the component to a transformer on the pair), gives prisms (because left lifts to the sum), and gives traversals. Profunctor optics compose by ordinary function composition; concrete representations do not. The paper’s Figure 5 is a diamond of inclusions (Adapter into Lens and into Prism, Lens and Prism into Traversal) that drops out of which constraints are conjoined. The categorical update by Clarke et al. (2024) unifies the framework further: an optic is the coend , varying the two monoidal actions and recovers fifteen optic families in a single table (adapter, lens, monoidal lens, algebraic lens, monadic lens, linear lens, prism, coalgebraic prism, grate, glass, affine traversal, traversal, kaleidoscope, setter, fold), and the construction extends to mixed and -enriched settings. panproto’s OpticKind enumeration in crates/panproto-lens/src/optic.rs and the prop / variant / item edge classification in panproto-gat::sort implement the dispatch as a runtime classifier over a cross-protocol IR; the contribution is not the dispatch itself but that the indexing object is panproto’s schema graph rather than a single host-language ADT or a single-protocol schema.
Hofmann et al. (2011) introduced the symmetric variant of the lens laws. A symmetric lens carries a complement object that captures information about not present in , plus a pair that propagates in both directions through the complement. The symmetric category has a symmetric monoidal tensor product but Theorem 5.1 of the paper proves it has neither categorical products nor categorical sums. Hofmann et al. (2012) pushes the framework further by replacing whole-state transformations with edits: each object is a module pairing a set with a monoid of edits and a partial monoid action, and a lens is a pair of stateful monoid homomorphisms threading a complement set under a consistency relation . Theorem 7.1 of the edit-lens paper proves a one-to-one correspondence between equivalence classes of edit lenses over canonical “differ” modules (whose edit monoid is overwrite-only) and the state-based symmetric lenses of Hofmann et al. (2011). This is what licenses panproto treating the complement as a first-class versioned artifact: an edit-lens-style complement is exactly the thing the schematic VCS commits alongside data and schema. Pre-flight check Complement::is_compatible, the Object::Complement and Object::CstComplement variants in panproto-vcs::object, and the partial-monoid merge rule on the complement (see Lenses and round-trip laws) all sit on this foundation.
Litt and colleagues’ (2020) Cambria essay is the engineering precedent for “schema migration as a graph of composable lenses” and was directly influential on the panproto migration story. The PaPoC 2021 paper that grew out of the essay (Schema Evolution in Distributed Systems with Edit Lenses) makes the connection to Hofmann et al. (2012) explicit: a schema migration is an edit lens, replays of past edits compose through the lens, and divergent branches reconcile through the symmetric structure. panproto’s panproto-mig::compose function composes morphisms in essentially the Cambria way; the difference is that the migration is decided in the GAT category rather than over JSON-schema lenses alone, and the four-artifact bundle (protocol theory, schema, instance data, lens complement) is versioned together in panproto-vcs.
Pacheco & Cunha (2011) give the algebraic counterpart: an equational calculus over point-free lenses in which the standard point-free laws (id, -assoc, -functor, -fusion, distl-nat, -nat) lift to the bidirectional setting, and bidirectional fold and unfold satisfy uniqueness so that fold-fusion can be applied directly at the lens level. They show that the lens category is not as well-behaved as (no categorical products, sums, or exponentials, only the corresponding bi-functors and projections-with-defaults) and mechanise the calculus in the pointless-lenses and pointless-rewrite Haskell libraries. Pacheco et al. (2012) carries the same idea over to delta lenses, where the change-propagation account replaces whole-state updates. panproto’s symbolic simplification of ProtolensChain in crates/panproto-lens/src/protolens.rs extends the same move: where the Pacheco line operates on XML schemas and inductive data types in Haskell, panproto’s chains span protocol theories joined by coercion morphisms across the 50-protocol catalogue.
The categorical foundations of lenses have been deepened considerably since Foster et al. (2007). Diskin and colleagues’ (2011) delta-lens line gives the change-propagation account that panproto’s data-migration story implicitly relies on. The dependent-optics and categorical-cybernetics line (Riley’s Categories of Optics; Spivak’s Generalized Lens Categories via Functors; Capucci, Gavranović, Hedges, and Rischel on cybernetic systems; Niu and Spivak on polynomial functors) generalises the framework to indexed and parameterised optic families and to dependent types. The contrast with the recent dependent-optics literature is that panproto ships a runtime for prop/variant/item dispatch over a cross-protocol IR, applying the existing categorical machinery to the 50-protocol catalogue.
3. Format-preserving parse and emit
The byte-equal round-trip law has been the goal of a long line of lossless-CST work: rust-analyzer’s rowan, Microsoft Roslyn, Instagram’s LibCST, the Prettier and Biome formatter trees, and a research line around Spoofax and Stratego. Each of these targets one language. Two threads from the bidirectional-transformations literature give the more general design: lens-based config editors (Augeas, Boomerang) and grammar-based bidirectional pretty-printers (BiYacc and its cousins). panproto’s ParseEmitLens (in crates/panproto-parse/src/parse_emit_lens.rs) is positioned between the two.
Lutterkort (2008) is the closest analogue to panproto’s approach at a framework level. Augeas treats every config file as an XPath-like tree backed by the file bytes, and ships hand-written lenses (in a small ML-style DSL) that map file ↔ tree. The lens primitives (key, label, seq, store, del) act on regular-expression matches; the combinators (subtree, concatenation, union, iteration) compose them with a Boomerang-derived typechecker that enforces unambiguous concatenation, union, and iteration over regular languages. Augeas’s grammar formalism is restricted to regular languages (the paper’s Section 6 lists removing this restriction as future work, since it precludes context-free formats like httpd.conf), so it handles /etc/hosts, /etc/passwd, and the like, but not nested-context formats. panproto’s ParseEmitLens reuses the same lens factorisation but lifts the parser layer to tree-sitter (or one of the structured format codecs in crates/panproto-io), so the regular-language ceiling is gone.
The Boomerang line is the algebraic core that Augeas builds on. Foster et al. (2007) give the original combinators for bidirectional tree transformations; Bohannon et al. (2008) adds resourceful lenses with key and match primitives so that the put direction can align reorderable chunks by key rather than by position. They prove that resourceful lenses are quasi-oblivious: a lens is quasi-oblivious with respect to an equivalence if (the EquivPut law), which is the condition that lets the put direction respect reorderings as if they were equalities. Foster et al. (2008) then generalises to quotient lenses: lenses well-behaved modulo programmer-specified equivalences and , with canonizers (canonize / choose) inserted anywhere in a chain via lquot and rquot operators. The quotient-lens typechecker uses a coarse Identity / Any approximation that scales to the 4200-line Boomerang lens for the UniProtKB/SwissProt protein-sequence format reported in Section 8 of the quotient-lens paper. panproto’s complement carries the same role as the dictionary skeleton in resourceful lenses and the canonizer state in quotient lenses: a sidecar that records what get discarded so that put can put it back in a position-and-key-preserving way. The ParseEmitLens does not yet expose lquot/rquot to users (the canonizer story for parse/emit is implicit in the layout-policy machinery), but the schema-level constraint vocabulary already carries the equivalence relations the canonizers operate on.
The grammar-based thread is the other ancestor. Zhu et al. (2015)’s BiYacc shows that a single DSL program can denote both a parser and a reflective printer: a putback that takes both the abstract syntax tree and the original concrete string and produces an updated concrete string that preserves syntactic characteristics (extra parentheses, whitespace, encoded negations) of the original. BiYacc programs satisfy GetPut and PutGet, and compile to BiFluX for execution. panproto does not compile from a bidirectional grammar; it inherits the grammar from tree-sitter and supplies the bidirectional layer separately, but the contract is the same one BiYacc proves: parse ∘ emit is the identity on the canonical schema, and emit ∘ parse preserves layout where the lens complement records it. Brunsfeld (2018) is the underlying parser-generator citation; the panproto contribution is the lens factorisation on top of 261 grammars, not the parser technology itself.
Two adjacent threads belong in the same neighbourhood. de & Visser (2012) give a token-stream-and-origin-tracking algorithm for layout preservation under AST refactorings: each AST node points into the token stream, the ConstructText reconstruction algorithm (abbreviated CT in proofs) reuses original text for unchanged subterms and pretty-prints only newly constructed ones, and a DIFF over the new list against origin tags handles inserted and deleted children. They prove correctness () and a maximal-layout-preservation theorem, and round out the system with heuristic comment-binding patterns (Preceding(1), Preceding(2), Succeeding(1..3)). Their Section 7.5 notes that the lens framework loses the fine-grained relation between subterms and text fragments needed for layout preservation, and singles this out as the obstacle the reflective-printer trick in BiYacc responds to. panproto sits closer to the de Jonge–Visser side: the complement records exactly the byte-position and interstitial information the AST omits, and emit_pretty reuses it when it is present and falls back to a layout policy when it is not. Pombrio & Krishnamurthi (2014) takes the same lens framing in a different domain: their CONFECTION system desugars and resugars syntactic-sugar pattern rules, Section 6.1 frames the desugar/unexpand pair explicitly as a lens, and Theorem 1 in Section 6.1.2 proves PutGet iff a disjointness condition on the rule list holds. The resugaring properties are Emulation and Abstraction (Theorems 3, 4, 5) plus Coverage (informal; the paper explicitly says Coverage is not formalised and can only be strived for in practice), and they translate directly to what panproto’s layout-enrichment fibration is doing one level up: stripping a constraint sort is the desugaring step, and re-attaching it via a registered synthesis driver is the resugaring step.
4. Schema versioning as structured merge
Two distinct precedents shape panproto’s VCS layer. The categorical-VCS line owns the merge-as-pushout semantics. Mimram & Giusto (2013) gave the formal account; Pijul is the working CLI implementation; Angiuli and colleagues’ (2014) homotopical patch theory recasts the same construction inside homotopy type theory. In all three, files (or patches between files) are objects and morphisms in a category, and a three-way merge is the pushout of the patches against the common ancestor. Conflicts are failures of the pushout to be defined, or of the universal property to factor. Roundy’s (2005) Darcs is the engineering ancestor that motivated this thread.
Mimram & Giusto (2013) fix the simplest interesting case: a file is a function from a length- index set into an alphabet of lines, and a patch is an injective increasing partial function on the index sets that preserves labels wherever it is defined (undefined positions are deletions; positions in the codomain outside the image are insertions). The category of files and patches is the free monoidal category on insertion generators and deletion generators subject to (deleting an inserted line is identity); the subcategory of total injective patches is the free monoidal category on the insertion generators alone. Concurrent edits give a span , and the natural definition of “merging” is the pushout of this span. The central observation is that is not finitely cocomplete: the patches that insert different lines at the same position have no pushout in , which is the categorical face of a textual conflict. Theorem 15 of the paper describes the free finite conservative cocompletion of the insertion-only subcategory as the category of finite sets equipped with a transitive relation and morphisms that are functions respecting the relation; the labelled and partial-function generalisation (Theorem 20, p. 12) adds labels and replaces functions with partial functions, giving the full cocompletion of . Pushouts in are computed as disjoint unions quotiented by the span equivalence and equipped with the transitive closure of the inherited relations; cycles can arise during cocompletion (Example 18 of the paper exhibits one explicitly), which is how the framework records “the user must pick between two orderings of a conflicting line.” The single-file insertion-only construction generalises to insertion-and-deletion (Theorem 20, partial functions instead of total ones) and is left in the paper as a target for richer base categories (multi-file, XML trees, event-structure repositories). panproto’s merge.rs does not implement the partial-function version literally, but the move it makes is the same one: detect that the naive pushout fails in the schema-theory category , cocomplete by adjoining a conflict descriptor that records the divergent additions as parallel arrows, and resolve by choosing a section.
Angiuli, Morehouse, Licata, and Harper (2014) make the same construction but inside homotopy type theory, where the answers come out cleaner because the groupoid laws are free. A patch theory in their setting is a higher inductive type whose points are repository contexts, whose 1-paths are patches, and whose 2-paths are patch laws. Identity, inverse, and composition come from the path operations refl, !, and ∘; the groupoid laws (associativity, units, inverses cancel) are automatic because the identity type is already a -groupoid. A model of a patch theory is then a function out of the HIT into a universe of sets and bijections, and functorial semantics makes the model preserve patch laws automatically: every program that consumes the HIT must respect the equations between paths. The paper’s three running examples (an integer counter that is literally the circle; a fixed-length document with a swap-lines patch and an indep / noop law saying edits to independent lines commute; an add/remove patch theory indexed by histories of compositions) culminate in a merge operation that takes a span of diverging patches and returns a cospan completing it into a commuting square (Section 3.2). Their merge is more flexible than the Mimram–Di Giusto pushout in one direction (it accepts arbitrary divergent patches, with a “merge undoes both” fallback when nothing better exists) and more restrictive in another (it requires inverses to live inside the theory, which the authors flag in Section 7 as “conceptually questionable and practically problematic”). Their workaround is to index contexts by patch histories and define merge over History rather than over the underlying file (Section 6.4), which is structurally what panproto-vcs already does when it commits the lens complement alongside the schema: the complement carries the history information that a pure file-level patch category cannot. The homotopical framing also makes one of panproto’s design choices explicit. The Mimram–Di Giusto allows “cyclic” objects to represent conflicts; the Angiuli et al. patch-with-laws example uses a 2-path to equate the two orderings rather than equipping the merge result with a cyclic order. panproto sits between the two: a vertex-set conflict is reified as a conflict object the user resolves, but a permutation of independent additions does not produce a new object because the underlying GAT theory already quotients by the appropriate equations.
The schema-and-data VCS line owns the engineering of “a Merkle DAG of commits with schema + data inside each commit.” TerminusDB, Dolt, and the Noms prolly-tree lineage are working examples; Merkle (1988) is the hash-tree primitive they all rely on, and the operational git tradition that Chacon & Straub (2014) document is the immediate engineering precedent. None of these systems frames merge categorically; they treat schema conflicts operationally (cell-by-cell, table-by-table) and have no notion of a lens complement attached to a commit.
Curino, Moon, and Zaniolo’s PRISM workbench (2008) is the load-bearing precedent at the operator level. PRISM defines eleven Schema Modification Operators (SMOs): CREATE TABLE, DROP TABLE, RENAME TABLE, COPY TABLE, MERGE TABLE, PARTITION TABLE, DECOMPOSE TABLE, JOIN TABLE, ADD COLUMN, DROP COLUMN, RENAME COLUMN (plus NOP), each translated to a pair of Disjunctive Embedded Dependencies (DEDs): forward DEDs that describe the migration of data from the old to the new schema, and backward DEDs that describe the reverse. The DED pairs feed a chase-and-backchase query rewriter (Deutsch and Tannen’s MARS engine) so that legacy queries written against an old schema version are automatically translated to run against the new one. Section 4.2 of the paper lists six operators with a perfect unique inverse (RENAME COLUMN, RENAME TABLE, PARTITION TABLE, CREATE TABLE, ADD COLUMN, plus NOP) and Table 3 marks JOIN TABLE and DECOMPOSE TABLE as perfect-inverse-only under additional conditions (“yes/no”); the remaining operators (DROP TABLE, COPY TABLE, MERGE TABLE, DROP COLUMN) require a quasi-inverse in the sense of Fagin, Kolaitis, Popa, and Tan: the inverse is not the strict mapping inverse but the “best one can do to recover ground instances.” Figure 3 places each operator on a redundancy / information-preservation grid that tells the database administrator what they are getting up front. PRISM was stress-tested against the full 170-version, 4.5-year schema-evolution history of MediaWiki/Wikipedia and reported 97.2% of evolution steps fully automated; on the 66 most common query templates from the Wikipedia query log, 74% of queries (worst-case across the version timeline) were automatically supported, compared to 84% under manual rewriting. A separate 27-query subset of those templates was used for the runtime-performance comparison in Figure 6 of the paper. panproto inherits two diagnostic ideas directly. First, the PRISM observation that schema change without a record-level lift story is incomplete survives intact: panproto’s panproto-mig::existence check is the equivalent gate, refusing to certify a migration whose data side is not constructible. Second, the quasi-inverse / perfect-inverse partition is the precedent for the CoercionClass lattice (Iso / Retraction / Projection / Opaque) in crates/panproto-gat/src/sort.rs: each class is a precise statement about which kind of inverse a lens carries, the same diagnostic move PRISM makes at the SQL-operator level. Where panproto goes elsewhere is the embedding into a Merkle DAG with categorical merge: PRISM treats each schema-evolution step in isolation and rewrites queries through a chain of DEDs; panproto commits each step into a content-addressed DAG and merges divergent branches by pushout, with the SMO-style information-preservation report as a side output rather than the central artifact.
Litt and colleagues’ (2020) Project Cambria essay, and the PaPoC 2021 paper it grew into (Litt et al. 2021), are the immediate lens-based precedent. Cambria’s data model is a directed graph in which nodes are schema versions and edges are bidirectional lenses; a document tagged with its writing schema is translated to a reading schema by composing the lenses along the shortest path through the graph. The essay credits Hofmann, Pierce, and Wagner’s edit lenses (Hofmann et al. 2012) as the direct inspiration; the PaPoC paper extends the construction with the framing that JSON-patch operations are the edits a Cambria lens propagates. The system was integrated with the Automerge CRDT to demonstrate read-time, per-replica translation in a peer-to-peer setting. panproto’s panproto-vcs::merge builds on the same idea but takes a step the Cambria graph does not: the schema-version graph is promoted from a navigation aid (find a path of lenses) to a content-addressed DAG with pushout-based three-way merge, and the lens complement, the schema, the protocol theory, and the data instances are committed together as one Merkle object rather than living in separate artifacts. The Cambria authors’ own list of open problems (recursive schemas, cross-document migrations, data augmentation, lens repositories) reads as a roadmap that panproto’s GAT-and-CST machinery is set up to answer, although none of those answers should be claimed without further work.
panproto’s contribution is the four-artifact unification of these two lines. A panproto-vcs::CommitObject (in crates/panproto-vcs/src/object.rs) carries a protocol theory ID, schema ID, data set IDs, lens complement IDs, and CST complement IDs together. Three-way structural merge (crates/panproto-vcs/src/merge.rs) detects overlap via pullback and promotes it to pushout in the underlying theory category, with verify_pushout_universal checking the real universal property at the vertex level. Vertex-level factorisation is verified; edge-level factorisation is deferred. No prior system places protocol theory, schema, instance data, and lens complement into one DAG with pushout-based merge semantics; the combination is what panproto-vcs contributes on top of the two precedents.
A second contrast worth recording is with Shapiro and colleagues’ (2011) conflict-free replicated data types. CRDTs and structured-merge VCS solve nearby problems by very different means: CRDTs guarantee strong eventual convergence by restricting the algebra of allowed operations; panproto-vcs allows arbitrary schema operations and decides merge by category-theoretic universal property. The two are complementary rather than competing; cf. Shapiro et al. (2011) for the canonical statement of the CRDT side, and the Cambria integration with Automerge (Litt et al. 2021) for one way the two layers can compose.
5. Cross-protocol translation and data exchange
Translation between schema languages has two well-developed precedents: the data-exchange tradition that runs from Fagin, Kolaitis, Miller, and Popa’s ICDT 2003 paper through Fagin, Kolaitis, Popa, and Tan and the Clio project, and the engineering-IR tradition that runs from Apache Calcite through Substrait, Apache Arrow, MLIR, and ONNX. panproto sits between them: it inherits the universal-solution and second-order-tgd machinery from the data-exchange line and the hub-and-spoke adapter architecture from the IR line, and turns both into one pipeline over GAT-presented schemas.
Fagin et al. (2005) establish the core of a universal solution as the canonical materialised target instance. A data-exchange setting is a quadruple of source schema, target schema, source-to-target tuple-generating dependencies (tgds) , and target tgds and equality-generating dependencies (egds). A universal solution for a source instance is a target instance from which every other solution receives a homomorphism; universal solutions exist iff solutions exist, and one can be computed by the chase when is weakly acyclic (Theorem 2.6, originally from the 2003 paper). Universal solutions are not unique up to isomorphism: Example 3.1 exhibits an infinite family of nonisomorphic universal solutions of different sizes for one source instance. The paper picks out the core of (the smallest substructure that is also a homomorphic image of ) as the unique smallest universal solution up to isomorphism and proves that it is preserved under chase steps for tgds and egds (Proposition 3.4). Computing the core of an arbitrary structure is intractable: Theorem 4.2 shows CORE IDENTIFICATION is DP-complete, even on undirected graphs. Inside data exchange the picture is cleaner: Theorem 5.2 (greedy algorithm) gives a polynomial-time algorithm for computing the core when is a set of tgds and is a set of egds, Theorem 5.9 (blocks algorithm, ) extends to the case where the source is unavailable (relevant when Clio has materialised a universal solution and discarded the source), and Theorem 5.15 (Algorithm 5.14) carries the blocks algorithm to the egd case. The core is also the right answer for certain-answer semantics: Proposition 6.4 proves that for existential queries , equals the certain answers on universal solutions, and the core is the unique universal solution (up to isomorphism) with this property for every conjunctive query with inequalities . The CoercionClass lattice (Iso / Retraction / Projection / Opaque) in crates/panproto-gat/src/sort.rs is the Fagin-style invariant transplanted to the schema-presentation level: each class records what kind of homomorphism a cross-protocol coercion carries, which directly determines whether the lifted instance is universal, sub-universal, or projection-only at the target.
Fagin et al. (2005) answer the next question: what is the right language for composing schema mappings? A composition takes the source-to-target relation defined by two successive mappings and asks whether it can be defined by a single mapping in the same logic. Proposition 3.4 of the paper exhibits a composition definable by an infinite set of source-to-target tgds but by no finite set; Proposition 3.5 strengthens this to a composition undefinable by any (finite or infinite) set of source-to-target tgds; Theorem 3.6 sharpens further with an explicit construction whose composition query is NP-complete (by reduction from 3-COLORABILITY) and is not definable in , hence not in least-fixed-point logic. The paper introduces second-order tgds (SO tgds): existential second-order formulas with Skolem function symbols and equalities between terms, satisfying a safety condition that makes them domain-independent. Theorem 4.6 proves that SO tgds are closed under composition with an explicit composition algorithm (introduce a second-order existential for each existential first-order in ; substitute into ; collect implications). Section 5 extends the chase to SO tgds and proves the universal-solution / polynomial-time-certain-answers results carry through (Proposition 5.2, Corollary 5.3). The takeaway for panproto: the right object to compose across protocols is not the underlying schema mapping but a Skolemised second-order generalisation of it, with explicit function symbols standing for the choices the existential quantifiers make. panproto’s ProtolensChain in crates/panproto-lens/src/protolens.rs carries exactly this structure: the lens chain is a sequence of mappings with explicit complement objects standing in for the Skolem choices, composed by the chain combinator with the same closure property the SO tgd composition theorem provides.
The categorical version of the same construction is in the CQL line. Schultz & Wisnesky (2017) (CQL; previously called AQL) presents a schema mapping as a functor between schemas-as-theories, defines the adjoint triple of data migration functors, computes universal solutions as colimits in the category of instances, and gives a uniform pushout-based design pattern for data integration. Schultz et al. (2017) is the long-form mathematical treatment. The CQL implementation is the closest existing analogue of what panproto’s cross-protocol pipeline does, and the construction panproto uses for panproto-gat::colimit and pushout_by_name is structurally the same one CQL implements for relational schemas; panproto carries the construction to GAT-presented wire-format schemas and to instance data that lives natively in those formats (JSON, CBOR, Protobuf, Avro, FHIR) rather than relational tuples.
Apache Calcite (Begoli et al. 2018) is the engineering precedent for hub-and-spoke query-language IRs. Calcite’s architecture is a modular query optimiser (rules, metadata providers, two planner engines: a cost-based Volcano-style dynamic-programming engine and an exhaustive rule-application engine), a relational-algebra core, and an adapter architecture in which each backend (Cassandra, Druid, Elasticsearch, MongoDB, Pig, Spark, Splunk, JDBC dialects) supplies a SchemaFactory, a Schema, and a calling convention trait that the optimiser uses to push down operators into the backend’s native query language. The paper reports Calcite is embedded in Apache Drill, Hive, Solr, Phoenix, Kylin, Apex, Flink, Samza, Storm, MapD, Lingual, and Qubole Quark (Table 1), with several hundred optimisation rules in the rule set. panproto’s protocol-colimit machinery is the same hub-and-spoke shape applied to schemas rather than queries: each protocol crate in crates/panproto-protocols/ plays the role of a Calcite adapter, the universal protocol-graph plays the role of Calcite’s logical relational algebra, and panproto-mig::hom_search is the analogue of Calcite’s rule-application engine. The substantive difference is that Calcite’s IR is SQL-typed relational algebra (with full column types including complex ARRAY, MAP, and MULTISET constructors), augmented with pluggable traits that record physical properties such as ordering, partitioning, and the calling convention used to drive the optimiser, and the adapter contracts are written by hand; panproto’s IR is a GAT-presented schema theory, and the adapter contracts are derived by colimit from the per-protocol theory presentations. Substrait and MLIR are the same hub-and-spoke pattern in other domains: Substrait defines a cross-engine query-plan IR with producers and consumers across DuckDB, Velox, Apache Arrow, and others; MLIR provides a dialect framework for compiler IRs spanning machine learning, parallel computation, and hardware synthesis. Apache Arrow plays the role of the in-memory columnar data layout that several of these systems use as the physical exchange format.
panproto’s cross-protocol pipeline is an instance of the data-exchange/IR construction with one substantive addition: the schema, the instance data, and the lens complement are all transported together along the chain. The pipeline composes pushouts in panproto-gat::colimit, hom-search migrations in panproto-mig::hom_search, and coverage reports in panproto-mig::coverage; the CoercionClass lattice records translation loss at the schema level the way Fagin’s universal-solution / core distinction records it at the instance level; the four-artifact bundle (protocol theory, schema, instance, complement) carries the data exchange forward in a single step that a Calcite adapter or a Substrait plan cannot represent because they have no notion of a complement object.
panproto ships 50 protocols today, drawn from wire formats (Protobuf, Avro, Thrift, FlatBuffers, Cap’n Proto), JSON family (JSON Schema, OpenAPI, AsyncAPI, CloudEvents, JSON-LD, GeoJSON), graph and RDF (RDF/Turtle, SPARQL, GraphQL), columnar (Parquet, Arrow, ORC), document and configuration (XML, YAML, TOML, CUE, Dhall, Nickel), domain-specific (FHIR, ATProto Lexicon, GTFS, OpenAPI Routes), and linguistic-annotation (CoNLL-U, Universal Dependencies, PropBank, FrameNet, TimeML, etc.). Each is presented as a GAT inside crates/panproto-protocols/, and the polyglot schema theory is the colimit of those presentations in . The compositional construction (pushout of two presentations derived migration coverage report) is what Calcite gives you for SQL dialects and what CQL gives you for relational schemas. Doing it across 50 industrial wire-format schema languages with one categorical engine is, as an engineering artifact, what panproto’s cross-protocol layer adds to the literature.
What panproto adds
The five sections above place panproto in five mature research lines. Across those lines, panproto delivers the following:
- A polyglot schema theory constructed as a colimit in the category of GAT presentations, with each of 50 industrial wire-format schema languages (Protobuf, Avro, Thrift, FlatBuffers, Cap’n Proto, JSON Schema, OpenAPI, AsyncAPI, CloudEvents, JSON-LD, GeoJSON, GraphQL, SPARQL, RDF/Turtle, Parquet, Arrow, ORC, XML, YAML, TOML, CUE, Dhall, Nickel, FHIR, ATProto Lexicon, GTFS, and the linguistic-annotation family) presented as its own GAT and composed with the rest. This is the Goguen & Burstall (1992) theory-colimit move transplanted from logics to wire formats, run at industrial scale rather than as a single worked example.
- A universal CST factorisation across 261 tree-sitter grammars plus the major data-format codecs, with the formatting residue committed as an explicit lens complement. No prior system carries layout as a first-class versioned artifact across this breadth of languages.
- A four-artifact pushout merge over a content-addressed DAG: protocol theory, schema, instance data, and lens complement are merged together in one operation that verifies the categorical universal property at the vertex level. The Mimram–Di Giusto, Pijul, and homotopical-patch-theory line gives the pushout-as-merge semantics for files; the PRISM line gives schema-modification operators; the Cambria line gives schema-version graphs with edit lenses; panproto unifies the three with shared content-addressing and shared merge semantics.
- The first formal categorical semantics for ATProto Lexicon, produced as one entry in the protocol catalogue rather than as a standalone study.
- An end-to-end pipeline (parse → decorate → lens → migrate → emit) that runs the categorical machinery against real bytes: GAT colimits for protocol composition, hom-search for migration derivation, profunctor-optics dispatch for runtime lens application, CST complements for layout preservation, and pushout merge for version control. Each stage is one named precedent from the literature; running them as one composable engine against a 50-protocol catalogue is the engineering contribution.
The topical explanation chapters elsewhere in this book work out each stage in detail.
Denotational semantics
This cluster pins panproto’s three DSLs and two structural constructions to a precise mathematical specification. Each page opens with an “In plain terms” section and then proceeds through a six-step skeleton:
- Surface syntax. BNF for what a user types.
- Abstract syntax. The Rust enum the parser produces.
- Semantic domain. The mathematical universe the syntax interprets into.
- Interpretation function. A semantic function defined by structural recursion on the abstract syntax, in the Scott-Strachey idiom. Typing is presented separately as an inductive relation ; evaluation is not a relation.
- Soundness. Statement of what the implementation guarantees, and which property tests or runtime checks enforce it.
- What is intentionally not modelled. The boundary of the formal account.
The pages in turn:
| Page | What it pins down |
|---|---|
| Shared notation | The judgement forms, environments, and meta-notation used across the others. Read this first. |
| Expression language | panproto-expr: terms, types, total evaluation under a step budget. |
| Lens DSL | panproto-lens-dsl: the lens triple (get, put, complement), the three round-trip laws, complement composition as a partial commutative monoid. |
| Theory DSL | panproto-theory-dsl: GAT presentations, sort/operation/equation judgements, the colimit interpretation. |
| Pushouts and merge | The pushout construction in the category of GATs, the universal property, and what the implementation verifies. |
| Protolens composition | Protolenses as natural transformations between schema endofunctors, the structural-equality criterion for composition, sequential vs fused instantiation. |
| REPL command language | panproto-repl: state model, command interpretation, and the bare-term typecheck path. |
The cluster is meant to be read by anyone who wants to know exactly what panproto guarantees. Familiarity with category theory helps but is not required: every page restates its formal content in plain terms before invoking it.
Shared notation
This page defines the meta-notation reused across the other denotational-semantics pages. Read this first.
In plain terms
We use two complementary notations:
- Inference rules, of the form , define a relation (typically a typing relation ) inductively as the smallest relation closed under the rules.
- Semantic equations, of the form , define a function by structural recursion on the syntax. The double brackets emphasise that we are speaking of meaning, not surface text.
The DSL pages in this cluster present the type system as inference rules and the value semantics as semantic equations: the standard denotational idiom, in the sense of Scott and Strachey. Where evaluation can fail to terminate, we lift the value domain to so that is total.
A context (also called an environment) is a finite map from names to data. Typing contexts map variable names to types; evaluation contexts map variable names to values; theory contexts map sort names to their definitions.
Environments
Three environment kinds appear repeatedly:
- : a typing environment mapping variable names to types. Notation: extends with a fresh binding .
- : a value environment mapping variable names to values. Notation: extends with .
- : a theory context mapping sort names to their definitions in a GAT. Used in the theory DSL and in the pushout construction.
A judgement of the form asserts that under typing context , the expression has type . We do not use a separate evaluation judgement in this cluster: evaluation is presented as the semantic function rather than as a relation.
Inference rules
An inference rule has the form
Each premise and the conclusion are judgements. The rule asserts that whenever the premises hold, the conclusion follows. A derivation is a tree of rule applications whose leaves are axioms (rules with no premises) and whose root is the judgement being proved.
Semantic functions
The semantic function for a syntactic category is written where is the semantic domain. The subscript is omitted when context determines the category.
Common semantic functions:
- : the set of values of type .
- : the meaning of expression under value environment , where adjoins a bottom element to handle non-terminating or budget-exceeded computations.
- : the meaning of a lens as a triple of functions.
- : the category-with-families generated by a GAT presentation.
The bottom element
In the expression language, evaluation is total within a step budget but the budget can be exceeded. We model this by adjoining a bottom element to the value domain: . Reaching corresponds to ExprError::StepLimitExceeded at runtime.
Equality
Equality between values, types, and theories is definitional unless otherwise stated. Equality between schemas is structural up to alpha-renaming of bound names. Equality between morphisms is judgemental: when and have the same source, the same target, and agree on every input.
See also
- Expression language for the first user of these conventions.
- The other pages in this cluster all build on these conventions.
Expression language: denotational semantics
In plain terms
The expression language panproto-expr is what you use to describe field-level transforms inside a migration (“the new full_name field is the old first plus a space plus the old last”) and predicates inside a query (“only records where created_at > '2024-01-01'”). Everything in the language is pure: no IO, no mutation, no clock reads, no random numbers. Two expressions that look the same always do the same thing. Every expression terminates within a fixed number of evaluation steps.
This page describes what those properties mean and what the evaluator actually computes.
Surface syntax
The Haskell-style surface, parsed by panproto-expr-parser:
expr ::= literal
| ident
| expr expr -- application
| "\\" ident "->" expr -- lambda
| "let" ident "=" expr "in" expr
| "if" expr "then" expr "else" expr
| "case" expr "of" alts -- pattern match
| builtin "(" expr,... ")" -- builtin application
| expr "." ident -- field access
| expr "[" expr "]" -- index
literal ::= int | float | str | bool | "null"
| "[" expr,... "]" -- list
| "{" ident ":" expr,... "}" -- record
The parser desugars if c then a else b to a Match over a boolean scrutinee with two arms, and case e of ... to a Match directly. There is no separate If or Case node in the abstract syntax.
Abstract syntax
pub enum Expr {
Var(Arc<str>),
Lam(Arc<str>, Box<Self>),
App(Box<Self>, Box<Self>),
Lit(Literal),
Record(Vec<(Arc<str>, Self)>),
List(Vec<Self>),
Field(Box<Self>, Arc<str>),
Index(Box<Self>, Box<Self>),
Match { scrutinee: Box<Self>, arms: Vec<(Pattern, Self)> },
Let { name: Arc<str>, value: Box<Self>, body: Box<Self> },
Builtin(BuiltinOp, Vec<Self>),
}
Match covers both if/then/else and case/of; pattern matching is the only branching primitive. Each arm pairs a Pattern (whose variants include Wildcard, Var, Lit, Record, List, and Constructor) with a body expression. The full enum lives at crates/panproto-expr/src/expr.rs.
Type system
The type-formation grammar:
A typing context is a finite map from variable names to types. The typing relation is defined inductively by the usual rules; selected:
Builtin signatures have type schemes given in reference/expression-language; each Builtin(op, \overline{e}) rule plugs in ’s scheme and checks that the arguments match.
Semantic domain
Let be the recursive sum
interpreting Null as the singleton and the function space as partial continuous maps. Lift to to adjoin a bottom for divergence under the step budget. Environments live in , and ranked environments in to track the remaining step budget.
Semantic function
The denotational semantics is the family
defined by structural recursion on . Write and for the budget decrement. The equations:
The budget rule fires before any equation if the remaining steps are zero; otherwise the relevant equation applies and the recursive sub-denotations are evaluated with the budget decremented. Operationally this is EvalState::tick in crates/panproto-expr/src/eval.rs; when is returned the implementation surfaces ExprError::StepLimitExceeded(max_steps).
The auxiliary is the standard pattern-match search: try each in order, attempting to unify against the scrutinee value; on the first success bind the pattern variables into and evaluate ; on exhaustion raise NonExhaustiveMatch. The auxiliary is the partial function defined in crates/panproto-expr/src/builtin.rs.
Builtin side conditions
The builtins listed under reference/expression-language are individually total or partial. The partial ones return a non- error rather than :
Div,Modwith zero divisor:DivisionByZero.- Integer arithmetic overflow:
Overflow(i64::checked_*). *ToInt/*ToFloaton unparseable input:ParseError.- List index out of bounds:
IndexOutOfBounds; list operations past the configured maximum:ListLengthExceeded. - Record access on a missing field:
FieldNotFound. Matchexhaustion:NonExhaustiveMatch.Appof a non-function value:NotAFunction.
Errors are distinct from . models resource exhaustion (StepLimitExceeded and DepthExceeded); errors model defined failure and propagate as Err(ExprError) from the implementation.
Soundness
The evaluator satisfies:
- Type preservation. If and and with , then .
- Totality (within the budget). For every well-typed and well-typed , terminates in finitely many steps with either a value or .
- Determinism. If and then .
Type preservation is enforced by the type-checker in crates/panproto-expr/src/typecheck.rs and by builtin signatures rejecting ill-typed arguments. Totality follows from the budget rule. Determinism follows from the absence of mutation and IO.
What is intentionally not modelled
- Performance. Two expressions can have the same denotation but very different cost. The semantics fixes only what is computed, not how much it costs.
- Step-budget tuning. The budget is a parameter set at the outermost call. The semantics treats it as fixed; the language itself does not expose it.
- Floating-point determinism across architectures.
Floatoperations follow IEEE 754, but bit-level reproducibility across hardware is not guaranteed.
See also
- Reference: expression-language for the builtin catalogue.
- How-to: apply field transforms for usage.
- Lens DSL for how
panproto-exprappears inside lens specifications.
Lens DSL: denotational semantics
In plain terms
A lens DSL spec is a recipe for building a bidirectional transform between two schemas. You declare which fields map to which (with optional value-level expressions), and the compiler produces a triple of functions that satisfies three round-trip laws by construction. This page pins down what the spec compiles to and what “satisfies the laws” means.
Surface syntax
The Nickel surface (canonical authoring form). JSON and YAML surfaces are isomorphic via serde.
{
id = "user.v3-to-v4",
description = "Rename `name` and replace `age` with `years`",
source = "dev.example.user.v3",
target = "dev.example.user.v4",
steps = [
{ rename_field = { old = "name", new = "display_name" } },
{ remove_field = "age" },
{ add_field = { name = "years", kind = "integer", default = 0, expr = "old.age" } },
],
}
Each step is a single-key object whose key selects the variant. The full step grammar is in crates/panproto-lens-dsl/src/document.rs.
Abstract syntax
pub struct LensDocument {
pub id: String,
pub description: String,
pub source: String,
pub target: String,
// Body: exactly one of the four variants is present.
pub steps: Option<Vec<Step>>,
pub rules: Option<Vec<Rule>>,
pub compose: Option<ComposeSpec>,
pub auto: Option<AutoSpec>,
// Rule-variant metadata.
pub passthrough: Option<Passthrough>,
pub invertible: Option<bool>,
// Protocol-specific extension metadata.
pub extensions: HashMap<String, serde_json::Value>,
}
pub enum Step {
// High-level field combinators
RemoveField { remove_field: String },
RenameField { rename_field: RenameSpec },
AddField { add_field: AddFieldSpec },
// Value-level transforms
ApplyExpr { apply_expr: ApplyExprSpec },
ComputeField { compute_field: ComputeFieldSpec },
// Structural combinators
HoistField { hoist_field: HoistSpec },
NestField { nest_field: NestSpec },
Scoped { scoped: ScopedSpec },
Pullback { pullback: PullbackSpec },
// Sort-level coercions and merges
CoerceSort { coerce_sort: CoerceSortSpec },
MergeSorts { merge_sorts: MergeSortsSpec },
// Elementary theory operations
AddSort { add_sort: AddSortSpec },
DropSort { drop_sort: String },
RenameSort { rename_sort: RenameSpec },
AddOp { add_op: AddOpSpec },
DropOp { drop_op: String },
RenameOp { rename_op: RenameSpec },
AddEquation { add_equation: EquationSpec },
DropEquation { drop_equation: String },
}
The top-level type is LensDocument, not LensSpec. The document carries source and target NSID fields naming the two schemas, and exactly one body variant (steps, rules, compose, or auto). The resolver loads the source schema and the compiler applies the body to it, then checks that the result matches the named target. See panproto_lens_dsl::compile.
Semantic domain
For schemas , and complement type , the set of lenses on is
with elements written as triples . The implementation in panproto-lens represents this triple by the Lens struct; the categorical model is the standard asymmetric-lens construction of Foster et al. (2007) with complements after Litt et al. (2020). The semantic function
takes a document and a source schema and returns a concrete lens. (The target schema and the complement type are determined by the document and the source.)
The three laws
A lens is lawful iff for all and :
panproto_lens::laws::check_get_put, check_put_get, and check_put_put are property-test runners that sample , , , from the schema’s value space and assert each equation.
Semantic equations
For each step constructor, is a function , where a is a schema-parameterised lens (see Protolens composition). The document-level semantics is the left-to-right composition of the per-step semantics, applied to the source schema:
where and . The composition operator is the sequential lens composition
For a representative subset of steps:
where is the trivial complement (a singleton) and on the right-hand side of is the expression-language semantics. The remaining 15 step constructors follow the same pattern; the implementation of each lives under crates/panproto-lens-dsl/src/steps.
Composition between adjacent step semantics is gated at construction time by protolens_composable: see Protolens composition. Compilation in the implementation is handled by panproto_lens_dsl::compile, which produces a CompiledLens carrying the ProtolensChain corresponding to along with the value-level FieldTransforms extracted from any steps.
Complement composition
Sequential composition of lenses requires composing their complements. Complement::compose is a partial commutative monoid:
- Identity. The empty complement satisfies .
- Commutativity. When defined, .
- Associativity. When defined, .
- Partiality: is defined iff:
- and have the same source-schema fingerprint (otherwise
ComplementFingerprintMismatch); and - For every key in both, (otherwise
ComplementConflictwith the offending key).
- and have the same source-schema fingerprint (otherwise
Pre-flight predicate: Complement::is_compatible(c1, c2).
The fingerprint is a 64-bit hash of the source schema (computed with the standard library’s DefaultHasher in panproto_lens::asymmetric::schema_fingerprint), so complements computed against syntactically distinct but structurally equal schemas are still compatible.
Soundness
The compilation function preserves lawfulness: if every step compiles to a lawful lens (which the combinator algebra guarantees), the composed result is lawful. Property tests in crates/panproto-lens/src/laws.rs verify each combinator against random inputs sampled from the schema’s value space.
What is intentionally not modelled
- Lossy migrations as full lenses. A migration that drops information cannot satisfy GetPut. The DSL allows
DropField, but the resulting object is a partial lens; the laws hold only on the surviving structure, and CI tests skip the GetPut law for steps annotated as lossy. - Time complexity of
put. Some combinators have linearputcost in the size of the source; the semantics fixes the value, not the cost. - Equivalence of two distinct DSL specs that compile to the same lens. The DSL deliberately exposes step ordering even when steps commute; canonicalisation is left to the user.
See also
- Reference: lens combinators for the combinator algebra.
- How-to: write lenses in the lens DSL.
- Lenses and round-trip laws (plain-terms version).
- Protolens composition for schema-parameterised lenses.
- Foster et al. (2007) and Litt et al. (2020).
Theory DSL: denotational semantics
In plain terms
The theory DSL is what you use to declare a new protocol’s schema language. You write down the basic kinds of thing the protocol talks about (its sorts), the constructors that build them (its operations), and the equations they satisfy. The DSL compiles to a generalized algebraic theory (GAT) that the rest of panproto consumes uniformly.
This page pins down what a GAT presentation is, what category it generates, and what counts as a valid presentation.
Surface syntax
The Nickel surface (canonical authoring form). JSON and YAML surfaces are isomorphic via serde. Every document carries an id, a description, and exactly one body variant (theory, morphism, composition, protocol, bundle, class, instance, or inductive type).
A bare theory body:
{
id = "dev.example.thgraph",
description = "Directed multigraph with identity edges",
theory = "ThGraph",
sorts = [ { name = "Vertex" }, { name = "Edge" } ],
ops = [
{ name = "src", inputs = [{ name = "e", sort = "Edge" }], output = "Vertex" },
{ name = "tgt", inputs = [{ name = "e", sort = "Edge" }], output = "Vertex" },
{ name = "id", inputs = [{ name = "v", sort = "Vertex" }], output = "Edge" },
],
equations = [
{ name = "src-id", lhs = "src(id(v))", rhs = "v", context = [{ name = "v", sort = "Vertex" }] },
{ name = "tgt-id", lhs = "tgt(id(v))", rhs = "v", context = [{ name = "v", sort = "Vertex" }] },
],
}
The full grammar is in crates/panproto-theory-dsl/src/document.rs.
Abstract syntax
pub struct TheoryDocument {
pub id: String,
pub description: String,
pub body: TheoryBody,
}
pub enum TheoryBody {
Theory(TheorySpec),
Morphism(MorphismSpec),
Composition(CompositionBody),
Protocol(Box<ProtocolSpec>),
Bundle(Box<BundleSpec>),
Class(ClassSpec),
Instance(InstanceSpec),
Inductive(InductiveSpec),
}
pub struct TheorySpec {
pub theory: String, // theory name
pub extends: Vec<String>, // parent theories
pub imports: Vec<ImportSpec>, // imports with optional aliases
pub sorts: Vec<SortSpec>, // sort declarations (with dependent params)
pub ops: Vec<OpSpec>, // operation declarations
pub equations: Vec<EquationSpec>, // judgemental equalities
pub directed_equations: Vec<DirectedEqSpec>, // rewrite rules
pub policies: Vec<PolicySpec>, // conflict policies
}
The DSL document compiles to panproto_gat::Theory (and, for non-Theory bodies, to TheoryMorphism or Protocol). The intermediate surface types (TheorySpec, OpSpec, etc.) are deserialisation targets, not the categorical objects; the categorical objects are the GAT types.
Sort, operation, and equation judgements
A sort is well-formed in a theory context when it appears in the sort list:
An operation is well-typed when its inputs are well-typed sorts and its output is a well-typed sort dependent on the inputs:
An equation is well-formed when both sides type-check at the same sort under the same context:
Semantic domain
The semantic universe is , the (2-)category of categories with families (Cartmell-style: contexts as objects, substitutions as morphisms, types and terms forming a presheaf over substitutions). The denotational semantics is the functor
defined by case on the document body (with the bundle case mapping to a tuple of CwFs). For the bare theory case:
where is the initial CwF satisfying the sort, operation, and equation declarations of . Initiality is the semantic content of the GAT construction: for every CwF that interprets ’s sorts and operations and validates its equations, there is a unique structure-preserving functor . The GAT presentation framework is due to Cartmell (Cartmell (1986)), and the category-with-families packaging of dependent type theory is Dybjer’s (Dybjer (1996)); the two compose to give the initial-model semantics used here.
A schema in panproto is a model of in the CwF of finite sets and functions: equivalently, a CwF morphism .
Semantic equations for the body cases
where is the iterated pushout described under Pushouts and merge and the auxiliary class-to-theory and inductive-to-theory are the desugarings implemented in panproto-theory-dsl/src/compile_class.rs and compile_inductive.rs.
The interpretation of the composition body factors through the colimit construction in : , exploiting the fact that preserves the relevant colimits.
Soundness and registration
A protocol registration is the construction of a colimit diagram. If any pushout step in the diagram fails to satisfy the universal property (because two equations contradict on a shared sort), registration panics with a message naming the failing intermediate step:
panic: colimit ThGraph + ThConstraint over ThVertex failed:
equation `src(id(v)) = v` contradicts `src(id(v)) = source(v)`
This is a build-time bug in the theory composition; the panic is intentional. See crates/panproto-protocols/src/theories.rs.
What is intentionally not modelled
- Higher dimensions. The DSL is for 1-categorical GATs only. We do not model homotopical structure, 2-cells, or coherent equations.
- Infinitary signatures. Operations have finite arity. There is no way to declare an operation that takes an unbounded list of arguments.
- Term rewriting decidability. Equation orientation and confluence are the user’s responsibility; the colimit construction does not check for a complete rewriting system.
See also
- Pushouts and merge for the colimit construction and verified universal property.
- Composing protocols by colimit (plain-terms).
- Reference: protocol catalogue.
- How-to: build a custom protocol.
- Cartmell (1986) for the original GAT formulation.
Pushouts and merge
In plain terms
When two branches of a schema repository diverge from a common ancestor and you want to merge them, you need a precise rule for combining their changes. panproto’s rule is the pushout construction from category theory.
A pushout is the smallest object that contains two given objects and respects the way they share a common subobject. For schemas, “smallest” means containing exactly the union of the two branches’ changes, with their shared subschema identified rather than duplicated. The construction has a universal property: any other schema that also contains both branches admits a unique morphism from the pushout. That uniqueness is what makes the merge result the answer rather than an answer.
panproto-vcs does not just compute the pushout. It also verifies the universal property at runtime: it constructs an alternative cocone and checks that the unique mediator exists. If the check fails, the merge raises an error rather than producing a wrong result.
The categorical setup
Let be the category of GAT presentations and theory morphisms. A pushout in is defined as follows. Given:
- An object (the base or shared ancestor),
- Two morphisms and (the changes on each branch),
the pushout is an object together with morphisms and satisfying the cocone condition and the universal property: for any other object and morphisms , with , there exists a unique morphism such that and .
The pushout square:
The universal property says that for any alternative cocone , the mediator exists and is unique:
with and .
Construction
The pushout is constructed as the disjoint union of the sorts of and , quotiented by the equivalence relation that identifies with for every sort , with the operations and equations from and assembled accordingly.
The implementation is in crates/panproto-gat/src/colimit.rs. The result type is ColimitResult, which carries:
- The pushout object .
- The two inclusion morphisms , .
- A method
verify_universalto check the universal property against an alternative cocone.
Helper accessors merge_mediator_assignments and pushout_by_name expose the inclusions for downstream use.
The verified universal property
ColimitResult::verify_universal takes an alternative cocone and computes the unique mediator . It then checks that and . If either equation fails, the check returns EquationNotPreserved carrying the offending sort or operation.
In the VCS layer, vcs::merge::verify_pushout_universal runs the check against a constructed alternative cocone derived from the merge candidates. Failure raises PushoutError::UniversalFactorizationFailure and the merge does not produce a result.
This is the new behaviour introduced in a7fb636 (the correctness pass). Previously, panproto-vcs computed the pushout but did not verify it. The merge could produce an object satisfying cocone commutativity (the pushout square commutes) without satisfying the universal property (without being the minimal such object). Cocone commutativity is necessary for a pushout but not sufficient. Verifying the universal property is what makes the result the pushout rather than a cocone.
What this guarantees
- Determinism. Two repositories with the same base and the same branch changes produce the same merge result, up to isomorphism.
- Minimality. No spurious sorts, operations, or equations are introduced.
- Safety. A failing universal-property check raises an error rather than silently producing a wrong merge.
What is intentionally not modelled
- Conflict resolution policy. When the pushout would introduce a contradiction (two branches add fields with the same name but incompatible types), panproto raises a conflict object for the user to resolve. The resolution policy is up to the user; the pushout construction does not invent compromises.
- Three-way merges with non-pushout common ancestors. If the branches’ divergence cannot be expressed as a span (for instance, if the ancestor was rewritten by a rebase), the merge falls back to an interactive resolution; the formal pushout is not defined.
- Merge time complexity. The construction is polynomial in the size of the inputs but no specific bound is guaranteed.
See also
- Schema version control semantics (plain terms).
- Composing protocols by colimit for the same construction applied at protocol-registration time.
- What panproto verifies for the catalogue of universal-property checks.
- Theory DSL for the source category of the construction.
Protolens composition
In plain terms
A protolens is a lens recipe parameterised over a schema rather than a fixed pair of schemas. Where a lens lives between two specific schemas and , a protolens is a rule that says “for any schema satisfying some precondition, here is a lens from to a derived schema .” Applying a protolens to a fleet of related schemas is one operation; you do not write one lens per schema.
Composing protolenses requires more care than composing plain lenses. Two protolenses can be glued together end-to-end only when the schema produced by the first is precisely the schema consumed by the second. “Precisely” here is structural: the intermediate schemas have to agree as endofunctors on theories, not just on names. This page pins down that condition.
Semantic domain
A protolens is a natural transformation between schema endofunctors. Concretely:
where and are functors on the category of schemas , and for each schema , is a lens
satisfying the lens laws (see Lens DSL).
The naturality condition is: for every schema morphism , the square
commutes: . Applying the protolens then transporting along gives the same result as transporting then applying the protolens.
Composition
Two protolenses and compose vertically into pointwise, by :
The composition is well-defined only when the intermediate functor of matches the source functor of . In panproto-lens, this match is checked by protolens_composable:
pub fn protolens_composable(eta: &Protolens, theta: &Protolens) -> bool {
matches!(theta.source.transform, TheoryTransform::Identity)
|| theory_endofunctor_equiv(&eta.target, &theta.source)
}
A Protolens carries its source and target TheoryEndofunctors as public fields. The equality theory_endofunctor_equiv is structural: the endofunctors agree iff their preconditions and their transforms agree, ignoring the human-readable name field. A trivial special case: when the source of is the identity functor (i.e., is the identity), the match is automatic regardless of ’s target.
vertical_compose enforces the check at construction time and returns LensError::CompositionMismatch on failure, naming the offending intermediate functor. This catches an entire class of bug at the type-construction stage rather than at instantiation time.
Sequential vs fused instantiation
When a chain of composable protolenses is applied to a base schema , two instantiation strategies are exposed:
- Fused instantiation (
instantiate): construct the composed protolens first, then apply it to once. Produces a single morphism with the migration metadata preserved as one chain. - Sequential instantiation (
instantiate_sequential): apply to to get , apply to to get , and so on. Produces a list of morphisms.
Both satisfy the lens laws. Fused is the production default because it preserves migration metadata as a single object; sequential is exposed for property tests that need to inspect each intermediate schema.
Soundness
The composition operation preserves naturality: if and are natural transformations and protolens_composable(P, Q) holds, then is a natural transformation. The pointwise lens laws follow from the laws on each and .
The structural-equality check on intermediate functors is the necessary condition for the composite to be well-defined. Without it, vertical composition could be invoked with mismatched functors and the result would silently fail naturality on some schemas. The check makes such composition a runtime error at construction time rather than a semantic bug at application time.
What is intentionally not modelled
- Horizontal composition of protolenses. Naturality also supports horizontal composition (whiskering), but the implementation does not currently expose it. Adding it would require a notion of natural transformation between protolens-shaped functors, which is not yet defined in the codebase.
- Identity-sourced protolens equivalence beyond structural. Two protolenses that compute the same transform via different intermediate forms are treated as distinct.
- Performance characteristics of fused vs sequential. The choice is semantic-equivalent; fused is preferred for metadata reasons, not performance.
See also
- Lens DSL: denotational semantics for the underlying lens model and the three laws.
- Reference: lens combinators for the protolens module and entry points.
- How-to: use protolenses.
- What panproto verifies for the
CompositionMismatchruntime check.
REPL command language: denotational semantics
In plain terms
The REPL is the interactive surface for inspecting theories and terms: load a theory document, switch the active theory, ask for the type of a term, normalize a term under the directed equations, enumerate the free model. Every interactive command is a small operation on a stateful environment that holds the loaded theories, the loaded morphisms, and a pointer to the currently active theory.
This page pins down what each command does, what state it touches, and what the bare-term-typecheck path means.
Surface syntax
line ::= command | term
command ::= ":" cmd args
cmd ::= "load" | "theories" | "use" | "sorts" | "ops"
| "type" | "normalize" | "model" | "instance"
| "quit" | "q"
args ::= /* command-specific, see table below */
A line that does not begin with : is treated as a term and routed through the typecheck path under the active theory. Comments and blank lines are ignored. The grammar lives in crates/panproto-repl/src/lib.rs.
Abstract syntax
pub enum ReplCommand {
Load(PathBuf),
Theories,
Use(String),
Sorts,
Ops,
TypeOf(String),
Normalize(String),
Model(Option<u32>),
Instance { class: String, target: String, bindings: Vec<(String, String)> },
Quit,
}
pub enum ReplLine {
Command(ReplCommand),
Term(String),
}
(The actual implementation does not export this enum; commands are dispatched directly from string parsing in Repl::handle_command. The shape above is the denotational model.)
Semantic domain
The REPL state is
The semantic codomain is the set of user-visible effects (printed string, typed-term result, error, quit signal). The denotational semantics is the pair of functions
where is state-preserving: .
Semantic equations
Write for a state and for the obvious update. The equations:
The bare-term path:
When (no active theory), all -dependent equations short-circuit to . When any auxiliary (compile, typecheck_term, normalize, free_model, compile_instance) returns an error, the outcome is and the state is unchanged.
Repl::handle_command and Repl::handle_term_typecheck in crates/panproto-repl/src/lib.rs implement these equations pointwise.
Soundness
The REPL is a thin orchestration layer over the GAT engine and the theory DSL compiler. It introduces no new failure modes; every error it reports comes from one of:
panproto_theory_dsl::LoadError(during:loador:instance)panproto_gat::GatError(during:type,:normalize,:model)- A REPL-level
UnknownCommand/UnknownTheoryfor command-shape errors
State updates are atomic per line: a failed compile in :load rolls back the partial insertions before returning the error, so the post-state is either fully updated or unchanged.
The bare-term path is total in the technical sense: every input string either parses and produces a TypeOf or Error outcome; the REPL does not deadlock or spin.
What is intentionally not modelled
- Multi-line input. Commands and terms are single-line. Continuations are the user’s responsibility (concatenate into a single line before submission).
- Macro expansion. There is no
:defineor:macro; the REPL is a pure inspection interface, not a programming environment. - Concurrent state. A
Replinstance is single-threaded. The shape above does not model concurrent line submission. - Persistent history. The REPL relies on
rustylinefor history; the persistence model isrustyline’s, not panproto’s.
See also
- Theory DSL: denotational semantics for what
:loadconsumes. - Reference: CLI for the
schema replinvocation that wraps this. - Crate map for
panproto-repl.
Glossary
Each entry gives the formal definition first and a one-sentence intuition second.
Abstract schema
A Schema whose constraint set contains no sort in the layout enrichment fibre: no start-byte, end-byte, interstitial-N, chose-alt-fingerprint, or chose-alt-child-kinds. The Rust newtype is panproto_schema::AbstractSchema. Intuition: the schema you would build by hand with SchemaBuilder before any parser or decorate has attached layout data.
Decorated schema
A Schema carrying a complete layout enrichment fibre. The Rust newtype is panproto_schema::DecoratedSchema. Constructed by ParserRegistry::parse_with_protocol, by ParserRegistry::decorate, or by explicit wrapping via DecoratedSchema::wrap_unchecked. Intuition: the schema you get back from parsing source code, with every byte position and inter-token whitespace recorded.
Decorate
The function (AbstractSchema, LayoutPolicy) → DecoratedSchema that attaches a layout fibre to an abstract schema by running emit_pretty_with_policy to produce canonical bytes and re-parsing those bytes. The result satisfies forget_layout(decorate(a, p)) ≅ a up to vertex-id renaming and kind / edge multiset equivalence. Intuition: the section of the schema-level forgetful U; the put-direction of the parse / emit lens at the schema level.
Forget layout
The function Schema → Schema (or DecoratedSchema → AbstractSchema in typed form) that drops every constraint whose sort belongs to the layout enrichment fibre. Implemented as Schema::forget_layout, Schema::forget_layout_in_place, and DecoratedSchema::forget_layout. Intuition: the schema-level forgetful functor stripping parser-only metadata to leave the abstract content.
Layout enrichment / Layout fibre
The family of constraint sorts (start-byte, end-byte, interstitial-N for any N, chose-alt-fingerprint, chose-alt-child-kinds) that attach byte-position and parser-discriminator data to vertices of a parsed schema. Classified by panproto_gat::EnrichmentKind::Layout and identified by the panproto_gat::is_layout_sort predicate. Intuition: the parser-only metadata the emitter needs to render bytes back; everything parse adds that SchemaBuilder does not produce by hand.
Layout policy
The configuration object passed to decorate and pretty_with_protocol controlling whitespace, indentation, separators, newline conventions, and the line-break / indent-open / indent-close token sets that the put direction of the parse / emit lens uses. Aliased to panproto_parse::emit_pretty::FormatPolicy; the wire-serialisable projection is panproto_gat::LayoutPolicySpec. Intuition: the put-direction complement of the parse / emit lens, namely what whitespace and CHOICE-alternative defaults to apply when parsing is not there to dictate them.
Layout enricher
A trait implementation registered in panproto-lens::enrichment_registry that materialises a layout fibre on a schema. The one in-tree implementation, panproto_parse::decorate::ParserLayoutEnricher, runs emit_pretty_with_policy + parse to recover the fibre. Intuition: the cross-crate bridge that lets panproto-lens dispatch enrichment synthesis without depending on tree-sitter.
Parse / emit lens
The lens between byte sequences and decorated schemas. The get direction is parse; the put direction is emit_pretty. The complement (the data the schema does not pin down) is the byte-position layout fibre. Verified at the schema-level retraction law granularity by panproto_parse::parse_emit_lens::check_emit_parse and check_parse_emit. Intuition: the lens whose get reads source code into a schema and whose put writes a schema back to source code.
Parse / decorate / emit lens
The schema-level version of the parse / emit lens, with the byte step skipped. The get direction is forget_layout : DecoratedSchema → AbstractSchema; the put direction is decorate : AbstractSchema → DecoratedSchema. The section law forget_layout ∘ decorate ≅ id holds up to kind / edge multiset equivalence. Intuition: the lens between abstract and decorated schemas, parameterised by a LayoutPolicy.
Grammar cassette
A per-language implementation of GrammarCassette supplying default text for external scanner tokens that grammar.json cannot describe (variable-text delimiters, layout markers, scanner-state markers). Composed with the universal pattern table common_external_default via resolve_external_token: per-grammar override first, universal layer as fallback. Intuition: the small per-language patch sitting on top of the grammar-derived emit pipeline, supplying text for tokens whose actual content grammar.json alone cannot pin down.
Token role
Structural classification of every STRING literal in a grammar rule, derived from the literal’s position in the production body. Eight variants of panproto_parse::emit_pretty::TokenRole: BracketOpen, BracketClose, Separator, Keyword, Operator, Connector (a non-algebraic structural connector such as . or ::), Terminal (text from a leaf vertex’s literal-value), and Immediate (a token the grammar wraps in IMMEDIATE_TOKEN, glued to its neighbour with no whitespace). Computed once at Grammar::from_bytes time and stored as the per-rule token_roles map; consumed by the layout pass via the needs_space_by_role table. Intuition: what the emitter uses instead of inspecting the token text; every spacing decision follows from the role pair, not from any character set.
Acceptance predicate
The inductive function accepts_first_edge(production, edge_field, target_kind) over the production tree that decides whether a given alternative is structurally compatible with the cursor’s first unconsumed edge. Fuses FIELD-name matching, SYMBOL subtype dispatch, ALIAS rewrite, and yield-set admission into a single categorical rule. Implemented in panproto-parse::emit_pretty::accepts_first_edge. Intuition: the categorical core of CHOICE dispatch; the predicate the emitter consults before any heuristic tiebreaker.
Pre-alias symbol
The walker-recorded pre-alias-symbol constraint capturing tree_sitter::Node::grammar_name() (the SYMBOL name as it appears in the rule body before ALIAS { value: V } rewriting). Only recorded when it differs from the post-alias kind(). Consumed by alt_satisfies_pre_alias_constraints as the alias-source discriminator: an alt with a named ALIAS over a SYMBOL is structurally compatible iff the cursor edge’s pre-alias-symbol matches that SYMBOL. Intuition: the only ALIAS-disambiguation signal tree-sitter 0.25 / 0.26 surfaces through its C API.
Emit verification status
The programmatic tier reported by ParserRegistry::emit_verification_status classifying every protocol as Verified (every entry of the grammar author’s own test/corpus/ round-trips under the strict emit_corpus_audit oracle, or the protocol is pinned by a quivers backend test), Generic (registered with vendored grammar.json, no test asserts emit correctness), or Unsupported (no grammar, emit will fail). The verified set is the 255 names in VERIFIED_EMIT_PROTOCOLS. Downstream tooling calls this upfront to refuse emit on protocols whose correctness has not been exercised. Intuition: panproto’s own honesty signal about which protocols its test suite verifies for round-trip correctness.
Fixed-point law (emit)
The correctness witness for source-code emission: emit(parse(emit(s))) == emit(s). Asserted per-protocol by <lang>_emit_is_fixed_point regression tests in crates/panproto-parse/tests/emit_pretty_regressions.rs, and enforced over every grammar author’s full test/corpus/ by the strict emit_corpus_audit gate, which conjoins this fixed point with kind- and edge-multiset preservation. Stronger than the section law (which holds at the kind / edge multiset level); equality is byte-for-byte after the first emit. Intuition: the emitter has reached a fixed point of the parse / emit cycle, which is what guarantees that downstream re-parsing pipelines remain stable.
Section law
For the parse / decorate / emit lens at protocol under policy:
The equivalence is up to vertex-id renaming and the vertex-kind / edge-shape multiset. Verified for every grammar with a parse fixture in crates/panproto-parse/tests/decorate_section_law.rs. Intuition: decorating an abstract schema and stripping the layout back returns the same abstract content modulo the fresh IDs the parser invents.
See also
For longer treatments: Source-code emission, Schemas as theories, Lenses and round-trip laws, Layout enrichment.