graph BT
subgraph Level 0
gat["panproto-gat"]
end
subgraph "Level 0.5"
expr["panproto-expr"]
expr-parser["panproto-expr-parser"]
end
subgraph Level 1
schema["panproto-schema"]
inst["panproto-inst"]
end
subgraph Level 2
mig["panproto-mig"]
lens["panproto-lens"]
check["panproto-check"]
end
subgraph Level 3
protocols["panproto-protocols"]
end
subgraph "Level 3.5"
io["panproto-io"]
end
subgraph Level 4
core["panproto-core"]
wasm["panproto-wasm"]
cli["panproto-cli"]
end
expr-parser --> expr
schema --> gat
inst --> gat
inst --> expr
inst --> schema
mig --> gat
mig --> schema
mig --> inst
lens --> gat
lens --> schema
lens --> mig
check --> gat
check --> schema
check --> mig
protocols --> gat
protocols --> schema
io --> schema
io --> inst
io --> protocols
core --> gat
core --> schema
core --> inst
core --> mig
core --> lens
core --> check
core --> protocols
core --> io
wasm --> core
wasm --> expr-parser
cli --> core
cli --> expr-parser
5 Architecture Overview
panproto is organized into thirteen crates arranged in seven dependency levels (0, 0.5, 1, 2, 3, 3.5, and 4). Consider a simple example: a panproto::gat::Theory defined at Level 0 is consumed by panproto-schema (which builds hypergraphs from theories) and panproto-inst (which populates those graphs with data), and then by panproto-mig (which compiles migrations between schema versions). Each level depends only on the levels below it, plus sibling crates within the same level (panproto-inst, for example, depends on panproto-schema), so the crate graph is a DAG: \(L_0 \to L_1 \to \cdots \to L_4\).
5.1 The crate dependency hierarchy
| Level | Crates | Role |
|---|---|---|
| 0 | panproto-gat | The GAT engine: theories, morphisms, colimits |
| 0.5 | panproto-expr | Pure expression language: AST, evaluator, builtins |
| 0.5 | panproto-expr-parser | Haskell-style surface syntax parser (logos + chumsky) |
| 1 | panproto-schema, panproto-inst | Schema graphs and instance trees |
| 2 | panproto-mig, panproto-lens, panproto-check | Migration compilation, lenses, breaking-change analysis |
| 3 | panproto-protocols | Protocol definitions: theory composition, schema-level presentation (the parse/emit layer that converts native format syntax to panproto’s internal representation) |
| 3.5 | panproto-io | Instance-level presentation: format-specific parse/emit for all 76 protocols |
| 4 | panproto-core, panproto-wasm, panproto-cli | Re-export facade, WASM boundary, CLI |
Level 0 is the algebraic foundation. It knows nothing about schemas, instances, or protocols. It knows only the algebra of generalized algebraic theories.
Level 1 introduces the data model. panproto-schema defines the hypergraph structure that represents a data format’s type system. panproto-inst defines instances (actual data) conforming to those schemas.
Level 2 is where the interesting operations live. panproto-mig compiles migration specifications into executable form. panproto-lens constructs bidirectional lenses. panproto-check classifies schema changes as breaking or non-breaking. The protolens layer in panproto-lens lifts concrete lenses to schema-parameterized families (see Chapter 19). This layer depends on panproto-gat’s schema_functor and factorize modules for theory-level transforms and morphism decomposition, and on panproto-check for schema diffing that feeds the automated lens generation pipeline (Chapter 20). panproto-lens thus has a dependency on panproto-check in addition to panproto-mig.
Level 3 bundles concrete protocol definitions. Each protocol (AT Protocol, SQL, Protobuf, etc.) is a pair of GATs plus well-formedness rules, all built on the primitives from Levels 0 through 2. It also provides schema-level presentation: the parse/emit operations that convert native format specifications (.proto files, SQL DDL, GraphQL SDL) into abstract Schema models.
Level 3.5 provides instance-level presentation via panproto-io. This crate connects raw data bytes (HTML, CoNLL-U, Protobuf wire messages, JSON documents, CSV rows) to abstract instance models (WInstance/FInstance). Together with panproto-protocols, it completes both levels of the data migration pipeline: schema presentations map format specs to schema models, and instance presentations map format data to instance models. The commutativity guarantee from Spivak (2012) ensures that parse, restrict, and emit compose correctly.
Level 4 provides the consumer-facing surfaces. panproto-core re-exports everything through a single dependency. panproto-wasm exposes a handle-based WebAssembly API. panproto-cli wraps it all in a command-line tool.
5.1.1 The re-export facade
panproto-core is the entry point most consumers import. It re-exports all sub-crate APIs under short module names:
//! # panproto-core
//!
//! Core re-export facade for panproto.
//!
//! This crate provides a single, convenient entry point for consumers
//! by re-exporting the public APIs of all panproto sub-crates.
/// Re-export of `panproto-check` for validation and axiom checking.
pub use panproto_check as check;
/// Re-export of `panproto-gat` for GAT (generalized algebraic theory) types.
pub use panproto_gat as gat;
/// Re-export of `panproto-inst` for instance representations.
pub use panproto_inst as inst;
/// Re-export of `panproto-io` for instance-level parse/emit across all protocols.
pub use panproto_io as io;
/// Re-export of `panproto-lens` for bidirectional lenses and protolenses.
pub use panproto_lens as lens;
/// Re-export of `panproto-mig` for migration and lifting operations.
pub use panproto_mig as mig;
/// Re-export of `panproto-protocols` for built-in protocol definitions.
pub use panproto_protocols as protocols;
/// Re-export of `panproto-schema` for schema types and builders.
pub use panproto_schema as schema;
/// Re-export of `panproto-vcs` for schematic version control.
pub use panproto_vcs as vcs;
Downstream code writes panproto::gat::Theory rather than importing panproto-gat directly. The facade adds no logic of its own.
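The facade pattern itself can be illustrated without the real workspace. The following is a minimal sketch in which inline modules stand in for separate crates; the names `panproto_gat_stub`, `panproto_schema_stub`, `facade`, and `describe`, and the fields on the placeholder types, are all hypothetical and not panproto's actual API.

```rust
// Stand-ins for two sub-crates. In the real workspace these are separate
// crates; inline modules keep the sketch self-contained.
pub mod panproto_gat_stub {
    /// Hypothetical placeholder for a theory type.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Theory {
        pub name: String,
    }

    impl Theory {
        pub fn new(name: &str) -> Self {
            Self { name: name.to_string() }
        }
    }
}

pub mod panproto_schema_stub {
    /// Hypothetical placeholder for a schema type.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Schema {
        pub vertex_count: usize,
    }
}

/// The facade: only `pub use` items under short names, no logic of its own.
pub mod facade {
    pub use super::panproto_gat_stub as gat;
    pub use super::panproto_schema_stub as schema;
}

/// Downstream code names types through the facade path only.
pub fn describe(theory: &facade::gat::Theory, schema: &facade::schema::Schema) -> String {
    format!("{} ({} vertices)", theory.name, schema.vertex_count)
}
```

Because a `pub use` re-export is an alias rather than a wrapper, `facade::gat::Theory` and `panproto_gat_stub::Theory` are the same type, so values can pass freely between code that uses the facade and code that uses a sub-crate directly.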
5.1.2 Crate dependency graph
5.2 Key design decisions
5.2.1 Why GATs
panproto formalizes data format specifications as generalized algebraic theories (GATs).1 A GAT describes a schema language’s vocabulary: what kinds of entities exist (sorts, i.e., types in a formal theory, like Vertex and Edge), what relationships connect them (operations, like src: Edge → Vertex), and what consistency rules hold (equations, like “a foreign key’s source column must belong to its source table”). The “generalized” part is that sorts can depend on other sorts: the set of valid constraints for a string vertex is different from the set for an integer vertex. This dependent structure appears in every real schema language. The type of a Protobuf field depends on a discriminator tag. The valid constraints on a JSON Schema node depend on its type keyword. The applicable column constraints in SQL depend on the column’s data type.
GATs give panproto three capabilities:
- Dependent sorts: the set of valid constraints for a vertex depends on what kind of vertex it is. Constraint(v: Vertex) is a different set for each v.
- Structure-preserving morphisms: maps between theories that respect arities and equations. A schema migration is a morphism: it maps old sorts to new sorts and old operations to new operations, and the engine checks that all equations still hold.
- Compositional assembly: theories can be merged along shared structure, so protocols are assembled from reusable building blocks (e.g., “graph + constraints + multiplicity” for ATProto).
For the user-facing introduction to theories and protocols, see the tutorial chapters 3 and 4.
If your new functionality depends on Level 0 but nothing in Levels 1 or 2, it belongs as a new Level 1 crate (or an extension of an existing one). Adding a cross-level dependency breaks the DAG invariant and increases compile times for everything downstream. When in doubt, check the dependency graph in Figure 5.1.
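One way to make the DAG invariant mechanical is to check every dependency edge against the level table. The sketch below is an assumption about how such a check could look, not a tool that ships with panproto: `level` and `violations` are hypothetical helpers, crate names are abbreviated as in the dependency graph, and levels are stored multiplied by ten so that 0.5 and 3.5 stay integers. Edges between crates at the same level are accepted, since the graph itself contains them (for example inst to schema).

```rust
/// Level of each crate, taken from the table in Section 5.1
/// (multiplied by 10 so that 0.5 and 3.5 stay integers).
pub fn level(krate: &str) -> Option<u32> {
    let lvl = match krate {
        "gat" => 0,
        "expr" | "expr-parser" => 5,
        "schema" | "inst" => 10,
        "mig" | "lens" | "check" => 20,
        "protocols" => 30,
        "io" => 35,
        "core" | "wasm" | "cli" => 40,
        _ => return None,
    };
    Some(lvl)
}

/// Returns every edge `(from, to)` that either points at a higher level
/// than its source or mentions a crate missing from the level table.
pub fn violations<'a>(edges: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        .filter(|&(from, to)| match (level(from), level(to)) {
            (Some(f), Some(t)) => t > f,
            _ => true,
        })
        .collect()
}
```

Feeding the edge list from the dependency graph into `violations` should return an empty vector; a hypothetical edge such as gat to schema would be reported.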
5.2.2 Why handle-based WASM API
The WASM boundary (panproto-wasm) uses a handle/slab architecture rather than serde-wasm-bindgen serialization. Each Rust object (theory, schema, instance, migration) is stored in a typed slab on the Rust side. JavaScript receives an opaque u32 handle.
The reason is performance. Serializing a large schema through serde-wasm-bindgen on every API call would dominate the cost of actual computation. With handles, the only data that crosses the WASM boundary is numeric IDs and small MessagePack payloads for results. The TypeScript SDK wraps these handles in ergonomic classes with Symbol.dispose support for deterministic cleanup.
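The handle/slab idea can be sketched in a few lines of plain Rust. This is a simplified illustration under stated assumptions, not panproto-wasm's implementation: the `Slab` type and its methods are hypothetical, and there is no wasm-bindgen glue. Values stay in a vector on the Rust side, and the caller only ever holds a `u32` index.

```rust
/// A minimal slab: values live on the Rust side, callers hold only a `u32`.
pub struct Slab<T> {
    slots: Vec<Option<T>>,
    free: Vec<u32>, // indices of vacated slots, reused before growing
}

impl<T> Slab<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    /// Stores `value` and returns the opaque handle for it.
    pub fn insert(&mut self, value: T) -> u32 {
        if let Some(handle) = self.free.pop() {
            self.slots[handle as usize] = Some(value);
            handle
        } else {
            self.slots.push(Some(value));
            (self.slots.len() - 1) as u32
        }
    }

    /// Borrows the value behind `handle`, or `None` if it was freed or never issued.
    pub fn get(&self, handle: u32) -> Option<&T> {
        self.slots.get(handle as usize)?.as_ref()
    }

    /// Frees the slot and hands the value back.
    pub fn remove(&mut self, handle: u32) -> Option<T> {
        let value = self.slots.get_mut(handle as usize)?.take()?;
        self.free.push(handle);
        Some(value)
    }
}
```

Keeping one slab per object kind is what makes the handles "typed": a schema handle can only index the schema slab. Note that this sketch reuses freed indices, so a stale handle may alias a newer object; a production design would typically add a generation counter to detect that.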
5.2.3 Why MessagePack
panproto uses MessagePack (via rmp-serde) as its primary serialization format for schema and instance data at rest. The choice over JSON, CBOR, or Protobuf-the-format comes down to three properties:
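- Deterministic serialization: with a fixed field order and sorted map keys, structurally identical objects serialize to byte-identical output, which is essential for content-addressed storage and migration hash verification. (The MessagePack specification itself does not define a canonical form, so this determinism comes from how the encoder is driven.)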
- Compact binary representation: schemas with hundreds of vertices serialize to kilobytes rather than tens of kilobytes.
- Wide library support: MessagePack libraries exist for every language panproto targets.
JSON is still used as the human-facing format for CLI output and the parse_json/to_json round-trip in panproto-inst.
5.2.4 Why immutable fluent builders
SchemaBuilder and the migration DSL use an immutable fluent pattern: each method consumes self and returns Result<Self, Error>. This prevents partial construction. You can’t hold a reference to a half-built schema and accidentally use it. The pattern also makes validation composable: each step validates its own invariant, and build() only needs to check global properties (like non-emptiness) and compute indices.
An &mut self builder would allow callers to hold a reference to a partially constructed schema. A bug that reads from the builder mid-construction would see an incomplete adjacency index. The consume-self pattern makes this impossible at the type level.
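The consume-self pattern looks roughly like the following. This is a minimal sketch with hypothetical names (`GraphSketchBuilder`, `GraphSketch`, `BuildError`); the real SchemaBuilder validates against protocol edge rules and has a richer API. The point is only the shape of the signatures: every step takes `self` by value and returns `Result<Self, _>`.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum BuildError {
    DuplicateVertex(String),
    UnknownVertex(String),
    Empty,
}

#[derive(Debug, PartialEq)]
pub struct GraphSketch {
    pub vertices: BTreeSet<String>,
    pub edges: Vec<(String, String)>,
}

#[derive(Default)]
pub struct GraphSketchBuilder {
    vertices: BTreeSet<String>,
    edges: Vec<(String, String)>,
}

impl GraphSketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the builder; on error the half-built value is dropped.
    pub fn vertex(mut self, id: &str) -> Result<Self, BuildError> {
        if !self.vertices.insert(id.to_string()) {
            return Err(BuildError::DuplicateVertex(id.to_string()));
        }
        Ok(self)
    }

    /// Local invariant: both endpoints must already exist.
    pub fn edge(mut self, src: &str, tgt: &str) -> Result<Self, BuildError> {
        for end in [src, tgt] {
            if !self.vertices.contains(end) {
                return Err(BuildError::UnknownVertex(end.to_string()));
            }
        }
        self.edges.push((src.to_string(), tgt.to_string()));
        Ok(self)
    }

    /// Only global properties are left to check here.
    pub fn build(self) -> Result<GraphSketch, BuildError> {
        if self.vertices.is_empty() {
            return Err(BuildError::Empty);
        }
        Ok(GraphSketch { vertices: self.vertices, edges: self.edges })
    }
}
```

Inside a function that returns `Result`, a chain reads as `GraphSketchBuilder::new().vertex("post")?.vertex("author")?.edge("post", "author")?.build()?`. Because each call moves the builder, no binding to an intermediate state survives a failed step.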
5.3 The two-parameter architecture
Every protocol in panproto is defined by a pair of GATs. Formally, a protocol \(P = (T_{\mathrm{schema}}, T_{\mathrm{inst}}, R)\) where \(T_{\mathrm{schema}}\) is the schema theory, \(T_{\mathrm{inst}}\) is the instance theory, and \(R\) is a set of edge rules:
- Schema theory describes the shape of a data format. Its models (concrete instances that satisfy a theory’s requirements) are schemas (hypergraphs of vertex kinds and edge rules).
- Instance theory describes the content that inhabits a schema. Its models are instances (trees or tables of data).
This factorization is the key architectural insight. A single schema theory can have multiple instance theories (e.g., W-type trees for document data, set-valued models for relational data). A single instance theory can serve multiple schema theories. The Protocol struct in panproto-schema captures this pairing:
Protocol = (schema_theory: GAT, instance_theory: GAT, edge_rules, ...)
Migration between protocol versions is then a morphism at the schema theory level, compiled into executable operations at the instance theory level. The guarantee is that restriction (projecting data from the target schema onto the source schema, keeping only what the source schema expects) along a theory morphism composes correctly and preserves the structural axioms of the instance theory.2
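The composition property can be seen in a deliberately tiny model. The sketch below is a toy, not panproto's restriction algorithm: an instance is a map from vertex name to a value, a migration is an injective rename map in which vertices without an entry are dropped, and `Instance`, `Remap`, `restrict`, and `compose` are hypothetical names. Under those assumptions, restricting in two steps and restricting once along the composed map give the same result.

```rust
use std::collections::BTreeMap;

/// Toy instance: one value per vertex name.
pub type Instance = BTreeMap<String, i64>;
/// Toy migration: vertex name in one schema -> name in the next.
/// Assumed injective; vertices without an entry are dropped.
pub type Remap = BTreeMap<String, String>;

/// Keeps only the vertices the remap knows about, under their new names.
pub fn restrict(instance: &Instance, remap: &Remap) -> Instance {
    instance
        .iter()
        .filter_map(|(vertex, value)| remap.get(vertex).map(|new| (new.clone(), *value)))
        .collect()
}

/// Composes two remaps: `a -> c` exists when `first` has `a -> b`
/// and `second` has `b -> c`.
pub fn compose(first: &Remap, second: &Remap) -> Remap {
    first
        .iter()
        .filter_map(|(a, b)| second.get(b).map(|c| (a.clone(), c.clone())))
        .collect()
}
```

The equality `restrict(restrict(i, m1), m2) == restrict(i, compose(m1, m2))` is the toy analogue of the functoriality statement; the injectivity assumption matters here, because two vertices collapsing onto one name would make the result depend on iteration order.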
5.4 Migration lifecycle
The following sequence diagram traces the full lifecycle from defining a protocol through migrating data:
sequenceDiagram
participant User
participant IO as panproto-io
participant Protocols as panproto-protocols
participant Schema as panproto-schema
participant Check as panproto-check
participant Mig as panproto-mig
participant Inst as panproto-inst
User->>Protocols: Define protocol (schema theory + instance theory)
User->>Schema: Build schema v1 via SchemaBuilder
User->>Schema: Build schema v2 via SchemaBuilder
User->>Check: check_existence(v1, v2)
Check-->>User: ExistenceReport (added/removed/modified)
User->>Mig: compile(migration_spec)
Mig-->>User: CompiledMigration
User->>IO: parse(protocol, schema_v1, raw_bytes)
IO-->>User: WInstance / FInstance (source)
User->>Inst: wtype_restrict(instance, v1, v2, compiled)
Inst-->>User: Migrated WInstance
User->>IO: emit(protocol, schema_v2, migrated_instance)
IO-->>User: Raw bytes (target format)
The pipeline is:
- Define protocols: select or compose GATs from the panproto-protocols registry.
- Build schemas: construct v1 and v2 using SchemaBuilder, which validates each element against the protocol’s edge rules.
- Check existence: use panproto-check to classify the differences between v1 and v2 (what was added, removed, modified).
- Compile migration: panproto-mig compiles a migration specification into a CompiledMigration containing vertex remaps, edge remaps, and contraction resolvers.
- Lift data: panproto-inst executes the 5-step wtype_restrict pipeline to project instance data from v1 to v2.
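To make the "check existence" step concrete, here is a minimal sketch of classifying differences between two schema versions. It assumes a drastically simplified schema (a map from vertex name to a kind string) and uses hypothetical names (`DiffSketch`, `diff`); the actual report type in panproto-check carries more structure than three lists.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for an existence report.
#[derive(Debug, Default, PartialEq)]
pub struct DiffSketch {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Compares two versions given as `vertex name -> kind`.
pub fn diff(v1: &BTreeMap<String, String>, v2: &BTreeMap<String, String>) -> DiffSketch {
    let mut out = DiffSketch::default();
    // Anything in v1 is either gone, changed, or unchanged in v2.
    for (name, kind) in v1 {
        match v2.get(name) {
            None => out.removed.push(name.clone()),
            Some(new_kind) if new_kind != kind => out.modified.push(name.clone()),
            Some(_) => {}
        }
    }
    // Anything only in v2 is an addition.
    for name in v2.keys() {
        if !v1.contains_key(name) {
            out.added.push(name.clone());
        }
    }
    out
}
```

A migration specification would then have to say something about every removed or modified entry, which is why the check precedes compilation in the pipeline.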
5.5 What lives where
A quick reference for locating functionality:
| If you need to… | Look in… |
|---|---|
| Define or inspect a GAT | panproto-gat |
| Build a schema graph | panproto-schema |
| Parse generic JSON into a tree instance | panproto-inst |
| Parse format-specific data (HTML, CoNLL-U, Protobuf, …) | panproto-io |
| Compile a migration | panproto-mig |
| Build a bidirectional lens | panproto-lens |
| Detect breaking changes | panproto-check |
| Use a built-in protocol (AT Proto, SQL, …) | panproto-protocols |
| Import everything at once | panproto-core |
| Call from JavaScript | panproto-wasm + @panproto/core |
| Run from the command line | panproto-cli |
Protocol definitions belong in panproto-protocols (Level 3). Format-specific parsers and emitters belong in panproto-io (Level 3.5). Don’t put parse/emit logic in panproto-protocols; that crate handles only the schema-level presentation.
The next four chapters dive into the core crates, starting with the GAT engine in Chapter 6, then schemas and protocols in Chapter 7, and instances in Chapter 8.
1. GATs were introduced by Cartmell (1986), extending the ordinary algebraic theories of Lawvere (1963).
2. Restriction composes correctly: restricting along two successive migrations produces the same result as restricting along their composition. This compositionality property is called functoriality, and is established formally in Spivak (2012).