5  Architecture Overview

panproto is organized into the thirteen crates listed in Table 5.1, arranged in seven dependency levels (Levels 0 through 4, plus the half-levels 0.5 and 3.5). Consider a simple example: a Theory defined in panproto-gat (which downstream code reaches as panproto::gat::Theory) is consumed by panproto-schema (which builds hypergraphs from theories), then by panproto-inst (which populates those graphs with data), and finally by panproto-mig (which compiles migrations between schema versions). No crate depends on a crate at a higher level, so the dependency graph is a DAG ordered by level: \(L_0 \to L_{0.5} \to L_1 \to \cdots \to L_4\).

5.1 The crate dependency hierarchy

Table 5.1: The seven-level crate hierarchy.

| Level | Crates | Role |
|---|---|---|
| 0 | panproto-gat | The GAT engine: theories, morphisms, colimits |
| 0.5 | panproto-expr | Pure expression language: AST, evaluator, builtins |
| 0.5 | panproto-expr-parser | Haskell-style surface syntax parser (logos + chumsky) |
| 1 | panproto-schema, panproto-inst | Schema graphs and instance trees |
| 2 | panproto-mig, panproto-lens, panproto-check | Migration compilation, lenses, breaking-change analysis |
| 3 | panproto-protocols | Protocol definitions: theory composition, schema-level presentation (the parse/emit layer that converts native format syntax to panproto’s internal representation) |
| 3.5 | panproto-io | Instance-level presentation: format-specific parse/emit for all 76 protocols |
| 4 | panproto-core, panproto-wasm, panproto-cli | Re-export facade, WASM boundary, CLI |

Level 0 is the algebraic foundation. It knows nothing about schemas, instances, or protocols. It knows only the algebra of generalized algebraic theories.

Level 1 introduces the data model. panproto-schema defines the hypergraph structure that represents a data format’s type system. panproto-inst defines instances (actual data) conforming to those schemas.

Level 2 is where the interesting operations live. panproto-mig compiles migration specifications into executable form. panproto-lens constructs bidirectional lenses. panproto-check classifies schema changes as breaking or non-breaking. The protolens layer in panproto-lens lifts concrete lenses to schema-parameterized families (see Chapter 19). This layer depends on panproto-gat’s schema_functor and factorize modules for theory-level transforms and morphism decomposition, and on panproto-check for schema diffing that feeds the automated lens generation pipeline (Chapter 20). panproto-lens thus has a dependency on panproto-check in addition to panproto-mig.

Level 3 bundles concrete protocol definitions. Each protocol (AT Protocol, SQL, Protobuf, etc.) is a pair of GATs plus well-formedness rules, all built on the primitives from Levels 0 through 2. It also provides schema-level presentation: the parse/emit operations that convert native format specifications (.proto files, SQL DDL, GraphQL SDL) into abstract Schema models.

Level 3.5 provides instance-level presentation via panproto-io. This crate connects raw data bytes (HTML, CoNLL-U, Protobuf wire messages, JSON documents, CSV rows) to abstract instance models (WInstance/FInstance). Together with panproto-protocols, it completes both levels of the data migration pipeline: schema presentations map format specs to schema models, and instance presentations map format data to instance models. The commutativity guarantee from Spivak (2012) ensures that parse, restrict, and emit compose correctly.

Level 4 provides the consumer-facing surfaces. panproto-core re-exports everything through a single dependency. panproto-wasm exposes a handle-based WebAssembly API. panproto-cli wraps it all in a command-line tool.

5.1.1 The re-export facade

panproto-core is the entry point most consumers import. It re-exports all sub-crate APIs under short module names:

//! # panproto-core
//!
//! Core re-export facade for panproto.
//!
//! This crate provides a single, convenient entry point for consumers
//! by re-exporting the public APIs of all panproto sub-crates.

/// Re-export of `panproto-check` for validation and axiom checking.
pub use panproto_check as check;
/// Re-export of `panproto-gat` for GAT (generalized algebraic theory) types.
pub use panproto_gat as gat;
/// Re-export of `panproto-inst` for instance representations.
pub use panproto_inst as inst;
/// Re-export of `panproto-io` for instance-level parse/emit across all protocols.
pub use panproto_io as io;
/// Re-export of `panproto-lens` for bidirectional lenses and protolenses.
pub use panproto_lens as lens;
/// Re-export of `panproto-mig` for migration and lifting operations.
pub use panproto_mig as mig;
/// Re-export of `panproto-protocols` for built-in protocol definitions.
pub use panproto_protocols as protocols;
/// Re-export of `panproto-schema` for schema types and builders.
pub use panproto_schema as schema;
/// Re-export of `panproto-vcs` for schematic version control.
pub use panproto_vcs as vcs;

Downstream code writes panproto::gat::Theory rather than importing panproto-gat directly. The facade adds no logic of its own.
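
The facade pattern itself is small enough to sketch in one file. The example below is illustrative only: the modules gat_crate and schema_crate are hypothetical stand-ins for separate sub-crates, and the Theory struct and describe function are invented for the demonstration rather than taken from panproto.

```rust
// Hypothetical stand-ins for two sub-crates. In a real workspace these
// would be separate crates; their contents are invented for this sketch.
pub mod gat_crate {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Theory {
        pub name: String,
    }
}

pub mod schema_crate {
    use super::gat_crate::Theory;

    /// Toy function so the facade has something to route to.
    pub fn describe(theory: &Theory) -> String {
        format!("schema over theory `{}`", theory.name)
    }
}

/// The facade: nothing but renaming re-exports, in the same shape as
/// `pub use panproto_gat as gat;`.
pub mod facade {
    pub use super::gat_crate as gat;
    pub use super::schema_crate as schema;
}

/// Downstream-style code that only ever names the facade paths.
pub fn facade_demo() -> String {
    let theory = facade::gat::Theory { name: "Graph".to_string() };
    facade::schema::describe(&theory)
}
```

Because the facade module contains only `pub use` items, a consumer sees the same types it would get from the sub-crates directly, just under shorter paths.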

5.1.2 Crate dependency graph

graph BT
    subgraph Level 0
        gat["panproto-gat"]
    end
    subgraph "Level 0.5"
        expr["panproto-expr"]
        expr-parser["panproto-expr-parser"]
    end
    subgraph Level 1
        schema["panproto-schema"]
        inst["panproto-inst"]
    end
    subgraph Level 2
        mig["panproto-mig"]
        lens["panproto-lens"]
        check["panproto-check"]
    end
    subgraph Level 3
        protocols["panproto-protocols"]
    end
    subgraph "Level 3.5"
        io["panproto-io"]
    end
    subgraph Level 4
        core["panproto-core"]
        wasm["panproto-wasm"]
        cli["panproto-cli"]
    end

    expr-parser --> expr
    schema --> gat
    inst --> gat
    inst --> expr
    inst --> schema
    mig --> gat
    mig --> schema
    mig --> inst
    lens --> gat
    lens --> schema
    lens --> mig
    lens --> check
    check --> gat
    check --> schema
    check --> mig
    protocols --> gat
    protocols --> schema
    io --> schema
    io --> inst
    io --> protocols
    core --> gat
    core --> schema
    core --> inst
    core --> mig
    core --> lens
    core --> check
    core --> protocols
    core --> io
    wasm --> core
    wasm --> expr-parser
    cli --> core
    cli --> expr-parser
Figure 5.1: Crate dependency graph. Arrows point from dependant to dependency.
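
The level invariant can be checked mechanically. The sketch below transcribes the levels from Table 5.1 and the edges drawn in Figure 5.1 into constants, then looks for any edge that points from a lower level to a higher one. The function names and the choice to store levels multiplied by ten (so that 0.5 and 3.5 stay integral) are assumptions of this example, not part of panproto.

```rust
/// (crate, level x 10), so half-levels stay integral: 0.5 -> 5, 3.5 -> 35.
const LEVELS: &[(&str, u32)] = &[
    ("gat", 0), ("expr", 5), ("expr-parser", 5),
    ("schema", 10), ("inst", 10),
    ("mig", 20), ("lens", 20), ("check", 20),
    ("protocols", 30), ("io", 35),
    ("core", 40), ("wasm", 40), ("cli", 40),
];

/// (dependant, dependency) pairs as drawn in Figure 5.1.
const EDGES: &[(&str, &str)] = &[
    ("expr-parser", "expr"),
    ("schema", "gat"),
    ("inst", "gat"), ("inst", "expr"), ("inst", "schema"),
    ("mig", "gat"), ("mig", "schema"), ("mig", "inst"),
    ("lens", "gat"), ("lens", "schema"), ("lens", "mig"),
    ("check", "gat"), ("check", "schema"), ("check", "mig"),
    ("protocols", "gat"), ("protocols", "schema"),
    ("io", "schema"), ("io", "inst"), ("io", "protocols"),
    ("core", "gat"), ("core", "schema"), ("core", "inst"), ("core", "mig"),
    ("core", "lens"), ("core", "check"), ("core", "protocols"), ("core", "io"),
    ("wasm", "core"), ("wasm", "expr-parser"),
    ("cli", "core"), ("cli", "expr-parser"),
];

/// Looks up a crate's scaled level.
pub fn level_of(name: &str) -> Option<u32> {
    LEVELS.iter().find(|(n, _)| *n == name).map(|(_, l)| *l)
}

/// Returns every edge whose dependency sits at a strictly higher level
/// than its dependant. Edges naming an unknown crate are reported too.
pub fn upward_edges() -> Vec<(&'static str, &'static str)> {
    EDGES
        .iter()
        .copied()
        .filter(|&(from, to)| match (level_of(from), level_of(to)) {
            (Some(a), Some(b)) => b > a,
            _ => true,
        })
        .collect()
}
```

An empty result from upward_edges() means the drawn graph respects the level ordering. Note that the check allows dependencies within a level (for example, panproto-inst on panproto-schema), which the figure contains.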

5.2 Key design decisions

5.2.1 Why GATs

panproto formalizes data format specifications as generalized algebraic theories (GATs).1 A GAT describes a schema language’s vocabulary: what kinds of entities exist (sorts, i.e., types in a formal theory, like Vertex and Edge), what relationships connect them (operations, like src: Edge → Vertex), and what consistency rules hold (equations, like “a foreign key’s source column must belong to its source table”). The “generalized” part is that sorts can depend on other sorts: the set of valid constraints for a string vertex is different from the set for an integer vertex. This dependent structure appears in every real schema language. The type of a Protobuf field depends on a discriminator tag. The valid constraints on a JSON Schema node depend on its type keyword. The applicable column constraints in SQL depend on the column’s data type.

GATs give panproto three capabilities:

  • Dependent sorts: the set of valid constraints for a vertex depends on what kind of vertex it is. Constraint(v: Vertex) is a different set for each v.
  • Structure-preserving morphisms: maps between theories that respect arities and equations. A schema migration is a morphism: it maps old sorts to new sorts and old operations to new operations, and the engine checks that all equations still hold.
  • Compositional assembly: theories can be merged along shared structure, so protocols are assembled from reusable building blocks (e.g., “graph + constraints + multiplicity” for ATProto).
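
To make the first two capabilities concrete, here is a deliberately small sketch. It is not panproto-gat's data model: the Theory, Op, and Morphism types, and every function name, are hypothetical. Each sort records the sorts it depends on, and the check confirms that a candidate morphism preserves those dependencies and the operation signatures. Equations are left out entirely.

```rust
use std::collections::HashMap;

/// An operation signature: input sorts and an output sort.
#[derive(Debug, Clone, PartialEq)]
pub struct Op {
    pub inputs: Vec<String>,
    pub output: String,
}

/// A toy theory. The dependent sort `Constraint(v: Vertex)` is stored
/// as "Constraint" -> ["Vertex"]. Equations are omitted in this sketch.
#[derive(Debug, Default)]
pub struct Theory {
    pub sorts: HashMap<String, Vec<String>>,
    pub ops: HashMap<String, Op>,
}

impl Theory {
    pub fn sort(mut self, name: &str, params: &[&str]) -> Self {
        let params = params.iter().map(|p| p.to_string()).collect();
        self.sorts.insert(name.to_string(), params);
        self
    }

    pub fn op(mut self, name: &str, inputs: &[&str], output: &str) -> Self {
        let inputs = inputs.iter().map(|s| s.to_string()).collect();
        self.ops.insert(name.to_string(), Op { inputs, output: output.to_string() });
        self
    }
}

/// A candidate morphism: where each sort and each operation is sent.
#[derive(Debug, Default)]
pub struct Morphism {
    pub sort_map: HashMap<String, String>,
    pub op_map: HashMap<String, String>,
}

/// Builds a name -> name map from string pairs.
pub fn name_map(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs.iter().map(|(a, b)| (a.to_string(), b.to_string())).collect()
}

/// True when `from`, translated sort by sort, equals `to`.
fn maps_onto(m: &Morphism, from: &[String], to: &[String]) -> bool {
    from.len() == to.len()
        && from.iter().zip(to).all(|(a, b)| m.sort_map.get(a) == Some(b))
}

/// Checks that sort dependencies and operation signatures are preserved.
pub fn preserves_signatures(src: &Theory, tgt: &Theory, m: &Morphism) -> bool {
    let sorts_ok = src.sorts.iter().all(|(name, params)| {
        m.sort_map
            .get(name)
            .and_then(|image| tgt.sorts.get(image))
            .map_or(false, |image_params| maps_onto(m, params, image_params))
    });
    let ops_ok = src.ops.iter().all(|(name, op)| {
        m.op_map
            .get(name)
            .and_then(|image| tgt.ops.get(image))
            .map_or(false, |image_op| {
                maps_onto(m, &op.inputs, &image_op.inputs)
                    && m.sort_map.get(&op.output) == Some(&image_op.output)
            })
    });
    sorts_ok && ops_ok
}

/// Two toy theories and a renaming between them.
pub fn example() -> (Theory, Theory, Morphism) {
    let old = Theory::default()
        .sort("Vertex", &[])
        .sort("Edge", &[])
        .sort("Constraint", &["Vertex"])
        .op("src", &["Edge"], "Vertex");
    let new = Theory::default()
        .sort("Node", &[])
        .sort("Link", &[])
        .sort("Rule", &["Node"])
        .op("source", &["Link"], "Node");
    let rename = Morphism {
        sort_map: name_map(&[("Vertex", "Node"), ("Edge", "Link"), ("Constraint", "Rule")]),
        op_map: name_map(&[("src", "source")]),
    };
    (old, new, rename)
}
```

Sending Constraint to a sort with no parameters (Link, say) would fail the check, because the dependency on Vertex has nowhere to go. That is the kind of mismatch a structure-preserving morphism rules out.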

Note

For the user-facing introduction to theories and protocols, see tutorial Chapters 3 and 4.

Caution: When should a new crate be introduced?

If your new functionality depends on Level 0 but nothing in Levels 1 or 2, it belongs as a new Level 1 crate (or an extension of an existing one). Adding an upward dependency, from a crate at a lower level to one at a higher level, breaks the DAG invariant; adding any unnecessary dependency increases compile times for everything downstream. When in doubt, check the dependency graph in Figure 5.1.

5.2.2 Why handle-based WASM API

The WASM boundary (panproto-wasm) uses a handle/slab architecture rather than serde-wasm-bindgen serialization. Each Rust object (theory, schema, instance, migration) is stored in a typed slab on the Rust side. JavaScript receives an opaque u32 handle.

The reason is performance. Serializing a large schema through serde-wasm-bindgen on every API call would dominate the cost of actual computation. With handles, the only data that crosses the WASM boundary is numeric IDs and small MessagePack payloads for results. The TypeScript SDK wraps these handles in ergonomic classes with Symbol.dispose support for deterministic cleanup.
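
A slab of this kind can be sketched in a few lines. The Slab type below is a hypothetical illustration of the idea, not the code in panproto-wasm: values stay on the Rust side, callers hold only a u32, and freed slots are recycled.

```rust
/// A minimal slab: values live in `entries`, callers hold `u32` handles,
/// and the indices of freed slots are kept in `free` for reuse.
pub struct Slab<T> {
    entries: Vec<Option<T>>,
    free: Vec<u32>,
}

impl<T> Slab<T> {
    pub fn new() -> Self {
        Self { entries: Vec::new(), free: Vec::new() }
    }

    /// Stores a value and returns the handle that identifies it.
    pub fn insert(&mut self, value: T) -> u32 {
        if let Some(handle) = self.free.pop() {
            self.entries[handle as usize] = Some(value);
            handle
        } else {
            self.entries.push(Some(value));
            (self.entries.len() - 1) as u32
        }
    }

    /// Borrows the value behind a handle, if the slot is occupied.
    pub fn get(&self, handle: u32) -> Option<&T> {
        self.entries.get(handle as usize)?.as_ref()
    }

    /// Removes the value and marks the slot as reusable.
    pub fn remove(&mut self, handle: u32) -> Option<T> {
        let value = self.entries.get_mut(handle as usize)?.take();
        if value.is_some() {
            self.free.push(handle);
        }
        value
    }
}
```

"Typed" here would mean one such slab per object kind (one for theories, one for schemas, and so on), so a schema handle can never be resolved against the instance slab. This sketch has no generation counter, so a stale handle can alias a recycled slot; a production slab would likely guard against that.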

5.2.3 Why MessagePack

panproto uses MessagePack (via rmp-serde) as its primary serialization format for schema and instance data at rest. The choice over JSON, CBOR, or Protobuf-the-format comes down to three properties:

  • Deterministic serialization: the MessagePack format does not itself mandate a key order, but when the encoder emits struct fields and map keys in a fixed order, structurally identical objects produce byte-identical output, which is essential for content-addressed storage and migration hash verification.
  • Compact binary representation: schemas with hundreds of vertices serialize to kilobytes rather than tens of kilobytes.
  • Wide library support: MessagePack libraries exist for every language panproto targets.

JSON is still used as the human-facing format for CLI output and the parse_json/to_json round-trip in panproto-inst.
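
The determinism property is easy to see with a hand-rolled encoder. The sketch below does not use rmp-serde; it encodes a tiny string-to-integer map with the compact MessagePack forms (fixmap, fixstr, positive fixint) and relies on BTreeMap's sorted iteration to fix the key order. The function name and the size limits are assumptions of this example.

```rust
use std::collections::BTreeMap;

/// Encodes a small string -> small integer map as MessagePack.
/// Only the compact forms are handled: fixmap (at most 15 entries),
/// fixstr (at most 31 bytes) and positive fixint (at most 127).
/// Returns `None` when the input does not fit those forms.
pub fn encode_small_map(map: &BTreeMap<String, u8>) -> Option<Vec<u8>> {
    if map.len() > 15 {
        return None;
    }
    // fixmap header: 1000xxxx, where xxxx is the entry count.
    let mut out = vec![0x80 | map.len() as u8];
    // BTreeMap iterates in sorted key order, so the byte layout does
    // not depend on the order in which entries were inserted.
    for (key, value) in map {
        if key.len() > 31 || *value > 127 {
            return None;
        }
        // fixstr header: 101xxxxx, where xxxxx is the byte length.
        out.push(0xa0 | key.len() as u8);
        out.extend_from_slice(key.as_bytes());
        // positive fixint: the value itself, 0xxxxxxx.
        out.push(*value);
    }
    Some(out)
}
```

Two maps holding the same entries encode to the same bytes regardless of insertion order, which is the property a content hash needs. With a hash-based map and no sorting step, the same logical value could serialize in different orders.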

5.2.4 Why immutable fluent builders

SchemaBuilder and the migration DSL use an immutable fluent pattern: each method consumes self and returns Result<Self, Error>. This prevents partial construction. You can’t hold a reference to a half-built schema and accidentally use it. The pattern also makes validation composable: each step validates its own invariant, and build() only needs to check global properties (like non-emptiness) and compute indices.

Caution: Why not a mutable (&mut self) builder?

An &mut self builder would allow callers to hold a reference to a partially constructed schema. A bug that reads from the builder mid-construction would see an incomplete adjacency index. The consume-self pattern makes this impossible at the type level.
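
The consume-self shape looks roughly like this. SketchBuilder, SketchSchema, and BuildError are hypothetical names for illustration; the real SchemaBuilder validates against a protocol's edge rules, which this sketch replaces with two simple checks.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum BuildError {
    DuplicateVertex(String),
    UnknownVertex(String),
    Empty,
}

/// Builder state. It is only reachable by value, never through a
/// long-lived reference held elsewhere.
#[derive(Debug, Default)]
pub struct SketchBuilder {
    vertices: BTreeSet<String>,
    edges: Vec<(String, String)>,
}

#[derive(Debug)]
pub struct SketchSchema {
    pub vertices: Vec<String>,
    pub edges: Vec<(String, String)>,
}

impl SketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the builder; the local invariant is "no duplicate vertex".
    pub fn vertex(mut self, name: &str) -> Result<Self, BuildError> {
        if !self.vertices.insert(name.to_string()) {
            return Err(BuildError::DuplicateVertex(name.to_string()));
        }
        Ok(self)
    }

    /// Consumes the builder; the local invariant is "both ends exist".
    pub fn edge(mut self, src: &str, tgt: &str) -> Result<Self, BuildError> {
        for end in [src, tgt] {
            if !self.vertices.contains(end) {
                return Err(BuildError::UnknownVertex(end.to_string()));
            }
        }
        self.edges.push((src.to_string(), tgt.to_string()));
        Ok(self)
    }

    /// Only the global property (non-emptiness) is left to check here.
    pub fn build(self) -> Result<SketchSchema, BuildError> {
        if self.vertices.is_empty() {
            return Err(BuildError::Empty);
        }
        Ok(SketchSchema {
            vertices: self.vertices.into_iter().collect(),
            edges: self.edges,
        })
    }
}

/// Each step either hands the builder on or stops the chain with an error.
pub fn example() -> Result<SketchSchema, BuildError> {
    SketchBuilder::new()
        .vertex("post")?
        .vertex("author")?
        .edge("post", "author")?
        .build()
}
```

Because every method takes self by value, a failed step drops the builder along with the error, and there is no partially built value left for other code to observe.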

5.3 The two-parameter architecture

Every protocol in panproto is defined by a pair of GATs. Formally, a protocol \(P = (T_{\mathrm{schema}}, T_{\mathrm{inst}}, R)\) where \(T_{\mathrm{schema}}\) is the schema theory, \(T_{\mathrm{inst}}\) is the instance theory, and \(R\) is a set of edge rules:

  1. Schema theory describes the shape of a data format. Its models (concrete instances that satisfy a theory’s requirements) are schemas (hypergraphs of vertex kinds and edge rules).
  2. Instance theory describes the content that inhabits a schema. Its models are instances (trees or tables of data).

This factorization is the key architectural insight. A single schema theory can have multiple instance theories (e.g., W-type trees for document data, set-valued models for relational data). A single instance theory can serve multiple schema theories. The Protocol struct in panproto-schema captures this pairing:

Protocol = (schema_theory: GAT, instance_theory: GAT, edge_rules, ...)

Migration between protocol versions is then a morphism at the schema theory level, compiled into executable operations at the instance theory level. The guarantee is that restriction (projecting data from the target schema onto the source schema, keeping only what the source schema expects) along a theory morphism composes correctly and preserves the structural axioms of the instance theory.2
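
The composition guarantee can be illustrated with a drastically simplified model. In the sketch below, a schema morphism is reduced to a map on vertex names and an instance to the rows stored at each vertex; VertexMap, Instance, restrict, and compose are all hypothetical names, and the sketch assumes the second morphism is defined on every vertex the first one reaches.

```rust
use std::collections::BTreeMap;

/// A schema morphism, reduced to its action on vertex names: each
/// vertex of the source schema names its image in the target schema.
pub type VertexMap = BTreeMap<String, String>;

/// An instance, reduced to the rows stored at each vertex.
pub type Instance = BTreeMap<String, Vec<String>>;

/// Builds a vertex map from string pairs.
pub fn vertex_map(pairs: &[(&str, &str)]) -> VertexMap {
    pairs.iter().map(|(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Restriction along `f: S -> T`: the resulting S-instance stores at
/// vertex `s` whatever the T-instance stores at `f(s)`. Vertices of T
/// outside the image of `f` are dropped.
pub fn restrict(f: &VertexMap, target: &Instance) -> Instance {
    f.iter()
        .map(|(s, t)| (s.clone(), target.get(t).cloned().unwrap_or_default()))
        .collect()
}

/// Composition: first `f: S -> T`, then `g: T -> U`.
/// Assumes `g` is defined on every vertex in the image of `f`.
pub fn compose(f: &VertexMap, g: &VertexMap) -> VertexMap {
    f.iter()
        .filter_map(|(s, t)| g.get(t).map(|u| (s.clone(), u.clone())))
        .collect()
}
```

With f: S → T and g: T → U, restricting a U-instance along g and then along f gives the same S-instance as restricting once along compose(f, g). In this toy model that follows directly from the definition, since both sides store at s the rows found at g(f(s)).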

5.4 Migration lifecycle

The following sequence diagram traces the full lifecycle from defining a protocol through migrating data:

sequenceDiagram
    participant User
    participant IO as panproto-io
    participant Protocols as panproto-protocols
    participant Schema as panproto-schema
    participant Check as panproto-check
    participant Mig as panproto-mig
    participant Inst as panproto-inst

    User->>Protocols: Define protocol (schema theory + instance theory)
    User->>Schema: Build schema v1 via SchemaBuilder
    User->>Schema: Build schema v2 via SchemaBuilder
    User->>Check: check_existence(v1, v2)
    Check-->>User: ExistenceReport (added/removed/modified)
    User->>Mig: compile(migration_spec)
    Mig-->>User: CompiledMigration
    User->>IO: parse(protocol, schema_v1, raw_bytes)
    IO-->>User: WInstance / FInstance (source)
    User->>Inst: wtype_restrict(instance, v1, v2, compiled)
    Inst-->>User: Migrated WInstance
    User->>IO: emit(protocol, schema_v2, migrated_instance)
    IO-->>User: Raw bytes (target format)
Figure 5.2: Full migration lifecycle, from raw data through migration to target format.

The pipeline is:

  1. Define protocols: select or compose GATs from the panproto-protocols registry.
  2. Build schemas: construct v1 and v2 using SchemaBuilder, which validates each element against the protocol’s edge rules.
  3. Check existence: use panproto-check to classify the differences between v1 and v2 (what was added, removed, modified).
  4. Compile migration: panproto-mig compiles a migration specification into a CompiledMigration containing vertex remaps, edge remaps, and contraction resolvers.
  5. Parse data: panproto-io parses raw bytes against schema v1 into a source WInstance or FInstance.
  6. Lift data: panproto-inst executes the 5-step wtype_restrict pipeline to project instance data from v1 to v2.
  7. Emit data: panproto-io emits the migrated instance against schema v2 as raw bytes in the target format.
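
Step 3 is the easiest to picture in isolation. The sketch below diffs two toy schemas, each reduced to a map from vertex name to kind, into added, removed, and modified lists. SchemaSketch, DiffSketch, diff, and is_breaking are hypothetical; in particular, the rule that removals and kind changes count as breaking is an assumption of this example, not panproto-check's actual classification.

```rust
use std::collections::BTreeMap;

/// A schema reduced to vertex name -> kind (hypothetical simplification).
pub type SchemaSketch = BTreeMap<String, String>;

#[derive(Debug, Default, PartialEq)]
pub struct DiffSketch {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Builds a toy schema from (name, kind) pairs.
pub fn schema(pairs: &[(&str, &str)]) -> SchemaSketch {
    pairs.iter().map(|(n, k)| (n.to_string(), k.to_string())).collect()
}

/// Classifies every vertex as added, removed, modified, or unchanged.
pub fn diff(v1: &SchemaSketch, v2: &SchemaSketch) -> DiffSketch {
    let mut report = DiffSketch::default();
    for (name, kind) in v1 {
        match v2.get(name) {
            None => report.removed.push(name.clone()),
            Some(new_kind) if new_kind != kind => report.modified.push(name.clone()),
            Some(_) => {} // unchanged
        }
    }
    for name in v2.keys() {
        if !v1.contains_key(name) {
            report.added.push(name.clone());
        }
    }
    report
}

/// Assumed rule for this sketch only: additions are safe, while
/// removals and kind changes are treated as breaking.
pub fn is_breaking(report: &DiffSketch) -> bool {
    !report.removed.is_empty() || !report.modified.is_empty()
}
```

A report like this is what a migration author would consult before writing the specification that step 4 compiles: removed and modified vertices are the ones that need an explicit remap or resolver.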

5.5 What lives where

A quick reference for locating functionality:

Table 5.2: Where to find things.

| If you need to… | Look in… |
|---|---|
| Define or inspect a GAT | panproto-gat |
| Build a schema graph | panproto-schema |
| Parse generic JSON into a tree instance | panproto-inst |
| Parse format-specific data (HTML, CoNLL-U, Protobuf, …) | panproto-io |
| Compile a migration | panproto-mig |
| Build a bidirectional lens | panproto-lens |
| Detect breaking changes | panproto-check |
| Use a built-in protocol (AT Proto, SQL, …) | panproto-protocols |
| Import everything at once | panproto-core |
| Call from JavaScript | panproto-wasm + @panproto/core |
| Run from the command line | panproto-cli |

Caution: Where do new protocol definitions go?

Protocol definitions belong in panproto-protocols (Level 3). Format-specific instance parsers and emitters belong in panproto-io (Level 3.5). Don’t put instance-level parse/emit logic in panproto-protocols; that crate handles only the schema-level presentation.

The following chapters dive into the core crates, starting with the GAT engine in Chapter 6, then schemas and protocols in Chapter 7, and instances in Chapter 8.

Cartmell, John. 1986. “Generalised Algebraic Theories and Contextual Categories.” Annals of Pure and Applied Logic 32: 209–43. https://doi.org/10.1016/0168-0072(86)90053-9.
Lawvere, F. William. 1963. “Functorial Semantics of Algebraic Theories.” PhD thesis, Columbia University.
Spivak, David I. 2012. “Functorial Data Migration.” Information and Computation 217: 31–51. https://arxiv.org/abs/1009.1166.

  1. GATs were introduced by Cartmell (1986), extending the ordinary algebraic theories of Lawvere (1963).

  2. Restriction composes correctly: restricting along two successive migrations produces the same result as restricting along their composition. This compositionality property is called functoriality, and is established formally in Spivak (2012).