1 Welcome to panproto Development

panproto is a schematic version control system parameterized by pairs of Generalized Algebraic Theories (GATs). A protocol $\mathcal{P} = (\mathbb{T}_S, \mathbb{T}_I)$ consists of a schema theory $\mathbb{T}_S$ (what schemas look like) and an instance theory $\mathbb{T}_I$ (what data looks like). Migration, diffing, compatibility checking, and bidirectional lenses are derived from $\mathcal{P}$ by the same algorithms regardless of protocol.¹

The crate hierarchy mirrors this algebraic structure. Each level builds on the ones below it:

Level 0 (panproto-gat): sorts, operations, equations, theory morphisms, colimits.
Level 1 (panproto-schema, panproto-inst): schema graphs as models of $\mathbb{T}_S$; instances as models of $\mathbb{T}_I$.
Level 2 (panproto-mig, panproto-lens, panproto-check): migration compilation, bidirectional lenses, breaking-change detection.
Level 3 (panproto-protocols, panproto-io): protocol definitions, format-specific parse/emit.
Level 4 (panproto-core, panproto-wasm, panproto-cli): re-export facade, WASM boundary, CLI.

Each level depends only on the levels below it. panproto-gat has no panproto dependencies. panproto-core re-exports everything.

1.1 Three audiences

Contributors fixing bugs or adding features start with Chapter 2, then Chapter 3, then the architecture chapter for the crate you’re modifying. Chapter 16 through 18 cover the contribution process.

Protocol authors adding a new schema language define a pair of GATs ($\mathbb{T}_S$, $\mathbb{T}_I$) and implement the Protocol trait. Read Chapter 5, Chapter 6, Chapter 7, and Chapter 12.

Researchers exploring the internals read the architecture section sequentially (chapters 5 through 11), then Chapter 18 for the building-block theory catalog.

1.2 Key design decisions

Theories are data, not code. A protocol is defined by constructing Theory values, not by implementing an interface with methods. The engine inspects the theory’s sorts, operations, and equations to derive behavior.
Two parameters, not one. Both the schema shape ($\mathbb{T}_S$) and the data shape ($\mathbb{T}_I$) are parameters. The instance theory determines the restriction operator: restrict for $\text{ThFunctor}$, tree surgery for $\text{ThWType}$, field projection for $\text{ThFlat}$.
Content-addressed everything. Schemas, migrations, and commits are identified by blake3 hashes of their canonical forms. Identity comes from structure, not naming.

The tutorial teaches how to use panproto. This guide explains how the engine works. The tutorial defines what a theory morphism does; this guide shows how colimits are computed. The tutorial shows how to run a migration; this guide shows how the surviving-set computation and W-type restrict pipeline are implemented.

Cartmell, John. 1986. “Generalised Algebraic Theories and Contextual Categories.” Annals of Pure and Applied Logic 32: 209–43. https://doi.org/10.1016/0168-0072(86)90053-9.

Lynch, Owen, Kris Brown, James Fairbanks, and Evan Patterson. 2024. “GATlab: Modeling and Programming with Generalized Algebraic Theories.” arXiv Preprint arXiv:2404.04837, ahead of print. https://doi.org/10.48550/arXiv.2404.04837.

The foundational treatment of GATs is due to Cartmell (1986). panproto’s computational architecture adapts GATlab (Lynch et al. 2024) to Rust.↩︎

# Welcome to panproto Development {#sec-welcome} panproto is a schematic version control system parameterized by pairs of Generalized Algebraic Theories (GATs). A protocol $\mathcal{P} = (\mathbb{T}_S, \mathbb{T}_I)$ consists of a schema theory $\mathbb{T}_S$ (what schemas look like) and an instance theory $\mathbb{T}_I$ (what data looks like). Migration, diffing, compatibility checking, and bidirectional lenses are derived from $\mathcal{P}$ by the same algorithms regardless of protocol.^[The foundational treatment of GATs is due to @cartmell1986. panproto's computational architecture adapts [GATlab](https://algebraicjulia.github.io/GATlab.jl/) [@lynch2024] to Rust.] The crate hierarchy mirrors this algebraic structure. Each level builds on the ones below it: - **Level 0** (`panproto-gat`): sorts, operations, equations, theory morphisms, colimits. - **Level 1** (`panproto-schema`, `panproto-inst`): schema graphs as models of $\mathbb{T}_S$; instances as models of $\mathbb{T}_I$. - **Level 2** (`panproto-mig`, `panproto-lens`, `panproto-check`): migration compilation, bidirectional lenses, breaking-change detection. - **Level 3** (`panproto-protocols`, `panproto-io`): protocol definitions, format-specific parse/emit. - **Level 4** (`panproto-core`, `panproto-wasm`, `panproto-cli`): re-export facade, WASM boundary, CLI. Each level depends only on the levels below it. `panproto-gat` has no panproto dependencies. `panproto-core` re-exports everything. ## Three audiences **Contributors** fixing bugs or adding features start with @sec-first-contribution, then @sec-building-testing, then the architecture chapter for the crate you're modifying. Chapter 16 through 18 cover the contribution process. **Protocol authors** adding a new schema language define a pair of GATs ($\mathbb{T}_S$, $\mathbb{T}_I$) and implement the `Protocol` trait. Read @sec-architecture-overview, @sec-gat-engine, @sec-schema-protocols, and @sec-protocols. **Researchers** exploring the internals read the architecture section sequentially (chapters 5 through 11), then @sec-advanced-internals for the building-block theory catalog. ## Key design decisions 1. **Theories are data, not code.** A protocol is defined by constructing `Theory` values, not by implementing an interface with methods. The engine inspects the theory's sorts, operations, and equations to derive behavior. 2. **Two parameters, not one.** Both the schema shape ($\mathbb{T}_S$) and the data shape ($\mathbb{T}_I$) are parameters. The instance theory determines the restriction operator: `restrict` for $\text{ThFunctor}$, tree surgery for $\text{ThWType}$, field projection for $\text{ThFlat}$. 3. **Content-addressed everything.** Schemas, migrations, and commits are identified by blake3 hashes of their canonical forms. Identity comes from structure, not naming. The [tutorial](https://panproto.dev/tutorial/) teaches how to *use* panproto. This guide explains how the engine *works*. The tutorial defines what a theory morphism does; this guide shows how colimits are computed. The tutorial shows how to run a migration; this guide shows how the surviving-set computation and W-type restrict pipeline are implemented.