6  panproto-gat: The GAT Engine

Every protocol panproto supports—ATProto, SQL, Protobuf, GraphQL, the rest—bottoms out in the same algebraic vocabulary. That vocabulary lives in panproto-gat, the Level 0 crate at the base of the dependency hierarchy. Nothing in panproto depends on anything lower. Everything above—schemas, instances, migrations, lenses, the CLI—depends on the structures defined here.

The crate implements Generalized Algebraic Theories as defined by Cartmell (1986). If you have worked through the tutorial’s chapter on theories, you already have the intuitions. This chapter is the Rust-level view: what the data structures look like, how the algorithms work, and where the performance-sensitive paths are.

6.1 Sorts and sort parameters

Consider the theory of categories. It has a sort \(\mathrm{Ob}\) for objects and a sort \(\mathrm{Hom}(a{:}\,\mathrm{Ob},\; b{:}\,\mathrm{Ob})\) for morphisms between two objects. The first sort is simple—it stands alone, no parameters. The second is dependent—it only makes sense once you pick two objects. That distinction between simple and dependent sorts is what separates GATs from ordinary algebraic theories in the sense of Lawvere (1963), and it is the reason panproto needs GATs rather than something simpler.

In Rust, a sort is a name plus a (possibly empty) list of parameters:

use std::sync::Arc;

/// Classification of a coercion's round-trip properties.
///
/// Forms a four-element lattice under information loss, shaped as a
/// diamond:
///
/// ```text
///          Iso
///         /   \
///  Retraction   Projection
///         \   /
///         Opaque
/// ```
///
/// Categorically, this classifies the adjunction witness of a fiber
/// morphism in the Grothendieck fibration over the schema category:
///
/// - `Iso`: both unit and counit are identities (isomorphism in the
///   fiber). Complement stores nothing.
/// - `Retraction`: the forward map has a left inverse
///   (`inverse(forward(v)) = v`). The forward map is injective
///   (information-preserving). Complement stores the residual between
///   the original and the round-tripped value.
/// - `Projection`: the forward map is a dependent projection from the
///   total fiber. The result is deterministically re-derivable from
///   the source data, but no inverse exists that recovers the source
///   from the result alone. Complement stores nothing for the derived
///   field because `get` re-derives it; this is the dual of
///   `Retraction` in the information-loss lattice.
/// - `Opaque`: no structural relationship between forward and backward
///   maps. Complement stores the entire original value.
///
/// Composition follows the lattice meet (in the quality ordering):
/// `Iso` is the identity, `Opaque` is the absorber, same-kind composes
/// idempotently, and cross-kind (`Retraction` ∘ `Projection`) collapses
/// to `Opaque`.
#[derive(

The arity() method—the number of parameters—is the key invariant that theory morphisms must preserve. A morphism that sends a 2-ary sort to a 0-ary sort is ill-formed, full stop.

The sorts you will encounter most often in panproto’s protocol definitions are Vertex and Edge (both simple, arity 0), Constraint(v: Vertex) (dependent, arity 1, tying a constraint to a specific vertex), and Hom(a: Ob, b: Ob) (dependent, arity 2, from the categorical machinery).

6.2 Operations

Sorts name the types. Operations name the functions between them. In a graph theory, src and tgt are operations: src takes an edge and returns a vertex, tgt does the same. A monoid theory has a nullary operation unit (a constant—zero inputs, one output) and a binary operation mul.

An operation’s signature is the pair of its input sorts and its output sort. Morphisms must preserve these signatures after applying the sort mapping.

use std::sync::Arc;

/// An operation (term constructor) in a GAT.
///
/// Operations are the functions/constructors of a GAT. Each operation
/// has typed inputs and a typed output, where types are sort names.
///
/// # Examples
///
/// - `src: Edge → Vertex` (graph source map)
/// - `add: (a: Int, b: Int) → Int` (integer addition)
/// - `id: (x: Ob) → Hom(x, x)` (identity morphism)
///
/// Based on the formal definition of GAT operations from Cartmell (1986).
#[derive(Debug, Clone, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize)]
pub struct Operation {
    /// The operation name (e.g., "src", "tgt", "compose").
    pub name: Arc<str>,
    /// Typed inputs as `(param_name, sort_name)` pairs.
    pub inputs: Vec<(Arc<str>, Arc<str>)>,
    /// The output sort name.
    pub output: Arc<str>,
}

The inputs field stores (param_name, sort_name) pairs—enough for first-order operations. panproto does not need higher-order operations (operations whose inputs are themselves operations), so the representation stays simple.

6.3 Equations

Operations without constraints can do anything. Equations rein them in. An equation \(\ell = r\) asserts that two terms built from variables and operation applications are equal. In a category theory, the associativity equation says that composing three morphisms in two different groupings yields the same result. In a monoid theory, the identity equations say that multiplying by the unit is a no-op.

panproto represents terms with a small two-variant enum: Term::Var(name) for variable references and Term::App(op_name, args) for operation applications. Equations are pairs of terms. They are not computationally evaluated—they serve as proof obligations. When the engine checks a theory morphism, it verifies that mapping each equation through the morphism yields an equation that holds in the target theory.

6.4 Sort kinds

Not all sorts play the same role. panproto classifies each sort by a kind that separates structural schema-level sorts from data-level value sorts:

Structural (the default) covers sorts like Vertex, Edge, and Constraint—the scaffolding of a schema graph.

Val(kind) marks a sort that carries data of a specific ValueKind: Bool, Int, Float, Str, Bytes, Token, Null, or Any. The sort Val(Str) classifies string values.

Coercion { from, to } marks a sort whose values are directed morphisms between value kinds. A Coercion { from: Str, to: Int } classifies functions that convert strings to integers—parse_int and its cousins.

Merger(kind) marks a sort whose values combine (merge) two values of the same kind. The VCS merge algorithm uses mergers to reconcile conflicting edits.

6.5 Directed equations

Standard equations are symmetric: \(\ell = r\) means the same thing as \(r = \ell\). But migrations need direction. When the engine encounters a value matching \(\ell\), it should rewrite to \(r\), not the other way around.

A directed equation \(\ell \longrightarrow r\) adds an impl_term—a computable expression from panproto-expr (a pure functional language with lambdas, pattern matching, records, lists, and roughly 50 built-in operations)—and an optional inverse for the backward (put) direction of a lens. The directed equation is the mechanism that turns an abstract algebraic relationship into executable migration code.

6.6 Conflict policies

When two branches of a VCS merge produce different values for the same field, someone has to decide who wins. A ConflictPolicy attached to a sort kind makes that decision: KeepLeft always prefers the left branch, KeepRight the right, Fail reports a conflict (the default), and Custom(expr) evaluates a panproto_expr::Expr to decide programmatically.

6.7 Theories

All of the above—sorts, operations, equations, directed equations, sort kinds, conflict policies—come together in a single structure:

/// A conflict resolution strategy for merge operations.
#[derive(Debug, Clone, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize)]
pub enum ConflictStrategy {
    /// Keep the left (ours) value.
    KeepLeft,
    /// Keep the right (theirs) value.
    KeepRight,
    /// Fail on conflict.
    Fail,
    /// Custom resolution via an expression.
    Custom(panproto_expr::Expr),
}

/// A conflict resolution policy for merge operations.
#[derive(Debug, Clone, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize)]
pub struct ConflictPolicy {
    /// A human-readable name for this policy.
    pub name: Arc<str>,
    /// The value kind this policy applies to.
    pub value_kind: ValueKind,
    /// The strategy to use when values conflict.
    pub strategy: ConflictStrategy,
}

/// A generalized algebraic theory (GAT).
///
/// Theories are named collections of sorts, operations, and equations,
/// with optional inheritance via `extends`. When a theory extends another,
/// it inherits all of the parent's sorts, operations, and equations.
///
/// # Index cache
///
/// Precomputed `FxHashMap` indices provide O(1) lookup by name for sorts,
/// operations, and equations. These are rebuilt from the vectors at
/// construction and deserialization time.

A theory \(\mathcal{T} = (S, \Omega, E)\) in the classical sense is a set of sorts, a set of operations typed over those sorts, and a set of equations over terms built from the operations. panproto extends this with the directed equations, conflict policies, and an inheritance relation (extends).

6.7.1 Inheritance

The extends field enables compositional theory construction. When theory B extends theory A, B inherits all of A’s sorts, operations, and equations. The resolve_theory function computes the transitive closure of the inheritance chain, merging declarations and detecting cycles.

Diamond inheritance works correctly. If theories B and C both extend A, and theory D extends both B and C, the shared declarations from A appear exactly once in the resolved D. This is checked, not assumed.

Note

For the conceptual introduction to GATs, see the tutorial chapter 3.

When should you use inheritance versus colimits? Use extends when the child genuinely specializes the parent—adding sorts or operations to an existing vocabulary. Use colimits (described below) when composing two independent theory fragments that happen to share a sub-theory. The heuristic: if the overlap is intentional (the child is meant to be a refinement of the parent), inherit. If the overlap is accidental (same names, same roles, but developed independently), take a colimit.

6.7.2 Structure diagram

classDiagram
    class Theory {
        +String name
        +Vec~String~ extends
        +Vec~Sort~ sorts
        +Vec~Operation~ ops
        +Vec~Equation~ eqs
        +find_sort(name) Option~Sort~
        +find_op(name) Option~Operation~
        +find_eq(name) Option~Equation~
    }

    class Sort {
        +String name
        +Vec~SortParam~ params
        +is_simple() bool
        +arity() usize
    }

    class SortParam {
        +String name
        +String sort
    }

    class Operation {
        +String name
        +Vec~(String, String)~ inputs
        +String output
        +arity() usize
    }

    class Equation {
        +String name
        +Term lhs
        +Term rhs
    }

    class Term {
        <<enumeration>>
        Var(String)
        App(String, Vec~Term~)
    }

    Theory *-- Sort : contains
    Theory *-- Operation : contains
    Theory *-- Equation : contains
    Sort *-- SortParam : params
    Equation *-- Term : lhs, rhs
    Operation ..> Sort : references by name
Figure 6.1: Internal structure of a Theory. Sorts may have SortParam dependencies; Operations reference sorts by name; Equations contain Term trees.

6.8 Theory morphisms

A theory morphism \(F\colon \mathcal{T}_1 \to \mathcal{T}_2\) is the mathematical heart of a migration. It maps sorts to sorts and operations to operations, preserving three things: arity (a 2-ary sort maps to a 2-ary sort), signature (if \(f\colon A \times B \to C\), then \(F(f)\colon F(A) \times F(B) \to F(C)\)), and equations (if \(\ell = r\) in \(\mathcal{T}_1\), then \(F(\ell) = F(r)\) in \(\mathcal{T}_2\)).


/// A structure-preserving map between two theories.
///
/// Maps sorts to sorts and operations to operations. A valid morphism
/// must preserve sort arities, operation type signatures, and equations.
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub struct TheoryMorphism {
    /// A human-readable name for this morphism.
    pub name: Arc<str>,
    /// The name of the domain theory.
    pub domain: Arc<str>,
    /// The name of the codomain theory.
    pub codomain: Arc<str>,
    /// Mapping from domain sort names to codomain sort names.
    pub sort_map: HashMap<Arc<str>, Arc<str>>,
    /// Mapping from domain operation names to codomain operation names.
    pub op_map: HashMap<Arc<str>, Arc<str>>,

The check_morphism function validates all three preservation conditions. This is the critical correctness check for migrations: a migration specification is valid if and only if it induces a valid theory morphism between the source and target schema theories.

6.8.1 Why the direction is reversed

When we migrate data from schema v1 to schema v2, the morphism goes from v2’s theory to v1’s theory—backwards from what you might expect. The reason is contravariance. The restrict functor that lifts instances goes in the opposite direction from the schema morphism, just as a pullback in topology goes in the opposite direction from a continuous map. The CompiledMigration in panproto-inst is the executable form of this morphism. The wtype_restrict pipeline is the functor it induces on instances.

Because restriction is functorial, composing two migrations and restricting once is the same as restricting twice (Spivak 2012). There are no migration bugs that arise from order-of-operations issues—the algebra rules them out.

6.8.2 Inducing schema renames

Theory morphisms induce schema-level renames via induce_schema_renames(). Sort-map entries where old \(\neq\) new become NameSite::VertexKind renames (sorts map to vertex kinds at the schema level). Op-map entries where old \(\neq\) new become NameSite::EdgeKind renames. This sits at the top of the morphism tower described in Section 18.2.

6.9 Identity and naming

The ident module solves a problem that bites most algebraic implementations eventually: conflating identity with display name. A sort named “Vertex” in protocol A and a sort named “Vertex” in protocol B are not the same sort, even though they print the same way. panproto uses three types to keep the distinction clean:

ScopeTag is an opaque scope identifier generated by ScopeTag::fresh(). Two sorts in different scopes are always distinct, regardless of their display names.

Ident is a first-class identifier whose equality uses (scope, index) only, never the display name. This follows the design of GATlab (Lynch et al. 2024).

Name is an interned string handle (Arc<str>) with a pointer-equality fast path—a drop-in replacement for String in hot-path types, avoiding repeated allocation and comparison of the same string.

The NameSite enum lists the nine naming sites (edge labels, vertex IDs, vertex kinds, edge kinds, NSIDs, constraint sorts, instance anchors, theory names, sort names). SiteRename pairs a site with a rename operation. The full contributor guide is in Section 18.2.

6.10 Colimits: merging theories

Protocol schema theories are not written monolithically. The ATProto schema theory, for example, is assembled by combining a base graph theory (ThGraph, with Vertex, Edge, src, tgt) with a constraint theory (ThConstraint, with Constraint, target) over their shared ThVertex. The combining operation is a colimit—specifically, a pushout.

Given inclusions \(i_1\colon S \hookrightarrow T_1\) and \(i_2\colon S \hookrightarrow T_2\) of a shared base theory \(S\) into two extensions, the pushout \(T_1 +_S T_2\) contains all declarations from both, with the shared declarations identified (appearing once, not duplicated).

use crate::morphism::TheoryMorphism;
use crate::theory::Theory;

/// Result of a categorical pushout (colimit) computation.
///
/// Contains the pushout theory along with inclusion morphisms from
/// both input theories into the pushout. The cocone condition
/// `j1 ∘ i1 = j2 ∘ i2` is verified at construction time.
#[derive(Debug, Clone)]
pub struct ColimitResult {
    /// The pushout theory.
    pub theory: Theory,
    /// Inclusion morphism from the first theory into the pushout: j1: T1 → P.
    pub inclusion1: TheoryMorphism,
    /// Inclusion morphism from the second theory into the pushout: j2: T2 → P.
    pub inclusion2: TheoryMorphism,
}

impl ColimitResult {
graph LR
    S["Shared theory S"]
    T1["Theory T1"]
    T2["Theory T2"]
    C["Colimit T1 +_S T2"]

    S -->|"inclusion"| T1
    S -->|"inclusion"| T2
    T1 -->|"injection"| C
    T2 -->|"injection"| C

    style C fill:#e8f4e8,stroke:#2d7d2d
Figure 6.2: Pushout (colimit) of two theories over a shared base. The shared sorts, operations, and equations appear exactly once in the result.

6.10.1 Conflict detection

Three kinds of conflicts can arise during a colimit. Sort conflicts: \(T_1\) and \(T_2\) both declare a sort with the same name but different parameter lists, and the sort is not in the shared base. Operation conflicts: same-named operations with incompatible signatures. Equation conflicts: same-named equations with different term content.

Compatible duplicates—same name, same definition—are silently deduplicated. This allows independent protocol extensions to share common vocabulary without coordination, as long as they agree on what the shared names mean.

CautionWhat happens when two protocol fragments define the same sort name with different arities?

The colimit function returns a GatError::SortConflict. You must either rename one sort before merging or use a quotient to explicitly identify them. There is no implicit resolution.

6.11 Type-checking

With the data structures in place, we need a way to verify that a theory is internally consistent—that every equation is well-typed, that every operation application uses the right number of arguments of the right sorts.

The core function, typecheck_term, infers the output sort of a term given a variable typing context and a theory. For Term::Var(x), it returns the sort from the context (or GatError::UnboundVariable). For Term::App { op, args }, it looks up the operation, recursively typechecks each argument, checks that each argument’s sort matches the operation’s signature, and returns the output sort.

pub fn typecheck_term(
    term: &Term,
    ctx: &VarContext,
    theory: &Theory,
) -> Result<Arc<str>, GatError>

The VarContext is a FxHashMap<Arc<str>, Arc<str>> mapping variable names to sort names. You can construct one manually, but more commonly the engine infers it automatically via infer_var_sorts, which walks both sides of an equation collecting sort constraints from every operation application site. If the same variable appears at two positions requiring different sorts, the function returns GatError::ConflictingVarSort.

Building on these, typecheck_equation infers variable sorts, typechecks both sides, and verifies they produce the same output sort. typecheck_theory iterates over all equations and typechecks each one. The VCS pipeline calls typecheck_theory on every schema add and schema commit.

Tip

Type-checking is fast—linear in the size of the term tree for each equation. Even theories with hundreds of equations typecheck in microseconds.

The error variants carry enough context for actionable diagnostics:

Table 6.1: Type-checking error variants.
Error variant Cause
UnboundVariable Variable name not in the typing context
OpNotFound Operation name does not exist in the theory
TermArityMismatch Wrong number of arguments to an operation
ArgTypeMismatch Argument sort does not match expected input sort
EquationSortMismatch Two sides of an equation have different sorts
ConflictingVarSort Variable used at positions requiring incompatible sorts

6.12 Model checking

Type-checking tells you whether a theory’s equations are well-formed. Model checking tells you whether a concrete interpretation satisfies them.

A model \(M\) of a theory \(\mathcal{T}\) assigns a carrier set \(M(S)\) to each sort and a function \(M(f)\colon M(S_1) \times \cdots \times M(S_n) \to M(T)\) to each operation. The model satisfies \(\mathcal{T}\) if every equation evaluates to equal values under all variable assignments drawn from the carriers.

In Rust, a Model maps each sort to a Vec<ModelValue> and each operation to a closure. ModelValue is a small enum—Int(i64) and Str(String)—sufficient for the finite models panproto needs for equation verification.

pub fn check_model(model: &Model, theory: &Theory) -> Result<Vec<EquationViolation>, GatError>

For each equation, the checker infers variable sorts, builds the cartesian product of the relevant carrier sets, and evaluates both sides under every assignment. When the two sides disagree, the assignment is recorded as an EquationViolation, which captures the equation name, the variable assignment, and the two values—enough for the CLI to print something like “equation ‘assoc’ violated when a=2, b=3, c=1: LHS=6, RHS=4”.

The CheckModelOptions struct provides safety bounds. The max_assignments field (default: 10,000) caps the total assignments evaluated per equation. An equation with 3 variables over a carrier of 100 elements would require 1,000,000 assignments; the limit prevents this from running for minutes. When exceeded, the function returns GatError::ModelError rather than silently skipping. The VCS layer reports this as a non-blocking warning.

6.13 Pullbacks: finding shared structure

Colimits combine theories. Pullbacks go the other direction: given two morphisms \(m_1\colon T_1 \to C\) and \(m_2\colon T_2 \to C\) into a common codomain, the pullback theory \(P\) contains a sort for each pair \((s_1, s_2)\) where \(m_1(s_1) = m_2(s_2)\) and the arities match, an operation for each compatible pair of operations, and an equation for each compatible pair of equations.

pub struct PullbackResult {
    pub theory: Theory,
    pub proj1: TheoryMorphism,
    pub proj2: TheoryMorphism,
}

The result includes projection morphisms \(\pi_1\colon P \to T_1\) and \(\pi_2\colon P \to T_2\) satisfying \(m_1 \circ \pi_1 = m_2 \circ \pi_2\). Both projections are validated by check_morphism in the test suite.

When both sides contribute an element with the same name, the pullback element keeps that name. When the names differ, it gets a compound name: pulling back theories where V1 and V2 both map to Vertex in the codomain produces a pullback sort named V1=V2.

The VCS merge algorithm is the primary consumer. When merging branches A and B, the three-way merge computes the pullback of the two branch theories over the merge base theory. Non-empty pullback elements indicate shared structure that must be reconciled; an empty pullback means the branches touched disjoint parts of the schema.

graph LR
    P["Pullback P"]
    T1["Theory T1"]
    T2["Theory T2"]
    C["Codomain C"]

    P -->|"proj1"| T1
    P -->|"proj2"| T2
    T1 -->|"m1"| C
    T2 -->|"m2"| C

    style P fill:#e8f4e8,stroke:#2d7d2d
Figure 6.3: Pullback of two theories over a common codomain. The pullback contains paired elements that agree in the codomain.

6.14 Natural transformations: maps between mappings

Sometimes you have two morphisms \(F, G\colon \mathcal{T}_1 \to \mathcal{T}_2\)—two different migration paths between the same pair of theories—and you need to relate them. A natural transformation \(\alpha\colon F \Rightarrow G\) assigns to each sort \(S\) a term \(\alpha_S\) that converts values of sort \(F(S)\) into values of sort \(G(S)\), subject to a coherence condition: for each operation \(f\colon S_1 \to S_2\), the naturality square must commute (\(\alpha_{S_2} \circ F(f) = G(f) \circ \alpha_{S_1}\)).

pub struct NaturalTransformation {
    pub name: Arc<str>,
    pub source: Arc<str>,
    pub target: Arc<str>,
    pub components: HashMap<Arc<str>, Term>,
}

Each component is a Term with a single free variable "x" representing the input value. For the identity transformation, every component is Term::Var("x").

The checker, check_natural_transformation, verifies four conditions: domain/codomain agreement (\(F\) and \(G\) must share a domain and codomain), component completeness (every sort in the domain needs a component), component validity (all operations in the component terms must exist in the codomain), and naturality itself (verified symbolically by substituting and comparing term trees).

6.14.1 Vertical and horizontal composition

Natural transformations compose in two ways. Vertical composition chains two transformations end to end: given \(\alpha\colon F \Rightarrow G\) and \(\beta\colon G \Rightarrow H\), vertical_compose produces \(\beta \circ \alpha\colon F \Rightarrow H\) by substituting \(\alpha\)’s components into \(\beta\)’s. The function checks that alpha.target == beta.source—the intermediate morphism must match—and returns GatError::NatTransComposeMismatch if they disagree.

Horizontal composition combines transformations along different morphism pairs: given \(\alpha\colon F \Rightarrow G\) (between \(T_1\) and \(T_2\)) and \(\beta\colon H \Rightarrow K\) (between \(T_2\) and \(T_3\)), horizontal_compose produces \(\beta * \alpha\colon H \circ F \Rightarrow K \circ G\). The VCS merge pipeline uses horizontal composition when composing migration paths through different branch histories.

6.15 Free models: minimal valid instances

The free model of a theory is its most economical interpretation—the smallest model that satisfies all equations, with nothing extra thrown in. Its carrier sets are equivalence classes of closed terms (terms with no free variables), quotiented by the equations.

The construction proceeds in three phases. First, term generation: enumerate all closed terms up to a configurable depth, organized by sort. Nullary operations (constants) provide the seed; non-nullary operations are applied to all combinations of existing terms of the correct sorts. Second, quotient by equations: use a union-find to identify terms that the equations force to be equal. For each equation, enumerate all valid variable substitutions using the generated terms and union the LHS and RHS. Third, build the model: select one representative per equivalence class as the carrier, and define operations by symbolic term application.

pub struct FreeModelConfig {
    pub max_depth: usize,           // default: 3
    pub max_terms_per_sort: usize,  // default: 1000
}

The depth bound controls how deep the enumeration goes. For a theory with a successor operation, depth 3 generates zero, succ(zero), succ(succ(zero)), and succ(succ(succ(zero))). The per-sort limit prevents combinatorial explosion in theories with binary operations.

Some illustrative cases: a pointed set theory (one sort, one constant) has a free model with exactly 1 element. Two constants and no equations: 2 elements. Two constants with \(a = b\): 1 element (the equation collapses them). A monoid with identity laws: at depth 1, the free model collapses mul(unit, x) and mul(x, unit) with x, leaving only the identity element. A graph theory with no constants has empty carriers, since there are no closed terms to start from.

The schema scaffold CLI command uses free_model to generate skeleton instances. Its --depth and --max-terms flags map directly to FreeModelConfig.

6.16 Quotient theories: identifying equivalent elements

Colimits glue theories together along shared structure. Quotients do something complementary: they collapse existing structure by declaring that certain elements are the same.

Given a theory and a list of identification pairs \((a, b)\), the quotient function classifies each pair as a sort or operation identification, builds a union-find for each, computes the transitive closure, verifies compatibility (identified sorts must have the same arity; identified operations must have compatible signatures after sort renaming), and rebuilds the theory with one representative per equivalence class. Sort parameters, operation inputs, and equation terms are all renamed through the resulting maps. Equations that become duplicates after renaming are deduplicated.

Quotients are most useful when composing protocol fragments that have structurally equivalent but differently-named elements. If protocol A has sort Node and protocol B has sort Vertex, and both play the same role in the combined theory, quotienting by (Node, Vertex) collapses them into one sort. The alphabetically-first representative is selected deterministically.

Errors: GatError::QuotientIncompatible when identified elements have incompatible structure, and the usual SortNotFound / OpNotFound when a name referenced during rebuilding does not exist.

6.17 Schema transforms and factorization

The schema_functor and factorize modules extend the GAT engine with theory endofunctors and morphism decomposition—the machinery that the protolens engine (Chapter 19) builds on.

A theory endofunctor \(\Phi\) maps theories to theories. It pairs a precondition (which theories it applies to) with a transformation. The interesting part is factorization: given a morphism \(F\colon T_1 \to T_2\), decompose it into a sequence of elementary endofunctors, following a fixed phase ordering:

  1. Drops (equations, then operations, then sorts): remove elements not in the codomain.
  2. Renames (sorts, then operations): apply the sort-map and op-map.
  3. Adds (sorts topologically sorted by dependencies, then operations, then equations): introduce elements in the codomain but not in the (renamed) domain.
        /// The new sort name.
        new: Arc<str>,
    },
    /// Add an operation.
    AddOp(Operation),
    /// Drop an operation and dependent equations.
    DropOp(Arc<str>),
    /// Rename an operation.
    RenameOp {

The ordering matters. Drops come before renames because renaming a sort that will be dropped wastes work and risks name collisions. Renames come before adds because a newly-added sort might share a name with one being renamed away. Violating this ordering can produce intermediate theories that are ill-formed.

A key design principle is cascading: dropping a sort automatically drops all operations that reference it, and dropping an operation automatically drops all equations that reference it. TheoryTransform::apply always produces a well-formed theory; you cannot accidentally leave a dangling reference.

Endofunctors compose via TheoryEndofunctor::compose. The precondition of the composite conservatively requires the first endofunctor’s precondition and defers the second’s to runtime checking.

The Factorization struct holds the ordered step sequence, and validate_factorization applies the steps to the domain and checks that the result matches the codomain—an end-to-end correctness guarantee. This factorization is the foundation of the automated lens generation pipeline (Chapter 20): each elementary step maps directly to a protolens constructor, and the full sequence becomes a ProtolensChain.

Cartmell, John. 1986. “Generalised Algebraic Theories and Contextual Categories.” Annals of Pure and Applied Logic 32: 209–43. https://doi.org/10.1016/0168-0072(86)90053-9.
Lawvere, F. William. 1963. “Functorial Semantics of Algebraic Theories.” PhD thesis, Columbia University.
Lynch, Owen, Kris Brown, James Fairbanks, and Evan Patterson. 2024. GATlab: Modeling and Programming with Generalized Algebraic Theories.” arXiv Preprint arXiv:2404.04837, ahead of print. https://doi.org/10.48550/arXiv.2404.04837.
Spivak, David I. 2012. “Functorial Data Migration.” Information and Computation 217: 31–51. https://arxiv.org/abs/1009.1166.