14 Self-Description and Building Blocks

Throughout this tutorial, theories, schemas, instances, and migrations have been presented as distinct concepts. This chapter reveals that they are all instances of the same framework, organized into four levels. Each level is defined as a model of the level below it. A concrete schema is a model of a theory specification, which is itself data interpreted by the GAT engine. This self-similarity has a practical payoff: the mechanism for extending the system is the same mechanism the system uses on itself.

Following the GATlab design (Lynch et al. 2024):

14.1 Four levels

graph TB
    L0["Level 0: GAT Engine<br/>(hardcoded Rust, ~2000 lines)"]
    L1["Level 1: Theory Specifications<br/>(GAT data: ThGraph, ThConstraint, …)"]
    L2["Level 2: Concrete Schemas<br/>(models of Level 1 theories)"]
    L3["Level 3: Concrete Instances<br/>(models of Level 2 schemas)"]
    L0 -->|interprets| L1
    L1 -->|parameterizes| L2
    L2 -->|governs| L3

Figure 14.1: The four levels of panproto’s self-description hierarchy.

14.1.1 Level 0: The GAT engine

Approximately 2,000 lines of Rust implement the core GAT machinery: sorts, operations, equations, theory mappings, composition, and model checking. This is the only part of panproto that isn’t data-driven. It’s the interpreter for everything above.

14.1.2 Level 1: Theory specifications

The theories used throughout this tutorial (ThGraph, ThConstraint, ThConstrainedMultiGraph, and the protocol-specific theories for ATProto, SQL, and so on) are all data at this level. They’re values of type Theory, constructed by the Level 0 engine. Adding a new schema language means adding new Level 1 data; no Rust code changes.

14.1.3 Level 2: Concrete schemas

A schema is a model of a Level 1 theory (Chapter 3 for the definition). Writing an ATProto lexicon means constructing a model of ThConstrainedMultiGraph. Writing SQL DDL means constructing a model of the SQL protocol theory. The schema is governed by the theory it instantiates, just as a value is governed by its type.

14.1.4 Level 3: Concrete instances

The actual data (a Bluesky post, a database row, a Protobuf message) lives at Level 3. An instance is a model of a Level 2 schema. It’s the concrete content that conforms to a specific schema within a specific protocol.

Note

Migrations happen at every level, using the same engine:

Level 1 to Level 1: A theory mapping transforms one schema language into another
Level 2 to Level 2: A schema mapping transforms one schema into another within the same protocol
Level 3 to Level 3: Data migration via restrict transforms instances to conform to a new schema

The machinery is uniform. The same lift, get, put, and checkExistence operations work at each level.

14.2 ThGAT: panproto can describe its own type system

The theory of GATs is itself a GAT. panproto defines ThGAT: a theory whose models are themselves theories.

ThGAT has:

Sorts: Sort, Op, Eq, Theory, Param(s: Sort), Name
Operations: sort_name, op_name, op_output, op_param, eq_lhs, eq_rhs, and others that specify how sorts, operations, and equations relate within a theory

A model of ThGAT is a concrete theory (ThGraph, ThConstraint, or any protocol theory). This isn’t a metaphor—it’s a literal statement about the GAT formalism. The same Theory type that represents ThGraph also represents ThGAT. The same model-checking machinery that validates a schema against ThGraph validates a theory against ThGAT.

14.2.1 Why self-description matters

Self-description has three practical consequences.

Versioning the theory language. If you need to extend what a “sort” or “operation” can express (say, adding support for dependent sorts with multiple parameters), you can define a new version of ThGAT and construct a theory mapping from the old version to the new one. The migration engine then handles the upgrade of all Level 1 data automatically. Without self-description, extending the theory language would require rewriting the Level 0 engine and manually migrating all existing theory definitions. With it, the extension is a data migration like any other.

Uniform tooling. Because theories are models of ThGAT, all the tooling that operates on models (diff, existence checking, lifting, lenses) can be applied to theories themselves. You can diff two versions of a protocol’s theory, check whether a migration between them exists, and lift schemas through the change. The tooling isn’t specialized per level; it’s generic and uniform.

Formal guarantees at every level. The correctness properties that panproto provides (round-tripping for lenses, completeness of existence checking, correct migration behavior) hold at every level. A theory mapping between Level 1 theories satisfies the same laws as a schema mapping between Level 2 schemas. This uniformity follows directly from the self-description architecture: every level is an instance of the same framework.

Exercise: The circularity problem

ThGAT is a model of itself, which sounds circular. How is the circularity resolved?

Answer

The circularity is broken by Level 0: the GAT engine is hardcoded Rust, not data. ThGAT describes what a theory is, but the engine that interprets ThGAT is outside the self-description loop.

14.2.2 The system is closed

There’s an appealing sense in which panproto is closed. The only code that can’t be replaced by data is Level 0 (the GAT engine, which interprets everything above it). Everything at Level 1 and above is defined by the level below and can be extended, versioned, and migrated using the same engine.

This is the sense in which panproto is a general-purpose schematic version control system. It’s not general because it supports every protocol (it doesn’t, yet). It’s general because the mechanism for adding a new protocol is the same mechanism that the existing protocols use, and that mechanism is itself described within the system.

14.3 The building-block catalog

panproto’s 76 protocol definitions are assembled from 27 building-block theories by merging them together (the composition operation from Chapter 9). Each building block captures a structural distinction that affects migration correctness: whether collections are ordered, whether a schema supports sum types (like Protobuf oneof or GraphQL unions), whether references can be recursive, or whether a single vertex can have multiple outgoing edges of the same kind.

The six key building blocks:

Theory	What it captures
ThGraph	Directed binary relationships. The foundation: vertices and edges with `src`/`tgt` operations. Almost every protocol starts here.
ThMultigraph	Parallel edges between the same vertices. Protocols that allow multiple fields of the same type (JSON Schema, ATProto) need this.
ThHypergraph	N-ary relationships. SQL foreign keys spanning multiple columns, or GraphQL arguments, require edges that connect more than two vertices.
ThReflexiveGraph	Identity/default edges with equations `src(id(v))=v` and `tgt(id(v))=v`. Protocols with default values or self-referential types (recursive JSON, XML) use this.
ThCategory	Composable edges with an associativity equation. Protocols with transitive relationships (RDF inference, permission hierarchies) use this.
ThConstrainedGraph	Graph + validation constraints. The colimit of ThGraph and ThConstraint. This is the base for most real-world protocols.

The full catalog of all 27 theories is in Appendix C.

14.3.1 Structure vs. validation

Some schema properties can be expressed directly as sorts, operations, and equations in a theory. Others are inherently computational: you can’t capture “matches this regex” or “is at most 3000 characters” as an equation between algebraic operations.

The distinction matters. Structural properties propagate automatically through migrations and theory composition. Validation constraints must be checked separately.

Structural (captured by theory equations):

Edge uniqueness: at most one edge between any two vertices (ThSimpleGraph)
Optionality: a field that can be present or absent, with a round-trip guarantee (ThPartial)
Recursive types: types that can contain themselves, like a tree node with child nodes (ThRecursion)
Sum types: a value that is one of several alternatives, like Protobuf oneof (ThCoproduct)
Reflexivity: every vertex has a self-loop (identity edge), useful for modeling defaults (ThReflexiveGraph)
Symmetry: if there’s an edge from A to B, there’s also one from B to A (ThSymmetricGraph)

Validation (requires ThConstraint):

maxLength, minLength: arithmetic bounds on string/array size
pattern: regular expression matching
minimum, maximum: numeric range bounds
format (date-time, email, URI): domain-specific format checks

Exercise: Deciding what’s structural

If a property can be expressed as an equation between sorts and operations, should it be structural? What determines when a property belongs in the validation layer instead?

Answer

If the property can be expressed as an equation between sorts and operations (e.g., “every vertex has a self-loop,” “foreign key columns belong to foreign key tables”), make it structural. Structural properties propagate through migrations automatically and are checked at compose time. If the property requires arithmetic, string matching, or domain-specific logic (e.g., “length ≤ 3000,” “matches this regex”), it belongs in the validation layer. The test is whether the property can be stated purely in terms of the theory’s algebraic vocabulary.

14.4 Schema-side theories

14.4.1 Graph shapes (6)

Theory	Sorts	Equations	Captures
ThGraph	Vertex, Edge		Directed binary relationships
ThSimpleGraph	Vertex, Edge(s,t)		Edge uniqueness via dependent sort
ThHypergraph	Vertex, HyperEdge, Label		N-ary relationships
ThReflexiveGraph	Vertex, Edge	src(id(v))=v, tgt(id(v))=v	Identity/default edges
ThSymmetricGraph	Vertex, Edge	src(inv(e))=tgt(e), inv(inv(e))=e	Undirected relationships
ThPetriNet	Place, Transition, Token		Concurrent structure

14.4.2 Modifiers (8)

Theory	Sorts	Equations	Captures
ThConstraint	Vertex, Constraint(v)		Validation constraints
ThMulti	Vertex, Edge, EdgeLabel		Parallel edges
ThInterface	Vertex, Interface		Subtyping/polymorphism
ThOrder	Edge, Position		Ordered collections
ThCoproduct	Vertex, Variant, Tag	variant_of(injection(v))=v	Sum types/unions
ThRecursion	Vertex, Mu	unfold(fold(v))=v	Recursive types
ThPartial	Vertex, Defined	defined(witness(d))=d	Optionality as structure
ThLinear	Edge, Usage		Use-counting (linear/affine)

14.4.3 Structural (4)

Theory	Sorts	Captures
ThNominal	Vertex, Name	Nominal identity
ThSpan	Vertex, Span	Diffs/correspondences
ThCospan	Vertex, Apex	Merge targets
ThOperad	Color, MOperation	Fan-in aggregation

14.5 Instance-side theories (6)

Theory	Sorts	Captures
ThWType	Node, Arc, Value	Tree-shaped data
ThFunctor	Table, Row, ForeignKey	Relational/tabular data
ThFlat	Node, Field, Value	Key-value data
ThGraphInstance	IVertex, IEdge, Value, Vertex, Edge	Graph-shaped data (most general)
ThAnnotation	Vertex, Annotation, AnnotationKey, Value	Out-of-band metadata
ThCausal	Event, Timestamp, Before(e1,e2)	Causal/temporal ordering

14.6 How protocols compose building blocks

Protocols compose building blocks via colimit into theory groups:

Group	Schema theory	Instance theory	Count	Examples
A	colimit(Graph, Constraint, Multi)	WType	30	ATProto, JSON Schema, OpenAPI
B	colimit(Hypergraph, Constraint)	Functor	11	SQL, Cassandra, Parquet
C	colimit(SimpleGraph, Constraint)	Flat	9	Protobuf, Avro, Thrift
D	colimit(Graph, Constraint, Multi, Interface)	WType	16	GraphQL, TypeScript, Rust
E	colimit(Graph, Constraint, Multi)	colimit(WType, Meta)	9	XML, HTML, CSS
F	colimit(Graph, Constraint, Multi)	GraphInstance	1	Neo4j

Each protocol opts into additional theories (ThOrder, ThCoproduct, etc.) by composing them into its schema theory via colimit.

14.7 Equations as structural invariants

Only 6 of the 27 theories have equations. Each has been type-checked against the theory’s sort declarations. As of v0.6, this type-checking happens automatically at commit time. When you schema commit a theory that includes any of these building blocks, panproto verifies that every equation is well-sorted in the composed theory. If a colimit introduces a sort mismatch (by identifying two sorts that break an equation’s typing), the commit is rejected with a diagnostic pointing to the offending equation.

The equations and their sort-checking:

ThRecursion: unfold(fold(v)) = v. fold(v) has sort Mu; unfold(fold(v)) has sort Vertex. The right side v also has sort Vertex. Both sides match.

ThReflexiveGraph: src(id(v)) = v and tgt(id(v)) = v. id(v) has sort Edge; src(id(v)) has sort Vertex, matching v.

ThSymmetricGraph: src(inv(e)) = tgt(e), tgt(inv(e)) = src(e), inv(inv(e)) = e. All sides have matching sorts (Vertex, Vertex, Edge respectively).

ThCoproduct: variant_of(injection(v)) = v. Both sides have sort Variant.

ThPartial: defined(witness(d)) = d. Both sides have sort Defined.

Exercise: Composition and type safety

What happens at commit time when two building blocks are composed via colimit and a sort identification makes an equation ill-typed?

Answer

panproto rejects the composition at commit time. The diagnostic names the equation, the sorts involved, and which sort identification caused the conflict. For example, if composing ThReflexiveGraph with a theory that identifies the Edge and Vertex sorts would make src(id(v)) = v ill-typed (because id(v) would need to be both an Edge and a Vertex), the error message pinpoints the identification and the broken equation.

The next chapter covers testing: confirming that compositions, migrations, and lenses behave correctly on concrete data.

Lynch, Owen, Kris Brown, James Fairbanks, and Evan Patterson. 2024. “GATlab: Modeling and Programming with Generalized Algebraic Theories.” arXiv Preprint arXiv:2404.04837, ahead of print. https://doi.org/10.48550/arXiv.2404.04837.

# Self-Description and Building Blocks {#sec-self-description} Throughout this tutorial, theories, schemas, instances, and migrations have been presented as distinct concepts. This chapter reveals that they are all instances of the same framework, organized into four levels. Each level is defined as a model of the level below it. A concrete schema is a model of a theory specification, which is itself data interpreted by the GAT engine. This self-similarity has a practical payoff: the mechanism for extending the system is the same mechanism the system uses on itself. Following the [GATlab](https://algebraicjulia.github.io/GATlab.jl/) design [@lynch2024]: ## Four levels {#sec-four-levels} ```{mermaid} %%| label: fig-levels %%| fig-cap: "The four levels of panproto's self-description hierarchy." graph TB L0["Level 0: GAT Engine<br/>(hardcoded Rust, ~2000 lines)"] L1["Level 1: Theory Specifications<br/>(GAT data: ThGraph, ThConstraint, …)"] L2["Level 2: Concrete Schemas<br/>(models of Level 1 theories)"] L3["Level 3: Concrete Instances<br/>(models of Level 2 schemas)"] L0 -->|interprets| L1 L1 -->|parameterizes| L2 L2 -->|governs| L3 ``` ### Level 0: The GAT engine Approximately 2,000 lines of Rust implement the core GAT machinery: sorts, operations, equations, theory mappings, composition, and model checking. This is the only part of panproto that isn't data-driven. It's the interpreter for everything above. ### Level 1: Theory specifications The theories used throughout this tutorial (ThGraph, ThConstraint, ThConstrainedMultiGraph, and the protocol-specific theories for ATProto, SQL, and so on) are all *data* at this level. They're values of type `Theory`, constructed by the Level 0 engine. Adding a new schema language means adding new Level 1 data; no Rust code changes. ### Level 2: Concrete schemas A schema is a *model* of a Level 1 theory (@sec-schemas-have-in-common for the definition). Writing an ATProto lexicon means constructing a model of ThConstrainedMultiGraph. Writing SQL DDL means constructing a model of the SQL protocol theory. The schema is governed by the theory it instantiates, just as a value is governed by its type. ### Level 3: Concrete instances The actual data (a Bluesky post, a database row, a Protobuf message) lives at Level 3. An instance is a model of a Level 2 schema. It's the concrete content that conforms to a specific schema within a specific protocol. :::{.callout-note} Migrations happen at every level, using the same engine: - **Level 1 to Level 1**: A theory mapping transforms one schema language into another - **Level 2 to Level 2**: A schema mapping transforms one schema into another within the same protocol - **Level 3 to Level 3**: Data migration via `restrict` transforms instances to conform to a new schema The machinery is uniform. The same `lift`, `get`, `put`, and `checkExistence` operations work at each level. ::: ## ThGAT: panproto can describe its own type system {#sec-thgat} The theory of GATs is itself a GAT. panproto defines **ThGAT**: a theory whose models are themselves theories. ThGAT has: - **Sorts**: `Sort`, `Op`, `Eq`, `Theory`, `Param(s: Sort)`, `Name` - **Operations**: `sort_name`, `op_name`, `op_output`, `op_param`, `eq_lhs`, `eq_rhs`, and others that specify how sorts, operations, and equations relate within a theory A *model* of ThGAT is a concrete theory (ThGraph, ThConstraint, or any protocol theory). This isn't a metaphor—it's a literal statement about the GAT formalism. The same `Theory` type that represents ThGraph also represents ThGAT. The same model-checking machinery that validates a schema against ThGraph validates a theory against ThGAT. ### Why self-description matters Self-description has three practical consequences. **Versioning the theory language.** If you need to extend what a "sort" or "operation" can express (say, adding support for dependent sorts with multiple parameters), you can define a new version of ThGAT and construct a theory mapping from the old version to the new one. The migration engine then handles the upgrade of all Level 1 data automatically. Without self-description, extending the theory language would require rewriting the Level 0 engine and manually migrating all existing theory definitions. With it, the extension is a *data migration* like any other. **Uniform tooling.** Because theories are models of ThGAT, all the tooling that operates on models (diff, existence checking, lifting, lenses) can be applied to theories themselves. You can diff two versions of a protocol's theory, check whether a migration between them exists, and lift schemas through the change. The tooling isn't specialized per level; it's generic and uniform. **Formal guarantees at every level.** The correctness properties that panproto provides (round-tripping for lenses, completeness of existence checking, correct migration behavior) hold at every level. A theory mapping between Level 1 theories satisfies the same laws as a schema mapping between Level 2 schemas. This uniformity follows directly from the self-description architecture: every level is an instance of the same framework. :::{.callout-caution} ## Exercise: The circularity problem ThGAT is a model of itself, which sounds circular. How is the circularity resolved? ::: ::: {.callout-tip collapse=true} ## Answer The circularity is broken by Level 0: the GAT engine is hardcoded Rust, not data. ThGAT describes what a theory *is*, but the engine that *interprets* ThGAT is outside the self-description loop. ::: ### The system is closed There's an appealing sense in which panproto is *closed*. The only code that can't be replaced by data is Level 0 (the GAT engine, which interprets everything above it). Everything at Level 1 and above is defined by the level below and can be extended, versioned, and migrated using the same engine. This is the sense in which panproto is a general-purpose schematic version control system. It's not general because it supports every protocol (it doesn't, yet). It's general because the mechanism for adding a new protocol is the same mechanism that the existing protocols use, and that mechanism is itself described within the system. ## The building-block catalog {#sec-building-block-catalog} panproto's 76 protocol definitions are assembled from 27 building-block theories by merging them together (the composition operation from @sec-building-protocol). Each building block captures a structural distinction that affects migration correctness: whether collections are ordered, whether a schema supports sum types (like Protobuf `oneof` or GraphQL unions), whether references can be recursive, or whether a single vertex can have multiple outgoing edges of the same kind. The six key building blocks: | Theory | What it captures | |--------|-----------------| | **ThGraph** | Directed binary relationships. The foundation: vertices and edges with `src`/`tgt` operations. Almost every protocol starts here. | | **ThMultigraph** | Parallel edges between the same vertices. Protocols that allow multiple fields of the same type (JSON Schema, ATProto) need this. | | **ThHypergraph** | N-ary relationships. SQL foreign keys spanning multiple columns, or GraphQL arguments, require edges that connect more than two vertices. | | **ThReflexiveGraph** | Identity/default edges with equations `src(id(v))=v` and `tgt(id(v))=v`. Protocols with default values or self-referential types (recursive JSON, XML) use this. | | **ThCategory** | Composable edges with an associativity equation. Protocols with transitive relationships (RDF inference, permission hierarchies) use this. | | **ThConstrainedGraph** | Graph + validation constraints. The colimit of ThGraph and ThConstraint. This is the base for most real-world protocols. | The full catalog of all 27 theories is in [Appendix C](../appendices/C-protocol-catalog.qmd). ### Structure vs. validation Some schema properties can be expressed directly as sorts, operations, and equations in a theory. Others are inherently computational: you can't capture "matches this regex" or "is at most 3000 characters" as an equation between algebraic operations. The distinction matters. Structural properties propagate automatically through migrations and theory composition. Validation constraints must be checked separately. **Structural (captured by theory equations):** - Edge uniqueness: at most one edge between any two vertices (ThSimpleGraph) - Optionality: a field that can be present or absent, with a round-trip guarantee (ThPartial) - Recursive types: types that can contain themselves, like a tree node with child nodes (ThRecursion) - Sum types: a value that is one of several alternatives, like Protobuf `oneof` (ThCoproduct) - Reflexivity: every vertex has a self-loop (identity edge), useful for modeling defaults (ThReflexiveGraph) - Symmetry: if there's an edge from A to B, there's also one from B to A (ThSymmetricGraph) **Validation (requires ThConstraint):** - `maxLength`, `minLength`: arithmetic bounds on string/array size - `pattern`: regular expression matching - `minimum`, `maximum`: numeric range bounds - `format` (date-time, email, URI): domain-specific format checks :::{.callout-caution} ## Exercise: Deciding what's structural If a property can be expressed as an equation between sorts and operations, should it be structural? What determines when a property belongs in the validation layer instead? ::: ::: {.callout-tip collapse=true} ## Answer If the property can be expressed as an equation between sorts and operations (e.g., "every vertex has a self-loop," "foreign key columns belong to foreign key tables"), make it structural. Structural properties propagate through migrations automatically and are checked at compose time. If the property requires arithmetic, string matching, or domain-specific logic (e.g., "length ≤ 3000," "matches this regex"), it belongs in the validation layer. The test is whether the property can be stated purely in terms of the theory's algebraic vocabulary. ::: ## Schema-side theories ### Graph shapes (6) | Theory | Sorts | Equations | Captures | |---|---|---|---| | ThGraph | Vertex, Edge | | Directed binary relationships | | ThSimpleGraph | Vertex, Edge(s,t) | | Edge uniqueness via dependent sort | | ThHypergraph | Vertex, HyperEdge, Label | | N-ary relationships | | ThReflexiveGraph | Vertex, Edge | src(id(v))=v, tgt(id(v))=v | Identity/default edges | | ThSymmetricGraph | Vertex, Edge | src(inv(e))=tgt(e), inv(inv(e))=e | Undirected relationships | | ThPetriNet | Place, Transition, Token | | Concurrent structure | ### Modifiers (8) | Theory | Sorts | Equations | Captures | |---|---|---|---| | ThConstraint | Vertex, Constraint(v) | | Validation constraints | | ThMulti | Vertex, Edge, EdgeLabel | | Parallel edges | | ThInterface | Vertex, Interface | | Subtyping/polymorphism | | ThOrder | Edge, Position | | Ordered collections | | ThCoproduct | Vertex, Variant, Tag | variant_of(injection(v))=v | Sum types/unions | | ThRecursion | Vertex, Mu | unfold(fold(v))=v | Recursive types | | ThPartial | Vertex, Defined | defined(witness(d))=d | Optionality as structure | | ThLinear | Edge, Usage | | Use-counting (linear/affine) | ### Structural (4) | Theory | Sorts | Captures | |---|---|---| | ThNominal | Vertex, Name | Nominal identity | | ThSpan | Vertex, Span | Diffs/correspondences | | ThCospan | Vertex, Apex | Merge targets | | ThOperad | Color, MOperation | Fan-in aggregation | ## Instance-side theories (6) | Theory | Sorts | Captures | |---|---|---| | ThWType | Node, Arc, Value | Tree-shaped data | | ThFunctor | Table, Row, ForeignKey | Relational/tabular data | | ThFlat | Node, Field, Value | Key-value data | | ThGraphInstance | IVertex, IEdge, Value, Vertex, Edge | Graph-shaped data (most general) | | ThAnnotation | Vertex, Annotation, AnnotationKey, Value | Out-of-band metadata | | ThCausal | Event, Timestamp, Before(e1,e2) | Causal/temporal ordering | ## How protocols compose building blocks {#sec-theory-groups} Protocols compose building blocks via colimit into theory groups: | Group | Schema theory | Instance theory | Count | Examples | |---|---|---|---|---| | A | colimit(Graph, Constraint, Multi) | WType | 30 | ATProto, JSON Schema, OpenAPI | | B | colimit(Hypergraph, Constraint) | Functor | 11 | SQL, Cassandra, Parquet | | C | colimit(SimpleGraph, Constraint) | Flat | 9 | Protobuf, Avro, Thrift | | D | colimit(Graph, Constraint, Multi, Interface) | WType | 16 | GraphQL, TypeScript, Rust | | E | colimit(Graph, Constraint, Multi) | colimit(WType, Meta) | 9 | XML, HTML, CSS | | F | colimit(Graph, Constraint, Multi) | GraphInstance | 1 | Neo4j | Each protocol opts into additional theories (ThOrder, ThCoproduct, etc.) by composing them into its schema theory via colimit. ## Equations as structural invariants Only 6 of the 27 theories have equations. Each has been type-checked against the theory's sort declarations. As of v0.6, this type-checking happens automatically at commit time. When you `schema commit` a theory that includes any of these building blocks, panproto verifies that every equation is well-sorted in the composed theory. If a colimit introduces a sort mismatch (by identifying two sorts that break an equation's typing), the commit is rejected with a diagnostic pointing to the offending equation. The equations and their sort-checking: **ThRecursion**: `unfold(fold(v)) = v`. `fold(v)` has sort Mu; `unfold(fold(v))` has sort Vertex. The right side `v` also has sort Vertex. Both sides match. **ThReflexiveGraph**: `src(id(v)) = v` and `tgt(id(v)) = v`. `id(v)` has sort Edge; `src(id(v))` has sort Vertex, matching `v`. **ThSymmetricGraph**: `src(inv(e)) = tgt(e)`, `tgt(inv(e)) = src(e)`, `inv(inv(e)) = e`. All sides have matching sorts (Vertex, Vertex, Edge respectively). **ThCoproduct**: `variant_of(injection(v)) = v`. Both sides have sort Variant. **ThPartial**: `defined(witness(d)) = d`. Both sides have sort Defined. :::{.callout-caution} ## Exercise: Composition and type safety What happens at commit time when two building blocks are composed via colimit and a sort identification makes an equation ill-typed? ::: ::: {.callout-tip collapse=true} ## Answer panproto rejects the composition at commit time. The diagnostic names the equation, the sorts involved, and which sort identification caused the conflict. For example, if composing ThReflexiveGraph with a theory that identifies the Edge and Vertex sorts would make `src(id(v)) = v` ill-typed (because `id(v)` would need to be both an Edge and a Vertex), the error message pinpoints the identification and the broken equation. ::: The next chapter covers testing: confirming that compositions, migrations, and lenses behave correctly on concrete data.