graph LR
ThVertex -->|includes Vertex| ThGraph
ThVertex -->|includes Vertex| ThConstraint
ThGraph -->|compose leg| ThConstrainedGraph
ThConstraint -->|compose leg| ThConstrainedGraph
9 Building Your Own Protocol
A protocol is defined by composing building-block theories via colimit—the categorical operation that merges two structures while identifying anything they share. Two theories that both mention a \(\mathrm{Vertex}\) sort get merged into one theory with a single \(\mathrm{Vertex}\), not two copies of it. This chapter walks through building a Kubernetes CRD protocol from \(\mathrm{colimit}(\text{ThConstrainedGraph}, \text{ThWType})\).
9.1 How composition works
When two theories \(T_1\) and \(T_2\) share a common sub-theory \(S\) through theory morphisms, the colimit \(T_1 +_S T_2\) merges them carefully: shared structure gets identified (never duplicated), all sorts and operations from both sides survive, and equations from both are preserved.1 The Rust API is a single function:
pub fn compose(t1: &Theory, t2: &Theory, shared: &Theory) -> Result<Theory, GatError>;A concrete example. Suppose we have two theories:
- ThGraph with sorts \(\{\mathrm{Vertex}, \mathrm{Edge}\}\) and operations \(\{\mathrm{src}: \mathrm{Edge} \to \mathrm{Vertex},\; \mathrm{tgt}: \mathrm{Edge} \to \mathrm{Vertex}\}\).
- ThConstraint with sorts \(\{\mathrm{Vertex}, \mathrm{Constraint}\}\) and operations \(\{\mathrm{target}: \mathrm{Constraint} \to \mathrm{Vertex}\}\).
Both mention \(\mathrm{Vertex}\). The shared base ThVertex contains just \(\{\mathrm{Vertex}\}\) with no operations.
When we compose, calling \(\mathrm{ThConstrainedGraph} = \mathrm{compose}(\mathrm{ThGraph}, \mathrm{ThConstraint}, \mathrm{ThVertex})\), we get:
- Sorts: \(\{\mathrm{Vertex}, \mathrm{Edge}, \mathrm{Constraint}\}\). \(\mathrm{Vertex}\) appears once, not twice.
- Operations: \(\{\mathrm{src}, \mathrm{tgt}, \mathrm{target}\}\). All operations are retained.
Each composition leg is a theory morphism tracking where each sort and operation came from. These provenance records are crucial for computing migrations later. The compositional approach was pioneered by GATlab, Julia’s computational category theory framework (Lynch et al. 2024). panproto follows the same algebraic blueprint, implemented in Rust for the WASM target.
9.2 Building a custom protocol: Kubernetes CRDs
Kubernetes Custom Resource Definitions (CRDs) define the shape of custom API objects: groups, versions, resource kinds, and typed fields with validation. To model this in panproto, pick the right building-block theories and compose them.
Here’s how you define the protocol:
import { Panproto } from '@panproto/core';
const panproto = await Panproto.init();
const k8sCrd = panproto.defineProtocol({
name: 'k8s-crd',
schemaTheory: 'ThConstrainedGraph',
instanceTheory: 'ThWType',
edgeRules: [
{ edgeKind: 'field', srcKinds: ['object'], tgtKinds: [] },
{ edgeKind: 'nested', srcKinds: ['object'], tgtKinds: ['object'] },
{ edgeKind: 'items', srcKinds: ['array'], tgtKinds: [] },
{ edgeKind: 'version', srcKinds: ['group'], tgtKinds: ['object'] },
],
objKinds: ['group', 'object', 'array'],
constraintSorts: ['required', 'immutable', 'minimum', 'maximum', 'pattern', 'enum'],
});This declares what a CRD schema can contain: API groups, objects, arrays, fields. Behind the scenes, panproto merges the component theories using colimit. With the protocol defined, build actual schemas within it:
const certManagerCrd = k8sCrd.schema()
.vertex('certificates.cert-manager.io', 'group')
.vertex('spec', 'object')
.vertex('spec.secretName', 'string')
.vertex('spec.issuerRef', 'object')
.vertex('spec.issuerRef.name', 'string')
.vertex('spec.issuerRef.kind', 'string')
.vertex('spec.dnsNames', 'array')
.vertex('spec.dnsNames.item', 'string')
.edge('certificates.cert-manager.io', 'spec', 'version', { name: 'v1' })
.edge('spec', 'spec.secretName', 'field', { name: 'secretName' })
.edge('spec', 'spec.issuerRef', 'nested', { name: 'issuerRef' })
.edge('spec.issuerRef', 'spec.issuerRef.name', 'field', { name: 'name' })
.edge('spec.issuerRef', 'spec.issuerRef.kind', 'field', { name: 'kind' })
.edge('spec', 'spec.dnsNames', 'field', { name: 'dnsNames' })
.edge('spec.dnsNames', 'spec.dnsNames.item', 'items', {})
.constraint('spec.secretName', 'required', 'true')
.constraint('spec.issuerRef.name', 'required', 'true')
.constraint('spec.issuerRef.kind', 'enum', '["Issuer","ClusterIssuer"]')
.build();This models a simplified cert-manager Certificate CRD: the structural shape (groups, nested objects, arrays) plus validation constraints like required and enum.
9.3 Everything works immediately
Since the protocol is defined as a composed theory, every tool in panproto’s machinery operates on it with no additional code.
Diff detects what changed between versions:
const report = panproto.diff(certV1, certV2);
// reports: added field "duration" in spec,
// changed constraint on issuerRef.kind (added "ExternalIssuer")Existence checking tells you whether a migration between versions can be constructed:
const existence = panproto.checkExistence(certV1, certV2, migration);
// reports whether the proposed migration handles all changesLifting migrates actual data through the schema change:
const newCert = migration.lift(oldCert);
// adds default "duration" field, preserves existing valuesLens decomposition gives bidirectional access:
const view = migration.get(oldCert); // forward view
const restored = migration.put(view, complement); // round-tripAll of this came free from defining the theory. The generic engine did the work.
If \(T_1\) and \(T_2\) both impose equations on the shared sort \(\mathrm{Vertex}\), what guarantees those equations remain consistent in \(T_1 +_S T_2\)? Under what conditions could a colimit produce contradictory equations?
The colimit preserves all equations from both sides. If \(T_1\) requires src(id(v)) = v and \(T_2\) requires that no vertex has a self-loop, the colimit contains both, creating a contradiction. No model can satisfy both. panproto detects this at commit time during equation type-checking and rejects the composition.
9.4 An alternative: Terraform HCL
The same composition approach applies to Terraform HCL, modeling infrastructure-as-code resource definitions:
const terraformHcl = panproto.defineProtocol({
name: 'terraform-hcl',
schemaTheory: 'ThConstrainedGraph',
instanceTheory: 'ThWType',
edgeRules: [
{ edgeKind: 'attribute', srcKinds: ['block'], tgtKinds: [] },
{ edgeKind: 'subblock', srcKinds: ['block'], tgtKinds: ['block'] },
],
objKinds: ['block'],
constraintSorts: ['required', 'sensitive', 'computed', 'forceNew'],
});
const awsInstance = terraformHcl.schema()
.vertex('aws_instance', 'block')
.vertex('aws_instance.ami', 'string')
.vertex('aws_instance.instance_type', 'string')
.vertex('aws_instance.tags', 'block')
.vertex('aws_instance.tags.Name', 'string')
.edge('aws_instance', 'aws_instance.ami', 'attribute', { name: 'ami' })
.edge('aws_instance', 'aws_instance.instance_type', 'attribute', { name: 'instance_type' })
.edge('aws_instance', 'aws_instance.tags', 'subblock', { name: 'tags' })
.edge('aws_instance.tags', 'aws_instance.tags.Name', 'attribute', { name: 'Name' })
.constraint('aws_instance.ami', 'required', 'true')
.constraint('aws_instance.instance_type', 'required', 'true')
.build();When your Terraform provider adds a required attribute or restructures nested blocks, panproto diffs the schema versions, checks migration existence, and lifts your .tfstate data through the change. The HCL protocol is just another composed theory.
The K8s CRD and Terraform HCL protocols both use ThConstrainedGraph and ThWType. Does the choice of shared sub-theory \(S\) in the colimit \(T_1 +_S T_2\) affect the resulting protocol? What happens if you pick a larger or smaller \(S\)?
Yes. A larger shared sub-theory \(S\) identifies more structure between \(T_1\) and \(T_2\), producing a smaller colimit. A smaller \(S\) identifies less, producing a larger colimit with more independent sorts. In the extreme, \(S = \emptyset\) (no shared structure) gives the disjoint union. The choice of \(S\) determines which concepts are treated as “the same thing” versus “different things that happen to look similar.”
9.5 Building-block theories
panproto ships with a library of reusable component theories designed for composition:
| Theory | What it provides |
|---|---|
| ThGraph | Vertices and directed edges (\(\mathrm{src}, \mathrm{tgt}\)) |
| ThConstraint | Constraint annotations on vertices |
| ThMulti | Multiple edge types between the same vertices |
| ThHypergraph | Hyperedges connecting arbitrary sets of vertices |
| ThInterface | Input/output interface boundaries |
| ThWType | \(W\)-type structure for recursive/inductive data |
| ThFunctor | Functorial mappings between graph-like structures |
| ThFlat | Flat key-value records (no nesting) |
See Appendix C for the complete catalog. Every protocol panproto ships is a composition of theories from this library. The library isn’t closed; define your own component theories and use them alongside the built-in ones.
Since two protocols sharing a component theory have overlapping theories, panproto can automatically construct partial translations between them. This is the foundation for cross-protocol translation (Chapter 11).
9.6 All protocols as compositions
Every one of panproto’s 76 built-in protocols belongs to one of six theory groups, determined by which component theories its composition requires. See Chapter 4 for the definitions.
| Theory Group | Schema Theory | Instance Theory | Count | Examples |
|---|---|---|---|---|
| A | \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) | ThWType | 30 | ATProto, JSON Schema, OpenAPI, FHIR |
| B | \(\mathrm{compose}(\text{ThHypergraph}, \text{ThConstraint})\) | ThFunctor | 11 | SQL, Cassandra, Parquet, CSV |
| C | \(\mathrm{compose}(\text{ThSimpleGraph}, \text{ThConstraint})\) | ThFlat | 9 | Protobuf, Avro, Thrift, HCL |
| D | \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti}, \text{ThInterface})\) | ThWType | 16 | GraphQL, TypeScript, Python, Java |
| E | \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) | \(\mathrm{compose}(\text{ThWType}, \text{ThMeta})\) | 9 | XML/XSD, HTML, CSS |
| F | \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) | ThGraphInstance | 1 | Neo4j |
Group membership determines which algorithms apply and how cross-group migrations work. Migrating from Group C (Protobuf) to Group A (JSON Schema) requires embedding the simpler \(\text{ThSimpleGraph}\) into the richer \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\). The migration morphism itself remains a graph homomorphism.
See Chapter 30 for the complete catalog with theory group assignments.
Group C protocols use \(\text{ThSimpleGraph}\) and Group A protocols use \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\). Migrating C to A embeds a simpler theory into a richer one. Is the reverse direction (A to C) always possible? What structural information could be lost?
Not always. Group A schemas use multigraph structure (parallel edges, constraints, multiplicity) that Group C’s simple graph theory cannot represent. Converting A to C loses parallel edges, constraint annotations, and multiplicity metadata. If the Group A schema has no parallel edges and no constraints, the embedding into Group C is lossless. Otherwise, information is dropped at the theory level.
9.7 Further reading
The theory of composing generalized algebraic theories is developed in Cartmell (1986). The computational algorithm for computing compositions of GAT presentations is described in Lynch et al. (2024). The GATlab documentation provides an accessible introduction in a Julia context.
This is a categorical pushout. See Appendix A for the formal definition.↩︎