9  Building Your Own Protocol

A protocol is defined by composing building-block theories via colimit—the categorical operation that merges two structures while identifying anything they share. Two theories that both mention a \(\mathrm{Vertex}\) sort get merged into one theory with a single \(\mathrm{Vertex}\), not two copies of it. This chapter walks through building a Kubernetes CRD protocol from \(\mathrm{colimit}(\text{ThConstrainedGraph}, \text{ThWType})\).

9.1 How composition works

When two theories \(T_1\) and \(T_2\) share a common sub-theory \(S\) through theory morphisms, the colimit \(T_1 +_S T_2\) merges them carefully: shared structure gets identified (never duplicated), all sorts and operations from both sides survive, and equations from both are preserved.1 The Rust API is a single function:

pub fn compose(t1: &Theory, t2: &Theory, shared: &Theory) -> Result<Theory, GatError>;

A concrete example. Suppose we have two theories:

  • ThGraph with sorts \(\{\mathrm{Vertex}, \mathrm{Edge}\}\) and operations \(\{\mathrm{src}: \mathrm{Edge} \to \mathrm{Vertex},\; \mathrm{tgt}: \mathrm{Edge} \to \mathrm{Vertex}\}\).
  • ThConstraint with sorts \(\{\mathrm{Vertex}, \mathrm{Constraint}\}\) and operations \(\{\mathrm{target}: \mathrm{Constraint} \to \mathrm{Vertex}\}\).

Both mention \(\mathrm{Vertex}\). The shared base ThVertex contains just \(\{\mathrm{Vertex}\}\) with no operations.

When we compose, calling \(\mathrm{ThConstrainedGraph} = \mathrm{compose}(\mathrm{ThGraph}, \mathrm{ThConstraint}, \mathrm{ThVertex})\), we get:

  • Sorts: \(\{\mathrm{Vertex}, \mathrm{Edge}, \mathrm{Constraint}\}\). \(\mathrm{Vertex}\) appears once, not twice.
  • Operations: \(\{\mathrm{src}, \mathrm{tgt}, \mathrm{target}\}\). All operations are retained.
graph LR
    ThVertex -->|includes Vertex| ThGraph
    ThVertex -->|includes Vertex| ThConstraint
    ThGraph -->|compose leg| ThConstrainedGraph
    ThConstraint -->|compose leg| ThConstrainedGraph
Figure 9.1: Composition square: ThGraph and ThConstraint merged along their shared ThVertex.

Each composition leg is a theory morphism tracking where each sort and operation came from. These provenance records are crucial for computing migrations later. The compositional approach was pioneered by GATlab, Julia’s computational category theory framework (Lynch et al. 2024). panproto follows the same algebraic blueprint, implemented in Rust for the WASM target.

9.2 Building a custom protocol: Kubernetes CRDs

Kubernetes Custom Resource Definitions (CRDs) define the shape of custom API objects: groups, versions, resource kinds, and typed fields with validation. To model this in panproto, pick the right building-block theories and compose them.

Here’s how you define the protocol:

import { Panproto } from '@panproto/core';

const panproto = await Panproto.init();

const k8sCrd = panproto.defineProtocol({
  name: 'k8s-crd',
  schemaTheory: 'ThConstrainedGraph',
  instanceTheory: 'ThWType',
  edgeRules: [
    { edgeKind: 'field',      srcKinds: ['object'], tgtKinds: [] },
    { edgeKind: 'nested',     srcKinds: ['object'], tgtKinds: ['object'] },
    { edgeKind: 'items',      srcKinds: ['array'],  tgtKinds: [] },
    { edgeKind: 'version',    srcKinds: ['group'],  tgtKinds: ['object'] },
  ],
  objKinds: ['group', 'object', 'array'],
  constraintSorts: ['required', 'immutable', 'minimum', 'maximum', 'pattern', 'enum'],
});

This declares what a CRD schema can contain: API groups, objects, arrays, fields. Behind the scenes, panproto merges the component theories using colimit. With the protocol defined, build actual schemas within it:

const certManagerCrd = k8sCrd.schema()
  .vertex('certificates.cert-manager.io', 'group')
  .vertex('spec', 'object')
  .vertex('spec.secretName', 'string')
  .vertex('spec.issuerRef', 'object')
  .vertex('spec.issuerRef.name', 'string')
  .vertex('spec.issuerRef.kind', 'string')
  .vertex('spec.dnsNames', 'array')
  .vertex('spec.dnsNames.item', 'string')
  .edge('certificates.cert-manager.io', 'spec', 'version', { name: 'v1' })
  .edge('spec', 'spec.secretName', 'field', { name: 'secretName' })
  .edge('spec', 'spec.issuerRef', 'nested', { name: 'issuerRef' })
  .edge('spec.issuerRef', 'spec.issuerRef.name', 'field', { name: 'name' })
  .edge('spec.issuerRef', 'spec.issuerRef.kind', 'field', { name: 'kind' })
  .edge('spec', 'spec.dnsNames', 'field', { name: 'dnsNames' })
  .edge('spec.dnsNames', 'spec.dnsNames.item', 'items', {})
  .constraint('spec.secretName', 'required', 'true')
  .constraint('spec.issuerRef.name', 'required', 'true')
  .constraint('spec.issuerRef.kind', 'enum', '["Issuer","ClusterIssuer"]')
  .build();

This models a simplified cert-manager Certificate CRD: the structural shape (groups, nested objects, arrays) plus validation constraints like required and enum.

9.3 Everything works immediately

Since the protocol is defined as a composed theory, every tool in panproto’s machinery operates on it with no additional code.

Diff detects what changed between versions:

const report = panproto.diff(certV1, certV2);
// reports: added field "duration" in spec,
//          changed constraint on issuerRef.kind (added "ExternalIssuer")

Existence checking tells you whether a migration between versions can be constructed:

const existence = panproto.checkExistence(certV1, certV2, migration);
// reports whether the proposed migration handles all changes

Lifting migrates actual data through the schema change:

const newCert = migration.lift(oldCert);
// adds default "duration" field, preserves existing values

Lens decomposition gives bidirectional access:

const view = migration.get(oldCert);         // forward view
const restored = migration.put(view, complement);  // round-trip

All of this came free from defining the theory. The generic engine did the work.

CautionExercise: Why does colimit preserve equations?

If \(T_1\) and \(T_2\) both impose equations on the shared sort \(\mathrm{Vertex}\), what guarantees those equations remain consistent in \(T_1 +_S T_2\)? Under what conditions could a colimit produce contradictory equations?

The colimit preserves all equations from both sides. If \(T_1\) requires src(id(v)) = v and \(T_2\) requires that no vertex has a self-loop, the colimit contains both, creating a contradiction. No model can satisfy both. panproto detects this at commit time during equation type-checking and rejects the composition.

9.4 An alternative: Terraform HCL

The same composition approach applies to Terraform HCL, modeling infrastructure-as-code resource definitions:

const terraformHcl = panproto.defineProtocol({
  name: 'terraform-hcl',
  schemaTheory: 'ThConstrainedGraph',
  instanceTheory: 'ThWType',
  edgeRules: [
    { edgeKind: 'attribute', srcKinds: ['block'], tgtKinds: [] },
    { edgeKind: 'subblock',  srcKinds: ['block'], tgtKinds: ['block'] },
  ],
  objKinds: ['block'],
  constraintSorts: ['required', 'sensitive', 'computed', 'forceNew'],
});

const awsInstance = terraformHcl.schema()
  .vertex('aws_instance', 'block')
  .vertex('aws_instance.ami', 'string')
  .vertex('aws_instance.instance_type', 'string')
  .vertex('aws_instance.tags', 'block')
  .vertex('aws_instance.tags.Name', 'string')
  .edge('aws_instance', 'aws_instance.ami', 'attribute', { name: 'ami' })
  .edge('aws_instance', 'aws_instance.instance_type', 'attribute', { name: 'instance_type' })
  .edge('aws_instance', 'aws_instance.tags', 'subblock', { name: 'tags' })
  .edge('aws_instance.tags', 'aws_instance.tags.Name', 'attribute', { name: 'Name' })
  .constraint('aws_instance.ami', 'required', 'true')
  .constraint('aws_instance.instance_type', 'required', 'true')
  .build();

When your Terraform provider adds a required attribute or restructures nested blocks, panproto diffs the schema versions, checks migration existence, and lifts your .tfstate data through the change. The HCL protocol is just another composed theory.

CautionExercise: Shared sub-theory choice

The K8s CRD and Terraform HCL protocols both use ThConstrainedGraph and ThWType. Does the choice of shared sub-theory \(S\) in the colimit \(T_1 +_S T_2\) affect the resulting protocol? What happens if you pick a larger or smaller \(S\)?

Yes. A larger shared sub-theory \(S\) identifies more structure between \(T_1\) and \(T_2\), producing a smaller colimit. A smaller \(S\) identifies less, producing a larger colimit with more independent sorts. In the extreme, \(S = \emptyset\) (no shared structure) gives the disjoint union. The choice of \(S\) determines which concepts are treated as “the same thing” versus “different things that happen to look similar.”

9.5 Building-block theories

panproto ships with a library of reusable component theories designed for composition:

Theory What it provides
ThGraph Vertices and directed edges (\(\mathrm{src}, \mathrm{tgt}\))
ThConstraint Constraint annotations on vertices
ThMulti Multiple edge types between the same vertices
ThHypergraph Hyperedges connecting arbitrary sets of vertices
ThInterface Input/output interface boundaries
ThWType \(W\)-type structure for recursive/inductive data
ThFunctor Functorial mappings between graph-like structures
ThFlat Flat key-value records (no nesting)

See Appendix C for the complete catalog. Every protocol panproto ships is a composition of theories from this library. The library isn’t closed; define your own component theories and use them alongside the built-in ones.

Note

Since two protocols sharing a component theory have overlapping theories, panproto can automatically construct partial translations between them. This is the foundation for cross-protocol translation (Chapter 11).

9.6 All protocols as compositions

Every one of panproto’s 76 built-in protocols belongs to one of six theory groups, determined by which component theories its composition requires. See Chapter 4 for the definitions.

Table 9.1: The six theory groups organizing panproto’s 76 built-in protocols.
Theory Group Schema Theory Instance Theory Count Examples
A \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) ThWType 30 ATProto, JSON Schema, OpenAPI, FHIR
B \(\mathrm{compose}(\text{ThHypergraph}, \text{ThConstraint})\) ThFunctor 11 SQL, Cassandra, Parquet, CSV
C \(\mathrm{compose}(\text{ThSimpleGraph}, \text{ThConstraint})\) ThFlat 9 Protobuf, Avro, Thrift, HCL
D \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti}, \text{ThInterface})\) ThWType 16 GraphQL, TypeScript, Python, Java
E \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) \(\mathrm{compose}(\text{ThWType}, \text{ThMeta})\) 9 XML/XSD, HTML, CSS
F \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\) ThGraphInstance 1 Neo4j

Group membership determines which algorithms apply and how cross-group migrations work. Migrating from Group C (Protobuf) to Group A (JSON Schema) requires embedding the simpler \(\text{ThSimpleGraph}\) into the richer \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\). The migration morphism itself remains a graph homomorphism.

See Chapter 30 for the complete catalog with theory group assignments.

CautionExercise: Cross-group migration cost

Group C protocols use \(\text{ThSimpleGraph}\) and Group A protocols use \(\mathrm{compose}(\text{ThGraph}, \text{ThConstraint}, \text{ThMulti})\). Migrating C to A embeds a simpler theory into a richer one. Is the reverse direction (A to C) always possible? What structural information could be lost?

Not always. Group A schemas use multigraph structure (parallel edges, constraints, multiplicity) that Group C’s simple graph theory cannot represent. Converting A to C loses parallel edges, constraint annotations, and multiplicity metadata. If the Group A schema has no parallel edges and no constraints, the embedding into Group C is lossless. Otherwise, information is dropped at the theory level.

9.7 Further reading

The theory of composing generalized algebraic theories is developed in Cartmell (1986). The computational algorithm for computing compositions of GAT presentations is described in Lynch et al. (2024). The GATlab documentation provides an accessible introduction in a Julia context.

Cartmell, John. 1986. “Generalised Algebraic Theories and Contextual Categories.” Annals of Pure and Applied Logic 32: 209–43. https://doi.org/10.1016/0168-0072(86)90053-9.
Lynch, Owen, Kris Brown, James Fairbanks, and Evan Patterson. 2024. GATlab: Modeling and Programming with Generalized Algebraic Theories.” arXiv Preprint arXiv:2404.04837, ahead of print. https://doi.org/10.48550/arXiv.2404.04837.

  1. This is a categorical pushout. See Appendix A for the formal definition.↩︎