Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Composing protocols by colimit

In plain terms

Every schema language has a lot in common with every other. They all have records, fields, references between records, ways to constrain what values a field can take, and ways to express that something is one of a fixed set of alternatives. Building each protocol’s schema model from scratch would mean reimplementing those shared pieces dozens of times.

panproto solves this by defining each shared piece as a small theory, then combining them. The theory of graphs (vertices and edges) is one piece. The theory of constraints (predicates on edge values) is another. The theory of multigraphs (graphs that allow multiple parallel edges between the same pair of vertices) is a third. Each protocol is built by gluing these pieces together at the points where they share structure.

The gluing operation is called a colimit. Conceptually, it takes several theories plus a description of how their shared parts identify, and produces a single combined theory whose vertices and edges are the union of the inputs, with the shared parts collapsed. The result is a new theory that has all the structure of the inputs and respects all their equations.

This is why adding a new protocol to panproto is mostly a matter of declaring which building-block theories it uses and how they fit together: the colimit construction does the rest.

The construction

A protocol’s schema theory is built as the colimit of a diagram of building-block theories. The diagram is a small category whose objects are theories and whose morphisms are theory inclusions describing the shared structure.

For example, the typed-multigraph-with-W-types theory used by JSON Schema is constructed as:

ThGraph + ThConstraint + ThMulti + ThWType

where each + is a pushout (a binary colimit) along the shared sort Vertex. The result has:

  • The vertices, edges, and identity laws from ThGraph.
  • The constraint sort and predicate operations from ThConstraint.
  • The multi-edge labelling from ThMulti.
  • The instance-level W-type structure (tree-shaped data anchored at schema vertices) from ThWType.

If a colimit step fails, registration panics with a message naming the failing intermediate step (colimit ThGraph + ThConstraint over ThVertex failed: ...). This is intentional: a failed registration is a build-time bug in the theory composition, not user input.

Why colimits, specifically

The colimit construction has three properties that matter for panproto:

  1. Universality. The colimit is the minimal theory containing all the inputs and respecting their shared structure. No spurious extra equations are introduced.
  2. Existence checking. The construction is mechanical and can fail predictably. If two building blocks have incompatible equations on a shared sort, the colimit step fails at registration time.
  3. Functoriality of migration. Because protocols are colimits, a migration between two protocols can be defined componentwise on the building blocks; the colimit assembles the components into a single protocol-level migration.

The merge operation in schema version control is also a colimit (specifically a pushout); the same machinery powers both.

Reusable building blocks

The shared theory library lives in crates/panproto-protocols/src/theories.rs. The major pieces:

TheoryPurpose
ThGraphVertices and edges, with source and target operations.
ThConstraintVertex-attached constraints (a dependent Constraint(v: Vertex) sort).
ThMultiParallel edges distinguished by edge labels.
ThWTypeRecursive type constructors (W-types) at the instance level.
ThMetaMetadata edges on instance nodes.

The library also exposes higher-level pieces built by composing these (ThSimpleGraph, ThHypergraph, ThInterface, ThFunctor, ThFlat, ThGraphInstance) for protocols that want to start from a richer base.

A protocol’s registration function is a recipe for combining these. To define a new protocol, see Build a custom protocol.

Cross-protocol translation has two mature precedents. The data-exchange line (Fagin, Kolaitis, Miller, and Popa on universal solutions and the chase (Fagin et al. 2005); Fagin, Kolaitis, Popa, and Tan on second-order tgds as a closed-under-composition class (Fagin et al. 2005)) sets the logical machinery: a universal solution is a colimit; composition needs a Skolemised second-order extension; cores pick out the canonical materialisation. The categorical version is the CQL line of Schultz & Wisnesky (2017) and Schultz et al. (2017), where a schema mapping is a functor, universal solutions are colimits in the category of instances, and data integration is a pushout. The engineering-IR line (Apache Calcite (Begoli et al. 2018), Substrait, Apache Arrow, MLIR) supplies the hub-and-spoke adapter pattern. panproto’s colimit and pushout_by_name run the same construction over GAT-presented wire-format schemas. See Related work for the full discussion.

See also