18  Advanced Internals: Building Blocks, Naming, Auto-Morphisms, and Type Checking

panproto defines 27 building-block GATs. We build up from the three simplest, then catalog the rest.

18.1 Building-block theories

18.1.1 Formal invariants

Every building-block theory satisfies:

  1. Sort closure: every sort name in an operation’s input/output types is declared in the theory’s sorts list.
  2. Equation well-typedness: both sides of every equation have the same output sort.
  3. No tautologies: \(\text{lhs} \neq \text{rhs}\) syntactically for all equations.

18.1.2 The property-to-structure principle

Several building blocks encode conditional properties as unconditional equations by adding operations and dependent sorts. For example, injectivity of coproduct injection is not injection(v1) = injection(v2) -> v1 = v2 (a conditional equation, which GATs can’t express). Instead, ThCoproduct adds a retraction variant_of : Vertex -> Variant with the equation variant_of(injection(v)) = v. The retraction is structure, not a property.

This separation yields a clean boundary: structural properties (transitivity, injectivity, uniqueness, optionality) belong in GAT sorts+equations; validation properties (cardinality, patterns, ranges) belong in ThConstraint.

18.1.3 The thgraph to threflexivegraph to thcoproduct progression

The simplest theory is ThGraph: just vertices, edges, and two operations.

ThGraph
  Sorts: Vertex, Edge
  Ops:   src : Edge → Vertex
         tgt : Edge → Vertex
  Eqs:   (none)

ThReflexiveGraph extends ThGraph with identity edges and two equations that force id to be a section of both src and tgt:

ThReflexiveGraph
  Sorts: Vertex, Edge
  Ops:   id  : Vertex → Edge
         src : Edge → Vertex
         tgt : Edge → Vertex
  Eqs:   src(id(v)) = v
         tgt(id(v)) = v

Type check: id(v) : Edge, src(id(v)) : Vertex, v : Vertex. Both sides have sort Vertex.

Without the equations, this is just ThGraph + an extra operation. The equations make the identity edge load-bearing.

ThCoproduct adds the retraction pattern to encode injectivity:

ThCoproduct
  Sorts: Vertex, Variant, Tag
  Ops:   injection : Variant → Vertex
         tag : Variant → Tag
         variant_of : Vertex → Variant
  Eq:    variant_of(injection(v)) = v

Type check: injection(v) : Vertex, variant_of(injection(v)) : Variant, v : Variant. Both sides have sort Variant.

The equation variant_of(injection(v)) = v encodes that variant_of is a left inverse of injection. This is how panproto represents injectivity without conditional equations.

NoteWhy Not Conditional Equations?

GATs only support unconditional equations. Conditional equations (implications) would require dependent type theory or first-order logic. The retraction trick sidesteps this by adding structure (the variant_of operation) that witnesses the property.

18.1.4 Theories with equations

18.1.4.1 ThRecursion: Fold-Unfold

Sorts: Vertex, Mu
Ops:   fold : Vertex → Mu
       unfold : Mu → Vertex
Eq:    unfold(fold(v)) = v

This is the isorecursive fold-unfold law. fold is a section of unfold.

18.1.4.2 ThPartial: Definedness Witness

Sorts: Vertex, Defined
Ops:   defined : Vertex → Defined
       witness : Defined → Vertex
Eq:    defined(witness(d)) = d

This encodes definedness: defined is a surjection with witness as its retraction.

18.1.4.3 ThSymmetricGraph: Involution

Sorts: Vertex, Edge
Ops:   inv : Edge → Edge
       src : Edge → Vertex
       tgt : Edge → Vertex
Eqs:   src(inv(e)) = tgt(e)
       tgt(inv(e)) = src(e)
       inv(inv(e)) = e

The involution law: inverting twice is identity.

18.1.5 All 27 theories (complete list)

18.1.5.1 Schema shapes (6)

# Name Sorts Key operations
1 ThGraph Vertex, Edge src, tgt
2 ThSimpleGraph Vertex, Edge(s:V,t:V) src, tgt (unique by endpoints)
3 ThHypergraph Vertex, HyperEdge, Label incident, parent_label
4 ThReflexiveGraph Vertex, Edge id, src, tgt
5 ThSymmetricGraph Vertex, Edge inv, src, tgt
6 ThPetriNet Place, Transition, Token pre, post, marking

18.1.5.2 Schema modifiers (8)

# Name Sorts Key operations
7 ThConstraint Vertex, Constraint(v:V) target
8 ThMulti Vertex, Edge, EdgeLabel edge_label
9 ThInterface Vertex, Interface implements
10 ThOrder Edge, Position edge_position, succ
11 ThCoproduct Vertex, Variant, Tag injection, tag, variant_of
12 ThRecursion Vertex, Mu unfold, fold
13 ThPartial Vertex, Defined defined, witness
14 ThLinear Edge, Usage use_count

18.1.5.3 Schema structural (4)

# Name Sorts Key operations
15 ThNominal Vertex, Name name
16 ThSpan Vertex, Span span_left, span_right
17 ThCospan Vertex, Apex inl, inr
18 ThOperad Color, MOperation arity, op_output

18.1.5.4 Instance (6)

# Name Sorts Key operations
19 ThWType Node, Arc, Value anchor, arc_src, arc_tgt, arc_edge, node_value
20 ThFunctor Table, Row, ForeignKey table_vertex, fk_src, fk_tgt
21 ThFlat Node, Field, Value field_node, field_value, node_anchor
22 ThGraphInstance IVertex, IEdge, Value, Vertex, Edge i_src, i_tgt, iv_anchor, ie_anchor, iv_value
23 ThAnnotation Vertex, Annotation, AnnotationKey, Value annotated, ann_key, ann_value
24 ThCausal Event, Timestamp, Before(e1,e2) time, before_refl, before_trans

18.1.5.5 Architectural (3)

# Name Sorts Key operations
25 ThTracedMonoidal Wire, Box trace_in, trace_out, feedback
26 ThSimplicial Simplex, Face, Degeneracy face_map, degeneracy_map
27 ThMeta Node, Discriminator, ExtraField discriminator, extra_field

18.1.6 Theory groups

Group Schema composition Instance Protocols
A colimit(Graph, Constraint, Multi) WType 30
B colimit(Hypergraph, Constraint) Functor 11
C colimit(SimpleGraph, Constraint) Flat 9
D colimit(Graph, Constraint, Multi, Interface) WType 16
E colimit(Graph, Constraint, Multi) colimit(WType, Meta) 9
F colimit(Graph, Constraint, Multi) GraphInstance 1

18.2 Naming, identity, and the Morphism tower

This section covers the panproto-gat identifier abstraction, the unified naming layer, the morphism tower that cascades theory-level renames into data migrations, and the VCS rename detection heuristic.

18.2.1 The ident abstraction

The Ident type separates stable identity from display name, following GATlab:

pub struct Ident {
    pub scope: ScopeTag,   // which theory/schema this belongs to
    pub index: u32,        // positional index within scope
    pub name: Arc<str>,    // human-readable display name
}

PartialEq and Hash use only (scope, index), making comparisons \(O(1)\) regardless of name length. Renaming an identifier (via ident.renamed("new_name")) doesn’t change its identity, so HashMap<Ident, _> entries survive renames without rehashing.

ScopeTag is a monotonically increasing u32 assigned via ScopeTag::fresh(). Each Theory or SchemaBuilder gets a unique scope at construction time. ScopeTag::LEGACY (scope 0) is used for deserializing old-format data.

18.2.2 The name Type

Name wraps Arc<str> with a pointer-equality fast path:

pub struct Name(pub Arc<str>);

Equality checks use Arc::ptr_eq first (single pointer comparison), falling back to string comparison only on pointer mismatch. In the restrict hot path, both sides typically originate from the same schema construction, so the fast path hits frequently.

Performance: Benchmarks show no regression from the String to Name migration. The Arc::clone fast path eliminates heap allocations in the restrict pipeline’s inner loop.

18.2.3 NameSite and SiteRename

All 9 naming sites in panproto are unified under the NameSite enum:

pub enum NameSite {
    EdgeLabel, VertexId, VertexKind, EdgeKind,
    Nsid, ConstraintSort, InstanceAnchor,
    TheoryName, SortName,
}

SiteRename { site, old, new } is a site-qualified rename operation. These are stored in SchemaMorphism provenance and CommitObject rename metadata.

18.2.4 The morphism tower

A theory morphism \(F: T_1 \to T_2\) with sort_map and op_map induces schema-level renames:

  • Each sort_map entry where old != new becomes a NameSite::VertexKind rename
  • Each op_map entry where old != new becomes a NameSite::EdgeKind rename

This cascades into SchemaMorphism, then into CompiledMigration for the restrict pipeline:

Theory morphism  ──induce_schema_morphism──→  Schema morphism
                                                    │
                                          induce_data_migration
                                                    │
                                                    ▼
                                            CompiledMigration

The cascade module (crates/panproto-mig/src/cascade.rs) connects all three levels. This implements the action described in Spivak 2012: a schema morphism \(F\) induces the pullback functor \(\Delta_F\) on instances.

TipFrom Theory Rename to Data Migration

A sort rename Vertex → Node in the theory becomes a vertex-kind rename in the schema, which becomes an anchor remap in the instance. Each level of the tower translates the same conceptual change into a more concrete representation.

18.2.5 New combinators

Seven new combinator variants complement the original six:

Combinator Site Lossless?
RenameVertex VertexId Yes
RenameKind VertexKind Yes
RenameEdgeKind EdgeKind Yes
RenameNsid Nsid Yes
RenameConstraintSort ConstraintSort Yes
ApplyTheoryMorphism Multiple Yes
Rename Any Yes

Each combinator has a corresponding apply_* function and build_compiled_migration arm. All are lossless (empty complement), so the lens laws hold trivially.

18.3 Automatic morphisms: extend, restrict, and complete

This section covers automatic homomorphism search, the extend/restrict/complete operations, schema-level pushouts, overlap discovery, and the chase algorithm.

18.3.2 Extend, restrict, and complete

Extend (\(\Sigma_F\)): wtype_extend and functor_extend map all source nodes into the target schema. Remap node anchors and arc edges; fill missing data with Value::Null.

Restrict (\(\Delta_F\)): Drop source nodes that don’t map to any target. This is the hot path used throughout panproto for data migration. Complexity: \(O(|N| + |E|)\) where \(N\) and \(E\) are instance nodes and edges.

Complete (\(\Pi_F\)): functor_pi computes the product over fibers. wtype_pi currently only handles injective migrations (single-element fibers). Multi-element fiber products for trees require Cartesian products of subtrees, deferred to a future release.

18.3.3 Schema pushout

The schema pushout combines two schemas along an overlap. SchemaOverlap specifies which elements are identified:

pub struct SchemaOverlap {
    pub vertex_pairs: Vec<(Name, Name)>,
    pub edge_pairs: Vec<(Edge, Edge)>,
}

The schema_pushout algorithm:

  1. Merged vertices: all left vertices + non-overlapping right vertices
  2. Merged edges: all left edges + right edges with remapped vertex references
  3. Constraint/metadata union for overlapping vertices
  4. Build morphisms: left to pushout (identity for left vertices), right to pushout (identity or remapped for right vertices)
  5. Rebuild adjacency indices

18.3.4 Overlap discovery

Uses find_best_morphism with monic: true in both directions (left to right and right to left), taking whichever produces more vertex pairs. The vertex map entries from the best morphism define the overlap pairs.

18.3.5 Chase algorithm

The chase algorithm implements embedded dependencies:

pub struct EmbeddedDependency {
    pub pattern_vertex: Name,
    pub pattern_values: HashMap<String, Value>,
    pub consequence_vertex: Name,
    pub consequence_values: HashMap<String, Value>,
}

Algorithm:

  1. Clone instance
  2. For each iteration (up to max_iterations):
    • For each dependency, find pattern matches without consequences
    • Add missing consequence rows
    • If nothing changed, return fixpoint
  3. If limit reached, return ChaseError::NonTermination

Termination is guaranteed for acyclic dependencies. For cyclic dependencies, the iteration bound prevents infinite loops. panproto’s equational theories are purely algebraic, so termination is guaranteed for all standard schema theories.

18.4 The type-checking pipeline

Type-checking in panproto operates at three levels, each building on the one below.

18.4.1 Pipeline overview

flowchart TD
    T["Raw Term / Equation"] --> TC["typecheck_term / typecheck_equation<br>(panproto-gat)"]
    TC -->|GatError| GD["GatDiagnostics<br>(panproto-vcs gat_validate)"]
    GD -->|type_errors, equation_errors| VS["ValidationStatus<br>(panproto-vcs index)"]
    VS -->|Invalid(reasons)| VF["VcsError::ValidationFailed<br>(panproto-vcs repo)"]
    VF -->|miette diagnostic| CLI["CLI error output<br>(panproto-cli)"]

    style TC fill:#e8f0fe,stroke:#1a73e8
    style GD fill:#fef3e8,stroke:#e87a1a
    style VS fill:#fef3e8,stroke:#e87a1a
    style VF fill:#fee8e8,stroke:#e81a1a
    style CLI fill:#fee8e8,stroke:#e81a1a
Figure 18.1: Type-checking flows from raw terms through GAT-level checks, VCS-level diagnostics, and CLI-level formatting.

Level 0 (GAT, panproto-gat): The typecheck module provides primitive type-checking functions. These operate on Term, Equation, and Theory values and return Result<_, GatError>. No knowledge of schemas, migrations, or the VCS.

Level 1 (VCS, panproto-vcs): The gat_validate module wraps GAT-level functions and collects results into GatDiagnostics. The index module stores diagnostics in the staging area, and the repo module consults them during commit.

Level 2 (CLI, panproto-cli): The CLI converts VcsError::ValidationFailed into user-facing miette diagnostics with source context, suggestions, and error codes.

18.4.2 Level 0: GAT type-checking

Four public functions form the foundation:

18.4.2.1 typecheck_term

pub fn typecheck_term(
    term: &Term,
    ctx: &VarContext,
    theory: &Theory,
) -> Result<Arc<str>, GatError>

Given a term, a variable typing context, and a theory, infers the output sort. For App { op, args }, it looks up op in the theory, typechecks each argument recursively, checks that each argument’s inferred sort matches the operation’s declared input sort, and returns the output sort.

18.4.2.2 infer_var_sorts

pub fn infer_var_sorts(eq: &Equation, theory: &Theory) -> Result<VarContext, GatError>

Infers variable sorts from an equation by walking both sides and collecting sort constraints. Eliminates the need for explicit sort annotations on equation variables.

18.4.2.3 typecheck_equation

pub fn typecheck_equation(eq: &Equation, theory: &Theory) -> Result<(), GatError>

Combines infer_var_sorts and typecheck_term: infers the variable context, typechecks both sides, and verifies they produce the same output sort.

18.4.2.4 typecheck_theory

pub fn typecheck_theory(theory: &Theory) -> Result<(), GatError>

Iterates over all equations in a theory and typechecks each one. Returns the first error encountered. This is the function that the VCS layer calls.

18.4.2.5 Error Types

Variant Meaning
UnboundVariable(name) Variable not in context
OpNotFound(name) Operation not in theory
TermArityMismatch { op, expected, got } Wrong argument count
ArgTypeMismatch { op, arg_index, expected, got } Wrong argument sort
EquationSortMismatch { equation, lhs_sort, rhs_sort } Equation sides differ
ConflictingVarSort { var, sort1, sort2 } Variable at incompatible positions

18.4.3 Level 1: VCS integration

Type-checking integrates at three points in schema evolution: staging (schema add), committing (schema commit), and theory validation.

18.4.3.1 GatDiagnostics

The GatDiagnostics struct bridges GAT-level errors and VCS-level validation:

pub struct GatDiagnostics {
    pub type_errors: Vec<String>,
    pub equation_errors: Vec<String>,
    pub migration_warnings: Vec<String>,
}
  • type_errors collects results from typecheck_theory. Each entry is the Display output of a GatError.
  • equation_errors collects results from check_model_with_options. Each entry describes a specific equation violation with its variable assignment and the values that disagreed.
  • migration_warnings collects structural warnings from validate_migration; these are non-blocking.

The is_clean() method returns true only when both type_errors and equation_errors are empty. Migration warnings don’t block commits.

18.4.3.2 Integration Points

When Repository::add stages a schema:

  1. Compute the schema diff between HEAD and the new schema.
  2. Call auto_mig::derive_migration to produce a Migration.
  3. Call gat_validate::validate_migration to check migration structure.
  4. Store diagnostics in the StagedSchema::gat_diagnostics field.
  5. If gat_diagnostics.has_errors(), set ValidationStatus::Invalid(diag.all_errors()).

The index file (.panproto/index.json) persists the diagnostics so that schema commit can consult them without re-running validation.

When Repository::commit_with_options creates a commit, it checks two gates:

  1. ValidationStatus: if ValidationStatus::Invalid, the commit is blocked with VcsError::ValidationFailed.
  2. GatDiagnostics: even if ValidationStatus is Valid, the commit checks gat_diagnostics.has_errors(). This handles cases where validation status was manually set to Valid but diagnostics were populated later.

Both gates are bypassed when CommitOptions::skip_verify is true. The --skip-verify CLI flag sets this option, providing an escape hatch for work-in-progress schemas.

The validate_theory_equations function provides standalone theory type-checking, called by the schema validate and schema typecheck CLI commands for schemas that aren’t part of a VCS repository.

18.4.4 Level 2: CLI error formatting

The CLI layer converts VcsError::ValidationFailed into user-facing diagnostics using miette.

18.4.4.1 Error Flow

GatError::ArgTypeMismatch → GatDiagnostics → ValidationStatus::Invalid
  → VcsError::ValidationFailed → miette::Report

Each layer adds context:

  • GatError: the raw error with structured fields.
  • GatDiagnostics: the error serialized to a string via Display, prefixed with its category.
  • ValidationStatus: the collected error strings.
  • VcsError: the error wrapped in a ValidationFailed variant.
  • miette: the error rendered with ANSI colors, source labels, and help text.

18.4.4.2 Equation Violation Formatting

Equation violations receive special formatting:

equation 'assoc' violated when a=2, b=3, c=1: LHS=6, RHS=4

This format shows the equation name, the variable assignment that triggered the violation, and the concrete values that the two sides evaluated to.

18.4.5 Design rationale

The three-level design separates concerns cleanly:

  • Level 0 is pure mathematics. No knowledge of files, schemas, or version control. Easy to test in isolation.
  • Level 1 is integration logic. Knows how to translate between GAT errors and VCS concepts. Serializes diagnostics for persistence.
  • Level 2 is presentation. Renders errors for humans with colors and suggestions. Uses miette, a CLI-only dependency.

Storing diagnostics in the index rather than re-computing at commit time has two benefits:

  1. Performance: type-checking can be expensive for large theories. Running once at staging and persisting avoids redundant work.
  2. Auditability: the index file records exactly what diagnostics were produced, even if the user later modifies the schema file.

The --skip-verify flag is an explicit escape hatch for situations where the pipeline produces false positives or where a work-in-progress schema intentionally violates equations.

The max_assignments bound in CheckModelOptions prevents combinatorial explosion. An equation with 3 variables over a carrier of 100 elements requires \(10^6\) assignments; with 4 variables it’s \(10^8\). The default bound of 10,000 catches most violations while keeping verification fast.