flowchart TD
T["Raw Term / Equation"] --> TC["typecheck_term / typecheck_equation<br>(panproto-gat)"]
TC -->|GatError| GD["GatDiagnostics<br>(panproto-vcs gat_validate)"]
GD -->|type_errors, equation_errors| VS["ValidationStatus<br>(panproto-vcs index)"]
VS -->|Invalid(reasons)| VF["VcsError::ValidationFailed<br>(panproto-vcs repo)"]
VF -->|miette diagnostic| CLI["CLI error output<br>(panproto-cli)"]
style TC fill:#e8f0fe,stroke:#1a73e8
style GD fill:#fef3e8,stroke:#e87a1a
style VS fill:#fef3e8,stroke:#e87a1a
style VF fill:#fee8e8,stroke:#e81a1a
style CLI fill:#fee8e8,stroke:#e81a1a
18 Advanced Internals: Building Blocks, Naming, Auto-Morphisms, and Type Checking
panproto defines 27 building-block GATs. We build up from the three simplest, then catalog the rest.
18.1 Building-block theories
18.1.1 Formal invariants
Every building-block theory satisfies:
- Sort closure: every sort name in an operation’s input/output types is declared in the theory’s sorts list.
- Equation well-typedness: both sides of every equation have the same output sort.
- No tautologies: \(\text{lhs} \neq \text{rhs}\) syntactically for all equations.
18.1.2 The property-to-structure principle
Several building blocks encode conditional properties as unconditional equations by adding operations and dependent sorts. For example, injectivity of coproduct injection is not injection(v1) = injection(v2) -> v1 = v2 (a conditional equation, which GATs can’t express). Instead, ThCoproduct adds a retraction variant_of : Vertex -> Variant with the equation variant_of(injection(v)) = v. The retraction is structure, not a property.
This separation yields a clean boundary: structural properties (transitivity, injectivity, uniqueness, optionality) belong in GAT sorts+equations; validation properties (cardinality, patterns, ranges) belong in ThConstraint.
18.1.3 The thgraph to threflexivegraph to thcoproduct progression
The simplest theory is ThGraph: just vertices, edges, and two operations.
ThGraph
Sorts: Vertex, Edge
Ops: src : Edge → Vertex
tgt : Edge → Vertex
Eqs: (none)
ThReflexiveGraph extends ThGraph with identity edges and two equations that force id to be a section of both src and tgt:
ThReflexiveGraph
Sorts: Vertex, Edge
Ops: id : Vertex → Edge
src : Edge → Vertex
tgt : Edge → Vertex
Eqs: src(id(v)) = v
tgt(id(v)) = v
Type check: id(v) : Edge, src(id(v)) : Vertex, v : Vertex. Both sides have sort Vertex.
Without the equations, this is just ThGraph + an extra operation. The equations make the identity edge load-bearing.
ThCoproduct adds the retraction pattern to encode injectivity:
ThCoproduct
Sorts: Vertex, Variant, Tag
Ops: injection : Variant → Vertex
tag : Variant → Tag
variant_of : Vertex → Variant
Eq: variant_of(injection(v)) = v
Type check: injection(v) : Vertex, variant_of(injection(v)) : Variant, v : Variant. Both sides have sort Variant.
The equation variant_of(injection(v)) = v encodes that variant_of is a left inverse of injection. This is how panproto represents injectivity without conditional equations.
GATs only support unconditional equations. Conditional equations (implications) would require dependent type theory or first-order logic. The retraction trick sidesteps this by adding structure (the variant_of operation) that witnesses the property.
18.1.4 Theories with equations
18.1.4.1 ThRecursion: Fold-Unfold
Sorts: Vertex, Mu
Ops: fold : Vertex → Mu
unfold : Mu → Vertex
Eq: unfold(fold(v)) = v
This is the isorecursive fold-unfold law. fold is a section of unfold.
18.1.4.2 ThPartial: Definedness Witness
Sorts: Vertex, Defined
Ops: defined : Vertex → Defined
witness : Defined → Vertex
Eq: defined(witness(d)) = d
This encodes definedness: defined is a surjection with witness as its retraction.
18.1.4.3 ThSymmetricGraph: Involution
Sorts: Vertex, Edge
Ops: inv : Edge → Edge
src : Edge → Vertex
tgt : Edge → Vertex
Eqs: src(inv(e)) = tgt(e)
tgt(inv(e)) = src(e)
inv(inv(e)) = e
The involution law: inverting twice is identity.
18.1.5 All 27 theories (complete list)
18.1.5.1 Schema shapes (6)
| # | Name | Sorts | Key operations |
|---|---|---|---|
| 1 | ThGraph | Vertex, Edge | src, tgt |
| 2 | ThSimpleGraph | Vertex, Edge(s:V,t:V) | src, tgt (unique by endpoints) |
| 3 | ThHypergraph | Vertex, HyperEdge, Label | incident, parent_label |
| 4 | ThReflexiveGraph | Vertex, Edge | id, src, tgt |
| 5 | ThSymmetricGraph | Vertex, Edge | inv, src, tgt |
| 6 | ThPetriNet | Place, Transition, Token | pre, post, marking |
18.1.5.2 Schema modifiers (8)
| # | Name | Sorts | Key operations |
|---|---|---|---|
| 7 | ThConstraint | Vertex, Constraint(v:V) | target |
| 8 | ThMulti | Vertex, Edge, EdgeLabel | edge_label |
| 9 | ThInterface | Vertex, Interface | implements |
| 10 | ThOrder | Edge, Position | edge_position, succ |
| 11 | ThCoproduct | Vertex, Variant, Tag | injection, tag, variant_of |
| 12 | ThRecursion | Vertex, Mu | unfold, fold |
| 13 | ThPartial | Vertex, Defined | defined, witness |
| 14 | ThLinear | Edge, Usage | use_count |
18.1.5.3 Schema structural (4)
| # | Name | Sorts | Key operations |
|---|---|---|---|
| 15 | ThNominal | Vertex, Name | name |
| 16 | ThSpan | Vertex, Span | span_left, span_right |
| 17 | ThCospan | Vertex, Apex | inl, inr |
| 18 | ThOperad | Color, MOperation | arity, op_output |
18.1.5.4 Instance (6)
| # | Name | Sorts | Key operations |
|---|---|---|---|
| 19 | ThWType | Node, Arc, Value | anchor, arc_src, arc_tgt, arc_edge, node_value |
| 20 | ThFunctor | Table, Row, ForeignKey | table_vertex, fk_src, fk_tgt |
| 21 | ThFlat | Node, Field, Value | field_node, field_value, node_anchor |
| 22 | ThGraphInstance | IVertex, IEdge, Value, Vertex, Edge | i_src, i_tgt, iv_anchor, ie_anchor, iv_value |
| 23 | ThAnnotation | Vertex, Annotation, AnnotationKey, Value | annotated, ann_key, ann_value |
| 24 | ThCausal | Event, Timestamp, Before(e1,e2) | time, before_refl, before_trans |
18.1.5.5 Architectural (3)
| # | Name | Sorts | Key operations |
|---|---|---|---|
| 25 | ThTracedMonoidal | Wire, Box | trace_in, trace_out, feedback |
| 26 | ThSimplicial | Simplex, Face, Degeneracy | face_map, degeneracy_map |
| 27 | ThMeta | Node, Discriminator, ExtraField | discriminator, extra_field |
18.1.6 Theory groups
| Group | Schema composition | Instance | Protocols |
|---|---|---|---|
| A | colimit(Graph, Constraint, Multi) | WType | 30 |
| B | colimit(Hypergraph, Constraint) | Functor | 11 |
| C | colimit(SimpleGraph, Constraint) | Flat | 9 |
| D | colimit(Graph, Constraint, Multi, Interface) | WType | 16 |
| E | colimit(Graph, Constraint, Multi) | colimit(WType, Meta) | 9 |
| F | colimit(Graph, Constraint, Multi) | GraphInstance | 1 |
18.2 Naming, identity, and the Morphism tower
This section covers the panproto-gat identifier abstraction, the unified naming layer, the morphism tower that cascades theory-level renames into data migrations, and the VCS rename detection heuristic.
18.2.1 The ident abstraction
The Ident type separates stable identity from display name, following GATlab:
pub struct Ident {
pub scope: ScopeTag, // which theory/schema this belongs to
pub index: u32, // positional index within scope
pub name: Arc<str>, // human-readable display name
}PartialEq and Hash use only (scope, index), making comparisons \(O(1)\) regardless of name length. Renaming an identifier (via ident.renamed("new_name")) doesn’t change its identity, so HashMap<Ident, _> entries survive renames without rehashing.
ScopeTag is a monotonically increasing u32 assigned via ScopeTag::fresh(). Each Theory or SchemaBuilder gets a unique scope at construction time. ScopeTag::LEGACY (scope 0) is used for deserializing old-format data.
18.2.2 The name Type
Name wraps Arc<str> with a pointer-equality fast path:
pub struct Name(pub Arc<str>);Equality checks use Arc::ptr_eq first (single pointer comparison), falling back to string comparison only on pointer mismatch. In the restrict hot path, both sides typically originate from the same schema construction, so the fast path hits frequently.
Performance: Benchmarks show no regression from the String to Name migration. The Arc::clone fast path eliminates heap allocations in the restrict pipeline’s inner loop.
18.2.3 NameSite and SiteRename
All 9 naming sites in panproto are unified under the NameSite enum:
pub enum NameSite {
EdgeLabel, VertexId, VertexKind, EdgeKind,
Nsid, ConstraintSort, InstanceAnchor,
TheoryName, SortName,
}SiteRename { site, old, new } is a site-qualified rename operation. These are stored in SchemaMorphism provenance and CommitObject rename metadata.
18.2.4 The morphism tower
A theory morphism \(F: T_1 \to T_2\) with sort_map and op_map induces schema-level renames:
- Each
sort_mapentry whereold != newbecomes aNameSite::VertexKindrename - Each
op_mapentry whereold != newbecomes aNameSite::EdgeKindrename
This cascades into SchemaMorphism, then into CompiledMigration for the restrict pipeline:
Theory morphism ──induce_schema_morphism──→ Schema morphism
│
induce_data_migration
│
▼
CompiledMigration
The cascade module (crates/panproto-mig/src/cascade.rs) connects all three levels. This implements the action described in Spivak 2012: a schema morphism \(F\) induces the pullback functor \(\Delta_F\) on instances.
A sort rename Vertex → Node in the theory becomes a vertex-kind rename in the schema, which becomes an anchor remap in the instance. Each level of the tower translates the same conceptual change into a more concrete representation.
18.2.5 New combinators
Seven new combinator variants complement the original six:
| Combinator | Site | Lossless? |
|---|---|---|
RenameVertex |
VertexId |
Yes |
RenameKind |
VertexKind |
Yes |
RenameEdgeKind |
EdgeKind |
Yes |
RenameNsid |
Nsid |
Yes |
RenameConstraintSort |
ConstraintSort |
Yes |
ApplyTheoryMorphism |
Multiple | Yes |
Rename |
Any | Yes |
Each combinator has a corresponding apply_* function and build_compiled_migration arm. All are lossless (empty complement), so the lens laws hold trivially.
18.3 Automatic morphisms: extend, restrict, and complete
This section covers automatic homomorphism search, the extend/restrict/complete operations, schema-level pushouts, overlap discovery, and the chase algorithm.
18.3.1 Homomorphism search
The homomorphism problem reduces to constraint satisfaction:
- Variables: one per source vertex
- Domains: target vertices with compatible kind
- Constraints: edge-preserving squares
Backtracking search with forward checking:
- Variable ordering: MRV heuristic (assign vertices with smallest domains first)
- Value selection: try each domain value
- Forward check: after assignment, verify all unassigned neighbors still have valid domain values
- Edge compatibility: requires matching edge kind between mapped vertex pairs; edge names are allowed to differ
- Edge map derivation: once all vertices are assigned, the edge map is determined by finding compatible edges between each pair of mapped vertices
Quality scoring in \([0.0, 1.0]\):
- 50% weight: average name similarity across vertex pairs
- 50% weight: fraction of edges where source and target have the same label name
18.3.2 Extend, restrict, and complete
Extend (\(\Sigma_F\)): wtype_extend and functor_extend map all source nodes into the target schema. Remap node anchors and arc edges; fill missing data with Value::Null.
Restrict (\(\Delta_F\)): Drop source nodes that don’t map to any target. This is the hot path used throughout panproto for data migration. Complexity: \(O(|N| + |E|)\) where \(N\) and \(E\) are instance nodes and edges.
Complete (\(\Pi_F\)): functor_pi computes the product over fibers. wtype_pi currently only handles injective migrations (single-element fibers). Multi-element fiber products for trees require Cartesian products of subtrees, deferred to a future release.
18.3.3 Schema pushout
The schema pushout combines two schemas along an overlap. SchemaOverlap specifies which elements are identified:
pub struct SchemaOverlap {
pub vertex_pairs: Vec<(Name, Name)>,
pub edge_pairs: Vec<(Edge, Edge)>,
}The schema_pushout algorithm:
- Merged vertices: all left vertices + non-overlapping right vertices
- Merged edges: all left edges + right edges with remapped vertex references
- Constraint/metadata union for overlapping vertices
- Build morphisms: left to pushout (identity for left vertices), right to pushout (identity or remapped for right vertices)
- Rebuild adjacency indices
18.3.4 Overlap discovery
Uses find_best_morphism with monic: true in both directions (left to right and right to left), taking whichever produces more vertex pairs. The vertex map entries from the best morphism define the overlap pairs.
18.3.5 Chase algorithm
The chase algorithm implements embedded dependencies:
pub struct EmbeddedDependency {
pub pattern_vertex: Name,
pub pattern_values: HashMap<String, Value>,
pub consequence_vertex: Name,
pub consequence_values: HashMap<String, Value>,
}Algorithm:
- Clone instance
- For each iteration (up to
max_iterations):- For each dependency, find pattern matches without consequences
- Add missing consequence rows
- If nothing changed, return fixpoint
- If limit reached, return
ChaseError::NonTermination
Termination is guaranteed for acyclic dependencies. For cyclic dependencies, the iteration bound prevents infinite loops. panproto’s equational theories are purely algebraic, so termination is guaranteed for all standard schema theories.
18.4 The type-checking pipeline
Type-checking in panproto operates at three levels, each building on the one below.
18.4.1 Pipeline overview
Level 0 (GAT, panproto-gat): The typecheck module provides primitive type-checking functions. These operate on Term, Equation, and Theory values and return Result<_, GatError>. No knowledge of schemas, migrations, or the VCS.
Level 1 (VCS, panproto-vcs): The gat_validate module wraps GAT-level functions and collects results into GatDiagnostics. The index module stores diagnostics in the staging area, and the repo module consults them during commit.
Level 2 (CLI, panproto-cli): The CLI converts VcsError::ValidationFailed into user-facing miette diagnostics with source context, suggestions, and error codes.
18.4.2 Level 0: GAT type-checking
Four public functions form the foundation:
18.4.2.1 typecheck_term
pub fn typecheck_term(
term: &Term,
ctx: &VarContext,
theory: &Theory,
) -> Result<Arc<str>, GatError>Given a term, a variable typing context, and a theory, infers the output sort. For App { op, args }, it looks up op in the theory, typechecks each argument recursively, checks that each argument’s inferred sort matches the operation’s declared input sort, and returns the output sort.
18.4.2.2 infer_var_sorts
pub fn infer_var_sorts(eq: &Equation, theory: &Theory) -> Result<VarContext, GatError>Infers variable sorts from an equation by walking both sides and collecting sort constraints. Eliminates the need for explicit sort annotations on equation variables.
18.4.2.3 typecheck_equation
pub fn typecheck_equation(eq: &Equation, theory: &Theory) -> Result<(), GatError>Combines infer_var_sorts and typecheck_term: infers the variable context, typechecks both sides, and verifies they produce the same output sort.
18.4.2.4 typecheck_theory
pub fn typecheck_theory(theory: &Theory) -> Result<(), GatError>Iterates over all equations in a theory and typechecks each one. Returns the first error encountered. This is the function that the VCS layer calls.
18.4.2.5 Error Types
| Variant | Meaning |
|---|---|
UnboundVariable(name) |
Variable not in context |
OpNotFound(name) |
Operation not in theory |
TermArityMismatch { op, expected, got } |
Wrong argument count |
ArgTypeMismatch { op, arg_index, expected, got } |
Wrong argument sort |
EquationSortMismatch { equation, lhs_sort, rhs_sort } |
Equation sides differ |
ConflictingVarSort { var, sort1, sort2 } |
Variable at incompatible positions |
18.4.3 Level 1: VCS integration
Type-checking integrates at three points in schema evolution: staging (schema add), committing (schema commit), and theory validation.
18.4.3.1 GatDiagnostics
The GatDiagnostics struct bridges GAT-level errors and VCS-level validation:
pub struct GatDiagnostics {
pub type_errors: Vec<String>,
pub equation_errors: Vec<String>,
pub migration_warnings: Vec<String>,
}type_errorscollects results fromtypecheck_theory. Each entry is theDisplayoutput of aGatError.equation_errorscollects results fromcheck_model_with_options. Each entry describes a specific equation violation with its variable assignment and the values that disagreed.migration_warningscollects structural warnings fromvalidate_migration; these are non-blocking.
The is_clean() method returns true only when both type_errors and equation_errors are empty. Migration warnings don’t block commits.
18.4.3.2 Integration Points
When Repository::add stages a schema:
- Compute the schema diff between HEAD and the new schema.
- Call
auto_mig::derive_migrationto produce aMigration. - Call
gat_validate::validate_migrationto check migration structure. - Store diagnostics in the
StagedSchema::gat_diagnosticsfield. - If
gat_diagnostics.has_errors(), setValidationStatus::Invalid(diag.all_errors()).
The index file (.panproto/index.json) persists the diagnostics so that schema commit can consult them without re-running validation.
When Repository::commit_with_options creates a commit, it checks two gates:
- ValidationStatus: if
ValidationStatus::Invalid, the commit is blocked withVcsError::ValidationFailed. - GatDiagnostics: even if
ValidationStatusisValid, the commit checksgat_diagnostics.has_errors(). This handles cases where validation status was manually set toValidbut diagnostics were populated later.
Both gates are bypassed when CommitOptions::skip_verify is true. The --skip-verify CLI flag sets this option, providing an escape hatch for work-in-progress schemas.
The validate_theory_equations function provides standalone theory type-checking, called by the schema validate and schema typecheck CLI commands for schemas that aren’t part of a VCS repository.
18.4.4 Level 2: CLI error formatting
The CLI layer converts VcsError::ValidationFailed into user-facing diagnostics using miette.
18.4.4.1 Error Flow
GatError::ArgTypeMismatch → GatDiagnostics → ValidationStatus::Invalid
→ VcsError::ValidationFailed → miette::Report
Each layer adds context:
- GatError: the raw error with structured fields.
- GatDiagnostics: the error serialized to a string via
Display, prefixed with its category. - ValidationStatus: the collected error strings.
- VcsError: the error wrapped in a
ValidationFailedvariant. - miette: the error rendered with ANSI colors, source labels, and help text.
18.4.4.2 Equation Violation Formatting
Equation violations receive special formatting:
equation 'assoc' violated when a=2, b=3, c=1: LHS=6, RHS=4
This format shows the equation name, the variable assignment that triggered the violation, and the concrete values that the two sides evaluated to.
18.4.5 Design rationale
The three-level design separates concerns cleanly:
- Level 0 is pure mathematics. No knowledge of files, schemas, or version control. Easy to test in isolation.
- Level 1 is integration logic. Knows how to translate between GAT errors and VCS concepts. Serializes diagnostics for persistence.
- Level 2 is presentation. Renders errors for humans with colors and suggestions. Uses
miette, a CLI-only dependency.
Storing diagnostics in the index rather than re-computing at commit time has two benefits:
- Performance: type-checking can be expensive for large theories. Running once at staging and persisting avoids redundant work.
- Auditability: the index file records exactly what diagnostics were produced, even if the user later modifies the schema file.
The --skip-verify flag is an explicit escape hatch for situations where the pipeline produces false positives or where a work-in-progress schema intentionally violates equations.
The max_assignments bound in CheckModelOptions prevents combinatorial explosion. An equation with 3 variables over a carrier of 100 elements requires \(10^6\) assignments; with 4 variables it’s \(10^8\). The default bound of 10,000 catches most violations while keeping verification fast.