18 Advanced Internals: Building Blocks, Naming, Auto-Morphisms, and Type Checking

panproto defines 27 building-block GATs. We build up from the three simplest, then catalog the rest.

18.1 Building-block theories

18.1.1 Formal invariants

Every building-block theory satisfies:

Sort closure: every sort name in an operation’s input/output types is declared in the theory’s sorts list.
Equation well-typedness: both sides of every equation have the same output sort.
No tautologies: $\text{lhs} \neq \text{rhs}$ syntactically for all equations.

18.1.2 The property-to-structure principle

Several building blocks encode conditional properties as unconditional equations by adding operations and dependent sorts. For example, injectivity of coproduct injection is not injection(v1) = injection(v2) -> v1 = v2 (a conditional equation, which GATs can’t express). Instead, ThCoproduct adds a retraction variant_of : Vertex -> Variant with the equation variant_of(injection(v)) = v. The retraction is structure, not a property.

This separation yields a clean boundary: structural properties (transitivity, injectivity, uniqueness, optionality) belong in GAT sorts+equations; validation properties (cardinality, patterns, ranges) belong in ThConstraint.

18.1.3 The thgraph to threflexivegraph to thcoproduct progression

The simplest theory is ThGraph: just vertices, edges, and two operations.

ThGraph
  Sorts: Vertex, Edge
  Ops:   src : Edge → Vertex
         tgt : Edge → Vertex
  Eqs:   (none)

ThReflexiveGraph extends ThGraph with identity edges and two equations that force id to be a section of both src and tgt:

ThReflexiveGraph
  Sorts: Vertex, Edge
  Ops:   id  : Vertex → Edge
         src : Edge → Vertex
         tgt : Edge → Vertex
  Eqs:   src(id(v)) = v
         tgt(id(v)) = v

Type check: id(v) : Edge, src(id(v)) : Vertex, v : Vertex. Both sides have sort Vertex.

Without the equations, this is just ThGraph + an extra operation. The equations make the identity edge load-bearing.

ThCoproduct adds the retraction pattern to encode injectivity:

ThCoproduct
  Sorts: Vertex, Variant, Tag
  Ops:   injection : Variant → Vertex
         tag : Variant → Tag
         variant_of : Vertex → Variant
  Eq:    variant_of(injection(v)) = v

Type check: injection(v) : Vertex, variant_of(injection(v)) : Variant, v : Variant. Both sides have sort Variant.

The equation variant_of(injection(v)) = v encodes that variant_of is a left inverse of injection. This is how panproto represents injectivity without conditional equations.

Why Not Conditional Equations?

GATs only support unconditional equations. Conditional equations (implications) would require dependent type theory or first-order logic. The retraction trick sidesteps this by adding structure (the variant_of operation) that witnesses the property.

18.1.4 Theories with equations

18.1.4.1 ThRecursion: Fold-Unfold

Sorts: Vertex, Mu
Ops:   fold : Vertex → Mu
       unfold : Mu → Vertex
Eq:    unfold(fold(v)) = v

This is the isorecursive fold-unfold law. fold is a section of unfold.

18.1.4.2 ThPartial: Definedness Witness

Sorts: Vertex, Defined
Ops:   defined : Vertex → Defined
       witness : Defined → Vertex
Eq:    defined(witness(d)) = d

This encodes definedness: defined is a surjection with witness as its retraction.

18.1.4.3 ThSymmetricGraph: Involution

Sorts: Vertex, Edge
Ops:   inv : Edge → Edge
       src : Edge → Vertex
       tgt : Edge → Vertex
Eqs:   src(inv(e)) = tgt(e)
       tgt(inv(e)) = src(e)
       inv(inv(e)) = e

The involution law: inverting twice is identity.

18.1.5 All 27 theories (complete list)

18.1.5.1 Schema shapes (6)

#	Name	Sorts	Key operations
1	ThGraph	Vertex, Edge	src, tgt
2	ThSimpleGraph	Vertex, Edge(s:V,t:V)	src, tgt (unique by endpoints)
3	ThHypergraph	Vertex, HyperEdge, Label	incident, parent_label
4	ThReflexiveGraph	Vertex, Edge	id, src, tgt
5	ThSymmetricGraph	Vertex, Edge	inv, src, tgt
6	ThPetriNet	Place, Transition, Token	pre, post, marking

18.1.5.2 Schema modifiers (8)

#	Name	Sorts	Key operations
7	ThConstraint	Vertex, Constraint(v:V)	target
8	ThMulti	Vertex, Edge, EdgeLabel	edge_label
9	ThInterface	Vertex, Interface	implements
10	ThOrder	Edge, Position	edge_position, succ
11	ThCoproduct	Vertex, Variant, Tag	injection, tag, variant_of
12	ThRecursion	Vertex, Mu	unfold, fold
13	ThPartial	Vertex, Defined	defined, witness
14	ThLinear	Edge, Usage	use_count

18.1.5.3 Schema structural (4)

#	Name	Sorts	Key operations
15	ThNominal	Vertex, Name	name
16	ThSpan	Vertex, Span	span_left, span_right
17	ThCospan	Vertex, Apex	inl, inr
18	ThOperad	Color, MOperation	arity, op_output

18.1.5.4 Instance (6)

#	Name	Sorts	Key operations
19	ThWType	Node, Arc, Value	anchor, arc_src, arc_tgt, arc_edge, node_value
20	ThFunctor	Table, Row, ForeignKey	table_vertex, fk_src, fk_tgt
21	ThFlat	Node, Field, Value	field_node, field_value, node_anchor
22	ThGraphInstance	IVertex, IEdge, Value, Vertex, Edge	i_src, i_tgt, iv_anchor, ie_anchor, iv_value
23	ThAnnotation	Vertex, Annotation, AnnotationKey, Value	annotated, ann_key, ann_value
24	ThCausal	Event, Timestamp, Before(e1,e2)	time, before_refl, before_trans

18.1.5.5 Architectural (3)

#	Name	Sorts	Key operations
25	ThTracedMonoidal	Wire, Box	trace_in, trace_out, feedback
26	ThSimplicial	Simplex, Face, Degeneracy	face_map, degeneracy_map
27	ThMeta	Node, Discriminator, ExtraField	discriminator, extra_field

18.1.6 Theory groups

Group	Schema composition	Instance	Protocols
A	colimit(Graph, Constraint, Multi)	WType	30
B	colimit(Hypergraph, Constraint)	Functor	11
C	colimit(SimpleGraph, Constraint)	Flat	9
D	colimit(Graph, Constraint, Multi, Interface)	WType	16
E	colimit(Graph, Constraint, Multi)	colimit(WType, Meta)	9
F	colimit(Graph, Constraint, Multi)	GraphInstance	1

18.2 Naming, identity, and the Morphism tower

This section covers the panproto-gat identifier abstraction, the unified naming layer, the morphism tower that cascades theory-level renames into data migrations, and the VCS rename detection heuristic.

18.2.1 The `ident` abstraction

The Ident type separates stable identity from display name, following GATlab:

pub struct Ident {
    pub scope: ScopeTag,   // which theory/schema this belongs to
    pub index: u32,        // positional index within scope
    pub name: Arc<str>,    // human-readable display name
}

PartialEq and Hash use only (scope, index), making comparisons $O(1)$ regardless of name length. Renaming an identifier (via ident.renamed("new_name")) doesn’t change its identity, so HashMap<Ident, _> entries survive renames without rehashing.

ScopeTag is a monotonically increasing u32 assigned via ScopeTag::fresh(). Each Theory or SchemaBuilder gets a unique scope at construction time. ScopeTag::LEGACY (scope 0) is used for deserializing old-format data.

18.2.2 The `name` Type

Name wraps Arc<str> with a pointer-equality fast path:

pub struct Name(pub Arc<str>);

Equality checks use Arc::ptr_eq first (single pointer comparison), falling back to string comparison only on pointer mismatch. In the restrict hot path, both sides typically originate from the same schema construction, so the fast path hits frequently.

Performance: Benchmarks show no regression from the String to Name migration. The Arc::clone fast path eliminates heap allocations in the restrict pipeline’s inner loop.

18.2.3 `NameSite` and `SiteRename`

All 9 naming sites in panproto are unified under the NameSite enum:

pub enum NameSite {
    EdgeLabel, VertexId, VertexKind, EdgeKind,
    Nsid, ConstraintSort, InstanceAnchor,
    TheoryName, SortName,
}

SiteRename { site, old, new } is a site-qualified rename operation. These are stored in SchemaMorphism provenance and CommitObject rename metadata.

18.2.4 The morphism tower

A theory morphism $F: T_1 \to T_2$ with sort_map and op_map induces schema-level renames:

Each sort_map entry where old != new becomes a NameSite::VertexKind rename
Each op_map entry where old != new becomes a NameSite::EdgeKind rename

This cascades into SchemaMorphism, then into CompiledMigration for the restrict pipeline:

Theory morphism  ──induce_schema_morphism──→  Schema morphism
                                                    │
                                          induce_data_migration
                                                    │
                                                    ▼
                                            CompiledMigration

The cascade module (crates/panproto-mig/src/cascade.rs) connects all three levels. This implements the action described in Spivak 2012: a schema morphism $F$ induces the pullback functor $\Delta_F$ on instances.

From Theory Rename to Data Migration

A sort rename Vertex → Node in the theory becomes a vertex-kind rename in the schema, which becomes an anchor remap in the instance. Each level of the tower translates the same conceptual change into a more concrete representation.

18.2.5 New combinators

Seven new combinator variants complement the original six:

Combinator	Site	Lossless?
`RenameVertex`	`VertexId`	Yes
`RenameKind`	`VertexKind`	Yes
`RenameEdgeKind`	`EdgeKind`	Yes
`RenameNsid`	`Nsid`	Yes
`RenameConstraintSort`	`ConstraintSort`	Yes
`ApplyTheoryMorphism`	Multiple	Yes
`Rename`	Any	Yes

Each combinator has a corresponding apply_* function and build_compiled_migration arm. All are lossless (empty complement), so the lens laws hold trivially.

18.3 Automatic morphisms: extend, restrict, and complete

This section covers automatic homomorphism search, the extend/restrict/complete operations, schema-level pushouts, overlap discovery, and the chase algorithm.

18.3.1 Homomorphism search

The homomorphism problem reduces to constraint satisfaction:

Variables: one per source vertex
Domains: target vertices with compatible kind
Constraints: edge-preserving squares

Backtracking search with forward checking:

Variable ordering: MRV heuristic (assign vertices with smallest domains first)
Value selection: try each domain value
Forward check: after assignment, verify all unassigned neighbors still have valid domain values
Edge compatibility: requires matching edge kind between mapped vertex pairs; edge names are allowed to differ
Edge map derivation: once all vertices are assigned, the edge map is determined by finding compatible edges between each pair of mapped vertices

Quality scoring in $[0.0, 1.0]$:

50% weight: average name similarity across vertex pairs
50% weight: fraction of edges where source and target have the same label name

18.3.2 Extend, restrict, and complete

Extend ($\Sigma_F$): wtype_extend and functor_extend map all source nodes into the target schema. Remap node anchors and arc edges; fill missing data with Value::Null.

Restrict ($\Delta_F$): Drop source nodes that don’t map to any target. This is the hot path used throughout panproto for data migration. Complexity: $O(|N| + |E|)$ where $N$ and $E$ are instance nodes and edges.

Complete ($\Pi_F$): functor_pi computes the product over fibers. wtype_pi currently only handles injective migrations (single-element fibers). Multi-element fiber products for trees require Cartesian products of subtrees, deferred to a future release.

18.3.3 Schema pushout

The schema pushout combines two schemas along an overlap. SchemaOverlap specifies which elements are identified:

pub struct SchemaOverlap {
    pub vertex_pairs: Vec<(Name, Name)>,
    pub edge_pairs: Vec<(Edge, Edge)>,
}

The schema_pushout algorithm:

Merged vertices: all left vertices + non-overlapping right vertices
Merged edges: all left edges + right edges with remapped vertex references
Constraint/metadata union for overlapping vertices
Build morphisms: left to pushout (identity for left vertices), right to pushout (identity or remapped for right vertices)
Rebuild adjacency indices

18.3.4 Overlap discovery

Uses find_best_morphism with monic: true in both directions (left to right and right to left), taking whichever produces more vertex pairs. The vertex map entries from the best morphism define the overlap pairs.

18.3.5 Chase algorithm

The chase algorithm implements embedded dependencies:

pub struct EmbeddedDependency {
    pub pattern_vertex: Name,
    pub pattern_values: HashMap<String, Value>,
    pub consequence_vertex: Name,
    pub consequence_values: HashMap<String, Value>,
}

Algorithm:

Clone instance
For each iteration (up to max_iterations):
- For each dependency, find pattern matches without consequences
- Add missing consequence rows
- If nothing changed, return fixpoint
If limit reached, return ChaseError::NonTermination

Termination is guaranteed for acyclic dependencies. For cyclic dependencies, the iteration bound prevents infinite loops. panproto’s equational theories are purely algebraic, so termination is guaranteed for all standard schema theories.

18.4 The type-checking pipeline

Type-checking in panproto operates at three levels, each building on the one below.

18.4.1 Pipeline overview

flowchart TD
    T["Raw Term / Equation"] --> TC["typecheck_term / typecheck_equation<br>(panproto-gat)"]
    TC -->|GatError| GD["GatDiagnostics<br>(panproto-vcs gat_validate)"]
    GD -->|type_errors, equation_errors| VS["ValidationStatus<br>(panproto-vcs index)"]
    VS -->|Invalid(reasons)| VF["VcsError::ValidationFailed<br>(panproto-vcs repo)"]
    VF -->|miette diagnostic| CLI["CLI error output<br>(panproto-cli)"]

    style TC fill:#e8f0fe,stroke:#1a73e8
    style GD fill:#fef3e8,stroke:#e87a1a
    style VS fill:#fef3e8,stroke:#e87a1a
    style VF fill:#fee8e8,stroke:#e81a1a
    style CLI fill:#fee8e8,stroke:#e81a1a

Figure 18.1: Type-checking flows from raw terms through GAT-level checks, VCS-level diagnostics, and CLI-level formatting.

Level 0 (GAT, panproto-gat): The typecheck module provides primitive type-checking functions. These operate on Term, Equation, and Theory values and return Result<_, GatError>. No knowledge of schemas, migrations, or the VCS.

Level 1 (VCS, panproto-vcs): The gat_validate module wraps GAT-level functions and collects results into GatDiagnostics. The index module stores diagnostics in the staging area, and the repo module consults them during commit.

Level 2 (CLI, panproto-cli): The CLI converts VcsError::ValidationFailed into user-facing miette diagnostics with source context, suggestions, and error codes.

18.4.2 Level 0: GAT type-checking

Four public functions form the foundation:

18.4.2.1 typecheck_term

pub fn typecheck_term(
    term: &Term,
    ctx: &VarContext,
    theory: &Theory,
) -> Result<Arc<str>, GatError>

Given a term, a variable typing context, and a theory, infers the output sort. For App { op, args }, it looks up op in the theory, typechecks each argument recursively, checks that each argument’s inferred sort matches the operation’s declared input sort, and returns the output sort.

18.4.2.2 infer_var_sorts

pub fn infer_var_sorts(eq: &Equation, theory: &Theory) -> Result<VarContext, GatError>

Infers variable sorts from an equation by walking both sides and collecting sort constraints. Eliminates the need for explicit sort annotations on equation variables.

18.4.2.3 typecheck_equation

pub fn typecheck_equation(eq: &Equation, theory: &Theory) -> Result<(), GatError>

Combines infer_var_sorts and typecheck_term: infers the variable context, typechecks both sides, and verifies they produce the same output sort.

18.4.2.4 typecheck_theory

pub fn typecheck_theory(theory: &Theory) -> Result<(), GatError>

Iterates over all equations in a theory and typechecks each one. Returns the first error encountered. This is the function that the VCS layer calls.

18.4.2.5 Error Types

Variant	Meaning
`UnboundVariable(name)`	Variable not in context
`OpNotFound(name)`	Operation not in theory
`TermArityMismatch { op, expected, got }`	Wrong argument count
`ArgTypeMismatch { op, arg_index, expected, got }`	Wrong argument sort
`EquationSortMismatch { equation, lhs_sort, rhs_sort }`	Equation sides differ
`ConflictingVarSort { var, sort1, sort2 }`	Variable at incompatible positions

18.4.3 Level 1: VCS integration

Type-checking integrates at three points in schema evolution: staging (schema add), committing (schema commit), and theory validation.

18.4.3.1 GatDiagnostics

The GatDiagnostics struct bridges GAT-level errors and VCS-level validation:

pub struct GatDiagnostics {
    pub type_errors: Vec<String>,
    pub equation_errors: Vec<String>,
    pub migration_warnings: Vec<String>,
}

type_errors collects results from typecheck_theory. Each entry is the Display output of a GatError.
equation_errors collects results from check_model_with_options. Each entry describes a specific equation violation with its variable assignment and the values that disagreed.
migration_warnings collects structural warnings from validate_migration; these are non-blocking.

The is_clean() method returns true only when both type_errors and equation_errors are empty. Migration warnings don’t block commits.

18.4.3.2 Integration Points

When Repository::add stages a schema:

Compute the schema diff between HEAD and the new schema.
Call auto_mig::derive_migration to produce a Migration.
Call gat_validate::validate_migration to check migration structure.
Store diagnostics in the StagedSchema::gat_diagnostics field.
If gat_diagnostics.has_errors(), set ValidationStatus::Invalid(diag.all_errors()).

The index file (.panproto/index.json) persists the diagnostics so that schema commit can consult them without re-running validation.

When Repository::commit_with_options creates a commit, it checks two gates:

ValidationStatus: if ValidationStatus::Invalid, the commit is blocked with VcsError::ValidationFailed.
GatDiagnostics: even if ValidationStatus is Valid, the commit checks gat_diagnostics.has_errors(). This handles cases where validation status was manually set to Valid but diagnostics were populated later.

Both gates are bypassed when CommitOptions::skip_verify is true. The --skip-verify CLI flag sets this option, providing an escape hatch for work-in-progress schemas.

The validate_theory_equations function provides standalone theory type-checking, called by the schema validate and schema typecheck CLI commands for schemas that aren’t part of a VCS repository.

18.4.4 Level 2: CLI error formatting

The CLI layer converts VcsError::ValidationFailed into user-facing diagnostics using miette.

18.4.4.1 Error Flow

GatError::ArgTypeMismatch → GatDiagnostics → ValidationStatus::Invalid
  → VcsError::ValidationFailed → miette::Report

Each layer adds context:

GatError: the raw error with structured fields.
GatDiagnostics: the error serialized to a string via Display, prefixed with its category.
ValidationStatus: the collected error strings.
VcsError: the error wrapped in a ValidationFailed variant.
miette: the error rendered with ANSI colors, source labels, and help text.

18.4.4.2 Equation Violation Formatting

Equation violations receive special formatting:

equation 'assoc' violated when a=2, b=3, c=1: LHS=6, RHS=4

This format shows the equation name, the variable assignment that triggered the violation, and the concrete values that the two sides evaluated to.

18.4.5 Design rationale

The three-level design separates concerns cleanly:

Level 0 is pure mathematics. No knowledge of files, schemas, or version control. Easy to test in isolation.
Level 1 is integration logic. Knows how to translate between GAT errors and VCS concepts. Serializes diagnostics for persistence.
Level 2 is presentation. Renders errors for humans with colors and suggestions. Uses miette, a CLI-only dependency.

Storing diagnostics in the index rather than re-computing at commit time has two benefits:

Performance: type-checking can be expensive for large theories. Running once at staging and persisting avoids redundant work.
Auditability: the index file records exactly what diagnostics were produced, even if the user later modifies the schema file.

The --skip-verify flag is an explicit escape hatch for situations where the pipeline produces false positives or where a work-in-progress schema intentionally violates equations.

The max_assignments bound in CheckModelOptions prevents combinatorial explosion. An equation with 3 variables over a carrier of 100 elements requires $10^6$ assignments; with 4 variables it’s $10^8$. The default bound of 10,000 catches most violations while keeping verification fast.

# Advanced Internals: Building Blocks, Naming, Auto-Morphisms, and Type Checking {#sec-advanced-internals} panproto defines 27 building-block GATs. We build up from the three simplest, then catalog the rest. ## Building-block theories {#sec-building-blocks-section} ### Formal invariants Every building-block theory satisfies: 1. **Sort closure**: every sort name in an operation's input/output types is declared in the theory's sorts list. 2. **Equation well-typedness**: both sides of every equation have the same output sort. 3. **No tautologies**: $\text{lhs} \neq \text{rhs}$ syntactically for all equations. ### The property-to-structure principle Several building blocks encode conditional properties as unconditional equations by adding operations and dependent sorts. For example, injectivity of coproduct injection is not `injection(v1) = injection(v2) -> v1 = v2` (a conditional equation, which GATs can't express). Instead, ThCoproduct adds a retraction `variant_of : Vertex -> Variant` with the equation `variant_of(injection(v)) = v`. The retraction is *structure*, not a *property*. This separation yields a clean boundary: structural properties (transitivity, injectivity, uniqueness, optionality) belong in GAT sorts+equations; validation properties (cardinality, patterns, ranges) belong in ThConstraint. ### The thgraph to threflexivegraph to thcoproduct progression The simplest theory is ThGraph: just vertices, edges, and two operations. ``` ThGraph Sorts: Vertex, Edge Ops: src : Edge → Vertex tgt : Edge → Vertex Eqs: (none) ``` ThReflexiveGraph extends ThGraph with identity edges and two equations that force `id` to be a section of both `src` and `tgt`: ``` ThReflexiveGraph Sorts: Vertex, Edge Ops: id : Vertex → Edge src : Edge → Vertex tgt : Edge → Vertex Eqs: src(id(v)) = v tgt(id(v)) = v ``` Type check: `id(v) : Edge`, `src(id(v)) : Vertex`, `v : Vertex`. Both sides have sort Vertex. Without the equations, this is just ThGraph + an extra operation. The equations make the identity edge load-bearing. ThCoproduct adds the retraction pattern to encode injectivity: ``` ThCoproduct Sorts: Vertex, Variant, Tag Ops: injection : Variant → Vertex tag : Variant → Tag variant_of : Vertex → Variant Eq: variant_of(injection(v)) = v ``` Type check: `injection(v) : Vertex`, `variant_of(injection(v)) : Variant`, `v : Variant`. Both sides have sort Variant. The equation `variant_of(injection(v)) = v` encodes that `variant_of` is a left inverse of `injection`. This is how panproto represents injectivity without conditional equations. ::: {.callout-note} ## Why Not Conditional Equations? GATs only support unconditional equations. Conditional equations (implications) would require dependent type theory or first-order logic. The retraction trick sidesteps this by adding structure (the `variant_of` operation) that *witnesses* the property. ::: ### Theories with equations #### ThRecursion: Fold-Unfold ``` Sorts: Vertex, Mu Ops: fold : Vertex → Mu unfold : Mu → Vertex Eq: unfold(fold(v)) = v ``` This is the isorecursive fold-unfold law. `fold` is a section of `unfold`. #### ThPartial: Definedness Witness ``` Sorts: Vertex, Defined Ops: defined : Vertex → Defined witness : Defined → Vertex Eq: defined(witness(d)) = d ``` This encodes definedness: `defined` is a surjection with `witness` as its retraction. #### ThSymmetricGraph: Involution ``` Sorts: Vertex, Edge Ops: inv : Edge → Edge src : Edge → Vertex tgt : Edge → Vertex Eqs: src(inv(e)) = tgt(e) tgt(inv(e)) = src(e) inv(inv(e)) = e ``` The involution law: inverting twice is identity. ### All 27 theories (complete list) #### Schema shapes (6) | # | Name | Sorts | Key operations | |---|------|-------|---| | 1 | ThGraph | Vertex, Edge | src, tgt | | 2 | ThSimpleGraph | Vertex, Edge(s:V,t:V) | src, tgt (unique by endpoints) | | 3 | ThHypergraph | Vertex, HyperEdge, Label | incident, parent_label | | 4 | ThReflexiveGraph | Vertex, Edge | id, src, tgt | | 5 | ThSymmetricGraph | Vertex, Edge | inv, src, tgt | | 6 | ThPetriNet | Place, Transition, Token | pre, post, marking | #### Schema modifiers (8) | # | Name | Sorts | Key operations | |---|------|-------|---| | 7 | ThConstraint | Vertex, Constraint(v:V) | target | | 8 | ThMulti | Vertex, Edge, EdgeLabel | edge_label | | 9 | ThInterface | Vertex, Interface | implements | | 10 | ThOrder | Edge, Position | edge_position, succ | | 11 | ThCoproduct | Vertex, Variant, Tag | injection, tag, variant_of | | 12 | ThRecursion | Vertex, Mu | unfold, fold | | 13 | ThPartial | Vertex, Defined | defined, witness | | 14 | ThLinear | Edge, Usage | use_count | #### Schema structural (4) | # | Name | Sorts | Key operations | |---|------|-------|---| | 15 | ThNominal | Vertex, Name | name | | 16 | ThSpan | Vertex, Span | span_left, span_right | | 17 | ThCospan | Vertex, Apex | inl, inr | | 18 | ThOperad | Color, MOperation | arity, op_output | #### Instance (6) | # | Name | Sorts | Key operations | |---|------|-------|---| | 19 | ThWType | Node, Arc, Value | anchor, arc_src, arc_tgt, arc_edge, node_value | | 20 | ThFunctor | Table, Row, ForeignKey | table_vertex, fk_src, fk_tgt | | 21 | ThFlat | Node, Field, Value | field_node, field_value, node_anchor | | 22 | ThGraphInstance | IVertex, IEdge, Value, Vertex, Edge | i_src, i_tgt, iv_anchor, ie_anchor, iv_value | | 23 | ThAnnotation | Vertex, Annotation, AnnotationKey, Value | annotated, ann_key, ann_value | | 24 | ThCausal | Event, Timestamp, Before(e1,e2) | time, before_refl, before_trans | #### Architectural (3) | # | Name | Sorts | Key operations | |---|------|-------|---| | 25 | ThTracedMonoidal | Wire, Box | trace_in, trace_out, feedback | | 26 | ThSimplicial | Simplex, Face, Degeneracy | face_map, degeneracy_map | | 27 | ThMeta | Node, Discriminator, ExtraField | discriminator, extra_field | ### Theory groups | Group | Schema composition | Instance | Protocols | |---|---|---|---| | A | colimit(Graph, Constraint, Multi) | WType | 30 | | B | colimit(Hypergraph, Constraint) | Functor | 11 | | C | colimit(SimpleGraph, Constraint) | Flat | 9 | | D | colimit(Graph, Constraint, Multi, Interface) | WType | 16 | | E | colimit(Graph, Constraint, Multi) | colimit(WType, Meta) | 9 | | F | colimit(Graph, Constraint, Multi) | GraphInstance | 1 | ## Naming, identity, and the Morphism tower {#sec-naming-identity-section} This section covers the `panproto-gat` identifier abstraction, the unified naming layer, the morphism tower that cascades theory-level renames into data migrations, and the VCS rename detection heuristic. ### The `ident` abstraction The `Ident` type separates stable identity from display name, following GATlab: ```rust pub struct Ident { pub scope: ScopeTag, // which theory/schema this belongs to pub index: u32, // positional index within scope pub name: Arc<str>, // human-readable display name } ``` `PartialEq` and `Hash` use only `(scope, index)`, making comparisons $O(1)$ regardless of name length. Renaming an identifier (via `ident.renamed("new_name")`) doesn't change its identity, so `HashMap<Ident, _>` entries survive renames without rehashing. `ScopeTag` is a monotonically increasing `u32` assigned via `ScopeTag::fresh()`. Each `Theory` or `SchemaBuilder` gets a unique scope at construction time. `ScopeTag::LEGACY` (scope 0) is used for deserializing old-format data. ### The `name` Type `Name` wraps `Arc<str>` with a pointer-equality fast path: ```rust pub struct Name(pub Arc<str>); ``` Equality checks use `Arc::ptr_eq` first (single pointer comparison), falling back to string comparison only on pointer mismatch. In the restrict hot path, both sides typically originate from the same schema construction, so the fast path hits frequently. **Performance**: Benchmarks show no regression from the `String` to `Name` migration. The `Arc::clone` fast path eliminates heap allocations in the restrict pipeline's inner loop. ### `NameSite` and `SiteRename` All 9 naming sites in panproto are unified under the `NameSite` enum: ```rust pub enum NameSite { EdgeLabel, VertexId, VertexKind, EdgeKind, Nsid, ConstraintSort, InstanceAnchor, TheoryName, SortName, } ``` `SiteRename { site, old, new }` is a site-qualified rename operation. These are stored in `SchemaMorphism` provenance and `CommitObject` rename metadata. ### The morphism tower A theory morphism $F: T_1 \to T_2$ with `sort_map` and `op_map` induces schema-level renames: - Each `sort_map` entry where `old != new` becomes a `NameSite::VertexKind` rename - Each `op_map` entry where `old != new` becomes a `NameSite::EdgeKind` rename This cascades into `SchemaMorphism`, then into `CompiledMigration` for the restrict pipeline: ``` Theory morphism ──induce_schema_morphism──→ Schema morphism │ induce_data_migration │ ▼ CompiledMigration ``` The cascade module (`crates/panproto-mig/src/cascade.rs`) connects all three levels. This implements the action described in Spivak 2012: a schema morphism $F$ induces the pullback functor $\Delta_F$ on instances. ::: {.callout-tip} ## From Theory Rename to Data Migration A sort rename `Vertex → Node` in the theory becomes a vertex-kind rename in the schema, which becomes an anchor remap in the instance. Each level of the tower translates the same conceptual change into a more concrete representation. ::: ### New combinators Seven new combinator variants complement the original six: | Combinator | Site | Lossless? | |-----------|------|-----------| | `RenameVertex` | `VertexId` | Yes | | `RenameKind` | `VertexKind` | Yes | | `RenameEdgeKind` | `EdgeKind` | Yes | | `RenameNsid` | `Nsid` | Yes | | `RenameConstraintSort` | `ConstraintSort` | Yes | | `ApplyTheoryMorphism` | Multiple | Yes | | `Rename` | Any | Yes | Each combinator has a corresponding `apply_*` function and `build_compiled_migration` arm. All are lossless (empty complement), so the lens laws hold trivially. ## Automatic morphisms: extend, restrict, and complete {#sec-auto-morphisms-section} This section covers automatic homomorphism search, the extend/restrict/complete operations, schema-level pushouts, overlap discovery, and the chase algorithm. ### Homomorphism search The homomorphism problem reduces to constraint satisfaction: - **Variables**: one per source vertex - **Domains**: target vertices with compatible kind - **Constraints**: edge-preserving squares Backtracking search with forward checking: 1. **Variable ordering**: MRV heuristic (assign vertices with smallest domains first) 2. **Value selection**: try each domain value 3. **Forward check**: after assignment, verify all unassigned neighbors still have valid domain values 4. **Edge compatibility**: requires matching edge kind between mapped vertex pairs; edge names are allowed to differ 5. **Edge map derivation**: once all vertices are assigned, the edge map is determined by finding compatible edges between each pair of mapped vertices Quality scoring in $[0.0, 1.0]$: - 50% weight: average name similarity across vertex pairs - 50% weight: fraction of edges where source and target have the same label name ### Extend, restrict, and complete **Extend** ($\Sigma_F$): `wtype_extend` and `functor_extend` map all source nodes into the target schema. Remap node anchors and arc edges; fill missing data with `Value::Null`. **Restrict** ($\Delta_F$): Drop source nodes that don't map to any target. This is the hot path used throughout panproto for data migration. Complexity: $O(|N| + |E|)$ where $N$ and $E$ are instance nodes and edges. **Complete** ($\Pi_F$): `functor_pi` computes the product over fibers. `wtype_pi` currently only handles injective migrations (single-element fibers). Multi-element fiber products for trees require Cartesian products of subtrees, deferred to a future release. ### Schema pushout The schema pushout combines two schemas along an overlap. `SchemaOverlap` specifies which elements are identified: ```rust pub struct SchemaOverlap { pub vertex_pairs: Vec<(Name, Name)>, pub edge_pairs: Vec<(Edge, Edge)>, } ``` The `schema_pushout` algorithm: 1. Merged vertices: all left vertices + non-overlapping right vertices 2. Merged edges: all left edges + right edges with remapped vertex references 3. Constraint/metadata union for overlapping vertices 4. Build morphisms: left to pushout (identity for left vertices), right to pushout (identity or remapped for right vertices) 5. Rebuild adjacency indices ### Overlap discovery Uses `find_best_morphism` with `monic: true` in both directions (left to right and right to left), taking whichever produces more vertex pairs. The vertex map entries from the best morphism define the overlap pairs. ### Chase algorithm The chase algorithm implements embedded dependencies: ```rust pub struct EmbeddedDependency { pub pattern_vertex: Name, pub pattern_values: HashMap<String, Value>, pub consequence_vertex: Name, pub consequence_values: HashMap<String, Value>, } ``` Algorithm: 1. Clone instance 2. For each iteration (up to `max_iterations`): - For each dependency, find pattern matches without consequences - Add missing consequence rows - If nothing changed, return fixpoint 3. If limit reached, return `ChaseError::NonTermination` Termination is guaranteed for acyclic dependencies. For cyclic dependencies, the iteration bound prevents infinite loops. panproto's equational theories are purely algebraic, so termination is guaranteed for all standard schema theories. ## The type-checking pipeline {#sec-type-checking-pipeline-section} Type-checking in panproto operates at three levels, each building on the one below. ### Pipeline overview ```{mermaid} %%| label: fig-typecheck-pipeline %%| fig-cap: "Type-checking flows from raw terms through GAT-level checks, VCS-level diagnostics, and CLI-level formatting." flowchart TD T["Raw Term / Equation"] --> TC["typecheck_term / typecheck_equation (panproto-gat)"] TC -->|GatError| GD["GatDiagnostics (panproto-vcs gat_validate)"] GD -->|type_errors, equation_errors| VS["ValidationStatus (panproto-vcs index)"] VS -->|Invalid(reasons)| VF["VcsError::ValidationFailed (panproto-vcs repo)"] VF -->|miette diagnostic| CLI["CLI error output (panproto-cli)"] style TC fill:#e8f0fe,stroke:#1a73e8 style GD fill:#fef3e8,stroke:#e87a1a style VS fill:#fef3e8,stroke:#e87a1a style VF fill:#fee8e8,stroke:#e81a1a style CLI fill:#fee8e8,stroke:#e81a1a ``` **Level 0 (GAT, `panproto-gat`)**: The `typecheck` module provides primitive type-checking functions. These operate on `Term`, `Equation`, and `Theory` values and return `Result<_, GatError>`. No knowledge of schemas, migrations, or the VCS. **Level 1 (VCS, `panproto-vcs`)**: The `gat_validate` module wraps GAT-level functions and collects results into `GatDiagnostics`. The `index` module stores diagnostics in the staging area, and the `repo` module consults them during commit. **Level 2 (CLI, `panproto-cli`)**: The CLI converts `VcsError::ValidationFailed` into user-facing `miette` diagnostics with source context, suggestions, and error codes. ### Level 0: GAT type-checking Four public functions form the foundation: #### typecheck_term ```rust pub fn typecheck_term( term: &Term, ctx: &VarContext, theory: &Theory, ) -> Result<Arc<str>, GatError> ``` Given a term, a variable typing context, and a theory, infers the output sort. For `App { op, args }`, it looks up `op` in the theory, typechecks each argument recursively, checks that each argument's inferred sort matches the operation's declared input sort, and returns the output sort. #### infer_var_sorts ```rust pub fn infer_var_sorts(eq: &Equation, theory: &Theory) -> Result<VarContext, GatError> ``` Infers variable sorts from an equation by walking both sides and collecting sort constraints. Eliminates the need for explicit sort annotations on equation variables. #### typecheck_equation ```rust pub fn typecheck_equation(eq: &Equation, theory: &Theory) -> Result<(), GatError> ``` Combines `infer_var_sorts` and `typecheck_term`: infers the variable context, typechecks both sides, and verifies they produce the same output sort. #### typecheck_theory ```rust pub fn typecheck_theory(theory: &Theory) -> Result<(), GatError> ``` Iterates over all equations in a theory and typechecks each one. Returns the first error encountered. This is the function that the VCS layer calls. #### Error Types | Variant | Meaning | |---|---| | `UnboundVariable(name)` | Variable not in context | | `OpNotFound(name)` | Operation not in theory | | `TermArityMismatch { op, expected, got }` | Wrong argument count | | `ArgTypeMismatch { op, arg_index, expected, got }` | Wrong argument sort | | `EquationSortMismatch { equation, lhs_sort, rhs_sort }` | Equation sides differ | | `ConflictingVarSort { var, sort1, sort2 }` | Variable at incompatible positions | ### Level 1: VCS integration Type-checking integrates at three points in schema evolution: staging (`schema add`), committing (`schema commit`), and theory validation. #### GatDiagnostics The `GatDiagnostics` struct bridges GAT-level errors and VCS-level validation: ```rust pub struct GatDiagnostics { pub type_errors: Vec<String>, pub equation_errors: Vec<String>, pub migration_warnings: Vec<String>, } ``` - `type_errors` collects results from `typecheck_theory`. Each entry is the `Display` output of a `GatError`. - `equation_errors` collects results from `check_model_with_options`. Each entry describes a specific equation violation with its variable assignment and the values that disagreed. - `migration_warnings` collects structural warnings from `validate_migration`; these are non-blocking. The `is_clean()` method returns `true` only when both `type_errors` and `equation_errors` are empty. Migration warnings don't block commits. #### Integration Points When `Repository::add` stages a schema: 1. Compute the schema diff between HEAD and the new schema. 2. Call `auto_mig::derive_migration` to produce a `Migration`. 3. Call `gat_validate::validate_migration` to check migration structure. 4. Store diagnostics in the `StagedSchema::gat_diagnostics` field. 5. If `gat_diagnostics.has_errors()`, set `ValidationStatus::Invalid(diag.all_errors())`. The index file (`.panproto/index.json`) persists the diagnostics so that `schema commit` can consult them without re-running validation. When `Repository::commit_with_options` creates a commit, it checks two gates: 1. **ValidationStatus**: if `ValidationStatus::Invalid`, the commit is blocked with `VcsError::ValidationFailed`. 2. **GatDiagnostics**: even if `ValidationStatus` is `Valid`, the commit checks `gat_diagnostics.has_errors()`. This handles cases where validation status was manually set to `Valid` but diagnostics were populated later. Both gates are bypassed when `CommitOptions::skip_verify` is `true`. The `--skip-verify` CLI flag sets this option, providing an escape hatch for work-in-progress schemas. The `validate_theory_equations` function provides standalone theory type-checking, called by the `schema validate` and `schema typecheck` CLI commands for schemas that aren't part of a VCS repository. ### Level 2: CLI error formatting The CLI layer converts `VcsError::ValidationFailed` into user-facing diagnostics using `miette`. #### Error Flow ``` GatError::ArgTypeMismatch → GatDiagnostics → ValidationStatus::Invalid → VcsError::ValidationFailed → miette::Report ``` Each layer adds context: - **GatError**: the raw error with structured fields. - **GatDiagnostics**: the error serialized to a string via `Display`, prefixed with its category. - **ValidationStatus**: the collected error strings. - **VcsError**: the error wrapped in a `ValidationFailed` variant. - **miette**: the error rendered with ANSI colors, source labels, and help text. #### Equation Violation Formatting Equation violations receive special formatting: ``` equation 'assoc' violated when a=2, b=3, c=1: LHS=6, RHS=4 ``` This format shows the equation name, the variable assignment that triggered the violation, and the concrete values that the two sides evaluated to. ### Design rationale The three-level design separates concerns cleanly: - **Level 0** is pure mathematics. No knowledge of files, schemas, or version control. Easy to test in isolation. - **Level 1** is integration logic. Knows how to translate between GAT errors and VCS concepts. Serializes diagnostics for persistence. - **Level 2** is presentation. Renders errors for humans with colors and suggestions. Uses `miette`, a CLI-only dependency. Storing diagnostics in the index rather than re-computing at commit time has two benefits: 1. **Performance**: type-checking can be expensive for large theories. Running once at staging and persisting avoids redundant work. 2. **Auditability**: the index file records exactly what diagnostics were produced, even if the user later modifies the schema file. The `--skip-verify` flag is an explicit escape hatch for situations where the pipeline produces false positives or where a work-in-progress schema intentionally violates equations. The `max_assignments` bound in `CheckModelOptions` prevents combinatorial explosion. An equation with 3 variables over a carrier of 100 elements requires $10^6$ assignments; with 4 variables it's $10^8$. The default bound of 10,000 catches most violations while keeping verification fast.