12 Names Across Protocol Boundaries

Imagine you’re integrating Bluesky (which calls user handles handle), Mastodon (which calls them acct), and your internal API (which calls them username). The same concept, three names. Multiply this by dozens of fields across dozens of protocols, and you’ve got a persistent source of integration bugs. The fundamental problem isn’t that these names differ—it’s that no system tracks how they correspond.

panproto solves this by formalizing relationships between names as structure-preserving maps. Two protocols can call the same concept by different names. The system tracks how those names correspond through bidirectional, composable, verifiable transformations: $\phi: \mathrm{name}_A \Rightarrow \mathrm{name}_B$. A field rename from text to body becomes a lens. A protocol-level vocabulary shift (ATProto’s record to JSON Schema’s object) becomes a theory morphism. Both are handled by the same mechanism.

12.1 Where names live

Names appear at nine different sites in panproto. Each site is a position where the naming function $\mathrm{name}: \mathrm{Site} \times \mathrm{Element} \to \mathrm{String}$ assigns a string label to a structural element:

Table 12.1: The nine naming sites in panproto.

Site	Example	What it labels
Edge label	`"text"` to `"body"`	Field/property names
Vertex ID	`"post:body.text"`	Structural identifiers for schema elements
Vertex kind	`"string"`, `"object"`	Type classification (from the protocol’s theory)
Edge kind	`"prop"`, `"field-of"`	Relationship types (from the protocol’s theory)
NSID	`"app.bsky.feed.post"`	Namespace identifiers (ATProto-specific)
Constraint sort	`"maxLength"`	Validation property names
Instance anchor	`"post:body.text"`	Data-to-schema references in $W$-type instances
Theory name	`"ThATProtoSchema"`	GAT theory identifiers
Sort name	`"Vertex"`, `"Node"`	GAT sort identifiers

When you rename a field in a migration (Chapter 9), you’re changing an edge label. When a theory-level mapping translates ATProto sorts to JSON Schema sorts, it changes sort names and vertex kinds. Every naming transformation—from a simple field rename to a cross-protocol vocabulary translation—operates on one or more of these nine sites.

12.2 Identity vs. name

panproto separates identity (which element is this?) from name (what do we call it?). Following the GATlab design (Lynch et al. 2024), an identifier has three components:

A scope tag: which theory or schema this element belongs to
A positional index: which element within that scope (a numeric position, not a name)
A display name: the human-readable label

Equality and hashing use only scope and index. The display name is metadata. Rename a field and the HashMap<Ident, _> entry stays put without rehashing.

This distinction solves a real coordination problem. In a system where identity is the name, renaming a field from text to body forces every consumer to update simultaneously—the schema, migrations, compiled code, lenses, all of it. A partial rename breaks everything. panproto sidesteps this. The rename changes only the display name; the positional index (which is what actually matters for identity) stays constant. The schema, migration, compiled migration, and lens all reference (scope, index), which hasn’t changed. The display name gets used only when serializing to a wire format or showing something to a human.
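To make the equality rule concrete, here is a minimal Rust sketch of an identifier whose `Eq` and `Hash` ignore the display name. The `Ident` fields and the `lookup_after_rename` helper are hypothetical names chosen for illustration; panproto's actual type may be laid out differently.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical identifier type; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct Ident {
    pub scope: u32,      // which theory or schema the element belongs to
    pub index: u32,      // position within that scope
    pub display: String, // human-readable label, ignored by Eq and Hash
}

impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        self.scope == other.scope && self.index == other.index
    }
}

impl Eq for Ident {}

impl Hash for Ident {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the fields that `eq` compares, so Eq and Hash agree.
        self.scope.hash(state);
        self.index.hash(state);
    }
}

/// Stores a value under the old display name, then looks it up under the new one.
pub fn lookup_after_rename() -> Option<&'static str> {
    let mut kinds: HashMap<Ident, &'static str> = HashMap::new();
    let before = Ident { scope: 1, index: 3, display: "text".to_string() };
    kinds.insert(before, "string");

    let after = Ident { scope: 1, index: 3, display: "body".to_string() };
    kinds.get(&after).copied()
}
```

Because the hash covers only `scope` and `index`, the lookup under the renamed identifier lands in the same bucket and compares equal, so the entry is found without touching the map.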

Think of it like git’s content-addressing. Rename a file and git still tracks the same content via its SHA. Rename a vertex and panproto tracks the same structural element via its positional index.

Exercise: Identity stability under composition

If element $e$ has identity $(\mathrm{scope}_1, 3)$ in theory $T_1$ and the colimit $T_1 +_S T_2$ produces a new theory, does $e$ keep the same positional index? Or does the colimit reindex elements, and if so, how does panproto maintain the correspondence?

Answer

No. The colimit reindexes to produce a single consistent numbering for the merged theory. Element $e$ from $T_1$ receives a new positional index in $T_1 +_S T_2$. panproto maintains the correspondence through the composition legs: the theory morphisms $T_1 \to T_1 +_S T_2$ and $T_2 \to T_1 +_S T_2$ record exactly where each original element landed. The scope tag changes, but the composition leg provides the lookup.

12.2.1 When identity breaks down

The identity/name separation assumes you know that two elements across schemas are “the same thing.” When you write a migration that maps v1’s Text vertex to v2’s Text vertex, you’re asserting that these are the same structural element. panproto can verify that this assertion is consistent (via check_existence), but it can’t determine whether it is correct. Two vertices with the same kind and constraints might correspond to different real-world concepts. Two vertices with different names might correspond to the same concept.

This is the rename-vs-drop ambiguity from Chapter 8. When a vertex disappears and a different vertex appears in the next version, structural evidence alone can’t tell a rename (data-preserving) from a drop-and-add (destructive). The identity system tracks correspondences once they’re established, but can’t discover them on its own. Discovery requires either a human decision or a heuristic—the VCS rename detection below provides the structural heuristic.

12.3 Rename combinators

panproto provides a rename combinator for each naming site. Each is a lens (Chapter 7) with an empty complement: renaming is always lossless and the round-trip laws hold trivially.

Table 12.2: Rename combinators. All are lossless (empty complement).

Combinator	Site	What it changes
`RenameField { old, new }`	Edge label	A single field/property name
`RenameVertex { old_id, new_id }`	Vertex ID	A structural identifier; cascades to all edges, constraints, required sets, variants, and hyper-edges that reference the vertex
`RenameKind { vertex_id, new_kind }`	Vertex kind	The type classification of a single vertex
`RenameEdgeKind { old_kind, new_kind }`	Edge kind	All edges matching the old kind
`RenameNsid { vertex_id, new_nsid }`	NSID	The namespace identifier on a vertex
`RenameConstraintSort { old_sort, new_sort }`	Constraint sort	Validation vocabulary (e.g., `"maxLength"` to `"max-length"`)
`Rename { site, old, new }`	Any	Unified combinator that dispatches to the specific rename

RenameVertex deserves special attention because it cascades. Renaming "post:body.text" to "post:content.text" updates not just the vertex table but every edge, constraint, required-edge set, variant, ordering, and recursion point that references that ID. The cascade is computed from the schema’s adjacency indices—the same precomputed outgoing, incoming, and between maps that make migration fast.
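The cascade can be sketched on a toy schema. Everything here is an assumption made for illustration: the `Schema` and `Edge` shapes are hypothetical, only edges and a required set are modeled, and the sketch scans linearly where the text says the real engine consults precomputed adjacency indices.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub src: String,
    pub tgt: String,
    pub label: String,
}

/// Hypothetical, heavily simplified schema.
#[derive(Debug, Clone, Default)]
pub struct Schema {
    pub vertices: BTreeMap<String, String>, // vertex ID -> kind
    pub edges: Vec<Edge>,
    pub required: Vec<String>, // vertex IDs that must be present
}

/// Renames a vertex ID everywhere it is referenced.
/// Returns how many references were rewritten.
pub fn rename_vertex(schema: &mut Schema, old_id: &str, new_id: &str) -> usize {
    let mut touched = 0;

    // The vertex table entry itself.
    if let Some(kind) = schema.vertices.remove(old_id) {
        schema.vertices.insert(new_id.to_string(), kind);
        touched += 1;
    }

    // Every edge endpoint that mentions the old ID.
    for edge in schema.edges.iter_mut() {
        if edge.src.as_str() == old_id {
            edge.src = new_id.to_string();
            touched += 1;
        }
        if edge.tgt.as_str() == old_id {
            edge.tgt = new_id.to_string();
            touched += 1;
        }
    }

    // Every required-set entry that mentions the old ID.
    for id in schema.required.iter_mut() {
        if id.as_str() == old_id {
            *id = new_id.to_string();
            touched += 1;
        }
    }

    touched
}
```

Edge labels are left alone: renaming the vertex ID and renaming the field label are separate operations on separate naming sites.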

12.4 A worked example: field renaming across versions

Suppose v1 of a blog post schema has a field text and v2 renames it to body.

The v1 schema has a vertex $\mathrm{Text}$ (kind: string) connected to $\mathrm{Post}$ via an edge labeled text. In v2, the same vertex and edge exist, but the edge is labeled body.

The migration specifies:

$\mathrm{Post} \mapsto \mathrm{Post}$, $\mathrm{Text} \mapsto \mathrm{Text}$ (vertex map)
text $\mapsto$ body (edge map)

This is expressible as a single combinator:

RenameField { old: "text", new: "body" }

The lens built from this combinator has:

get: Takes a v2 record with field body, produces a v1 record with field text. The complement is empty; no data discarded.
put: Takes a v1 record with field text, produces a v2 record with field body.

Round-tripping: $\mathrm{put}(\mathrm{get}(r)) = r$. The data is identical; only the label changes.

Now suppose v3 renames body to content:

RenameField { old: "body", new: "content" }

Compose these two lenses and the intermediate step vanishes algebraically. The net effect is text to content:

compose(
    RenameField { old: "text", new: "body" },
    RenameField { old: "body", new: "content" }
) = RenameField { old: "text", new: "content" }

Each version declares its renames as combinators, and composition eliminates the intermediate history. A client that understands v1 can consume v3 data by composing the two rename lenses. The composition is computed once and compiled; applying it to data is just a label substitution.
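The worked example can be replayed in a few lines of Rust. This is a sketch under stated assumptions: `RenameField` here is a hypothetical stand-in for the combinator, records are modeled as plain label/value pairs, and `from_names`, `get`, `put`, and `then` are illustrative names rather than panproto's API.

```rust
/// Hypothetical stand-in for the `RenameField` combinator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RenameField {
    pub old: String,
    pub new: String,
}

/// A record as ordered (field label, value) pairs.
pub type Record = Vec<(String, String)>;

impl RenameField {
    pub fn from_names(old: &str, new: &str) -> Self {
        RenameField { old: old.to_string(), new: new.to_string() }
    }

    /// put: relabels `old` as `new` (v1 record to v2 record).
    pub fn put(&self, record: &Record) -> Record {
        relabel(record, &self.old, &self.new)
    }

    /// get: relabels `new` as `old` (v2 record to v1 record).
    pub fn get(&self, record: &Record) -> Record {
        relabel(record, &self.new, &self.old)
    }

    /// Composes with a later rename; defined only when the names chain up.
    pub fn then(&self, next: &RenameField) -> Option<RenameField> {
        if self.new == next.old {
            Some(RenameField { old: self.old.clone(), new: next.new.clone() })
        } else {
            None
        }
    }
}

/// Replaces one label with another and copies every value unchanged.
fn relabel(record: &Record, from: &str, to: &str) -> Record {
    record
        .iter()
        .map(|(label, value)| {
            let label = if label.as_str() == from { to.to_string() } else { label.clone() };
            (label, value.clone())
        })
        .collect()
}
```

Composing `text -> body` with `body -> content` yields a single `text -> content` rename, and `put` after `get` returns the original record because neither direction touches the values.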

12.5 Name mapping across protocols

Rename combinators work at the schema level, within a single protocol. But ATProto calls a type record, SQL calls it table, Protobuf calls it message, and GraphQL calls it ObjectType. These aren’t schema-level differences. They’re theory-level differences: the vocabulary of sorts and operations that define each protocol’s schema theory.

A theory-level mapping $F: T_1 \to T_2$ maps sorts and operations between theories:

sort_map:  Vertex -> Node,    Edge -> Arrow
op_map:    src -> source,     tgt -> target

When applied as a combinator (ApplyTheoryMorphism), this single mapping cascades through three levels:

Theory level: Renames sorts and operations.
Schema level: Every vertex with kind "Vertex" becomes kind "Node". Every edge with kind "src" becomes "source". The result records all induced renames.
Instance level: The schema-level mapping lowers to a CompiledMigration for the restrict pipeline, so instance data follows automatically.

Each level is derived from the one above, not written by hand.¹

Exercise: Functorial cascade correctness

The cascade $F_{\mathrm{theory}} \Rightarrow F_{\mathrm{schema}} \Rightarrow F_{\mathrm{instance}}$ is claimed to be automatically derived at each level. If the theory morphism $F: T_1 \to T_2$ is not injective (two sorts in $T_1$ map to the same sort in $T_2$), what happens at the schema and instance levels? Is information lost, or does the derivation fail?

Answer

Information is lost at the schema level but the derivation does not fail. If two sorts in $T_1$ map to the same sort in $T_2$, the schema-level cascade merges all vertices of both sorts into a single sort in the target. Data that distinguished the two sorts (e.g., different constraint vocabularies) is collapsed. The instance-level migration operates on the merged schema, which has fewer distinctions. The cascade is well-defined; it just produces a coarser result.

12.5.1 A concrete cross-protocol rename

Suppose you have an ATProto schema for a post and want to express the same structure as JSON Schema. The two protocols use different vocabularies:

ATProto	JSON Schema	Meaning
`record`	`object`	The root container type
`prop`	`property`	A field within a container
`string`	`string`	A text type
`record-schema`	`properties`	The edge from container to its fields

The theory-level mapping from ATProto to JSON Schema maps:

sort_map:  record -> object
op_map:    prop -> property,  record-schema -> properties

When applied to a specific ATProto schema (a blog post), the cascade produces:

Every vertex with kind record becomes kind object
Every edge with kind prop becomes kind property
Every edge with kind record-schema becomes kind properties
Vertex IDs and edge labels stay unchanged; only the vocabulary changes

The string sort is shared between both protocols (it appears in the shared sub-theory from which both are built). Shared sorts don’t need renaming; they’re identical by construction.
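The schema-level step of this cascade can be sketched as a pair of lookups. The types below (`TheoryMorphism`, `Vertex`, `Edge`) and the function names are hypothetical simplifications, not the `ApplyTheoryMorphism` implementation; the only data taken from the text is the ATProto to JSON Schema vocabulary itself.

```rust
use std::collections::HashMap;

/// Hypothetical theory-level mapping: sorts rename vertex kinds, ops rename edge kinds.
pub struct TheoryMorphism {
    pub sort_map: HashMap<String, String>,
    pub op_map: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Vertex {
    pub id: String,
    pub kind: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub label: String,
    pub kind: String,
}

fn pairs(items: &[(&str, &str)]) -> HashMap<String, String> {
    items.iter().map(|&(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Names absent from the map are treated as shared and pass through unchanged.
fn translate(map: &HashMap<String, String>, name: &str) -> String {
    map.get(name).cloned().unwrap_or_else(|| name.to_string())
}

/// Schema-level step: only kinds change; vertex IDs and edge labels are copied.
pub fn apply_to_schema(
    morphism: &TheoryMorphism,
    vertices: &[Vertex],
    edges: &[Edge],
) -> (Vec<Vertex>, Vec<Edge>) {
    let vs: Vec<Vertex> = vertices
        .iter()
        .map(|v| Vertex { id: v.id.clone(), kind: translate(&morphism.sort_map, &v.kind) })
        .collect();
    let es: Vec<Edge> = edges
        .iter()
        .map(|e| Edge { label: e.label.clone(), kind: translate(&morphism.op_map, &e.kind) })
        .collect();
    (vs, es)
}

/// The ATProto to JSON Schema vocabulary from the table above.
pub fn atproto_to_json_schema() -> TheoryMorphism {
    TheoryMorphism {
        sort_map: pairs(&[("record", "object")]),
        op_map: pairs(&[("prop", "property"), ("record-schema", "properties")]),
    }
}
```

The pass-through default in `translate` is how the sketch models shared sorts such as `string`: they have no entry in the map, so they come out unchanged.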

12.5.2 Composing across protocols

Name mappings compose transitively. Know how ATProto maps to JSON Schema, and how JSON Schema maps to Protobuf, and you get ATProto-to-Protobuf for free:

ATProto -> JSON Schema -> Protobuf
         (compose)
ATProto ──────────────── -> Protobuf

The intermediate step is eliminated algebraically. In practice, most cross-protocol paths go through a common intermediate. The building-block theories from Chapter 14 ($\text{ThGraph}$, $\text{ThConstraint}$, $\text{ThWType}$, etc.) serve as the shared vocabulary. Each protocol defines mappings to and from these building blocks. Cross-protocol composition then becomes automatic: ATProto to $\text{ThGraph}$ to SQL. The intermediate is a shared language that needs no learning because it’s internal to the engine.
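Transitive composition of vocabulary maps is ordinary function composition on names. The sketch below assumes name maps are plain string-to-string tables with a pass-through default; `NameMap`, `name_map`, and `compose` are hypothetical helpers, and the JSON Schema to Protobuf entries used to exercise it (for example `property` to `field`) are illustrative rather than a registered panproto mapping.

```rust
use std::collections::HashMap;

pub type NameMap = HashMap<String, String>;

/// Builds a name map from string pairs.
pub fn name_map(pairs: &[(&str, &str)]) -> NameMap {
    pairs.iter().map(|&(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Composes `first` then `second`; names a map omits pass through unchanged.
pub fn compose(first: &NameMap, second: &NameMap) -> NameMap {
    let mut out = NameMap::new();

    // Names renamed by `first`, then possibly renamed again by `second`.
    for (source, middle) in first {
        let target = second.get(middle).cloned().unwrap_or_else(|| middle.clone());
        out.insert(source.clone(), target);
    }

    // Names that `first` neither consumes nor produces, but `second` renames.
    for (middle, target) in second {
        let untouched = !first.contains_key(middle) && !first.values().any(|v| v == middle);
        if untouched {
            out.insert(middle.clone(), target.clone());
        }
    }

    out
}
```

With `record -> object` in the first map and `object -> message` in the second, the composed map sends `record` straight to `message`; the intermediate name never appears in the result.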

12.5.3 What name mapping does not do

The machinery translates names that have known correspondences. It doesn’t discover correspondences. Given two protocols with no registered mapping, the system can’t infer that ATProto’s handle corresponds to ActivityPub’s preferredUsername. Discovery requires either a human specification or a heuristic.

The machinery makes application of correspondences automatic and verified. It leaves discovery to another mechanism. Within a single protocol’s version history, the VCS rename detection below provides the structural heuristic. Across protocols, correspondences are established once when the protocol is registered and composed thereafter.

12.6 Rename detection in VCS

When two schema versions are committed to panproto-vcs (Chapter 10), the system heuristically detects renames between them. This addresses the rename-vs-drop ambiguity: when a vertex disappears from the old schema and a structurally similar vertex appears in the new schema, is it a rename or a deletion followed by an addition?

The VCS computes a confidence score for each candidate rename pair based on four structural signals:

Table 12.3: Rename detection confidence signals.

Signal	Weight	What it measures
Same vertex kind	+0.3	Type classification matches
Same outgoing edge set	+0.3	Same fields/relationships
Same incoming edge set	+0.2	Same containment structure
Edit distance of names ≤ 3	+0.2	Names are similar strings

The maximum score is 1.0. Candidates are evaluated exhaustively over all (disappeared, appeared) vertex pairs. Conflicting assignments are resolved by taking the highest-confidence match.
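Read literally, Table 12.3 describes a sum of four yes/no signals. The following Rust sketch implements that literal reading; it is not the panproto-vcs source. `VertexInfo` and its fields are hypothetical, edge sets are modeled as sets of strings, and the score is accumulated in integer tenths to avoid floating-point drift.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of a vertex as seen by rename detection.
pub struct VertexInfo {
    pub name: String,
    pub kind: String,
    pub outgoing: BTreeSet<String>, // assumed: labels of child edges
    pub incoming: BTreeSet<String>, // assumed: IDs of parent vertices
}

pub fn vertex(name: &str, kind: &str, outgoing: &[&str], incoming: &[&str]) -> VertexInfo {
    VertexInfo {
        name: name.to_string(),
        kind: kind.to_string(),
        outgoing: outgoing.iter().map(|s| s.to_string()).collect(),
        incoming: incoming.iter().map(|s| s.to_string()).collect(),
    }
}

/// Levenshtein edit distance over characters, using a two-row table.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Score in tenths (0..=10), using the weights from Table 12.3.
pub fn score_tenths(old: &VertexInfo, new: &VertexInfo) -> u32 {
    let mut score = 0;
    if old.kind == new.kind {
        score += 3;
    }
    if old.outgoing == new.outgoing {
        score += 3;
    }
    if old.incoming == new.incoming {
        score += 2;
    }
    if edit_distance(&old.name, &new.name) <= 3 {
        score += 2;
    }
    score
}

pub fn confidence(old: &VertexInfo, new: &VertexInfo) -> f64 {
    f64::from(score_tenths(old, new)) / 10.0
}
```

Under this reading, a pair such as `user` and `account` that matches on kind and on both edge sets but has dissimilar names totals 0.8: the three structural signals fire and the name signal does not.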

12.6.1 Detection output

$ prot diff v1.json v2.json --detect-renames
Detected renames (confidence > 0.7):
  text -> body  (0.8: same kind, same edges, edit distance 4)

Possible renames (confidence 0.5-0.7):
  authorName -> displayName  (0.6: same kind, different edges)
  ? Treat as rename? [y/n]

High-confidence renames (above the threshold, default 0.7) are automatically incorporated into the migration as correspondences rather than as separate deletions and additions. Low-confidence candidates are presented to the user for confirmation, showing confidence score and structural evidence. Accept and the data is preserved under the new name. Reject and the change is treated as deletion plus addition (the old field’s data is dropped; the new field starts empty).
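The decision rule can be written as a small classifier. This is a sketch, not the tool's code: the `Decision` enum is hypothetical, the 0.5 lower bound is taken from the "Possible renames" band in the sample output, and treating anything below it as a plain deletion plus addition without prompting is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Incorporated into the migration as a correspondence.
    AutoRename,
    /// Shown to the user with its score and evidence.
    AskUser,
    /// Treated as a deletion followed by an addition (assumed for low scores).
    DropAndAdd,
}

/// Classifies a candidate pair; `threshold` defaults to 0.7 in the text.
pub fn classify(confidence: f64, threshold: f64) -> Decision {
    if confidence > threshold {
        Decision::AutoRename
    } else if confidence >= 0.5 {
        Decision::AskUser
    } else {
        Decision::DropAndAdd
    }
}
```

The comparison is strict, matching "above the threshold": a candidate scoring exactly 0.7 against the default threshold is shown to the user rather than accepted silently.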

12.6.2 What the structural heuristic catches

The four signals detect renames that preserve structure:

text -> body scores 0.8: same kind (string), same outgoing edges (none; it’s a leaf), and same incoming edges. The names have edit distance 4, which exceeds the cutoff of 3, so the name signal adds nothing and the three structural signals carry the score on their own.
user -> account scores 0.8: same kind (record), same outgoing edges (both have name, email children), same incoming edges (root), names dissimilar.
email -> contact_email scores 0.7: same kind (string), same outgoing and incoming edges, names share email as a substring.

12.6.3 What the structural heuristic misses

The heuristic is entirely structural; it has no semantic knowledge:

patient.dob (date of birth) and patient.date (date of last visit) both score 0.7: same kind (date), same edges, names share dat. A human would recognize these as different concepts.
status moves from user to order, changing meaning from account status to order status. The vertex kind is the same (string), but the incoming edges differ because the parent is different. The heuristic correctly flags this as a non-rename: structure saves us here.

The threshold is configurable because projects have different tolerances. A project with careful naming conventions can lower the threshold (more automatic renames). A project that reuses generic names like status, type, or id should raise it (more prompts, fewer silent errors).

Exercise: Heuristic composition

If the VCS detects text -> body in commit 1 and body -> content in commit 2, each with confidence 1.0, what confidence should the composed rename text -> content have? Should confidences multiply, or is the composed rename exact because each step was individually confirmed?

Answer

The composed rename is exact (confidence 1.0). Each step was individually confirmed (either by the heuristic or by user acceptance), and once confirmed, the rename is recorded as a first-class event in the VCS history. Composition of confirmed renames is deterministic substitution, not probabilistic inference. Confidences matter only at the discovery stage; once a rename is committed, it is a fact, and composing facts does not degrade certainty.

12.7 Rename detection and VCS blame

When a rename is detected and incorporated into a migration, the VCS commit records it. This means schema blame can trace a field’s name backward through history:

$ schema blame --element-type edge body
Commit  Author         Date        Field name
abc123  alice          2025-03-01  body  (renamed from text)
def456  initial        2025-01-15  text  (created)

The rename is a first-class event in the history, not an inference. Once confirmed (by the heuristic or by the user), it’s recorded permanently. Future blame queries trace through renames as git blame traces through file renames.

Every rename detected and confirmed becomes part of the VCS history (Chapter 10). The commit graph records not just what changed but how it corresponds:

schema log shows renames as explicit events in the commit history
schema blame traces field names backward through renames, following the mapping chain
schema bisect can find the commit that introduced a specific rename
schema merge composes renames from two branches (if branch A renamed text -> body and branch B renamed text -> content, the merge detects a conflict: two renames of the same source)
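The merge rule for renames can be sketched as a per-source comparison. This is a hypothetical model, not the schema merge implementation: each branch's renames are assumed to be a map from old name to new name, and `RenameConflict` and `merge_renames` are illustrative names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RenameConflict {
    pub source: String,
    pub ours: String,
    pub theirs: String,
}

/// Merges two branches' renames (old name -> new name).
/// Two different targets for the same source are reported as a conflict.
pub fn merge_renames(
    ours: &BTreeMap<String, String>,
    theirs: &BTreeMap<String, String>,
) -> Result<BTreeMap<String, String>, Vec<RenameConflict>> {
    let mut merged = ours.clone();
    let mut conflicts = Vec::new();

    for (old, new) in theirs {
        match ours.get(old) {
            // Both branches renamed the same source to different targets.
            Some(existing) if existing != new => conflicts.push(RenameConflict {
                source: old.clone(),
                ours: existing.clone(),
                theirs: new.clone(),
            }),
            // Either only `theirs` renamed it, or both agree on the target.
            _ => {
                merged.insert(old.clone(), new.clone());
            }
        }
    }

    if conflicts.is_empty() {
        Ok(merged)
    } else {
        Err(conflicts)
    }
}
```

If branch A records `text -> body` and branch B records `text -> content`, the function returns one conflict naming `text` as the contested source; identical renames on both branches merge cleanly.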

12.8 Renames and the existence checker

The existence checker (Chapter 6) validates that a rename is consistent with the migration’s structural constraints:

Kind consistency: If you rename a vertex’s kind from "string" to "integer", the checker flags this as a kind change, not a rename. The vertex map sends a string vertex to a vertex with kind integer, which violates kind consistency unless the protocols allow it.
Constraint compatibility: If you rename a constraint sort from "maxLength" to "maxSize", the checker verifies that the constraint’s semantics are preserved (both describe an upper bound on a numeric property). Renaming "maxLength" to "minLength" would fail; the sort changed semantics.
Edge compatibility: If you rename an edge’s kind from "prop" to "field-of", the checker verifies that the target protocol’s edge rules allow "field-of" edges between the same vertex kinds that "prop" connected.

Renames that pass existence checking are guaranteed to be structure-preserving. The lens laws hold. The migration can be compiled and applied with no risk of corruption. The only question the checker can’t answer is whether the rename is semantically right: whether text -> body is a genuine rename or a name collision.

12.9 The naming problem, restated

panproto doesn’t make the naming problem disappear. The problem (that different systems use different names for the same concepts) is inherent to distributed, independently-evolving systems. What panproto does is change the shape of the problem.

Without panproto, a naming disagreement between two systems requires writing adapter code: a function translating every field name in both directions, maintained by hand as both systems evolve. The cost is linear in the number of fields $n$ and in the number of system pairs $m$, and the adapters must be updated every time either system changes, for a total of $O(n \cdot m \cdot \mathrm{versions})$.

With panproto, a naming disagreement is a structure-preserving map $\phi$ that translates vocabulary while preserving structural invariants. These maps compose (eliminating intermediate translations), are verified by the existence checker, and are tracked by the VCS. The cost is $O(1)$ per version change; each version declares its renames as combinators and composition handles the rest.

You still need someone (or something) to determine that handle and preferredUsername refer to the same concept. But once that determination is made, the bookkeeping (propagating the correspondence through schemas, migrations, instances, and version history) is handled by the engine. The human makes the judgment call once. The algebra handles the rest.

Lynch, Owen, Kris Brown, James Fairbanks, and Evan Patterson. 2024. “GATlab: Modeling and Programming with Generalized Algebraic Theories.” arXiv preprint arXiv:2404.04837. https://doi.org/10.48550/arXiv.2404.04837.

Spivak, David I. 2012. “Functorial Data Migration.” Information and Computation 217: 31–51. https://arxiv.org/abs/1009.1166.

¹ This cascading derivation is what Spivak (2012) calls functorial data migration. Each level is automatically and correctly derived from the level above, so you specify only the top-level theory mapping. See Appendix A.

# Names Across Protocol Boundaries {#sec-names} Imagine you're integrating Bluesky (which calls user handles `handle`), Mastodon (which calls them `acct`), and your internal API (which calls them `username`). The same concept, three names. Multiply this by dozens of fields across dozens of protocols, and you've got a persistent source of integration bugs. The fundamental problem isn't that these names differ—it's that no system tracks how they correspond. panproto solves this by formalizing *relationships between names* as structure-preserving maps. Two protocols can call the same concept by different names. The system tracks how those names correspond through bidirectional, composable, verifiable transformations: $\phi: \mathrm{name}_A \Rightarrow \mathrm{name}_B$. A field rename from `text` to `body` becomes a lens. A protocol-level vocabulary shift (ATProto's `record` to JSON Schema's `object`) becomes a theory morphism. Both are handled by the same mechanism. ## Where names live {#sec-naming-sites} Names appear at nine different sites in panproto. 
Each site is a position where the naming function $\mathrm{name}: \mathrm{Site} \times \mathrm{Element} \to \mathrm{String}$ assigns a string label to a structural element: | Site | Example | What it labels | |------|---------|----------------| | Edge label | `"text"` to `"body"` | Field/property names | | Vertex ID | `"post:body.text"` | Structural identifiers for schema elements | | Vertex kind | `"string"`, `"object"` | Type classification (from the protocol's theory) | | Edge kind | `"prop"`, `"field-of"` | Relationship types (from the protocol's theory) | | NSID | `"app.bsky.feed.post"` | Namespace identifiers (ATProto-specific) | | Constraint sort | `"maxLength"` | Validation property names | | Instance anchor | `"post:body.text"` | Data-to-schema references in $W$-type instances | | Theory name | `"ThATProtoSchema"` | GAT theory identifiers | | Sort name | `"Vertex"`, `"Node"` | GAT sort identifiers | : The nine naming sites in panproto. {#tbl-naming-sites} When you rename a field in a migration (@sec-lifting-data), you're changing an edge label. When a theory-level mapping translates ATProto sorts to JSON Schema sorts, it changes sort names and vertex kinds. Every naming transformation—from a simple field rename to a cross-protocol vocabulary translation—operates on one or more of these nine sites. ## Identity vs. name {#sec-identity-vs-name} panproto separates *identity* (which element is this?) from *name* (what do we call it?). Following the [GATlab](https://algebraicjulia.github.io/GATlab.jl/) design [@lynch2024], an identifier has three components: - A **scope tag**: which theory or schema this element belongs to - A **positional index**: which element within that scope (a numeric position, not a name) - A **display name**: the human-readable label Equality and hashing use only scope and index. The display name is metadata. Rename a field and the `HashMap<Ident, _>` entry stays put without rehashing. This distinction solves a real coordination problem. 
In a system where identity *is* the name, renaming a field from `text` to `body` forces every consumer to update simultaneously—the schema, migrations, compiled code, lenses, all of it. A partial rename breaks everything. panproto sidesteps this. The rename changes only the display name; the positional index (which is what actually matters for identity) stays constant. The schema, migration, compiled migration, and lens all reference `(scope, index)`, which hasn't changed. The display name gets used only when serializing to a wire format or showing something to a human. Think of it like git's content-addressing. Rename a file and git still tracks the same content via its SHA. Rename a vertex and panproto tracks the same structural element via its positional index. :::{.callout-caution} ## Exercise: Identity stability under composition If element $e$ has identity $(\mathrm{scope}_1, 3)$ in theory $T_1$ and the colimit $T_1 +_S T_2$ produces a new theory, does $e$ keep the same positional index? Or does the colimit reindex elements, and if so, how does panproto maintain the correspondence? ::: ::: {.callout-tip collapse=true} ## Answer No. The colimit reindexes to produce a single consistent numbering for the merged theory. Element $e$ from $T_1$ receives a new positional index in $T_1 +_S T_2$. panproto maintains the correspondence through the *composition legs*: the theory morphisms $T_1 \to T_1 +_S T_2$ and $T_2 \to T_1 +_S T_2$ record exactly where each original element landed. The scope tag changes, but the composition leg provides the lookup. ::: ### When identity breaks down The identity/name separation assumes you know that two elements across schemas are "the same thing." When you write a migration that maps v1's `Text` vertex to v2's `Text` vertex, you're asserting that these are the same structural element. panproto can verify that this assertion is *consistent* (via `check_existence`), but it can't determine whether it is *correct*. 
Two vertices with the same kind and constraints might correspond to different real-world concepts. Two vertices with different names might correspond to the same concept. This is the rename-vs-drop ambiguity from @sec-breaking-changes. When a vertex disappears and a different vertex appears in the next version, structural evidence alone can't tell a rename (data-preserving) from a drop-and-add (destructive). The identity system tracks correspondences once they're established, but can't discover them on its own. Discovery requires either a human decision or a heuristic—the VCS rename detection below provides the structural heuristic. ## Rename combinators {#sec-rename-combinators} panproto provides a rename combinator for each naming site. Each is a lens (@sec-lenses) with an empty complement: renaming is always lossless and the round-trip laws hold trivially. | Combinator | Site | What it changes | |------------|------|-----------------| | `RenameField { old, new }` | Edge label | A single field/property name | | `RenameVertex { old_id, new_id }` | Vertex ID | A structural identifier; cascades to all edges, constraints, required sets, variants, and hyper-edges that reference the vertex | | `RenameKind { vertex_id, new_kind }` | Vertex kind | The type classification of a single vertex | | `RenameEdgeKind { old_kind, new_kind }` | Edge kind | All edges matching the old kind | | `RenameNsid { vertex_id, new_nsid }` | NSID | The namespace identifier on a vertex | | `RenameConstraintSort { old_sort, new_sort }` | Constraint sort | Validation vocabulary (e.g., `"maxLength"` to `"max-length"`) | | `Rename { site, old, new }` | Any | Unified combinator that dispatches to the specific rename | : Rename combinators. All are lossless (empty complement). {#tbl-rename-combinators} `RenameVertex` deserves special attention because it cascades. 
Renaming `"post:body.text"` to `"post:content.text"` updates not just the vertex table but every edge, constraint, required-edge set, variant, ordering, and recursion point that references that ID. The cascade is computed from the schema's adjacency indices—the same precomputed `outgoing`, `incoming`, and `between` maps that make migration fast. ## A worked example: field renaming across versions Suppose v1 of a blog post schema has a field `text` and v2 renames it to `body`. The v1 schema has a vertex $\mathrm{Text}$ (kind: `string`) connected to $\mathrm{Post}$ via an edge labeled `text`. In v2, the same vertex and edge exist, but the edge is labeled `body`. The migration specifies: - $\mathrm{Post} \mapsto \mathrm{Post}$, $\mathrm{Text} \mapsto \mathrm{Text}$ (vertex map) - `text` $\mapsto$ `body` (edge map) This is expressible as a single combinator: ``` RenameField { old: "text", new: "body" } ``` The lens built from this combinator has: - **get**: Takes a v2 record with field `body`, produces a v1 record with field `text`. The complement is empty; no data discarded. - **put**: Takes a v1 record with field `text`, produces a v2 record with field `body`. Round-tripping: $\mathrm{put}(\mathrm{get}(r)) = r$. The data is identical, only the label changes. Now suppose v3 renames `body` to `content`: ``` RenameField { old: "body", new: "content" } ``` Compose these two lenses and the intermediate step vanishes algebraically. The net effect is `text` to `content`: ``` compose( RenameField { old: "text", new: "body" }, RenameField { old: "body", new: "content" } ) = RenameField { old: "text", new: "content" } ``` Each version declares its renames as combinators. Composition eliminates the intermediate history. A client understanding v1 can consume v3 data by composing two rename lenses; composition is computed once and compiled; applying it to data is just a label substitution. 
## Name mapping across protocols {#sec-name-mapping-tower} Rename combinators work at the schema level, within a single protocol. But ATProto calls a type `record`, SQL calls it `table`, Protobuf calls it `message`, and GraphQL calls it `ObjectType`. These aren't schema-level differences. They're *theory-level* differences: the vocabulary of sorts and operations that define each protocol's schema theory. A theory-level mapping $F: T_1 \to T_2$ maps sorts and operations between theories: ``` sort_map: Vertex -> Node, Edge -> Arrow op_map: src -> source, tgt -> target ``` When applied as a combinator (`ApplyTheoryMorphism`), this single mapping cascades through three levels: 1. **Theory level**: Renames sorts and operations. 2. **Schema level**: Every vertex with kind `"Vertex"` becomes kind `"Node"`. Every edge with kind `"src"` becomes `"source"`. The result records all induced renames. 3. **Instance level**: The schema-level mapping lowers to a `CompiledMigration` for the restrict pipeline, so instance data follows automatically. Each level is derived from the one above, not written by hand.^[This cascading derivation is what @spivak2012 calls functorial data migration. Each level is automatically and correctly derived from the level above, so you specify only the top-level theory mapping. See [Appendix A](../appendices/A-formal-foundations.qmd).] :::{.callout-caution} ## Exercise: Functorial cascade correctness The cascade $F_{\mathrm{theory}} \Rightarrow F_{\mathrm{schema}} \Rightarrow F_{\mathrm{instance}}$ is claimed to be automatically derived at each level. If the theory morphism $F: T_1 \to T_2$ is not injective (two sorts in $T_1$ map to the same sort in $T_2$), what happens at the schema and instance levels? Is information lost, or does the derivation fail? ::: ::: {.callout-tip collapse=true} ## Answer Information is lost at the schema level but the derivation does not fail. 
If two sorts in $T_1$ map to the same sort in $T_2$, the schema-level cascade merges all vertices of both sorts into a single sort in the target. Data that distinguished the two sorts (e.g., different constraint vocabularies) is collapsed. The instance-level migration operates on the merged schema, which has fewer distinctions. The cascade is well-defined; it just produces a coarser result. ::: ### A concrete cross-protocol rename Suppose you have an ATProto schema for a post and want to express the same structure as JSON Schema. The two protocols use different vocabularies: | ATProto | JSON Schema | Meaning | |---------|-------------|---------| | `record` | `object` | The root container type | | `prop` | `property` | A field within a container | | `string` | `string` | A text type | | `record-schema` | `properties` | The edge from container to its fields | The theory-level mapping from ATProto to JSON Schema maps: ``` sort_map: record -> object op_map: prop -> property, record-schema -> properties ``` When applied to a specific ATProto schema (a blog post), the cascade produces: - Every vertex with kind `record` becomes kind `object` - Every edge with kind `prop` becomes kind `property` - Every edge with kind `record-schema` becomes kind `properties` - Vertex IDs and edge labels stay unchanged; only the *vocabulary* changes The `string` sort is shared between both protocols (it appears in the shared sub-theory from which both are built). Shared sorts don't need renaming; they're identical by construction. ### Composing across protocols Name mappings compose transitively. Know how ATProto maps to JSON Schema, and how JSON Schema maps to Protobuf, and you get ATProto-to-Protobuf for free: ``` ATProto -> JSON Schema -> Protobuf (compose) ATProto ──────────────── -> Protobuf ``` The intermediate step is eliminated algebraically. In practice, most cross-protocol paths go through a common intermediate. 
The building-block theories from @sec-self-description ($\text{ThGraph}$, $\text{ThConstraint}$, $\text{ThWType}$, etc.) serve as the shared vocabulary. Each protocol defines mappings to and from these building blocks. Cross-protocol composition then becomes automatic: ATProto to $\text{ThGraph}$ to SQL. The intermediate is a shared language that needs no learning because it's internal to the engine.

### What name mapping does not do

The machinery translates names that have known correspondences. It doesn't *discover* correspondences. Given two protocols with no registered mapping, the system can't infer that ATProto's `handle` corresponds to ActivityPub's `preferredUsername`. Discovery requires either a human specification or a heuristic.

The machinery makes *application* of correspondences automatic and verified. It leaves *discovery* to another mechanism. Within a single protocol's version history, the VCS rename detection below provides the structural heuristic. Across protocols, correspondences are established once when the protocol is registered and composed thereafter.

## Rename detection in VCS {#sec-rename-detection}

When two schema versions are committed to [panproto-vcs](https://github.com/aaronstevenwhite/panproto) (@sec-version-control), the system heuristically detects renames between them. This addresses the rename-vs-drop ambiguity: when a vertex disappears from the old schema and a structurally similar vertex appears in the new schema, is it a rename or a deletion followed by an addition?

The VCS computes a confidence score for each candidate rename pair based on four structural signals:

| Signal | Weight | What it measures |
|--------|--------|-----------------|
| Same vertex kind | +0.3 | Type classification matches |
| Same outgoing edge set | +0.3 | Same fields/relationships |
| Same incoming edge set | +0.2 | Same containment structure |
| Edit distance of names ≤ 3 | +0.2 | Names are similar strings |

: Rename detection confidence signals.
{#tbl-rename-confidence}

The maximum score is 1.0. Candidates are evaluated exhaustively over all (disappeared, appeared) vertex pairs. Conflicting assignments are resolved by taking the highest-confidence match.

### Detection output

```
$ prot diff v1.json v2.json --detect-renames

Detected renames (confidence > 0.7):
  text -> body  (1.0: same kind, same edges, edit distance 0/4)

Possible renames (confidence 0.5-0.7):
  authorName -> displayName  (0.6: same kind, different edges)
  ? Treat as rename? [y/n]
```

High-confidence renames (above the threshold, default 0.7) are automatically incorporated into the migration as correspondences rather than as separate deletions and additions. Low-confidence candidates are presented to the user for confirmation, showing confidence score and structural evidence. Accept and the data is preserved under the new name. Reject and the change is treated as deletion plus addition (the old field's data is dropped; the new field starts empty).

### What the structural heuristic catches

The four signals detect renames that preserve structure:

- `text -> body` scores 1.0: same kind (`string`), same outgoing edges (none; it's a leaf), same incoming edges, and names have edit distance 4 (structural signals alone give 0.8).
- `user -> account` scores 0.8: same kind (`record`), same outgoing edges (both have `name`, `email` children), same incoming edges (root), names dissimilar.
- `email -> contact_email` scores 0.7: same kind (`string`), same outgoing and incoming edges, names share `email` as a substring.

### What the structural heuristic misses

The heuristic is entirely structural; it has no semantic knowledge:

- `patient.dob` (date of birth) and `patient.date` (date of last visit) both score 0.7: same kind (`date`), same edges, names share `dat`. A human would recognize these as different concepts.
- `status` moved from `user` to `order` as account status versus order status.
  The vertex kind is the same (`string`) but the incoming edges differ (different parent). The heuristic correctly flags this as a non-rename because the parent changed. Structure saves us here.

The threshold is configurable because projects have different tolerances. Projects with careful naming conventions can lower the threshold (more automatic renames). Projects that reuse generic names like `status`, `type`, or `id` should raise it (more prompts, fewer silent errors).

:::{.callout-caution}
## Exercise: Heuristic composition

If the VCS detects `text -> body` in commit 1 and `body -> content` in commit 2, each with confidence 1.0, what confidence should the composed rename `text -> content` have? Should confidences multiply, or is the composed rename exact because each step was individually confirmed?
:::

::: {.callout-tip collapse=true}
## Answer

The composed rename is exact (confidence 1.0). Each step was individually confirmed (either by the heuristic or by user acceptance), and once confirmed, the rename is recorded as a first-class event in the VCS history. Composition of confirmed renames is deterministic substitution, not probabilistic inference. Confidences matter only at the *discovery* stage; once a rename is committed, it is a fact, and composing facts does not degrade certainty.
:::

## Rename detection and VCS blame {#sec-rename-blame}

When a rename is detected and incorporated into a migration, the VCS commit records it. This means `schema blame` can trace a field's name backward through history:

```
$ schema blame --element-type edge body

Commit   Author   Date         Field name
abc123   alice    2025-03-01   body (renamed from text)
def456   initial  2025-01-15   text (created)
```

The rename is a first-class event in the history, not an inference. Once confirmed (by the heuristic or by the user), it's recorded permanently. Future blame queries trace through renames much as `git blame` traces through file renames.

Every rename detected and confirmed becomes part of the VCS history (@sec-version-control).
The commit graph records not just *what* changed but *how* it corresponds:

- `schema log` shows renames as explicit events in the commit history
- `schema blame` traces field names backward through renames, following the mapping chain
- `schema bisect` can find the commit that introduced a specific rename
- `schema merge` composes renames from two branches (if branch A renamed `text -> body` and branch B renamed `text -> content`, the merge detects a conflict: two renames of the same source)

## Renames and the existence checker

The existence checker (@sec-when-migrations-break) validates that a rename is consistent with the migration's structural constraints:

- **Kind consistency**: If you rename a vertex's kind from `"string"` to `"integer"`, the checker flags this as a kind change, not a rename. The vertex map sends a `string` vertex to a vertex with kind `integer`, which violates kind consistency unless the protocols allow it.
- **Constraint compatibility**: If you rename a constraint sort from `"maxLength"` to `"maxSize"`, the checker verifies that the constraint's semantics are preserved (both describe an upper bound on a numeric property). Renaming `"maxLength"` to `"minLength"` would fail; the sort changed semantics.
- **Edge compatibility**: If you rename an edge's kind from `"prop"` to `"field-of"`, the checker verifies that the target protocol's edge rules allow `"field-of"` edges between the same vertex kinds that `"prop"` connected.

Renames that pass existence checking are guaranteed to be structure-preserving. The lens laws hold. The migration can be compiled and applied with no risk of corruption. The only question the checker can't answer is whether the rename is *semantically* right: whether `text -> body` is a genuine rename or a name collision.

## The naming problem, restated

panproto doesn't make the naming problem disappear.
The problem (that different systems use different names for the same concepts) is inherent to distributed, independently evolving systems. What panproto does is change the *shape* of the problem.

Without panproto, a naming disagreement between two systems requires writing adapter code: a function translating every field name in both directions, maintained by hand as both systems evolve. The cost is $O(n)$ in fields and $O(m)$ in system pairs, and the adapters must be updated every time either system changes, for $O(n \cdot m \cdot \mathrm{versions})$ overall.

With panproto, a naming disagreement is a structure-preserving map $\phi$ that translates vocabulary while preserving structural invariants. These maps compose (eliminating intermediate translations), are verified by the existence checker, and are tracked by the VCS. The cost is $O(1)$ per version change; each version declares its renames as combinators and composition handles the rest.

You still need someone (or something) to determine that `handle` and `preferredUsername` refer to the same concept. But once that determination is made, the bookkeeping (propagating the correspondence through schemas, migrations, instances, and version history) is handled by the engine. The human makes the judgment call once. The algebra handles the rest.
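
The "declare once, compose thereafter" bookkeeping can be sketched in a few lines. This is a minimal illustration under stated assumptions, not panproto's implementation: `compose_renames`, `invert`, and `rename_fields` are hypothetical helpers, and the version history is made up from the field names used in the examples above.

```python
from functools import reduce


def compose_renames(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Chain two rename maps: apply `first`, then `second`."""
    chained = {old: second.get(new, new) for old, new in first.items()}
    for old, new in second.items():
        # A name that only `second` renames passes through `first` unchanged.
        # Names that `first` itself introduced are already handled above.
        if old not in first and old not in first.values():
            chained[old] = new
    return chained


def invert(renames: dict[str, str]) -> dict[str, str]:
    """Reverse a rename map; only valid when no two fields share a new name."""
    if len(set(renames.values())) != len(renames):
        raise ValueError("rename map is not injective, so it cannot be inverted")
    return {new: old for old, new in renames.items()}


def rename_fields(record: dict, renames: dict[str, str]) -> dict:
    """Relabel a record's keys, leaving values and unmapped keys untouched."""
    return {renames.get(key, key): value for key, value in record.items()}


# One rename declaration per version step (hypothetical history).
history = [
    {"text": "body"},                                    # v1 -> v2
    {"body": "content", "authorName": "displayName"},    # v2 -> v3
]

# Composition derives the v1 -> v3 map; nobody writes it by hand.
v1_to_v3 = reduce(compose_renames, history)

v1_record = {"text": "hello", "authorName": "alice", "createdAt": "2025-01-15"}
v3_record = rename_fields(v1_record, v1_to_v3)

# The same declarations, inverted, carry data back to the old vocabulary.
round_trip = rename_fields(v3_record, invert(v1_to_v3))
assert round_trip == v1_record
```

Adding a fourth version means appending one more dictionary to `history`; the existing declarations and the composition code stay the same.
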