12 Names Across Protocol Boundaries
Imagine you’re integrating Bluesky (which calls user handles handle), Mastodon (which calls them acct), and your internal API (which calls them username). The same concept, three names. Multiply this by dozens of fields across dozens of protocols, and you’ve got a persistent source of integration bugs. The fundamental problem isn’t that these names differ—it’s that no system tracks how they correspond.
panproto solves this by formalizing relationships between names as structure-preserving maps. Two protocols can call the same concept by different names. The system tracks how those names correspond through bidirectional, composable, verifiable transformations: \(\phi: \mathrm{name}_A \Rightarrow \mathrm{name}_B\). A field rename from text to body becomes a lens. A protocol-level vocabulary shift (ATProto’s record to JSON Schema’s object) becomes a theory morphism. Both are handled by the same mechanism.
12.1 Where names live
Names appear at nine different sites in panproto. Each site is a position where the naming function \(\mathrm{name}: \mathrm{Site} \times \mathrm{Element} \to \mathrm{String}\) assigns a string label to a structural element:
| Site | Example | What it labels |
|---|---|---|
| Edge label | "text" to "body" | Field/property names |
| Vertex ID | "post:body.text" | Structural identifiers for schema elements |
| Vertex kind | "string", "object" | Type classification (from the protocol’s theory) |
| Edge kind | "prop", "field-of" | Relationship types (from the protocol’s theory) |
| NSID | "app.bsky.feed.post" | Namespace identifiers (ATProto-specific) |
| Constraint sort | "maxLength" | Validation property names |
| Instance anchor | "post:body.text" | Data-to-schema references in \(W\)-type instances |
| Theory name | "ThATProtoSchema" | GAT theory identifiers |
| Sort name | "Vertex", "Node" | GAT sort identifiers |
When you rename a field in a migration (Chapter 9), you’re changing an edge label. When a theory-level mapping translates ATProto sorts to JSON Schema sorts, it changes sort names and vertex kinds. Every naming transformation—from a simple field rename to a cross-protocol vocabulary translation—operates on one or more of these nine sites.
12.2 Identity vs. name
panproto separates identity (which element is this?) from name (what do we call it?). Following the GATlab design (Lynch et al. 2024), an identifier has three components:
- A scope tag: which theory or schema this element belongs to
- A positional index: which element within that scope (a numeric position, not a name)
- A display name: the human-readable label
Equality and hashing use only scope and index. The display name is metadata. Rename a field and the HashMap<Ident, _> entry stays put without rehashing.
This distinction solves a real coordination problem. In a system where identity is the name, renaming a field from text to body forces every consumer to update simultaneously—the schema, migrations, compiled code, lenses, all of it. A partial rename breaks everything. panproto sidesteps this. The rename changes only the display name; the positional index (which is what actually matters for identity) stays constant. The schema, migration, compiled migration, and lens all reference (scope, index), which hasn’t changed. The display name gets used only when serializing to a wire format or showing something to a human.
Think of it like git’s content-addressing. Rename a file and git still tracks the same content via its SHA. Rename a vertex and panproto tracks the same structural element via its positional index.
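The identity/name split can be sketched in a few lines of Rust. This is a minimal illustration, not panproto's actual `Ident` type: the field names, the `u32` representation, and the `renamed` and `lookup_after_rename` helpers are all assumptions made for the example. The point it shows is that equality and hashing look only at scope and index, so a table keyed by the old identifier is still found through the renamed one.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical identifier: scope tag, positional index, display name.
#[derive(Debug, Clone)]
struct Ident {
    scope: u32,
    index: u32,
    name: String, // display metadata only
}

// Equality ignores the display name.
impl PartialEq for Ident {
    fn eq(&self, other: &Self) -> bool {
        self.scope == other.scope && self.index == other.index
    }
}
impl Eq for Ident {}

// Hashing must agree with equality, so it also ignores the name.
impl Hash for Ident {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.scope.hash(state);
        self.index.hash(state);
    }
}

impl Ident {
    /// Returns the same identity under a new display name.
    fn renamed(&self, new_name: &str) -> Ident {
        Ident { scope: self.scope, index: self.index, name: new_name.to_string() }
    }
}

/// Inserts under the name `text`, then looks up under the name `body`.
fn lookup_after_rename() -> Option<&'static str> {
    let text = Ident { scope: 1, index: 3, name: "text".to_string() };
    let mut kinds: HashMap<Ident, &'static str> = HashMap::new();
    kinds.insert(text.clone(), "string");
    let body = text.renamed("body");
    kinds.get(&body).copied()
}
```

Because the hash never depends on `name`, the entry does not need to be removed and reinserted when the label changes.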
If element \(e\) has identity \((\mathrm{scope}_1, 3)\) in theory \(T_1\) and the colimit \(T_1 +_S T_2\) produces a new theory, does \(e\) keep the same positional index? Or does the colimit reindex elements, and if so, how does panproto maintain the correspondence?
No. The colimit reindexes to produce a single consistent numbering for the merged theory. Element \(e\) from \(T_1\) receives a new positional index in \(T_1 +_S T_2\). panproto maintains the correspondence through the composition legs: the theory morphisms \(T_1 \to T_1 +_S T_2\) and \(T_2 \to T_1 +_S T_2\) record exactly where each original element landed. The scope tag changes, but the composition leg provides the lookup.
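As a rough sketch of that reindexing, consider only the positional indices and assume the shared theory \(S\) embeds injectively into both \(T_1\) and \(T_2\). The function name `merge_indices` and the numbering policy (shared elements first, then the rest of \(T_1\), then the rest of \(T_2\)) are assumptions chosen for illustration; the real engine may number the merged theory differently. What matters is that the two returned vectors play the role of the legs: they record where each original index landed.

```rust
/// Merges two index spaces along shared pairs `(i in T1, j in T2)`.
/// Returns the merged size plus one lookup table ("leg") per input theory.
/// Assumes each index appears in at most one shared pair.
fn merge_indices(
    n1: usize,
    n2: usize,
    shared: &[(usize, usize)],
) -> (usize, Vec<usize>, Vec<usize>) {
    let mut leg1: Vec<Option<usize>> = vec![None; n1];
    let mut leg2: Vec<Option<usize>> = vec![None; n2];
    let mut next = 0;

    // Identified elements receive a single index in the merged theory.
    for &(i, j) in shared {
        leg1[i] = Some(next);
        leg2[j] = Some(next);
        next += 1;
    }

    // Everything else gets a fresh index, T1 first and then T2.
    for slot in leg1.iter_mut().chain(leg2.iter_mut()) {
        if slot.is_none() {
            *slot = Some(next);
            next += 1;
        }
    }

    // Every slot is filled by now, so flattening keeps the lengths intact.
    let finish = |leg: Vec<Option<usize>>| leg.into_iter().flatten().collect::<Vec<usize>>();
    (next, finish(leg1), finish(leg2))
}
```

With four elements in \(T_1\), three in \(T_2\), and index 3 of \(T_1\) identified with index 0 of \(T_2\), the element at \((\mathrm{scope}_1, 3)\) lands at index 0 of the merged theory, and `leg1[3]` is how that is looked up.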
12.2.1 When identity breaks down
The identity/name separation assumes you know that two elements across schemas are “the same thing.” When you write a migration that maps v1’s Text vertex to v2’s Text vertex, you’re asserting that these are the same structural element. panproto can verify that this assertion is consistent (via check_existence), but it can’t determine whether it is correct. Two vertices with the same kind and constraints might correspond to different real-world concepts. Two vertices with different names might correspond to the same concept.
This is the rename-vs-drop ambiguity from Chapter 8. When a vertex disappears and a different vertex appears in the next version, structural evidence alone can’t tell a rename (data-preserving) from a drop-and-add (destructive). The identity system tracks correspondences once they’re established, but can’t discover them on its own. Discovery requires either a human decision or a heuristic—the VCS rename detection below provides the structural heuristic.
12.3 Rename combinators
panproto provides a rename combinator for each naming site. Each is a lens (Chapter 7) with an empty complement: renaming is always lossless and the round-trip laws hold trivially.
| Combinator | Site | What it changes |
|---|---|---|
| RenameField { old, new } | Edge label | A single field/property name |
| RenameVertex { old_id, new_id } | Vertex ID | A structural identifier; cascades to all edges, constraints, required sets, variants, and hyper-edges that reference the vertex |
| RenameKind { vertex_id, new_kind } | Vertex kind | The type classification of a single vertex |
| RenameEdgeKind { old_kind, new_kind } | Edge kind | All edges matching the old kind |
| RenameNsid { vertex_id, new_nsid } | NSID | The namespace identifier on a vertex |
| RenameConstraintSort { old_sort, new_sort } | Constraint sort | Validation vocabulary (e.g., "maxLength" to "max-length") |
| Rename { site, old, new } | Any | Unified combinator that dispatches to the specific rename |
RenameVertex deserves special attention because it cascades. Renaming "post:body.text" to "post:content.text" updates not just the vertex table but every edge, constraint, required-edge set, variant, ordering, and recursion point that references that ID. The cascade is computed from the schema’s adjacency indices—the same precomputed outgoing, incoming, and between maps that make migration fast.
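To make the cascade concrete, here is a deliberately small sketch. The `Schema` and `Edge` types are hypothetical stand-ins with only three tables (vertices, edges, and a required set), and the function scans every table linearly rather than using the precomputed adjacency indices described above. It illustrates what "cascade" means, not how panproto implements it.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Edge {
    src: String,
    tgt: String,
    label: String,
}

/// Hypothetical, heavily simplified schema.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    vertices: Vec<String>,
    edges: Vec<Edge>,
    required: Vec<String>, // vertex IDs that must be present
}

/// Replaces `old` with `new` if `id` matches, otherwise keeps `id`.
fn swap(id: &str, old: &str, new: &str) -> String {
    if id == old { new.to_string() } else { id.to_string() }
}

/// Renames a vertex ID everywhere it is referenced.
fn rename_vertex(schema: &Schema, old_id: &str, new_id: &str) -> Schema {
    Schema {
        vertices: schema.vertices.iter().map(|v| swap(v, old_id, new_id)).collect(),
        edges: schema
            .edges
            .iter()
            .map(|e| Edge {
                src: swap(&e.src, old_id, new_id),
                tgt: swap(&e.tgt, old_id, new_id),
                label: e.label.clone(), // edge labels are a different naming site
            })
            .collect(),
        required: schema.required.iter().map(|v| swap(v, old_id, new_id)).collect(),
    }
}
```

Note that the edge label is left alone: renaming a vertex ID and renaming a field are separate sites, handled by separate combinators.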
12.4 A worked example: field renaming across versions
Suppose v1 of a blog post schema has a field text and v2 renames it to body.
The v1 schema has a vertex \(\mathrm{Text}\) (kind: string) connected to \(\mathrm{Post}\) via an edge labeled text. In v2, the same vertex and edge exist, but the edge is labeled body.
The migration specifies:
- \(\mathrm{Post} \mapsto \mathrm{Post}\), \(\mathrm{Text} \mapsto \mathrm{Text}\) (vertex map)
- text \(\mapsto\) body (edge map)
This is expressible as a single combinator:
RenameField { old: "text", new: "body" }
The lens built from this combinator has:
- get: Takes a v2 record with field body, produces a v1 record with field text. The complement is empty; no data is discarded.
- put: Takes a v1 record with field text, produces a v2 record with field body.
Round-tripping: \(\mathrm{put}(\mathrm{get}(r)) = r\). The data is identical, only the label changes.
Now suppose v3 renames body to content:
RenameField { old: "body", new: "content" }
Compose these two lenses and the intermediate step vanishes algebraically. The net effect is text to content:
compose(
RenameField { old: "text", new: "body" },
RenameField { old: "body", new: "content" }
) = RenameField { old: "text", new: "content" }
Each version declares its renames as combinators, and composition eliminates the intermediate history. A client that understands v1 can consume v3 data by composing the two rename lenses. The composition is computed and compiled once; applying it to data is just a label substitution.
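The worked example can be sketched as runnable Rust. Everything here is an illustrative assumption rather than panproto's API: records are flat string maps, the `RenameField` struct mirrors the combinator's shape only, and `then` is a made-up name for composition. The sketch also assumes the new label is not already used by another field in the record.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
struct RenameField {
    old: String,
    new: String,
}

/// Copies a record, relabeling the key `from` as `to`.
fn relabel(record: &Record, from: &str, to: &str) -> Record {
    record
        .iter()
        .map(|(k, v)| {
            let key = if k.as_str() == from { to.to_string() } else { k.clone() };
            (key, v.clone())
        })
        .collect()
}

impl RenameField {
    fn new(old: &str, new: &str) -> Self {
        RenameField { old: old.to_string(), new: new.to_string() }
    }

    /// Old-version record in, new-version record out.
    fn put(&self, record: &Record) -> Record {
        relabel(record, &self.old, &self.new)
    }

    /// New-version record in, old-version record out.
    fn get(&self, record: &Record) -> Record {
        relabel(record, &self.new, &self.old)
    }

    /// Composes two renames when they chain; the middle name disappears.
    fn then(&self, next: &RenameField) -> Option<RenameField> {
        if self.new == next.old {
            Some(RenameField { old: self.old.clone(), new: next.new.clone() })
        } else {
            None
        }
    }
}
```

Composing `RenameField::new("text", "body")` with `RenameField::new("body", "content")` yields the single rename from text to content, and applying it gives the same record as applying the two steps in sequence.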
12.5 Name mapping across protocols
Rename combinators work at the schema level, within a single protocol. But ATProto calls a type record, SQL calls it table, Protobuf calls it message, and GraphQL calls it ObjectType. These aren’t schema-level differences. They’re theory-level differences: the vocabulary of sorts and operations that define each protocol’s schema theory.
A theory-level mapping \(F: T_1 \to T_2\) maps sorts and operations between theories:
sort_map: Vertex -> Node, Edge -> Arrow
op_map: src -> source, tgt -> target
When applied as a combinator (ApplyTheoryMorphism), this single mapping cascades through three levels:
- Theory level: Renames sorts and operations.
- Schema level: Every vertex with kind "Vertex" becomes kind "Node". Every edge with kind "src" becomes "source". The result records all induced renames.
- Instance level: The schema-level mapping lowers to a CompiledMigration for the restrict pipeline, so instance data follows automatically.
Each level is derived from the one above, not written by hand.[^1]
The cascade \(F_{\mathrm{theory}} \Rightarrow F_{\mathrm{schema}} \Rightarrow F_{\mathrm{instance}}\) is claimed to be automatically derived at each level. If the theory morphism \(F: T_1 \to T_2\) is not injective (two sorts in \(T_1\) map to the same sort in \(T_2\)), what happens at the schema and instance levels? Is information lost, or does the derivation fail?
Information is lost at the schema level but the derivation does not fail. If two sorts in \(T_1\) map to the same sort in \(T_2\), the schema-level cascade merges all vertices of both sorts into a single sort in the target. Data that distinguished the two sorts (e.g., different constraint vocabularies) is collapsed. The instance-level migration operates on the merged schema, which has fewer distinctions. The cascade is well-defined; it just produces a coarser result.
12.5.1 A concrete cross-protocol rename
Suppose you have an ATProto schema for a post and want to express the same structure as JSON Schema. The two protocols use different vocabularies:
| ATProto | JSON Schema | Meaning |
|---|---|---|
| record | object | The root container type |
| prop | property | A field within a container |
| string | string | A text type |
| record-schema | properties | The edge from container to its fields |
The theory-level mapping from ATProto to JSON Schema maps:
sort_map: record -> object
op_map: prop -> property, record-schema -> properties
When applied to a specific ATProto schema (a blog post), the cascade produces:
- Every vertex with kind record becomes kind object
- Every edge with kind prop becomes kind property
- Every edge with kind record-schema becomes kind properties
- Vertex IDs and edge labels stay unchanged; only the vocabulary changes
The string sort is shared between both protocols (it appears in the shared sub-theory from which both are built). Shared sorts don’t need renaming; they’re identical by construction.
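The schema-level step of this cascade can be sketched as follows. The `TheoryMorphism`, `Schema`, `Vertex`, and `Edge` types and the function `apply_theory_morphism` are hypothetical simplifications introduced for this example; they are not the real ApplyTheoryMorphism combinator. Names missing from a map are passed through unchanged, which is how the shared string sort is modeled here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Vertex {
    id: String,
    kind: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Edge {
    src: String,
    tgt: String,
    kind: String,
    label: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    vertices: Vec<Vertex>,
    edges: Vec<Edge>,
}

struct TheoryMorphism {
    sort_map: HashMap<String, String>,
    op_map: HashMap<String, String>,
}

/// Looks a name up in the map; unmapped (shared) names pass through.
fn translate(map: &HashMap<String, String>, name: &str) -> String {
    map.get(name).cloned().unwrap_or_else(|| name.to_string())
}

/// Rewrites vertex kinds and edge kinds; IDs and labels are untouched.
fn apply_theory_morphism(m: &TheoryMorphism, schema: &Schema) -> Schema {
    Schema {
        vertices: schema
            .vertices
            .iter()
            .map(|v| Vertex { id: v.id.clone(), kind: translate(&m.sort_map, &v.kind) })
            .collect(),
        edges: schema
            .edges
            .iter()
            .map(|e| Edge { kind: translate(&m.op_map, &e.kind), ..e.clone() })
            .collect(),
    }
}

/// The ATProto-to-JSON-Schema mapping from the table above.
fn atproto_to_json_schema() -> TheoryMorphism {
    let pairs = |ps: &[(&str, &str)]| -> HashMap<String, String> {
        ps.iter().map(|&(a, b)| (a.to_string(), b.to_string())).collect()
    };
    TheoryMorphism {
        sort_map: pairs(&[("record", "object")]),
        op_map: pairs(&[("prop", "property"), ("record-schema", "properties")]),
    }
}
```

Applied to a post schema with one record vertex and one string vertex, only the record vertex and the edge kind change; the string vertex, the vertex IDs, and the label all survive as they were.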
12.5.2 Composing across protocols
Name mappings compose transitively. Know how ATProto maps to JSON Schema, and how JSON Schema maps to Protobuf, and you get ATProto-to-Protobuf for free:
ATProto -> JSON Schema -> Protobuf
(compose)
ATProto ──────────────── -> Protobuf
The intermediate step is eliminated algebraically. In practice, most cross-protocol paths go through a common intermediate. The building-block theories from Chapter 14 (\(\text{ThGraph}\), \(\text{ThConstraint}\), \(\text{ThWType}\), etc.) serve as the shared vocabulary. Each protocol defines mappings to and from these building blocks. Cross-protocol composition then becomes automatic: ATProto to \(\text{ThGraph}\) to SQL. The intermediate is a shared language that needs no learning because it’s internal to the engine.
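Transitive composition of name maps is ordinary function composition, which the following sketch makes explicit. The helper names `name_map` and `compose` are assumptions, and so is the JSON Schema to Protobuf vocabulary used in the usage note (object to message follows the naming mentioned earlier in this chapter; property to field is invented for the example). The sketch assumes the first map lists every name in the source vocabulary, including names it leaves unchanged.

```rust
use std::collections::HashMap;

type NameMap = HashMap<String, String>;

/// Convenience constructor for a name map from string pairs.
fn name_map(pairs: &[(&str, &str)]) -> NameMap {
    pairs.iter().map(|&(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Composes `first` then `second`. A name that `second` does not
/// mention is treated as shared and passed through unchanged.
fn compose(first: &NameMap, second: &NameMap) -> NameMap {
    first
        .iter()
        .map(|(source, middle)| {
            let target = second.get(middle).cloned().unwrap_or_else(|| middle.clone());
            (source.clone(), target)
        })
        .collect()
}
```

For example, composing `name_map(&[("record", "object"), ("prop", "property"), ("string", "string")])` with `name_map(&[("object", "message"), ("property", "field")])` gives a direct map in which record goes to message, prop goes to field, and string stays string. The intermediate names object and property no longer appear.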
12.5.3 What name mapping does not do
The machinery translates names that have known correspondences. It doesn’t discover correspondences. Given two protocols with no registered mapping, the system can’t infer that ATProto’s handle corresponds to ActivityPub’s preferredUsername. Discovery requires either a human specification or a heuristic.
The machinery makes application of correspondences automatic and verified. It leaves discovery to another mechanism. Within a single protocol’s version history, the VCS rename detection below provides the structural heuristic. Across protocols, correspondences are established once when the protocol is registered and composed thereafter.
12.6 Rename detection in VCS
When two schema versions are committed to panproto-vcs (Chapter 10), the system heuristically detects renames between them. This addresses the rename-vs-drop ambiguity: when a vertex disappears from the old schema and a structurally similar vertex appears in the new schema, is it a rename or a deletion followed by an addition?
The VCS computes a confidence score for each candidate rename pair based on four structural signals:
| Signal | Weight | What it measures |
|---|---|---|
| Same vertex kind | +0.3 | Type classification matches |
| Same outgoing edge set | +0.3 | Same fields/relationships |
| Same incoming edge set | +0.2 | Same containment structure |
| Edit distance of names ≤ 3 | +0.2 | Names are similar strings |
The maximum score is 1.0. Candidates are evaluated exhaustively over all (disappeared, appeared) vertex pairs. Conflicting assignments are resolved by taking the highest-confidence match.
12.6.1 Detection output
$ prot diff v1.json v2.json --detect-renames
Detected renames (confidence > 0.7):
text -> body (1.0: same kind, same edges, edit distance 0/4)
Possible renames (confidence 0.5-0.7):
authorName -> displayName (0.6: same kind, different edges)
? Treat as rename? [y/n]
High-confidence renames (above the threshold, default 0.7) are automatically incorporated into the migration as correspondences rather than as separate deletions and additions. Low-confidence candidates are presented to the user for confirmation, showing confidence score and structural evidence. Accept and the data is preserved under the new name. Reject and the change is treated as deletion plus addition (the old field’s data is dropped; the new field starts empty).
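The decision rule can be written down as a small function. This is a sketch under stated assumptions: the `Decision` enum and `classify` are invented names, the strict comparison against the threshold follows the "confidence > 0.7" wording in the sample output, and the lower bound of 0.5 for prompting is taken from the "Possible renames" band shown there. Treating anything below that band as a plain deletion plus addition is an assumption, since the text does not say what happens to such candidates.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    /// Incorporated into the migration as a correspondence.
    AutoRename,
    /// Shown to the user with its score and evidence.
    AskUser,
    /// Treated as a deletion followed by an addition.
    DropAndAdd,
}

/// Classifies a candidate by confidence.
/// `threshold` defaults to 0.7 in the text; `floor` (0.5) is assumed.
fn classify(confidence: f64, threshold: f64, floor: f64) -> Decision {
    if confidence > threshold {
        Decision::AutoRename
    } else if confidence >= floor {
        Decision::AskUser
    } else {
        Decision::DropAndAdd
    }
}

/// Final outcome once a prompted candidate has been answered:
/// returns true when the pair is recorded as a rename.
fn is_rename(decision: &Decision, user_accepts: bool) -> bool {
    match decision {
        Decision::AutoRename => true,
        Decision::AskUser => user_accepts,
        Decision::DropAndAdd => false,
    }
}
```

Raising `threshold` moves more candidates from the automatic path to the prompt, which is the trade-off a project makes when its field names are generic.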
12.6.2 What the structural heuristic catches
The four signals detect renames that preserve structure:
- text -> body scores 1.0: same kind (string), same outgoing edges (none; it’s a leaf), same incoming edges, and names have edit distance 4 (structural signals alone give 0.8).
- user -> account scores 0.8: same kind (record), same outgoing edges (both have name, email children), same incoming edges (root), names dissimilar.
- email -> contact_email scores 0.7: same kind (string), same outgoing and incoming edges, names share email as a substring.
12.6.3 What the structural heuristic misses
The heuristic is entirely structural; it has no semantic knowledge:
- patient.dob (date of birth) and patient.date (date of last visit) both score 0.7: same kind (date), same edges, names share dat. A human would recognize these as different concepts.
- status moved from user to order as account status versus order status. The vertex kind is the same (string) but the incoming edges differ (different parent). The heuristic correctly flags this as a non-rename because the parent changed. Structure saves us here.
The threshold is configurable because projects have different tolerances. Projects with careful naming conventions can lower the threshold (more automatic renames). Projects that reuse generic names like status, type, and id should raise it (more prompts, fewer silent errors).
If the VCS detects text -> body in commit 1 and body -> content in commit 2, each with confidence 1.0, what confidence should the composed rename text -> content have? Should confidences multiply, or is the composed rename exact because each step was individually confirmed?
The composed rename is exact (confidence 1.0). Each step was individually confirmed (either by the heuristic or by user acceptance), and once confirmed, the rename is recorded as a first-class event in the VCS history. Composition of confirmed renames is deterministic substitution, not probabilistic inference. Confidences matter only at the discovery stage; once a rename is committed, it is a fact, and composing facts does not degrade certainty.
12.7 Rename detection and VCS blame
When a rename is detected and incorporated into a migration, the VCS commit records it. This means schema blame can trace a field’s name backward through history:
$ schema blame --element-type edge body
Commit Author Date Field name
abc123 alice 2025-03-01 body (renamed from text)
def456 initial 2025-01-15 text (created)
The rename is a first-class event in the history, not an inference. Once confirmed (by the heuristic or by the user), it’s recorded permanently. Future blame queries trace through renames as git blame traces through file renames.
Every rename detected and confirmed becomes part of the VCS history (Chapter 10). The commit graph records not just what changed but how it corresponds:
- schema log shows renames as explicit events in the commit history
- schema blame traces field names backward through renames, following the mapping chain
- schema bisect can find the commit that introduced a specific rename
- schema merge composes renames from two branches (if branch A renamed text -> body and branch B renamed text -> content, the merge detects a conflict: two renames of the same source)
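The merge rule in the last item can be sketched as a comparison of two rename tables. The `MergedRename` type and the functions `renames` and `merge_renames` are hypothetical names for this illustration, and each branch is reduced to a flat map from source name to target name; the real merge operates on the commit graph.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
enum MergedRename {
    /// One branch renamed the source, or both chose the same target.
    Agreed(String),
    /// Both branches renamed the same source to different targets.
    Conflict { ours: String, theirs: String },
}

/// Convenience constructor for a branch's rename table.
fn renames(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs.iter().map(|&(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Merges per-branch renames, keyed by the original (source) name.
fn merge_renames(
    ours: &BTreeMap<String, String>,
    theirs: &BTreeMap<String, String>,
) -> BTreeMap<String, MergedRename> {
    let sources: BTreeSet<&String> = ours.keys().chain(theirs.keys()).collect();
    sources
        .into_iter()
        .map(|src| {
            let merged = match (ours.get(src), theirs.get(src)) {
                (Some(a), Some(b)) if a != b => {
                    MergedRename::Conflict { ours: a.clone(), theirs: b.clone() }
                }
                (Some(t), _) | (None, Some(t)) => MergedRename::Agreed(t.clone()),
                (None, None) => unreachable!("source came from one of the two maps"),
            };
            (src.clone(), merged)
        })
        .collect()
}
```

If one branch renames text to body and the other renames text to content, the entry for text comes back as a conflict carrying both targets, while a rename made on only one branch merges cleanly.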
12.8 Renames and the existence checker
The existence checker (Chapter 6) validates that a rename is consistent with the migration’s structural constraints:
- Kind consistency: If you rename a vertex’s kind from "string" to "integer", the checker flags this as a kind change, not a rename. The vertex map sends a string vertex to a vertex with kind integer, which violates kind consistency unless the protocols allow it.
- Constraint compatibility: If you rename a constraint sort from "maxLength" to "maxSize", the checker verifies that the constraint’s semantics are preserved (both describe an upper bound on a numeric property). Renaming "maxLength" to "minLength" would fail; the sort changed semantics.
- Edge compatibility: If you rename an edge’s kind from "prop" to "field-of", the checker verifies that the target protocol’s edge rules allow "field-of" edges between the same vertex kinds that "prop" connected.
Renames that pass existence checking are guaranteed to be structure-preserving. The lens laws hold. The migration can be compiled and applied with no risk of corruption. The only question the checker can’t answer is whether the rename is semantically right: whether text -> body is a genuine rename or a name collision.
12.9 The naming problem, restated
panproto doesn’t make the naming problem disappear. The problem (that different systems use different names for the same concepts) is inherent to distributed, independently-evolving systems. What panproto does is change the shape of the problem.
Without panproto, a naming disagreement between two systems requires writing adapter code: a function translating every field name in both directions, maintained by hand as both systems evolve. The cost is \(O(n)\) in fields and \(O(m)\) in system pairs, and the adapters must be updated every time either system changes, for a total of \(O(n \cdot m \cdot \mathrm{versions})\).
With panproto, a naming disagreement is a structure-preserving map \(\phi\) that translates vocabulary while preserving structural invariants. These maps compose (eliminating intermediate translations), are verified by the existence checker, and are tracked by the VCS. The cost is \(O(1)\) per version change; each version declares its renames as combinators and composition handles the rest.
You still need someone (or something) to determine that handle and preferredUsername refer to the same concept. But once that determination is made, the bookkeeping (propagating the correspondence through schemas, migrations, instances, and version history) is handled by the engine. The human makes the judgment call once. The algebra handles the rest.
[^1]: This cascading derivation is what Spivak (2012) calls functorial data migration. Each level is automatically and correctly derived from the level above, so you specify only the top-level theory mapping. See Appendix A.