6 When Migrations Break

Not every pair of maps constitutes a valid migration. Some proposed schema changes will silently corrupt data, violate constraints, or create ambiguous reconnections. panproto refuses them before any data is touched. This chapter explains what goes wrong and how the engine catches it.

6.1 Three ways to break things

Protobuf field number reuse. You delete a field with number 3, then assign number 3 to a new field with a different type. Existing clients that still have the old .proto compiled in will interpret the new field’s bytes under the old field’s type. The result is silently corrupted data—no error, no warning, just garbage values. Buf’s breaking-change rules exist specifically to prevent this.
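The reinterpretation can be seen with a hand-rolled varint codec (a sketch, not the protobuf library): if field 3 changes from int32 to sint32, the zigzag step on the writer’s side is invisible to an old reader, which happily decodes the wrong value.

```python
# Sketch only (not the protobuf library): reusing field number 3 with a
# different varint type makes old readers decode garbage, with no error.

def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

def zigzag32(n: int) -> int:
    # sint32 values are zigzag-mapped before the varint step
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF

# New schema: field 3 is sint32. A writer encodes -5.
wire_bytes = encode_varint(zigzag32(-5))

# Old client: field 3 was int32, so it decodes a plain varint.
print(decode_varint(wire_bytes))  # 9, not -5: silent corruption
```

The bytes parse cleanly in both directions; only the interpretation differs, which is exactly why no runtime error ever fires.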

ATProto constraint tightening. The app.bsky.feed.post Lexicon’s text field has maxLength: 3000 (bytes) and maxGraphemes: 300 (Unicode grapheme clusters). Suppose someone tightens maxGraphemes from 300 to 150. Every existing post with more than 150 grapheme clusters is now invalid under the new schema. The data has not changed; the schema moved out from under it.
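A minimal sketch of the mismatch, using `len()` as a stand-in for grapheme counting (valid for plain ASCII; real grapheme counting needs a Unicode segmentation library):

```python
# Sketch: a post valid under the v1 limit becomes invalid when
# maxGraphemes is tightened from 300 to 150. For plain ASCII text,
# len() counts graphemes; real text needs Unicode segmentation.

def satisfies_max_graphemes(text: str, limit: int) -> bool:
    return len(text) <= limit  # ASCII-only approximation

post = "x" * 200  # 200 graphemes, untouched since v1
print(satisfies_max_graphemes(post, 300))  # True: valid under v1
print(satisfies_max_graphemes(post, 150))  # False: schema moved under it
```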

SQL column drop with dependents. You run ALTER TABLE posts DROP COLUMN author_id, but comments.author_id has a foreign key referencing it. The database refuses: you cannot remove the column without CASCADE, and cascading deletes the foreign key constraint on a table you never intended to modify.

These three failures look different, but they share a common structure. In each case, a proposed mapping between schema versions violates an invariant that the schema language’s theory guarantees. panproto derives these invariants from the theory itself, rather than maintaining a separate rule list per protocol.

6.2 Theory-derived conditions

The engine determines which checks to apply by inspecting the schema theory \(\mathbb{T}_S\) and instance theory \(\mathbb{T}_I\) at runtime. Each sort or structural feature in the theory triggers a specific check. A protocol whose theories are richer (more sorts, more equations) gets more checks automatically. A minimal custom protocol with a bare graph theory gets only the basic checks.

6.2.1 Kind consistency

When the schema theory has a \(\text{Kind}\) sort (which classifies vertices by data type), the engine checks that every vertex in the source maps to a vertex of a compatible kind in the target.

You cannot map a string vertex to a union{string, integer} vertex if the protocol does not support implicit widening. Protobuf is strict: kinds must be equal. ATProto allows some subtyping (string is compatible with union{string}). GraphQL respects interface subtyping. The compatibility relation is part of the protocol definition, not a global rule.
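As a sketch of how per-protocol compatibility relations might look (function names and the frozenset encoding of unions are illustrative, not panproto’s API):

```python
# Hypothetical per-protocol kind-compatibility relations. Unions are
# modeled here as frozensets of member kinds, purely for illustration.

def protobuf_compatible(src_kind, tgt_kind) -> bool:
    return src_kind == tgt_kind  # strict: kinds must be equal

def atproto_compatible(src_kind, tgt_kind) -> bool:
    if src_kind == tgt_kind:
        return True
    # a kind is compatible with a union that contains it
    return isinstance(tgt_kind, frozenset) and src_kind in tgt_kind

print(protobuf_compatible("string", frozenset({"string"})))  # False
print(atproto_compatible("string", frozenset({"string"})))   # True
```

The engine consults whichever relation the protocol definition supplies; it never hardcodes one.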

6.2.2 Edge compatibility

When the schema theory has an \(\text{Edge}\) sort with \(\text{src}\) and \(\text{tgt}\) operations, the engine checks that the edge map is compatible with the vertex map. If field text goes from Post to Text in v1, and you map Post → Post and Text → Text, then text must map to an edge that also goes from Post to Text. Mapping it to an edge from Post to Integer would break the graph structure.1
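The check is the usual graph-homomorphism condition. A sketch, using plain dicts as a stand-in for panproto’s internal maps (the representation is hypothetical):

```python
# Naturality sketch: f_V and f_E form a graph homomorphism iff edge
# endpoints are preserved under the vertex map.

def is_graph_homomorphism(f_V, f_E, src1, tgt1, src2, tgt2):
    return all(
        f_V[src1[e]] == src2[f_E[e]] and f_V[tgt1[e]] == tgt2[f_E[e]]
        for e in f_E
    )

# v1: edge "text" goes from Post to Text.
src1, tgt1 = {"text": "Post"}, {"text": "Text"}
# v2 also has an edge "count" from Post to Integer.
src2 = {"text": "Post", "count": "Post"}
tgt2 = {"text": "Text", "count": "Integer"}
f_V = {"Post": "Post", "Text": "Text"}

print(is_graph_homomorphism(f_V, {"text": "text"}, src1, tgt1, src2, tgt2))   # True
print(is_graph_homomorphism(f_V, {"text": "count"}, src1, tgt1, src2, tgt2))  # False
```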

6.2.3 Constraint compatibility

When the schema theory has a \(\text{Constraint}\) sort, the engine checks that every constraint on a source vertex or edge has a compatible counterpart in the target. “Compatible” depends on the migration direction.

In the backward direction (lifting v2 data to v1), the target (v2) constraint must be at least as tight as the source (v1) constraint, so that all v2 data satisfies v1’s requirements. In the forward direction, the condition reverses: the target must be at least as loose.

The Protobuf field-number reuse and the ATProto maxGraphemes tightening both fail this check. panproto derives the check from the presence of a \(\text{Constraint}\) sort, not from a hardcoded rule about Protobuf or ATProto.
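For upper-bound constraints like maxLength or maxGraphemes, where a smaller value is tighter, the directional rule reduces to two inequalities. A sketch (function names are illustrative):

```python
# Directional constraint compatibility for upper-bound constraints,
# where smaller means tighter (maxLength, maxGraphemes, ...).

def backward_compatible(src_limit: int, tgt_limit: int) -> bool:
    # lifting target data to the source: target at least as tight
    return tgt_limit <= src_limit

def forward_compatible(src_limit: int, tgt_limit: int) -> bool:
    # pushing source data to the target: target at least as loose
    return tgt_limit >= src_limit

# maxGraphemes tightened from 300 (v1) to 150 (v2):
print(backward_compatible(300, 150))  # True: all v2 data fits v1
print(forward_compatible(300, 150))   # False: v1 data can overflow v2
```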

6.2.4 Pushout complement uniqueness

When the schema theory allows parallel edges (\(\text{ThMulti}\)), removing an intermediate vertex can create ambiguity. If the schema has a path \(A \to B \to C\) and the migration drops \(B\), the result must connect \(A\) directly to \(C\). But if the target schema has multiple edges between \(A\) and \(C\), which one should the engine use? Without a resolver to disambiguate, the migration is ill-defined.

The engine checks for this ambiguity when the schema theory includes the \(\text{ThMulti}\) component. If the pushout complement is not unique, the migration is rejected with an error that identifies the ambiguous pair and suggests providing a resolver.2
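A sketch of the ambiguity test (the edge representation and names are illustrative): the engine only needs to count parallel candidates for the reconnection.

```python
# Sketch: dropping B from a path A -> B -> C is ambiguous when the
# target already has parallel edges from A to C (ThMulti).

def ambiguous_reconnections(target_edges, a, c):
    """Return every candidate edge a composed reconnection could use."""
    return [name for name, (s, t) in target_edges.items()
            if s == a and t == c]

target_edges = {
    "primary_author": ("Post", "User"),
    "last_editor":    ("Post", "User"),  # parallel edge
}
candidates = ambiguous_reconnections(target_edges, "Post", "User")
print(len(candidates) > 1)  # True: reject unless a resolver picks one
```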

6.2.5 Signature coherence (simultaneity)

When the schema theory has a \(\text{HyperEdge}\) sort (as in SQL), the engine checks that hyperedge mappings are all-or-nothing: either all vertices of a hyperedge are mapped, or none are. You cannot partially map a foreign key; that would create a dangling connection.

In SQL terms, a compound foreign key (author_id, post_id) → (users.id, posts.id) is a hyperedge connecting three tables. If you map users and posts but drop the join table, the foreign key becomes incoherent. Either all participants survive, or the hyperedge must be explicitly removed.3
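The all-or-nothing condition is a one-liner. A sketch, modeling the compound foreign key above as a tuple of participating tables (the representation is illustrative):

```python
# Simultaneity sketch: a hyperedge survives only if all of its
# participating vertices are mapped, or none of them are.

def hyperedge_coherent(hyperedge, vertex_map) -> bool:
    mapped = [v in vertex_map for v in hyperedge]
    return all(mapped) or not any(mapped)

fk = ("comments", "users", "posts")  # participants of the hyperedge
full_map = {"comments": "comments", "users": "users", "posts": "posts"}
partial_map = {"users": "users", "posts": "posts"}  # join table dropped

print(hyperedge_coherent(fk, full_map))     # True
print(hyperedge_coherent(fk, partial_map))  # False: dangling foreign key
```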

6.3 How the checks compose

A protocol whose schema theory has both a \(\text{Constraint}\) sort and a \(\text{HyperEdge}\) sort (like SQL) gets all the conditional checks: kind consistency, edge compatibility, constraint compatibility, and signature coherence (simultaneity). A protocol with neither (a minimal custom protocol) gets only the universal checks: vertex map validity and edge map validity.

The engine assembles the check list by querying the theory’s sorts. Adding a new protocol with a novel theory composition requires no changes to the checking code.
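The assembly step can be sketched as a lookup from sorts to checks (all sort and check names here are illustrative, not panproto’s actual identifiers):

```python
# Sketch of theory-driven check assembly: the check list is derived
# from which sorts the schema theory declares, not from the protocol.

UNIVERSAL = ("vertex_map_valid", "edge_map_valid")
CONDITIONAL = {
    "Kind":       ("kind_consistency",),
    "Edge":       ("edge_compatibility",),
    "Constraint": ("constraint_compatibility",),
    "ThMulti":    ("pushout_complement_unique",),
    "HyperEdge":  ("signature_coherence",),
}

def checks_for(theory_sorts: set) -> list:
    checks = list(UNIVERSAL)
    for sort, extra in CONDITIONAL.items():
        if sort in theory_sorts:
            checks.extend(extra)
    return checks

sql_sorts = {"Kind", "Edge", "Constraint", "HyperEdge"}
print(len(checks_for(sql_sorts)))  # 6 checks
print(checks_for(set()))           # universal checks only
```

A new protocol only needs to declare its sorts; the dispatch table never changes.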

Exercise

Consider a migration that tightens a maxLength constraint from 3000 to 1500. In the backward direction (lifting v2 data to v1), does this pass constraint compatibility? What about the forward direction (lifting v1 data to v2)?

Backward: yes. v2 data has maxLength: 1500, and any string with at most 1500 characters also satisfies maxLength: 3000. Forward: no. v1 data may contain strings up to 3000 characters, which would violate v2’s tighter constraint. This is a one-way migration; v1 can read v2 data, but v2 cannot accept all v1 data.

6.4 Existence reports

When a migration fails validation, panproto produces an ExistenceReport that lists every violated condition. Each violation includes the condition that failed, the specific elements involved (which vertex, which edge, which constraint), and a human-readable explanation of why it failed.

The report is structured, not a wall of text. CI pipelines can parse it programmatically, and the CLI formats it as a table:

Migration v1 → v2: 2 existence errors

  1. CONSTRAINT_TIGHTENING
     vertex: text (kind: string)
     constraint: maxGraphemes
     source: 300, target: 150
     Migration direction: backward
     → v2 data with >150 graphemes would violate v1 constraint

  2. REACHABILITY_RISK
     vertex: Author (kind: object)
     Intermediate vertex dropped; subtree may become disconnected
     → provide a resolver or restructure the migration

The report tells you not just what went wrong but what to do about it. A constraint tightening can be resolved by choosing the forward direction instead (if that is acceptable) or by adding a coercion step. A reachability risk can be resolved by providing a resolver that specifies how to reconnect orphaned children.
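A hypothetical shape for such a report, as a sketch of what CI tooling might consume (all field names are illustrative, not panproto’s actual schema):

```python
# Hypothetical structured report consumed by both the CLI formatter
# and CI pipelines; field names are illustrative.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Violation:
    condition: str     # e.g. "CONSTRAINT_TIGHTENING"
    elements: dict     # which vertex / edge / constraint was involved
    explanation: str   # human-readable reason
    suggestion: str    # what to do about it

@dataclass
class ExistenceReport:
    source: str
    target: str
    violations: List[Violation] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.violations

report = ExistenceReport(source="v1", target="v2")
report.violations.append(Violation(
    condition="CONSTRAINT_TIGHTENING",
    elements={"vertex": "text", "constraint": "maxGraphemes"},
    explanation="v2 data with >150 graphemes would violate v1 constraint",
    suggestion="choose the forward direction or add a coercion step",
))
print(report.ok)  # False
```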

The next chapter introduces bidirectional lenses, which extend the one-directional lifts we have been writing into round-trippable transformations that remember what they discarded.

Barr, Michael, and Charles Wells. 1990. Category Theory for Computing Science. Prentice Hall.
Ehrig, Hartmut, Karsten Ehrig, Ulrike Prange, and Gabriele Taentzer. 2006. Fundamentals of Algebraic Graph Transformation. Monographs in Theoretical Computer Science. Springer. https://doi.org/10.1007/3-540-31188-2.

  1. This is the naturality condition: \(f_E\) and \(f_V\) must form a graph homomorphism.↩︎

  2. In algebraic graph theory, this is called the pushout complement uniqueness condition (Ehrig et al. 2006).↩︎

  3. This is the simultaneity condition for hypergraph morphisms (Barr and Wells 1990).↩︎