8 Breaking Changes and CI
You can classify every schema change by asking two questions: does a forward migration exist (can old data be read by new code)? And does a backward migration exist (can new data be read by old code)? The answers determine whether the change is safe to deploy, and under what conditions.
panproto makes this classification decidable and automatic. There is no heuristic, no pattern-matching on the diff, and no approximation. The engine checks whether the required migrations exist using the existence conditions from the previous chapter, and reports a definitive answer.¹
8.1 The structural diff
Before classifying compatibility, panproto computes a structural diff between the two schemas. This operates on the graph representation—vertices, edges, constraints—not on the textual source files. Two schemas that are textually different but structurally identical (fields in a different order, different internal $defs names, inline vs. referenced definitions) produce an empty diff.
const oldSchema = atproto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.constraint('post:body.text', 'maxLength', '3000')
.build();
const newSchema = atproto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.vertex('post:body.tags', 'array')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.edge('post:body', 'post:body.tags', 'prop', { name: 'tags' })
.constraint('post:body.text', 'maxLength', '3000')
.build();
const report = panproto.diff(oldSchema, newSchema);
// report.compatibility: 'fully-compatible'
// report.changes: [{ kind: 'vertex-added', id: 'post:body.tags' }, ...]

The diff produces a list of change descriptors, each identifying what changed (a vertex, edge, or constraint), how it changed (added, removed, modified, or moved), and where (the path from the root to the affected element). These descriptors are useful for reporting—they tell a human reviewer what happened—but they do not drive the compatibility classification. The classification comes from migration existence, not from the diff.
Before diffing, panproto canonicalizes both schemas: it resolves all references, normalizes naming, sorts vertices and edges deterministically, and strips metadata that does not affect the graph structure. The diff operates on canonical forms, ensuring that equivalent schemas always produce an empty diff.
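The canonicalization step can be sketched in a few lines. This is a hypothetical illustration, not panproto's actual implementation: the `Schema`, `Vertex`, and `Edge` shapes below are invented for the example.

```typescript
// Hypothetical sketch of canonicalization: sort vertices and edges
// deterministically and strip metadata that does not affect the graph,
// so that structurally identical schemas compare equal.
type Vertex = { id: string; kind: string };
type Edge = { from: string; to: string; kind: string; name?: string };
type Schema = { vertices: Vertex[]; edges: Edge[]; meta?: Record<string, unknown> };

function canonicalize(s: Schema): Schema {
  return {
    // deterministic ordering of vertices and edges
    vertices: [...s.vertices].sort((a, b) => a.id.localeCompare(b.id)),
    edges: [...s.edges].sort((a, b) =>
      `${a.from}/${a.to}/${a.kind}`.localeCompare(`${b.from}/${b.to}/${b.kind}`)),
    // metadata is intentionally dropped
  };
}

// Two textually different but structurally identical schemas:
const a: Schema = {
  vertices: [{ id: 'post', kind: 'record' }, { id: 'post:body', kind: 'object' }],
  edges: [],
};
const b: Schema = {
  vertices: [{ id: 'post:body', kind: 'object' }, { id: 'post', kind: 'record' }],
  edges: [],
  meta: { note: 'ignored' },
};
// canonicalize(a) and canonicalize(b) are identical → empty diff
```

The same idea generalizes to reference resolution and name normalization; the essential property is that canonicalization is idempotent and order-insensitive.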
8.2 Three compatibility levels
8.2.1 Fully compatible
Both forward and backward migrations exist. Old data can be read by new code, and new data can be read by old code. Deploying a fully compatible change requires no coordination: old and new consumers can coexist indefinitely.
Examples: renaming a field (with a migration that maps the old name to the new), adding an optional field with a default, reordering fields.
8.2.2 Backward compatible
A forward migration exists, but no backward migration. Old data can be read by new code, but new data may not be readable by old code. This is acceptable for rolling deployments where consumers update before producers, or where old consumers can gracefully handle unknown data.
Examples: removing a field (old data is fine without it; new data lacks what old code expects), adding a required field (old data can be given a default; new data has a field old code does not know about).
8.2.3 Breaking
No valid forward migration exists. Existing data cannot be transformed to conform to the new schema without loss or corruption.
Examples: tightening a constraint below existing values, changing a field’s type incompatibly, removing a vertex that other vertices depend on.
The classification reduces to a two-by-two table:
| Forward exists | Backward exists | Classification |
|---|---|---|
| Yes | Yes | Fully compatible |
| Yes | No | Backward compatible |
| No | Yes | Forward compatible only (rare; treated as breaking) |
| No | No | Breaking |
The “forward compatible only” case is theoretically possible but rare in practice. panproto reports it but treats it as breaking for safety, since the typical deployment assumption is that old data must be readable by new code.
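The table reduces to a tiny function once the engine has answered the two existence questions. A sketch (the names here are illustrative, not panproto's API):

```typescript
// Map the two existence answers to a compatibility level,
// per the 2x2 table above.
type Level = 'fully-compatible' | 'backward-compatible' | 'breaking';

function classify(forwardExists: boolean, backwardExists: boolean): Level {
  if (forwardExists && backwardExists) return 'fully-compatible';
  if (forwardExists) return 'backward-compatible';
  // Covers both "neither exists" and the rare "forward compatible only"
  // case (backward exists, forward does not), which is treated as
  // breaking for safety.
  return 'breaking';
}
```

All the interesting work is in answering the two questions; the classification itself is trivial.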
Some schema registries classify changes by pattern: “adding a field is backward compatible,” “removing a field is breaking,” and so on. This works for simple, isolated changes but fails for compositions. Adding an optional field and tightening a constraint on an unrelated field is not the conjunction of their individual classifications; the changes can interact. panproto avoids this problem by not classifying individual changes at all. It classifies the migration as a whole by checking existence.
8.3 Four examples
8.3.1 Adding an optional field
Add an optional labels field (type: array<string>, default: []) to postView.
The forward migration exists: old data lacks labels, but the default fills it in. The backward migration also exists: new data has labels, which the restrict pipeline discards. Classification: fully compatible.
Schema diff:
+ postView.labels: array<string> (optional, default: [])
Compatibility: FULLY COMPATIBLE
Forward migration: ✓ exists (add default)
Backward migration: ✓ exists (drop field)
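The migration pair for this change is simple enough to write out by hand. A sketch with hypothetical types (not panproto's generated code): the forward direction fills the default, the backward direction drops the field.

```typescript
// Old and new shapes of the record (illustrative types).
type OldPost = { text: string };
type NewPost = { text: string; labels: string[] };

// Forward: old data lacks labels, so fill in the default.
const forward = (p: OldPost): NewPost => ({ ...p, labels: [] });

// Backward: new data has labels, which old code does not expect; drop it.
const backward = ({ labels, ...rest }: NewPost): OldPost => rest;

// Round-tripping old data is the identity, as the lens laws require.
```

Because both directions exist and compose to the identity on old data, the change is fully compatible.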
8.3.2 Removing a field
Remove likeCount from postView.
The forward migration exists: old data has likeCount, which the forward migration discards. The backward migration does not: new data lacks likeCount, and the old schema requires it. There is no way to conjure a valid value from nothing. Classification: backward compatible.
Schema diff:
- postView.likeCount: integer
Compatibility: BACKWARD COMPATIBLE
Forward migration: ✓ exists (drop field)
Backward migration: ✗ no valid migration (required field missing)
If likeCount had been optional in the old schema, the backward migration would exist (new data simply omits the field, which is valid), and the classification would be fully compatible. This is why marking fields as optional from the start is good protocol hygiene.
8.3.3 Tightening a constraint
Reduce postView.text maxLength from 3000 to 300.
The forward migration does not exist: old data may contain text up to 3000 characters, which violates the new 300-character limit. Truncation would change the data’s meaning and is not a valid structural transformation. The backward migration exists: new data has text at most 300 characters, which satisfies the old 3000-character limit. Classification: breaking.
Schema diff:
~ postView.text: maxLength 3000 → 300
Compatibility: BREAKING
Forward migration: ✗ constraint violation (existing data may exceed new limit)
Backward migration: ✓ exists (new constraint is stricter)
The diff alone is misleading here. “Changing a maxLength” sounds minor, but the direction matters: tightening is breaking, loosening is backward compatible.
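The direction-dependence is easy to capture in code. A hypothetical helper (not part of panproto) that classifies a lone maxLength change:

```typescript
// Classify a change to a single maxLength constraint.
// Tightening kills the forward migration; loosening kills the backward one.
function maxLengthChange(oldMax: number, newMax: number):
    'fully-compatible' | 'backward-compatible' | 'breaking' {
  if (newMax === oldMax) return 'fully-compatible';
  // Tightening: existing data may exceed the new limit → no forward migration.
  if (newMax < oldMax) return 'breaking';
  // Loosening: forward migration exists (old data satisfies the looser
  // limit), but new data may exceed the old limit → no backward migration.
  return 'backward-compatible';
}

// maxLengthChange(3000, 300) → 'breaking'
// maxLengthChange(300, 3000) → 'backward-compatible'
```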
8.3.4 Changing a vertex kind
Change postView.likeCount from integer to string.
Neither migration exists. Old integer values are not valid strings; new string values are not valid integers. Classification: breaking.
If you want to change a type with a known coercion (integer to string via formatting, say), you can use the coerceType combinator from the previous chapter to build a manual migration. The combinator includes both directions, so it satisfies the lens laws. But panproto will not infer this coercion automatically. The compatibility checker reports the structural classification; manual migrations can override it.
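What such a manual coercion might look like, sketched as a plain pair of functions (the names and shape here are illustrative; see the previous chapter for the actual coerceType combinator). Both directions are total on valid values, so the pair satisfies the round-trip lens laws:

```typescript
// Illustrative integer↔string coercion lens.
const intToString = {
  // Forward: format the integer as a decimal string.
  forward: (n: number): string => n.toString(10),
  // Backward: parse, rejecting anything that is not a canonical
  // decimal integer (e.g. '042', '1.5', 'abc').
  backward: (s: string): number => {
    const n = Number.parseInt(s, 10);
    if (!Number.isInteger(n) || n.toString(10) !== s) {
      throw new Error(`not a canonical integer string: ${s}`);
    }
    return n;
  },
};

// intToString.backward(intToString.forward(42)) === 42
```

The canonicity check in `backward` is what makes the round trip exact; without it, `'042'` and `'42'` would both map to 42 and the backward direction would lose information.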
8.4 Composed changes
Real schema evolutions rarely consist of a single change. A new version might simultaneously add fields, remove fields, tighten one constraint, and loosen another.
The compatibility of the composed change is not the “worst” individual classification. It is determined by checking migration existence on the composed schemas. Composition of compatibility levels is not a lattice operation; changes can interact in surprising ways.
Adding an optional labels field is fully compatible individually. Tightening text maxLength from 3000 to 300 is breaking individually. What is the classification of both changes shipped together? Can you determine this by taking the “worst” of the two, or do you need to check the composed schemas?
You must check the composed schemas. The composed change is breaking, because the forward migration does not exist (the constraint tightening prevents it). In this particular case the answer happens to match the “worst of two” heuristic, but that is coincidental. Consider instead: loosening text maxLength from 300 to 3000 (backward compatible) combined with adding an optional field (fully compatible). The composed change is backward compatible—matching the “worst”—but there are edge cases involving interacting constraints where the composition is worse than either individual change. Checking existence on the composed schemas is the only reliable method.
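A toy version of the existence check makes the point concrete. This sketch handles only maxLength constraints and is vastly simpler than panproto's engine, but it shows why the check runs on the composed schemas rather than on per-change labels:

```typescript
// field → maxLength constraint (toy schema representation).
type Constraints = Record<string, number>;

// Forward migration exists only if no shared constraint was tightened.
function forwardExists(oldC: Constraints, newC: Constraints): boolean {
  return Object.keys(newC).every((k) => !(k in oldC) || newC[k] >= oldC[k]);
}

const oldS: Constraints = { 'postView.text': 3000 };

// Composed change: add an optional labels field (no constraint, fully
// compatible on its own) AND tighten text to 300 (breaking on its own).
const composed: Constraints = { 'postView.text': 300 };

// forwardExists(oldS, composed) is false: the composed change is
// breaking, and the field addition cannot rescue it.
```

The individual change labels never enter the computation; only the old and new composed schemas do.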
8.5 The diff report
panproto combines the structural diff with the compatibility classification into a single report designed for code review. A developer should be able to read it and understand both what changed and whether it is safe.
schema diff v1/schema.json v2/schema.json
Schema: app.bsky.feed.defs#threadViewPost
Changes:
+ postView.labels: array<string> (optional, default: [])
- postView.likeCount: integer
~ postView.text: maxLength 300 → 3000
+ profileViewBasic.avatarUrl: string (optional)
Compatibility: BACKWARD COMPATIBLE
Forward migration: ✓ exists
- labels: filled with default []
- likeCount: dropped
- text: constraint loosened (valid)
- avatarUrl: filled with default null
Backward migration: ✗ does not exist
- likeCount: required field missing in new schema
- text: new data may exceed old maxLength (3000 > 300)
Recommendation: Safe to deploy if consumers update before producers.
8.6 CI integration
Schema compatibility checks belong in CI, not in code review checklists. Here is a minimal GitHub Actions workflow that blocks breaking changes on every pull request:
name: Schema Compatibility Check
on:
  pull_request:
    paths:
      - 'schemas/**'
jobs:
  check-compat:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install panproto
        run: cargo install panproto-cli
      - name: Get base schema
        run: git show origin/main:schemas/schema.json > /tmp/old-schema.json
      - name: Check compatibility
        run: |
          schema diff /tmp/old-schema.json schemas/schema.json
      - name: Block breaking changes
        run: |
          schema check \
            --src /tmp/old-schema.json \
            --tgt schemas/schema.json \
            --level backward-compatible

The schema check command accepts a --level flag: fully-compatible (fail unless both directions work), backward-compatible (fail unless at least the forward migration exists; this is the default), or breaking (never fail, only report, useful for auditing).
For local development, the same commands work:
# show the diff and classification
schema diff old-schema.json new-schema.json
# check against a level (exit 0 = pass, 1 = fail)
schema check --src old-schema.json --tgt new-schema.json --level backward-compatible
# JSON output for programmatic consumption
schema diff old-schema.json new-schema.json --format json

For teams that use a schema registry, panproto can check against the latest registered version:
schema check \
--src "registry://app.bsky.feed.defs#threadViewPost@latest" \
  --tgt schemas/schema.json

Your CI uses --level backward-compatible, which allows changes where the forward migration exists but the backward does not. Under what deployment ordering constraint is this safe? What goes wrong if producers deploy first?
Consumers must deploy the new schema before producers stop sending old-format data. “Backward compatible” means old data can be read by new code, but new data may be unreadable by old code. If a producer deploys first and starts emitting new-format data, old consumers will fail.
8.7 Versioning strategies
The compatibility classification informs but does not dictate versioning. Different protocols adopt different conventions.
Semantic versioning maps compatibility levels to semver: fully compatible → patch or minor, backward compatible → minor, breaking → major. This is the most common approach for protocols with external consumers.
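The semver mapping is mechanical once the classification is known. A sketch (the function name is illustrative; whether fully compatible changes bump patch or minor is a team convention):

```typescript
// Map a compatibility level to a semver bump, per the convention above.
type Level = 'fully-compatible' | 'backward-compatible' | 'breaking';
type Bump = 'patch' | 'minor' | 'major';

function semverBump(level: Level): Bump {
  switch (level) {
    case 'fully-compatible':
      return 'minor'; // or 'patch', depending on team convention
    case 'backward-compatible':
      return 'minor';
    case 'breaking':
      return 'major';
  }
}
```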
Chronological versioning, as used by ATProto’s Lexicon, uses date-based or sequential version numbers without semver semantics. The compatibility level is still useful for deployment decisions.
Continuous evolution avoids explicit versioning entirely, relying on the lens mechanism to keep old and new consumers working simultaneously. Every change ships with its migration, and the compatibility level determines the deployment order: fully compatible changes deploy freely, backward compatible changes require consumers first, and breaking changes require a data migration followed by consumers followed by producers.
8.8 Further reading
The graph-theoretic foundation for schema compatibility draws on Ehrig et al. (2006), particularly the treatment of pushout complements and the gluing condition for typed attributed graph grammars. For practical context on how compatibility levels are used in event-streaming architectures, the Confluent Schema Registry documentation is useful. panproto’s classification is more fine-grained (because it reasons about structural migrations rather than syntactic patterns), but the deployment strategies are similar.
With migration, lenses, existence checking, and CI integration covered, we have the complete workflow for evolving schemas safely. The next part of the tutorial extends the toolkit: data lifting, custom protocols, schema version control, and cross-protocol translation.
¹ The proof that this classification is decidable relies on properties of adhesive categories (Lack and Sobociński 2005). See Appendix A.