8 Breaking Changes and CI
You can classify every schema change by asking two questions: does a forward migration exist (can old data be read by new code)? And does a backward migration exist (can new data be read by old code)? The answers determine whether the change is safe to deploy, and under what conditions.
panproto makes this classification decidable and automatic. There is no heuristic, no pattern-matching on the diff, and no approximation. The engine checks whether the required migrations exist using the existence conditions from the previous chapter, and reports a definitive answer.¹
8.1 The structural diff
Before classifying compatibility, panproto computes a structural diff between the two schemas. This operates on the graph representation—vertices, edges, constraints—not on the textual source files. Two schemas that are textually different but structurally identical (fields in a different order, different internal $defs names, inline vs. referenced definitions) produce an empty diff.
const oldSchema = atproto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.constraint('post:body.text', 'maxLength', '3000')
.build();
const newSchema = atproto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.vertex('post:body.tags', 'array')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.edge('post:body', 'post:body.tags', 'prop', { name: 'tags' })
.constraint('post:body.text', 'maxLength', '3000')
.build();
const report = panproto.diff(oldSchema, newSchema);
// report.compatibility: 'fully-compatible'
// report.changes: [{ kind: 'vertex-added', id: 'post:body.tags' }, ...]

The diff produces a list of change descriptors, each identifying what changed (a vertex, edge, or constraint), how it changed (added, removed, modified, or moved), and where (the path from the root to the affected element). These descriptors are useful for reporting—they tell a human reviewer what happened—but they do not drive the compatibility classification. The classification comes from migration existence, not from the diff.
Before diffing, panproto canonicalizes both schemas: it resolves all references, normalizes naming, sorts vertices and edges deterministically, and strips metadata that does not affect the graph structure. The diff operates on canonical forms, ensuring that equivalent schemas always produce an empty diff.
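The canonicalization step can be sketched in a few lines. This is a hypothetical illustration, not panproto's actual implementation: the `Schema`, `Vertex`, and `Edge` shapes below are invented for the example.

```typescript
// Hypothetical sketch of canonicalization: sort vertices and edges
// deterministically and strip metadata that does not affect the graph,
// so that structurally identical schemas compare equal.
type Vertex = { id: string; kind: string };
type Edge = { from: string; to: string; kind: string; name?: string };
type Schema = { vertices: Vertex[]; edges: Edge[]; meta?: Record<string, unknown> };

function canonicalize(s: Schema): Schema {
  return {
    // deterministic ordering of vertices and edges
    vertices: [...s.vertices].sort((a, b) => a.id.localeCompare(b.id)),
    edges: [...s.edges].sort((a, b) =>
      `${a.from}/${a.to}/${a.kind}`.localeCompare(`${b.from}/${b.to}/${b.kind}`)),
    // metadata is intentionally dropped
  };
}

// Two textually different but structurally identical schemas:
const a: Schema = {
  vertices: [{ id: 'post', kind: 'record' }, { id: 'post:body', kind: 'object' }],
  edges: [],
};
const b: Schema = {
  vertices: [{ id: 'post:body', kind: 'object' }, { id: 'post', kind: 'record' }],
  edges: [],
  meta: { note: 'ignored' },
};
// canonicalize(a) and canonicalize(b) are identical → empty diff
```

The same idea generalizes to reference resolution and name normalization; the essential property is that canonicalization is idempotent and order-insensitive.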
8.2 Three compatibility levels
8.2.1 Fully compatible
Both forward and backward migrations exist. Old data can be read by new code, and new data can be read by old code. Deploying a fully compatible change requires no coordination: old and new consumers can coexist indefinitely.
Examples: renaming a field (with a migration that maps the old name to the new), adding an optional field with a default, reordering fields.
8.2.2 Backward compatible
A forward migration exists, but no backward migration. Old data can be read by new code, but new data may not be readable by old code. This is acceptable for rolling deployments where consumers update before producers, or where old consumers can gracefully handle unknown data.
Examples: removing a field (old data is fine without it; new data lacks what old code expects), adding a required field (old data can be given a default; new data has a field old code does not know about).
8.2.3 Breaking
No valid forward migration exists. Existing data cannot be transformed to conform to the new schema without loss or corruption.
Examples: tightening a constraint below existing values, changing a field’s type incompatibly, removing a vertex that other vertices depend on.
The classification reduces to a two-by-two table:
| Forward exists | Backward exists | Classification |
|---|---|---|
| Yes | Yes | Fully compatible |
| Yes | No | Backward compatible |
| No | Yes | Forward compatible only (rare; treated as breaking) |
| No | No | Breaking |
The “forward compatible only” case is theoretically possible but rare in practice. panproto reports it but treats it as breaking for safety, since the typical deployment assumption is that old data must be readable by new code.
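The table reduces to a tiny function once the engine has answered the two existence questions. A sketch (the names here are illustrative, not panproto's API):

```typescript
// Map the two existence answers to a compatibility level,
// per the 2x2 table above.
type Level = 'fully-compatible' | 'backward-compatible' | 'breaking';

function classify(forwardExists: boolean, backwardExists: boolean): Level {
  if (forwardExists && backwardExists) return 'fully-compatible';
  if (forwardExists) return 'backward-compatible';
  // Covers both "neither exists" and the rare "forward compatible only"
  // case (backward exists, forward does not), which is treated as
  // breaking for safety.
  return 'breaking';
}
```

All the interesting work is in answering the two questions; the classification itself is trivial.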
Some schema registries classify changes by pattern: “adding a field is backward compatible,” “removing a field is breaking,” and so on. This works for simple, isolated changes but fails for compositions. Adding an optional field and tightening a constraint on an unrelated field is not the conjunction of their individual classifications; the changes can interact. panproto avoids this problem by not classifying individual changes at all. It classifies the migration as a whole by checking existence.
8.3 Four examples
8.3.1 Adding an optional field
Add an optional labels field (type: array<string>, default: []) to postView.
The forward migration exists: old data lacks labels, but the default fills it in. The backward migration also exists: new data has labels, which the restrict pipeline discards. Classification: fully compatible.
Schema diff:
+ postView.labels: array<string> (optional, default: [])
Compatibility: FULLY COMPATIBLE
Forward migration: ✓ exists (add default)
Backward migration: ✓ exists (drop field)
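The migration pair for this change is simple enough to write out by hand. A sketch with hypothetical types (not panproto's generated code): the forward direction fills the default, the backward direction drops the field.

```typescript
// Old and new shapes of the record (illustrative types).
type OldPost = { text: string };
type NewPost = { text: string; labels: string[] };

// Forward: old data lacks labels, so fill in the default.
const forward = (p: OldPost): NewPost => ({ ...p, labels: [] });

// Backward: new data has labels, which old code does not expect; drop it.
const backward = ({ labels, ...rest }: NewPost): OldPost => rest;

// Round-tripping old data is the identity, as the lens laws require.
```

Because both directions exist and compose to the identity on old data, the change is fully compatible.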
8.3.2 Removing a field
Remove likeCount from postView.
The forward migration exists: old data has likeCount, which the forward migration discards. The backward migration does not: new data lacks likeCount, and the old schema requires it. There is no way to conjure a valid value from nothing. Classification: backward compatible.
Schema diff:
- postView.likeCount: integer
Compatibility: BACKWARD COMPATIBLE
Forward migration: ✓ exists (drop field)
Backward migration: ✗ no valid migration (required field missing)
If likeCount had been optional in the old schema, the backward migration would exist (new data simply omits the field, which is valid), and the classification would be fully compatible. This is why marking fields as optional from the start is good protocol hygiene.
8.3.3 Tightening a constraint
Reduce postView.text maxLength from 3000 to 300.
The forward migration does not exist: old data may contain text up to 3000 characters, which violates the new 300-character limit. Truncation would change the data’s meaning and is not a valid structural transformation. The backward migration exists: new data has text at most 300 characters, which satisfies the old 3000-character limit. Classification: breaking.
Schema diff:
~ postView.text: maxLength 3000 → 300
Compatibility: BREAKING
Forward migration: ✗ constraint violation (existing data may exceed new limit)
Backward migration: ✓ exists (new constraint is stricter)
The diff alone is misleading here. “Changing a maxLength” sounds minor, but the direction matters: tightening is breaking, loosening is backward compatible.
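The direction-dependence is easy to capture in code. A hypothetical helper (not part of panproto) that classifies a lone maxLength change:

```typescript
// Classify a change to a single maxLength constraint.
// Tightening kills the forward migration; loosening kills the backward one.
function maxLengthChange(oldMax: number, newMax: number):
    'fully-compatible' | 'backward-compatible' | 'breaking' {
  if (newMax === oldMax) return 'fully-compatible';
  // Tightening: existing data may exceed the new limit → no forward migration.
  if (newMax < oldMax) return 'breaking';
  // Loosening: forward migration exists (old data satisfies the looser
  // limit), but new data may exceed the old limit → no backward migration.
  return 'backward-compatible';
}

// maxLengthChange(3000, 300) → 'breaking'
// maxLengthChange(300, 3000) → 'backward-compatible'
```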
8.3.4 Changing a vertex kind
Change postView.likeCount from integer to string.
Neither migration exists. Old integer values are not valid strings; new string values are not valid integers. Classification: breaking.
If you want to change a type with a known coercion (integer to string via formatting, say), you can use the coerceType combinator from the previous chapter to build a manual migration. The combinator includes both directions, so it satisfies the lens laws. But panproto will not infer this coercion automatically. The compatibility checker reports the structural classification; manual migrations can override it.
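What such a manual coercion might look like, sketched as a plain pair of functions (the names and shape here are illustrative; see the previous chapter for the actual coerceType combinator). Both directions are total on valid values, so the pair satisfies the round-trip lens laws:

```typescript
// Illustrative integer↔string coercion lens.
const intToString = {
  // Forward: format the integer as a decimal string.
  forward: (n: number): string => n.toString(10),
  // Backward: parse, rejecting anything that is not a canonical
  // decimal integer (e.g. '042', '1.5', 'abc').
  backward: (s: string): number => {
    const n = Number.parseInt(s, 10);
    if (!Number.isInteger(n) || n.toString(10) !== s) {
      throw new Error(`not a canonical integer string: ${s}`);
    }
    return n;
  },
};

// intToString.backward(intToString.forward(42)) === 42
```

The canonicity check in `backward` is what makes the round trip exact; without it, `'042'` and `'42'` would both map to 42 and the backward direction would lose information.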
8.4 Composed changes
Real schema evolutions rarely consist of a single change. A new version might simultaneously add fields, remove fields, tighten one constraint, and loosen another.
The compatibility of the composed change is not the “worst” individual classification. It is determined by checking migration existence on the composed schemas. Composition of compatibility levels is not a lattice operation; changes can interact in surprising ways.
Adding an optional labels field is fully compatible individually. Tightening text maxLength from 3000 to 300 is breaking individually. What is the classification of both changes shipped together? Can you determine this by taking the “worst” of the two, or do you need to check the composed schemas?
You must check the composed schemas. The composed change is breaking, because the forward migration does not exist (the constraint tightening prevents it). In this particular case the answer happens to match the “worst of two” heuristic, but that is coincidental. Consider instead: loosening text maxLength from 300 to 3000 (backward compatible) combined with adding an optional field (fully compatible). The composed change is backward compatible—matching the “worst”—but there are edge cases involving interacting constraints where the composition is worse than either individual change. Checking existence on the composed schemas is the only reliable method.
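A toy version of the existence check makes the point concrete. This sketch handles only maxLength constraints and is vastly simpler than panproto's engine, but it shows why the check runs on the composed schemas rather than on per-change labels:

```typescript
// field → maxLength constraint (toy schema representation).
type Constraints = Record<string, number>;

// Forward migration exists only if no shared constraint was tightened.
function forwardExists(oldC: Constraints, newC: Constraints): boolean {
  return Object.keys(newC).every((k) => !(k in oldC) || newC[k] >= oldC[k]);
}

const oldS: Constraints = { 'postView.text': 3000 };

// Composed change: add an optional labels field (no constraint, fully
// compatible on its own) AND tighten text to 300 (breaking on its own).
const composed: Constraints = { 'postView.text': 300 };

// forwardExists(oldS, composed) is false: the composed change is
// breaking, and the field addition cannot rescue it.
```

The individual change labels never enter the computation; only the old and new composed schemas do.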
8.5 The diff report
panproto combines the structural diff with the compatibility classification into a single report designed for code review. A developer should be able to read it and understand both what changed and whether it is safe.
schema diff v1/schema.json v2/schema.json
Schema: app.bsky.feed.defs#threadViewPost
Changes:
+ postView.labels: array<string> (optional, default: [])
- postView.likeCount: integer
~ postView.text: maxLength 300 → 3000
+ profileViewBasic.avatarUrl: string (optional)
Compatibility: BACKWARD COMPATIBLE
Forward migration: ✓ exists
- labels: filled with default []
- likeCount: dropped
- text: constraint loosened (valid)
- avatarUrl: filled with default null
Backward migration: ✗ does not exist
- likeCount: required field missing in new schema
- text: new data may exceed old maxLength (3000 > 300)
Recommendation: Safe to deploy if consumers update before producers.
8.6 CI integration
Schema compatibility checks belong in CI, not in code review checklists. Here is a minimal GitHub Actions workflow that blocks breaking changes on every pull request:
name: Schema Compatibility Check
on:
  pull_request:
    paths:
      - 'schemas/**'
jobs:
  check-compat:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install panproto
        run: cargo install panproto-cli
      - name: Get base schema
        run: git show origin/main:schemas/schema.json > /tmp/old-schema.json
      - name: Check compatibility
        run: |
          schema diff /tmp/old-schema.json schemas/schema.json
      - name: Block breaking changes
        run: |
          schema check \
            --src /tmp/old-schema.json \
            --tgt schemas/schema.json \
            --level backward-compatible

The schema check command accepts a --level flag: fully-compatible (fail unless both directions work), backward-compatible (fail unless at least the forward migration exists; this is the default), or breaking (never fail, only report, useful for auditing).
For local development, the same commands work:
# show the diff and classification
schema diff old-schema.json new-schema.json
# check against a level (exit 0 = pass, 1 = fail)
schema check --src old-schema.json --tgt new-schema.json --level backward-compatible
# JSON output for programmatic consumption
schema diff old-schema.json new-schema.json --format json

For teams that use a schema registry, panproto can check against the latest registered version:
schema check \
--src "registry://app.bsky.feed.defs#threadViewPost@latest" \
  --tgt schemas/schema.json

Your CI uses --level backward-compatible, which allows changes where the forward migration exists but the backward does not. Under what deployment ordering constraint is this safe? What goes wrong if producers deploy first?
Consumers must deploy the new schema before producers stop sending old-format data. “Backward compatible” means old data can be read by new code, but new data may be unreadable by old code. If a producer deploys first and starts emitting new-format data, old consumers will fail.
8.7 Versioning strategies
The compatibility classification informs but does not dictate versioning. Different protocols adopt different conventions.
Semantic versioning maps compatibility levels to semver: fully compatible → patch or minor, backward compatible → minor, breaking → major. This is the most common approach for protocols with external consumers.
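The semver mapping is mechanical once the classification is known. A sketch (the function name is illustrative; whether fully compatible changes bump patch or minor is a team convention):

```typescript
// Map a compatibility level to a semver bump, per the convention above.
type Level = 'fully-compatible' | 'backward-compatible' | 'breaking';
type Bump = 'patch' | 'minor' | 'major';

function semverBump(level: Level): Bump {
  switch (level) {
    case 'fully-compatible':
      return 'minor'; // or 'patch', depending on team convention
    case 'backward-compatible':
      return 'minor';
    case 'breaking':
      return 'major';
  }
}
```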
Chronological versioning, as used by ATProto’s Lexicon, uses date-based or sequential version numbers without semver semantics. The compatibility level is still useful for deployment decisions.
Continuous evolution avoids explicit versioning entirely, relying on the lens mechanism to keep old and new consumers working simultaneously. Every change ships with its migration, and the compatibility level determines the deployment order: fully compatible changes deploy freely, backward compatible changes require consumers first, and breaking changes require a data migration followed by consumers followed by producers.
8.8 Further reading
The graph-theoretic foundation for schema compatibility draws on Ehrig et al. (2006), particularly the treatment of pushout complements and the gluing condition for typed attributed graph grammars. For practical context on how compatibility levels are used in event-streaming architectures, the Confluent Schema Registry documentation is useful. panproto’s classification is more fine-grained (because it reasons about structural migrations rather than syntactic patterns), but the deployment strategies are similar.
With migration, lenses, existence checking, and CI integration covered, we have the complete workflow for evolving schemas safely. The next part of the tutorial extends the toolkit: data lifting, custom protocols, schema version control, and cross-protocol translation.
¹ The proof that this classification is decidable relies on properties of adhesive categories (Lack and Sobociński 2005). See Appendix A.