11 panproto-check: Breaking Change Detection

The breaking change pipeline chains three functions: \(\mathrm{diff}\) computes structural deltas, \(\mathrm{classify}\) assigns severity based on protocol semantics, and \(\mathrm{report}\) renders human and machine-readable output. Take two schemas \(S_1\) and \(S_2\) and a protocol \(P\). The pipeline tells you whether \(S_2\) is backward-compatible with \(S_1\) under \(P\)’s rules.

flowchart LR
    O[Old Schema] --> D[diff]
    N[New Schema] --> D
    D --> SD[SchemaDiff]
    SD --> CL[classify]
    P[Protocol] --> CL
    CL --> CR[CompatReport]
    CR --> RT[report_text]
    CR --> RJ[report_json]
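The composition above can be sketched end to end. This is a self-contained miniature with stub types, not the real panproto-check API: Schema is reduced to a bare vertex set, and classify applies only the vertex-removal rule.

```rust
use std::collections::BTreeSet;

// Stub types standing in for the crate's Schema, SchemaDiff, Protocol,
// and CompatReport. Names match the text; the bodies are simplified.
struct Schema { vertices: BTreeSet<String> }
struct SchemaDiff { added: Vec<String>, removed: Vec<String> }
struct Protocol;
struct CompatReport { breaking: Vec<String>, non_breaking: Vec<String> }

impl CompatReport {
    fn compatible(&self) -> bool { self.breaking.is_empty() }
}

/// Stage 1: structural delta (protocol-agnostic).
fn diff(old: &Schema, new: &Schema) -> SchemaDiff {
    SchemaDiff {
        added: new.vertices.difference(&old.vertices).cloned().collect(),
        removed: old.vertices.difference(&new.vertices).cloned().collect(),
    }
}

/// Stage 2: severity assignment (protocol-aware; here only the
/// "vertex removal is always breaking" rule is modeled).
fn classify(d: &SchemaDiff, _p: &Protocol) -> CompatReport {
    CompatReport {
        breaking: d.removed.iter().map(|v| format!("removed vertex {v}")).collect(),
        non_breaking: d.added.iter().map(|v| format!("added vertex {v}")).collect(),
    }
}

/// Stage 3: rendering (verdict only in this sketch).
fn report_text(r: &CompatReport) -> String {
    if r.compatible() { "COMPATIBLE".into() } else { "INCOMPATIBLE".into() }
}
```

Removing a vertex flows through all three stages and flips the verdict, which is the behavior the rest of this chapter elaborates.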

11.1 Structural diffing

11.1.1 What diff produces

The SchemaDiff struct records every structural change between two schema revisions. It covers every schema facet: vertices, edges, constraints, hyper-edges, required edges, NSIDs, variants, orderings, recursion points, usage modes, spans, and nominal flags.

/// A structural diff between two schemas.
///
/// Each field captures a specific category of change between the old
/// and new schema revisions.
#[derive(Clone, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct SchemaDiff {
    // --- Vertices ---
    /// Vertex IDs present in the new schema but absent from the old.
    pub added_vertices: Vec<String>,
    /// Vertex IDs present in the old schema but absent from the new.
    pub removed_vertices: Vec<String>,
    /// Vertices whose `kind` changed between old and new.
    pub kind_changes: Vec<KindChange>,

    // --- Edges ---
    /// Edges present in the new schema but absent from the old.
    pub added_edges: Vec<Edge>,
    /// Edges present in the old schema but absent from the new.
    pub removed_edges: Vec<Edge>,

    // --- Constraints ---
    /// Constraints that changed between old and new, keyed by vertex ID.
    pub modified_constraints: HashMap<String, ConstraintDiff>,

    // --- Hyper-edges ---
    /// Hyper-edge IDs added in the new schema.
    pub added_hyper_edges: Vec<String>,
    /// Hyper-edge IDs removed from the old schema.
    pub removed_hyper_edges: Vec<String>,
    /// Hyper-edges whose kind, signature, or parent label changed.
    pub modified_hyper_edges: Vec<HyperEdgeChange>,

    // --- Required edges ---
    /// Per-vertex: required edges added in the new schema.
    pub added_required: HashMap<String, Vec<Edge>>,
    /// Per-vertex: required edges removed from the old schema.
    pub removed_required: HashMap<String, Vec<Edge>>,

    // --- NSIDs ---
    /// Vertex-to-NSID mappings added in the new schema.
    pub added_nsids: HashMap<String, String>,
    /// Vertex IDs whose NSID mapping was removed.
    pub removed_nsids: Vec<String>,
    /// NSID mappings that changed: `(vertex_id, old_nsid, new_nsid)`.
    pub changed_nsids: Vec<(String, String, String)>,

    // --- Variants ---
    /// Variants added in the new schema.
    pub added_variants: Vec<Variant>,
    /// Variants removed from the old schema.
    pub removed_variants: Vec<Variant>,
    /// Variants whose tag changed (same ID, different tag).
    pub modified_variants: Vec<VariantChange>,

    // --- Orderings ---
    /// Edge ordering changes: `(edge, old_position, new_position)`.
    pub order_changes: Vec<(Edge, Option<u32>, Option<u32>)>,

    // --- Recursion points ---
    /// Recursion points added in the new schema.
    pub added_recursion_points: Vec<RecursionPoint>,
    /// Recursion points removed from the old schema.
    pub removed_recursion_points: Vec<RecursionPoint>,
    /// Recursion points whose target vertex changed.
    pub modified_recursion_points: Vec<RecursionPointChange>,

    // --- Usage modes ---
    /// Usage mode changes: `(edge, old_mode, new_mode)`.
    pub usage_mode_changes: Vec<(Edge, UsageMode, UsageMode)>,

    // --- Spans ---
    /// Span IDs added in the new schema.
    pub added_spans: Vec<String>,
    /// Span IDs removed from the old schema.
    pub removed_spans: Vec<String>,
    /// Spans whose left or right vertex changed.
    pub modified_spans: Vec<SpanChange>,

    // --- Nominal ---
    /// Nominal flag changes: `(vertex_id, old_value, new_value)`.
    pub nominal_changes: Vec<(String, bool, bool)>,

    // ... constraint-structure fields (coercions, mergers, defaults,
    // policies) continue in 11.1.2 below ...

Each category uses simple collections of added and removed elements. The diff doesn’t assign severity; that’s classify’s job. Think of it as a structural delta independent of any protocol.

11.1.2 Constraint and kind changes

The remaining SchemaDiff fields cover the schema's constraint structures (coercions, mergers, defaults, and policies):

    /// Coercion keys `(source_kind, target_kind)` added in the new schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub added_coercions: Vec<(String, String)>,
    /// Coercion keys `(source_kind, target_kind)` removed from the old schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub removed_coercions: Vec<(String, String)>,
    /// Coercion keys `(source_kind, target_kind)` whose expression changed.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub modified_coercions: Vec<(String, String)>,

    /// Merger keys (vertex ID) added in the new schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub added_mergers: Vec<String>,
    /// Merger keys (vertex ID) removed from the old schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub removed_mergers: Vec<String>,
    /// Merger keys (vertex ID) whose expression changed.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub modified_mergers: Vec<String>,

    /// Default keys (vertex ID) added in the new schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub added_defaults: Vec<String>,
    /// Default keys (vertex ID) removed from the old schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub removed_defaults: Vec<String>,
    /// Default keys (vertex ID) whose expression changed.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub modified_defaults: Vec<String>,

    /// Policy keys (sort name) added in the new schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub added_policies: Vec<String>,
    /// Policy keys (sort name) removed from the old schema.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub removed_policies: Vec<String>,
    /// Policy keys (sort name) whose expression changed.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub modified_policies: Vec<String>,
}

Fine-grained tracking of constraint mutations comes through ConstraintDiff, ConstraintChange, and KindChange:

/// Describes how constraints on a single vertex changed.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct ConstraintDiff {
    /// Constraints added in the new schema.
    pub added: Vec<Constraint>,
    /// Constraints removed from the old schema.
    pub removed: Vec<Constraint>,
    /// Constraints whose value changed.
    pub changed: Vec<ConstraintChange>,
}

/// A single constraint that changed its value.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct ConstraintChange {
    /// The constraint sort (e.g., `"maxLength"`).
    pub sort: String,
    /// The value in the old schema.
    pub old_value: String,
    /// The value in the new schema.
    pub new_value: String,
}

/// Records a vertex whose kind changed between schema versions.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct KindChange {
    /// The vertex ID.
    pub vertex_id: String,
    /// The kind in the old schema.
    pub old_kind: String,
    /// The kind in the new schema.
    pub new_kind: String,
}

For each vertex, ConstraintDiff tracks three categories: constraints added (new but not old), removed (old but not new), and changed (both present but different). The ConstraintChange struct captures the sort, old value, and new value. This detail lets the classifier determine the direction of change: tightening or relaxing.

KindChange records vertices whose kind field shifted (e.g., "string" to "integer"). A kind change always signals incompatibility because existing data may not conform to the new kind.

11.1.3 Extended change types

The diff also tracks fine-grained changes to hyper-edges, variants, recursion points, and spans:

/// Records changes to a hyper-edge's kind, signature, or parent label.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct HyperEdgeChange {
    /// The hyper-edge ID.
    pub id: String,
    /// Kind change: `(old_kind, new_kind)`, or `None` if unchanged.
    pub kind_change: Option<(String, String)>,
    /// Signature labels added: label → `vertex_id`.
    pub signature_added: HashMap<String, String>,
    /// Signature labels removed: label → `vertex_id`.
    pub signature_removed: HashMap<String, String>,
    /// Signature labels whose vertex changed: label → (`old_vid`, `new_vid`).
    pub signature_changed: HashMap<String, (String, String)>,
    /// Parent label change: `(old, new)`, or `None` if unchanged.
    pub parent_label_change: Option<(String, String)>,
}

/// Records a variant whose tag changed between schema versions.
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct VariantChange {
    /// The variant ID.
    pub id: String,
    /// The tag in the old schema.
    pub old_tag: String,
    /// The tag in the new schema.
    pub new_tag: String,
}

11.1.4 The diff function

The diff function compares all schema fields using set operations. Its output is deterministic: added and removed lists are sorted, so reports compare consistently in tests and CI.
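The sorted-output guarantee can be sketched with a hypothetical per-field helper (stable_delta is illustrative, not a crate function): hash-set differences come out in arbitrary order, so an explicit sort makes the result independent of input order.

```rust
use std::collections::HashSet;

/// Set comparison with an explicit sort, so the delta is stable
/// regardless of the order elements appear in either schema.
fn stable_delta(old: &[&str], new: &[&str]) -> (Vec<String>, Vec<String>) {
    let o: HashSet<&str> = old.iter().copied().collect();
    let n: HashSet<&str> = new.iter().copied().collect();
    let mut added: Vec<String> = n.difference(&o).map(|s| s.to_string()).collect();
    let mut removed: Vec<String> = o.difference(&n).map(|s| s.to_string()).collect();
    added.sort();
    removed.sort();
    (added, removed)
}
```

Note how a rename surfaces here: diffing `["name"]` against `["full_name"]` yields one removal plus one addition, which is exactly the limitation discussed in 11.1.5 below.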

Note

The diff operates on panproto’s graph-level schema representation, not on protocol-specific formats. A SchemaDiff between two JSON Schema versions and a SchemaDiff between two Protobuf versions have the same type and structure. Protocol-specific semantics only matter during classification.

11.1.5 What diff does not track

The diff is purely structural. It doesn’t determine breaking vs. non-breaking (that’s classify’s responsibility). It can’t detect renames—a rename appears as a removal plus addition. It doesn’t identify semantic equivalences—two structurally different but semantically equivalent schemas will produce a non-empty diff.

For rename detection, use the migration engine (Chapter 9) to compute an explicit mapping, then classify the migration’s implied diff.

11.2 Protocol-aware classification

Classification is where protocol semantics enter. The classify function takes a SchemaDiff and a Protocol and determines each change’s severity based on the protocol’s edge rules and constraint sorts.

11.2.1 Breaking change rules

Change                 When breaking                                    Rationale
Vertex removal         Always                                           Consumers depending on it will fail
Edge removal           When edge kind matches a protocol edge rule      Protocol-governed edges are structural contracts
Kind change            Always                                           Existing data may not conform
Constraint tightening  When protocol recognizes the sort                Existing data may violate tighter constraints
Constraint addition    When protocol recognizes the sort                Existing data may violate new constraints

Vertex removal is unconditionally breaking because any consumer (reading a field, following an edge, anchoring a node) will fail.

Edge removal depends on the protocol. The protocol’s edge_rules declare which edge kinds matter structurally. JSON Schema protocols govern "prop" edges; SQL protocols govern "fk" edges. Removing a governed edge breaks the contract. Removing an ungoverned edge (metadata, annotations) does not.

Kind changes are unconditionally breaking. Changing a vertex from "string" to "integer" means existing string data can’t be interpreted as integers without explicit coercion.
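The edge-removal rule can be sketched as a predicate over the protocol's governed edge kinds. The governed_kinds field here is an assumed stand-in for the protocol's edge_rules, not the real representation:

```rust
use std::collections::HashSet;

/// Stand-in for a protocol's edge rules: the set of edge kinds the
/// protocol treats as structural contracts.
struct Protocol { governed_kinds: HashSet<String> }

struct Edge { kind: String }

/// An edge removal breaks compatibility only when the edge's kind is
/// governed by the protocol; ungoverned kinds are metadata.
fn edge_removal_is_breaking(p: &Protocol, e: &Edge) -> bool {
    p.governed_kinds.contains(&e.kind)
}
```

Under a SQL-style protocol that governs "fk", removing a foreign-key edge is breaking while removing an annotation edge is not.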

11.2.2 Non-breaking changes

Change                 Rationale
Vertex addition        New consumers use it; existing consumers ignore it
Edge addition          Additive; existing consumers unaffected
Constraint relaxation  Existing data remains valid under looser constraints
Constraint removal     Equivalent to maximally relaxing

Vertex and edge additions are the workhorses of backward-compatible evolution. Adding an optional field to JSON Schema, a new column to SQL, or a new message to Protobuf is non-breaking.

Constraint relaxation preserves existing data. Raising maxLength from 300 to 3000 doesn’t invalidate any existing strings.

Caution: Edge Removal Without Protocol Governance

If the protocol has no EdgeRule for a removed edge’s kind, the removal is classified as non-breaking. Ungoverned edges are metadata or annotations, not structural contracts. To make all edge removals breaking, add a catch-all edge rule to the protocol.

11.2.3 Constraint direction

Numeric constraints are analyzed for direction automatically:

  • Upper bounds (maxLength, maxSize, maximum, maxGraphemes): a smaller new value is tightening (breaking); larger is relaxing (non-breaking).
  • Lower bounds (minLength, minimum): a larger new value is tightening (breaking); smaller is relaxing (non-breaking).
  • Other constraints: any value change is treated as tightening (breaking) unless the protocol provides direction metadata.
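A minimal sketch of these direction rules, assuming numeric constraint values; the real is_constraint_tightened in classify.rs may differ in detail:

```rust
/// Whether a numeric constraint change tightens (breaks) or relaxes.
/// Sort names follow the bullets above; unrecognized sorts are treated
/// conservatively: any change counts as tightening.
fn is_tightened(sort: &str, old: f64, new: f64) -> bool {
    const UPPER: [&str; 4] = ["maxLength", "maxSize", "maximum", "maxGraphemes"];
    const LOWER: [&str; 2] = ["minLength", "minimum"];
    if UPPER.contains(&sort) {
        new < old // smaller upper bound admits less data
    } else if LOWER.contains(&sort) {
        new > old // larger lower bound admits less data
    } else {
        old != new // unknown direction: any change is conservative-breaking
    }
}
```

Raising maxLength from 300 to 3000 relaxes; lowering it tightens; raising minLength tightens.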
Tip

For custom constraint sorts with known directionality, extend the is_constraint_tightened function in classify.rs. Unrecognized sorts default to “any change is breaking”—the conservative choice.

11.2.4 Compatibility levels

The CompatReport assigns one of three implicit levels:

  • Fully compatible: both breaking and non_breaking lists are empty. The schemas are structurally identical.
  • Backward compatible: breaking is empty but non_breaking is not. Existing consumers are safe; new ones can use additions. The compatible field is true.
  • Breaking: the breaking list is non-empty. At least one change is incompatible. The compatible field is false.
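The three levels follow mechanically from the two report lists. CompatLevel here is an illustrative enum; the real CompatReport exposes only the compatible flag and the lists themselves:

```rust
#[derive(Debug, PartialEq)]
enum CompatLevel { FullyCompatible, BackwardCompatible, Breaking }

/// Derive the implicit level from the report's two change lists.
fn level(breaking: &[String], non_breaking: &[String]) -> CompatLevel {
    match (breaking.is_empty(), non_breaking.is_empty()) {
        (true, true) => CompatLevel::FullyCompatible,
        (true, false) => CompatLevel::BackwardCompatible,
        (false, _) => CompatLevel::Breaking,
    }
}
```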
Important

Classification is protocol-dependent. The same SchemaDiff can produce different CompatReport results for different protocols. Removing an edge with kind "fk" is breaking under the SQL protocol but might be non-breaking under a protocol that doesn’t govern that edge kind. Always pass the correct protocol to classify.

11.3 Reporting

The report module produces output suitable for humans or machines.

11.3.1 Text reports

report_text produces plain-text output for terminals:

  1. A compatibility verdict: "COMPATIBLE: No breaking changes detected." or "INCOMPATIBLE: Breaking changes detected."
  2. A numbered list of breaking changes (if any), with count header.
  3. A numbered list of non-breaking changes (if any), with count header.
  4. If both lists are empty: "No changes detected."

This is what schema check displays by default.
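The four-part layout can be sketched as follows. The verdict strings match the ones quoted above; the list formatting is an assumption about the real renderer:

```rust
/// Sketch of the plain-text report layout described above.
fn report_text(breaking: &[String], non_breaking: &[String]) -> String {
    let mut out = String::new();
    // 1. Verdict line.
    if breaking.is_empty() {
        out.push_str("COMPATIBLE: No breaking changes detected.\n");
    } else {
        out.push_str("INCOMPATIBLE: Breaking changes detected.\n");
    }
    // 2. Numbered breaking changes with count header.
    if !breaking.is_empty() {
        out.push_str(&format!("\nBreaking changes ({}):\n", breaking.len()));
        for (i, c) in breaking.iter().enumerate() {
            out.push_str(&format!("  {}. {}\n", i + 1, c));
        }
    }
    // 3. Numbered non-breaking changes with count header.
    if !non_breaking.is_empty() {
        out.push_str(&format!("\nNon-breaking changes ({}):\n", non_breaking.len()));
        for (i, c) in non_breaking.iter().enumerate() {
            out.push_str(&format!("  {}. {}\n", i + 1, c));
        }
    }
    // 4. Empty-diff message.
    if breaking.is_empty() && non_breaking.is_empty() {
        out.push_str("No changes detected.\n");
    }
    out
}
```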

11.3.2 JSON reports

report_json produces a serde_json::Value with three fields:

  • compatible (bool): whether the migration is backward-compatible.
  • breaking (array): each entry describes a breaking change with type and details.
  • non_breaking (array): each entry describes a non-breaking change with type and details.

This output is consumed by CI integrations (GitHub Actions annotations), the TypeScript SDK, and any tooling that needs machine-readable verdicts.
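An illustrative report with the three top-level fields. The compatible/breaking/non_breaking names come from the list above; the shape of the individual entries (type, vertex_id) is an assumption for illustration:

```json
{
  "compatible": false,
  "breaking": [
    { "type": "removed_vertex", "vertex_id": "user.email" }
  ],
  "non_breaking": [
    { "type": "added_vertex", "vertex_id": "user.handle" }
  ]
}
```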

11.3.3 Choosing a report format

Use report_text for developer-facing output (CLI, PR comments). Use report_json for machine-facing output (CI gates, automated migration generators, monitoring).

Caution: Integrating JSON Reports with GitHub Actions

Parse the JSON and create inline annotations using the GitHub Checks API. The breaking array entries contain vertex IDs and edge descriptions that map back to source locations if metadata includes line numbers. See Chapter 17 for a worked example.
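Before reaching for the Checks API, the compatible flag alone can gate a job. A sketch with jq; the report file is inlined here for illustration, where in CI it would come from the check command's JSON output:

```shell
# Sample report (in CI this would be the saved JSON output of the check).
cat > report.json <<'EOF'
{"compatible": false, "breaking": [{"type": "removed_vertex"}], "non_breaking": []}
EOF

# jq -e exits non-zero when the expression is false or null, which
# fails the CI step exactly when the report is incompatible.
if jq -e '.compatible' report.json > /dev/null; then
  echo "schema change is backward compatible"
else
  echo "breaking changes: $(jq '.breaking | length' report.json)"
fi
```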

11.4 Relationship to migrations and lenses

Breaking change detection complements the migration and lens systems:

  • Before migration: run diff + classify to assess safety. If the report says “breaking,” provide a migration with resolvers, supply defaults, or use a lens with complement tracking.
  • With lenses: the RemoveField combinator in panproto-lens produces a non-empty complement. The check crate confirms the corresponding schema change is indeed breaking, validating the lens is necessary and correctly scoped.
  • In CI: the schema check command runs this pipeline and exits with non-zero status on breaking changes, enabling gatekeeping. PRs introducing breaking changes without accompanying migrations can be flagged automatically.
  • In documentation: the text report can be included in changelog entries or release notes to communicate schema changes to downstream consumers.

11.5 Integration with CI

The schema check CLI command wraps this pipeline:

schema check --old schema-v1.json --new schema-v2.json --protocol atproto

Exit codes:

  • 0: fully compatible or backward compatible.
  • 1: breaking changes detected.
  • 2: error (malformed schema, unknown protocol, etc.).

11.6 Performance

The entire pipeline is designed for interactive use:

  • diff is \(O(|V_{\mathrm{old}}| + |V_{\mathrm{new}}| + |E_{\mathrm{old}}| + |E_{\mathrm{new}}| + |C_{\mathrm{old}}| + |C_{\mathrm{new}}|)\) where \(V\), \(E\), and \(C\) are vertex, edge, and constraint counts. For typical schemas with hundreds of vertices, this completes in microseconds.
  • classify is \(O(|\mathrm{diff}|)\), iterating once over each change category.
  • report_text and report_json are \(O(|\mathrm{report}|)\).

The pipeline runs on every commit in CI without measurable impact.

11.7 Error handling

The panproto-check crate defines a CheckError type for failures that prevent analysis:

  • Schema deserialization failures (malformed input).
  • Protocol lookup failures (unknown protocol name).

A CompatReport with breaking changes is not an error; it’s a successful analysis. Errors cover only cases where the analysis itself can’t proceed.
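The error surface is small enough to sketch directly. The variant names below are assumptions mirroring the two bullets, not the crate's actual definitions:

```rust
use std::fmt;

/// Sketch of the failures that prevent analysis (variant names assumed).
#[derive(Debug)]
enum CheckError {
    /// The schema input could not be deserialized.
    Deserialize(String),
    /// No protocol is registered under the given name.
    UnknownProtocol(String),
}

impl fmt::Display for CheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CheckError::Deserialize(msg) => write!(f, "malformed schema: {msg}"),
            CheckError::UnknownProtocol(name) => write!(f, "unknown protocol: {name}"),
        }
    }
}

impl std::error::Error for CheckError {}
```

A breaking CompatReport never travels through this type; it is an Ok value, keeping "incompatible" distinct from "could not analyze".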

11.8 Module map

Module    Exports                                                         Purpose
diff      SchemaDiff, ConstraintDiff, ConstraintChange, KindChange, diff  Protocol-agnostic structural diffing
classify  CompatReport, BreakingChange, NonBreakingChange, classify       Protocol-aware severity classification
report    report_text, report_json                                        Human- and machine-readable output
error     CheckError                                                      Error types

All public types are re-exported from the crate root.

11.9 Testing strategy

The test suite validates each stage:

  • Diff tests verify that specific structural changes (vertex addition/removal, edge changes, constraint modification, kind change) produce expected SchemaDiff entries. An important invariant: diffing identical schemas produces an empty diff.
  • Classification tests verify that breaking and non-breaking changes are correctly assigned based on protocol edge rules and constraint sorts. Edge cases include removing an edge whose kind isn’t governed (should be non-breaking) and tightening a constraint whose sort the protocol doesn’t recognize (should be ignored).
  • Report tests verify that text and JSON formats are correctly structured and contain expected information.