25  Edit Lenses: Incremental Migration

Batch migration works well when you move entire data sets from one schema version to another. But suppose two systems share a synchronized data set, each operating under a different schema version. When one system modifies a single record (inserts a field, updates a value), you don’t want to re-migrate the entire data set. You want to translate the individual edit itself. Edit lenses solve this by operating on changes (patches) rather than whole states: they translate individual edits between schema versions and update their own state incrementally.

25.1 The problem: translating patches

You maintain a Bluesky issue tracker where issues are assigned to a single person:

// schema v1: single assignee
pub struct Issue {
    pub title: String,
    pub assignee: String,
}

A new schema version replaces assignee with a list of assignees:

// schema v2: multiple assignees
pub struct Issue {
    pub title: String,
    pub assignees: Vec<String>,
}

A batch lens can convert an entire v1 Issue to v2 and back. But what happens when a collaborator on v1 sends you a patch: “set assignee to alice”? You need to translate that into “append alice to assignees” in v2. The batch lens has no concept of this; it only sees full records.
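To make the gap concrete, here is a minimal sketch of patch translation for the two Issue schemas. The V1Patch and V2Patch enums and the translate_patch function are hypothetical names invented for this illustration; they are not part of panproto, which expresses the same idea through TreeEdit and EditLens in the sections that follow.

```rust
// Hypothetical patch types for the two Issue schemas (not panproto's API).
#[derive(Debug, Clone, PartialEq)]
pub enum V1Patch {
    SetTitle(String),
    SetAssignee(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum V2Patch {
    SetTitle(String),
    AppendAssignee(String),
}

/// Translate a single v1 patch into the corresponding v2 patch
/// without reading or rewriting any full Issue record.
pub fn translate_patch(patch: &V1Patch) -> V2Patch {
    match patch {
        // Fields shared by both versions pass through unchanged.
        V1Patch::SetTitle(title) => V2Patch::SetTitle(title.clone()),
        // The single-assignee update becomes a list append in v2.
        V1Patch::SetAssignee(name) => V2Patch::AppendAssignee(name.clone()),
    }
}
```

The function only ever looks at the patch, which is exactly what a batch lens cannot do. A hand-written translator like this does not scale across schema versions, though, which is the motivation for deriving the translation from a lens.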

25.2 The TreeEdit monoid

An edit is an element of a monoid: a set with an associative binary operation (composition) and an identity element (the no-op edit). panproto defines TreeEdit for tree-shaped instances and TableEdit for relational instances.

TreeEdit has the following variants:

| Variant | Effect |
|---|---|
| Identity | No change |
| InsertNode { parent, child_id, node, edge } | Add a child node under a parent |
| DeleteNode { id } | Remove a leaf node |
| ContractNode { id } | Remove a node, reattaching its children to its parent |
| RelabelNode { id, new_anchor } | Change a node’s schema anchor |
| SetField { node_id, field, value } | Set or update a field value |
| RemoveField { node_id, field } | Remove a field from a node |
| MoveSubtree { node_id, new_parent, edge } | Reparent a subtree |
| InsertFan { fan } / DeleteFan { hyper_edge_id } | Manipulate hyper-edge fans |
| JoinFeatures { primary, joined, produce } | Merge co-located annotations |
| Sequence(Vec<TreeEdit>) | A sequence of edits |

The monoid laws hold:

\[\mathrm{apply}(\mathrm{identity}, s) = s\]

\[\mathrm{apply}(\mathrm{compose}(e_1, e_2), s) = \mathrm{apply}(e_2, \mathrm{apply}(e_1, s))\]

Composition is associative. Nested sequences are flattened and identity elements are elided.
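The laws and the normalization rule can be sketched on a much smaller edit type. The Edit enum below is a hypothetical reduction of TreeEdit to three variants plus Sequence, acting on a single node's fields; the names State, flatten_into, compose, and apply are assumptions made for this illustration and are not panproto's signatures.

```rust
use std::collections::BTreeMap;

/// Toy state: one node's fields (field name -> value).
pub type State = BTreeMap<String, String>;

/// A reduced, hypothetical edit type (not panproto's TreeEdit).
#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    Identity,
    SetField { field: String, value: String },
    RemoveField { field: String },
    Sequence(Vec<Edit>),
}

/// Append `e` to `out`, flattening nested sequences and dropping identities.
fn flatten_into(e: Edit, out: &mut Vec<Edit>) {
    match e {
        Edit::Identity => {}
        Edit::Sequence(edits) => {
            for inner in edits {
                flatten_into(inner, out);
            }
        }
        other => out.push(other),
    }
}

/// Monoid operation: "e1, then e2", returned in flattened normal form.
pub fn compose(e1: Edit, e2: Edit) -> Edit {
    let mut out = Vec::new();
    flatten_into(e1, &mut out);
    flatten_into(e2, &mut out);
    match out.len() {
        0 => Edit::Identity,
        1 => out.remove(0),
        _ => Edit::Sequence(out),
    }
}

/// Apply an edit to a state, returning the new state.
pub fn apply(e: &Edit, mut s: State) -> State {
    match e {
        Edit::Identity => {}
        Edit::SetField { field, value } => {
            s.insert(field.clone(), value.clone());
        }
        Edit::RemoveField { field } => {
            s.remove(field);
        }
        Edit::Sequence(edits) => {
            for inner in edits {
                s = apply(inner, s);
            }
        }
    }
    s
}
```

Because compose always returns the flattened form, both groupings of three edits produce the same Sequence value, so associativity holds as plain structural equality in this sketch.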

25.2.1 Concrete example

Returning to the issue tracker from Section 25.1, the patch “set assignee to alice” is:

TreeEdit::SetField {
    node_id: 1,
    field: Name::from("assignee"),
    value: Value::Str("alice".into()),
}

The TableEdit monoid provides the relational counterpart: InsertRow, DeleteRow, and UpdateCell.

25.3 EditLens: wrapping a state-based lens

An EditLens wraps a state-based Lens (the batch lens from Chapter 7) and adds two operations:

  • get_edit(edit) -> TreeEdit: translate a source edit to a view edit
  • put_edit(edit) -> TreeEdit: translate a view edit back to a source edit

Construction starts with EditLens::from_lens(lens, protocol), which builds reverse remap tables from the compiled migration’s vertex and edge mappings. Before translating edits, you call initialize(source) to perform a whole-state get that populates the initial complement.

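// Derive a batch lens between the two schema versions.
let lens = auto_generate(&old_schema, &new_schema, &protocol, &config)?;
// Wrap it as an edit lens, then run a whole-state get to seed the complement.
let mut edit_lens = EditLens::from_lens(lens.lens, protocol);
edit_lens.initialize(&source_instance)?;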

After initialization, each call to get_edit or put_edit translates one edit and incrementally updates the complement.

25.4 The complement as a state machine

In batch lenses (Section 7.2), the complement is a snapshot: it captures what get discarded, and put consumes it. In edit lenses, the complement is a state machine. Each edit transition updates it incrementally:

  • When get_edit absorbs a non-surviving edit (the edit targets a node that does not exist in the view schema), the edit’s data is added to the complement.
  • When get_edit encounters a DeleteNode for a node already in the complement, the node is removed from the complement.
  • When put_edit receives an insert from the view, the reverse remap is applied and any complement data for that node is consumed.

The complement state after processing a sequence of edits is the same complement you would get from applying all edits to the source and then running a whole-state get. This is the complement coherence law (Section 25.8.2).
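The transitions above can be sketched with a toy projection lens over a flat field map. Everything here (ToyEditLens, FieldEdit, and the apply helper) is hypothetical and far simpler than panproto's EditLens, which works on trees and schema anchors; the sketch only shows how a complement can be updated per edit and still match the whole-state result.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Fields = BTreeMap<String, String>;

/// Hypothetical field-level edit (a stand-in for TreeEdit).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldEdit {
    Identity,
    Set { field: String, value: String },
    Remove { field: String },
}

/// Toy projection lens: the view keeps only the fields in `surviving`.
pub struct ToyEditLens {
    surviving: BTreeSet<String>,
    /// Data the view cannot represent, updated one edit at a time.
    pub complement: Fields,
}

impl ToyEditLens {
    pub fn new(surviving: &[&str]) -> Self {
        ToyEditLens {
            surviving: surviving.iter().map(|f| f.to_string()).collect(),
            complement: Fields::new(),
        }
    }

    /// Whole-state get: split the source into (view, complement).
    pub fn get(&self, source: &Fields) -> (Fields, Fields) {
        let mut view = Fields::new();
        let mut complement = Fields::new();
        for (k, v) in source {
            if self.surviving.contains(k) {
                view.insert(k.clone(), v.clone());
            } else {
                complement.insert(k.clone(), v.clone());
            }
        }
        (view, complement)
    }

    /// Populate the initial complement from a full source state.
    pub fn initialize(&mut self, source: &Fields) {
        self.complement = self.get(source).1;
    }

    /// Translate one source edit, touching only the field it names.
    pub fn get_edit(&mut self, edit: FieldEdit) -> FieldEdit {
        match edit {
            FieldEdit::Identity => FieldEdit::Identity,
            FieldEdit::Set { field, value } => {
                if self.surviving.contains(&field) {
                    FieldEdit::Set { field, value }
                } else {
                    // Non-surviving target: absorb the data into the complement.
                    self.complement.insert(field, value);
                    FieldEdit::Identity
                }
            }
            FieldEdit::Remove { field } => {
                if self.surviving.contains(&field) {
                    FieldEdit::Remove { field }
                } else {
                    // Deleting absorbed data removes it from the complement.
                    self.complement.remove(&field);
                    FieldEdit::Identity
                }
            }
        }
    }
}

/// Apply an edit to a field map, returning the new map.
pub fn apply(edit: &FieldEdit, mut state: Fields) -> Fields {
    match edit {
        FieldEdit::Identity => {}
        FieldEdit::Set { field, value } => {
            state.insert(field.clone(), value.clone());
        }
        FieldEdit::Remove { field } => {
            state.remove(field);
        }
    }
    state
}
```

After an edit to a dropped field, the lens's complement should equal the second component of get on the edited source. In this sketch, get_edit does a constant number of map operations per edit and never iterates over the source.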

Note: Why a state machine?

A snapshot complement requires a full get pass over the entire instance. For large data sets (millions of records) where edits arrive one at a time, rerunning get on every edit is prohibitively expensive. The state machine complement processes each edit in time proportional to the edit itself, not the instance size.

25.5 The five-step translation pipeline

Edit translation mirrors the five steps of the batch wtype_restrict pipeline, applied incrementally via EditPipeline:

  1. Anchor survival: does the edit target a node whose schema anchor survives the migration? If not, the edit is absorbed into the complement.
  2. Reachability: the ReachabilityIndex tracks which nodes can reach the root. Inserting an edge may make a subtree reachable; deleting an edge may make one unreachable.
  3. Ancestor contraction: the ContractionTracker records which non-surviving nodes have been contracted (removed with children reattached to the nearest surviving ancestor).
  4. Edge resolution: when contraction creates new arcs, the resolver determines which schema edge to assign.
  5. Fan reconstruction: fans (hyper-edge instances) with dropped participants are absorbed into the complement.

At each step, the edit may be transformed, split into multiple edits, or collapsed to Identity (absorbed).

let mut pipeline = EditPipeline::from_lens_and_instance(&edit_lens, &instance);
let mut complement = edit_lens.complement.clone();

let view_edit = pipeline.translate_get(&source_edit, &mut complement)?;

Tip

For identity lenses (where source and target schemas are the same), edits pass through all five steps unchanged. The pipeline adds no overhead beyond a survival check per edit.
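Steps 1 and 3 of the pipeline can be illustrated on a bare parent map. The sketch below assumes an acyclic tree stored as child id to parent id, with survival given as a set of node ids; the names Parents, nearest_surviving_ancestor, and contract are hypothetical and do not reflect how ContractionTracker is implemented.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy tree: child id -> parent id (the root has no entry).
pub type Parents = BTreeMap<u32, u32>;

/// Walk up from `node` to the nearest strict ancestor that survives.
/// Returns None if no ancestor survives.
pub fn nearest_surviving_ancestor(
    node: u32,
    parents: &Parents,
    surviving: &BTreeSet<u32>,
) -> Option<u32> {
    let mut current = *parents.get(&node)?;
    loop {
        if surviving.contains(&current) {
            return Some(current);
        }
        // Skip the non-surviving ancestor and keep climbing.
        current = *parents.get(&current)?;
    }
}

/// Contract every non-surviving node: each surviving node is reattached
/// to its nearest surviving ancestor, and non-surviving nodes disappear.
pub fn contract(parents: &Parents, surviving: &BTreeSet<u32>) -> Parents {
    let mut out = Parents::new();
    for &child in parents.keys() {
        if !surviving.contains(&child) {
            continue;
        }
        if let Some(ancestor) = nearest_surviving_ancestor(child, parents, surviving) {
            out.insert(child, ancestor);
        }
    }
    out
}
```

This version recomputes the whole contracted tree, which is the batch view of the step. An incremental tracker would presumably update only the arcs touched by each edit, but that bookkeeping is outside what this sketch shows.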

25.6 CLI: schema data sync --edits

The schema data sync command migrates data files to match the current schema version. With the --edits flag, it records an EditLogObject in the VCS:

schema data sync records/ --edits

The EditLogObject stores the translated edit sequence as MessagePack-encoded bytes, along with the schema ID, data set ID, edit count, and the final complement state. This enables replaying the incremental migration later or auditing which edits were applied.

For batch migration (without --edits), use:

schema data migrate records/

And for checking data staleness:

schema data status records/

25.7 EditProvenance: tracking translation lineage

Each translated edit can carry an EditProvenance record that captures:

  • source_edit_desc: a description of the original source edit
  • rules_applied: which translation rules fired during get_edit (e.g., "structural_remap", "field_text")
  • policy_consulted: which complement policy was used, if any
  • was_total: whether the translation was total (all constraints satisfied) or partial

let (view_edit, provenance) = edit_lens.get_edit_with_provenance(source_edit)?;
assert!(provenance.was_total);
assert!(provenance.rules_applied.iter().any(|r| r.as_ref() == "structural_remap"));

Provenance is serializable (via serde), so it can be stored alongside the translated edits for auditing.

25.8 Edit lens laws

Edit lenses satisfy two laws from Hofmann, Pierce, and Wagner’s edit lens framework.

25.8.1 Consistency

Translating a source edit through the lens and applying it to the view produces the same result as applying the edit to the source and then doing a whole-state get:

\[\mathrm{apply}(\mathrm{get\_edit}(e), \mathrm{get}(s)) = \mathrm{get}(\mathrm{apply}(e, s))\]

This says the incremental path (translate then apply) and the batch path (apply then get) agree.
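As a small worked instance of the law, consider a toy lens that only renames one field. RenameLens, the SetField struct, and the consistent helper are hypothetical names for this sketch, and it assumes the source never already contains a field with the target name; panproto's own checker is check_edit_consistency, described in Section 25.8.3.

```rust
use std::collections::BTreeMap;

type Fields = BTreeMap<String, String>;

/// Hypothetical single-field edit (a stand-in for TreeEdit::SetField).
#[derive(Debug, Clone, PartialEq)]
pub struct SetField {
    pub field: String,
    pub value: String,
}

/// Toy lens that renames one field between source and view.
pub struct RenameLens {
    pub from: String,
    pub to: String,
}

impl RenameLens {
    fn rename(&self, field: &str) -> String {
        if field == self.from.as_str() {
            self.to.clone()
        } else {
            field.to_string()
        }
    }

    /// Whole-state get: rename the key in every entry.
    pub fn get(&self, source: &Fields) -> Fields {
        source
            .iter()
            .map(|(k, v)| (self.rename(k), v.clone()))
            .collect()
    }

    /// Incremental get: rename the key in a single edit.
    pub fn get_edit(&self, edit: &SetField) -> SetField {
        SetField {
            field: self.rename(&edit.field),
            value: edit.value.clone(),
        }
    }
}

/// Apply a SetField edit to a field map.
pub fn apply(edit: &SetField, mut state: Fields) -> Fields {
    state.insert(edit.field.clone(), edit.value.clone());
    state
}

/// Check apply(get_edit(e), get(s)) == get(apply(e, s)).
pub fn consistent(lens: &RenameLens, edit: &SetField, source: &Fields) -> bool {
    let incremental = apply(&lens.get_edit(edit), lens.get(source));
    let batch = lens.get(&apply(edit, source.clone()));
    incremental == batch
}
```

The two sides of the equation map directly onto the two local variables in consistent: incremental translates first and applies to the view, batch applies to the source first and then runs the whole-state get.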

25.8.2 Complement coherence

The complement state after get_edit is consistent with the complement that a whole-state get would produce on the edited source:

\[c'_{\mathrm{incremental}} = c'_{\mathrm{batch}}\]

where \(c'_{\mathrm{incremental}}\) is the complement after calling get_edit(e), and \(c'_{\mathrm{batch}}\) is the complement from calling get on the source with \(e\) applied.

This says the state machine complement tracks the batch complement exactly. If it drifts, put may reconstruct incorrect data.

Exercise: When does consistency fail?

If a SetField edit targets a node that the migration conditionally drops (via a value-dependent predicate), under what circumstances could the Consistency law be violated?

If the SetField changes the value that the conditional-survival predicate evaluates, the edit might cause the node to transition from surviving to dropped (or vice versa). The incremental path must detect this transition and update the complement accordingly. If the edit lens fails to re-evaluate the predicate after the field change, it produces an incorrect view edit, violating Consistency. The implementation handles this by re-checking conditional_survival predicates on every SetField edit.
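The failure mode can be reproduced in a few lines. The sketch below assumes a single-node source and a made-up predicate that drops archived nodes; survives, naive_get_edit, and aware_get_edit are hypothetical names. For simplicity the aware translator is handed the source state directly, whereas a real edit lens would have to recover it from its own tracked state.

```rust
use std::collections::BTreeMap;

type Fields = BTreeMap<String, String>;

/// Source-side edit: set one field on the node.
#[derive(Debug, Clone, PartialEq)]
pub struct SetField {
    pub field: String,
    pub value: String,
}

/// View-side edits for a view holding at most one node.
#[derive(Debug, Clone, PartialEq)]
pub enum ViewEdit {
    Identity,
    SetField(SetField),
    InsertNode(Fields),
    DeleteNode,
}

/// Hypothetical value-dependent predicate: archived nodes are dropped.
pub fn survives(node: &Fields) -> bool {
    node.get("status").map(String::as_str) != Some("archived")
}

/// Whole-state get: the view contains the node only if it survives.
pub fn get(source: &Fields) -> Option<Fields> {
    if survives(source) {
        Some(source.clone())
    } else {
        None
    }
}

pub fn apply_source(edit: &SetField, mut source: Fields) -> Fields {
    source.insert(edit.field.clone(), edit.value.clone());
    source
}

pub fn apply_view(edit: &ViewEdit, view: Option<Fields>) -> Option<Fields> {
    match edit {
        ViewEdit::Identity => view,
        ViewEdit::SetField(e) => view.map(|node| apply_source(e, node)),
        ViewEdit::InsertNode(node) => Some(node.clone()),
        ViewEdit::DeleteNode => None,
    }
}

/// Naive translation: forwards the edit without re-checking the predicate.
pub fn naive_get_edit(edit: &SetField) -> ViewEdit {
    ViewEdit::SetField(edit.clone())
}

/// Predicate-aware translation: compares survival before and after the edit.
pub fn aware_get_edit(edit: &SetField, source: &Fields) -> ViewEdit {
    let after = apply_source(edit, source.clone());
    match (survives(source), survives(&after)) {
        (true, true) => ViewEdit::SetField(edit.clone()),
        (true, false) => ViewEdit::DeleteNode,
        (false, true) => ViewEdit::InsertNode(after),
        (false, false) => ViewEdit::Identity,
    }
}
```

Setting status to "archived" on an open node makes the batch path return an empty view. The naive translator leaves an archived node in the view, so the two paths disagree, while the aware translator emits DeleteNode and agrees with the batch result.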

25.8.3 Verification

panproto provides two functions for law checking:

use panproto_lens::edit_laws::{check_edit_consistency, check_complement_coherence};

check_edit_consistency(&mut edit_lens, &edit, &source)?;
check_complement_coherence(&mut edit_lens, &edit, &source)?;

Both compare the incremental path against the batch path and return EditLawViolation if a mismatch is detected. The violation includes a detail string describing which field or node count diverged.

25.9 Cross-references

  • For the batch lens framework (state-based get/put with complement), see Chapter 7.
  • For data versioning commands (schema data migrate, schema data status), see Chapter 21.
  • For the developer-level architecture of edit lenses, see the edit lens internals chapter.

25.10 Exercises

  1. Create two schema versions where v2 renames a field. Initialize an EditLens, send a SetField edit targeting the renamed field, and verify that get_edit produces an edit with the new field name.

  2. Create a projection lens that drops a vertex. Send an InsertNode edit for a node anchored at the dropped vertex. Verify that the edit is absorbed (returns Identity) and the node appears in the complement.

  3. Use check_edit_consistency to verify the Consistency law for a sequence of five edits against an identity lens. Then repeat with a projection lens and confirm the law still holds.

  4. Run schema data sync records/ --edits on a repository with two schema versions. Inspect the stored EditLogObject with schema show.

  5. Write a test that sends a RelabelNode edit causing a node to transition from non-surviving to surviving. Verify that the complement state machine correctly releases the node from the complement.