25 Edit Lenses: Incremental Migration
Batch migration works well when you move entire data sets from one schema version to another. But suppose two systems share a synchronized data set, each operating under a different schema version. When one system modifies a single record (inserts a field, updates a value), you don’t want to re-migrate the entire data set. You want to translate the individual edit itself. Edit lenses solve this by operating on changes (patches) rather than whole states: they translate individual edits between schema versions and update state incrementally.
25.1 The problem: translating patches
You maintain a Bluesky issue tracker where issues are assigned to a single person:
// schema v1: single assignee
pub struct Issue {
    pub title: String,
    pub assignee: String,
}
A new schema version replaces assignee with a list of assignees:
// schema v2: multiple assignees
pub struct Issue {
    pub title: String,
    pub assignees: Vec<String>,
}
A batch lens can convert an entire v1 Issue to v2 and back. But what happens when a collaborator on v1 sends you a patch: “set assignee to alice”? You need to translate that into “append alice to assignees” in v2. The batch lens has no concept of this; it only sees full records.
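To make the gap concrete, here is a minimal sketch of translating a single patch rather than a whole record. The V1Edit, V2Edit, and IssueV2 types and both functions are hypothetical and exist only for this illustration; they are not panproto's API.

```rust
// Hypothetical, simplified patch types for the two schema versions.
#[derive(Debug, Clone, PartialEq)]
pub enum V1Edit {
    SetTitle(String),
    SetAssignee(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum V2Edit {
    SetTitle(String),
    AppendAssignee(String),
}

// The v2 record shape from the text, renamed so both versions could coexist.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct IssueV2 {
    pub title: String,
    pub assignees: Vec<String>,
}

/// Translate one v1 patch into a v2 patch without reading any stored record.
pub fn translate_v1_to_v2(edit: &V1Edit) -> V2Edit {
    match edit {
        V1Edit::SetTitle(t) => V2Edit::SetTitle(t.clone()),
        // "set assignee to alice" becomes "append alice to assignees".
        V1Edit::SetAssignee(a) => V2Edit::AppendAssignee(a.clone()),
    }
}

/// Apply a v2 patch to a v2 record in place.
pub fn apply_v2(issue: &mut IssueV2, edit: &V2Edit) {
    match edit {
        V2Edit::SetTitle(t) => issue.title = t.clone(),
        V2Edit::AppendAssignee(a) => issue.assignees.push(a.clone()),
    }
}
```

The translation function never touches an IssueV2 value, which is the property a batch lens cannot offer: its cost depends on the patch, not on the data set.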
25.2 The TreeEdit monoid
An edit is an element of a monoid: a set with an associative binary operation (composition) and an identity element (the no-op edit). panproto defines TreeEdit for tree-shaped instances and TableEdit for relational instances.
TreeEdit has the following variants:
| Variant | Effect |
|---|---|
| Identity | No change |
| InsertNode { parent, child_id, node, edge } | Add a child node under a parent |
| DeleteNode { id } | Remove a leaf node |
| ContractNode { id } | Remove a node, reattaching its children to its parent |
| RelabelNode { id, new_anchor } | Change a node’s schema anchor |
| SetField { node_id, field, value } | Set or update a field value |
| RemoveField { node_id, field } | Remove a field from a node |
| MoveSubtree { node_id, new_parent, edge } | Reparent a subtree |
| InsertFan { fan } / DeleteFan { hyper_edge_id } | Manipulate hyper-edge fans |
| JoinFeatures { primary, joined, produce } | Merge co-located annotations |
| Sequence(Vec<TreeEdit>) | A sequence of edits |
The monoid laws hold:
\[\mathrm{apply}(\mathrm{identity}, s) = s\] \[\mathrm{apply}(\mathrm{compose}(e_1, e_2), s) = \mathrm{apply}(e_2, \mathrm{apply}(e_1, s))\]
Composition is associative. Nested sequences are flattened and identity elements are elided.
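The laws can be checked on a toy edit type. The sketch below is an illustration under stated assumptions: Edit is a hypothetical three-variant subset acting on a single node's fields, and compose and apply are written for this example rather than taken from panproto. It flattens nested sequences and drops identities during composition, as described above.

```rust
use std::collections::BTreeMap;

// A single node's fields; stands in for a full tree instance.
pub type Node = BTreeMap<String, String>;

// Hypothetical cut-down edit type, not panproto's TreeEdit.
#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    Identity,
    SetField { field: String, value: String },
    RemoveField { field: String },
    Sequence(Vec<Edit>),
}

// Push the primitive steps of `edit` onto `out`, skipping identities and
// unpacking nested sequences.
fn flatten_into(edit: Edit, out: &mut Vec<Edit>) {
    match edit {
        Edit::Identity => {}
        Edit::Sequence(items) => {
            for item in items {
                flatten_into(item, out);
            }
        }
        other => out.push(other),
    }
}

/// Monoid operation: run `first`, then `second`.
pub fn compose(first: Edit, second: Edit) -> Edit {
    let mut steps = Vec::new();
    flatten_into(first, &mut steps);
    flatten_into(second, &mut steps);
    match steps.len() {
        0 => Edit::Identity,
        1 => steps.pop().unwrap(),
        _ => Edit::Sequence(steps),
    }
}

/// Apply an edit to a node and return the updated node.
pub fn apply(edit: &Edit, mut node: Node) -> Node {
    match edit {
        Edit::Identity => {}
        Edit::SetField { field, value } => {
            node.insert(field.clone(), value.clone());
        }
        Edit::RemoveField { field } => {
            node.remove(field);
        }
        Edit::Sequence(items) => {
            for item in items {
                node = apply(item, node);
            }
        }
    }
    node
}
```

Because compose always produces a flat, identity-free sequence, both groupings of three edits reduce to the same value, so associativity holds structurally and not only up to the result of apply.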
25.2.1 Concrete example
Returning to the ATProto issue tracker, the patch “set assignee to alice” is:
TreeEdit::SetField {
    node_id: 1,
    field: Name::from("assignee"),
    value: Value::Str("alice".into()),
}
The TableEdit monoid provides the relational counterpart: InsertRow, DeleteRow, and UpdateCell.
25.3 EditLens: wrapping a state-based lens
An EditLens wraps a state-based Lens (the batch lens from Chapter 7) and adds two operations:
- get_edit(edit) -> TreeEdit: translate a source edit to a view edit
- put_edit(edit) -> TreeEdit: translate a view edit back to a source edit
Construction starts with EditLens::from_lens(lens, protocol), which builds reverse remap tables from the compiled migration’s vertex and edge mappings. Before translating edits, you call initialize(source) to perform a whole-state get that populates the initial complement.
let lens = auto_generate(&old_schema, &new_schema, &protocol, &config)?;
let mut edit_lens = EditLens::from_lens(lens.lens, protocol);
edit_lens.initialize(&source_instance)?;
After initialization, each call to get_edit or put_edit translates one edit and incrementally updates the complement.
25.4 The complement as a state machine
In batch lenses (Section 7.2), the complement is a snapshot: it captures what get discarded, and put consumes it. In edit lenses, the complement is a state machine. Each edit transition updates it incrementally:
- When get_edit absorbs a non-surviving edit (the edit targets a node that does not exist in the view schema), the edit’s data is added to the complement.
- When get_edit encounters a DeleteNode for a node already in the complement, the node is removed from the complement.
- When put_edit receives an insert from the view, the reverse remap is applied and any complement data for that node is consumed.
The complement state after processing a sequence of edits is the same complement you would get from applying all edits to the source and then running a whole-state get. This is the complement coherence law (Section 25.8.2).
A snapshot complement requires a full get pass over the entire instance. For large data sets (millions of records) where edits arrive one at a time, rerunning get on every edit is prohibitively expensive. The state machine complement processes each edit in time proportional to the edit itself, not the instance size.
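The get_edit transitions above can be sketched with a toy lens. Everything here is hypothetical (ToyEditLens, its Node and Edit types, and the rule that a node survives when its anchor is in a fixed set); it models only the insert and delete cases, not put_edit.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub anchor: String,
    pub data: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    Insert { id: u32, node: Node },
    Delete { id: u32 },
}

pub type Instance = BTreeMap<u32, Node>;

/// Hypothetical lens: nodes whose anchor is in `surviving` appear in the
/// view; every other node is held in the complement.
pub struct ToyEditLens {
    surviving: BTreeSet<String>,
    pub complement: Instance,
}

impl ToyEditLens {
    /// Batch get: one pass over the whole instance, O(instance size).
    pub fn get(surviving: &BTreeSet<String>, source: &Instance) -> (Instance, Instance) {
        let mut view = Instance::new();
        let mut complement = Instance::new();
        for (id, node) in source {
            if surviving.contains(&node.anchor) {
                view.insert(*id, node.clone());
            } else {
                complement.insert(*id, node.clone());
            }
        }
        (view, complement)
    }

    /// Whole-state get that populates the initial complement.
    pub fn initialize(surviving: BTreeSet<String>, source: &Instance) -> Self {
        let (_, complement) = Self::get(&surviving, source);
        Self { surviving, complement }
    }

    /// Incremental step: returns None when the edit is absorbed.
    pub fn get_edit(&mut self, edit: &Edit) -> Option<Edit> {
        match edit {
            Edit::Insert { id, node } => {
                if self.surviving.contains(&node.anchor) {
                    Some(edit.clone())
                } else {
                    // Non-surviving insert: store its data in the complement.
                    self.complement.insert(*id, node.clone());
                    None
                }
            }
            Edit::Delete { id } => {
                // Deleting a complement node releases it; nothing reaches the view.
                if self.complement.remove(id).is_some() {
                    None
                } else {
                    Some(edit.clone())
                }
            }
        }
    }
}

/// Apply an edit to an instance (source or view) in place.
pub fn apply(edit: &Edit, instance: &mut Instance) {
    match edit {
        Edit::Insert { id, node } => {
            instance.insert(*id, node.clone());
        }
        Edit::Delete { id } => {
            instance.remove(id);
        }
    }
}
```

In this sketch each get_edit call does a constant number of map operations, while get walks every node, which is the cost difference the paragraph above describes.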
25.5 The five-step translation pipeline
Edit translation mirrors the five steps of the batch wtype_restrict pipeline, applied incrementally via EditPipeline:
- Anchor survival: does the edit target a node whose schema anchor survives the migration? If not, the edit is absorbed into the complement.
- Reachability: the ReachabilityIndex tracks which nodes can reach the root. Inserting an edge may make a subtree reachable; deleting an edge may make one unreachable.
- Ancestor contraction: the ContractionTracker records which non-surviving nodes have been contracted (removed with children reattached to the nearest surviving ancestor).
- Edge resolution: when contraction creates new arcs, the resolver determines which schema edge to assign.
- Fan reconstruction: fans (hyper-edge instances) with dropped participants are absorbed into the complement.
At each step, the edit may be transformed, split into multiple edits, or collapsed to Identity (absorbed).
let mut pipeline = EditPipeline::from_lens_and_instance(&edit_lens, &instance);
let mut complement = edit_lens.complement.clone();
let view_edit = pipeline.translate_get(&source_edit, &mut complement)?;
For identity lenses (where source and target schemas are the same), edits pass through all five steps unchanged. The pipeline adds no overhead beyond a survival check per edit.
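As a rough illustration of the anchor survival and ancestor contraction steps only, the sketch below tracks parent links and reattaches a surviving insert to its nearest surviving ancestor. ToyContractionTracker and its methods are hypothetical names for this example and say nothing about how panproto's ContractionTracker is actually structured; reachability, edge resolution, and fans are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical tracker: source-side parent links plus the ids of nodes
/// whose schema anchor survives the migration.
pub struct ToyContractionTracker {
    parent: BTreeMap<u32, u32>,
    surviving: BTreeSet<u32>,
}

impl ToyContractionTracker {
    /// Start with a single surviving root node.
    pub fn new(root: u32) -> Self {
        let mut surviving = BTreeSet::new();
        surviving.insert(root);
        Self { parent: BTreeMap::new(), surviving }
    }

    /// Record a source-side insert of `child` under `parent`.
    /// Returns the parent to use in the view, or None when the insert is
    /// absorbed (the child does not survive) or no surviving ancestor exists.
    pub fn insert(&mut self, parent: u32, child: u32, survives: bool) -> Option<u32> {
        self.parent.insert(child, parent);
        if !survives {
            return None;
        }
        self.surviving.insert(child);
        self.nearest_surviving_ancestor(parent)
    }

    /// Walk up the parent links until a surviving node is found.
    pub fn nearest_surviving_ancestor(&self, mut id: u32) -> Option<u32> {
        loop {
            if self.surviving.contains(&id) {
                return Some(id);
            }
            id = *self.parent.get(&id)?;
        }
    }
}
```

A surviving node inserted below two contracted nodes is reported under the surviving ancestor above them, which is the "transformed" outcome; a non-surviving insert yields None, the "collapsed to Identity" outcome.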
25.6 CLI: schema data sync --edits
The schema data sync command migrates data files to match the current schema version. With the --edits flag, it records an EditLogObject in the VCS:
schema data sync records/ --edits
The EditLogObject stores the translated edit sequence as MessagePack-encoded bytes, along with the schema ID, data set ID, edit count, and the final complement state. This enables replaying the incremental migration later or auditing which edits were applied.
For batch migration (without --edits), use:
schema data migrate records/
And for checking data staleness:
schema data status records/
25.7 EditProvenance: tracking translation lineage
Each translated edit can carry an EditProvenance record that captures:
- source_edit_desc: a description of the original source edit
- rules_applied: which translation rules fired during get_edit (e.g., "structural_remap", "field_text")
- policy_consulted: which complement policy was used, if any
- was_total: whether the translation was total (all constraints satisfied) or partial
let (view_edit, provenance) = edit_lens.get_edit_with_provenance(source_edit)?;
assert!(provenance.was_total);
assert!(provenance.rules_applied.iter().any(|r| r.as_ref() == "structural_remap"));
Provenance is serializable (via serde), so it can be stored alongside the translated edits for auditing.
25.8 Edit lens laws
Edit lenses satisfy two laws from Hofmann, Pierce, and Wagner’s edit lens framework.
25.8.1 Consistency
Translating a source edit through the lens and applying it to the view produces the same result as applying the edit to the source and then doing a whole-state get:
\[\mathrm{apply}(\mathrm{get\_edit}(e), \mathrm{get}(s)) = \mathrm{get}(\mathrm{apply}(e, s))\]
This says the incremental path (translate then apply) and the batch path (apply then get) agree.
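The law can be exercised on a toy lens. In the sketch below, RenameLens, its Edit type, and check_consistency are all hypothetical: the lens renames surviving fields and drops the rest, and the checker simply computes both sides of the equation and compares them.

```rust
use std::collections::BTreeMap;

pub type Record = BTreeMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
pub enum Edit {
    Identity,
    SetField { field: String, value: String },
    RemoveField { field: String },
}

/// Hypothetical lens: a rename table for surviving fields. Fields that are
/// missing from the table are dropped from the view.
pub struct RenameLens {
    renames: BTreeMap<String, String>,
}

impl RenameLens {
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let renames = pairs
            .iter()
            .map(|(from, to)| (from.to_string(), to.to_string()))
            .collect();
        Self { renames }
    }

    /// Whole-state get: rename surviving fields, drop the others.
    pub fn get(&self, source: &Record) -> Record {
        source
            .iter()
            .filter_map(|(k, v)| self.renames.get(k).map(|nk| (nk.clone(), v.clone())))
            .collect()
    }

    /// Translate a source edit to a view edit; edits on dropped fields
    /// collapse to Identity.
    pub fn get_edit(&self, edit: &Edit) -> Edit {
        match edit {
            Edit::Identity => Edit::Identity,
            Edit::SetField { field, value } => match self.renames.get(field) {
                Some(f) => Edit::SetField { field: f.clone(), value: value.clone() },
                None => Edit::Identity,
            },
            Edit::RemoveField { field } => match self.renames.get(field) {
                Some(f) => Edit::RemoveField { field: f.clone() },
                None => Edit::Identity,
            },
        }
    }
}

pub fn apply(edit: &Edit, mut record: Record) -> Record {
    match edit {
        Edit::Identity => {}
        Edit::SetField { field, value } => {
            record.insert(field.clone(), value.clone());
        }
        Edit::RemoveField { field } => {
            record.remove(field);
        }
    }
    record
}

/// Consistency: apply(get_edit(e), get(s)) == get(apply(e, s)).
pub fn check_consistency(lens: &RenameLens, edit: &Edit, source: &Record) -> Result<(), String> {
    let incremental = apply(&lens.get_edit(edit), lens.get(source));
    let batch = lens.get(&apply(edit, source.clone()));
    if incremental == batch {
        Ok(())
    } else {
        Err(format!("incremental {:?} != batch {:?}", incremental, batch))
    }
}
```

A checker of this shape is only affordable in tests, since the batch side re-runs get on the whole state for every edit.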
25.8.2 Complement coherence
The complement state after get_edit is consistent with the complement that a whole-state get would produce on the edited source:
\[c'_{\mathrm{incremental}} = c'_{\mathrm{batch}}\]
where \(c'_{\mathrm{incremental}}\) is the complement after calling get_edit(e), and \(c'_{\mathrm{batch}}\) is the complement from calling get on the source with \(e\) applied.
This says the state machine complement tracks the batch complement exactly. If it drifts, put may reconstruct incorrect data.
If a SetField edit targets a node that the migration conditionally drops (via a value-dependent predicate), under what circumstances could the Consistency law be violated?
If the SetField changes the value that the conditional-survival predicate evaluates, the edit might cause the node to transition from surviving to dropped (or vice versa). The incremental path must detect this transition and update the complement accordingly. If the edit lens fails to re-evaluate the predicate after the field change, it produces an incorrect view edit, violating Consistency. The implementation handles this by checking conditional_survival predicates on every InsertNode and RelabelNode edit.
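The transition described in this answer can be sketched as follows. The types and the predicate are hypothetical (an issue survives unless its status is "archived"), and the lens keeps its own copy of visible and complement nodes so that it can re-evaluate the predicate after each field change; this is an illustration of the idea, not of panproto's conditional_survival mechanism.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Issue {
    pub status: String,
    pub title: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ViewEdit {
    Identity,
    Insert { id: u32, issue: Issue },
    Delete { id: u32 },
    SetStatus { id: u32, status: String },
}

// Hypothetical value-dependent predicate: archived issues are dropped.
fn survives(issue: &Issue) -> bool {
    issue.status != "archived"
}

pub struct ConditionalLens {
    visible: BTreeMap<u32, Issue>,
    pub complement: BTreeMap<u32, Issue>,
}

impl ConditionalLens {
    /// Whole-state pass that sorts nodes into visible and complement.
    pub fn initialize(source: &BTreeMap<u32, Issue>) -> Self {
        let mut lens = Self { visible: BTreeMap::new(), complement: BTreeMap::new() };
        for (id, issue) in source {
            if survives(issue) {
                lens.visible.insert(*id, issue.clone());
            } else {
                lens.complement.insert(*id, issue.clone());
            }
        }
        lens
    }

    /// Translate a source-side "set status" edit, re-evaluating the
    /// predicate on the updated node before choosing the view edit.
    pub fn get_set_status(&mut self, id: u32, status: &str) -> ViewEdit {
        let was_visible = self.visible.contains_key(&id);
        let found = self.visible.remove(&id).or_else(|| self.complement.remove(&id));
        let mut issue = match found {
            Some(issue) => issue,
            None => return ViewEdit::Identity, // unknown node: nothing to translate
        };
        issue.status = status.to_string();
        match (was_visible, survives(&issue)) {
            (true, true) => {
                self.visible.insert(id, issue);
                ViewEdit::SetStatus { id, status: status.to_string() }
            }
            (true, false) => {
                // Surviving -> dropped: the view sees a delete.
                self.complement.insert(id, issue);
                ViewEdit::Delete { id }
            }
            (false, true) => {
                // Dropped -> surviving: the view sees an insert.
                self.visible.insert(id, issue.clone());
                ViewEdit::Insert { id, issue }
            }
            (false, false) => {
                self.complement.insert(id, issue);
                ViewEdit::Identity
            }
        }
    }
}
```

A translation that skipped the survives check would emit SetStatus in the (true, false) case and leave an archived issue in the view, which is exactly the Consistency violation discussed above.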
25.8.3 Verification
panproto provides two functions for law checking:
use panproto_lens::edit_laws::{check_edit_consistency, check_complement_coherence};
check_edit_consistency(&mut edit_lens, &edit, &source)?;
check_complement_coherence(&mut edit_lens, &edit, &source)?;
Both compare the incremental path against the batch path and return EditLawViolation if a mismatch is detected. The violation includes a detail string describing which field or node count diverged.
25.9 Cross-references
- For the batch lens framework (state-based get/put with complement), see Chapter 7.
- For data versioning commands (schema data migrate, schema data status), see Chapter 21.
- For the developer-level architecture of edit lenses, see the edit lens internals chapter.
25.10 Exercises
1. Create two schema versions where v2 renames a field. Initialize an EditLens, send a SetField edit targeting the renamed field, and verify that get_edit produces an edit with the new field name.
2. Create a projection lens that drops a vertex. Send an InsertNode edit for a node anchored at the dropped vertex. Verify that the edit is absorbed (returns Identity) and the node appears in the complement.
3. Use check_edit_consistency to verify the Consistency law for a sequence of five edits against an identity lens. Then repeat with a projection lens and confirm the law still holds.
4. Run schema data sync records/ --edits on a repository with two schema versions. Inspect the stored EditLogObject with schema show.
5. Write a test that sends a RelabelNode edit causing a node to transition from non-surviving to surviving. Verify that the complement state machine correctly releases the node from the complement.