7 Bidirectional Migration with Lenses
Everything so far has been one-directional: given a v2 record, produce a v1 record. The tags field disappears, and that is that. But what if you need to go back? A Mastodon bridge translates a Bluesky post into ActivityPub format, a user edits the translated version, and the bridge needs to reconstruct a Bluesky post from the edited result. The naive approach is to re-translate from scratch, but that loses every ATProto-specific field that ActivityPub has no concept of—such as like counts, reply threading, and labels.
A lens solves this by recording what the forward translation discarded, so the backward translation can restore it. Think of it as a store receipt: when you buy an item (get), you receive the item plus a receipt. When you return the item (put), you hand back the item and the receipt, and the store restores its original state.
7.1 The lens abstraction
An asymmetric lens \(\ell\colon S \rightleftarrows V\) consists of two operations:
\[\mathtt{get}\colon S \to (V, C) \qquad \mathtt{put}\colon (V, C) \to S\]
The source \(S\) is the full data (a Bluesky post), the view \(V\) is the projected data (an ActivityPub note), and the complement \(C\) is the receipt—everything that get discarded, so that put can reconstruct the original.
These two operations must satisfy two laws.
GetPut (round-tripping). Getting and then putting back without modifying the view recovers the original source exactly:
\[\mathtt{put}(\mathtt{get}(s)) = s.\]
No information is lost.
PutGet (view consistency). Putting a view into the source and then getting it back returns the view you put in:
\[\pi_1(\mathtt{get}(\mathtt{put}(v, c))) = v.\]
The view is faithfully represented in the reconstructed source.
Together, these laws say that get and put are mutual inverses up to the complement. The complement is the slack that allows the source to carry more information than the view without violating invertibility.
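To make the laws concrete, here is a minimal self-contained sketch of an asymmetric lens in TypeScript. The `Lens` interface and `removeField` constructor are local to this sketch, not the panproto API:

```typescript
// A minimal asymmetric lens: get projects a source to (view, complement),
// put rebuilds a source from a (possibly modified) view plus the complement.
type Rec = Record<string, unknown>;

interface Lens<C> {
  get(source: Rec): { view: Rec; complement: C };
  put(view: Rec, complement: C): Rec;
}

// removeField: the complement is the removed value (if the field was present).
function removeField(name: string): Lens<{ value?: unknown }> {
  return {
    get(source) {
      const { [name]: value, ...rest } = source;
      return { view: rest, complement: name in source ? { value } : {} };
    },
    put(view, complement) {
      // Restore the removed field; leave the view's own fields untouched.
      return 'value' in complement ? { ...view, [name]: complement.value } : { ...view };
    },
  };
}

const lens = removeField('legacyField');
const source = { name: 'Alice', legacyField: 'old-data' };

// GetPut: an unmodified round trip recovers the source exactly.
const { view, complement } = lens.get(source);
const roundTrip = lens.put(view, complement);
// roundTrip: { name: 'Alice', legacyField: 'old-data' }

// An edited view keeps its edits; the complement only fills in what is missing.
const edited = lens.put({ name: 'Alice (updated)' }, complement);
// edited: { name: 'Alice (updated)', legacyField: 'old-data' }
```

PutGet holds as well: running `lens.get` on either result returns exactly the view that was put in, because `put` never touches the view's own fields.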
The name comes from the idea of focusing. A lens focuses on a part of a larger structure (the view) while keeping the rest (the complement) available for reconstruction. Like an optical lens, it lets you see a specific slice while the surrounding context remains intact but out of focus.
7.2 The complement as data receipt
The complement is the central data structure. It records exactly what get discarded, organized by pipeline step.
pub struct Complement {
/// nodes pruned in step 1 (anchor not surviving)
pub dropped_nodes: HashMap<u32, Node>,
/// arcs connecting pruned nodes to surviving nodes
pub dropped_arcs: Vec<(u32, u32, Edge)>,
/// edge labels chosen by the resolver in step 4
pub contraction_choices: HashMap<(u32, u32), Vec<VertexId>>,
/// original parent of each contracted node (before re-parenting)
pub original_parent: HashMap<u32, u32>,
/// fan children pruned in step 5 (hypergraph schemas only)
pub pruned_fan_children: HashMap<FanId, Vec<u32>>,
}

Each field corresponds to a step of the restrict pipeline from the previous chapters:
| Complement field | Pipeline step | What it records |
|---|---|---|
| dropped_nodes | Step 1: anchor_surviving | Nodes whose anchor was pruned |
| dropped_arcs | Step 2: reachable_from_root | Edges to unreachable nodes |
| contraction_choices | Steps 3–4: contraction + resolution | The intermediate path that was collapsed |
| original_parent | Step 3: ancestor_contraction | Original parent before re-parenting |
| pruned_fan_children | Step 5: reconstruct_fans | Fan children that were removed |
The complement’s size is proportional to what the migration discards. A migration that removes two leaf fields from a ten-field schema produces a complement roughly 20% the size of the source. A migration that projects away 90% of the schema produces a large complement.
Application code should treat the complement as opaque. Its internal structure is an implementation detail of the pipeline. The only valid operation on a complement is passing it to put.
The complement for removeField stores the removed value. What does the complement for renameField store?
Nothing. A rename is a bijection: given the old name and the new name, you can always reconstruct the original field label without storing anything. The put operation renames back. This is why renames are classified as lossless.
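A sketch of why the rename complement is empty, with `renameField` as a local illustrative helper (not the panproto API); it assumes the field is present in the source:

```typescript
type Rec = Record<string, unknown>;

// renameField is a bijection on labels: both directions are computed from the
// static (oldName, newName) pair, so the complement carries no data at all.
function renameField(oldName: string, newName: string) {
  return {
    get(source: Rec): { view: Rec; complement: null } {
      const { [oldName]: value, ...rest } = source;
      return { view: { ...rest, [newName]: value }, complement: null };
    },
    put(view: Rec, _complement: null): Rec {
      const { [newName]: value, ...rest } = view;
      return { ...rest, [oldName]: value };
    },
  };
}

const rename = renameField('displayName', 'name');
const { view, complement } = rename.get({ displayName: 'Alice' });
// view: { name: 'Alice' }, complement: null — nothing to remember
```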
7.3 Lenses in panproto
In panproto, get is the restrict pipeline plus complement capture. put is reconstruction from the complement.
The get direction runs the five-step W-type restrict pipeline (or functor restrict for SQL) and simultaneously records what was removed at each step. The put direction replays the removals in reverse: restore pruned fans, re-expand contracted ancestors, re-attach unreachable nodes, and restore pruned anchors.
pub fn put(view: &WTree, complement: &Complement) -> WTree {
let mut tree = view.clone();
// step 5 inverse: restore pruned fan children
restore_fan_children(&mut tree, &complement.pruned_fan_children);
// steps 4 and 3 inverse: re-expand contracted paths using the recorded choices
expand_contractions(&mut tree, &complement.original_parent,
&complement.contraction_choices);
// step 2 inverse: re-attach unreachable nodes
reattach_unreachable(&mut tree, &complement.dropped_arcs);
// step 1 inverse: restore pruned nodes
restore_nodes(&mut tree, &complement.dropped_nodes);
tree
}

The lens laws hold because the complement records exactly what get discarded. put has enough information to invert each step, and the sequential structure of the pipeline means the inversions compose correctly.
If the view has been modified (a field value changed, say), the modification is preserved in the reconstructed source. The complement fills in only the missing parts; it does not overwrite the view’s content. This is what makes lenses useful for bidirectional synchronization, not just round-tripping.
7.4 Cambria-style combinators
Building migrations by hand—specifying vertex maps, edge maps, and resolver tables—is precise but tedious. For common schema changes, panproto provides combinators: small, pre-built lenses that you compose into a pipeline.
The design follows the Cambria project’s lens combinators (Litt et al. 2022), adapted to panproto’s categorical framework. Each combinator is a well-behaved lens (it satisfies GetPut and PutGet). Composition of well-behaved lenses is a well-behaved lens. So a pipeline of combinators automatically satisfies the lens laws.
const lens = pipeline([
renameField('displayName', 'name'),
addField('bio', 'string', ''),
removeField('legacyField'),
]);

7.4.1 The combinator catalog
| Combinator | Forward (\(\mathrm{get}\)) | Backward (\(\mathrm{put}\)) |
|---|---|---|
| renameField(old, new) | Rename field in output | Rename back |
| addField(name, default) | Add field with default value | Remove the field |
| removeField(name) | Remove the field | Restore from complement |
| wrapInObject(field, wrapper) | Nest field inside a new object | Unwrap |
| hoistField(path) | Move a nested field up one level | Move back down |
| coerceType(field, from, to) | Convert value (e.g., string → int) | Convert back |
panproto also provides naming combinators that operate on the nine naming sites described in the chapter on names:
| Combinator | What it renames | Cascades? |
|---|---|---|
| renameVertex(old, new) | Vertex ID | Yes: edges, constraints, variants, hyper-edges |
| renameKind(vertex, kind) | Single vertex’s kind | No |
| renameEdgeKind(old, new) | All edges with matching kind | No |
| renameNsid(vertex, nsid) | Namespace identifier | No |
| renameConstraintSort(old, new) | Constraint sort name | No |
| applyTheoryMorphism(sortMap, opMap) | Vertex kinds + edge kinds | Yes, via theory morphism |
| rename(site, old, new) | Any naming site | Depends on site |
Each combinator handles its own complement. removeField stores the removed value. addField stores nothing (the default is known). renameField stores nothing (the mapping is invertible). wrapInObject stores the wrapper structure.
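addField is the mirror image of removeField: its complement is also empty, because the default value is a static parameter of the combinator, and put simply drops the field again. A local sketch (not the panproto API):

```typescript
type Rec = Record<string, unknown>;

// addField: forward adds the field with a known default; backward removes it.
// Nothing needs to be remembered, so the complement is null.
function addField(name: string, defaultValue: unknown) {
  return {
    get(source: Rec): { view: Rec; complement: null } {
      return { view: { ...source, [name]: defaultValue }, complement: null };
    },
    put(view: Rec, _complement: null): Rec {
      const { [name]: _dropped, ...rest } = view;
      return rest;
    },
  };
}

const withVersion = addField('version', 2);
const { view } = withVersion.get({ handle: 'alice' });
// view: { handle: 'alice', version: 2 }
```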
7.4.2 Composing combinators
Combinators compose via pipeline():
const migration = pipeline([
  renameField("userName", "handle"),
  removeField("internalId"),
  addField("version", 2),
  hoistField("profile.displayName"),
]);

The pipeline applies combinators left to right for get and right to left for put. The complement is a stack: each combinator pushes its complement fragment during get, and put pops the fragments in reverse order.
Mathematically, this is function composition in the lens category. If \(\ell_1\colon S \rightleftarrows T\) and \(\ell_2\colon T \rightleftarrows V\), the composite \(\ell_2 \circ \ell_1\colon S \rightleftarrows V\) has:
\[\mathtt{get}_{2 \circ 1}(s) = (v, (c_1, c_2)) \quad \text{where } (t, c_1) = \mathtt{get}_1(s),\; (v, c_2) = \mathtt{get}_2(t)\] \[\mathtt{put}_{2 \circ 1}(v, (c_1, c_2)) = \mathtt{put}_1(\mathtt{put}_2(v, c_2),\, c_1)\]
The complement of the composite is the pair \((c_1, c_2)\).
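The composition rule translates almost line for line into code. This is an illustrative sketch; `compose` and `removeField` are local helpers, not the panproto API:

```typescript
type Rec = Record<string, unknown>;

interface Lens<C> {
  get(source: Rec): { view: Rec; complement: C };
  put(view: Rec, complement: C): Rec;
}

// Composite get runs the gets in sequence and pairs the complements;
// composite put runs the puts in reverse, handing each lens its own complement.
function compose<C1, C2>(l1: Lens<C1>, l2: Lens<C2>): Lens<[C1, C2]> {
  return {
    get(source) {
      const { view: mid, complement: c1 } = l1.get(source); // S -> (T, c1)
      const { view, complement: c2 } = l2.get(mid);         // T -> (V, c2)
      return { view, complement: [c1, c2] };
    },
    put(view, [c1, c2]) {
      return l1.put(l2.put(view, c2), c1);                  // reverse order
    },
  };
}

// A removeField lens whose complement is the removed value.
function removeField(name: string): Lens<unknown> {
  return {
    get(source) {
      const { [name]: value, ...rest } = source;
      return { view: rest, complement: value };
    },
    put(view, value) {
      return { ...view, [name]: value };
    },
  };
}

const both = compose(removeField('internalId'), removeField('likeCount'));
const { view, complement } = both.get({ handle: 'alice', internalId: 7, likeCount: 42 });
// view: { handle: 'alice' }; complement: [7, 42]
const restored = both.put(view, complement);
// restored: { handle: 'alice', likeCount: 42, internalId: 7 }
```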
7.5 Worked example: get and put
import { Panproto, renameField, addField, removeField, pipeline } from '@panproto/core';
const panproto = await Panproto.init();
const atproto = panproto.protocol('atproto');
// start snippet combinators
const lens = pipeline([
renameField('displayName', 'name'),
addField('bio', 'string', ''),
removeField('legacyField'),
]);
// end snippet combinators
// start snippet get-put
// Assume schemaV1, schemaV2, and migration are already built
const inputRecord = {
displayName: 'Alice',
legacyField: 'old-data',
};
// Forward: project to target schema, capturing the complement
const { view, complement } = migration.get(inputRecord);
// view: { name: 'Alice', bio: '' }
// complement: Uint8Array (opaque—tracks dropped fields and resolver choices)
// Backward: restore from modified view + complement
const modifiedView = { name: 'Alice (updated)', bio: 'Hello!' };
const restored = migration.put(modifiedView, complement);
// restored.data: original structure with modifications propagated back
// end snippet get-put

The critical observation: restored.data.legacyField is "old-data" even though the view never contained it. The complement remembered it, and put restored it. Meanwhile, restored.data.name is "Alice (updated)" (from the modified view), not "Alice" (from the original source). The lens respected the view modification.
GetPut guarantees that if we had not modified the view, put(get(source)) would have returned the original source exactly. PutGet guarantees the converse: getting from the put result returns a view with name = "Alice (updated)".
7.6 A real-world example: a Mastodon-to-Bluesky bridge
Consider the interoperability problem from the introduction: you are building a bridge between Bluesky (ATProto) and Mastodon (ActivityPub). A Bluesky post arrives in camelCase ATProto format; the Mastodon side expects snake_case ActivityPub.
// ATProto post → ActivityPub Note
const atprotoToActivityPub = pipeline([
  renameField("text", "content"),
  renameField("createdAt", "published"),
  renameField("authorDid", "attributed_to"),
  removeField("likeCount"),   // ATProto-only; complement remembers it
  removeField("repostCount"), // ATProto-only; complement remembers it
  addField("@context", "https://www.w3.org/ns/activitystreams"),
  addField("type", "Note"),
  wrapInObject("attributed_to", "actor"),
]);
// a Bluesky post arrives
const bskyPost = {
text: "Hello from Bluesky!",
createdAt: "2025-12-01T12:00:00Z",
authorDid: "did:plc:abc123",
likeCount: 42,
repostCount: 7,
};
// forward: translate to ActivityPub
const { view: apNote, complement } = atprotoToActivityPub.get(bskyPost);
// apNote:
// {
// content: "Hello from Bluesky!",
// published: "2025-12-01T12:00:00Z",
// actor: { attributed_to: "did:plc:abc123" },
// "@context": "https://www.w3.org/ns/activitystreams",
// type: "Note",
// }
// a Mastodon user replies; the bridge modifies the note and sends it back
const modifiedNote = { ...apNote, content: "Hello from Bluesky! (edited)" };
const restored = atprotoToActivityPub.put(modifiedNote, complement);
// restored:
// {
// text: "Hello from Bluesky! (edited)",
// createdAt: "2025-12-01T12:00:00Z",
// authorDid: "did:plc:abc123",
// likeCount: 42, ← restored from complement
// repostCount: 7, ← restored from complement
// }

likeCount and repostCount survive the round trip even though ActivityPub has no concept of them. The complement held onto the ATProto-specific data, and put restored it. The edit to content propagated back as an edit to text. No data was lost; no data was invented.
This is the pattern that makes bidirectional protocol bridges viable. Without the complement, you would need to store the original ATProto record somewhere and diff against it on the return trip, which amounts to reinventing the lens machinery by hand.
Is there a schema change where the complement is larger than the source data? If so, construct one. If not, explain why not.
Yes. Consider a migration that wraps every leaf field in a new container object: wrapInObject("name", "nameWrapper") applied to every field. The complement must store the wrapper structure for each field so that put can unwrap them. If the schema has many fields and each wrapper adds structural metadata, the complement’s structural overhead can exceed the original flat data. In practice this is rare, but theoretically the complement is proportional to what the migration adds, not just what it removes.
7.7 Complement serialization
The complement must be stored somewhere between get and put. panproto serializes it as CBOR (Concise Binary Object Representation), which is compact and fast to parse.
For a typical schema evolution—removing a few fields from a 20-field schema—the serialized complement is a few hundred bytes per record. For bulk migrations, complements can be batched into a single CBOR array, compressed with zstd, and stored alongside the migrated data.
The storage strategy depends on the application. An ephemeral complement lives in memory during a request-response cycle: translate on the way in, translate back on the way out. A persistent complement lives in a database column or object store, enabling deferred round-tripping: migrate data forward today, receive modifications over time, migrate back later. A streaming complement attaches to a message header in a pipeline (Kafka, NATS), so downstream consumers can round-trip without access to the original source.
Because the complement consists mostly of dropped field values, which are often repetitive (many records with likeCount = 0 or displayName = null), it compresses well. In benchmarks on ATProto data, zstd compression reduces complement size by 85–95%.
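As a rough illustration of why repetitive complements compress so well, the sketch below uses Node's built-in gzip as a stand-in for zstd and JSON as a stand-in for CBOR (both substitutions are mine; panproto uses CBOR + zstd, and the 85–95% figure comes from the text's benchmarks, not this sketch):

```typescript
import { gzipSync } from 'node:zlib';

// 1,000 complements from a hypothetical bulk migration: most records dropped
// the same two fields with the same values, so the byte stream is repetitive.
const complements = Array.from({ length: 1000 }, (_, i) => ({
  dropped: { likeCount: 0, repostCount: 0 },
  record: i, // a little per-record variation
}));

const raw = Buffer.from(JSON.stringify(complements));
const compressed = gzipSync(raw);

console.log(`raw: ${raw.length} bytes, gzip: ${compressed.length} bytes`);
// The compressed batch is a small fraction of the raw size.
```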
7.8 How lenses relate to the restrict pipeline
The restrict pipeline from the previous chapters is get without complement capture. panproto implements get by running the restrict pipeline and simultaneously building the complement at each step: step 1 records pruned nodes, step 2 records dropped arcs, step 3 records original parents, step 4 records contraction choices, step 5 records pruned fan children.
The complement is built during the forward pass, not after. The restrict pipeline makes a single pass over the instance tree, and complement capture adds only constant overhead per node.
Suppose we swapped the order of steps 3 and 4 in the restrict pipeline. Would the complement still produce a correct round-trip?
No. Step 3 (ancestor contraction) determines which parent-child pairs need new edge labels. Step 4 (edge resolution) assigns those labels. If you resolve edges before contracting ancestors, you assign labels to the original (pre-contraction) tree, where the parent-child relationships are different. The complement recorded during a swapped-order get would encode the wrong structural associations, and put would reconstruct an incorrect tree. The pipeline’s correctness depends on each step’s complement being recorded in the context established by all previous steps.
7.9 Batch mode and incremental mode
Lenses operate in two modes. In batch mode, get and put operate on whole instances: you hand get a complete record and receive a complete view plus complement. This is the mode described throughout this chapter, and it is the right tool for format conversion and VCS migration.
In incremental mode, get_edit and put_edit operate on individual edits (patches): you hand get_edit a single field update or node insertion, and it returns the corresponding edit in the view schema, updating the complement as a state machine. Incremental mode is the right tool for live synchronization, where re-migrating the entire dataset on every edit would be too expensive. The chapter on edit lenses covers this in detail.
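A minimal flavor of incremental mode, with hypothetical local names (`getEdit`, `putEdit`, `RemoveFieldEditLens`; the chapter on edit lenses defines the real API): an edit lens for removeField absorbs edits to the removed field into its complement state and passes everything else through.

```typescript
// Incremental sketch: instead of re-running get on the whole record, an edit
// lens translates individual field updates, mutating the complement as state.
type Edit = { field: string; value: unknown };

class RemoveFieldEditLens {
  constructor(private name: string, private state: { value?: unknown } = {}) {}

  // Forward: an edit to the removed field is absorbed into the complement
  // (the view never sees the field); any other edit passes through unchanged.
  getEdit(edit: Edit): Edit | null {
    if (edit.field === this.name) {
      this.state.value = edit.value; // update complement state
      return null;                   // no corresponding view edit
    }
    return edit;
  }

  // Backward: view edits apply to the source unchanged.
  putEdit(edit: Edit): Edit {
    return edit;
  }

  complement(): { value?: unknown } {
    return this.state;
  }
}
```

For example, with `new RemoveFieldEditLens('likeCount')`, an update to likeCount updates the complement and produces no view edit, while an update to text passes straight through in both directions.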
7.10 Further reading
The foundational paper on asymmetric lenses for bidirectional programming is Foster et al. (2007), which treats the string case (regular-expression-based transformations) but whose abstract framework applies directly to trees and graphs. Johnson and Rosebrugh (2013) generalized both asymmetric and symmetric lenses to the delta lens framework, connecting lenses to fibrations and indexed categories. Litt et al. (2022) is the practical inspiration for panproto’s combinator API; Cambria demonstrated that lens combinators work for real schema evolution. panproto extends Cambria’s approach from flat JSON schemas to recursive, polynomial-functor schemas.1
The next chapter wires all of this into CI: classifying schema changes as fully compatible, backward compatible, or breaking, and enforcing the classification in a pull request workflow.
1. Tracking complements through lens composition gives the system the structure of a fibration, a mathematical framework for propagating “extra data” through a pipeline. See Appendix A for the formal treatment (Johnson and Rosebrugh 2013; Clarke 2020).