7 Bidirectional Migration with Lenses

Everything so far has been one-directional: given a v2 record, produce a v1 record. The tags field disappears, and that is that. But what if you need to go back? A Mastodon bridge translates a Bluesky post into ActivityPub format, a user edits the translated version, and the bridge needs to reconstruct a Bluesky post from the edited result. The naive approach is to re-translate from scratch, but that loses every ATProto-specific field that ActivityPub has no concept of—such as like counts, reply threading, and labels.

A lens solves this by recording what the forward translation discarded, so the backward translation can restore it. Think of it as a store receipt: when you buy an item (get), you receive the item plus a receipt. When you return the item (put), you hand back the item and the receipt, and the store restores its original state.

7.1 The lens abstraction

An asymmetric lens \(\ell\colon S \rightleftarrows V\) consists of two operations:

\[\mathtt{get}\colon S \to (V, C) \qquad \mathtt{put}\colon (V, C) \to S\]

The source \(S\) is the full data (a Bluesky post), the view \(V\) is the projected data (an ActivityPub note), and the complement \(C\) is the receipt—everything that get discarded, recorded so that put can reconstruct the original.

These two operations must satisfy two laws.

GetPut (round-tripping). Getting and then putting back without modifying the view recovers the original source exactly:

\[\mathtt{put}(\mathtt{get}(s)) = s.\]

No information is lost.

PutGet (view consistency). Putting a view into the source and then getting it back returns the view you put in:

\[\pi_1(\mathtt{get}(\mathtt{put}(v, c))) = v.\]

The view is faithfully represented in the reconstructed source.

Together, these laws say that get and put are mutual inverses up to the complement. The complement is the slack that allows the source to carry more information than the view without violating invertibility.
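The interface and both laws can be sketched in a few lines. The names here (Lens, Post, Note, dropLikes) are illustrative, not panproto's actual API:

```typescript
// A minimal asymmetric lens: get splits the source into a view plus a
// complement; put reassembles a source from a view and a complement.
type Lens<S, V, C> = {
  get(source: S): { view: V; complement: C };
  put(view: V, complement: C): S;
};

type Post = { text: string; likeCount: number };
type Note = { text: string };

// A lens that projects away likeCount, storing it in the complement.
const dropLikes: Lens<Post, Note, number> = {
  get: (s) => ({ view: { text: s.text }, complement: s.likeCount }),
  put: (v, c) => ({ text: v.text, likeCount: c }),
};

// GetPut: get then put, without touching the view, recovers the source.
const source: Post = { text: "hi", likeCount: 42 };
const { view, complement } = dropLikes.get(source);
const roundTrip = dropLikes.put(view, complement);
// roundTrip deep-equals source

// PutGet: the view survives put followed by get.
const rebuilt = dropLikes.get(dropLikes.put({ text: "edited" }, 7));
// rebuilt.view.text === "edited"
```

Note that the complement here is a single number; for a real migration it is a structured record of everything the projection discarded.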

Note: Why “lens”?

The name comes from the idea of focusing. A lens focuses on a part of a larger structure (the view) while keeping the rest (the complement) available for reconstruction. Like an optical lens, it lets you see a specific slice while the surrounding context remains intact but out of focus.

7.2 The complement as data receipt

The complement is the central data structure. It records exactly what get discarded, organized by pipeline step.

use std::collections::HashMap;

pub struct Complement {
    /// nodes pruned in step 1 (anchor not surviving)
    pub dropped_nodes: HashMap<u32, Node>,

    /// arcs connecting pruned nodes to surviving nodes
    pub dropped_arcs: Vec<(u32, u32, Edge)>,

    /// edge labels chosen by the resolver in step 4
    pub contraction_choices: HashMap<(u32, u32), Vec<VertexId>>,

    /// original parent of each contracted node (before re-parenting)
    pub original_parent: HashMap<u32, u32>,

    /// fan children pruned in step 5 (hypergraph schemas only)
    pub pruned_fan_children: HashMap<FanId, Vec<u32>>,
}

Each field corresponds to a step of the restrict pipeline from the previous chapters:

Complement field       Pipeline step                         What it records
dropped_nodes          Step 1: anchor_surviving              Nodes whose anchor was pruned
dropped_arcs           Step 2: reachable_from_root           Edges to unreachable nodes
contraction_choices    Steps 3–4: contraction + resolution   The intermediate path that was collapsed
original_parent        Step 3: ancestor_contraction          Original parent before re-parenting
pruned_fan_children    Step 5: reconstruct_fans              Fan children that were removed

The complement’s size is proportional to what the migration discards. A migration that removes two leaf fields from a ten-field schema produces a complement roughly 20% the size of the source. A migration that projects away 90% of the schema produces a large complement.

Application code should treat the complement as opaque. Its internal structure is an implementation detail of the pipeline. The only valid operation on a complement is passing it to put.

Exercise

The complement for removeField stores the removed value. What does the complement for renameField store?

Nothing. A rename is a bijection: given the old name and the new name, you can always reconstruct the original field label without storing anything. The put operation renames back. This is why renames are classified as lossless.
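Both answers can be made concrete in code. These are hypothetical standalone helpers, not panproto's combinator implementation: removeField's complement is the removed value, while renameField's complement is empty.

```typescript
type Rec = Record<string, unknown>;

// removeField: the complement is the removed value (lossy without it).
function removeFieldLens(name: string) {
  return {
    get(s: Rec) {
      const { [name]: removed, ...rest } = s;
      return { view: rest as Rec, complement: removed };
    },
    put(v: Rec, c: unknown): Rec {
      return c === undefined ? { ...v } : { ...v, [name]: c };
    },
  };
}

// renameField: the complement is empty; put simply renames back.
function renameFieldLens(oldName: string, newName: string) {
  return {
    get(s: Rec) {
      const { [oldName]: value, ...rest } = s;
      return { view: { ...rest, [newName]: value } as Rec, complement: null };
    },
    put(v: Rec, _c: null): Rec {
      const { [newName]: value, ...rest } = v;
      return { ...rest, [oldName]: value };
    },
  };
}
```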

7.3 Lenses in panproto

In panproto, get is the restrict pipeline plus complement capture. put is reconstruction from the complement.

The get direction runs the five-step W-type restrict pipeline (or functor restrict for SQL) and simultaneously records what was removed at each step. The put direction replays the removals in reverse: restore pruned fans, re-expand contracted ancestors, re-attach unreachable nodes, and restore pruned anchors.

pub fn put(view: &WTree, complement: &Complement) -> WTree {
    let mut tree = view.clone();

    // step 5 inverse: restore pruned fan children
    restore_fan_children(&mut tree, &complement.pruned_fan_children);

    // steps 3–4 inverse: re-expand contracted paths using the recorded choices
    expand_contractions(&mut tree, &complement.original_parent,
                        &complement.contraction_choices);

    // step 2 inverse: re-attach unreachable nodes
    reattach_unreachable(&mut tree, &complement.dropped_arcs);

    // step 1 inverse: restore pruned nodes
    restore_nodes(&mut tree, &complement.dropped_nodes);

    tree
}

The lens laws hold because the complement records exactly what get discarded. put has enough information to invert each step, and the sequential structure of the pipeline means the inversions compose correctly.

If the view has been modified (a field value changed, say), the modification is preserved in the reconstructed source. The complement fills in only the missing parts; it does not overwrite the view’s content. This is what makes lenses useful for bidirectional synchronization, not just round-tripping.

7.4 Cambria-style combinators

Building migrations by hand—specifying vertex maps, edge maps, and resolver tables—is precise but tedious. For common schema changes, panproto provides combinators: small, pre-built lenses that you compose into a pipeline.

The design follows the Cambria project’s lens combinators (Litt et al. 2022), adapted to panproto’s categorical framework. Each combinator is a well-behaved lens (it satisfies GetPut and PutGet). Composition of well-behaved lenses is a well-behaved lens. So a pipeline of combinators automatically satisfies the lens laws.

const lens = pipeline([
  renameField('displayName', 'name'),
  addField('bio', 'string', ''),
  removeField('legacyField'),
]);

7.4.1 The combinator catalog

Table 7.1: Schema-change combinators. All are lossless except removeField.
Combinator                     Forward (\(\mathrm{get}\))             Backward (\(\mathrm{put}\))
renameField(old, new)          Rename field in output                 Rename back
addField(name, default)        Add field with default value           Remove the field
removeField(name)              Remove the field                       Restore from complement
wrapInObject(field, wrapper)   Nest field inside a new object         Unwrap
hoistField(path)               Move a nested field up one level       Move back down
coerceType(field, from, to)    Convert value (e.g., string → int)     Convert back

panproto also provides naming combinators that operate on the nine naming sites described in the chapter on names:

Table 7.2: Naming combinators. All are lossless (empty complement).
Combinator                            What it renames                  Cascades?
renameVertex(old, new)                Vertex ID                        Yes: edges, constraints, variants, hyper-edges
renameKind(vertex, kind)              Single vertex’s kind             No
renameEdgeKind(old, new)              All edges with matching kind     No
renameNsid(vertex, nsid)              Namespace identifier             No
renameConstraintSort(old, new)        Constraint sort name             No
applyTheoryMorphism(sortMap, opMap)   Vertex kinds + edge kinds        Yes, via theory morphism
rename(site, old, new)                Any naming site                  Depends on site

Each combinator handles its own complement. removeField stores the removed value. addField stores nothing (the default is known). renameField stores nothing (the mapping is invertible). wrapInObject stores the wrapper structure.

7.4.2 Composing combinators

Combinators compose via pipeline():

const migration = pipeline([
  renameField("userName", "handle"),
  removeField("internalId"),
  addField("version", 2),
  hoistField("profile.displayName"),
]);

The pipeline applies combinators left to right for get and right to left for put. The complement is a stack: each combinator pushes its complement fragment during get, and put pops them in reverse order.

Mathematically, this is function composition in the lens category. If \(\ell_1\colon S \rightleftarrows T\) and \(\ell_2\colon T \rightleftarrows V\), the composite \(\ell_2 \circ \ell_1\colon S \rightleftarrows V\) has:

\[\mathtt{get}_{2 \circ 1}(s) = (v, (c_1, c_2)), \quad \text{where } (t, c_1) = \mathtt{get}_1(s) \text{ and } (v, c_2) = \mathtt{get}_2(t)\] \[\mathtt{put}_{2 \circ 1}(v, (c_1, c_2)) = \mathtt{put}_1(\mathtt{put}_2(v, c_2), c_1)\]

The complement of the composite is the pair \((c_1, c_2)\).
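The composite can be sketched directly from these equations (illustrative names, not panproto's API): the complement of the composite is the pair, and put unwinds it in reverse order.

```typescript
type Lens<S, V, C> = {
  get(s: S): { view: V; complement: C };
  put(v: V, c: C): S;
};

// Compose l1 : S <-> T with l2 : T <-> V into a lens S <-> V whose
// complement is the pair [c1, c2].
function compose<S, T, V, C1, C2>(
  l1: Lens<S, T, C1>,
  l2: Lens<T, V, C2>,
): Lens<S, V, [C1, C2]> {
  return {
    get(s) {
      const { view: t, complement: c1 } = l1.get(s); // first lens
      const { view: v, complement: c2 } = l2.get(t); // then the second
      return { view: v, complement: [c1, c2] };
    },
    put(v, [c1, c2]) {
      return l1.put(l2.put(v, c2), c1); // inverses run right to left
    },
  };
}
```

If both lenses are well-behaved, so is the composite: each put step sees exactly the complement its get step pushed.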

7.5 Worked example: get and put

import { Panproto, renameField, addField, removeField, pipeline } from '@panproto/core';

const panproto = await Panproto.init();
const atproto = panproto.protocol('atproto');

// start snippet combinators
const lens = pipeline([
  renameField('displayName', 'name'),
  addField('bio', 'string', ''),
  removeField('legacyField'),
]);
// end snippet combinators

// start snippet get-put
// Assume schemaV1, schemaV2, and migration are already built
const inputRecord = {
  displayName: 'Alice',
  legacyField: 'old-data',
};

// Forward: project to target schema, capturing the complement
const { view, complement } = migration.get(inputRecord);
// view: { name: 'Alice', bio: '' }
// complement: Uint8Array (opaque—tracks dropped fields and resolver choices)

// Backward: restore from modified view + complement
const modifiedView = { name: 'Alice (updated)', bio: 'Hello!' };
const restored = migration.put(modifiedView, complement);
// restored.data: original structure with modifications propagated back
// end snippet get-put

The critical observation: restored.data.legacyField is "old-data" even though the view never contained it. The complement remembered it, and put restored it. Meanwhile, restored.data.name is "Alice (updated)" (from the modified view), not "Alice" (from the original source). The lens respected the view modification.

GetPut guarantees that if we had not modified the view, put(get(source)) would have returned the original source exactly. PutGet guarantees the converse: getting from the put result returns a view with name = "Alice (updated)".

7.6 A real-world example: a Bluesky-to-Mastodon bridge

Consider the interoperability problem from the introduction: you are building a bridge between Bluesky (ATProto) and Mastodon (ActivityPub). A Bluesky post arrives in camelCase ATProto format; the Mastodon side expects snake_case ActivityPub.

// ATProto post → ActivityPub Note
const atprotoToActivityPub = pipeline(
  renameField("text", "content"),
  renameField("createdAt", "published"),
  renameField("authorDid", "attributed_to"),
  removeField("likeCount"),       // ATProto-only; complement remembers it
  removeField("repostCount"),     // ATProto-only; complement remembers it
  addField("@context", "https://www.w3.org/ns/activitystreams"),
  addField("type", "Note"),
  wrapInObject("attributed_to", "actor"),
);

// a Bluesky post arrives
const bskyPost = {
  text: "Hello from Bluesky!",
  createdAt: "2025-12-01T12:00:00Z",
  authorDid: "did:plc:abc123",
  likeCount: 42,
  repostCount: 7,
};

// forward: translate to ActivityPub
const { view: apNote, complement } = atprotoToActivityPub.get(bskyPost);
// apNote:
// {
//   content: "Hello from Bluesky!",
//   published: "2025-12-01T12:00:00Z",
//   actor: { attributed_to: "did:plc:abc123" },
//   "@context": "https://www.w3.org/ns/activitystreams",
//   type: "Note",
// }

// a Mastodon user replies; the bridge modifies the note and sends it back
const modifiedNote = { ...apNote, content: "Hello from Bluesky! (edited)" };
const restored = atprotoToActivityPub.put(modifiedNote, complement);
// restored:
// {
//   text: "Hello from Bluesky! (edited)",
//   createdAt: "2025-12-01T12:00:00Z",
//   authorDid: "did:plc:abc123",
//   likeCount: 42,       ← restored from complement
//   repostCount: 7,      ← restored from complement
// }

likeCount and repostCount survive the round trip even though ActivityPub has no concept of them. The complement held onto the ATProto-specific data, and put restored it. The edit to content propagated back as an edit to text. No data was lost; no data was invented.

This is the pattern that makes bidirectional protocol bridges viable. Without the complement, you would need to store the original ATProto record somewhere and diff against it on the return trip, which amounts to reinventing the lens machinery by hand.

Exercise

Is there a schema change where the complement is larger than the source data? If so, construct one. If not, explain why not.

Yes. Consider a migration that wraps every leaf field in a new container object: wrapInObject("name", "nameWrapper") applied to every field. The complement must store the wrapper structure for each field so that put can unwrap them. If the schema has many fields and each wrapper adds structural metadata, the complement’s structural overhead can exceed the original flat data. In practice this is rare, but theoretically the complement is proportional to what the migration adds, not just what it removes.

7.7 Complement serialization

The complement must be stored somewhere between get and put. panproto serializes it as CBOR (Concise Binary Object Representation), which is compact and fast to parse.

For a typical schema evolution—removing a few fields from a 20-field schema—the serialized complement is a few hundred bytes per record. For bulk migrations, complements can be batched into a single CBOR array, compressed with zstd, and stored alongside the migrated data.

The storage strategy depends on the application. An ephemeral complement lives in memory during a request-response cycle: translate on the way in, translate back on the way out. A persistent complement lives in a database column or object store, enabling deferred round-tripping: migrate data forward today, receive modifications over time, migrate back later. A streaming complement attaches to a message header in a pipeline (Kafka, NATS), so downstream consumers can round-trip without access to the original source.

Because the complement consists mostly of dropped field values, which are often repetitive (many records with likeCount = 0 or displayName = null), it compresses well. In benchmarks on ATProto data, zstd compression reduces complement size by 85–95%.
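The batching strategy can be sketched as follows. Because CBOR and zstd bindings vary by runtime, this standalone version substitutes JSON plus Node's built-in gzip to show the shape of the idea; it is not panproto's serialization code.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Batch many complements into one serialized, compressed blob.
// (The production path described in the text is CBOR + zstd.)
function packComplements(complements: unknown[]): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(complements), "utf8"));
}

function unpackComplements(batch: Buffer): unknown[] {
  return JSON.parse(gunzipSync(batch).toString("utf8"));
}

// Dropped values are often repetitive (likeCount: 0, displayName: null),
// which is exactly what a dictionary compressor exploits.
const complements = Array.from({ length: 1000 }, () => ({
  likeCount: 0,
  displayName: null,
}));
const batch = packComplements(complements);
// batch is far smaller than the uncompressed JSON
```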

7.8 How lenses relate to the restrict pipeline

The restrict pipeline from the previous chapters is get without complement capture. panproto implements get by running the restrict pipeline and simultaneously building the complement at each step: step 1 records pruned nodes, step 2 records dropped arcs, step 3 records original parents, step 4 records contraction choices, step 5 records pruned fan children.

The complement is built during the forward pass, not after. The restrict pipeline makes a single pass over the instance tree, and complement capture adds only constant overhead per node.
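The single-pass idea for step 1 can be sketched like this (illustrative types and function names, not panproto's internals): each node is either kept or recorded in the complement, at constant extra cost per node.

```typescript
type Node = { id: number; anchor: number; payload: string };

// Step 1 with complement capture: prune nodes whose anchor did not
// survive, recording each pruned node instead of discarding it.
function restrictStep1(
  nodes: Node[],
  survivingAnchors: Set<number>,
): { kept: Node[]; droppedNodes: Map<number, Node> } {
  const kept: Node[] = [];
  const droppedNodes = new Map<number, Node>(); // complement fragment
  for (const n of nodes) {
    if (survivingAnchors.has(n.anchor)) kept.push(n);
    else droppedNodes.set(n.id, n); // O(1) extra work per node
  }
  return { kept, droppedNodes };
}

// The inverse simply re-adds the recorded nodes.
function restoreStep1(kept: Node[], droppedNodes: Map<number, Node>): Node[] {
  return [...kept, ...droppedNodes.values()];
}
```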

Exercise

Suppose we swapped the order of steps 3 and 4 in the restrict pipeline. Would the complement still produce a correct round-trip?

No. Step 3 (ancestor contraction) determines which parent-child pairs need new edge labels. Step 4 (edge resolution) assigns those labels. If you resolve edges before contracting ancestors, you assign labels to the original (pre-contraction) tree, where the parent-child relationships are different. The complement recorded during a swapped-order get would encode the wrong structural associations, and put would reconstruct an incorrect tree. The pipeline’s correctness depends on each step’s complement being recorded in the context established by all previous steps.

7.9 Batch mode and incremental mode

Lenses operate in two modes. In batch mode, get and put operate on whole instances: you hand get a complete record and receive a complete view plus complement. This is the mode described throughout this chapter, and it is the right tool for format conversion and VCS migration.

In incremental mode, get_edit and put_edit operate on individual edits (patches): you hand get_edit a single field update or node insertion, and it returns the corresponding edit in the view schema, updating the complement as a state machine. Incremental mode is the right tool for live synchronization, where re-migrating the entire dataset on every edit would be too expensive. The chapter on edit lenses covers this in detail.
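To preview the shape of incremental mode, here is a sketch of an edit lens for a rename (names are illustrative, not the edit-lens chapter's API): individual field updates are translated in both directions, and a pure rename needs no complement state at all.

```typescript
// A single edit: set one field to a value.
type Edit = { op: "set"; field: string; value: unknown };

function renameEditLens(oldName: string, newName: string) {
  return {
    // translate a source edit into the corresponding view edit
    getEdit: (e: Edit): Edit => ({
      ...e,
      field: e.field === oldName ? newName : e.field,
    }),
    // translate a view edit back into a source edit
    putEdit: (e: Edit): Edit => ({
      ...e,
      field: e.field === newName ? oldName : e.field,
    }),
  };
}
```

Edits touching other fields pass through unchanged, which is what makes incremental mode cheap: only the affected edit is translated, never the whole record.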

7.10 Further reading

The foundational paper on asymmetric lenses for bidirectional programming is Foster et al. (2007), which introduced lens combinators for tree transformations along with the GetPut and PutGet laws used here; its abstract framework applies directly to the graph schemas in this book. Johnson and Rosebrugh (2013) generalized both asymmetric and symmetric lenses to the delta lens framework, connecting lenses to fibrations and indexed categories. Litt et al. (2022) is the practical inspiration for panproto’s combinator API; Cambria demonstrated that lens combinators work for real schema evolution. panproto extends Cambria’s approach from flat JSON schemas to recursive, polynomial-functor schemas.1

The next chapter wires all of this into CI: classifying schema changes as fully compatible, backward compatible, or breaking, and enforcing the classification in a pull request workflow.

Clarke, Bryce. 2020. “Internal Lenses as Functors and Cofunctors.” Applied Category Theory 2019, Electronic proceedings in theoretical computer science, vol. 323: 183–95. https://doi.org/10.4204/EPTCS.323.13.
Foster, J. Nathan, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. 2007. “Combinators for Bidirectional Tree Transformations: A Linguistic Approach to the View-Update Problem.” ACM Transactions on Programming Languages and Systems 29 (3): Article 17. https://doi.org/10.1145/1232420.1232424.
Johnson, Michael, and Robert Rosebrugh. 2013. “Delta Lenses and Opfibrations.” Proceedings of the 2nd International Workshop on Bidirectional Transformations (BX 2013), Electronic communications of the EASST, vol. 57. https://doi.org/10.14279/tuj.eceasst.57.875.
Litt, Geoffrey, Martin Kleppmann, and Marc Shapiro. 2022. Project Cambria: Schema Evolution for CRDTs. Ink & Switch research essay. https://www.inkandswitch.com/cambria/.

  1. Tracking complements through lens composition gives the system the structure of a fibration, a mathematical framework for propagating “extra data” through a pipeline. See Appendix A for the formal treatment (Johnson and Rosebrugh 2013; Clarke 2020).