7 Bidirectional Migration with Lenses
Everything so far has been one-directional: given a v2 record, produce a v1 record. The tags field disappears, and that is that. But what if you need to go back? A Mastodon bridge translates a Bluesky post into ActivityPub format, a user edits the translated version, and the bridge needs to reconstruct a Bluesky post from the edited result. The naive approach is to re-translate from scratch, but that loses every ATProto-specific field that ActivityPub has no concept of—such as like counts, reply threading, and labels.
A lens solves this by recording what the forward translation discarded, so the backward translation can restore it. Think of it as a store receipt: when you buy an item (get), you receive the item plus a receipt. When you return the item (put), you hand back the item and the receipt, and the store restores its original state.
7.1 The lens abstraction
An asymmetric lens \(\ell\colon S \rightleftarrows V\) consists of two operations:
\[\mathtt{get}\colon S \to (V, C) \qquad \mathtt{put}\colon (V, C) \to S\]
The source \(S\) is the full data (a Bluesky post), the view \(V\) is the projected data (an ActivityPub note), and the complement \(C\) is the receipt—everything that get discarded, so that put can reconstruct the original.
These two operations must satisfy two laws.
GetPut (round-tripping). Getting and then putting back without modifying the view recovers the original source exactly:
\[\mathtt{put}(\mathtt{get}(s)) = s.\]
No information is lost.
PutGet (view consistency). Putting a view into the source and then getting it back returns the view you put in:
\[\pi_1(\mathtt{get}(\mathtt{put}(v, c))) = v.\]
The view is faithfully represented in the reconstructed source.
Together, these laws say that get and put are mutual inverses up to the complement. The complement is the slack that allows the source to carry more information than the view without violating invertibility.
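To make the laws concrete, here is a minimal self-contained sketch of an asymmetric lens in TypeScript. The `Lens` interface and `removeField` constructor are local to this sketch, not the panproto API:

```typescript
// A minimal asymmetric lens: get projects a source to (view, complement),
// put rebuilds a source from a (possibly modified) view plus the complement.
type Rec = Record<string, unknown>;

interface Lens<C> {
  get(source: Rec): { view: Rec; complement: C };
  put(view: Rec, complement: C): Rec;
}

// removeField: the complement is the removed value (if the field was present).
function removeField(name: string): Lens<{ value?: unknown }> {
  return {
    get(source) {
      const { [name]: value, ...rest } = source;
      return { view: rest, complement: name in source ? { value } : {} };
    },
    put(view, complement) {
      // Restore the removed field; leave the view's own fields untouched.
      return 'value' in complement ? { ...view, [name]: complement.value } : { ...view };
    },
  };
}

const lens = removeField('legacyField');
const source = { name: 'Alice', legacyField: 'old-data' };

// GetPut: an unmodified round trip recovers the source exactly.
const { view, complement } = lens.get(source);
const roundTrip = lens.put(view, complement);
// roundTrip: { name: 'Alice', legacyField: 'old-data' }

// An edited view keeps its edits; the complement only fills in what is missing.
const edited = lens.put({ name: 'Alice (updated)' }, complement);
// edited: { name: 'Alice (updated)', legacyField: 'old-data' }
```

PutGet holds as well: running `lens.get` on either result returns exactly the view that was put in, because `put` never touches the view's own fields.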
The name comes from the idea of focusing. A lens focuses on a part of a larger structure (the view) while keeping the rest (the complement) available for reconstruction. Like an optical lens, it lets you see a specific slice while the surrounding context remains intact but out of focus.
7.2 The complement as data receipt
The complement is the central data structure. It records exactly what get discarded, organized by pipeline step.
pub struct Complement {
/// nodes pruned in step 1 (anchor not surviving)
pub dropped_nodes: HashMap<u32, Node>,
/// arcs connecting pruned nodes to surviving nodes
pub dropped_arcs: Vec<(u32, u32, Edge)>,
/// edge labels chosen by the resolver in step 4
pub contraction_choices: HashMap<(u32, u32), Vec<VertexId>>,
/// original parent of each contracted node (before re-parenting)
pub original_parent: HashMap<u32, u32>,
/// fan children pruned in step 5 (hypergraph schemas only)
pub pruned_fan_children: HashMap<FanId, Vec<u32>>,
}

Each field corresponds to a step of the restrict pipeline from the previous chapters:
| Complement field | Pipeline step | What it records |
|---|---|---|
| dropped_nodes | Step 1: anchor_surviving | Nodes whose anchor was pruned |
| dropped_arcs | Step 2: reachable_from_root | Edges to unreachable nodes |
| contraction_choices | Steps 3–4: contraction + resolution | The intermediate path that was collapsed |
| original_parent | Step 3: ancestor_contraction | Original parent before re-parenting |
| pruned_fan_children | Step 5: reconstruct_fans | Fan children that were removed |
The complement’s size is proportional to what the migration discards. A migration that removes two leaf fields from a ten-field schema produces a complement roughly 20% the size of the source. A migration that projects away 90% of the schema produces a large complement.
Application code should treat the complement as opaque. Its internal structure is an implementation detail of the pipeline. The only valid operation on a complement is passing it to put.
The complement for removeField stores the removed value. What does the complement for renameField store?
Nothing. A rename is a bijection: given the old name and the new name, you can always reconstruct the original field label without storing anything. The put operation renames back. This is why renames are classified as lossless.
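A sketch of why the rename complement is empty, with `renameField` as a local illustrative helper (not the panproto API); it assumes the field is present in the source:

```typescript
type Rec = Record<string, unknown>;

// renameField is a bijection on labels: both directions are computed from the
// static (oldName, newName) pair, so the complement carries no data at all.
function renameField(oldName: string, newName: string) {
  return {
    get(source: Rec): { view: Rec; complement: null } {
      const { [oldName]: value, ...rest } = source;
      return { view: { ...rest, [newName]: value }, complement: null };
    },
    put(view: Rec, _complement: null): Rec {
      const { [newName]: value, ...rest } = view;
      return { ...rest, [oldName]: value };
    },
  };
}

const rename = renameField('displayName', 'name');
const { view, complement } = rename.get({ displayName: 'Alice' });
// view: { name: 'Alice' }, complement: null — nothing to remember
```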
7.3 Lenses in panproto
In panproto, get is the restrict pipeline plus complement capture. put is reconstruction from the complement.
The get direction runs the five-step W-type restrict pipeline (or functor restrict for SQL) and simultaneously records what was removed at each step. The put direction replays the removals in reverse: restore pruned fans, re-expand contracted ancestors, re-attach unreachable nodes, and restore pruned anchors.
pub fn put(view: &WTree, complement: &Complement) -> WTree {
let mut tree = view.clone();
// step 5 inverse: restore pruned fan children
restore_fan_children(&mut tree, &complement.pruned_fan_children);
// steps 4 and 3 inverse: re-expand contracted paths using the recorded choices
expand_contractions(&mut tree, &complement.original_parent,
&complement.contraction_choices);
// step 2 inverse: re-attach unreachable nodes
reattach_unreachable(&mut tree, &complement.dropped_arcs);
// step 1 inverse: restore pruned nodes
restore_nodes(&mut tree, &complement.dropped_nodes);
tree
}

The lens laws hold because the complement records exactly what get discarded. put has enough information to invert each step, and the sequential structure of the pipeline means the inversions compose correctly.
If the view has been modified (a field value changed, say), the modification is preserved in the reconstructed source. The complement fills in only the missing parts; it does not overwrite the view’s content. This is what makes lenses useful for bidirectional synchronization, not just round-tripping.
7.4 Cambria-style combinators
Building migrations by hand—specifying vertex maps, edge maps, and resolver tables—is precise but tedious. For common schema changes, panproto provides combinators: small, pre-built lenses that you compose into a pipeline.
The design follows the Cambria project’s lens combinators (Litt et al. 2022), adapted to panproto’s categorical framework. Each combinator is a well-behaved lens (it satisfies GetPut and PutGet). Composition of well-behaved lenses is a well-behaved lens. So a pipeline of combinators automatically satisfies the lens laws.
const lens = pipeline([
renameField('displayName', 'name'),
addField('bio', 'string', ''),
removeField('legacyField'),
]);

7.4.1 The combinator catalog
| Combinator | Forward (\(\mathrm{get}\)) | Backward (\(\mathrm{put}\)) |
|---|---|---|
| renameField(old, new) | Rename field in output | Rename back |
| addField(name, default) | Add field with default value | Remove the field |
| removeField(name) | Remove the field | Restore from complement |
| wrapInObject(field, wrapper) | Nest field inside a new object | Unwrap |
| hoistField(path) | Move a nested field up one level | Move back down |
| coerceType(field, from, to) | Convert value (e.g., string → int) | Convert back |
panproto also provides naming combinators that operate on the nine naming sites described in the chapter on names:
| Combinator | What it renames | Cascades? |
|---|---|---|
| renameVertex(old, new) | Vertex ID | Yes: edges, constraints, variants, hyper-edges |
| renameKind(vertex, kind) | Single vertex’s kind | No |
| renameEdgeKind(old, new) | All edges with matching kind | No |
| renameNsid(vertex, nsid) | Namespace identifier | No |
| renameConstraintSort(old, new) | Constraint sort name | No |
| applyTheoryMorphism(sortMap, opMap) | Vertex kinds + edge kinds | Yes, via theory morphism |
| rename(site, old, new) | Any naming site | Depends on site |
Each combinator handles its own complement. removeField stores the removed value. addField stores nothing (the default is known). renameField stores nothing (the mapping is invertible). wrapInObject stores the wrapper structure.
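addField is the mirror image of removeField: its complement is also empty, because the default value is a static parameter of the combinator, and put simply drops the field again. A local sketch (not the panproto API):

```typescript
type Rec = Record<string, unknown>;

// addField: forward adds the field with a known default; backward removes it.
// Nothing needs to be remembered, so the complement is null.
function addField(name: string, defaultValue: unknown) {
  return {
    get(source: Rec): { view: Rec; complement: null } {
      return { view: { ...source, [name]: defaultValue }, complement: null };
    },
    put(view: Rec, _complement: null): Rec {
      const { [name]: _dropped, ...rest } = view;
      return rest;
    },
  };
}

const withVersion = addField('version', 2);
const { view } = withVersion.get({ handle: 'alice' });
// view: { handle: 'alice', version: 2 }
```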
7.4.2 Composing combinators
Combinators compose via pipeline():
const migration = pipeline([
  renameField("userName", "handle"),
  removeField("internalId"),
  addField("version", 2),
  hoistField("profile.displayName"),
]);

The pipeline applies combinators left to right for get and right to left for put. The complement is a stack: each combinator pushes its complement fragment during get, and put pops the fragments in reverse order.
Mathematically, this is function composition in the lens category. If \(\ell_1\colon S \rightleftarrows T\) and \(\ell_2\colon T \rightleftarrows V\), the composite \(\ell_2 \circ \ell_1\colon S \rightleftarrows V\) has:
\[\mathtt{get}_{2 \circ 1}(s) = (v, (c_1, c_2)) \quad \text{where } (t, c_1) = \mathtt{get}_1(s),\; (v, c_2) = \mathtt{get}_2(t)\] \[\mathtt{put}_{2 \circ 1}(v, (c_1, c_2)) = \mathtt{put}_1(\mathtt{put}_2(v, c_2),\, c_1)\]
The complement of the composite is the pair \((c_1, c_2)\).
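The composition rule translates almost line for line into code. This is an illustrative sketch; `compose` and `removeField` are local helpers, not the panproto API:

```typescript
type Rec = Record<string, unknown>;

interface Lens<C> {
  get(source: Rec): { view: Rec; complement: C };
  put(view: Rec, complement: C): Rec;
}

// Composite get runs the gets in sequence and pairs the complements;
// composite put runs the puts in reverse, handing each lens its own complement.
function compose<C1, C2>(l1: Lens<C1>, l2: Lens<C2>): Lens<[C1, C2]> {
  return {
    get(source) {
      const { view: mid, complement: c1 } = l1.get(source); // S -> (T, c1)
      const { view, complement: c2 } = l2.get(mid);         // T -> (V, c2)
      return { view, complement: [c1, c2] };
    },
    put(view, [c1, c2]) {
      return l1.put(l2.put(view, c2), c1);                  // reverse order
    },
  };
}

// A removeField lens whose complement is the removed value.
function removeField(name: string): Lens<unknown> {
  return {
    get(source) {
      const { [name]: value, ...rest } = source;
      return { view: rest, complement: value };
    },
    put(view, value) {
      return { ...view, [name]: value };
    },
  };
}

const both = compose(removeField('internalId'), removeField('likeCount'));
const { view, complement } = both.get({ handle: 'alice', internalId: 7, likeCount: 42 });
// view: { handle: 'alice' }; complement: [7, 42]
const restored = both.put(view, complement);
// restored: { handle: 'alice', likeCount: 42, internalId: 7 }
```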
7.5 Worked example: get and put
import { Panproto, renameField, addField, removeField, pipeline } from '@panproto/core';
const panproto = await Panproto.init();
const atproto = panproto.protocol('atproto');
// start snippet combinators
const lens = pipeline([
renameField('displayName', 'name'),
addField('bio', 'string', ''),
removeField('legacyField'),
]);
// end snippet combinators
// start snippet get-put
// Assume schemaV1, schemaV2, and migration are already built
const inputRecord = {
displayName: 'Alice',
legacyField: 'old-data',
};
// Forward: project to target schema, capturing the complement
const { view, complement } = migration.get(inputRecord);
// view: { name: 'Alice', bio: '' }
// complement: Uint8Array (opaque—tracks dropped fields and resolver choices)
// Backward: restore from modified view + complement
const modifiedView = { name: 'Alice (updated)', bio: 'Hello!' };
const restored = migration.put(modifiedView, complement);
// restored.data: original structure with modifications propagated back
// end snippet get-put

The critical observation: restored.data.legacyField is "old-data" even though the view never contained it. The complement remembered it, and put restored it. Meanwhile, restored.data.name is "Alice (updated)" (from the modified view), not "Alice" (from the original source). The lens respected the view modification.
GetPut guarantees that if we had not modified the view, put(get(source)) would have returned the original source exactly. PutGet guarantees the converse: getting from the put result returns a view with name = "Alice (updated)".
7.6 A real-world example: a Mastodon-to-Bluesky bridge
Consider the interoperability problem from the introduction: you are building a bridge between Bluesky (ATProto) and Mastodon (ActivityPub). A Bluesky post arrives in camelCase ATProto format; the Mastodon side expects snake_case ActivityPub.
// ATProto post → ActivityPub Note
const atprotoToActivityPub = pipeline([
  renameField("text", "content"),
  renameField("createdAt", "published"),
  renameField("authorDid", "attributed_to"),
  removeField("likeCount"),   // ATProto-only; complement remembers it
  removeField("repostCount"), // ATProto-only; complement remembers it
  addField("@context", "https://www.w3.org/ns/activitystreams"),
  addField("type", "Note"),
  wrapInObject("attributed_to", "actor"),
]);
// a Bluesky post arrives
const bskyPost = {
text: "Hello from Bluesky!",
createdAt: "2025-12-01T12:00:00Z",
authorDid: "did:plc:abc123",
likeCount: 42,
repostCount: 7,
};
// forward: translate to ActivityPub
const { view: apNote, complement } = atprotoToActivityPub.get(bskyPost);
// apNote:
// {
// content: "Hello from Bluesky!",
// published: "2025-12-01T12:00:00Z",
// actor: { attributed_to: "did:plc:abc123" },
// "@context": "https://www.w3.org/ns/activitystreams",
// type: "Note",
// }
// a Mastodon user replies; the bridge modifies the note and sends it back
const modifiedNote = { ...apNote, content: "Hello from Bluesky! (edited)" };
const restored = atprotoToActivityPub.put(modifiedNote, complement);
// restored:
// {
// text: "Hello from Bluesky! (edited)",
// createdAt: "2025-12-01T12:00:00Z",
// authorDid: "did:plc:abc123",
// likeCount: 42, ← restored from complement
// repostCount: 7, ← restored from complement
// }

likeCount and repostCount survive the round trip even though ActivityPub has no concept of them. The complement held onto the ATProto-specific data, and put restored it. The edit to content propagated back as an edit to text. No data was lost; no data was invented.
This is the pattern that makes bidirectional protocol bridges viable. Without the complement, you would need to store the original ATProto record somewhere and diff against it on the return trip, which amounts to reinventing the lens machinery by hand.
Is there a schema change where the complement is larger than the source data? If so, construct one. If not, explain why not.
Yes. Consider a migration that wraps every leaf field in a new container object: wrapInObject("name", "nameWrapper") applied to every field. The complement must store the wrapper structure for each field so that put can unwrap them. If the schema has many fields and each wrapper adds structural metadata, the complement’s structural overhead can exceed the original flat data. In practice this is rare, but theoretically the complement is proportional to what the migration adds, not just what it removes.
7.7 Complement serialization
The complement must be stored somewhere between get and put. panproto serializes it as CBOR (Concise Binary Object Representation), which is compact and fast to parse.
For a typical schema evolution—removing a few fields from a 20-field schema—the serialized complement is a few hundred bytes per record. For bulk migrations, complements can be batched into a single CBOR array, compressed with zstd, and stored alongside the migrated data.
The storage strategy depends on the application. An ephemeral complement lives in memory during a request-response cycle: translate on the way in, translate back on the way out. A persistent complement lives in a database column or object store, enabling deferred round-tripping: migrate data forward today, receive modifications over time, migrate back later. A streaming complement attaches to a message header in a pipeline (Kafka, NATS), so downstream consumers can round-trip without access to the original source.
Because the complement consists mostly of dropped field values, which are often repetitive (many records with likeCount = 0 or displayName = null), it compresses well. In benchmarks on ATProto data, zstd compression reduces complement size by 85–95%.
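As a rough illustration of why repetitive complements compress so well, the sketch below uses Node's built-in gzip as a stand-in for zstd and JSON as a stand-in for CBOR (both substitutions are mine; panproto uses CBOR + zstd, and the 85–95% figure comes from the text's benchmarks, not this sketch):

```typescript
import { gzipSync } from 'node:zlib';

// 1,000 complements from a hypothetical bulk migration: most records dropped
// the same two fields with the same values, so the byte stream is repetitive.
const complements = Array.from({ length: 1000 }, (_, i) => ({
  dropped: { likeCount: 0, repostCount: 0 },
  record: i, // a little per-record variation
}));

const raw = Buffer.from(JSON.stringify(complements));
const compressed = gzipSync(raw);

console.log(`raw: ${raw.length} bytes, gzip: ${compressed.length} bytes`);
// The compressed batch is a small fraction of the raw size.
```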
7.8 How lenses relate to the restrict pipeline
The restrict pipeline from the previous chapters is get without complement capture. panproto implements get by running the restrict pipeline and simultaneously building the complement at each step: step 1 records pruned nodes, step 2 records dropped arcs, step 3 records original parents, step 4 records contraction choices, step 5 records pruned fan children.
The complement is built during the forward pass, not after. The restrict pipeline makes a single pass over the instance tree, and complement capture adds only constant overhead per node.
Suppose we swapped the order of steps 3 and 4 in the restrict pipeline. Would the complement still produce a correct round-trip?
No. Step 3 (ancestor contraction) determines which parent-child pairs need new edge labels. Step 4 (edge resolution) assigns those labels. If you resolve edges before contracting ancestors, you assign labels to the original (pre-contraction) tree, where the parent-child relationships are different. The complement recorded during a swapped-order get would encode the wrong structural associations, and put would reconstruct an incorrect tree. The pipeline’s correctness depends on each step’s complement being recorded in the context established by all previous steps.
7.9 Batch mode and incremental mode
Lenses operate in two modes. In batch mode, get and put operate on whole instances: you hand get a complete record and receive a complete view plus complement. This is the mode described throughout this chapter, and it is the right tool for format conversion and VCS migration.
In incremental mode, get_edit and put_edit operate on individual edits (patches): you hand get_edit a single field update or node insertion, and it returns the corresponding edit in the view schema, updating the complement as a state machine. Incremental mode is the right tool for live synchronization, where re-migrating the entire dataset on every edit would be too expensive. The chapter on edit lenses covers this in detail.
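A minimal flavor of incremental mode, with hypothetical local names (`getEdit`, `putEdit`, `RemoveFieldEditLens`; the chapter on edit lenses defines the real API): an edit lens for removeField absorbs edits to the removed field into its complement state and passes everything else through.

```typescript
// Incremental sketch: instead of re-running get on the whole record, an edit
// lens translates individual field updates, mutating the complement as state.
type Edit = { field: string; value: unknown };

class RemoveFieldEditLens {
  constructor(private name: string, private state: { value?: unknown } = {}) {}

  // Forward: an edit to the removed field is absorbed into the complement
  // (the view never sees the field); any other edit passes through unchanged.
  getEdit(edit: Edit): Edit | null {
    if (edit.field === this.name) {
      this.state.value = edit.value; // update complement state
      return null;                   // no corresponding view edit
    }
    return edit;
  }

  // Backward: view edits apply to the source unchanged.
  putEdit(edit: Edit): Edit {
    return edit;
  }

  complement(): { value?: unknown } {
    return this.state;
  }
}
```

For example, with `new RemoveFieldEditLens('likeCount')`, an update to likeCount updates the complement and produces no view edit, while an update to text passes straight through in both directions.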
7.10 Further reading
The foundational paper on asymmetric lenses for bidirectional programming is Foster et al. (2007), which treats the string case (regular-expression-based transformations) but whose abstract framework applies directly to trees and graphs. Johnson and Rosebrugh (2013) generalized both asymmetric and symmetric lenses to the delta lens framework, connecting lenses to fibrations and indexed categories. Litt et al. (2022) is the practical inspiration for panproto’s combinator API; Cambria demonstrated that lens combinators work for real schema evolution. panproto extends Cambria’s approach from flat JSON schemas to recursive, polynomial-functor schemas.1
The next chapter wires all of this into CI: classifying schema changes as fully compatible, backward compatible, or breaking, and enforcing the classification in a pull request workflow.
1. Tracking complements through lens composition gives the system the structure of a fibration, a mathematical framework for propagating “extra data” through a pipeline. See Appendix A for the formal treatment (Johnson and Rosebrugh 2013; Clarke 2020).