17 Automatic Lens Generation

You have two schema versions. You want a lens to migrate data between them. You could write the lens by hand, or you could ask the system: “What changed? Generate the lens automatically.” That’s automatic lens generation.

The process runs in five stages that turn two schemas into a concrete lens. The first four stages work on structure alone and produce a reusable protolens chain (schema-independent). The fifth takes a specific schema and produces a concrete lens (schema-specific). Understanding this boundary is crucial for knowing when auto-generation works, when it needs help, and when it fails.

Consider the call ProtolensChainHandle.autoGenerate(oldSchema, newSchema) from Section 16.9. That one call runs the first four stages under the hood; chain.instantiate(schema) runs the fifth.

17.1 The five-stage pipeline

Given two schemas, panproto generates a lens in these steps:

Schemas -> Morphism -> Factorization -> Protolens Chain -> Instantiation -> Lens
  (1)       (2)          (3)              (4)               (5)

Morphism discovery. Find a structure-preserving map between schemas (the homomorphism search from Chapter 13).
Morphism analysis. Classify the morphism: which vertices are renamed, added, removed, or restructured?
Factorization. Decompose the morphism into a sequence of elementary schema transforms from Table 16.1.
Protolens chain construction. Convert each schema transform into its corresponding elementary protolens.
Instantiation. Apply the protolens chain to a specific schema to produce a concrete lens.

Steps 1 through 4 depend only on the two reference schemas. They produce a protolens chain, reusable across many schemas. Step 5 depends on the target schema and produces a concrete lens with concrete get/put operations and a concrete complement type.

This separation is why protolenses scale: steps 1 through 4 run once; step 5 runs per schema.

import { ProtolensChainHandle, LensHandle } from "@panproto/core";

// Steps 1-4: schema-independent (run once)
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);

// Step 5: schema-specific (run per schema)
const lensA: LensHandle = chain.instantiate(schemaA);
const lensB: LensHandle = chain.instantiate(schemaB);
const lensC: LensHandle = chain.instantiate(schemaC);
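The boundary between chain construction and instantiation can be seen in a deliberately tiny model. This TypeScript sketch assumes a schema is just a set of vertex names, matches vertices by name only (so it never finds renames), and models instantiation as computing the resulting schema rather than a full lens with get/put. `generateChain`, `instantiate`, and `Step` are hypothetical names, not panproto's API.

```typescript
// Toy model (hypothetical, not panproto's API): a schema is a set of vertex names.
type Schema = ReadonlySet<string>;

type Step =
  | { kind: "RemoveVertex"; id: string }
  | { kind: "AddVertex"; id: string };

// Chain construction (schema-independent, run once): compare the two
// reference schemas and emit elementary steps. Vertices are matched by
// name only, so this toy never emits renames or restructurings.
function generateChain(oldSchema: Schema, newSchema: Schema): Step[] {
  const removals = Array.from(oldSchema)
    .filter((v) => !newSchema.has(v))
    .sort()
    .map((id): Step => ({ kind: "RemoveVertex", id }));
  const additions = Array.from(newSchema)
    .filter((v) => !oldSchema.has(v))
    .sort()
    .map((id): Step => ({ kind: "AddVertex", id }));
  return removals.concat(additions);
}

// Instantiation (schema-specific, run per schema): apply the chain to one
// concrete schema, checking each step's precondition against that schema.
function instantiate(chain: Step[], schema: Schema): Schema {
  const result = new Set<string>(schema);
  for (const step of chain) {
    if (step.kind === "RemoveVertex") {
      if (!result.has(step.id)) throw new Error(`missing vertex "${step.id}"`);
      result.delete(step.id);
    } else {
      if (result.has(step.id)) throw new Error(`vertex "${step.id}" already exists`);
      result.add(step.id);
    }
  }
  return result;
}

// Generate once from the two reference schemas...
const chain = generateChain(
  new Set(["post", "text", "likeCount"]),
  new Set(["post", "text", "version"]),
);
// ...then instantiate against any schema that satisfies the preconditions,
// including one with extra vertices the reference schemas never mentioned.
const migratedSchema = instantiate(chain, new Set(["post", "text", "likeCount", "embed"]));
```

The chain is built once from the two reference schemas and then instantiated against a third schema that carries an extra `embed` vertex. A schema that lacks `likeCount` fails the `RemoveVertex` precondition at instantiation time, not at generation time.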

When does auto-generation need help?

If the morphism is ambiguous (multiple plausible mappings with similar scores), auto-generation may pick the wrong one. The chain.incomplete property lists steps with uncertain factorizations. Ambiguity warnings (two morphisms within 5% of each other) signal that the engine’s top choice may not match your intent. Provide manual hints via the hints parameter to disambiguate.

17.2 Factorization: decomposing a morphism

The core algorithmic step is factorization: decomposing a morphism into elementary schema transforms. Think of it as factoring an integer into primes. The result is a canonical (or near-canonical) sequence of simple operations whose composition equals the original morphism.

panproto uses a greedy algorithm with four passes:

Rename pass. Identify vertices and edges that changed names but preserved structure. Emit RenameVertex and RenameEdge transforms.
Removal pass. Identify vertices in the source that have no image in the target. Emit RemoveVertex and RemoveEdge transforms.
Addition pass. Identify vertices in the target that have no preimage in the source. Emit AddVertex and AddEdge transforms.
Restructuring pass. Identify structural changes (nesting, type coercion, splitting, merging). Emit WrapVertex, HoistVertex, CoerceType, MergeVertices, or SplitVertex transforms.

Order matters. Renames happen first (so subsequent passes see canonical names), then removals (so additions don't conflict with existing vertices), then additions, and finally restructurings.

pub fn factorize(morphism: &Morphism) -> Vec<TheoryTransform> {
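    let mut steps = Vec::new();
    // Pass 1: renames first, so later passes see canonical names.
    steps.extend(extract_renames(morphism));
    // Pass 2: removals, so additions cannot collide with existing vertices.
    steps.extend(extract_removals(morphism));
    // Pass 3: additions of target vertices with no preimage.
    steps.extend(extract_additions(morphism));
    // Pass 4: restructurings (wrap, hoist, coerce, merge, split).
    steps.extend(extract_restructurings(morphism));
    steps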
}
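The four passes can be tried on a toy morphism. This sketch assumes a morphism is a plain map from source vertex names to target names (or `null` for no image), handles vertices only (no edges), and omits the restructuring pass. `ToyMorphism`, `Transform`, and the `extract*` helpers are hypothetical stand-ins for panproto's Rust types; within each pass, steps are sorted by vertex ID as described in Section 17.3.2.

```typescript
// Toy morphism (hypothetical, not panproto's Morphism type).
interface ToyMorphism {
  // Source vertex -> target vertex, or null when it has no image in the target.
  mapping: Record<string, string | null>;
  // Every vertex of the target schema.
  targetVertices: string[];
}

type Transform =
  | { op: "RenameVertex"; from: string; to: string }
  | { op: "RemoveVertex"; id: string }
  | { op: "AddVertex"; id: string };

// Pass 1: source vertices whose image carries a different name.
function extractRenames(m: ToyMorphism): Transform[] {
  const out: Transform[] = [];
  for (const src of Object.keys(m.mapping).sort()) {
    const tgt = m.mapping[src];
    if (typeof tgt === "string" && tgt !== src) out.push({ op: "RenameVertex", from: src, to: tgt });
  }
  return out;
}

// Pass 2: source vertices with no image in the target.
function extractRemovals(m: ToyMorphism): Transform[] {
  return Object.keys(m.mapping)
    .sort()
    .filter((src) => m.mapping[src] === null)
    .map((id): Transform => ({ op: "RemoveVertex", id }));
}

// Pass 3: target vertices that are not the image of any source vertex.
function extractAdditions(m: ToyMorphism): Transform[] {
  const images = new Set<string>();
  for (const src of Object.keys(m.mapping)) {
    const tgt = m.mapping[src];
    if (typeof tgt === "string") images.add(tgt);
  }
  return m.targetVertices
    .filter((v) => !images.has(v))
    .sort()
    .map((id): Transform => ({ op: "AddVertex", id }));
}

// Passes run in the fixed order renames, removals, additions.
// The restructuring pass is omitted from this sketch.
function factorize(m: ToyMorphism): Transform[] {
  return extractRenames(m).concat(extractRemovals(m), extractAdditions(m));
}

const factored = factorize({
  mapping: { post: "post", text: "text", createdAt: "created_at", likeCount: null },
  targetVertices: ["post", "text", "created_at", "version"],
});
```

For this mapping the output is a rename of `createdAt`, a removal of `likeCount`, and an addition of `version`, in that order.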

17.3 Handling ambiguity

Multiple valid morphisms may exist. Multiple valid decompositions may fit a single morphism.

17.3.1 Multiple morphisms

When find_morphisms returns several results, panproto scores them using the quality metric from Chapter 13:

Name similarity: Edit distance between mapped vertex IDs.
Edge preservation: Fraction of edges that keep the same label.
Structural preservation: Fraction of the schema that is unchanged.

The highest-scoring morphism wins. If the top two are within 5% of each other, panproto warns:

warning: ambiguous morphism
  morphism 1 (score 0.87): maps author -> creator, post -> post
  morphism 2 (score 0.84): maps author -> post.author, post -> post
  using morphism 1 (highest score)
  hint: use --morphism-index 2 to select the alternative
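Here is a minimal sketch of the scoring and the ambiguity check. Two things are assumptions, not documented behavior: the three metrics are averaged with equal weights, and "within 5%" is read as a relative gap between the top two scores. Edge and structural preservation are taken as precomputed fractions; only name similarity is computed, from Levenshtein edit distance. All names are hypothetical.

```typescript
// Levenshtein edit distance, single-row dynamic programming.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1] for j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

interface ScoredCandidate {
  pairs: Array<[string, string]>; // (source id, target id) for each mapped vertex
  edgePreservation: number;       // fraction of edges keeping their label
  structuralPreservation: number; // fraction of the schema left unchanged
}

// Name similarity: mean of 1 - distance / longer length over mapped pairs.
function nameSimilarity(pairs: Array<[string, string]>): number {
  if (pairs.length === 0) return 1;
  let total = 0;
  for (const [s, t] of pairs) {
    total += 1 - editDistance(s, t) / Math.max(s.length, t.length, 1);
  }
  return total / pairs.length;
}

// Assumption: equal weights for the three metrics.
function quality(c: ScoredCandidate): number {
  return (nameSimilarity(c.pairs) + c.edgePreservation + c.structuralPreservation) / 3;
}

// Pick the best candidate; flag ambiguity when the runner-up is within 5%
// of the winner (read here as a relative gap).
function pickMorphism(cands: ScoredCandidate[]): { index: number; ambiguous: boolean } {
  if (cands.length === 0) throw new Error("no candidate morphisms");
  const ranked = cands
    .map((c, index) => ({ index, score: quality(c) }))
    .sort((x, y) => y.score - x.score);
  const ambiguous =
    ranked.length > 1 &&
    ranked[0].score > 0 &&
    (ranked[0].score - ranked[1].score) / ranked[0].score < 0.05;
  return { index: ranked[0].index, ambiguous };
}
```

Under the relative reading, the two morphisms in the warning above (0.87 and 0.84) differ by about 3.4%, so they would be flagged.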

17.3.2 Multiple factorizations

A single morphism may factor in more than one way. Renaming a to b then removing c gives the same result as removing c then renaming a to b (when they’re independent). panproto canonicalizes by applying passes in fixed order: renames, removals, additions, restructurings. Within each pass, operations are sorted by vertex ID for determinism.

For genuinely ambiguous restructurings (where the same morphism could be factored as “wrap then hoist” or “hoist then wrap”), panproto prefers the factorization with fewer steps.
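The canonicalization rule can be sketched as a sort on (pass, vertex ID), which makes the two orders of an independent rename and removal collapse to the same sequence. The fewest-steps preference is modeled as a simple minimum over alternative factorizations, with ties going to the first alternative; that tie-break, and the `TaggedStep` shape, are assumptions of this sketch.

```typescript
type Pass = "rename" | "remove" | "add" | "restructure";

// Hypothetical step shape: the pass that emits it and the vertex it touches.
interface TaggedStep {
  pass: Pass;
  vertex: string;
}

const PASS_ORDER: Record<Pass, number> = { rename: 0, remove: 1, add: 2, restructure: 3 };

// Canonical order: by pass, then by vertex id (plain code-unit comparison,
// so the result does not depend on the locale).
function canonicalize(steps: TaggedStep[]): TaggedStep[] {
  return steps.slice().sort((a, b) => {
    const byPass = PASS_ORDER[a.pass] - PASS_ORDER[b.pass];
    if (byPass !== 0) return byPass;
    return a.vertex < b.vertex ? -1 : a.vertex > b.vertex ? 1 : 0;
  });
}

// Among alternative factorizations of one morphism, keep the shortest
// (ties go to the earliest alternative), then canonicalize it.
function chooseFactorization(alternatives: TaggedStep[][]): TaggedStep[] {
  if (alternatives.length === 0) throw new Error("no factorization");
  let best = alternatives[0];
  for (const alt of alternatives) {
    if (alt.length < best.length) best = alt;
  }
  return canonicalize(best);
}

// The two orders of an independent rename and removal collapse to one sequence.
const orderA = canonicalize([{ pass: "rename", vertex: "a" }, { pass: "remove", vertex: "c" }]);
const orderB = canonicalize([{ pass: "remove", vertex: "c" }, { pass: "rename", vertex: "a" }]);
```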

17.4 Complement specs: filling the holes

Some schema transforms require information the morphism alone doesn’t provide:

AddVertex needs a default value. What should version default to?
CoerceType needs a coercion function. How do you convert age: string to age: int?
MergeVertices needs merge and split functions. How do you combine firstName and lastName?

These are complement specs: user-provided parameters that complete the protolens. Without them, auto-generation produces a partial chain with “holes”:

const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);

console.log(chain.incomplete);
// [
//   { step: 3, type: "AddVertex", id: "version", missing: "default" },
//   { step: 5, type: "CoerceType", id: "age", missing: "coercion" },
// ]

// Fill the holes
chain.setDefault("version", 1);
chain.setCoercion("age", {
  forward: (s: string) => parseInt(s, 10),
  backward: (n: number) => n.toString(),
});

In Python:

chain = ProtolensChainHandle.auto_generate(old_schema, new_schema)

print(chain.incomplete)
# [
#   ComplementHole(step=3, type="AddVertex", id="version", missing="default"),
#   ComplementHole(step=5, type="CoerceType", id="age", missing="coercion"),
# ]

# Fill the holes
chain.set_default("version", 1)
chain.set_coercion("age",
    forward=lambda s: int(s),
    backward=lambda n: str(n),
)

Attempting to instantiate an incomplete chain raises an error listing the missing specs.
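A toy model of holes and the instantiation check, assuming each step stores its own complement spec in a `spec` field. All names are hypothetical; panproto's `setDefault`/`setCoercion` take a vertex ID, as shown above. Because the spec lives on the step, two steps that touch the same vertex keep independent specs.

```typescript
// Hypothetical toy model of complement holes, not panproto's API.
type SpecKind = "default" | "coercion";

interface ToyStep {
  type: "RenameVertex" | "AddVertex" | "CoerceType";
  id: string;
  spec?: unknown; // this step's own complement spec, once provided
}

interface Hole {
  step: number; // 1-based position in the chain
  type: ToyStep["type"];
  id: string;
  missing: SpecKind;
}

// Which complement spec a step type needs (null means none).
function specNeeded(step: ToyStep): SpecKind | null {
  if (step.type === "AddVertex") return "default";
  if (step.type === "CoerceType") return "coercion";
  return null;
}

function findHoles(steps: ToyStep[]): Hole[] {
  const holes: Hole[] = [];
  steps.forEach((step, i) => {
    const missing = specNeeded(step);
    if (missing !== null && step.spec === undefined) {
      holes.push({ step: i + 1, type: step.type, id: step.id, missing });
    }
  });
  return holes;
}

// Refuse to instantiate while any hole remains, listing every missing spec.
function assertComplete(steps: ToyStep[]): void {
  const holes = findHoles(steps);
  if (holes.length > 0) {
    const detail = holes
      .map((h) => `step ${h.step} ${h.type}("${h.id}") is missing a ${h.missing}`)
      .join("; ");
    throw new Error(`incomplete protolens chain: ${detail}`);
  }
}

const toySteps: ToyStep[] = [
  { type: "RenameVertex", id: "userName" },
  { type: "AddVertex", id: "version" },
  { type: "CoerceType", id: "age" },
];
```

Instantiation fails until both holes are filled, and the error lists every missing spec rather than only the first.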

What if two steps both need a default for the same vertex?

If two steps both require a default for the same vertex (e.g., because it was removed and re-added), does one spec override the other, or does each step carry independent defaults?

Answer

Each step carries its own complement spec independently. If two steps both require a default for the same vertex (because the vertex was removed and re-added), both defaults are stored separately. The first step’s default is used during its put pass, and the second step’s during its own. The chain doesn’t enforce global uniqueness of defaults across steps; each operates on the schema as it exists at that point in the pipeline.

17.5 The fallback: overlap-based alignment

When two schemas are too different for morphism discovery to find a meaningful map (different vertex kinds, different edge structures, different theories), panproto falls back to overlap-based alignment via discover_overlap (from Section 13.4).

Instead of finding a full morphism, it finds the largest shared sub-schema and builds a symmetric lens through the overlap:

const chain = ProtolensChainHandle.autoGenerate(schemaA, schemaB);
// If no morphism found, falls back to overlap

console.log(chain.strategy);
// "overlap" (instead of "morphism")

console.log(chain.overlap);
// { shared: ["post", "author", "text"], aOnly: ["likeCount"], bOnly: ["published"] }

The overlap strategy produces a protolens chain with RemoveVertex steps for A-only fields and AddVertex steps for B-only fields. It’s less precise than morphism-based generation (it cannot detect renames or restructurings), but it works even when the schemas have no common theory.
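A name-only approximation of the overlap strategy shows where the `RemoveVertex` and `AddVertex` steps come from. This is a simplification made for illustration: it treats the shared sub-schema as the set of vertex names present in both schemas and ignores edges; the actual `discover_overlap` is described in Section 13.4. `overlapByName` and `overlapSteps` are hypothetical.

```typescript
// Simplified overlap: vertices are matched by name only (hypothetical helper).
interface Overlap {
  shared: string[];
  aOnly: string[];
  bOnly: string[];
}

function overlapByName(a: string[], b: string[]): Overlap {
  const inA = new Set(a);
  const inB = new Set(b);
  return {
    shared: a.filter((v) => inB.has(v)),
    aOnly: a.filter((v) => !inB.has(v)),
    bOnly: b.filter((v) => !inA.has(v)),
  };
}

type OverlapStep = { op: "RemoveVertex" | "AddVertex"; id: string };

// A-only vertices are removed (their data goes to the complement);
// B-only vertices are added (each needs a default).
function overlapSteps(o: Overlap): OverlapStep[] {
  const removals = o.aOnly.map((id): OverlapStep => ({ op: "RemoveVertex", id }));
  const additions = o.bOnly.map((id): OverlapStep => ({ op: "AddVertex", id }));
  return removals.concat(additions);
}

const overlap = overlapByName(
  ["post", "author", "text", "likeCount"],
  ["post", "author", "text", "published"],
);
const overlapChain = overlapSteps(overlap);
```

Matching by name is also why this strategy cannot detect renames: `createdAt` in A and `created_at` in B land in `aOnly` and `bOnly`, producing a removal plus an addition instead of a `RenameVertex`.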

17.6 Reusing chains across schemas

Since chain construction and instantiation are separate operations, a single chain applies to many schemas.

17.6.1 Platform-wide transformations

A platform operator defines a protolens chain once and applies it to every user schema:

// Define the transformation
const snakeCaseChain = ProtolensChainHandle.fromSteps([
  Protolens.renameVertex("createdAt", "created_at"),
  Protolens.renameVertex("updatedAt", "updated_at"),
  Protolens.renameVertex("deletedAt", "deleted_at"),
]);

// Apply to all schemas
for (const schema of await Panproto.listSchemas()) {
  try {
    const lens = snakeCaseChain.instantiate(schema);
    await migrateAllRecords(schema, lens);
  } catch (e) {
    // Schema doesn't have these fields; skip
  }
}
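The `try`/`catch` above skips any schema whose instantiation fails. The sketch below makes the skip condition explicit for a rename-only chain: a schema is skipped when any step's precondition fails, so a schema that has `createdAt` but not `updatedAt` is skipped entirely. The vertex-list representation and `applyRenameChain` are hypothetical, and whether panproto's `instantiate` rejects a whole chain on a single failed precondition is an assumption here.

```typescript
// Hypothetical rename-only chain applied to vertex lists; not panproto's API.
interface RenameStep {
  from: string;
  to: string;
}

// Returns the renamed vertices, or null when a precondition fails: the
// source vertex is absent or the target name is already taken.
function applyRenameChain(chain: RenameStep[], vertices: string[]): string[] | null {
  let current = vertices.slice();
  for (const step of chain) {
    if (current.indexOf(step.from) < 0 || current.indexOf(step.to) >= 0) return null;
    current = current.map((v) => (v === step.from ? step.to : v));
  }
  return current;
}

const snakeCaseSteps: RenameStep[] = [
  { from: "createdAt", to: "created_at" },
  { from: "updatedAt", to: "updated_at" },
];

const userSchemas: Record<string, string[]> = {
  post: ["text", "createdAt", "updatedAt"],
  like: ["subject", "createdAt"], // no updatedAt, so the whole chain is rejected
};

const migratedSchemas: string[] = [];
const skippedSchemas: string[] = [];
for (const key of Object.keys(userSchemas)) {
  const renamed = applyRenameChain(snakeCaseSteps, userSchemas[key]);
  if (renamed === null) skippedSchemas.push(key);
  else migratedSchemas.push(key);
}
```

Checking preconditions up front, rather than catching every exception, also keeps unrelated failures (for example, an error thrown inside `migrateAllRecords`) from being silently treated as "schema does not apply".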

17.6.2 Version-controlled chains

Protolens chains are serializable. Store them in the schematic version control system (Chapter 10) alongside schema versions:

$ prot protolens v1.json v2.json --save chain-v1-v2.json
$ prot protolens v2.json v3.json --save chain-v2-v3.json

# Compose chains
$ prot protolens --compose chain-v1-v2.json chain-v2-v3.json --save chain-v1-v3.json
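The saved file format is not specified in this chapter. This sketch assumes a hypothetical JSON shape that records the source and target versions, and models `--compose` as concatenating steps after checking that the versions line up. Concatenation matches the composed v1-to-v3 chain in Section 17.8.4, where the steps of both chains appear in order.

```typescript
// Hypothetical on-disk shape for a saved chain; not the real --save format.
interface SavedStep {
  op: string;
  args: string[];
}

interface SavedChain {
  from: string; // source schema version
  to: string;   // target schema version
  steps: SavedStep[];
}

// Composition as concatenation, allowed only when the versions line up.
function composeSaved(first: SavedChain, second: SavedChain): SavedChain {
  if (first.to !== second.from) {
    throw new Error(`cannot compose ${first.from}->${first.to} with ${second.from}->${second.to}`);
  }
  return { from: first.from, to: second.to, steps: first.steps.concat(second.steps) };
}

const chainV1V2: SavedChain = {
  from: "v1",
  to: "v2",
  steps: [
    { op: "RenameVertex", args: ["createdAt", "created_at"] },
    { op: "AddVertex", args: ["version"] },
  ],
};
const chainV2V3: SavedChain = {
  from: "v2",
  to: "v3",
  steps: [{ op: "RemoveVertex", args: ["likeCount"] }],
};

// Serialize the composed chain and read it back.
const savedText = JSON.stringify(composeSaved(chainV1V2, chainV2V3));
const chainV1V3 = JSON.parse(savedText) as SavedChain;
```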

17.7 VCS integration

The prot lens-diff command shows the protolens chain between two schema versions:

$ prot lens-diff HEAD~1..HEAD

Schema changed: post-schema
  Protolens chain (2 steps):
    1. RenameVertex("userName" -> "handle")
    2. AddVertex("version", default: 1)
  Complement spec needed: none (all defaults available)
  Auto-instantiation: ready

Schema changed: comment-schema
  Protolens chain (1 step):
    1. RemoveVertex("legacyThreadId")
  Complement spec needed: none
  Auto-instantiation: ready

Combined with prot convert, you get end-to-end automatic migration:

# Migrate all data from the previous version to the current version
$ prot lens-diff HEAD~1..HEAD --apply --data ./records/

For multi-version jumps, compose chains across multiple commits:

# Compose protolens chains across three commits (four schema versions)
$ prot lens-diff HEAD~3..HEAD

Composed protolens chain (5 steps):
  1. RenameVertex("userName" -> "handle")      [from v1->v2]
  2. AddVertex("version", default: 1)          [from v1->v2]
  3. RemoveVertex("legacyThreadId")            [from v2->v3]
  4. RenameEdge("hasAuthor" -> "hasCreator")   [from v3->v4]
  5. CoerceType("age", string -> int)          [from v3->v4]

17.8 Worked example: ATProto schema evolution

Three schema versions, based on real Bluesky evolution patterns.

17.8.1 Version 1 (original)

{
  "vertices": ["post", "text", "createdAt", "author", "likeCount"],
  "edges": [
    { "from": "post", "to": "text", "kind": "prop" },
    { "from": "post", "to": "createdAt", "kind": "prop" },
    { "from": "post", "to": "author", "kind": "ref" },
    { "from": "post", "to": "likeCount", "kind": "prop" }
  ]
}

17.8.2 Version 2 (rename + add)

createdAt becomes created_at; version field added.

const chain_v1_v2 = ProtolensChainHandle.autoGenerate(v1, v2);
// Steps:
//   1. RenameVertex("createdAt" -> "created_at")
//   2. AddVertex("version", default: 1)

17.8.3 Version 3 (remove + restructure)

likeCount removed; author split into author.did and author.handle.

const chain_v2_v3 = ProtolensChainHandle.autoGenerate(v2, v3);
// Steps:
//   1. RemoveVertex("likeCount")
//   2. SplitVertex("author", into: ["author.did", "author.handle"])
// Complement spec needed: split function for "author"

chain_v2_v3.setSplit("author", {
  split: (author: { did: string; handle: string }) => ({ did: author.did, handle: author.handle }),
  merge: (did: string, handle: string) => ({ did, handle }),
});

17.8.4 Full pipeline: v1 to v3

// Compose the two chains
const chain_v1_v3 = chain_v1_v2.compose(chain_v2_v3);
// Steps:
//   1. RenameVertex("createdAt" -> "created_at")
//   2. AddVertex("version", default: 1)
//   3. RemoveVertex("likeCount")
//   4. SplitVertex("author", into: ["author.did", "author.handle"])

// Instantiate and migrate
const lens = chain_v1_v3.instantiate(v1);
const { view, complement } = lens.get(v1Data);
// view has: post, text, created_at, version, author.did, author.handle
// complement has: likeCount value, original author value

The complement from step 3 (RemoveVertex("likeCount")) stores the like count. The complement from step 4 (SplitVertex) stores the original author value for merge on put. If the view is modified and put is called, both complements reconstruct the v1 data with modifications applied.
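To make the round trip concrete, here is a hand-written sketch of what the composed lens's get and put do to one post record. It assumes records are plain objects and that `author` is an object with `did` and `handle` fields (plus possibly others); the types and functions are hypothetical and not generated by panproto.

```typescript
// Hand-written sketch of the composed v1 -> v3 lens for one record
// (hypothetical types; not generated by panproto).
interface Author {
  did: string;
  handle: string;
  [extra: string]: string;
}

interface PostV1 {
  text: string;
  createdAt: string;
  author: Author;
  likeCount: number;
}

interface PostV3 {
  text: string;
  created_at: string;
  version: number;
  "author.did": string;
  "author.handle": string;
}

interface PostComplement {
  likeCount: number; // captured by step 3, RemoveVertex("likeCount")
  author: Author;    // captured by step 4, SplitVertex("author")
}

const VERSION_DEFAULT = 1; // complement spec for step 2, AddVertex("version")

function getV1toV3(src: PostV1): { view: PostV3; complement: PostComplement } {
  return {
    view: {
      text: src.text,
      created_at: src.createdAt,           // step 1: rename
      version: VERSION_DEFAULT,            // step 2: add with default
      "author.did": src.author.did,        // step 4: split
      "author.handle": src.author.handle,
    },
    complement: { likeCount: src.likeCount, author: src.author },
  };
}

function putV3toV1(view: PostV3, complement: PostComplement): PostV1 {
  return {
    text: view.text,
    createdAt: view.created_at,
    // version has no v1 counterpart, so put drops it.
    author: { ...complement.author, did: view["author.did"], handle: view["author.handle"] },
    likeCount: complement.likeCount,
  };
}
```

Because put rebuilds `author` from the stored original and overwrites only `did` and `handle`, any other author fields survive, and putting back an unmodified view returns the original v1 record.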

17.9 Fusion: composing transforms lazily

Multi-step chains can be fused into a single protolens via fuse(). Fusion composes the endofunctors of all steps into a single TheoryTransform::Compose tree, so instantiate() applies one combined transform instead of materializing intermediate schemas.

const chain = ProtolensChainHandle.autoGenerate(v1, v3);
// chain has 4 steps

const fused = chain.fuse();
// fused is a single Protolens with a composed transform

const lens = fused.instantiate(schema);
// One schema transform, no intermediate materialization

chain = ProtolensChainHandle.auto_generate(v1, v3)
fused = chain.fuse()
lens = fused.instantiate(schema)

When you call instantiate() on a multi-step ProtolensChain, it automatically fuses before instantiation. Calling fuse() explicitly is useful when you want to inspect the composed transform or serialize the fused protolens separately.
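Why fusion avoids intermediate schemas is easiest to see for a chain of renames. This sketch composes the rename maps first and then makes a single pass, and checks that the result matches stepwise application. The names are hypothetical, and the sketch covers renames only; panproto's `fuse()` builds a `TheoryTransform::Compose` tree over all step kinds, which this does not attempt.

```typescript
// Rename-only fusion sketch (hypothetical; not panproto's fuse()).
type RenameMap = Map<string, string>;

function applyRenameMap(step: RenameMap, vertices: string[]): string[] {
  return vertices.map((v) => step.get(v) ?? v);
}

// Unfused: every step allocates a new intermediate vertex list.
function applyStepwise(steps: RenameMap[], vertices: string[]): string[] {
  return steps.reduce((acc, step) => applyRenameMap(step, acc), vertices);
}

// Fused: compose the steps into a single map, then make one pass.
function fuseRenames(steps: RenameMap[]): RenameMap {
  const names = new Set<string>();
  steps.forEach((step) => step.forEach((_to, from) => names.add(from)));
  const fused: RenameMap = new Map<string, string>();
  names.forEach((name) => {
    // Follow the name through every step to find its final image.
    const image = steps.reduce((v, step) => step.get(v) ?? v, name);
    if (image !== name) fused.set(name, image);
  });
  return fused;
}

const renameChain: RenameMap[] = [
  new Map([["createdAt", "created_at"]]),
  new Map([["created_at", "createdOn"]]),
  new Map([["userName", "handle"]]),
];
const fusedRenames = fuseRenames(renameChain);
```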

For long chains (10+ steps), fusion avoids allocating and discarding intermediate schemas. Because an unfused chain materializes an intermediate schema after every step but the last, the savings grow roughly linearly with chain length.

17.10 Computing complement requirements

chain.requirements() computes the full complement specification for a chain, tracking intermediate schema state through each step. For removal steps, the spec reports DataCaptured fields. For addition steps, it reports DefaultsRequired entries via the AddedElement complement variant.

const spec = chain.requirements(schema, protocol);

for (const entry of spec.entries) {
  if (entry.type === "DataCaptured") {
    console.log(`Step ${entry.step}: captures ${entry.fields.length} fields`);
  } else if (entry.type === "DefaultsRequired") {
    console.log(`Step ${entry.step}: needs ${entry.defaults.length} defaults`);
  }
}

The complement is schema-dependent: the same protolens chain applied to two different schemas can produce different complement counts. A removeVertex("metadata") step captures 2 fields when the schema has 2 metadata edges, and 5 fields when it has 5.

spec_a = chain.requirements(schema_a, protocol)
spec_b = chain.requirements(schema_b, protocol)

# Same chain, different schemas: the counts differ because "metadata"
# has a different number of edges in schema_a and schema_b
assert spec_a.captured_count != spec_b.captured_count

Why are complements schema-dependent?

A single removeVertex("metadata") protolens captures different numbers of fields depending on the schema. If you’re reusing a chain across dozens of schemas, how do you predict total complement storage cost?

Answer

Because the protolens operates on the schema’s structure, and different schemas have different structures. A removeVertex("metadata") protolens captures whatever edges metadata has in the specific schema. Schema A might have 2 metadata edges; schema B might have 5. To predict total complement storage cost before migration, call chain.requirements(schema, protocol) on each schema; it reports the exact complement size without performing the migration.
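A stand-in for that per-schema estimate, assuming (as a simplification) that removing a vertex captures one field per edge leaving it in the given schema. `capturedByRemoval`, `totalCaptured`, and the toy schemas are hypothetical; in panproto, `chain.requirements(schema, protocol)` is the call that reports this.

```typescript
// Hypothetical toy schema; not panproto's Schema type.
interface ToyEdge {
  from: string;
  to: string;
}

interface ToySchema {
  name: string;
  edges: ToyEdge[];
}

// Assumption: removing a vertex captures one complement field per edge
// leaving that vertex in this particular schema.
function capturedByRemoval(schema: ToySchema, vertex: string): number {
  return schema.edges.filter((e) => e.from === vertex).length;
}

// Estimate total complement fields for one removal across many schemas.
function totalCaptured(schemas: ToySchema[], vertex: string): number {
  return schemas.reduce((sum, s) => sum + capturedByRemoval(s, vertex), 0);
}

const schemaA: ToySchema = {
  name: "A",
  edges: [
    { from: "post", to: "metadata" },
    { from: "metadata", to: "lang" },
    { from: "metadata", to: "source" },
  ],
};
const schemaB: ToySchema = {
  name: "B",
  edges: ["lang", "source", "client", "labels", "tags"].map((to) => ({ from: "metadata", to })),
};
```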

17.11 When auto-generation fails

Auto-generation is not magic. It fails predictably.

17.11.1 No morphism exists

If the schemas are too different (different theories, no shared structure), find_morphisms returns no results. panproto falls back to overlap-based alignment (Section 17.5), but the resulting lens may be lossy in both directions.

17.11.2 Ambiguous restructuring

When a vertex could map to multiple targets (e.g., name could map to displayName or userName), the quality score may not disambiguate. Provide manual disambiguation:

const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema, {
  hints: { "name": "displayName" },  // manual disambiguation
});
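One plausible way hints could interact with scoring: a hint overrides the score for that source vertex, and scoring decides everything else. This is a sketch under assumptions, not panproto's documented behavior; in particular, rejecting a hint that names a non-candidate target is a design choice made here, and the scores are made up for illustration.

```typescript
// Hypothetical disambiguation helper; how panproto applies `hints`
// internally is not described in this chapter.
interface TargetCandidate {
  target: string;
  score: number;
}

function resolveTarget(
  source: string,
  candidates: TargetCandidate[],
  hints: Record<string, string>,
): string {
  if (Object.prototype.hasOwnProperty.call(hints, source)) {
    const hinted = hints[source];
    // Assumption: a hint must name one of the candidate targets.
    if (!candidates.some((c) => c.target === hinted)) {
      throw new Error(`hint ${source} -> ${hinted} is not a candidate`);
    }
    return hinted;
  }
  if (candidates.length === 0) throw new Error(`no candidate for ${source}`);
  // No hint: fall back to the highest score.
  let best = candidates[0];
  for (const c of candidates) {
    if (c.score > best.score) best = c;
  }
  return best.target;
}

const nameCandidates: TargetCandidate[] = [
  { target: "displayName", score: 0.71 },
  { target: "userName", score: 0.72 },
];
```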

17.11.3 Non-elementary transformations

Some schema changes cannot decompose into the 11 elementary schema transforms. A change that splits a vertex and renames it and changes its type in a single step may not factor cleanly. panproto generates a partial chain and reports the remainder:

warning: partial factorization
  factored: 4 of 5 changes
  unfactored: vertex "metadata" changed kind, name, and structure simultaneously
  hint: break this change into smaller steps, or write a manual lens for this step

17.11.4 Manual composition for hard cases

When auto-generation fails, compose auto-generated steps with a manual protolens for the difficult part:

// Auto-generate what we can
const autoChain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema, {
  exclude: ["metadata"],  // skip the hard part
});

// Write the hard part manually
const manualStep = Protolens.custom({
  precondition: (schema) => schema.hasVertex("metadata"),
  forward: (schema) => { /* custom transformation */ },
  backward: (schema) => { /* custom inverse */ },
});

// Compose
const fullChain = autoChain.append(manualStep);
