17 Automatic Lens Generation
You have two schema versions. You want a lens to migrate data between them. You could write the lens by hand, or you could ask the system: “What changed? Generate the lens automatically.” That’s automatic lens generation.
The process runs in five stages, transforming two schemas into a concrete lens. The first three stages analyze structure alone (schema-independent). The last two take a specific schema and produce a concrete lens (schema-specific). Understanding this boundary is crucial for knowing when auto-generation works, when it needs help, and when it fails.
Consider the call ProtolensChainHandle.autoGenerate(oldSchema, newSchema) from Section 16.9. Together with a later instantiate() call on the resulting chain, it runs all five stages under the hood.
17.1 The five-stage pipeline
Given two schemas, panproto generates a lens in these steps:
Schemas -> Morphism -> Factorization -> Protolens Chain -> Instantiation -> Lens
        (1)         (2)              (3)                (4)              (5)
1. Morphism discovery. Find a structure-preserving map between the schemas (the homomorphism search from Chapter 13).
2. Morphism analysis. Classify the morphism: which vertices are renamed, added, removed, or restructured?
3. Factorization. Decompose the morphism into a sequence of elementary schema transforms from Table 16.1.
4. Protolens chain construction. Convert each schema transform into its corresponding elementary protolens.
5. Instantiation. Apply the protolens chain to a specific schema to produce a concrete lens.
Steps 1 through 3 depend only on the two reference schemas. They produce a protolens chain, reusable across many schemas. Steps 4 and 5 depend on the target schema and produce a concrete lens with concrete get/put operations and a concrete complement type.
This separation is why protolenses scale: steps 1 through 3 run once; steps 4 and 5 run per schema.
import { ProtolensChainHandle, LensHandle } from "@panproto/core";
// Steps 1-3: schema-independent (run once)
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);
// Steps 4-5: schema-specific (run per schema)
const lensA: LensHandle = chain.instantiate(schemaA);
const lensB: LensHandle = chain.instantiate(schemaB);
const lensC: LensHandle = chain.instantiate(schemaC);
If the morphism is ambiguous (multiple plausible mappings with similar scores), auto-generation may pick the wrong one. The chain.incomplete property lists steps with uncertain factorizations. Ambiguity warnings (two morphisms within 5% of each other) signal that the engine’s top choice may not match your intent. Provide manual hints via the hints parameter to disambiguate.
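The split between building a chain once and instantiating it per schema can be modeled in a few lines. This is a hypothetical, self-contained sketch rather than panproto's implementation: `Step`, `Schema`, and `instantiate` are illustrative names, and the chain is hard-coded instead of being discovered from two reference schemas.

```typescript
// Hypothetical toy model of the chain/instantiate split; not the @panproto/core API.
type Schema = { vertices: string[] };

type Step =
  | { kind: "RenameVertex"; from: string; to: string }
  | { kind: "AddVertex"; id: string };

// Schema-specific stage: check every step against one concrete schema and
// return the transformed schema (a stand-in for building a concrete lens).
function instantiate(chain: Step[], schema: Schema): Schema {
  let vertices = [...schema.vertices];
  for (const step of chain) {
    if (step.kind === "RenameVertex") {
      const { from, to } = step;
      if (!vertices.includes(from)) {
        throw new Error(`RenameVertex: schema has no vertex "${from}"`);
      }
      vertices = vertices.map((v) => (v === from ? to : v));
    } else {
      if (vertices.includes(step.id)) {
        throw new Error(`AddVertex: vertex "${step.id}" already exists`);
      }
      vertices = [...vertices, step.id];
    }
  }
  return { vertices };
}

// Schema-independent stage: built once (hard-coded here; in panproto it comes
// from morphism discovery and factorization over two reference schemas).
const chain: Step[] = [
  { kind: "RenameVertex", from: "createdAt", to: "created_at" },
  { kind: "AddVertex", id: "version" },
];

// Per-schema stage: only instantiation runs again for each schema.
const outA = instantiate(chain, { vertices: ["post", "createdAt"] });
const outB = instantiate(chain, { vertices: ["comment", "createdAt", "text"] });
```

A schema without `createdAt` makes `instantiate` throw, which mirrors how a reused chain can reject a schema it does not fit.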
17.2 Factorization: decomposing a morphism
The core algorithmic step is factorization: decomposing a morphism into elementary schema transforms. Think of it as factoring an integer into primes. The result is a canonical (or near-canonical) sequence of simple operations whose composition equals the original morphism.
panproto uses a greedy algorithm with four passes:
- Rename pass. Identify vertices and edges that changed names but preserved structure. Emit RenameVertex and RenameEdge transforms.
- Removal pass. Identify vertices in the source that have no image in the target. Emit RemoveVertex and RemoveEdge transforms.
- Addition pass. Identify vertices in the target that have no preimage in the source. Emit AddVertex and AddEdge transforms.
- Restructuring pass. Identify structural changes (nesting, type coercion, splitting, merging). Emit WrapVertex, HoistVertex, CoerceType, MergeVertices, or SplitVertex transforms.
Order matters. Renames happen first (so subsequent passes see canonical names), then removals (so additions don’t conflict with existing vertices), then additions, and finally restructurings.
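/// Decompose a morphism into elementary schema transforms using four
/// greedy passes applied in a fixed order.
pub fn factorize(morphism: &Morphism) -> Vec<TheoryTransform> {
    let mut steps = Vec::new();
    // 1. Renames first, so later passes see canonical names.
    steps.extend(extract_renames(morphism));
    // 2. Removals next, so additions cannot collide with existing vertices.
    steps.extend(extract_removals(morphism));
    // 3. Additions.
    steps.extend(extract_additions(morphism));
    // 4. Restructurings (wrap, hoist, coerce, merge, split) last.
    steps.extend(extract_restructurings(morphism));
    steps
}
17.3 Handling ambiguity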
Ambiguity arises at two levels: several valid morphisms may exist between the two schemas, and a single morphism may admit several valid decompositions.
17.3.1 Multiple morphisms
When find_morphisms returns several results, panproto scores them using the quality metric from Chapter 13:
- Name similarity: Edit distance between mapped vertex IDs.
- Edge preservation: Fraction of edges that keep the same label.
- Structural preservation: Fraction of the schema that is unchanged.
The highest-scoring morphism wins. If the top two are within 5% of each other, panproto warns:
warning: ambiguous morphism
morphism 1 (score 0.87): maps author -> creator, post -> post
morphism 2 (score 0.84): maps author -> post.author, post -> post
using morphism 1 (highest score)
hint: use --morphism-index 2 to select the alternative
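The scoring and tie-warning rule can be sketched as follows. This is an illustration under stated assumptions, not panproto's quality metric: it weights the three metrics equally, measures name similarity as one minus the normalized Levenshtein distance, reads "within 5%" as a relative gap to the top score, and uses a hypothetical `Candidate` shape in which edge and structural preservation are precomputed.

```typescript
// Hypothetical candidate morphism: a vertex map plus two precomputed fractions.
interface Candidate {
  vertexMap: Record<string, string>;
  edgePreservation: number; // fraction of edges keeping their label
  structuralPreservation: number; // fraction of the schema left unchanged
}

// Classic dynamic-programming Levenshtein edit distance.
function editDistance(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

// Average of (1 - normalized edit distance) over every mapped vertex pair.
function nameSimilarity(map: Record<string, string>): number {
  const pairs = Object.entries(map);
  if (pairs.length === 0) return 1;
  const total = pairs.reduce((sum, [src, dst]) => {
    const longest = Math.max(src.length, dst.length, 1);
    return sum + (1 - editDistance(src, dst) / longest);
  }, 0);
  return total / pairs.length;
}

// Assumption: the three metrics are weighted equally.
function score(c: Candidate): number {
  return (nameSimilarity(c.vertexMap) + c.edgePreservation + c.structuralPreservation) / 3;
}

// Pick the highest-scoring candidate and flag a runner-up within 5% of it.
function choose(candidates: Candidate[]): { best: Candidate; ambiguous: boolean } {
  if (candidates.length === 0) throw new Error("no candidate morphisms");
  const ranked = [...candidates].sort((x, y) => score(y) - score(x));
  const top = score(ranked[0]);
  const runnerUp = ranked.length > 1 ? score(ranked[1]) : -Infinity;
  const ambiguous = top > 0 && (top - runnerUp) / top < 0.05;
  return { best: ranked[0], ambiguous };
}
```

Under this reading, the scores 0.87 and 0.84 from the warning above differ by about 3.4% of the top score, so they would be flagged.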
17.3.2 Multiple factorizations
A single morphism may factor in more than one way. Renaming a to b then removing c gives the same result as removing c then renaming a to b (when they’re independent). panproto canonicalizes by applying passes in fixed order: renames, removals, additions, restructurings. Within each pass, operations are sorted by vertex ID for determinism.
For genuinely ambiguous restructurings (where the same morphism could be factored as “wrap then hoist” or “hoist then wrap”), panproto prefers the factorization with fewer steps.
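The canonicalization rule (fixed pass order, then sorting by vertex ID within each pass) can be sketched as below. It assumes the morphism has already been reduced to a hypothetical `VertexDiff` of renames, removals, and additions; the restructuring pass and the fewest-steps tie-break are omitted.

```typescript
// Hypothetical vertex-level diff extracted from a morphism.
interface VertexDiff {
  renames: Record<string, string>; // old id -> new id
  removed: string[];
  added: string[];
}

type Transform =
  | { kind: "RenameVertex"; from: string; to: string }
  | { kind: "RemoveVertex"; id: string }
  | { kind: "AddVertex"; id: string };

// Ordinal comparison so the output never depends on locale settings.
const byId = (a: string, b: string) => (a < b ? -1 : a > b ? 1 : 0);

// Fixed pass order (renames, removals, additions), each pass sorted by vertex ID.
function factorize(diff: VertexDiff): Transform[] {
  const renames = Object.keys(diff.renames)
    .sort(byId)
    .map((from): Transform => ({ kind: "RenameVertex", from, to: diff.renames[from] }));
  const removals = [...diff.removed]
    .sort(byId)
    .map((id): Transform => ({ kind: "RemoveVertex", id }));
  const additions = [...diff.added]
    .sort(byId)
    .map((id): Transform => ({ kind: "AddVertex", id }));
  return [...renames, ...removals, ...additions];
}
```

Because both the pass order and the within-pass order are fixed, two diffs that list the same changes in different orders produce identical step sequences.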
17.4 Complement specs: filling the holes
Some schema transforms require information the morphism alone doesn’t provide:
- AddVertex needs a default value. What should version default to?
- CoerceType needs a coercion function. How do you convert age: string to age: int?
- MergeVertices needs merge and split functions. How do you combine firstName and lastName?
These are complement specs: user-provided parameters that complete the protolens. Without them, auto-generation produces a partial chain with “holes”:
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);
console.log(chain.incomplete);
// [
// { step: 3, type: "AddVertex", id: "version", missing: "default" },
// { step: 5, type: "CoerceType", id: "age", missing: "coercion" },
// ]
// Fill the holes
chain.setDefault("version", 1);
chain.setCoercion("age", {
forward: (s: string) => parseInt(s, 10),
backward: (n: number) => n.toString(),
});
In Python:
chain = ProtolensChainHandle.auto_generate(old_schema, new_schema)
print(chain.incomplete)
# [
# ComplementHole(step=3, type="AddVertex", id="version", missing="default"),
# ComplementHole(step=5, type="CoerceType", id="age", missing="coercion"),
# ]
# Fill the holes
chain.set_default("version", 1)
chain.set_coercion("age",
forward=lambda s: int(s),
backward=lambda n: str(n),
)
Attempting to instantiate an incomplete chain raises an error listing the missing specs.
If two steps both require a default for the same vertex (e.g., because it was removed and re-added), does one spec override the other, or does each step carry independent defaults?
Each step carries its own complement spec independently. If two steps both require a default for the same vertex (because the vertex was removed and re-added), both defaults are stored separately, and each is applied only by the step that owns it. The chain doesn’t enforce global uniqueness of defaults across steps; each step operates on the schema as it exists at that point in the pipeline.
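Hole tracking with independent per-step specs can be modeled with a small class. This is a hypothetical sketch: `HoledChain`, `fill`, and `specFor` are illustrative names, not panproto methods, and holes are keyed by step index so that a vertex removed and re-added keeps one spec per step.

```typescript
// Hypothetical model of a chain with complement holes; not the panproto API.
interface Hole {
  step: number;
  type: string;
  id: string;
  missing: string;
}

class HoledChain {
  private readonly holes: Hole[];
  // Specs are keyed by step index, so two steps on the same vertex never collide.
  private readonly specs = new Map<number, unknown>();

  constructor(holes: Hole[]) {
    this.holes = holes;
  }

  // Holes that still have no spec.
  get incomplete(): Hole[] {
    return this.holes.filter((h) => !this.specs.has(h.step));
  }

  fill(step: number, spec: unknown): void {
    if (!this.holes.some((h) => h.step === step)) {
      throw new Error(`step ${step} has no hole to fill`);
    }
    this.specs.set(step, spec);
  }

  specFor(step: number): unknown {
    return this.specs.get(step);
  }

  // Refuse to instantiate while any hole is unfilled, listing what is missing.
  instantiate(): string {
    const missing = this.incomplete;
    if (missing.length > 0) {
      const list = missing.map((h) => `step ${h.step} ${h.type}("${h.id}"): ${h.missing}`);
      throw new Error(`incomplete chain:\n${list.join("\n")}`);
    }
    return "lens"; // stand-in for a concrete lens
  }
}

// A vertex removed and re-added: two AddVertex holes for the same id.
const chain = new HoledChain([
  { step: 1, type: "AddVertex", id: "version", missing: "default" },
  { step: 4, type: "AddVertex", id: "version", missing: "default" },
]);
```

Filling step 1 leaves step 4's hole open, and the two defaults never overwrite each other.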
17.5 The fallback: overlap-based alignment
When two schemas are too different for morphism discovery to find a meaningful map (different vertex kinds, different edge structures, different theories), panproto falls back to overlap-based alignment via discover_overlap (from Section 13.4).
Instead of finding a full morphism, it finds the largest shared sub-schema and builds a symmetric lens through the overlap:
const chain = ProtolensChainHandle.autoGenerate(schemaA, schemaB);
// If no morphism found, falls back to overlap
console.log(chain.strategy);
// "overlap" (instead of "morphism")
console.log(chain.overlap);
// { shared: ["post", "author", "text"], aOnly: ["likeCount"], bOnly: ["published"] }
The overlap strategy produces a protolens chain with RemoveVertex steps for A-only fields and AddVertex steps for B-only fields. It’s less precise than morphism-based generation (it cannot detect renames or restructurings), but it works even when the schemas have no common theory.
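The shape of the overlap fallback can be sketched over bare vertex-ID lists. Comparing IDs only is a simplifying assumption of this hypothetical `overlapChain` helper; it is not how discover_overlap is implemented.

```typescript
// Hypothetical overlap-based fallback over plain vertex-ID lists.
type Step = { kind: "RemoveVertex" | "AddVertex"; id: string };

function overlapChain(a: string[], b: string[]) {
  const inA = new Set(a);
  const inB = new Set(b);
  const shared = a.filter((v) => inB.has(v));
  const aOnly = a.filter((v) => !inB.has(v));
  const bOnly = b.filter((v) => !inA.has(v));
  // A-only vertices are removed (their data goes to the complement);
  // B-only vertices are added (they will need defaults).
  const steps: Step[] = [
    ...aOnly.map((id): Step => ({ kind: "RemoveVertex", id })),
    ...bOnly.map((id): Step => ({ kind: "AddVertex", id })),
  ];
  return { overlap: { shared, aOnly, bOnly }, steps };
}

const result = overlapChain(
  ["post", "author", "text", "likeCount"],
  ["post", "author", "text", "published"],
);
// A rename is invisible to this strategy: it shows up as a removal plus an addition.
const renamed = overlapChain(["author"], ["creator"]);
```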
17.6 Reusing chains across schemas
Since chain construction and instantiation are separate operations, a single chain applies to many schemas.
17.6.1 Platform-wide transformations
A platform operator defines a protolens chain once and applies it to every user schema:
// Define the transformation
const snakeCaseChain = ProtolensChainHandle.fromSteps([
Protolens.renameVertex("createdAt", "created_at"),
Protolens.renameVertex("updatedAt", "updated_at"),
Protolens.renameVertex("deletedAt", "deleted_at"),
]);
// Apply to all schemas
for (const schema of await Panproto.listSchemas()) {
  let lens: LensHandle;
  try {
    lens = snakeCaseChain.instantiate(schema);
  } catch (e) {
    // Schema doesn't have these fields; skip it
    continue;
  }
  // Outside the try, so migration failures are not silently swallowed
  await migrateAllRecords(schema, lens);
}
17.6.2 Version-controlled chains
Protolens chains are serializable. Store them in the schematic version control system (Chapter 10) alongside schema versions:
$ prot protolens v1.json v2.json --save chain-v1-v2.json
$ prot protolens v2.json v3.json --save chain-v2-v3.json
# Compose chains
$ prot protolens --compose chain-v1-v2.json chain-v2-v3.json --save chain-v1-v3.json
17.7 VCS integration
The prot lens-diff command shows the protolens chain between two schema versions:
$ prot lens-diff HEAD~1..HEAD
Schema changed: post-schema
Protolens chain (2 steps):
1. RenameVertex("userName" -> "handle")
2. AddVertex("version", default: 1)
Complement spec needed: none (all defaults available)
Auto-instantiation: ready
Schema changed: comment-schema
Protolens chain (1 step):
1. RemoveVertex("legacyThreadId")
Complement spec needed: none
Auto-instantiation: ready
Combined with prot convert, you get end-to-end automatic migration:
# Migrate all data from the previous version to the current version
$ prot lens-diff HEAD~1..HEAD --apply --data ./records/
For multi-version jumps, compose chains across multiple commits:
# Compose protolens chains across three commits (four schema versions)
$ prot lens-diff HEAD~3..HEAD
Composed protolens chain (5 steps):
1. RenameVertex("userName" -> "handle") [from v1->v2]
2. AddVertex("version", default: 1) [from v1->v2]
3. RemoveVertex("legacyThreadId") [from v2->v3]
4. RenameEdge("hasAuthor" -> "hasCreator") [from v3->v4]
5. CoerceType("age", string -> int) [from v3->v4]
17.8 Worked example: ATProto schema evolution
Three schema versions, based on real Bluesky evolution patterns.
17.8.1 Version 1 (original)
{
"vertices": ["post", "text", "createdAt", "author", "likeCount"],
"edges": [
{ "from": "post", "to": "text", "kind": "prop" },
{ "from": "post", "to": "createdAt", "kind": "prop" },
{ "from": "post", "to": "author", "kind": "ref" },
{ "from": "post", "to": "likeCount", "kind": "prop" }
]
}
17.8.2 Version 2 (rename + add)
createdAt becomes created_at; version field added.
const chain_v1_v2 = ProtolensChainHandle.autoGenerate(v1, v2);
// Steps:
// 1. RenameVertex("createdAt" -> "created_at")
// 2. AddVertex("version", default: 1)
17.8.3 Version 3 (remove + restructure)
likeCount removed; author split into author.did and author.handle.
const chain_v2_v3 = ProtolensChainHandle.autoGenerate(v2, v3);
// Steps:
// 1. RemoveVertex("likeCount")
// 2. SplitVertex("author", into: ["author.did", "author.handle"])
// Complement spec needed: split function for "author"
chain_v2_v3.setSplit("author", {
split: (author) => ({ did: author.did, handle: author.handle }),
merge: (did, handle) => ({ did, handle }),
});
17.8.4 Full pipeline: v1 to v3
// Compose the two chains
const chain_v1_v3 = chain_v1_v2.compose(chain_v2_v3);
// Steps:
// 1. RenameVertex("createdAt" -> "created_at")
// 2. AddVertex("version", default: 1)
// 3. RemoveVertex("likeCount")
// 4. SplitVertex("author", into: ["author.did", "author.handle"])
// Instantiate and migrate
const lens = chain_v1_v3.instantiate(v1);
const { view, complement } = lens.get(v1Data);
// view has: post, text, created_at, version, author.did, author.handle
// complement has: likeCount value, original author value
The complement from step 3 (RemoveVertex("likeCount")) stores the like count. The complement from step 4 (SplitVertex) stores the original author value for merge on put. If the view is modified and put is called, both complements reconstruct the v1 data with modifications applied.
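The round trip can be sketched on flat records. In this hypothetical model (`StepLens`, `renameVertex`, `addVertex`, `removeVertex`, and `composeAll` are illustrative, not panproto APIs), each step's `get` returns its own complement, composition concatenates the v1-to-v2 and v2-to-v3 steps, and `put` replays the steps in reverse. The SplitVertex step is omitted to keep the sketch short.

```typescript
// Hypothetical model of step lenses on flat records; not the panproto API.
type Rec = Record<string, unknown>;

interface StepLens {
  get(source: Rec): { view: Rec; complement: unknown };
  put(view: Rec, complement: unknown): Rec;
}

// Rename a field; nothing needs to be remembered.
const renameVertex = (from: string, to: string): StepLens => ({
  get: (s) => {
    const view: Rec = { ...s, [to]: s[from] };
    delete view[from];
    return { view, complement: null };
  },
  put: (v) => {
    const source: Rec = { ...v, [from]: v[to] };
    delete source[to];
    return source;
  },
});

// Add a field with a default on get; drop it again on put.
const addVertex = (id: string, defaultValue: unknown): StepLens => ({
  get: (s) => ({ view: { ...s, [id]: defaultValue }, complement: null }),
  put: (v) => {
    const source: Rec = { ...v };
    delete source[id];
    return source;
  },
});

// Remove a field on get, keeping its value in the complement for put.
const removeVertex = (id: string): StepLens => ({
  get: (s) => {
    const view: Rec = { ...s };
    delete view[id];
    return { view, complement: s[id] };
  },
  put: (v, complement) => ({ ...v, [id]: complement }),
});

// get runs the steps forward and collects one complement per step;
// put walks backwards, handing each step its own complement.
function composeAll(steps: StepLens[]): StepLens {
  return {
    get: (source) => {
      let view = source;
      const complements: unknown[] = [];
      for (const step of steps) {
        const out = step.get(view);
        view = out.view;
        complements.push(out.complement);
      }
      return { view, complement: complements };
    },
    put: (view, complement) => {
      const complements = complement as unknown[];
      let current = view;
      for (let i = steps.length - 1; i >= 0; i--) {
        current = steps[i].put(current, complements[i]);
      }
      return current;
    },
  };
}

const chainV1V2 = [renameVertex("createdAt", "created_at"), addVertex("version", 1)];
const chainV2V3 = [removeVertex("likeCount")];
const lens = composeAll([...chainV1V2, ...chainV2V3]);

const v1Data: Rec = { text: "hello", createdAt: "2024-01-01", likeCount: 7 };
const { view, complement } = lens.get(v1Data);
const restored = lens.put({ ...view, text: "edited" }, complement);
```

The edit to `text` survives the round trip, and `likeCount` comes back from the complement captured by the removal step.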
17.9 Fusion: composing transforms lazily
Multi-step chains can be fused into a single protolens via fuse(). Fusion composes the endofunctors of all steps into a single TheoryTransform::Compose tree, so instantiate() applies one combined transform instead of materializing intermediate schemas.
const chain = ProtolensChainHandle.autoGenerate(v1, v3);
// chain has 4 steps
const fused = chain.fuse();
// fused is a single Protolens with a composed transform
const lens = fused.instantiate(schema);
// One schema transform, no intermediate materialization
In Python:
chain = ProtolensChainHandle.auto_generate(v1, v3)
fused = chain.fuse()
lens = fused.instantiate(schema)
When you call instantiate() on a multi-step ProtolensChain, it automatically fuses before instantiation. Calling fuse() explicitly is useful when you want to inspect the composed transform or serialize the fused protolens separately.
For long chains (10+ steps), fusion avoids allocating and discarding intermediate schemas. The performance benefit scales linearly with chain length.
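Why fusion avoids intermediate schemas can be seen in a reduced, hypothetical model where every step is a per-vertex function (a new ID, or `null` for removal). This is only an analogy for the composed TheoryTransform: additions and structural changes are left out because they are not per-vertex maps.

```typescript
// Hypothetical model: a per-vertex transform maps a vertex ID to its new ID,
// or to null when the vertex is removed. Additions are left out of this sketch.
type VertexFn = (v: string) => string | null;

const rename = (from: string, to: string): VertexFn => (v) => (v === from ? to : v);
const remove = (id: string): VertexFn => (v) => (v === id ? null : v);

// Unfused: each step rebuilds the whole vertex list (one intermediate per step).
function applySequentially(steps: VertexFn[], vertices: string[]): string[] {
  let current = vertices;
  for (const step of steps) {
    const next: string[] = [];
    for (const v of current) {
      const out = step(v);
      if (out !== null) next.push(out);
    }
    current = next;
  }
  return current;
}

// Fused: compose the steps into one function, then make a single pass.
function fuse(steps: VertexFn[]): VertexFn {
  return (v) => steps.reduce<string | null>((acc, step) => (acc === null ? null : step(acc)), v);
}

function applyFused(fused: VertexFn, vertices: string[]): string[] {
  const out: string[] = [];
  for (const v of vertices) {
    const mapped = fused(v);
    if (mapped !== null) out.push(mapped);
  }
  return out;
}

const steps = [rename("createdAt", "created_at"), remove("likeCount"), rename("created_at", "created")];
const schema = ["post", "createdAt", "likeCount", "text"];
const sequential = applySequentially(steps, schema);
const fusedResult = applyFused(fuse(steps), schema);
```

Both paths give the same vertices, but the sequential path allocates one list per step while the fused path allocates one list in total, which is where the linear-in-chain-length saving comes from.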
17.10 Computing complement requirements
chain.requirements() computes the full complement specification for a chain, tracking intermediate schema state through each step. For removal steps, the spec reports DataCaptured fields. For addition steps, it reports DefaultsRequired entries via the AddedElement complement variant.
const spec = chain.requirements(schema, protocol);
for (const entry of spec.entries) {
if (entry.type === "DataCaptured") {
console.log(`Step ${entry.step}: captures ${entry.fields.length} fields`);
} else if (entry.type === "DefaultsRequired") {
console.log(`Step ${entry.step}: needs ${entry.defaults.length} defaults`);
}
}
The complement is schema-dependent: the same protolens chain applied to two different schemas can produce different complement counts. A removeVertex("metadata") step captures 2 fields when the schema has 2 metadata edges, and 5 fields when it has 5.
In Python:
spec_a = chain.requirements(schema_a, protocol)
spec_b = chain.requirements(schema_b, protocol)
# Same chain, different schemas, different counts
assert spec_a.captured_count != spec_b.captured_count
A single removeVertex("metadata") protolens captures different numbers of fields depending on the schema. If you’re reusing a chain across dozens of schemas, how do you predict total complement storage cost?
The protolens operates on the schema’s structure, and different schemas have different structures. A removeVertex("metadata") protolens captures whatever edges metadata has in the specific schema: schema A might have 2 metadata edges, schema B 5. To predict total complement storage cost before migration, call chain.requirements(schema, protocol) on each schema and sum the results; it reports the exact complement size without performing the migration.
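The intermediate-schema walk behind requirements() can be sketched with hypothetical shapes. The assumption that each outgoing edge of a removed vertex counts as one captured field is this sketch's, chosen to match the 2-versus-5 example; `requirements` and `capturedCount` here are illustrative, not the panproto API.

```typescript
// Hypothetical schema and step shapes; not the panproto API.
interface Edge {
  from: string;
  to: string;
}
interface Schema {
  vertices: string[];
  edges: Edge[];
}
type Step = { kind: "RemoveVertex" | "AddVertex"; id: string };

type Entry =
  | { type: "DataCaptured"; step: number; fields: string[] }
  | { type: "DefaultsRequired"; step: number; defaults: string[] };

// Walk the chain while tracking the intermediate schema after each step.
function requirements(chain: Step[], schema: Schema): Entry[] {
  let current: Schema = schema;
  const entries: Entry[] = [];
  for (let i = 0; i < chain.length; i++) {
    const { kind, id } = chain[i];
    if (kind === "RemoveVertex") {
      // Assumption: each outgoing edge of the removed vertex is one captured field.
      const fields = current.edges.filter((e) => e.from === id).map((e) => e.to);
      entries.push({ type: "DataCaptured", step: i + 1, fields });
      current = {
        vertices: current.vertices.filter((v) => v !== id),
        edges: current.edges.filter((e) => e.from !== id && e.to !== id),
      };
    } else {
      entries.push({ type: "DefaultsRequired", step: i + 1, defaults: [id] });
      current = { vertices: [...current.vertices, id], edges: current.edges };
    }
  }
  return entries;
}

// Total captured fields per record, summed over all steps.
function capturedCount(entries: Entry[]): number {
  return entries.reduce((n, e) => (e.type === "DataCaptured" ? n + e.fields.length : n), 0);
}

// Two schemas whose metadata vertex has n outgoing edges.
const schemaWith = (n: number): Schema => {
  const fields = Array.from({ length: n }, (_, k) => `meta${k}`);
  return {
    vertices: ["post", "metadata", ...fields],
    edges: [{ from: "post", to: "metadata" }, ...fields.map((f) => ({ from: "metadata", to: f }))],
  };
};
const chain: Step[] = [
  { kind: "RemoveVertex", id: "metadata" },
  { kind: "AddVertex", id: "version" },
];
const specA = requirements(chain, schemaWith(2));
const specB = requirements(chain, schemaWith(5));
```

Summing `capturedCount` over every target schema gives a per-record storage estimate for the whole fleet before any data is touched.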
17.11 When auto-generation fails
Auto-generation is not magic. It fails predictably.
17.11.1 No morphism exists
If the schemas are too different (different theories, no shared structure), find_morphisms returns empty. panproto falls back to overlap-based alignment (Section 17.5), but the resulting lens may be lossy in both directions.
17.11.2 Ambiguous restructuring
When a vertex could map to multiple targets (e.g., name could map to displayName or userName), the quality score may not disambiguate. Provide manual disambiguation:
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema, {
hints: { "name": "displayName" }, // manual disambiguation
});
17.11.3 Non-elementary transformations
Some schema changes cannot be decomposed into the 11 elementary schema transforms. A change that splits, renames, and retypes a vertex in a single step may not factor cleanly. In that case panproto generates a partial chain and reports the remainder:
warning: partial factorization
factored: 4 of 5 changes
unfactored: vertex "metadata" changed kind, name, and structure simultaneously
hint: break this change into smaller steps, or write a manual lens for this step
17.11.4 Manual composition for hard cases
When auto-generation fails, compose auto-generated steps with a manual protolens for the difficult part:
// Auto-generate what we can
const autoChain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema, {
exclude: ["metadata"], // skip the hard part
});
// Write the hard part manually
const manualStep = Protolens.custom({
precondition: (schema) => schema.hasVertex("metadata"),
forward: (schema) => { /* custom transformation */ },
backward: (schema) => { /* custom inverse */ },
});
// Compose
const fullChain = autoChain.append(manualStep);
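The mixed strategy can be sketched with a hypothetical model in which every step carries a precondition checked against the intermediate schema at instantiation time. `autoSteps` stands in for autoGenerate with `exclude` (it uses a name-based diff, an assumption of this sketch), only the forward direction is modeled, and all names are illustrative.

```typescript
// Hypothetical model of mixing generated and hand-written steps; forward direction only.
interface Schema {
  vertices: string[];
}

interface Step {
  name: string;
  precondition: (s: Schema) => boolean;
  forward: (s: Schema) => Schema;
}

const has = (s: Schema, id: string) => s.vertices.includes(id);

// Stand-in for autoGenerate with `exclude`: a name-based diff that skips excluded IDs.
function autoSteps(oldS: Schema, newS: Schema, exclude: string[]): Step[] {
  const skip = new Set(exclude);
  const removed = oldS.vertices.filter((v) => !has(newS, v) && !skip.has(v));
  const added = newS.vertices.filter((v) => !has(oldS, v) && !skip.has(v));
  return [
    ...removed.map((id): Step => ({
      name: `RemoveVertex(${id})`,
      precondition: (s) => has(s, id),
      forward: (s) => ({ vertices: s.vertices.filter((v) => v !== id) }),
    })),
    ...added.map((id): Step => ({
      name: `AddVertex(${id})`,
      precondition: (s) => !has(s, id),
      forward: (s) => ({ vertices: [...s.vertices, id] }),
    })),
  ];
}

// Apply steps in order, checking each precondition on the intermediate schema.
function instantiateAll(steps: Step[], schema: Schema): Schema {
  return steps.reduce((s, step) => {
    if (!step.precondition(s)) throw new Error(`precondition failed: ${step.name}`);
    return step.forward(s);
  }, schema);
}

const oldSchema: Schema = { vertices: ["post", "legacyId", "metadata"] };
const newSchema: Schema = { vertices: ["post", "version", "meta_json"] };

// The hard part, written by hand: metadata becomes meta_json.
const manualStep: Step = {
  name: "custom(metadata -> meta_json)",
  precondition: (s) => has(s, "metadata"),
  forward: (s) => ({ vertices: s.vertices.map((v) => (v === "metadata" ? "meta_json" : v)) }),
};

const fullChain = [...autoSteps(oldSchema, newSchema, ["metadata", "meta_json"]), manualStep];
const result = instantiateAll(fullChain, oldSchema);
```

In this model both sides of the hard change (`metadata` and `meta_json`) are excluded, so the generated steps never touch what the manual step owns; a schema without `metadata` fails at the manual step's precondition.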