```mermaid
flowchart LR
    S1[Schema v1] --> HOM["hom_search"]
    S2[Schema v2] --> HOM
    HOM --> MOR["TheoryMorphism"]
    MOR --> FAC["factorize"]
    FAC --> CHAIN["ProtolensChain"]
    CHAIN --> INST["instantiate"]
    INST --> LENS["Lens"]
    style LENS fill:#e8f4e8,stroke:#2d7d2d
```
20 Automated Lens Generation Pipeline
Hand-writing a lens for every schema change is viable when changes are infrequent and small. It stops being viable when a platform has hundreds of schemas evolving independently. The automated lens generation pipeline closes this gap: given two schemas, it searches for a structure-preserving map, decomposes it into elementary steps, converts each step to a protolens, and instantiates the resulting chain—no manual lens construction required.
For the protolens engine internals, see Chapter 19. For homomorphism search and the extend/restrict/complete operations, see Section 18.3.
20.1 End-to-end pipeline
The automated lens generation pipeline consists of five stages:
- Homomorphism search (`hom_search`): find a structure-preserving map between the source and target schemas, using the CSP-based backtracking search described in Section 18.3.
- Factorization (`factorize`): decompose the resulting theory morphism into a sequence of elementary transforms (drops, renames, and adds).
- Protolens chain construction (`diff_to_protolens`): map each elementary transform to an elementary protolens constructor.
- Instantiation: apply the protolens chain to the source schema, producing a concrete `Lens`.
- Application: use the lens's `get` and `put` to migrate data bidirectionally.
20.2 schema_to_implicit_theory: bridging schema-level to GAT-level
The pipeline's first bridge is `schema_to_implicit_theory`, which extracts a GAT from a schema's structural information:
- Each unique vertex kind becomes a sort (a type in a formal theory, not a sorting operation; here simple, with arity 0).
- Each unique edge kind becomes a unary operation from the source vertex’s kind to the target vertex’s kind.
This implicit theory is sufficient for constraint evaluation and morphism factorization. It doesn’t capture equations or dependent sorts; those are properties of the protocol’s schema theory, not of individual schemas.
```rust
fn schema_to_implicit_theory(schema: &Schema) -> Theory {
    // Collect unique vertex kinds as sorts
    // Collect unique edge kinds as operations
    Theory::new("implicit", sorts, ops, Vec::new())
}
```

The implicit theory is computed lazily during `Protolens::applicable_to` and isn't cached, since schemas are typically small and the extraction is \(O(|V| + |E|)\).
The implicit theory is a presentation of the schema’s structure in the language of GATs. It’s not the same as the protocol’s schema theory (which defines vertex kinds and edge rules globally). The implicit theory is local to a specific schema instance, capturing only the kinds that actually appear.
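As a concrete illustration, the extraction can be sketched with toy types. The `Schema`, `Edge`, and `ImplicitTheory` definitions below are simplified stand-ins, not the crate's real types:

```rust
use std::collections::BTreeSet;

// Hypothetical minimal types for illustration; the real `Schema` and
// `Theory` carry more structure (constraints, attributes, equations).
pub struct Edge {
    pub kind: String,
    pub src_kind: String, // vertex kind at the edge's source
    pub tgt_kind: String, // vertex kind at the edge's target
}

pub struct Schema {
    pub vertex_kinds: Vec<String>, // one entry per vertex; kinds may repeat
    pub edges: Vec<Edge>,
}

pub struct ImplicitTheory {
    pub sorts: BTreeSet<String>,
    pub ops: Vec<(String, String, String)>, // (op name, input sort, output sort)
}

/// Extract an implicit theory: unique vertex kinds become arity-0 sorts,
/// unique edge kinds become unary operations between the endpoint kinds.
pub fn schema_to_implicit_theory(schema: &Schema) -> ImplicitTheory {
    let sorts: BTreeSet<String> = schema.vertex_kinds.iter().cloned().collect();
    let mut seen = BTreeSet::new();
    let mut ops = Vec::new();
    for e in &schema.edges {
        // Only the first occurrence of each edge kind produces an operation.
        if seen.insert(e.kind.clone()) {
            ops.push((e.kind.clone(), e.src_kind.clone(), e.tgt_kind.clone()));
        }
    }
    ImplicitTheory { sorts, ops }
}
```

Both collection passes are a single scan, matching the \(O(|V| + |E|)\) bound noted above.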
20.3 The factorize algorithm
File: `crates/panproto-gat/src/factorize.rs`

The `factorize` function decomposes a theory morphism \(F: T_1 \to T_2\) into a sequence of elementary `TheoryEndofunctor` steps:
```rust
/// Decomposes a theory morphism into elementary endofunctor steps.
///
/// Returns [`GatError::FactorizationError`] if the morphism cannot be
/// factorized into a valid sequence of transforms.
pub fn factorize(
    morphism: &TheoryMorphism,
    domain: &Theory,
    codomain: &Theory,
) -> Result<Factorization, GatError> {
    let mut steps = Vec::new();
    // Phase 1: Identify renames
    let mut sort_renames: Vec<(Arc<str>, Arc<str>)> = Vec::new();
    let mut op_renames: Vec<(Arc<str>, Arc<str>)> = Vec::new();
    for (old, new) in &morphism.sort_map {
        if old != new && domain.has_sort(old) && codomain.has_sort(new) {
            sort_renames.push((Arc::clone(old), Arc::clone(new)));
        }
    }
    for (old, new) in &morphism.op_map {
        if old != new && domain.has_op(old) && codomain.has_op(new) {
            op_renames.push((Arc::clone(old), Arc::clone(new)));
        }
    }
    // Phase 2: Drops
    emit_drops(&mut steps, morphism, domain, codomain);
    // Phase 3: Renames
    emit_renames(&mut steps, &sort_renames, &op_renames);
    // Phase 4: Adds
    emit_adds(&mut steps, morphism, domain, codomain);
    Ok(Factorization {
        steps,
        domain: Arc::clone(&morphism.domain),
        codomain: Arc::clone(&morphism.codomain),
    })
}
```

20.3.1 Phase ordering
The factorization proceeds in four phases, ordered to maintain well-formedness at each intermediate step:
- Identify renames: sort-map and op-map entries where `old != new` and both names exist in their respective theories.
- Emit drops (equations first, then ops, then sorts): remove elements present in the domain but absent from the codomain. Dropping equations before operations, and operations before sorts, ensures the result stays well-formed.
- Emit renames: apply the identified sort and operation renames.
- Emit adds (sorts first, then ops, then equations): add elements present in the codomain but absent from the (renamed) domain. Sorts are topologically sorted by parameter dependencies to ensure a sort's parameters exist before the sort is added.

Dropping before renaming avoids name collisions: if we renamed first, a rename `A -> B` followed by a drop of `B` would drop the wrong element. Renaming before adding ensures the adds operate on the already-renamed theory, so new elements can reference renamed sorts without ambiguity.
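The collision argument can be made concrete in a toy model where a theory is just a set of sort names. The `rename` and `drop_sort` helpers below are illustrative, not the crate's API:

```rust
use std::collections::BTreeSet;

// Toy model: a theory is a set of sort names.
fn rename(sorts: &BTreeSet<String>, old: &str, new: &str) -> BTreeSet<String> {
    sorts.iter()
        .map(|s| if s == old { new.to_string() } else { s.clone() })
        .collect()
}

fn drop_sort(sorts: &BTreeSet<String>, name: &str) -> BTreeSet<String> {
    sorts.iter().filter(|s| s.as_str() != name).cloned().collect()
}

/// Migrating {A, B} to {B}: the stale B must be dropped and A renamed to B.
/// Returns the result of the correct order and of the wrong order.
fn demo() -> (BTreeSet<String>, BTreeSet<String>) {
    let start = BTreeSet::from(["A".to_string(), "B".to_string()]);
    // Correct: drop the stale B first, then rename A -> B.
    let correct = rename(&drop_sort(&start, "B"), "A", "B");
    // Wrong: renaming first collapses A into the existing B, and the
    // subsequent drop removes the element that was supposed to survive.
    let wrong = drop_sort(&rename(&start, "A", "B"), "B");
    (correct, wrong)
}
```

Running `demo` shows the wrong order leaves an empty theory, while the correct order yields exactly `{B}`.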
20.3.2 Topological sort of dependent sorts
When adding new sorts, the `emit_adds` function performs a topological sort based on `SortParam` dependencies:
```rust
fn emit_adds(
    steps: &mut Vec<TheoryEndofunctor>,
    morphism: &TheoryMorphism,
    domain: &Theory,
    codomain: &Theory,
) {
    let domain_sort_names_after_renames: FxHashSet<Arc<str>> = domain
        .sorts
        .iter()
        .map(|s| morphism.sort_map.get(&s.name).unwrap_or(&s.name).clone())
        .collect();
    let domain_op_names_after_renames: FxHashSet<Arc<str>> = domain
        .ops
        .iter()
        .map(|o| morphism.op_map.get(&o.name).unwrap_or(&o.name).clone())
        .collect();
    // Sorts to add (topologically sorted by parameter deps)
    let sorts_to_add: Vec<_> = codomain
        .sorts
        .iter()
        .filter(|s| !domain_sort_names_after_renames.contains(&s.name))
        .collect();
    let mut added_sorts = domain_sort_names_after_renames;
    let mut sorted_adds = Vec::new();
    let mut remaining = sorts_to_add;
    let max_iterations = remaining.len() + 1;
    for _ in 0..max_iterations {
        if remaining.is_empty() {
            break;
        }
        let (ready, not_ready): (Vec<_>, Vec<_>) = remaining
            .into_iter()
            .partition(|s| s.params.iter().all(|p| added_sorts.contains(&p.sort)));
        for sort in &ready {
            added_sorts.insert(Arc::clone(&sort.name));
            sorted_adds.push((*sort).clone());
        }
        remaining = not_ready;
    }
    for sort in sorted_adds {
        steps.push(TheoryEndofunctor {
            name: Arc::from(format!("add_sort_{}", sort.name)),
            precondition: TheoryConstraint::Unconstrained,
            transform: TheoryTransform::AddSort {
                sort,
                vertex_kind: None,
            },
        });
    }
    // ... op and equation adds follow the same pattern
}
```

A sort with parameters depending on other sorts must be added after those dependency sorts. The algorithm iterates until all additions are emitted, with a cycle guard (`max_iterations`) to prevent infinite loops on malformed inputs.
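The ordering pass can be sketched in isolation. The toy `topo_order` below mirrors the partition loop above on plain (name, dependencies) pairs rather than real `Sort` values:

```rust
/// Toy version of the `emit_adds` ordering pass: each pending entry is
/// (sort name, parameter sorts it depends on). Names are illustrative.
fn topo_order(pending: Vec<(&str, Vec<&str>)>, already_present: &[&str]) -> Vec<String> {
    let mut added: Vec<String> = already_present.iter().map(|s| s.to_string()).collect();
    let mut ordered = Vec::new();
    let mut remaining = pending;
    let max_iterations = remaining.len() + 1; // cycle guard
    for _ in 0..max_iterations {
        if remaining.is_empty() {
            break;
        }
        // Partition into sorts whose parameters all exist vs. the rest.
        let (ready, not_ready): (Vec<_>, Vec<_>) = remaining
            .into_iter()
            .partition(|(_, deps)| deps.iter().all(|d| added.iter().any(|a| a.as_str() == *d)));
        for (name, _) in &ready {
            added.push(name.to_string());
            ordered.push(name.to_string());
        }
        remaining = not_ready;
    }
    // Sorts stuck in a dependency cycle are never emitted; the guard
    // above simply stops iterating rather than looping forever.
    ordered
}
```

On `[("Pair", ["Elem"]), ("Elem", [])]` this emits `Elem` before `Pair`; on a cyclic input it terminates with the cyclic sorts left out.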
20.3.3 Factorization result
The `Factorization` struct holds the ordered sequence of steps plus domain and codomain names:

```rust
#[derive(Debug, Clone)]
pub struct Factorization {
    /// Ordered sequence of elementary endofunctors.
    pub steps: Vec<TheoryEndofunctor>,
    /// Domain theory name.
    pub domain: Arc<str>,
    /// Codomain theory name.
    pub codomain: Arc<str>,
}
```

20.3.4 Validation

`validate_factorization` applies the step sequence to the domain theory and checks that the result contains all sorts and operations from the codomain:
```rust
/// Validates a factorization by replaying its steps against the domain.
///
/// Returns [`GatError::FactorizationError`] if the factorized theory is
/// missing sorts or operations from the codomain.
pub fn validate_factorization(
    factorization: &Factorization,
    domain: &Theory,
    codomain: &Theory,
) -> Result<(), GatError> {
    let mut current = domain.clone();
    for step in &factorization.steps {
        current = step.transform.apply(&current)?;
    }
    for sort in &codomain.sorts {
        if !current.has_sort(&sort.name) {
            return Err(GatError::FactorizationError(format!(
                "factorized theory missing sort '{}' from codomain",
                sort.name
            )));
        }
    }
    for op in &codomain.ops {
        if !current.has_op(&op.name) {
            return Err(GatError::FactorizationError(format!(
                "factorized theory missing op '{}' from codomain",
                op.name
            )));
        }
    }
    Ok(())
}
```

This is an end-to-end correctness check: if validation passes, the factorization faithfully represents the original morphism.
20.4 diff_to_protolens: mapping transforms to protolenses
Each `TheoryEndofunctor` in a factorization maps directly to an elementary protolens constructor:
| `TheoryTransform` variant | `elementary::` constructor | Complement |
|---|---|---|
| `AddSort(s)` | `add_sort(s.name, s.name, Null)` | `Empty` |
| `DropSort(n)` | `drop_sort(n)` | `DroppedSortData` |
| `RenameSort{old,new}` | `rename_sort(old, new)` | `Empty` |
| `AddOp(op)` | `add_op(op.name, ...)` | `Empty` |
| `DropOp(n)` | `drop_op(n)` | `DroppedOpData` |
| `RenameOp{old,new}` | `rename_op(old, new)` | `Empty` |
| `AddEquation(eq)` | `add_equation(eq)` | `Empty` |
| `DropEquation(n)` | `drop_equation(n)` | `Empty` |
| `Pullback(morph)` | `pullback(morph)` | `Empty` |
The conversion is mechanical: each transform step becomes a protolens step with the same semantics. The resulting `Vec<Protolens>` is wrapped in a `ProtolensChain` for instantiation.
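The dispatch can be sketched with a hypothetical mini-enum covering a few of the variants above; the real `TheoryTransform` payloads and `elementary::` constructors carry full sort and operation data:

```rust
// Hypothetical mirror of the transform/protolens correspondence,
// reduced to four variants for illustration.
enum TheoryTransform {
    AddSort(String),
    DropSort(String),
    RenameSort { old: String, new: String },
    DropOp(String),
}

/// Returns (constructor description, complement kind) for a transform,
/// following the table above: drops capture data, everything else is Empty.
fn to_protolens(t: &TheoryTransform) -> (String, &'static str) {
    match t {
        TheoryTransform::AddSort(s) => (format!("add_sort({s})"), "Empty"),
        TheoryTransform::DropSort(n) => (format!("drop_sort({n})"), "DroppedSortData"),
        TheoryTransform::RenameSort { old, new } => {
            (format!("rename_sort({old}, {new})"), "Empty")
        }
        TheoryTransform::DropOp(n) => (format!("drop_op({n})"), "DroppedOpData"),
    }
}
```

The pattern generalizes: only lossy transforms (drops of sorts and ops) carry non-trivial complements, which is exactly what the complement column of the table records.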
20.5 Symmetric lens construction from overlap discovery
When two schemas overlap partially, the pipeline constructs a symmetric lens by:
- Discovering overlap: use `find_best_morphism` (from `panproto-mig/src/overlap.rs`) with `monic: true` to find the maximal common substructure.
- Building protolens chains for each direction: factorize the morphisms from each schema to the overlap, producing two protolens chains (one forward, one backward).¹
- Instantiating both chains: produce \(\text{Lens}(S_1, \text{overlap})\) and \(\text{Lens}(S_2, \text{overlap})\).
- Composing into a symmetric lens: combine the forward lens from \(S_1\) to the overlap with the backward lens from the overlap to \(S_2\).
The overlap serves as the “common view” that both schemas can project onto. Data not in the overlap is captured in the respective complements.
```mermaid
flowchart LR
    S1["Schema S1"] -->|"project"| O["Overlap"]
    S2["Schema S2"] -->|"project"| O
    O -->|"embed"| S1
    O -->|"embed"| S2
    style O fill:#e8f4e8,stroke:#2d7d2d
```
20.6 WASM boundary
Protolens chains cross the WASM boundary via handle-based allocation, consistent with panproto’s handle/slab architecture (see Chapter 13):
- `create_protolens_chain`: allocates a `ProtolensChain` in the Rust slab and returns a `u32` handle to JavaScript.
- `instantiate_protolens`: takes a chain handle and a schema handle, instantiates the chain, and returns a lens handle.
- `protolens_chain_info`: returns metadata (step count, complement types) as a MessagePack-encoded blob.
The `ComplementConstructor` enum is serialized via serde into MessagePack for transmission across the WASM boundary. The TypeScript SDK deserializes it into a discriminated union type.
Protolens chains are lightweight on the Rust side (no computation until instantiation), so it’s cheap to create multiple chains speculatively in JavaScript, evaluate their applicability, and instantiate only the best match.
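A minimal sketch of the handle/slab pattern, assuming a simplified `Slab` type (the real slab and WASM exports are more involved): objects live on the Rust side and callers hold opaque `u32` handles.

```rust
/// Illustrative slab: a growable vector of optional slots, where a
/// handle is simply the slot index.
struct Slab<T> {
    entries: Vec<Option<T>>,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { entries: Vec::new() }
    }

    /// Store a value and return its handle.
    fn insert(&mut self, value: T) -> u32 {
        // Reuse a freed slot if one exists, otherwise grow the vector.
        if let Some(i) = self.entries.iter().position(Option::is_none) {
            self.entries[i] = Some(value);
            i as u32
        } else {
            self.entries.push(Some(value));
            (self.entries.len() - 1) as u32
        }
    }

    /// Resolve a handle back to a reference, if the slot is still alive.
    fn get(&self, handle: u32) -> Option<&T> {
        self.entries.get(handle as usize).and_then(Option::as_ref)
    }
}
```

Because only `u32` handles cross the boundary, JavaScript never sees Rust pointers, and stale handles resolve to `None` instead of dangling references.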
20.7 CLI handlers
The CLI exposes protolens functionality through several subcommands:
| Command | Description |
|---|---|
| `prot convert` | Convert data between schema versions using the automated pipeline |
| `prot lens` | Build a lens from a combinator chain or protolens specification |
| `prot lens-apply` | Apply a lens to an instance (get or put direction) |
| `prot lens-verify` | Verify GetPut and PutGet laws on a lens with test data |
| `prot lens-compose` | Compose two lenses sequentially |
| `prot lens-diff` | Compute the protolens chain between two schemas |
The `lens-diff` command is the primary entry point for the automated pipeline:

```sh
prot lens-diff --src schema-v1.json --tgt schema-v2.json --protocol json-schema
```

This runs the full pipeline (`hom_search`, `factorize`, `diff_to_protolens`) and outputs the protolens chain as a JSON array of steps. Each step includes the transform names, preconditions, and complement constructors, providing a human-readable audit trail of the derived migration.
The `lens-verify` command property-tests the lens laws:

```sh
prot lens-verify --lens lens.json --instance data.json --iterations 1000
```

This generates random modifications to the instance data and verifies that GetPut and PutGet hold for each modification.
20.8 Fleet application
`apply_to_fleet` iterates over a collection of schemas, checks each against the chain's preconditions, and either instantiates the chain or records the failure:
```rust
pub fn apply_to_fleet(
    chain: &ProtolensChain,
    schemas: &[Schema],
    protocol: &Protocol,
) -> FleetResult {
    let mut applied = Vec::new();
    let mut skipped = Vec::new();
    for schema in schemas {
        let failures = chain.check_applicability(schema);
        if failures.is_empty() {
            match chain.instantiate(schema, protocol) {
                Ok(lens) => applied.push(FleetEntry { schema, lens }),
                Err(e) => skipped.push(SkippedEntry {
                    schema,
                    reasons: vec![e.to_string()],
                }),
            }
        } else {
            skipped.push(SkippedEntry {
                schema,
                reasons: failures.iter().map(|f| f.to_string()).collect(),
            });
        }
    }
    FleetResult { applied, skipped }
}
```

`FleetResult` partitions outcomes into `applied` (successful instantiations) and `skipped` (schemas that failed precondition checks or instantiation). Failed instantiations are skipped rather than fatal; a single broken schema doesn't abort the fleet migration.
20.9 Complement state tracking
`chain_complement_spec` advances a `current_schema` through each step's target transform before evaluating the next step's complement, so each step's complement is computed against the schema as it exists after all preceding steps. This means a `drop_sort("metadata")` followed by an `add_sort("metadata_v2")` correctly reports the drop's captured data count from the original schema and the add's default requirement from the intermediate schema.
The `AddedElement` variant of `ComplementConstructor` handles `add_sort` and `add_op` protolenses, which require default values at instantiation time. `AddedElement { name, kind }` records what the user must supply, and `chain.requirements()` aggregates these into a complete list of defaults needed for the chain to instantiate.
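A minimal sketch of the aggregation, assuming a simplified two-variant `ComplementConstructor` (the real enum has more variants, such as `DroppedSortData` and `DroppedOpData`):

```rust
// Simplified stand-ins for illustration.
enum ComplementConstructor {
    Empty,
    AddedElement { name: String, kind: String },
}

struct Step {
    complement: ComplementConstructor,
}

/// Collect the (name, kind) pairs a user must supply defaults for,
/// in the spirit of `chain.requirements()`.
fn requirements(steps: &[Step]) -> Vec<(String, String)> {
    steps
        .iter()
        .filter_map(|s| match &s.complement {
            ComplementConstructor::AddedElement { name, kind } => {
                Some((name.clone(), kind.clone()))
            }
            ComplementConstructor::Empty => None,
        })
        .collect()
}
```

A chain with one `Empty` step and one `AddedElement` step therefore reports exactly one requirement.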
20.10 Serialization format
Protolens chains serialize to JSON with the following structure:
```json
{
  "steps": [
    {
      "name": "rename_sort",
      "source": {
        "transform": "Identity",
        "constraint": "Unconstrained"
      },
      "target": {
        "transform": { "RenameSort": { "old": "Vertex", "new": "Node" } },
        "constraint": { "HasSort": "Vertex" }
      },
      "complement_constructor": "Empty"
    },
    {
      "name": "drop_op",
      "source": {
        "transform": "Identity",
        "constraint": "Unconstrained"
      },
      "target": {
        "transform": { "DropOp": "legacyEdge" },
        "constraint": { "HasOp": "legacyEdge" }
      },
      "complement_constructor": { "DroppedOpData": { "op": "legacyEdge" } }
    }
  ]
}
```

Each step includes its name, source and target endofunctors (transform + constraint), and complement constructor. The `from_json(to_json(chain))` round-trip produces a chain that is structurally equal to the original: same step count, same transforms, same constraints, same complement constructors.
Individual protolenses also support `to_json()` / `from_json()` with the same per-step structure (minus the outer `steps` array).
20.11 Testing strategy
The protolens engine uses three testing approaches:
20.11.1 Unit tests
Each elementary protolens constructor has unit tests verifying:
- Applicability: the protolens is applicable to schemas containing the relevant sorts/ops and not applicable otherwise.
- Instantiation correctness: the resulting lens has the expected source and target schemas.
- Losslessness classification: `is_lossless()` returns the correct value.
20.11.2 Commutativity checks
Commutativity is tested by instantiating the same protolens at multiple related schemas and verifying that the resulting lenses commute with schema morphisms between those schemas. The test fixture generates a family of schemas of increasing complexity and checks commutativity for each pair.
20.11.3 Round-trip verification
The `check_laws` function from `panproto-lens/src/laws.rs` is applied to instantiated protolenses:
- Generate a random instance conforming to the source schema.
- Apply `get` to produce a view and complement.
- Apply `put` with the unmodified view and complement.
- Verify that the restored instance equals the original (GetPut).
- Apply `get` to the restored instance.
- Verify that the new view and complement equal the originals (PutGet).
20.11.4 Property-based tests
The integration test suite (`tests/integration/tests/lens_laws.rs`) includes property-based tests that:
- Generate random schema pairs by applying random sequences of elementary protolens constructors to a base schema.
- Factorize the resulting morphism.
- Verify that `validate_factorization` passes.
- Instantiate the protolens chain and verify lens laws.
This provides coverage over the full pipeline, catching regressions in factorization ordering, schema transformation, and migration derivation.
¹ Each direction involves a morphism from a full schema to the overlap subschema. Factorizing these morphisms into elementary transforms and mapping them to protolenses yields two chains whose composition forms the symmetric lens.