20  Automated Lens Generation Pipeline

Hand-writing a lens for every schema change is viable when changes are infrequent and small. It stops being viable when a platform has hundreds of schemas evolving independently. The automated lens generation pipeline closes this gap: given two schemas, it searches for a structure-preserving map, decomposes it into elementary steps, converts each step to a protolens, and instantiates the resulting chain—no manual lens construction required.

For the protolens engine internals, see Chapter 19. For homomorphism search and the extend/restrict/complete operations, see Section 18.3.

20.1 End-to-end pipeline

The automated lens generation pipeline consists of five stages:

flowchart LR
    S1[Schema v1] --> HOM["hom_search"]
    S2[Schema v2] --> HOM
    HOM --> MOR["TheoryMorphism"]
    MOR --> FAC["factorize"]
    FAC --> CHAIN["ProtolensChain"]
    CHAIN --> INST["instantiate"]
    INST --> LENS["Lens"]

    style LENS fill:#e8f4e8,stroke:#2d7d2d
Figure 20.1: End-to-end automated lens generation pipeline. Each stage transforms the representation one step closer to an executable lens.
  1. Homomorphism search (hom_search): find a structure-preserving map between the source and target schemas. Uses the CSP-based backtracking search described in Section 18.3.
  2. Factorization (factorize): decompose the resulting theory morphism into a sequence of elementary transforms (drops, renames, and adds).
  3. Protolens chain construction (diff_to_protolens): map each elementary transform to an elementary protolens constructor.
  4. Instantiation: apply the protolens chain to the source schema, producing a concrete Lens.
  5. Application: use the lens’s get and put to migrate data bidirectionally.
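The five stages compose into one fallible function. A minimal sketch of the dataflow with stand-in types — every type and function body here is a placeholder, not panproto's real API; only the stage names come from Figure 20.1:

```rust
// Stand-in types; the real definitions live in the panproto crates.
struct Schema;
struct TheoryMorphism;
struct Factorization;
struct ProtolensChain;
struct Lens;
#[derive(Debug)]
struct PipelineError;

// Placeholder stage bodies; only the dataflow is meaningful here.
fn hom_search(_src: &Schema, _tgt: &Schema) -> Result<TheoryMorphism, PipelineError> {
    Ok(TheoryMorphism)
}
fn factorize(_m: &TheoryMorphism) -> Result<Factorization, PipelineError> {
    Ok(Factorization)
}
fn diff_to_protolens(_f: &Factorization) -> ProtolensChain {
    ProtolensChain
}
fn instantiate(_chain: &ProtolensChain, _src: &Schema) -> Result<Lens, PipelineError> {
    Ok(Lens)
}

// Stages 1-4 chained with `?`; stage 5 applies the returned lens's get/put.
fn derive_lens(src: &Schema, tgt: &Schema) -> Result<Lens, PipelineError> {
    let morphism = hom_search(src, tgt)?;
    let factorization = factorize(&morphism)?;
    let chain = diff_to_protolens(&factorization);
    instantiate(&chain, src)
}
```

Any stage can fail (no homomorphism found, morphism not factorizable), so the whole derivation is a `Result`.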

20.2 schema_to_implicit_theory: bridging schema-level to GAT-level

The pipeline’s first bridge is schema_to_implicit_theory, which extracts a GAT from a schema’s structural information:

  • Each unique vertex kind becomes a sort (a type in a formal theory, not a sorting operation; here simple, with arity 0).
  • Each unique edge kind becomes a unary operation from the source vertex’s kind to the target vertex’s kind.

This implicit theory is sufficient for constraint evaluation and morphism factorization. It doesn’t capture equations or dependent sorts; those are properties of the protocol’s schema theory, not of individual schemas.

fn schema_to_implicit_theory(schema: &Schema) -> Theory {
    // Collect unique vertex kinds as sorts
    // Collect unique edge kinds as operations
    Theory::new("implicit", sorts, ops, Vec::new())
}

The implicit theory is computed lazily during Protolens::applicable_to and isn’t cached, since schemas are typically small and the extraction is \(O(|V| + |E|)\).
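The extraction is a single pass over vertices and edges. A self-contained sketch with minimal stand-in types — the field names here are illustrative, not the real Schema layout:

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the schema's structural information.
struct Edge {
    kind: String,
    src_kind: String,
    tgt_kind: String,
}
struct Schema {
    vertex_kinds: Vec<String>,
    edges: Vec<Edge>,
}

/// One O(|V| + |E|) pass: unique vertex kinds become sorts (arity 0),
/// unique edge kinds become unary operations between those sorts.
fn implicit_theory(schema: &Schema) -> (BTreeSet<String>, BTreeSet<(String, String, String)>) {
    let sorts: BTreeSet<String> = schema.vertex_kinds.iter().cloned().collect();
    let ops: BTreeSet<(String, String, String)> = schema
        .edges
        .iter()
        .map(|e| (e.kind.clone(), e.src_kind.clone(), e.tgt_kind.clone()))
        .collect();
    (sorts, ops)
}
```

Deduplication falls out of collecting into sets: repeated vertex kinds yield a single sort.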

Note

The implicit theory is a presentation of the schema’s structure in the language of GATs. It’s not the same as the protocol’s schema theory (which defines vertex kinds and edge rules globally). The implicit theory is local to a specific schema instance, capturing only the kinds that actually appear.

20.3 The factorize algorithm

File: crates/panproto-gat/src/factorize.rs

The factorize function decomposes a theory morphism \(F: T_1 \to T_2\) into a sequence of elementary TheoryEndofunctor steps:

///
/// Returns [`GatError::FactorizationError`] if the morphism cannot be
/// factorized into a valid sequence of transforms.
pub fn factorize(
    morphism: &TheoryMorphism,
    domain: &Theory,
    codomain: &Theory,
) -> Result<Factorization, GatError> {
    let mut steps = Vec::new();

    // Phase 1: Identify renames
    let mut sort_renames: Vec<(Arc<str>, Arc<str>)> = Vec::new();
    let mut op_renames: Vec<(Arc<str>, Arc<str>)> = Vec::new();

    for (old, new) in &morphism.sort_map {
        if old != new && domain.has_sort(old) && codomain.has_sort(new) {
            sort_renames.push((Arc::clone(old), Arc::clone(new)));
        }
    }
    for (old, new) in &morphism.op_map {
        if old != new && domain.has_op(old) && codomain.has_op(new) {
            op_renames.push((Arc::clone(old), Arc::clone(new)));
        }
    }

    // Phase 2: Drops
    emit_drops(&mut steps, morphism, domain, codomain);

    // Phase 3: Renames
    emit_renames(&mut steps, &sort_renames, &op_renames);

    // Phase 4: Adds
    emit_adds(&mut steps, morphism, domain, codomain);

    Ok(Factorization {
        steps,
        domain: Arc::clone(&morphism.domain),
        codomain: Arc::clone(&morphism.codomain),
    })
}

20.3.1 Phase ordering

The factorization proceeds in four phases, ordered to maintain well-formedness at each intermediate step:

  1. Identify renames: sort-map and op-map entries where old != new and both names exist in their respective theories.
  2. Emit drops (equations first, then ops, then sorts): remove elements present in the domain but absent from the codomain. Dropping equations before operations, and operations before sorts, ensures the result stays well-formed.
  3. Emit renames: apply the identified sort and operation renames.
  4. Emit adds (sorts first, then ops, then equations): add elements present in the codomain but absent from the (renamed) domain. Sorts are topologically sorted by parameter dependencies to ensure a sort’s parameters exist before the sort is added.
Tip

Why this specific phase order?

Dropping before renaming avoids name collisions: if we renamed first, a rename A -> B followed by a drop of B would drop the wrong element. Renaming before adding ensures the adds operate on the already-renamed theory, so new elements can reference renamed sorts without ambiguity.

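The collision can be reproduced on a toy name set. A sketch (hypothetical sort names) showing that rename-then-drop removes the wrong element while drop-then-rename does not:

```rust
use std::collections::BTreeSet;

fn rename(sorts: &mut BTreeSet<String>, old: &str, new: &str) {
    if sorts.remove(old) {
        sorts.insert(new.to_string());
    }
}

fn drop_sort(sorts: &mut BTreeSet<String>, name: &str) {
    sorts.remove(name);
}

/// Start with sorts {A, B}; the morphism renames A -> B and drops the original B.
fn demo(drop_first: bool) -> BTreeSet<String> {
    let mut sorts: BTreeSet<String> = ["A", "B"].iter().map(|s| s.to_string()).collect();
    if drop_first {
        drop_sort(&mut sorts, "B");   // drop the original B first...
        rename(&mut sorts, "A", "B"); // ...then rename: {B}, as intended
    } else {
        rename(&mut sorts, "A", "B"); // A collides with the existing B: {B}
        drop_sort(&mut sorts, "B");   // the drop now removes the renamed A too: {}
    }
    sorts
}
```

Drop-first leaves the intended single sort; rename-first silently destroys the renamed element.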
20.3.2 Topological sort of dependent sorts

When adding new sorts, the emit_adds function performs a topological sort based on SortParam dependencies:

fn emit_adds(
    steps: &mut Vec<TheoryEndofunctor>,
    morphism: &TheoryMorphism,
    domain: &Theory,
    codomain: &Theory,
) {
    let domain_sort_names_after_renames: FxHashSet<Arc<str>> = domain
        .sorts
        .iter()
        .map(|s| morphism.sort_map.get(&s.name).unwrap_or(&s.name).clone())
        .collect();
    let domain_op_names_after_renames: FxHashSet<Arc<str>> = domain
        .ops
        .iter()
        .map(|o| morphism.op_map.get(&o.name).unwrap_or(&o.name).clone())
        .collect();

    // Sorts to add (topologically sorted by parameter deps)
    let sorts_to_add: Vec<_> = codomain
        .sorts
        .iter()
        .filter(|s| !domain_sort_names_after_renames.contains(&s.name))
        .collect();
    let mut added_sorts = domain_sort_names_after_renames;
    let mut sorted_adds = Vec::new();
    let mut remaining = sorts_to_add;
    let max_iterations = remaining.len() + 1;
    for _ in 0..max_iterations {
        if remaining.is_empty() {
            break;
        }
        let (ready, not_ready): (Vec<_>, Vec<_>) = remaining
            .into_iter()
            .partition(|s| s.params.iter().all(|p| added_sorts.contains(&p.sort)));
        for sort in &ready {
            added_sorts.insert(Arc::clone(&sort.name));
            sorted_adds.push((*sort).clone());
        }
        remaining = not_ready;
    }
    for sort in sorted_adds {
        steps.push(TheoryEndofunctor {
            name: Arc::from(format!("add_sort_{}", sort.name)),
            precondition: TheoryConstraint::Unconstrained,
            transform: TheoryTransform::AddSort {
                sort,
                vertex_kind: None,
            },
        });
    }

    // ... then emit op adds, then equation adds (elided).
}

A sort with parameters depending on other sorts must be added after those dependency sorts. The algorithm iterates until all additions are emitted, with a cycle guard (max_iterations) to prevent infinite loops in malformed inputs.
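The partition loop can be exercised in isolation. A sketch with minimal Sort/Param stand-ins (names hypothetical), showing both the dependency ordering and the cycle guard, which leaves cyclic sorts unemitted rather than looping forever:

```rust
#[derive(Clone)]
struct Param {
    sort: String,
}
#[derive(Clone)]
struct Sort {
    name: String,
    params: Vec<Param>,
}

/// Partition-based topological sort with a cycle guard, mirroring emit_adds.
/// Returns the add order; sorts stuck in a dependency cycle are never emitted.
fn topo_order(to_add: Vec<Sort>, mut present: std::collections::BTreeSet<String>) -> Vec<String> {
    let mut order = Vec::new();
    let mut remaining = to_add;
    let max_iterations = remaining.len() + 1;
    for _ in 0..max_iterations {
        if remaining.is_empty() {
            break;
        }
        // A sort is ready once every parameter sort is already present.
        let (ready, not_ready): (Vec<_>, Vec<_>) = remaining
            .into_iter()
            .partition(|s| s.params.iter().all(|p| present.contains(&p.sort)));
        for s in &ready {
            present.insert(s.name.clone());
            order.push(s.name.clone());
        }
        remaining = not_ready;
    }
    order
}
```

Each pass emits everything that has become ready, so the loop terminates in at most `len + 1` iterations even on malformed input.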

20.3.3 Factorization result

The Factorization struct holds the ordered sequence of steps plus domain and codomain names:

#[derive(Debug, Clone)]
pub struct Factorization {
    /// Ordered sequence of elementary endofunctors.
    pub steps: Vec<TheoryEndofunctor>,
    /// Domain theory name.
    pub domain: Arc<str>,
    /// Codomain theory name.
    pub codomain: Arc<str>,
}

20.3.4 Validation

validate_factorization applies the step sequence to the domain theory and checks that the result contains all sorts and operations from the codomain:

///
/// Returns [`GatError::FactorizationError`] if the factorized theory is
/// missing sorts or operations from the codomain.
pub fn validate_factorization(
    factorization: &Factorization,
    domain: &Theory,
    codomain: &Theory,
) -> Result<(), GatError> {
    let mut current = domain.clone();
    for step in &factorization.steps {
        current = step.transform.apply(&current)?;
    }
    for sort in &codomain.sorts {
        if !current.has_sort(&sort.name) {
            return Err(GatError::FactorizationError(format!(
                "factorized theory missing sort '{}' from codomain",
                sort.name
            )));
        }
    }
    for op in &codomain.ops {
        if !current.has_op(&op.name) {
            return Err(GatError::FactorizationError(format!(
                "factorized theory missing op '{}' from codomain",
                op.name
            )));
        }

This is an end-to-end correctness check: if validation passes, the factorization faithfully represents the original morphism.

20.4 diff_to_protolens: mapping transforms to protolenses

Each TheoryEndofunctor in a factorization maps directly to an elementary protolens constructor:

TheoryTransform variant    elementary:: constructor          Complement
AddSort(s)                 add_sort(s.name, s.name, Null)    Empty
DropSort(n)                drop_sort(n)                      DroppedSortData
RenameSort{old,new}        rename_sort(old, new)             Empty
AddOp(op)                  add_op(op.name, ...)              Empty
DropOp(n)                  drop_op(n)                        DroppedOpData
RenameOp{old,new}          rename_op(old, new)               Empty
AddEquation(eq)            add_equation(eq)                  Empty
DropEquation(n)            drop_equation(n)                  Empty
Pullback(morph)            pullback(morph)                   Empty

The conversion is mechanical: each transform step becomes a protolens step with the same semantics. The resulting Vec<Protolens> is wrapped in a ProtolensChain for instantiation.
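The mapping is a single match over the transform variant. A sketch with simplified enums — the real TheoryTransform and ComplementConstructor carry more variants and data:

```rust
// Simplified stand-ins for the transform and complement enums.
enum TheoryTransform {
    AddSort(String),
    DropSort(String),
    RenameSort { old: String, new: String },
    DropOp(String),
}

#[derive(Debug, PartialEq)]
enum Complement {
    Empty,
    DroppedSortData,
    DroppedOpData,
}

/// Each transform determines its complement constructor mechanically:
/// drops capture the removed data; the other variants shown need no complement.
fn complement_for(t: &TheoryTransform) -> Complement {
    match t {
        TheoryTransform::DropSort(_) => Complement::DroppedSortData,
        TheoryTransform::DropOp(_) => Complement::DroppedOpData,
        TheoryTransform::AddSort(_) | TheoryTransform::RenameSort { .. } => Complement::Empty,
    }
}
```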

20.5 Symmetric lens construction from overlap discovery

When two schemas overlap partially, the pipeline constructs a symmetric lens by:

  1. Discovering overlap: use find_best_morphism (from panproto-mig/src/overlap.rs) with monic: true to find the maximal common substructure.
  2. Building protolens chains for each direction: factorize the morphisms from each schema to the overlap, producing two protolens chains (one forward, one backward).1
  3. Instantiating both chains: produce \(\text{Lens}(S_1, \text{overlap})\) and \(\text{Lens}(S_2, \text{overlap})\).
  4. Composing into a symmetric lens: combine the forward lens from \(S_1\) to the overlap with the backward lens from the overlap to \(S_2\).

The overlap serves as the “common view” that both schemas can project onto. Data not in the overlap is captured in the respective complements.

flowchart LR
    S1["Schema S1"] -->|"project"| O["Overlap"]
    S2["Schema S2"] -->|"project"| O
    O -->|"embed"| S1
    O -->|"embed"| S2

    style O fill:#e8f4e8,stroke:#2d7d2d
Figure 20.2: Symmetric lens construction via overlap discovery. The overlap is the greatest common subschema.
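Concretely, a symmetric lens built this way keeps one complement per side. A toy sketch (hypothetical record shapes, not panproto types) where two schemas share only a name field:

```rust
// Toy schemas: S1 = (name, age), S2 = (name, email); overlap = name.
#[derive(Debug, Clone, PartialEq)]
struct S1 {
    name: String,
    age: u32,
}
#[derive(Debug, Clone, PartialEq)]
struct S2 {
    name: String,
    email: String,
}

/// Push an S1 update across to S2: project onto the overlap (name),
/// then restore S2's private data (email) from the stored complement.
fn put_right(s1: &S1, s2_complement: &str) -> S2 {
    S2 { name: s1.name.clone(), email: s2_complement.to_string() }
}

/// Push an S2 update back to S1, restoring S1's private data (age).
fn put_left(s2: &S2, s1_complement: u32) -> S1 {
    S1 { name: s2.name.clone(), age: s1_complement }
}
```

Round-tripping through the other side restores the original record because each side's non-overlap data survives in its complement.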

20.6 WASM boundary

Protolens chains cross the WASM boundary via handle-based allocation, consistent with panproto’s handle/slab architecture (see Chapter 13):

  • create_protolens_chain: allocates a ProtolensChain in the Rust slab, returns a u32 handle to JavaScript.
  • instantiate_protolens: takes a chain handle and a schema handle, instantiates the chain, and returns a lens handle.
  • protolens_chain_info: returns metadata (step count, complement types) as a MessagePack-encoded blob.
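The handle pattern itself is a slab of optional slots indexed by u32. A minimal sketch of the pattern, not panproto's actual slab implementation:

```rust
/// Slab allocator: objects stay in Rust; JavaScript only ever sees u32 handles.
struct Slab<T> {
    items: Vec<Option<T>>,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { items: Vec::new() }
    }

    /// Insert into the first free slot (or grow), returning the handle.
    fn insert(&mut self, item: T) -> u32 {
        for (i, slot) in self.items.iter_mut().enumerate() {
            if slot.is_none() {
                *slot = Some(item);
                return i as u32;
            }
        }
        self.items.push(Some(item));
        (self.items.len() - 1) as u32
    }

    fn get(&self, handle: u32) -> Option<&T> {
        self.items.get(handle as usize).and_then(|s| s.as_ref())
    }

    /// Free the slot; the handle becomes invalid and the slot is reusable.
    fn remove(&mut self, handle: u32) -> Option<T> {
        self.items.get_mut(handle as usize).and_then(|s| s.take())
    }
}
```

Stale handles resolve to `None` rather than dangling pointers, which is what makes the pattern safe across the WASM boundary.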

The ComplementConstructor enum is serialized via serde into MessagePack for transmission across the WASM boundary. The TypeScript SDK deserializes it into a discriminated union type.

Tip

Protolens chains are lightweight on the Rust side (no computation until instantiation), so it’s cheap to create multiple chains speculatively in JavaScript, evaluate their applicability, and instantiate only the best match.

20.7 CLI handlers

The CLI exposes protolens functionality through several subcommands:

Command             Description
prot convert        Convert data between schema versions using the automated pipeline
prot lens           Build a lens from a combinator chain or protolens specification
prot lens-apply     Apply a lens to an instance (get or put direction)
prot lens-verify    Verify GetPut and PutGet laws on a lens with test data
prot lens-compose   Compose two lenses sequentially
prot lens-diff      Compute the protolens chain between two schemas

The lens-diff command is the primary entry point for the automated pipeline:

prot lens-diff --src schema-v1.json --tgt schema-v2.json --protocol json-schema

This runs the full pipeline (hom_search, factorize, diff_to_protolens) and outputs the protolens chain as a JSON array of steps. Each step includes the transform names, preconditions, and complement constructors, providing a human-readable audit trail of the derived migration.

The lens-verify command property-tests the lens laws:

prot lens-verify --lens lens.json --instance data.json --iterations 1000

This generates random modifications to the instance data and verifies that GetPut and PutGet hold for each modification.

20.8 Fleet application

apply_to_fleet iterates over a collection of schemas, checks each against the chain’s preconditions, and either instantiates the chain or records the failure:

pub fn apply_to_fleet(
    chain: &ProtolensChain,
    schemas: &[Schema],
    protocol: &Protocol,
) -> FleetResult {
    let mut applied = Vec::new();
    let mut skipped = Vec::new();
    for schema in schemas {
        let failures = chain.check_applicability(schema);
        if failures.is_empty() {
            match chain.instantiate(schema, protocol) {
                Ok(lens) => applied.push(FleetEntry { schema, lens }),
                Err(e) => skipped.push(SkippedEntry {
                    schema,
                    reasons: vec![e.to_string()],
                }),
            }
        } else {
            skipped.push(SkippedEntry {
                schema,
                reasons: failures.iter().map(|f| f.to_string()).collect(),
            });
        }
    }
    FleetResult { applied, skipped }
}

FleetResult partitions outcomes into applied (successful instantiations) and skipped (schemas that failed precondition checks or instantiation). Failed instantiations are skipped rather than fatal; a single broken schema doesn’t abort the fleet migration.

20.9 Complement state tracking

chain_complement_spec advances a current_schema through each step’s target transform before evaluating the next step’s complement. Each step’s complement is computed against the schema as it exists after all preceding steps. This means a drop_sort("metadata") followed by add_sort("metadata_v2") correctly reports the drop’s captured data count from the original schema and the add’s default requirement from the intermediate schema.

The AddedElement variant of ComplementConstructor handles add_sort and add_op protolenses. These require default values at instantiation time. AddedElement { name, kind } records what the user must supply, and chain.requirements() aggregates these into a complete list of defaults needed for the chain to instantiate.
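Aggregating these requirements is a filter-map over the chain's complement constructors. A sketch with a simplified enum (the real ComplementConstructor has more variants):

```rust
#[derive(Debug, Clone, PartialEq)]
enum ComplementConstructor {
    Empty,
    DroppedSortData { sort: String },
    AddedElement { name: String, kind: String },
}

/// Collect the defaults a user must supply before the chain can instantiate:
/// one (name, kind) entry per AddedElement step, in chain order.
fn requirements(chain: &[ComplementConstructor]) -> Vec<(String, String)> {
    chain
        .iter()
        .filter_map(|c| match c {
            ComplementConstructor::AddedElement { name, kind } => {
                Some((name.clone(), kind.clone()))
            }
            _ => None,
        })
        .collect()
}
```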

20.10 Serialization format

Protolens chains serialize to JSON with the following structure:

{
  "steps": [
    {
      "name": "rename_sort",
      "source": {
        "transform": "Identity",
        "constraint": "Unconstrained"
      },
      "target": {
        "transform": { "RenameSort": { "old": "Vertex", "new": "Node" } },
        "constraint": { "HasSort": "Vertex" }
      },
      "complement_constructor": "Empty"
    },
    {
      "name": "drop_op",
      "source": {
        "transform": "Identity",
        "constraint": "Unconstrained"
      },
      "target": {
        "transform": { "DropOp": "legacyEdge" },
        "constraint": { "HasOp": "legacyEdge" }
      },
      "complement_constructor": { "DroppedOpData": { "op": "legacyEdge" } }
    }
  ]
}

Each step includes its name, source and target endofunctors (transform + constraint), and complement constructor. The from_json(to_json(chain)) round-trip produces a chain that is structurally equal to the original: same step count, same transforms, same constraints, same complement constructors.

Individual protolenses also support to_json() / from_json() with the same per-step structure (minus the outer steps array).

20.11 Testing strategy

The protolens engine uses three testing approaches:

20.11.1 Unit tests

Each elementary protolens constructor has unit tests verifying:

  • Applicability: the protolens is applicable to schemas containing the relevant sorts/ops and not applicable otherwise.
  • Instantiation correctness: the resulting lens has the expected source and target schemas.
  • Losslessness classification: is_lossless() returns the correct value.

20.11.2 Commutativity checks

Commutativity is tested by instantiating the same protolens at multiple related schemas and verifying that the resulting lenses commute with schema morphisms between those schemas. The test fixture generates a family of schemas of increasing complexity and checks commutativity for each pair.

20.11.3 Round-trip verification

The check_laws function from panproto-lens/src/laws.rs is applied to instantiated protolenses:

  1. Generate a random instance conforming to the source schema.
  2. Apply get to produce a view and complement.
  3. Apply put with the unmodified view and complement.
  4. Verify that the restored instance equals the original (GetPut).
  5. Apply get to the restored instance.
  6. Verify that the new view and complement equal the originals (PutGet).
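The six steps can be exercised on a toy lens. A sketch using a hypothetical field-projection lens (not the real check_laws) that checks GetPut and PutGet for one instance:

```rust
// Toy instance: source = (name, age); view = name; complement = age.
#[derive(Debug, Clone, PartialEq)]
struct Source {
    name: String,
    age: u32,
}

fn get(s: &Source) -> (String, u32) {
    (s.name.clone(), s.age) // (view, complement)
}

fn put(view: &str, complement: u32) -> Source {
    Source { name: view.to_string(), age: complement }
}

/// Steps 2-6 above for a single instance: GetPut, then PutGet.
fn check_laws(original: &Source) -> bool {
    let (view, comp) = get(original);          // step 2
    let restored = put(&view, comp);           // step 3
    if restored != *original {
        return false;                          // step 4: GetPut violated
    }
    let (view2, comp2) = get(&restored);       // step 5
    view2 == view && comp2 == comp             // step 6: PutGet
}
```

Step 1 (random instance generation) is elided; in the real suite the generated instance conforms to the source schema.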

20.11.4 Property-based tests

The integration test suite (tests/integration/tests/lens_laws.rs) includes property-based tests that:

  • Generate random schema pairs by applying random sequences of elementary protolens constructors to a base schema.
  • Factorize the resulting morphism.
  • Verify that validate_factorization passes.
  • Instantiate the protolens chain and verify lens laws.

This provides coverage over the full pipeline, catching regressions in factorization ordering, schema transformation, and migration derivation.


  1. Each direction involves a morphism from a full schema to the overlap subschema. Factorizing these morphisms into elementary transforms and mapping them to protolenses yields two chains whose composition forms the symmetric lens.