16 Protolenses: Reusable Lens Families
Suppose you’re managing a platform where hundreds of user-defined schemas exist, and your legal team mandates a naming convention change: createdAt becomes created_at, updatedAt becomes updated_at. You could hand-write the same lens twice, three times, a hundred times. Or you could write it once and reuse it across all compatible schemas. That’s what a protolens does.
A protolens is a transformation pattern that works across schemas. It’s a pair \((P, F)\): a precondition \(P\) that describes which schemas are applicable, and a lens generator \(F\) that produces a concrete lens for any schema satisfying \(P\). A renameVertex("author", "creator") protolens has precondition “schema must contain a vertex named author” and generates a lens for each schema meeting that requirement.
The power comes from schema-dependence. The complement type depends on the schema: when you remove metadata, the complement stores the removed data. If metadata is a string, the complement stores a string. If metadata is a structured object, the complement stores that object. The same protolens produces different complement shapes for different schemas—this is why protolenses are more than glorified templates.
16.1 Why manual lenses become tedious
Picture a real platform scenario. You have user-defined schemas A, B, C, and more. Each has a metadata block with createdAt and updatedAt fields. Policy changes. You need to normalize to snake_case.
With the combinator API from Section 7.4, you write:
// For schema A
const lensA = pipeline(
renameField("metadata.createdAt", "metadata.created_at"),
renameField("metadata.updatedAt", "metadata.updated_at"),
);
// For schema B (identical logic, different schema)
const lensB = pipeline(
renameField("metadata.createdAt", "metadata.created_at"),
renameField("metadata.updatedAt", "metadata.updated_at"),
);
// For schema C, D, E, F, ...

Repetition. The transformation pattern stays the same while the target schemas differ. A protolens captures the pattern once and applies it to any schema that matches the precondition.
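To make the idea concrete, here is a minimal hand-rolled sketch of the pattern in plain TypeScript. The names (snakeCaseTimestamps, applies, instantiate) are illustrative, not the panproto API: the rename logic is written once as a precondition plus a lens generator, then reused for any schema that passes the check.

```typescript
// Illustrative sketch only -- these helpers are hypothetical,
// not part of @panproto/core.
type Schema = { name: string; fields: string[] };
type Lens = { get: (record: Record<string, unknown>) => Record<string, unknown> };

const snakeCaseTimestamps = {
  // Precondition: the schema must expose both camelCase timestamp fields.
  applies: (s: Schema): boolean =>
    s.fields.includes("createdAt") && s.fields.includes("updatedAt"),
  // Generator: produce a concrete lens for any schema passing the check.
  instantiate: (_s: Schema): Lens => ({
    get: (record) => {
      const { createdAt, updatedAt, ...rest } = record;
      return { ...rest, created_at: createdAt, updated_at: updatedAt };
    },
  }),
};

// One definition serves schema A, B, C, ... with no duplication.
const schemaA: Schema = { name: "A", fields: ["id", "createdAt", "updatedAt"] };
if (snakeCaseTimestamps.applies(schemaA)) {
  const lens = snakeCaseTimestamps.instantiate(schemaA);
  console.log(lens.get({ id: 1, createdAt: "2024-01-01", updatedAt: "2025-01-01" }));
}
```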
16.2 What a protolens is
A protolens has two parts:
- Precondition \(P\): a test on the schema. “Does this schema have fields metadata.createdAt and metadata.updatedAt?”
- Lens generator: for any schema \(S\) where \(P(S)\) holds, produce a lens \(\ell_S : S \rightleftarrows F(S)\), where \(F(S)\) is the transformed schema.
The schema \(S\) varies. The transformation pattern stays fixed. The output type depends on the input: different schemas produce different lenses with different complement structures.1
The lens laws from Chapter 7 (GetPut and PutGet) hold for every instantiated lens \(\ell_S\). These aren’t re-derived; they follow by construction from the elementary protolens constructors.
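The \((P, F)\) pair can be sketched in a few lines of plain TypeScript (removeField and its members are illustrative names, not panproto's API). Because the complement captures exactly what the removed field held, the GetPut round-trip holds by construction:

```typescript
// Hypothetical sketch -- removeField is not the panproto API.
type Rec = Record<string, unknown>;

interface Lens {
  get: (src: Rec) => { view: Rec; complement: Rec };
  put: (view: Rec, complement: Rec) => Rec;
}

interface Protolens {
  // P: which schemas (modeled here as field lists) the pattern applies to.
  precondition: (fields: string[]) => boolean;
  // Generator: a concrete lens for any schema satisfying P.
  instantiate: (fields: string[]) => Lens;
}

const removeField = (name: string): Protolens => ({
  precondition: (fields) => fields.includes(name),
  instantiate: () => ({
    get: (src) => {
      // The complement stores the removed value, whatever shape it has
      // in this particular schema.
      const { [name]: removed, ...view } = src;
      return { view, complement: { [name]: removed } };
    },
    put: (view, complement) => ({ ...view, ...complement }),
  }),
});

// GetPut: putting back an unmodified view restores the source exactly.
const lens = removeField("internalId").instantiate(["id", "internalId"]);
const src = { id: 7, internalId: "abc" };
const { view, complement } = lens.get(src);
const roundTrip = lens.put(view, complement);
```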
16.3 Schema transforms
The function \(F\) above—which modifies a schema—is called a schema transform. panproto defines a library of elementary schema transforms, each a TheoryTransform variant:
| Schema transform | What it does | Example |
|---|---|---|
| RenameVertex(old, new) | Rename a vertex ID | author becomes creator |
| RenameEdge(old, new) | Rename an edge kind | hasAuthor becomes hasCreator |
| AddVertex(spec) | Add a vertex with given kind | Add version: int |
| RemoveVertex(id) | Remove a vertex and its edges | Drop internalId |
| AddEdge(spec) | Add an edge between existing vertices | Add author -> org edge |
| RemoveEdge(id) | Remove an edge | Drop replyTo edge |
| WrapVertex(id, wrapper) | Nest a vertex inside a new container | street becomes address.street inside address |
| HoistVertex(id) | Move a nested vertex up one level | profile.name becomes name |
| CoerceType(id, target) | Change a vertex’s data type | age: string becomes age: int |
| MergeVertices(ids, into) | Merge multiple vertices | firstName + lastName into fullName |
| SplitVertex(id, into) | Split a vertex into multiple vertices | fullName into firstName + lastName |
Each transform has a precondition. RenameVertex("author", "creator") requires author to exist. RemoveVertex("internalId") requires the vertex to exist. WrapVertex("address", "location") requires address to exist and location to not exist yet.
Transforms compose into pipelines. A sequence \(F_1, F_2, \ldots, F_n\) applies step by step, and the precondition for the composite checks each step against the schema as it exists after previous steps: \(P_1(S) \wedge P_2(F_1(S)) \wedge \ldots \wedge P_n(F_{n-1}(\ldots F_1(S)))\).
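That composite check can be sketched directly, with hypothetical Step objects standing in for panproto's transforms: each step's precondition is evaluated against the schema as rewritten by the steps before it.

```typescript
// Illustrative only: a schema is reduced to its set of vertex names.
type VertexSet = Set<string>;

interface Step {
  pre: (s: VertexSet) => boolean;     // P_i
  apply: (s: VertexSet) => VertexSet; // F_i
}

// P_1(S) AND P_2(F_1(S)) AND ... AND P_n(F_{n-1}(...F_1(S)))
function compositePrecondition(steps: Step[], schema: VertexSet): boolean {
  let current = schema;
  for (const step of steps) {
    if (!step.pre(current)) return false;
    current = step.apply(current);
  }
  return true;
}

const rename = (from: string, to: string): Step => ({
  pre: (s) => s.has(from) && !s.has(to),
  apply: (s) => new Set([...s].filter((v) => v !== from)).add(to),
});

const remove = (id: string): Step => ({
  pre: (s) => s.has(id),
  apply: (s) => new Set([...s].filter((v) => v !== id)),
});

// remove("handle") is only valid after the rename has produced "handle".
const steps = [rename("userName", "handle"), remove("handle")];
console.log(compositePrecondition(steps, new Set(["userName", "id"]))); // true
```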
16.4 Instantiation: materializing a lens
You have a protolens. You have a schema. Instantiation checks the precondition and produces a concrete lens if it passes.
import { Panproto, ProtolensChainHandle } from "@panproto/core";
// Define a protolens chain
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);
// Instantiate against a specific schema
const lens = chain.instantiate(concreteSchema);
// Use the lens as in Chapter 7
const { view, complement } = lens.get(sourceData);
const restored = lens.put(modifiedView, complement);

If concreteSchema doesn’t satisfy the precondition, instantiation fails with details:
error: protolens precondition not satisfied
step 2: RemoveVertex("internalId")
requires: vertex "internalId" exists in schema
but: schema "my-schema-v3" has no vertex "internalId"
hint: this schema may have already undergone this transformation
The same chain instantiates against many schemas:
const schemas = await Panproto.listSchemas();
for (const schema of schemas) {
try {
const lens = chain.instantiate(schema);
console.log(`${schema.name}: lens ready`);
} catch (e) {
console.log(`${schema.name}: not applicable: ${e.message}`);
}
}

If a schema has already undergone the transformation (e.g., internalId was removed in a prior version), instantiation fails. This is intentional: applying the same protolens twice violates the precondition. The error reports which step failed and why. Use check_applicability to filter schemas before attempting instantiation.
16.5 Commutativity: applying protolenses before or after migration
Here’s a useful property: applying a protolens and then migrating gives the same result as migrating first and then applying the protolens.2
Suppose you have schemas v1 and v2 with a migration between them. You also have a protolens that renames createdAt to created_at. You can:
- Migrate data from v1 to v2, then apply the rename protolens to v2 data, or
- Apply the rename protolens to v1 data, then migrate the renamed data from v1 to v2
Both paths produce the same output. In a migration pipeline \(S_1 \to S_2 \to S_3\), you can apply a schema-level protolens at any point and get consistent results. Without this property, the order would matter, and composing protolenses with migrations would require careful sequencing.
Commutativity holds for all 11 elementary constructors and their compositions, provided the migration preserves the protolens’s precondition. If the migration removes a vertex the protolens requires, the protolens is not applicable on either path, and the property holds vacuously.
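The property can be checked on a toy example in plain TypeScript (the two functions are illustrative stand-ins for an instantiated protolens and a migration): a field rename commutes with a migration that only touches other fields.

```typescript
type Rec = Record<string, unknown>;

// Instantiated protolens (illustrative): rename createdAt -> created_at.
const renameCreatedAt = (r: Rec): Rec => {
  const { createdAt, ...rest } = r;
  return { ...rest, created_at: createdAt };
};

// Migration v1 -> v2 (illustrative): add a version field.
const migrateV1toV2 = (r: Rec): Rec => ({ ...r, version: 2 });

const v1Record = { id: 1, createdAt: "2024-01-01" };

// Path 1: migrate, then apply the protolens.
const pathA = renameCreatedAt(migrateV1toV2(v1Record));
// Path 2: apply the protolens, then migrate.
const pathB = migrateV1toV2(renameCreatedAt(v1Record));
// Both paths yield { id: 1, created_at: "2024-01-01", version: 2 }.
```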
16.6 The 11 elementary protolens constructors
Each schema transform from Table 16.1 has a corresponding protolens constructor:
16.6.1 1. renameVertex
const pl = Protolens.renameVertex("author", "creator");
// precondition: schema has vertex "author"
// effect: renames to "creator", all edges updated
// complement: empty (bijective)

16.6.2 2. renameEdge
const pl = Protolens.renameEdge("hasAuthor", "hasCreator");
// precondition: schema has edge kind "hasAuthor"
// effect: renames edge kind
// complement: empty (bijective)

16.6.3 3. addVertex
const pl = Protolens.addVertex("version", { kind: "int", default: 1 });
// precondition: schema does NOT have vertex "version"
// effect: adds vertex with default value
// complement: stores added value (for removal on put)

16.6.4 4. removeVertex
const pl = Protolens.removeVertex("internalId");
// precondition: schema has vertex "internalId"
// effect: removes vertex and its edges
// complement: stores removed vertex data

16.6.5 5. addEdge
const pl = Protolens.addEdge("worksAt", { from: "author", to: "org" });
// precondition: both endpoints exist, edge does not
// effect: adds edge (data must be provided or defaulted)
// complement: stores added edge data

16.6.6 6. removeEdge
const pl = Protolens.removeEdge("replyTo");
// precondition: schema has edge "replyTo"
// effect: removes edge
// complement: stores removed edge data

16.6.7 7. wrapVertex
const pl = Protolens.wrapVertex("street", { wrapper: "address" });
// precondition: "street" exists, "address" does not
// effect: creates "address" container, nests "street" inside
// complement: stores wrapper structure

16.6.8 8. hoistVertex
const pl = Protolens.hoistVertex("address.street");
// precondition: "address.street" exists
// effect: moves "street" up to parent level
// complement: stores original nesting

16.6.9 9. coerceType
const pl = Protolens.coerceType("age", { from: "string", to: "int" });
// precondition: "age" exists with kind "string"
// effect: converts value using registered coercion
// complement: empty if coercion is bijective, stores original otherwise

16.6.10 10. mergeVertices
const pl = Protolens.mergeVertices(["firstName", "lastName"], {
into: "fullName",
merge: (first, last) => `${first} ${last}`,
split: (full) => { const [f, ...l] = full.split(" "); return [f, l.join(" ")]; },
});
// precondition: all source vertices exist
// effect: merges into single vertex
// complement: stores split function result for round-trip

16.6.11 11. splitVertex
const pl = Protolens.splitVertex("fullName", {
into: ["firstName", "lastName"],
split: (full) => { const [f, ...l] = full.split(" "); return [f, l.join(" ")]; },
merge: (first, last) => `${first} ${last}`,
});
// precondition: "fullName" exists
// effect: splits into multiple vertices
// complement: stores merge function result for round-trip

16.7 Schema-dependent complement types
In Section 7.2 we saw that complements are data structures. For protolenses, the complement structure depends on the schema. A removeVertex("internalId") protolens produces different complements:
- Schema A has internalId: string → complement stores a string
- Schema B has internalId: { hash: bytes, counter: int } → complement stores an object
The complement type is a function from schemas to types. In panproto’s implementation, complements are dynamically typed (CBOR-serialized), so this dependence is handled at runtime. The complement structure—which fields it contains, how they nest—varies with the schema.
This is what distinguishes protolenses from templates. A template would produce the same complement every time. A protolens produces a complement whose structure matches the schema it was instantiated against.
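A toy illustration of that schema-dependence (hand-rolled, not the panproto API): the same remove-field logic yields a string-shaped complement against one schema and an object-shaped one against another.

```typescript
type Rec = Record<string, unknown>;

// Illustrative remove-field lens: the complement is whatever shape
// the schema gives internalId.
const removeInternalId = (src: Rec) => {
  const { internalId, ...view } = src;
  return { view, complement: internalId };
};

// Schema A: internalId is a string.
const a = removeInternalId({ id: 1, internalId: "abc" });
// Schema B: internalId is a structured object.
const b = removeInternalId({ id: 2, internalId: { hash: "fe01", counter: 9 } });

console.log(typeof a.complement); // "string"
console.log(typeof b.complement); // "object"
```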
16.8 Trying it from the CLI
The prot protolens command shows the protolens chain between two schemas:
$ prot protolens old.json new.json
Protolens chain (3 steps):
1. RenameVertex("userName" -> "handle")
2. RemoveVertex("internalId")
complement: stores { internalId: string }
3. AddVertex("version", default: 1)
Precondition:
- vertex "userName" must exist
- vertex "internalId" must exist
- vertex "version" must NOT exist
Applicable to: any schema satisfying the precondition

Instantiate against a specific schema:
$ prot protolens old.json new.json --instantiate my-schema.json
Instantiated lens for "my-schema-v2":
get: my-schema-v2 -> my-schema-v2' (3 fields affected)
put: my-schema-v2' x complement -> my-schema-v2
complement size: ~48 bytes per record

Convert data through the instantiated lens:
$ prot convert --protolens old.json new.json --schema my-schema.json data.json

16.9 Using protolenses in TypeScript
The full workflow:
import { Panproto, ProtolensChainHandle, LensHandle } from "@panproto/core";
// Step 1: Generate a protolens chain from two reference schemas
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);
// Step 2: Inspect the chain
console.log(chain.steps);
// [
// { type: "RenameVertex", from: "userName", to: "handle" },
// { type: "RemoveVertex", id: "internalId" },
// { type: "AddVertex", id: "version", default: 1 },
// ]
// Step 3: Instantiate against any compatible schema
const lens: LensHandle = chain.instantiate(mySchema);
// Step 4: Use the lens
const { view, complement } = lens.get(sourceData);
// ... modify view ...
const restored = lens.put(modifiedView, complement);
// Step 5: Convert data directly
const converted = Panproto.convert(sourceData, { protolens: chain, schema: mySchema });

16.10 Using protolenses in Python
The equivalent Python workflow:
from panproto import Panproto, ProtolensChainHandle, LensHandle
# Step 1: Generate a protolens chain from two reference schemas
chain = ProtolensChainHandle.auto_generate(old_schema, new_schema)
# Step 2: Inspect the chain
print(chain.steps)
# [
# RenameVertex(from_="userName", to="handle"),
# RemoveVertex(id="internalId"),
# AddVertex(id="version", default=1),
# ]
# Step 3: Instantiate against any compatible schema
lens: LensHandle = chain.instantiate(my_schema)
# Step 4: Use the lens
view, complement = lens.get(source_data)
# ... modify view ...
restored = lens.put(modified_view, complement)
# Step 5: Convert data directly
converted = Panproto.convert(source_data, protolens=chain, schema=my_schema)

16.11 Serialization for reuse
Protolens chains serialize to JSON losslessly:
const chain = ProtolensChainHandle.autoGenerate(oldSchema, newSchema);
// Serialize to JSON
const json = chain.toJson();
fs.writeFileSync("policy.json", JSON.stringify(json));
// Deserialize from JSON
const restored = ProtolensChainHandle.fromJson(json);
const lens = restored.instantiate(someSchema);

The same round-trip in Python:

chain = ProtolensChainHandle.auto_generate(old_schema, new_schema)
# Serialize
policy = chain.to_json()
Path("policy.json").write_text(json.dumps(policy))
# Deserialize
restored = ProtolensChainHandle.from_json(policy)
lens = restored.instantiate(some_schema)

Three reasons this matters:
- Distributing policies. A platform team defines a protolens chain and publishes it as JSON. Downstream teams apply it to their own schemas without regenerating.
- Version control. Protolens chains committed alongside schema versions provide an auditable migration history.
- Deferred application. A chain created today can apply to schemas that don’t yet exist. When a new schema arrives, deserialize and instantiate.
From the CLI:
schema lens old.json new.json --protocol atproto --chain > policy.json

16.12 Checking which schemas are compatible
The check_applicability method returns failure reasons instead of a boolean:
const reasons = chain.checkApplicability(schema);
if (reasons.length > 0) {
console.log("Not applicable:");
for (const r of reasons) {
console.log(` step ${r.step}: ${r.reason}`);
}
} else {
const lens = chain.instantiate(schema);
}reasons = chain.check_applicability(schema)
if reasons:
    for r in reasons:
        print(f"  step {r.step}: {r.reason}")
else:
    lens = chain.instantiate(schema)

Where applicable_to returns only a boolean, check_applicability tells you why not: which step failed, what’s missing, and what the schema would need. Use it before attempting instantiation on a batch of schemas.
From the CLI:
schema lens-verify --check-fleet ./schemas/

16.13 Applying a protolens to many schemas
apply_to_fleet runs a protolens chain across multiple schemas, partitioning results into successes and failures:
import { applyToFleet } from "@panproto/core";
const result = applyToFleet(chain, schemas, protocol);
console.log(`Applied to ${result.applied.length} schemas`);
for (const entry of result.applied) {
console.log(` ${entry.schema.name}: lens ready`);
}
console.log(`Skipped ${result.skipped.length} schemas`);
for (const entry of result.skipped) {
console.log(` ${entry.schema.name}: ${entry.reasons.join(", ")}`);
}

In Python:

from panproto import apply_to_fleet
result = apply_to_fleet(chain, schemas, protocol)
print(f"Applied to {len(result.applied)} schemas")
for entry in result.applied:
    print(f"  {entry.schema.name}: lens ready")
print(f"Skipped {len(result.skipped)} schemas")
for entry in result.skipped:
    print(f"  {entry.schema.name}: {', '.join(entry.reasons)}")

From the CLI, --dry-run produces a report without applying changes:
schema lens-fleet policy.json ./schemas/ --protocol atproto --dry-run

Sample output:
Fleet migration report:
Applied (12 schemas):
users.json: 3 steps, complement: Empty
posts.json: 3 steps, complement: DataCaptured(1 field)
comments.json: 3 steps, complement: Empty
...
Skipped (2 schemas):
legacy-events.json: step 1 requires vertex "metadata" (not found)
internal-logs.json: step 2 requires edge kind "hasAuthor" (not found)
Formally, a protolens is a dependent function \(\Pi(S : \mathsf{Schema} \mid P(S)). \mathsf{Lens}(S, F(S))\). \(S\) ranges over schemas satisfying \(P\), and \(F\) transforms the schema. See Appendix A.↩︎
This is a naturality condition in the categorical sense. See Appendix A.↩︎