23 Field Transform Architecture
FieldTransform operates at the value level—the complement to TheoryTransform, which works at the schema level. Where TheoryTransform modifies vertex and edge kinds, FieldTransform modifies the extra_fields of individual nodes during wtype_restrict.
For the user-facing walkthrough of value-dependent migration, see the tutorial.
23.1 The FieldTransform algebra
FieldTransform is a nine-variant enum in panproto-inst/src/wtype.rs. Each variant expresses one kind of node mutation:
| Variant | What it does | When to use |
|---|---|---|
| RenameField | Renames a field by key | Normalizing field names across versions |
| DropField | Removes a field | Removing deprecated fields |
| AddField | Injects a constant field | Providing defaults for new required fields |
| KeepFields | Retains only named fields | Restricting to target schema fields |
| ApplyExpr | Transforms a field value via expression | Type coercions, string normalization |
| PathTransform | Applies any transform to a nested field | Working with nested objects |
| ComputeField | Derives a field from other fields | Computing aggregates or combinations |
| Case | Conditional logic based on field values | Branching on runtime data |
| MapReferences | Updates string fields that hold vertex names | Propagating vertex renames |
These transforms compose sequentially: apply_field_transforms iterates a Vec<FieldTransform> and applies each in order. Earlier transforms see the original fields; later transforms see results from all preceding ones. This sequential model is why the builder appends to a vector rather than replacing.
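The sequential semantics can be sketched with a reduced three-variant stand-in for the enum (a hypothetical simplification: the real FieldTransform has nine variants and operates on a Node's extra_fields rather than a plain string map):

```rust
use std::collections::HashMap;

// Simplified stand-in for the real nine-variant enum (illustrative only).
#[derive(Clone)]
enum FieldTransform {
    RenameField { from: String, to: String },
    DropField { key: String },
    AddField { key: String, value: String },
}

// Sketch of sequential application: each transform sees the result of
// all preceding ones, mirroring apply_field_transforms.
fn apply_field_transforms(fields: &mut HashMap<String, String>, transforms: &[FieldTransform]) {
    for t in transforms {
        match t {
            FieldTransform::RenameField { from, to } => {
                if let Some(v) = fields.remove(from) {
                    fields.insert(to.clone(), v);
                }
            }
            FieldTransform::DropField { key } => {
                fields.remove(key);
            }
            FieldTransform::AddField { key, value } => {
                fields.insert(key.clone(), value.clone());
            }
        }
    }
}
```

Because application is in-order over a mutable map, a RenameField followed by a DropField of the old key is a no-op on that key, while the reverse order loses the value entirely; this is the behavior the builder's append-only vector preserves.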
23.2 PathTransform as path functor action
PathTransform wraps an inner transform and a path:
```rust
PathTransform {
    path: Vec<String>,
    inner: Box<FieldTransform>,
}
```

It navigates the extra_fields tree by treating each path segment as a map key lookup. At the path's end, a temporary Node is constructed with the nested map as its extra_fields. The inner transform runs on that node, then the result gets written back.
Think of this as lifting a transform on flat fields to act on a subtree. An empty path means the transform applies directly to the node’s own fields.
The recursion bottoms out when path.len() == 1. For deeper paths, the function recurses with &path[1..] and the nested map. Paths that don’t resolve (missing keys or non-map values) are silently no-ops—the transform produces no change rather than an error. This design avoids failures on data lacking an expected structure.
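A minimal sketch of the path recursion, using a simplified Value type with only string and map variants. The real implementation bottoms out at path.len() == 1 and builds a temporary Node; here the recursion bottoms out at the empty path, which is behaviorally equivalent:

```rust
use std::collections::HashMap;

// Simplified value type (the real Value has more variants).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Str(String),
    Map(HashMap<String, Value>),
}

// Walk map keys along `path` and run `f` on the innermost map.
// Missing keys or non-map values make the whole call a silent no-op,
// matching the design described above.
fn apply_at_path(
    fields: &mut HashMap<String, Value>,
    path: &[String],
    f: &dyn Fn(&mut HashMap<String, Value>),
) {
    match path {
        // Empty path: the transform acts directly on the node's own fields.
        [] => f(fields),
        [head, rest @ ..] => {
            if let Some(Value::Map(inner)) = fields.get_mut(head) {
                apply_at_path(inner, rest, f);
            }
            // else: path does not resolve — no change, no error.
        }
    }
}
```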
23.3 MapReferences as functorial action
When a vertex gets renamed in the schema, any fields holding vertex names as strings must be updated. That’s what MapReferences does:
```rust
MapReferences {
    field: String,
    rename_map: HashMap<String, Option<String>>,
}
```

The Option<String> value handles both rename (Some(new_name)) and removal (None). The implementation in apply_map_references handles two cases:
- Flat string fields: Value::Str(s) is looked up in the rename map. If found, it's replaced or removed.
- Encoded array fields: panproto stores JSON arrays in extra_fields as maps with an __array_len sentinel. For example, ["u-url", "external"] becomes {"0": "u-url", "1": "external", "__array_len": 2}. The function detects this encoding, iterates the indexed entries, applies the rename to each string, drops entries mapped to None, and rebuilds with updated indices.
The categorical interpretation: the rename is a morphism in the name-reference algebra, and MapReferences applies that morphism to each element stored in a field. References outside the rename map stay unchanged, which corresponds to the morphism acting as identity there.
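A sketch of the encoded-array branch, assuming string values throughout for simplicity (in the real encoding, __array_len holds an integer Value, not a string):

```rust
use std::collections::HashMap;

// Apply a rename map to every element of an __array_len-encoded array,
// dropping elements mapped to None and rebuilding compacted indices.
fn map_array_references(
    arr: &HashMap<String, String>,
    rename_map: &HashMap<String, Option<String>>,
) -> HashMap<String, String> {
    let len: usize = arr
        .get("__array_len")
        .and_then(|v| v.parse().ok())
        .unwrap_or(0);
    let mut kept = Vec::new();
    for i in 0..len {
        if let Some(s) = arr.get(&i.to_string()) {
            match rename_map.get(s) {
                Some(Some(new_name)) => kept.push(new_name.clone()),
                Some(None) => {} // reference removed: drop the entry
                None => kept.push(s.clone()), // identity outside the map
            }
        }
    }
    // Rebuild with updated indices and an updated sentinel.
    let mut out = HashMap::new();
    for (i, item) in kept.iter().enumerate() {
        out.insert(i.to_string(), item.clone());
    }
    out.insert("__array_len".to_string(), kept.len().to_string());
    out
}
```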
23.4 Case as coproduct eliminator
Case branches on node values:
```rust
Case {
    branches: Vec<CaseBranch>,
}

pub struct CaseBranch {
    pub predicate: panproto_expr::Expr,
    pub transforms: Vec<FieldTransform>,
}
```

The evaluator uses the node's current extra_fields to build an expression environment via build_env_from_extra_fields. Each branch's predicate is evaluated in order. The first branch returning Literal::Bool(true) has its transforms applied; no subsequent branches run. If none match, the node is unchanged.
Mathematically, this is the dependent function space \(\Pi(x : \text{Node}). \text{FieldTransform}\). The selection happens at runtime, not compile-time, so different nodes can receive entirely different transformations.
Each branch’s transforms list is a full Vec<FieldTransform>, so branches can nest Case blocks, wrap with PathTransform, or use ComputeField. This completeness lets Case express arbitrary conditional logic.
Because a node with no matching branch passes through unchanged, add a final branch with a predicate that always evaluates to true if you need a "catch-all."
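The first-match selection can be sketched with plain function pointers standing in for panproto_expr predicates and transform lists (a hypothetical simplification):

```rust
use std::collections::HashMap;

// Simplified branch: fn pointers replace Expr predicates and
// Vec<FieldTransform> bodies.
struct CaseBranch {
    predicate: fn(&HashMap<String, i64>) -> bool,
    transform: fn(&mut HashMap<String, i64>),
}

// First branch whose predicate holds wins; later branches never run.
// No match: the node passes through unchanged.
fn apply_case(fields: &mut HashMap<String, i64>, branches: &[CaseBranch]) {
    for b in branches {
        if (b.predicate)(fields) {
            (b.transform)(fields);
            return;
        }
    }
}
```

A final branch with a predicate of `|_| true` acts as the catch-all default described above.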
23.5 ConditionalSurvival as a refined survival predicate
ConditionalSurvival is stored separately from field_transforms in CompiledMigration:
```rust
pub conditional_survival: HashMap<Name, panproto_expr::Expr>,
```

It's checked before field transforms during the BFS pass in wtype_restrict. For each child node:
- Is the remapped target anchor in surviving_verts? If not, skip (structurally pruned).
- Is there a conditional survival predicate for the source anchor? If so, evaluate it.
- If the predicate returns false, push the node back on the BFS queue (so descendants can still be processed through ancestor contraction) but don't add it to the surviving set.
- If the predicate returns true (or no predicate exists), add it to the surviving set and apply field transforms.
This two-step design separates structure from value. The schema migration answers: “does this vertex type survive?” The predicate answers: “does this specific node satisfy the condition?” All predicates are evaluated with the same build_env_from_extra_fields environment used for ComputeField and Case.
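The two-step decision can be sketched as follows, with illustrative names and fn-pointer predicates in place of panproto_expr evaluation. Note this sketch returns only the survive/skip verdict; the real loop additionally re-queues failed nodes for ancestor contraction:

```rust
use std::collections::{HashMap, HashSet};

// Structural check first, then the value-dependent predicate.
fn node_survives(
    target_anchor: &str,
    source_anchor: &str,
    surviving_verts: &HashSet<String>,
    predicates: &HashMap<String, fn(&HashMap<String, i64>) -> bool>,
    fields: &HashMap<String, i64>,
) -> bool {
    if !surviving_verts.contains(target_anchor) {
        return false; // structurally pruned by the schema migration
    }
    match predicates.get(source_anchor) {
        Some(pred) => pred(fields), // value-dependent check
        None => true,               // no predicate: survives unconditionally
    }
}
```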
23.6 build_env_from_extra_fields: variable binding strategy
This shared environment builder is used by ComputeField, Case, and conditional_survival. Its three binding strategies cover different storage conventions:
Flat bindings: every key in extra_fields becomes a top-level variable. A field "level": 2 evaluates as Expr::Var("level") → Literal::Int(2).
Qualified bindings for non-structural keys: most keys (anything except "attrs", "name", "$type", "parents") get a qualified attrs.key binding too. This means predicates can reference attrs.level without knowing whether level is flat or nested.
Nested attrs expansion: if extra_fields contains an "attrs" key holding a map, every entry in that map binds under both the qualified name attrs.key and (unless shadowed) the unqualified key. Predicates written against attrs.level work whether level is flat or nested.
This dual binding exists because different protocols store the same semantic information at different nesting depths. AT Protocol uses top-level keys; some annotation formats use an attrs object. Dual bindings let predicates work across both conventions without format-specific logic.
Why bind both level and attrs.level? Different protocols nest the same concept differently, and binding both means predicates work across protocols: a predicate like attrs.level > 2 matches whether level is top-level or inside attrs.
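A sketch of the dual-binding strategy over integer-valued fields. For simplicity the nested attrs map is a separate parameter here; in the real code it lives under extra_fields["attrs"]:

```rust
use std::collections::HashMap;

// Structural keys that do not receive a qualified attrs.* binding.
const STRUCTURAL: [&str; 4] = ["attrs", "name", "$type", "parents"];

// Every non-structural flat key binds both unqualified and as attrs.key;
// every nested attrs entry binds qualified, and unqualified unless shadowed.
fn build_env(
    extra_fields: &HashMap<String, i64>,
    attrs: &HashMap<String, i64>,
) -> HashMap<String, i64> {
    let mut env = HashMap::new();
    for (k, v) in extra_fields {
        env.insert(k.clone(), *v);
        if !STRUCTURAL.contains(&k.as_str()) {
            env.insert(format!("attrs.{k}"), *v);
        }
    }
    for (k, v) in attrs {
        env.insert(format!("attrs.{k}"), *v);
        env.entry(k.clone()).or_insert(*v); // unqualified unless shadowed
    }
    env
}
```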
23.7 value_to_expr_literal: Value-to-Literal conversion
value_to_expr_literal converts an instance Value to a panproto_expr::Literal:
| Value variant | Converts to |
|---|---|
| Value::Bool(b) | Literal::Bool(b) |
| Value::Int(i) | Literal::Int(i) |
| Value::Float(f) | Literal::Float(f) |
| Value::Str(s) | Literal::Str(s) |
| Value::Unknown with __array_len | Comma-separated string of elements |
| Value::Unknown without __array_len | Literal::Null |
| Value::Null and others | Literal::Null |
The encoded-array case deserves explanation. panproto stores JSON arrays in extra_fields as maps because extra_fields is a HashMap<String, Value> without a dedicated list variant. The array ["u-url", "external"] becomes {"0": "u-url", "1": "external", "__array_len": 2}. When this reaches value_to_expr_literal, the __array_len sentinel is detected and array elements are collected and joined with commas: "u-url,external". This lets BuiltinOp::Contains check membership in the serialized array.
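A sketch of the detection-and-join step, again assuming string values for simplicity (the real sentinel is an integer Value):

```rust
use std::collections::HashMap;

// Detect the __array_len sentinel and join indexed entries with commas.
// Returns None for maps that are not encoded arrays.
fn encoded_array_to_literal(map: &HashMap<String, String>) -> Option<String> {
    let len: usize = map.get("__array_len")?.parse().ok()?;
    let items: Vec<String> = (0..len)
        .filter_map(|i| map.get(&i.to_string()).cloned())
        .collect();
    Some(items.join(","))
}
```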
The inverse conversion (expr_literal_to_value) normalizes float-valued integers: a Literal::Float with zero fractional part and magnitude within i64 becomes Value::Int for JSON round-trip fidelity. This handles the common case where arithmetic on integers returns 1.0.
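The normalization rule can be sketched as follows (the function name is illustrative; the real logic lives inside expr_literal_to_value):

```rust
// A float with zero fractional part and magnitude within i64 range
// normalizes to an integer; anything else stays a float.
fn normalize_float(f: f64) -> Result<i64, f64> {
    if f.fract() == 0.0 && f.abs() <= i64::MAX as f64 {
        Ok(f as i64)
    } else {
        Err(f)
    }
}
```

This is what turns the 1.0 produced by integer arithmetic back into Value::Int(1) for JSON round-trip fidelity.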
23.8 Transform ordering: AddField, ComputeField, DropField
field_transforms is a Vec<FieldTransform> applied in sequence. Order matters because later transforms see the result of earlier ones.
The recommended convention for migrations needing to add, compute, and clean up fields:
1. AddField for inputs: add constant fields that expressions need but source data doesn't carry.
2. PathTransform for nested normalization: flatten or reorganize before expressions run.
3. ComputeField and ApplyExpr: compute derived fields using the now-complete input set.
4. Case: apply conditional logic that may depend on computed fields from step 3.
5. DropField / KeepFields: remove intermediate fields used as inputs but not in the output.
If you add a level field (step 1) because the source stores it elsewhere, compute name from level (step 3), then drop level (step 5), all three transforms reference the same intermediate state. Reversing steps 3 and 5 would compute name from a missing level variable, leaving name unchanged.
There’s no automatic dependency analysis. Transform authors are responsible for ordering. The builder API appends in registration order, so the first add_field_default fires before the first add_computed_field, which fires before the first add_field_drop.
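The registration-order semantics can be sketched with a toy builder whose methods only record what they would append. The method names follow the ones mentioned above; the bodies are illustrative, not the real implementations:

```rust
// Toy builder: each method appends to the transform vector in
// registration order, which is what the ordering convention relies on.
struct MigrationBuilder {
    field_transforms: Vec<String>,
}

impl MigrationBuilder {
    fn new() -> Self {
        Self { field_transforms: Vec::new() }
    }
    fn add_field_default(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("AddField({key})"));
        self
    }
    fn add_computed_field(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("ComputeField({key})"));
        self
    }
    fn add_field_drop(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("DropField({key})"));
        self
    }
}
```

Registering add, then compute, then drop yields exactly that execution order, matching the five-step convention.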
23.9 Integration with wtype_restrict’s BFS pass
Field transforms are applied inside the fused BFS pass in wtype_restrict, not as a separate post-pass. The execution point is after a node is confirmed to survive (both structural and value-dependent predicates passed) and after its anchor is remapped to the target vertex name.
The relevant portion of the BFS loop:
```rust
// apply value-level field transforms if any exist for this vertex
if let Some(transforms) = migration.field_transforms.get(&child_node.anchor) {
    apply_field_transforms(&mut new_node, transforms);
}
new_nodes.insert(child_id, new_node.clone());
```

The lookup key is child_node.anchor, the source vertex name. The field_transforms map is keyed by source vertex name so migration authors can write migration.add_field_rename("h1", ...) against the source schema. The anchor is remapped on the new_node clone before field transforms run, so by the time apply_field_transforms executes, new_node.anchor holds the target name, but the lookup used the original child_node.anchor.
Both field_transforms and conditional_survival predicates are keyed by source vertex name; their lookups use the original child_node.anchor, not the remapped one. However, the node passed to apply_field_transforms already has its anchor remapped: if your transforms read node.anchor (unusual), they see the target name, not the source.
The BFS pass also handles fan reconstruction (step 5 of wtype_restrict) as a separate pass operating on the original fan list. Field transforms don’t run on fans independently; fans are structural groupings reconstructed after the BFS pass finishes. If a node participating in a fan also has field transforms, those transforms run when the node is added to new_nodes during BFS, before fan reconstruction.
23.10 Source locations
All types and functions described in this chapter are in crates/panproto-inst/src/wtype.rs:
- FieldTransform enum: lines 81 to 193
- CaseBranch struct: lines 195 to 205
- CompiledMigration::conditional_survival field: lines 70 to 71
- Builder methods (add_field_rename, add_path_transform, etc.): lines 207 to 349
- apply_field_transforms: lines 801 to 864
- apply_path_transform: lines 868 to 886
- apply_map_references: lines 888 to 940
- build_env_from_extra_fields: lines 947 to 972
- value_to_expr_literal: lines 975 to 999
- expr_literal_to_value: lines 1005 to 1025
- BFS integration point: lines 742 to 745
- Conditional survival evaluation: lines 723 to 733