23 Field Transform Architecture
FieldTransform operates at the value level—the complement to TheoryTransform, which works at the schema level. Where TheoryTransform modifies vertex and edge kinds, FieldTransform modifies the extra_fields of individual nodes during wtype_restrict.
For the user-facing walkthrough of value-dependent migration, see the tutorial.
23.1 The FieldTransform algebra
FieldTransform is a nine-variant enum in panproto-inst/src/wtype.rs. Each variant expresses one kind of node mutation:
| Variant | What it does | When to use |
|---|---|---|
| RenameField | Renames a field by key | Normalizing field names across versions |
| DropField | Removes a field | Removing deprecated fields |
| AddField | Injects a constant field | Providing defaults for new required fields |
| KeepFields | Retains only named fields | Restricting to target schema fields |
| ApplyExpr | Transforms a field value via expression | Type coercions, string normalization |
| PathTransform | Applies any transform to a nested field | Working with nested objects |
| ComputeField | Derives a field from other fields | Computing aggregates or combinations |
| Case | Conditional logic based on field values | Branching on runtime data |
| MapReferences | Updates string fields that hold vertex names | Propagating vertex renames |
These transforms compose sequentially: apply_field_transforms iterates a Vec<FieldTransform> and applies each in order. Earlier transforms see the original fields; later transforms see results from all preceding ones. This sequential model is why the builder appends to a vector rather than replacing.
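The sequential semantics can be sketched with a reduced three-variant stand-in for the enum (a hypothetical simplification: the real FieldTransform has nine variants and operates on a Node's extra_fields rather than a plain string map):

```rust
use std::collections::HashMap;

// Simplified stand-in for the real nine-variant enum (illustrative only).
#[derive(Clone)]
enum FieldTransform {
    RenameField { from: String, to: String },
    DropField { key: String },
    AddField { key: String, value: String },
}

// Sketch of sequential application: each transform sees the result of
// all preceding ones, mirroring apply_field_transforms.
fn apply_field_transforms(fields: &mut HashMap<String, String>, transforms: &[FieldTransform]) {
    for t in transforms {
        match t {
            FieldTransform::RenameField { from, to } => {
                if let Some(v) = fields.remove(from) {
                    fields.insert(to.clone(), v);
                }
            }
            FieldTransform::DropField { key } => {
                fields.remove(key);
            }
            FieldTransform::AddField { key, value } => {
                fields.insert(key.clone(), value.clone());
            }
        }
    }
}
```

Because application is in-order over a mutable map, a RenameField followed by a DropField of the old key is a no-op on that key, while the reverse order loses the value entirely; this is the behavior the builder's append-only vector preserves.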
23.2 PathTransform as path functor action
PathTransform wraps an inner transform and a path:
```rust
PathTransform {
    path: Vec<String>,
    inner: Box<FieldTransform>,
}
```

It navigates the extra_fields tree by treating each path segment as a map key lookup. At the path's end, a temporary Node is constructed with the nested map as its extra_fields. The inner transform runs on that node, then the result gets written back.
Think of this as lifting a transform on flat fields to act on a subtree. An empty path means the transform applies directly to the node’s own fields.
The recursion bottoms out when path.len() == 1. For deeper paths, the function recurses with &path[1..] and the nested map. Paths that don’t resolve (missing keys or non-map values) are silently no-ops—the transform produces no change rather than an error. This design avoids failures on data lacking an expected structure.
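A minimal sketch of the path recursion, using a simplified Value type with only string and map variants. The real implementation bottoms out at path.len() == 1 and builds a temporary Node; here the recursion bottoms out at the empty path, which is behaviorally equivalent:

```rust
use std::collections::HashMap;

// Simplified value type (the real Value has more variants).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Str(String),
    Map(HashMap<String, Value>),
}

// Walk map keys along `path` and run `f` on the innermost map.
// Missing keys or non-map values make the whole call a silent no-op,
// matching the design described above.
fn apply_at_path(
    fields: &mut HashMap<String, Value>,
    path: &[String],
    f: &dyn Fn(&mut HashMap<String, Value>),
) {
    match path {
        // Empty path: the transform acts directly on the node's own fields.
        [] => f(fields),
        [head, rest @ ..] => {
            if let Some(Value::Map(inner)) = fields.get_mut(head) {
                apply_at_path(inner, rest, f);
            }
            // else: path does not resolve — no change, no error.
        }
    }
}
```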
23.3 MapReferences as functorial action
When a vertex gets renamed in the schema, any fields holding vertex names as strings must be updated. That’s what MapReferences does:
```rust
MapReferences {
    field: String,
    rename_map: HashMap<String, Option<String>>,
}
```

The Option<String> value handles both rename (Some(new_name)) and removal (None). The implementation in apply_map_references handles two cases:
- Flat string fields: Value::Str(s) is looked up in the rename map. If found, it's replaced or removed.
- Encoded array fields: panproto stores JSON arrays in extra_fields as maps with an __array_len sentinel. For example, ["u-url", "external"] becomes {"0": "u-url", "1": "external", "__array_len": 2}. The function detects this encoding, iterates the indexed entries, applies the rename to each string, drops entries mapped to None, and rebuilds with updated indices.
The categorical interpretation: the rename is a morphism in the name-reference algebra, and MapReferences applies that morphism to each element stored in a field. References outside the rename map stay unchanged, which corresponds to the morphism acting as identity there.
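A sketch of the encoded-array branch, assuming string values throughout for simplicity (in the real encoding, __array_len holds an integer Value, not a string):

```rust
use std::collections::HashMap;

// Apply a rename map to every element of an __array_len-encoded array,
// dropping elements mapped to None and rebuilding compacted indices.
fn map_array_references(
    arr: &HashMap<String, String>,
    rename_map: &HashMap<String, Option<String>>,
) -> HashMap<String, String> {
    let len: usize = arr
        .get("__array_len")
        .and_then(|v| v.parse().ok())
        .unwrap_or(0);
    let mut kept = Vec::new();
    for i in 0..len {
        if let Some(s) = arr.get(&i.to_string()) {
            match rename_map.get(s) {
                Some(Some(new_name)) => kept.push(new_name.clone()),
                Some(None) => {} // reference removed: drop the entry
                None => kept.push(s.clone()), // identity outside the map
            }
        }
    }
    // Rebuild with updated indices and an updated sentinel.
    let mut out = HashMap::new();
    for (i, item) in kept.iter().enumerate() {
        out.insert(i.to_string(), item.clone());
    }
    out.insert("__array_len".to_string(), kept.len().to_string());
    out
}
```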
23.4 Case as coproduct eliminator
Case branches on node values:
```rust
Case {
    branches: Vec<CaseBranch>,
}

pub struct CaseBranch {
    pub predicate: panproto_expr::Expr,
    pub transforms: Vec<FieldTransform>,
}
```

The evaluator uses the node's current extra_fields to build an expression environment via build_env_from_extra_fields. Each branch's predicate is evaluated in order. The first branch returning Literal::Bool(true) has its transforms applied; no subsequent branches run. If none match, the node is unchanged.
Mathematically, this is the dependent function space \(\Pi(x : \text{Node}). \text{FieldTransform}\). The selection happens at runtime, not compile-time, so different nodes can receive entirely different transformations.
Each branch’s transforms list is a full Vec<FieldTransform>, so branches can nest Case blocks, wrap with PathTransform, or use ComputeField. This completeness lets Case express arbitrary conditional logic.
Because a node with no matching branch passes through unchanged, add a final branch with a predicate that always evaluates to true if you need a "catch-all."
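The first-match selection can be sketched with plain function pointers standing in for panproto_expr predicates and transform lists (a hypothetical simplification):

```rust
use std::collections::HashMap;

// Simplified branch: fn pointers replace Expr predicates and
// Vec<FieldTransform> bodies.
struct CaseBranch {
    predicate: fn(&HashMap<String, i64>) -> bool,
    transform: fn(&mut HashMap<String, i64>),
}

// First branch whose predicate holds wins; later branches never run.
// No match: the node passes through unchanged.
fn apply_case(fields: &mut HashMap<String, i64>, branches: &[CaseBranch]) {
    for b in branches {
        if (b.predicate)(fields) {
            (b.transform)(fields);
            return;
        }
    }
}
```

A final branch with a predicate of `|_| true` acts as the catch-all default described above.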
23.5 ConditionalSurvival as a refined survival predicate
ConditionalSurvival is stored separately from field_transforms in CompiledMigration:
```rust
pub conditional_survival: HashMap<Name, panproto_expr::Expr>,
```

It's checked before field transforms during the BFS pass in wtype_restrict. For each child node:
- Is the remapped target anchor in surviving_verts? If not, skip (structurally pruned).
- Is there a conditional survival predicate for the source anchor? If so, evaluate it.
- If the predicate returns false, push the node back on the BFS queue (so descendants can still be processed through ancestor contraction) but don't add it to the surviving set.
- If the predicate returns true (or no predicate exists), add it to the surviving set and apply field transforms.
This two-step design separates structure from value. The schema migration answers: “does this vertex type survive?” The predicate answers: “does this specific node satisfy the condition?” All predicates are evaluated with the same build_env_from_extra_fields environment used for ComputeField and Case.
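The two-step decision can be sketched as follows, with illustrative names and fn-pointer predicates in place of panproto_expr evaluation. Note this sketch returns only the survive/skip verdict; the real loop additionally re-queues failed nodes for ancestor contraction:

```rust
use std::collections::{HashMap, HashSet};

// Structural check first, then the value-dependent predicate.
fn node_survives(
    target_anchor: &str,
    source_anchor: &str,
    surviving_verts: &HashSet<String>,
    predicates: &HashMap<String, fn(&HashMap<String, i64>) -> bool>,
    fields: &HashMap<String, i64>,
) -> bool {
    if !surviving_verts.contains(target_anchor) {
        return false; // structurally pruned by the schema migration
    }
    match predicates.get(source_anchor) {
        Some(pred) => pred(fields), // value-dependent check
        None => true,               // no predicate: survives unconditionally
    }
}
```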
23.6 build_env_from_extra_fields: variable binding strategy
This shared environment builder is used by ComputeField, Case, and conditional_survival. Its three binding strategies cover different storage conventions:
Flat bindings: every key in extra_fields becomes a top-level variable. A field "level": 2 evaluates as Expr::Var("level") → Literal::Int(2).
Qualified bindings for non-structural keys: most keys (anything except "attrs", "name", "$type", "parents") get a qualified attrs.key binding too. This means predicates can reference attrs.level without knowing whether level is flat or nested.
Nested attrs expansion: if extra_fields contains an "attrs" key holding a map, every entry in that map binds under both the qualified name attrs.key and (unless shadowed) the unqualified key. Predicates written against attrs.level work whether level is flat or nested.
This dual binding exists because different protocols store the same semantic information at different nesting depths. AT Protocol uses top-level keys; some annotation formats use an attrs object. Dual bindings let predicates work across both conventions without format-specific logic.
Why bind both level and attrs.level? Different protocols nest the same concept differently, and binding both means predicates work across protocols: a predicate like attrs.level > 2 matches whether level is top-level or inside attrs.
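A sketch of the dual-binding strategy over integer-valued fields. For simplicity the nested attrs map is a separate parameter here; in the real code it lives under extra_fields["attrs"]:

```rust
use std::collections::HashMap;

// Structural keys that do not receive a qualified attrs.* binding.
const STRUCTURAL: [&str; 4] = ["attrs", "name", "$type", "parents"];

// Every non-structural flat key binds both unqualified and as attrs.key;
// every nested attrs entry binds qualified, and unqualified unless shadowed.
fn build_env(
    extra_fields: &HashMap<String, i64>,
    attrs: &HashMap<String, i64>,
) -> HashMap<String, i64> {
    let mut env = HashMap::new();
    for (k, v) in extra_fields {
        env.insert(k.clone(), *v);
        if !STRUCTURAL.contains(&k.as_str()) {
            env.insert(format!("attrs.{k}"), *v);
        }
    }
    for (k, v) in attrs {
        env.insert(format!("attrs.{k}"), *v);
        env.entry(k.clone()).or_insert(*v); // unqualified unless shadowed
    }
    env
}
```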
23.7 value_to_expr_literal: Value-to-Literal conversion
value_to_expr_literal converts an instance Value to a panproto_expr::Literal:
| Value variant | Converts to |
|---|---|
| Value::Bool(b) | Literal::Bool(b) |
| Value::Int(i) | Literal::Int(i) |
| Value::Float(f) | Literal::Float(f) |
| Value::Str(s) | Literal::Str(s) |
| Value::Unknown with __array_len | Comma-separated string of elements |
| Value::Unknown without __array_len | Literal::Null |
| Value::Null and others | Literal::Null |
The encoded-array case deserves explanation. panproto stores JSON arrays in extra_fields as maps because extra_fields is a HashMap<String, Value> without a dedicated list variant. The array ["u-url", "external"] becomes {"0": "u-url", "1": "external", "__array_len": 2}. When this reaches value_to_expr_literal, the __array_len sentinel is detected and array elements are collected and joined with commas: "u-url,external". This lets BuiltinOp::Contains check membership in the serialized array.
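A sketch of the detection-and-join step, again assuming string values for simplicity (the real sentinel is an integer Value):

```rust
use std::collections::HashMap;

// Detect the __array_len sentinel and join indexed entries with commas.
// Returns None for maps that are not encoded arrays.
fn encoded_array_to_literal(map: &HashMap<String, String>) -> Option<String> {
    let len: usize = map.get("__array_len")?.parse().ok()?;
    let items: Vec<String> = (0..len)
        .filter_map(|i| map.get(&i.to_string()).cloned())
        .collect();
    Some(items.join(","))
}
```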
The inverse conversion (expr_literal_to_value) normalizes float-valued integers: a Literal::Float with zero fractional part and magnitude within i64 becomes Value::Int for JSON round-trip fidelity. This handles the common case where arithmetic on integers returns 1.0.
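The normalization rule can be sketched as follows (the function name is illustrative; the real logic lives inside expr_literal_to_value):

```rust
// A float with zero fractional part and magnitude within i64 range
// normalizes to an integer; anything else stays a float.
fn normalize_float(f: f64) -> Result<i64, f64> {
    if f.fract() == 0.0 && f.abs() <= i64::MAX as f64 {
        Ok(f as i64)
    } else {
        Err(f)
    }
}
```

This is what turns the 1.0 produced by integer arithmetic back into Value::Int(1) for JSON round-trip fidelity.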
23.8 Transform ordering: AddField, ComputeField, DropField
field_transforms is a Vec<FieldTransform> applied in sequence. Order matters because later transforms see the result of earlier ones.
The recommended convention for migrations needing to add, compute, and clean up fields:
1. AddField for inputs: add constant fields that expressions need but source data doesn't carry.
2. PathTransform for nested normalization: flatten or reorganize before expressions run.
3. ComputeField and ApplyExpr: compute derived fields using the now-complete input set.
4. Case: apply conditional logic that may depend on computed fields from step 3.
5. DropField / KeepFields: remove intermediate fields used as inputs but not in the output.
If you add a level field (step 1) because the source stores it elsewhere, compute name from level (step 3), then drop level (step 5), all three transforms reference the same intermediate state. Reversing steps 3 and 5 would compute name from a missing level variable, leaving name unchanged.
There’s no automatic dependency analysis. Transform authors are responsible for ordering. The builder API appends in registration order, so the first add_field_default fires before the first add_computed_field, which fires before the first add_field_drop.
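The registration-order semantics can be sketched with a toy builder whose methods only record what they would append. The method names follow the ones mentioned above; the bodies are illustrative, not the real implementations:

```rust
// Toy builder: each method appends to the transform vector in
// registration order, which is what the ordering convention relies on.
struct MigrationBuilder {
    field_transforms: Vec<String>,
}

impl MigrationBuilder {
    fn new() -> Self {
        Self { field_transforms: Vec::new() }
    }
    fn add_field_default(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("AddField({key})"));
        self
    }
    fn add_computed_field(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("ComputeField({key})"));
        self
    }
    fn add_field_drop(&mut self, key: &str) -> &mut Self {
        self.field_transforms.push(format!("DropField({key})"));
        self
    }
}
```

Registering add, then compute, then drop yields exactly that execution order, matching the five-step convention.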
23.9 Integration with wtype_restrict’s BFS pass
Field transforms are applied inside the fused BFS pass in wtype_restrict, not as a separate post-pass. The execution point is after a node is confirmed to survive (both structural and value-dependent predicates passed) and after its anchor is remapped to the target vertex name.
The relevant portion of the BFS loop:
```rust
// apply value-level field transforms if any exist for this vertex
if let Some(transforms) = migration.field_transforms.get(&child_node.anchor) {
    apply_field_transforms(&mut new_node, transforms);
}
new_nodes.insert(child_id, new_node.clone());
```

The lookup key is child_node.anchor, the source vertex name. The field_transforms map is keyed by source vertex name so migration authors can write migration.add_field_rename("h1", ...) against the source schema. The anchor is remapped on the new_node clone before field transforms run, so by the time apply_field_transforms executes, new_node.anchor holds the target name, but the lookup used the original child_node.anchor.
Both field_transforms and conditional_survival predicates are keyed by source vertex name; their lookups use the original child_node.anchor, not the remapped one. However, the node passed to apply_field_transforms already has its anchor remapped: if your transforms read node.anchor (unusual), they see the target name, not the source.
The BFS pass also handles fan reconstruction (step 5 of wtype_restrict) as a separate pass operating on the original fan list. Field transforms don’t run on fans independently; fans are structural groupings reconstructed after the BFS pass finishes. If a node participating in a fan also has field transforms, those transforms run when the node is added to new_nodes during BFS, before fan reconstruction.
23.10 Source locations
All types and functions described in this chapter are in crates/panproto-inst/src/wtype.rs:
- FieldTransform enum: lines 81 to 193
- CaseBranch struct: lines 195 to 205
- CompiledMigration::conditional_survival field: lines 70 to 71
- Builder methods (add_field_rename, add_path_transform, etc.): lines 207 to 349
- apply_field_transforms: lines 801 to 864
- apply_path_transform: lines 868 to 886
- apply_map_references: lines 888 to 940
- build_env_from_extra_fields: lines 947 to 972
- value_to_expr_literal: lines 975 to 999
- expr_literal_to_value: lines 1005 to 1025
- BFS integration point: lines 742 to 745
- Conditional survival evaluation: lines 723 to 733