22 Value-Dependent Migration
You encounter a heading element: it carries the level as a vertex name (h1, h2, h3). Later, you want a single heading vertex with a level attribute. The structural transform to rename is trivial. But where does the level attribute come from? It comes from what the data says—the old vertex name itself. None of panproto’s structural schema operations can express that relationship.
FieldTransform and conditional_survival form the value-level layer. They operate on the node’s field environment at runtime, reading what the data contains and transforming fields accordingly.
22.1 Why structural transforms are not enough
Consider the heading problem in detail. Your source schema has separate vertices for each level: h1, h2, h3. You want to consolidate them into a single heading vertex with an integer level attribute.
The structural part is easy—rename all three to heading. But now every node reads as heading with no level. The missing piece is value-dependent: you need the original vertex name to compute the level value.
Here is a second pattern. Document formats sometimes encode semantic roles in CSS classes. An a element with class: "u-url" is a microformat URL link; one with class: "u-email" is for email. To normalize to a single link vertex type, you must keep only elements whose class contains a recognized token. Whether a node survives depends on what its class field actually holds.
FieldTransform handles the first case (computed fields based on data values). conditional_survival handles the second (dropping nodes that don’t match a value-based predicate).
22.2 PathTransform: navigating nested structures
A node’s extra_fields is a flat map. But the values themselves can be maps (nested objects), forming a tree. Some formats store attributes in nested objects: { "attrs": { "level": 2, "style": "bold" } }.
PathTransform applies any field transform at a specified path within this tree:
// rename "old_attr" -> "new_attr" inside the "attrs" nested object
migration.add_path_transform(
    "heading",
    &["attrs"],
    FieldTransform::RenameField {
        old_key: "old_attr".to_owned(),
        new_key: "new_attr".to_owned(),
    },
);
The path is a sequence of keys to navigate. &["attrs"] means: find the attrs field, treat it as a map, and apply the inner transform to that map’s fields. Deeper paths are supported: &["attrs", "styles"] navigates two levels.
An empty path applies the inner transform directly to the node’s own extra_fields.
PathTransform composes with all other variants. You can nest a Case inside a PathTransform for conditional logic at nested levels, or nest a ComputeField inside PathTransform to compute values using both top-level fields and the nested subtree.
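The path-navigation rule can be sketched in isolation. The following is a minimal model, not panproto's implementation: the `Val` enum, the `Fields` alias, and the `apply_at_path` and `rename_field` helpers are all hypothetical names introduced for illustration, and returning `false` when a path key is missing is an assumption of the sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a field value: a scalar or a nested map.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
    Map(BTreeMap<String, Val>),
}

type Fields = BTreeMap<String, Val>;

/// Walk `path` key by key and apply `f` to the map found at the end.
/// An empty path applies `f` to `fields` itself. If a key is missing or
/// does not hold a map, nothing is changed and `false` is returned
/// (an assumption of this sketch).
fn apply_at_path(fields: &mut Fields, path: &[&str], f: &dyn Fn(&mut Fields)) -> bool {
    match path.split_first() {
        None => {
            f(fields);
            true
        }
        Some((key, rest)) => match fields.get_mut(*key) {
            Some(Val::Map(inner)) => apply_at_path(inner, rest, f),
            _ => false,
        },
    }
}

/// Rename `old` to `new` within a single map, if `old` is present.
fn rename_field(fields: &mut Fields, old: &str, new: &str) {
    if let Some(value) = fields.remove(old) {
        fields.insert(new.to_owned(), value);
    }
}
```

Calling `apply_at_path(&mut fields, &["attrs"], &|m: &mut Fields| rename_field(m, "old_attr", "new_attr"))` renames the key inside the nested `attrs` map and leaves the top-level fields alone.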
22.3 MapReferences: updating vertex-referencing strings
When you rename a vertex, any fields that carry that vertex name as a string go stale. Parent reference arrays in tree-structured formats ("parents": ["root", "body"]) are the canonical example: rename "body" to "main", and those parent reference strings need to follow.
MapReferences applies a rename map to a string field:
use std::collections::HashMap;

let mut rename_map = HashMap::new();
rename_map.insert("body".to_owned(), Some("main".to_owned()));
rename_map.insert("sidebar".to_owned(), None); // drop references to removed vertex
migration.add_map_references("paragraph", "parents", rename_map);
For each paragraph node, the parents field is updated:
- Value::Str("body") becomes Value::Str("main").
- Value::Str("sidebar") is removed (mapped to None).
- String values not in the rename map pass through unchanged.
The transform handles both flat strings and encoded arrays. panproto encodes arrays in extra_fields as Value::Unknown maps with an __array_len sentinel. MapReferences detects this and iterates over array elements, renaming or dropping each one. References mapped to None are removed; the array is compacted and its length updated.
This is the functorial action of vertex rename on the name-reference algebra: every structural rename should be accompanied by MapReferences on any field that carries those names.
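The rename, drop, and pass-through rules for an array of references can be sketched as one function over plain strings. This is an illustrative model only: `map_references` is a hypothetical helper that works on a `Vec<String>` rather than on panproto's encoded arrays.

```rust
use std::collections::HashMap;

/// Apply a rename map to a list of vertex-name references.
/// `Some(new)` rewrites the reference, `None` drops it, and names that
/// are absent from the map pass through unchanged.
fn map_references(
    refs: &[String],
    renames: &HashMap<String, Option<String>>,
) -> Vec<String> {
    refs.iter()
        .filter_map(|name| match renames.get(name) {
            Some(Some(new_name)) => Some(new_name.clone()),
            Some(None) => None,
            None => Some(name.clone()),
        })
        .collect()
}
```

Because dropped references are filtered out rather than blanked, the result is already compacted and its length reflects the surviving entries.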
22.4 ComputeField: expression-based field computation
ComputeField evaluates an expression against a node’s full field environment and stores the result in a target field. Unlike ApplyExpr (which binds a single field’s value), ComputeField makes all extra_fields available as variables.
The heading level example:
use panproto_expr::{Expr, Literal, BuiltinOp};
use std::sync::Arc;

// compute: (concat "h" (int_to_str level))
let expr = Expr::Builtin(
    BuiltinOp::Concat,
    vec![
        Expr::Lit(Literal::Str("h".to_owned())),
        Expr::Builtin(
            BuiltinOp::IntToStr,
            vec![Expr::Var(Arc::from("level"))],
        ),
    ],
);
migration.add_computed_field("heading", "name", expr);
When a heading node with level: 2 is processed, this expression evaluates to "h2" and stores it in the name field. The variable level is bound from extra_fields["level"].
If level lives in a nested attrs object (extra_fields["attrs"]["level"]), the variable attrs.level is also bound automatically. The same expression using Expr::Var(Arc::from("attrs.level")) works regardless of whether level is flat or nested.
The evaluator runs the expression using panproto_expr::EvalConfig::default(). If evaluation fails (a field is missing, for instance), the field is left unchanged.
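The binding and failure rules can be modeled with a toy evaluator. The sketch below is not panproto_expr: its `Val` and `Expr` enums and the `bind_env`, `eval`, and `compute_field` functions are hypothetical, and it covers only the two operations used in the heading example. It assumes nested maps contribute dotted variable names and that a failed evaluation leaves the fields untouched, as described above.

```rust
use std::collections::BTreeMap;

/// Hypothetical field value: a scalar or a nested map.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
    Map(BTreeMap<String, Val>),
}

/// Hypothetical tiny expression language (not panproto_expr).
enum Expr {
    Var(String),
    Lit(Val),
    Concat(Box<Expr>, Box<Expr>),
    IntToStr(Box<Expr>),
}

/// Bind every field as a variable; nested maps also contribute dotted
/// names such as `attrs.level`.
fn bind_env(fields: &BTreeMap<String, Val>, prefix: &str, env: &mut BTreeMap<String, Val>) {
    for (key, value) in fields {
        let name = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{}.{}", prefix, key)
        };
        if let Val::Map(inner) = value {
            bind_env(inner, &name, env);
        }
        env.insert(name, value.clone());
    }
}

/// Evaluate `expr`; `None` signals failure (unbound variable, wrong type).
fn eval(expr: &Expr, env: &BTreeMap<String, Val>) -> Option<Val> {
    match expr {
        Expr::Var(name) => env.get(name).cloned(),
        Expr::Lit(value) => Some(value.clone()),
        Expr::IntToStr(inner) => match eval(inner, env)? {
            Val::Int(n) => Some(Val::Str(n.to_string())),
            _ => None,
        },
        Expr::Concat(a, b) => match (eval(a, env)?, eval(b, env)?) {
            (Val::Str(x), Val::Str(y)) => Some(Val::Str(x + &y)),
            _ => None,
        },
    }
}

/// Store the result in `target`; on failure leave the fields untouched.
fn compute_field(fields: &mut BTreeMap<String, Val>, target: &str, expr: &Expr) {
    let mut env = BTreeMap::new();
    bind_env(fields, "", &mut env);
    if let Some(value) = eval(expr, &env) {
        fields.insert(target.to_owned(), value);
    }
}
```

Treating failure as a no-op keeps a migration total over heterogeneous data: nodes that lack the input field simply keep their existing fields.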
22.5 Case: conditional transforms based on runtime values
Case is the dependent function space for field transforms. It applies different transforms depending on what the data says. Each branch is a (predicate, transforms) pair. The first branch whose predicate evaluates to true fires; the rest are skipped.
Given branches \((p_1, \vec{f}_1), \ldots, (p_k, \vec{f}_k)\) and environment \(e\):
\[\text{Case}(e) = \vec{f}_i(e) \quad \text{where } i = \min\{j : p_j(e) = \text{true}\}\]
If no predicate matches, \(e\) passes through unchanged.
The canonical use case is matching against attribute values:
use panproto_expr::{Expr, Literal, BuiltinOp, CaseBranch};
use std::sync::Arc;

let case = FieldTransform::Case {
    branches: vec![
        CaseBranch {
            predicate: Expr::Builtin(
                BuiltinOp::Eq,
                vec![
                    Expr::Var(Arc::from("level")),
                    Expr::Lit(Literal::Int(1)),
                ],
            ),
            transforms: vec![FieldTransform::ComputeField {
                target_key: "name".to_owned(),
                expr: Expr::Lit(Literal::Str("h1".to_owned())),
            }],
        },
        CaseBranch {
            predicate: Expr::Builtin(
                BuiltinOp::Eq,
                vec![
                    Expr::Var(Arc::from("level")),
                    Expr::Lit(Literal::Int(2)),
                ],
            ),
            transforms: vec![FieldTransform::ComputeField {
                target_key: "name".to_owned(),
                expr: Expr::Lit(Literal::Str("h2".to_owned())),
            }],
        },
        // ... and so on for h3
    ],
};
migration.add_case_transform("heading", vec![/* branches */]);
A node with level: 1 gets name set to "h1". A node with level: 2 gets "h2". A node whose level matches none of the predicates passes through unchanged.
Predicates are evaluated with the same variable environment as ComputeField: all extra_fields and attrs.* entries are bound. The comparison builtins (BuiltinOp::Eq, BuiltinOp::Lt, BuiltinOp::Contains, etc.) are all available.
Case branches can contain any sequence of FieldTransform variants, including nested Case blocks and PathTransform wrappers. The transforms in the matching branch run in the order listed, and only then does processing continue past the Case.
Since Case fires the first matching branch, the order of branches matters. If two predicates overlap (e.g., level < 3 and level == 2), the branch listed first wins. How should you order branches to avoid accidentally shadowing a more specific predicate with a more general one?
Order branches from most specific to most general: put level == 2 before level < 3. The otherwise branch (if present) should always be last, since it matches everything.
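The effect of branch order can be seen in a small first-match model. This is a sketch with hypothetical names (`Branch`, `first_match`, and the labels), using plain function pointers over an integer level instead of panproto expressions.

```rust
/// A branch pairs a predicate over `level` with a label to assign.
/// Both the field and the labels are illustrative.
struct Branch {
    predicate: fn(i64) -> bool,
    label: &'static str,
}

/// Return the label of the first branch whose predicate holds, mirroring
/// i = min{ j : p_j(e) = true }. `None` means no branch fired.
fn first_match(branches: &[Branch], level: i64) -> Option<&'static str> {
    branches
        .iter()
        .find(|branch| (branch.predicate)(level))
        .map(|branch| branch.label)
}

fn is_two(level: i64) -> bool {
    level == 2
}

fn below_three(level: i64) -> bool {
    level < 3
}

/// Specific predicate first: `level == 2` gets its own label.
fn specific_first() -> Vec<Branch> {
    vec![
        Branch { predicate: is_two, label: "exactly-two" },
        Branch { predicate: below_three, label: "below-three" },
    ]
}

/// General predicate first: it shadows `level == 2`, which never fires.
fn general_first() -> Vec<Branch> {
    vec![
        Branch { predicate: below_three, label: "below-three" },
        Branch { predicate: is_two, label: "exactly-two" },
    ]
}
```

With `specific_first()`, a level of 2 receives "exactly-two"; with `general_first()`, the same input receives "below-three" and the second branch is unreachable.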
22.6 ConditionalSurvival: value-dependent vertex survival
The FieldTransform variants operate on nodes already anchored to surviving vertices. conditional_survival adds a second, value-dependent gate before field transforms run.
If a vertex has a conditional survival predicate, panproto evaluates it against the node’s extra_fields. If it returns false, the node is dropped, treated exactly as if its anchor were not in surviving_verts. Its descendants undergo ancestor contraction just as they would for a structurally pruned node.
// keep only "item" nodes where level == 2
let predicate = Expr::Builtin(
    BuiltinOp::Eq,
    vec![
        Expr::Var(Arc::from("level")),
        Expr::Lit(Literal::Int(2)),
    ],
);
migration.add_conditional_survival("item", predicate);
This is the matching pattern at the survival level rather than the transform level. Use add_conditional_survival when you want to keep only nodes of a given type that match a condition. Use add_case_transform when you want all nodes to survive but behave differently depending on their values.
The two interact cleanly: conditional_survival runs first. Nodes that pass then undergo field_transforms in the usual order.
When conditional_survival drops a node, its children undergo ancestor contraction. If a dropped heading node has inline children (strong, em), do those children get re-parented to the dropped node’s parent, or are they dropped recursively?
They are re-parented to the dropped node’s nearest surviving ancestor. The conditional survival predicate removes only the node it targets; surviving descendants are preserved by the standard ancestor contraction mechanism.
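Re-parenting to the nearest surviving ancestor can be sketched over a flat node table. The `Node` struct and `contract` function are hypothetical; the sketch assumes the parent links are acyclic and that survival has already been decided for each node.

```rust
/// A node in a flat tree table; `parent` indexes into the same table.
struct Node {
    parent: Option<usize>,
    survives: bool,
}

/// For each node, compute the index of its nearest surviving strict
/// ancestor (`None` when there is none). Assumes the links are acyclic.
fn contract(nodes: &[Node]) -> Vec<Option<usize>> {
    nodes
        .iter()
        .map(|node| {
            let mut cursor = node.parent;
            // Skip dropped ancestors until a survivor or the top is reached.
            while let Some(idx) = cursor {
                if nodes[idx].survives {
                    break;
                }
                cursor = nodes[idx].parent;
            }
            cursor
        })
        .collect()
}
```

For a table holding a document root, a dropped heading under it, a strong node under the heading, and an em node under the strong node, the strong node is re-parented to the root while the em node keeps the strong node as its parent.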
22.7 Worked example: heading levels
Normalizing heading-level encodings is the most common use for value-dependent migration. ProseMirror and Pandoc encode heading level as an integer attribute on a single heading vertex. Older HTML-derived formats use separate h1, h2, h3 vertex types.
Migrating from separate vertices to the unified form requires two steps:
- Rename h1, h2, h3 all to heading (three RenameVertex transforms at the schema level).
- Add a level attribute to each node with the correct integer value.
Step 2 is value-dependent: the source vertex name determines the value.
use panproto_expr::{Expr, Literal, BuiltinOp, CaseBranch};
use std::sync::Arc;

// the schema migration renames h1/h2/h3 to heading.
// field transforms add the level attribute after remapping.
// for nodes originally anchored to h1:
migration.add_field_default("h1", "level", Value::Int(1));
// for nodes originally anchored to h2:
migration.add_field_default("h2", "level", Value::Int(2));
// for nodes originally anchored to h3:
migration.add_field_default("h3", "level", Value::Int(3));
add_field_default uses the source vertex name as the key. The transform is registered against "h1", which is the anchor name at the time the node is processed. The anchor is remapped to "heading" after field transforms run, so the lookup is against the original name.
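The ordering described here (look up the default by the original anchor, then remap) can be sketched on a single node. The `Node` struct and `migrate_heading` function are hypothetical, and the choice that a default does not overwrite an existing level is an assumption of the sketch rather than documented behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical node: an anchor (vertex name) plus integer-valued fields.
#[derive(Debug, PartialEq)]
struct Node {
    anchor: String,
    fields: BTreeMap<String, i64>,
}

/// Migrate one node: choose the field default from the ORIGINAL anchor,
/// then remap the anchor. Unknown anchors are left untouched.
fn migrate_heading(mut node: Node) -> Node {
    let level = match node.anchor.as_str() {
        "h1" => Some(1),
        "h2" => Some(2),
        "h3" => Some(3),
        _ => None,
    };
    if let Some(level) = level {
        // Assumption: a default only fills a gap; an existing value is kept.
        node.fields.entry("level".to_owned()).or_insert(level);
        // Remap only after the lookup, so the old name is still visible above.
        node.anchor = "heading".to_owned();
    }
    node
}
```

If the anchor were remapped first, all three source vertices would look identical at lookup time and the level could no longer be recovered.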
In the reverse direction (migrating from the unified form back to separate vertices), the structural transforms map heading back to h1, h2, h3. But how does panproto know which heading node becomes h1 versus h2? It doesn’t, unless you add a ConditionalSurvival predicate for each target vertex. The full bidirectional version requires each target vertex to survive only for the matching level value:
// reverse migration: heading -> h1/h2/h3
// structural: RenameVertex("heading", "h1"), RenameVertex("heading", "h2"), ...
// (this is actually a split, handled by the schema-level SplitVertex transform.)
// value-dependent survival gates each rename:
migration.add_conditional_survival(
    "heading",
    Expr::Builtin(
        BuiltinOp::Eq,
        vec![
            Expr::Var(Arc::from("level")),
            Expr::Lit(Literal::Int(1)),
        ],
    ),
);
The heading example uses one predicate per target vertex. If you needed a single vertex to survive when level == 1 or level == 2 (collapsing h1 and h2 into one target), can you express that with a single ConditionalSurvival predicate, or do you need a different mechanism?
Yes. The predicate is an arbitrary Expr, so you can use BuiltinOp::Or to combine conditions: Expr::Builtin(BuiltinOp::Or, vec![level_eq_1, level_eq_2]). A single ConditionalSurvival predicate with a disjunction is equivalent to (and simpler than) registering multiple predicates or using a Case transform to achieve the same effect.
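The disjunction can be sketched with boxed closures standing in for expressions. `Pred`, `level_eq`, `or`, and `survivors` are hypothetical names; the sketch only shows that one combined predicate acts as a single survival gate.

```rust
/// A survival predicate over a node's `level` (illustrative model).
type Pred = Box<dyn Fn(i64) -> bool>;

/// Predicate that holds when `level == n`.
fn level_eq(n: i64) -> Pred {
    Box::new(move |level| level == n)
}

/// Disjunction of two predicates.
fn or(a: Pred, b: Pred) -> Pred {
    Box::new(move |level| a(level) || b(level))
}

/// Keep only the levels that pass the survival predicate, in order.
fn survivors(levels: &[i64], keep: &Pred) -> Vec<i64> {
    levels.iter().copied().filter(|level| keep(*level)).collect()
}
```

Filtering the levels 1, 2, 3, 2 through `or(level_eq(1), level_eq(2))` keeps 1, 2, 2 and drops the 3.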
22.8 Worked example: CSS class-based matching
The pattern of checking whether a list-valued attribute includes a specific token uses BuiltinOp::Contains. panproto encodes JSON arrays in extra_fields as Value::Unknown maps with an __array_len sentinel. When this map is used as an expression variable, it is serialized as a comma-separated string, making Contains work as a membership test.
Consider a microformat migration: keep only a elements whose class array contains "u-url".
// keep only anchor nodes where class contains "u-url"
let predicate = Expr::Builtin(
    BuiltinOp::Contains,
    vec![
        Expr::Var(Arc::from("class")),
        Expr::Lit(Literal::Str("u-url".to_owned())),
    ],
);
migration.add_conditional_survival("a", predicate);
If a node has class: ["u-url", "external"], the variable class is bound to the string "u-url,external". Contains("u-url,external", "u-url") returns true, and the node survives. A node with class: ["u-email"] fails the predicate and is dropped.
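The comma-separated serialization comes from how value_to_expr_literal converts encoded arrays. Two caveats follow. First, Contains on the joined string is a substring test, so "u-url" also matches a longer token such as "u-url-alt". Second, an element that itself contains a comma cannot be told apart from two separate elements. When either case can occur, use a more precise predicate (one that also checks the "," delimiters around the token, such as "u-url," or ",u-url"), or restructure the data before migration.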
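The gap between a substring test and true membership on a comma-joined string can be shown with the standard library alone. The three helpers below are hypothetical and model only the string behavior described in this section, not panproto's Contains builtin.

```rust
/// Join array elements as a comma-separated string.
fn join_tokens(tokens: &[&str]) -> String {
    tokens.join(",")
}

/// Substring test, standing in for a `Contains` check on the joined string.
fn contains_substring(joined: &str, needle: &str) -> bool {
    joined.contains(needle)
}

/// Exact membership: split on the delimiter and compare whole tokens.
fn contains_token(joined: &str, needle: &str) -> bool {
    joined.split(',').any(|token| token == needle)
}
```

For the class list ["u-url", "external"], both tests accept "u-url". For ["u-url-alt"], the substring test still accepts "u-url" while the token test rejects it, which is the false positive a delimiter-aware predicate avoids.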
The same Contains predicate can drive Case branches instead of survival. If you want all a elements to survive but behave differently based on which microformat token they carry:
let case = FieldTransform::Case {
    branches: vec![
        CaseBranch {
            predicate: Expr::Builtin(
                BuiltinOp::Contains,
                vec![
                    Expr::Var(Arc::from("class")),
                    Expr::Lit(Literal::Str("u-url".to_owned())),
                ],
            ),
            transforms: vec![FieldTransform::AddField {
                key: "link_type".to_owned(),
                value: Value::Str("url".to_owned()),
            }],
        },
        CaseBranch {
            predicate: Expr::Builtin(
                BuiltinOp::Contains,
                vec![
                    Expr::Var(Arc::from("class")),
                    Expr::Lit(Literal::Str("u-email".to_owned())),
                ],
            ),
            transforms: vec![FieldTransform::AddField {
                key: "link_type".to_owned(),
                value: Value::Str("email".to_owned()),
            }],
        },
    ],
};
migration.add_case_transform("a", vec![/* branches above */]);
Nodes with u-url in their class get link_type: "url". Nodes with u-email get link_type: "email". Nodes matching neither branch are unchanged.