24 Expression Parser & Query Engine
Migrations need more than structural rewiring. They need to compute values: parse a string into an integer, concatenate two name fields, filter nodes by a predicate, choose a default based on the target schema’s constraints. panproto’s expression language handles all of this—a small, pure functional language embedded in two crates (panproto-expr for the AST and evaluator, panproto-expr-parser for the surface syntax) and integrated throughout the instance layer, VCS, and WASM boundary.
For the user-facing walkthrough of value-dependent migration (which uses expressions for field transforms), see the tutorial.
24.1 Crate layout
The expression system is split across two crates with a clear boundary: panproto-expr defines the AST and evaluator (no parser dependency), while panproto-expr-parser defines the surface syntax.
24.1.1 panproto-expr
This crate provides the core computational substrate. Its modules are:
| Module | Contents |
|---|---|
expr.rs |
Expr enum (11 variants), Pattern enum (6 variants), BuiltinOp enum (50 operations) |
literal.rs |
Literal type (Int, Float, Str, Bool, Null, Record, List, Closure) |
eval.rs |
Call-by-value evaluator with step/depth limits (EvalConfig) |
env.rs |
Env type: immutable variable bindings backed by FxHashMap<Arc<str>, Literal> |
builtin.rs |
apply_builtin dispatch for all non-graph builtins |
subst.rs |
substitute, free_vars, pattern_vars for symbolic manipulation |
error.rs |
ExprError enum covering type mismatches, arity errors, unbound variables, and resource limits |
The Expr enum is the canonical AST representation used throughout panproto. All variants derive Serialize/Deserialize, making expressions content-addressable and portable across the WASM boundary:
pub enum Expr {
Var(Arc<str>),
Lam(Arc<str>, Box<Self>),
App(Box<Self>, Box<Self>),
Lit(Literal),
Record(Vec<(Arc<str>, Self)>),
List(Vec<Self>),
Field(Box<Self>, Arc<str>),
Index(Box<Self>, Box<Self>),
Match { scrutinee: Box<Self>, arms: Vec<(Pattern, Self)> },
Let { name: Arc<str>, value: Box<Self>, body: Box<Self> },
Builtin(BuiltinOp, Vec<Self>),
}24.1.2 panproto-expr-parser
This crate provides the Haskell-style surface syntax. It has four modules:
token.rs: A logos-based token enum with 50+ token kinds. Keywords (do, let, where, if, then, else, case, of, guard, not, mod, div, otherwise) are recognized during lexing. Three virtual token kinds (Indent, Dedent, Newline) are reserved for the layout pass.
lexer.rs: Two-pass tokenization. The first pass runs logos for fast regex-based tokenization. The second pass (insert_layout) implements GHC-style layout insertion: when a layout keyword (let, where, do, of) is followed by a newline with increased indentation, an Indent token is inserted. When indentation decreases, Dedent tokens are inserted. Equal indentation within a layout block produces Newline tokens. If the layout keyword is followed by {, layout insertion is suppressed.
parser.rs: A chumsky 1.0 parser combining recursive descent with Pratt precedence parsing. Layout tokens are consumed as block delimiters alongside explicit braces.
pretty.rs: Precedence-aware pretty printer that minimizes parentheses. Designed to round-trip: parse(tokenize(pretty_print(&e))) == e for well-formed expressions.
24.2 The builtinop system
BuiltinOp is a 50-variant enum organized into nine domains:
| Domain | Count | Operations |
|---|---|---|
| Arithmetic | 7 | Add, Sub, Mul, Div, Mod, Neg, Abs |
| Rounding | 2 | Floor, Ceil |
| Comparison | 6 | Eq, Neq, Lt, Lte, Gt, Gte |
| Boolean | 3 | And, Or, Not |
| String | 10 | Concat, Len, Slice, Upper, Lower, Trim, Split, Join, Replace, Contains |
| List | 9 | Map, Filter, Fold, Append, Head, Tail, Reverse, FlatMap, Length |
| Record | 4 | MergeRecords, Keys, Values, HasField |
| Type coercion | 6 | IntToFloat, FloatToInt, IntToStr, FloatToStr, StrToInt, StrToFloat |
| Type inspection | 3 | TypeOf, IsNull, IsList |
| Graph traversal | 5 | Edge, Children, HasEdge, EdgeCount, Anchor |
Every builtin has a fixed arity enforced by BuiltinOp::arity(). The arity is either 1 (unary), 2 (binary), or 3 (ternary; only Slice, Replace, and Fold). The evaluator checks arity before dispatch: if the argument count doesn’t match, evaluation fails with ExprError::ArityMismatch.
Higher-order builtins (Map, Filter, Fold, FlatMap) receive special treatment in the evaluator. Instead of evaluating all arguments eagerly and dispatching to apply_builtin, the evaluator handles them directly: it evaluates the list argument, then iterates over its elements, applying the function argument via apply_closure for each element. This avoids materializing closures as Literal values before use and enables proper step-counting per iteration.
24.2.1 Graph traversal builtins
The five graph builtins (Edge, Children, HasEdge, EdgeCount, Anchor) require an instance context that the standard evaluator does not have. In the standard panproto_expr::eval, these return Literal::Null. They are meaningful only when evaluated through eval_with_instance in panproto-inst.
24.3 Instance-aware evaluation
instance_env.rs in panproto-inst provides eval_with_instance, which intercepts graph builtins and resolves them against a WInstance:
pub fn eval_with_instance(
expr: &Expr,
env: &panproto_expr::Env,
config: &EvalConfig,
instance: &WInstance,
context_node_id: Option<u32>,
) -> Result<Literal, panproto_expr::ExprError>The function pattern-matches on Expr::Builtin(op, args) where op is a graph builtin. It evaluates arguments recursively (through eval_with_instance, so nested graph builtins work), then dispatches to apply_graph_builtin. For all non-graph expressions, it falls through to the standard panproto_expr::eval.
24.3.1 apply_graph_builtin
Each graph builtin resolves against the instance’s arc and node maps:
| Builtin | Signature | Behavior |
|---|---|---|
Edge |
(node_ref, edge_kind) -> value |
Finds the first arc matching the source node and edge kind, returns the target node as a Record literal |
Children |
(node_ref) -> [value] |
Collects all arc targets from the source node |
HasEdge |
(node_ref, edge_kind) -> bool |
Checks if any arc matches |
EdgeCount |
(node_ref) -> int |
Counts outgoing arcs |
Anchor |
(node_ref) -> string |
Returns the node’s schema anchor name |
24.3.2 Node reference resolution
resolve_node_ref converts a Literal to a u32 node ID. It accepts two forms: an integer literal (direct node ID) or the string "self" (resolved to the context_node_id parameter). This allows expressions like edge("self", "body") to traverse from the current node without knowing its numeric ID.
24.3.3 node_to_literal
node_to_literal converts a WInstance node into a Literal::Record for use in expression results. The record includes:
_id: the node’s numeric ID asLiteral::Int_anchor: the node’s schema anchor asLiteral::Str- All entries from
extra_fields, converted viavalue_to_expr_literal(the same conversion used by field transforms; see ?tbl-value-to-literal in the field transforms chapter)
24.4 Polynomial operations (poly.rs)
poly.rs in panproto-inst exposes operations that arise from treating schemas as polynomial functors and instances as their initial algebras (W-types). These operations are derived from the adjoint triple \(\Sigma \dashv \Delta \dashv \Pi\) applied to the polynomial interpretation.
24.4.1 fiber_at_anchor
pub fn fiber_at_anchor(
compiled: &CompiledMigration,
source: &WInstance,
target_anchor: &Name,
) -> Vec<u32>Given a compiled migration \(m : S \to T\) and a target anchor \(a\) in \(T\), returns all source node IDs whose remapped anchor equals \(a\). This is the \(\Delta_f\) operation (pullback along \(f\)) applied to a representable presheaf (a single point).
24.4.2 fiber_decomposition
pub fn fiber_decomposition(
compiled: &CompiledMigration,
source: &WInstance,
) -> FxHashMap<Name, Vec<u32>>Computes fibers for all target anchors simultaneously. The result is a partition of source nodes: every node appears in exactly one fiber.
24.4.3 fiber_with_predicate
Combines fiber_at_anchor with a value predicate. Computes \(f^{-1}(a) \cap \{x \mid \text{pred}(x)\}\). The predicate is evaluated against each node’s extra_fields using build_env_from_extra_fields. Additional variables _value (if present) and _anchor are bound for each node.
24.4.4 group_by
pub fn group_by(
compiled: &CompiledMigration,
source: &WInstance,
) -> FxHashMap<Name, WInstance>Partitions source nodes by their migration image, returning a sub-instance for each target anchor. Internal arcs between nodes in the same fiber are preserved. This is the dependent product \(\Pi_f\) computed explicitly on trees.
24.4.5 join
pub fn join(
left: &WInstance,
right: &WInstance,
left_compiled: &CompiledMigration,
right_compiled: &CompiledMigration,
) -> Vec<(u32, u32)>Given two instances \(A\) and \(B\) with migrations to a shared target \(C\), computes the pullback \(A \times_C B\): all pairs \((a, b)\) where \(a\) and \(b\) map to the same target anchor. The implementation decomposes both instances into fibers, then takes the Cartesian product of fibers at each shared anchor.
24.5 Declarative query engine (query.rs)
query.rs in panproto-inst provides a declarative query interface over W-type instances. An InstanceQuery describes what to find; execute runs the pipeline.
24.5.1 InstanceQuery struct
pub struct InstanceQuery {
pub anchor: Name, // vertex kind to select
pub predicate: Option<Expr>, // filter expression
pub group_by: Option<String>, // partition field
pub project: Option<Vec<String>>, // field subset
pub limit: Option<usize>, // max results
pub path: Vec<Name>, // edge traversal path
}All fields except anchor are optional. The struct derives Serialize/Deserialize, so queries can cross the WASM boundary via MessagePack.
24.5.2 Execution pipeline
execute applies five stages in order:
- Anchor filter: collect all nodes whose
anchormatchesquery.anchor. - Path navigation: if
query.pathis non-empty, follow edges. Each path element is an edge kind;navigate_pathiterates, collecting all arc targets matching the edge kind at each step. The result replaces the candidate set. - Predicate evaluation: for each candidate,
build_node_envconstructs an expression environment from the node’sextra_fields,_anchor,_id, and_value. The predicate is evaluated viapanproto_expr::eval. Nodes whose predicate returns anything other thanLiteral::Bool(true)are excluded. - Limit: truncate results to
query.limit. - Projection:
project_fieldsrestricts each node’s fields to the specified subset, or returns all fields if no projection is specified.
24.5.3 QueryMatch result
pub struct QueryMatch {
pub node_id: u32,
pub anchor: Name,
pub value: Option<FieldPresence>,
pub fields: FxHashMap<String, Value>,
}24.5.4 Helper functions
navigate_path: iterates over a sequence of edge kinds, using the instance’s arc list to follow edges. At each step, the current node set is replaced by all targets reachable via matching arcs.
build_node_env: constructs a panproto_expr::Env by calling build_env_from_extra_fields (the same function used by field transforms) and adding _anchor, _id, and _value bindings.
project_fields: when a projection list is provided, copies only the named fields into the output map. Missing fields are silently omitted.
24.6 The parser in detail
24.6.1 Operator precedence table
The Pratt parser defines nine precedence levels. Higher levels bind more tightly:
| Level | Associativity | Operators | Desugars to |
|---|---|---|---|
| 1 | Left | & (pipe) |
App(rhs, lhs) (reverse application) |
| 3 | Left | \|\| |
Builtin(Or, ...) |
| 4 | Left | && |
Builtin(And, ...) |
| 5 | Right | ==, /=, <, <=, >, >= |
Builtin(Eq\|Neq\|Lt\|Lte\|Gt\|Gte, ...) |
| 6 | Right | ++ |
Builtin(Concat, ...) |
| 7 | Left | +, - |
Builtin(Add\|Sub, ...) |
| 8 | Left | *, /, %, mod, div |
Builtin(Mul\|Div\|Mod, ...) |
| 9 | Prefix | - (unary), not |
Builtin(Neg\|Not, ...) |
| 10+ | Left | juxtaposition (application) | App(func, arg) or Builtin(op, args) |
Postfix operations (.field and ->edge) bind tighter than application. They are parsed as a left-fold over the atom parser, before the Pratt parser sees the expression.
24.6.2 Desugaring rules
The parser transforms several surface syntax forms into core Expr constructors:
List comprehensions desugar to nested FlatMap and Match:
[e | x <- xs, pred]
becomes
flatMap(xs, \x -> match pred { True -> [e]; _ -> [] })
Guards become pattern matches on True/Wildcard, and generators become FlatMap with a lambda binding. Multiple qualifiers are folded right-to-left.
Do-notation desugars to nested FlatMap and Let:
do
x <- source
let y = f x
g x y
becomes
flatMap(source, \x -> let y = f x in g x y)
Bind statements (x <- e) become FlatMap; let statements become Let bindings; bare expressions become FlatMap with a wildcard binding.
Where clauses desugar to Let bindings wrapping the main body:
body where x = e1; y = e2
becomes
let x = e1 in let y = e2 in body
Multi-parameter lambdas desugar to nested Lam:
\x y z -> body
becomes
Lam("x", Lam("y", Lam("z", body)))
Pattern parameters (anything other than Var or Wildcard) generate a fresh variable with an immediate Match.
If/then/else desugars to Match with a Bool pattern:
if cond then a else b
becomes
Match { scrutinee: cond, arms: [(Lit(Bool(true)), a), (Wildcard, b)] }
Infix operators desugar to Builtin applications, as shown in the precedence table.
Edge traversal x -> y desugars to Builtin(Edge, [x, Lit(Str("y"))]). This is parsed as a postfix operation on atoms, making doc -> layers -> annotations a left-associative chain of Edge calls.
24.6.3 Builtin name resolution
When the parser encounters juxtaposition (function application), resolve_application checks whether the function position is a Var whose name matches a builtin. If so, it emits Builtin(op, [arg]) instead of App(Var(name), arg). On subsequent applications, if the Builtin has fewer arguments than its arity, the new argument is appended. This means add 1 2 parses directly as Builtin(Add, [Lit(1), Lit(2)]) without going through curried application.
Both snake_case and camelCase names are accepted for multi-word builtins (e.g., flat_map and flatMap both resolve to FlatMap).
24.7 VCS integration
Expressions participate in panproto’s content-addressed object store via the Object::Expr variant in panproto-vcs.
24.7.1 Storage API
pub fn store_expr(store: &mut dyn Store, expr: &Expr) -> Result<ObjectId, VcsError>;
pub fn load_expr(store: &dyn Store, id: &ObjectId) -> Result<Expr, VcsError>;store_expr wraps the expression in Object::Expr(Box::new(expr.clone())) and calls store.put(&object). The store serializes the object with MessagePack, hashes it with blake3, and stores the bytes keyed by the hash. If an identical expression already exists, the operation returns the existing ID (content-addressing makes storage idempotent).
load_expr retrieves the object by ID and pattern-matches on Object::Expr. If the object exists but has a different type, it returns VcsError::WrongObjectType.
24.7.2 Hash computation
The VCS hashes Object::Expr via hash::hash_expr, which serializes the Expr with MessagePack and computes a blake3 digest. This means two structurally identical expressions always produce the same ObjectId, regardless of when or where they were created.
Expressions are leaf objects in the VCS DAG: they have no outgoing references to other objects. The garbage collector (gc.rs) treats them as terminal nodes during reachability analysis.
24.8 WASM boundary
Three WASM exports in panproto-wasm/src/api.rs expose the expression engine to JavaScript/TypeScript:
24.8.1 parse_expr
#[wasm_bindgen]
pub fn parse_expr(source: &str) -> Result<Vec<u8>, JsError>Takes a surface syntax string, tokenizes it with panproto_expr_parser::tokenize, parses it with panproto_expr_parser::parse, and returns the resulting Expr as MessagePack bytes. Errors from either the lexer or parser are surfaced as JsError.
24.8.2 eval_func_expr
#[wasm_bindgen]
pub fn eval_func_expr(expr_bytes: &[u8], env_bytes: &[u8]) -> Result<Vec<u8>, JsError>Evaluates a functional expression (lambda calculus with builtins) against a provided environment. Both inputs and the output are MessagePack-encoded. The environment is deserialized as Vec<(String, Literal)> and converted to an Env.
This function is distinct from eval_expr (which evaluates GAT terms against a theory). eval_func_expr operates purely in the expression language; eval_expr operates in the schema-level term language.
24.8.3 execute_query
#[wasm_bindgen]
pub fn execute_query(query_bytes: &[u8], instance_bytes: &[u8]) -> Result<Vec<u8>, JsError>Deserializes an InstanceQuery and a WInstance from MessagePack, runs inst::execute_query, and serializes the results back. Each QueryMatch is converted to a JSON object with node_id, anchor, value, and fields keys before MessagePack encoding.
24.9 Source locations
The types and functions described in this chapter are spread across four crates:
crates/panproto-expr/src/:
expr.rs:Expr,Pattern,BuiltinOpenums andBuiltinOp::arity()eval.rs:eval,EvalConfig, step/depth-limited evaluatorenv.rs:Envtypebuiltin.rs:apply_builtindispatch
crates/panproto-expr-parser/src/:
token.rs:Tokenenum,Span,Spannedlexer.rs:tokenize,insert_layout,LexErrorparser.rs:parse,resolve_builtin, Pratt parser, all desugaring functionspretty.rs:pretty_print, precedence-aware serialization
crates/panproto-inst/src/:
instance_env.rs:eval_with_instance,apply_graph_builtin,resolve_node_ref,node_to_literalpoly.rs:fiber_at_anchor,fiber_decomposition,fiber_with_predicate,group_by,joinquery.rs:InstanceQuery,QueryMatch,execute,navigate_path,build_node_env,project_fields
crates/panproto-vcs/src/:
expr.rs:store_expr,load_exprobject.rs:Object::Exprvariant
crates/panproto-wasm/src/:
api.rs:parse_expr,eval_func_expr,execute_query