39  Benchmark Results

39.1 Test environment

Machine MacBook Pro 16-inch (2021)
Chip Apple M1 Max (10-core CPU: 8P + 2E, 32-core GPU)
Memory 64 GB unified
OS macOS 15.7.4 (24G517)
Rust rustc 1.94.0 (4a4ef493e 2026-03-02)
Cargo cargo 1.94.0 (85eff7c80 2026-01-15)
Profile bench (inherits release: lto = true, codegen-units = 1, opt-level = 3)
Framework divan v0.1.21, timer precision 41 ns
Samples 100 per benchmark; iteration count auto-tuned by divan

All benchmarks were run with no other significant processes active. Numbers report the median across 100 samples unless otherwise noted. The fastest column shows best-case (warm cache) performance; mean shows arithmetic average including outliers.

39.2 GAT Engine (panproto-gat)

39.2.1 find_sort_by_name — O(1) index lookup

Looks up the last sort by name in a theory of size \(n\), exercising the FxHashMap index cache.

\(n\) (sorts) fastest median mean iters/sample
10 6.19 ns 6.44 ns 6.89 ns 102,400
50 6.56 ns 6.97 ns 8.21 ns 102,400
100 8.06 ns 8.39 ns 8.45 ns 51,200
500 4.69 ns 4.89 ns 5.02 ns 102,400

Sub-10 ns at every size confirms O(1) behavior. The 500-sort case is faster than the 100-sort case, most likely because the FxHash distribution happens to place the target key (S499) in a bucket with favorable cache-line alignment. The important point: lookup cost does not grow with theory size.
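The shape of the index cache can be sketched with the standard-library HashMap standing in for FxHashMap; the struct and field names below are hypothetical simplifications, not the real Theory/Index types:

```rust
use std::collections::HashMap;

// Hypothetical, simplified shape; the real index uses FxHashMap and
// richer sort metadata.
struct SortIndex {
    by_name: HashMap<String, usize>, // sort name -> position in the sort list
}

impl SortIndex {
    fn build(sorts: &[String]) -> Self {
        let by_name = sorts
            .iter()
            .enumerate()
            .map(|(i, s)| (s.clone(), i))
            .collect();
        SortIndex { by_name }
    }

    // One hash + one bucket probe: O(1) regardless of theory size.
    fn find_sort(&self, name: &str) -> Option<usize> {
        self.by_name.get(name).copied()
    }
}

fn main() {
    let sorts: Vec<String> = (0..500).map(|i| format!("S{i}")).collect();
    let index = SortIndex::build(&sorts);
    // The benchmarked case: look up the *last* sort by name.
    assert_eq!(index.find_sort("S499"), Some(499));
    assert_eq!(index.find_sort("missing"), None);
    println!("ok");
}
```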

39.2.2 theory_construction — Theory + Index build

Constructs a Theory with \(n\) sorts, \(n\) unary operations, and builds the three FxHashMap indices.

\(n\) fastest median mean iters/sample
10 526 ns 552 ns 553 ns 800
50 2.15 µs 2.25 µs 2.25 µs 200
100 3.92 µs 4.04 µs 4.44 µs 100
500 19.4 µs 19.6 µs 20.0 µs 100

Linear scaling (\(\approx\) 39 ns/sort at 500). The index build is amortized over all subsequent lookups.

39.2.3 colimit — Theory pushout

Computes the categorical pushout of two theories over a shared base. colimit_small = 3 shared + 5 extra sorts per theory; colimit_medium = 10 + 20; colimit_large = 50 + 100.

Scenario shared extra/theory fastest median mean
colimit_small 3 5 1.11 µs 1.18 µs 1.71 µs
colimit_medium 10 20 2.79 µs 2.94 µs 3.28 µs
colimit_large 50 100 12.0 µs 12.5 µs 14.1 µs
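At the level of sort names, the pushout identifies the shared base once and keeps each theory's extras disjoint, so colimit_small yields 3 + 5 + 5 = 13 sorts. A set-level sketch (the real colimit also maps operations and equations; the prefixing scheme here is illustrative, not the engine's naming):

```rust
use std::collections::BTreeSet;

// Sketch: pushout of sort *names* only. Shared sorts are identified
// (appear once); theory-local extras are kept apart by prefixing.
fn pushout_sorts(shared: &[&str], extra_a: &[&str], extra_b: &[&str]) -> BTreeSet<String> {
    let mut out: BTreeSet<String> = shared.iter().map(|s| s.to_string()).collect();
    out.extend(extra_a.iter().map(|s| format!("A::{s}")));
    out.extend(extra_b.iter().map(|s| format!("B::{s}")));
    out
}

fn main() {
    // colimit_small: 3 shared sorts + 5 extras per theory -> 13 sorts total.
    let shared = ["Base0", "Base1", "Base2"];
    let a = ["X0", "X1", "X2", "X3", "X4"];
    let b = ["Y0", "Y1", "Y2", "Y3", "Y4"];
    assert_eq!(pushout_sorts(&shared, &a, &b).len(), 13);
    println!("ok");
}
```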

39.2.4 colimit_with_equations

Same as above but both theories carry equations (\(\min(n, 10)\) equations in the shared base, 5 per extension theory).

\(n\) (shared sorts) fastest median mean
10 2.10 µs 2.23 µs 2.24 µs
50 4.46 µs 4.71 µs 5.16 µs
100 7.37 µs 8.04 µs 10.0 µs

Equation checking adds \(\approx 30\%\) over sort/op-only colimits at the same scale.

39.2.5 resolve_theory — inheritance chain resolution

Resolves theory T_d at the bottom of a linear inheritance chain of depth \(d\). Each theory adds 1 sort and 1 nullary operation.

\(d\) (depth) fastest median mean
3 2.25 µs 2.33 µs 2.42 µs
10 10.9 µs 11.3 µs 11.5 µs
20 31.6 µs 32.0 µs 32.9 µs
50 129 µs 132 µs 135 µs
100 412 µs 418 µs 424 µs

Roughly quadratic in depth: resolving level \(k\) clones all \(k\) ancestor sorts, so total work is \(O(d^2)\) (Arc<str> cloning is O(1) per name, but the cloned vector grows with each level). At depth 100, \(\approx 4.2\;\mu\text{s}\) per level.
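The \(O(d^2)\) shape follows from the chain walk itself. A toy model (hypothetical names, no real theory data) shows the total ancestor-clone count growing as \(d(d-1)/2\):

```rust
// Toy model of resolving a linear inheritance chain: theory T_k adds one
// sort on top of T_{k-1}, and resolution at each level clones every
// ancestor sort collected so far.
fn resolve_chain(depth: usize) -> (Vec<String>, usize) {
    let mut resolved: Vec<String> = Vec::new();
    let mut clones = 0;
    for level in 0..depth {
        // Clone all ancestor sorts (the quadratic part)...
        let mut next: Vec<String> = resolved.clone();
        clones += resolved.len();
        // ...then add this level's own sort.
        next.push(format!("Sort{level}"));
        resolved = next;
    }
    (resolved, clones)
}

fn main() {
    let (sorts, clones) = resolve_chain(100);
    assert_eq!(sorts.len(), 100);
    assert_eq!(clones, 99 * 100 / 2); // 4,950 ancestor clones at depth 100
    println!("ok");
}
```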

39.2.6 check_morphism — Morphism validation

Validates a theory morphism on a theory with \(n\) sorts, \(n\) unary operations, and \(\min(n, 20)\) equations.

\(n\) identity (median) renaming (median)
10 2.96 µs 2.92 µs
50 10.0 µs 8.37 µs
100 16.5 µs 16.6 µs

Renaming morphisms are comparable to identity because the dominant cost is equation preservation checking (comparing mapped terms in the codomain), which requires iterating over codomain equations regardless of whether names changed.

39.3 Instance layer (panproto-inst)

39.3.1 parse_json_flat_fields — JSON → WInstance

Parses a flat JSON object with \(n\) string-valued fields into a WInstance (root + \(n\) leaf nodes).

\(n\) (fields) fastest median mean
3 1.83 µs 1.92 µs 1.98 µs
10 6.17 µs 6.37 µs 6.43 µs
50 29.0 µs 29.6 µs 29.7 µs

\(\approx 590\;\text{ns/field}\) at 50 fields. Dominated by serde_json deserialization + node/arc construction.

39.3.2 parse_json_nested_objects — nested JSON → WInstance

Parses a chain of nested objects, each with 3 leaf fields, to depth \(d\).

\(d\) (depth) total nodes fastest median mean
3 13 3.00 µs 3.12 µs 3.19 µs
10 41 8.17 µs 8.33 µs 8.43 µs
20 81 15.8 µs 16.3 µs 17.2 µs

\(\approx 200\;\text{ns/node}\) at depth 20. Nesting adds parent-map bookkeeping but does not cause a per-level blowup.
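The node counts in the table follow directly from the shape: each nesting level contributes one object node plus three leaves, and the instance root adds one more, giving \(4d + 1\) nodes at depth \(d\). A small check with a toy value type (not the real WInstance):

```rust
// Toy JSON-like value; the real parser builds a WInstance instead.
enum Value {
    Object(Vec<(String, Value)>),
    Leaf(String),
}

// Build a chain of nested objects, each carrying 3 leaf fields.
fn nested(depth: usize) -> Value {
    let mut v = Value::Object(vec![
        ("a".into(), Value::Leaf("x".into())),
        ("b".into(), Value::Leaf("y".into())),
        ("c".into(), Value::Leaf("z".into())),
    ]);
    for _ in 1..depth {
        v = Value::Object(vec![
            ("a".into(), Value::Leaf("x".into())),
            ("b".into(), Value::Leaf("y".into())),
            ("c".into(), Value::Leaf("z".into())),
            ("child".into(), v),
        ]);
    }
    v
}

// Count every node: objects and leaves alike.
fn node_count(v: &Value) -> usize {
    match v {
        Value::Leaf(_) => 1,
        Value::Object(fields) => 1 + fields.iter().map(|(_, c)| node_count(c)).sum::<usize>(),
    }
}

fn main() {
    // Wrap the chain in an instance root node, as the table's counts do.
    let root = |d| Value::Object(vec![("doc".to_string(), nested(d))]);
    assert_eq!(node_count(&root(3)), 13); // matches the table
    assert_eq!(node_count(&root(10)), 41);
    assert_eq!(node_count(&root(20)), 81);
    println!("ok");
}
```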

39.3.3 wtype_restrict — fused restrict pipeline

39.3.3.1 Identity migration (no nodes removed)

3-node instance (root + 2 string children), all vertices survive.

fastest median mean iters/sample
1.10 µs 1.14 µs 1.14 µs 400

39.3.3.2 Deep linear chain (all survive)

Linear chain schema root → v0 → v1 → ... → v_{n-1}, identity migration.

\(n\) (chain length) nodes fastest median mean
10 11 5.21 µs 5.29 µs 5.34 µs
50 51 23.2 µs 24.0 µs 24.2 µs
100 101 46.7 µs 49.1 µs 49.5 µs

\(\approx 490\;\text{ns/node}\) at 100 nodes = 2.04M nodes/sec.

39.3.3.3 Wide flat tree (all survive)

Root with \(n\) direct children (star topology), identity migration.

\(n\) (children) nodes fastest median mean
10 11 5.25 µs 5.67 µs 5.71 µs
50 51 22.4 µs 23.0 µs 23.2 µs

Comparable to deep chains — the fused BFS processes both topologies at the same per-node rate.

39.3.3.4 Contraction (drop intermediate nodes)

Linear chain of \(n\) nodes; migration keeps only root + first 2 vertices. All other nodes are dropped. Exercises the fused BFS’s ancestor propagation through non-surviving intermediates.

\(n\) (total chain) surviving fastest median mean iters/sample
10 3 1.56 µs 1.59 µs 1.60 µs 400
50 3 3.67 µs 3.75 µs 3.81 µs 100

Contraction on a 50-node chain produces only 3 output nodes in 3.75 µs. The BFS still visits all 50 nodes (to discover descendants), but the output construction is minimal.
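The ancestor-propagation step can be sketched on a bare chain: each surviving node walks through dropped intermediates to find its nearest surviving ancestor in the output. This toy version walks upward per node; the real fused restrict propagates ancestors downward in a single BFS over WInstance arcs:

```rust
use std::collections::HashSet;

// Toy contraction over a linear chain: node i's parent is i-1 (node 0 is
// the root). Returns (surviving node, parent-in-output) pairs.
fn contract_chain(n: usize, keep: &HashSet<usize>) -> Vec<(usize, Option<usize>)> {
    let mut out = Vec::new();
    for node in 0..n {
        if !keep.contains(&node) {
            continue; // dropped: contributes nothing to the output
        }
        // Skip over non-surviving intermediates to the nearest kept ancestor.
        let mut anc = node.checked_sub(1);
        while let Some(a) = anc {
            if keep.contains(&a) {
                break;
            }
            anc = a.checked_sub(1);
        }
        out.push((node, anc));
    }
    out
}

fn main() {
    // 50-node chain, keep only root + first two vertices: 3 output nodes.
    let keep: HashSet<usize> = [0, 1, 2].into_iter().collect();
    let out = contract_chain(50, &keep);
    assert_eq!(out.len(), 3);
    assert_eq!(out, vec![(0, None), (1, Some(0)), (2, Some(1))]);
    println!("ok");
}
```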

39.3.4 functor_restrict_n_tables — \(\Delta_F\) precomposition

Restricts a functor instance with \(n\) tables (each 10 rows, 5 columns) via identity migration.

\(n\) (tables) rows total fastest median mean
5 50 4.00 µs 4.08 µs 4.11 µs
20 200 17.1 µs 17.3 µs 17.4 µs
50 500 41.7 µs 42.1 µs 42.7 µs

\(\approx 84\;\text{ns/row}\) at 500 rows.

39.3.5 functor_extend_n_tables — \(\Sigma_F\) left Kan extension

Extends (pushes forward) a functor instance. More expensive than restrict because it computes column unions and fills Null values for missing columns.

\(n\) (tables) fastest median mean
5 8.21 µs 8.33 µs 8.41 µs
20 33.0 µs 33.4 µs 33.5 µs
50 82.8 µs 84.3 µs 86.8 µs

\(\approx 2\times\) the cost of restrict due to column-union filling.
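The extra cost is the column union plus per-row padding. A minimal sketch with a toy row type, where `Cell::Null` stands in for the engine's null value:

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Debug, PartialEq)]
enum Cell {
    Str(String),
    Null,
}

// Union the column names of all rows, then pad each row so every row
// carries every column (missing cells become Null).
fn extend_columns(rows: &[HashMap<String, String>]) -> Vec<Vec<(String, Cell)>> {
    let columns: BTreeSet<&String> = rows.iter().flat_map(|r| r.keys()).collect();
    rows.iter()
        .map(|r| {
            columns
                .iter()
                .map(|c| {
                    let cell = r.get(*c).map(|v| Cell::Str(v.clone())).unwrap_or(Cell::Null);
                    ((*c).clone(), cell)
                })
                .collect()
        })
        .collect()
}

fn main() {
    let mut r1 = HashMap::new();
    r1.insert("id".to_string(), "1".to_string());
    r1.insert("name".to_string(), "a".to_string());
    let mut r2 = HashMap::new();
    r2.insert("id".to_string(), "2".to_string());
    r2.insert("email".to_string(), "b@x".to_string());

    let padded = extend_columns(&[r1, r2]);
    // Every output row now has all three columns: email, id, name.
    assert!(padded.iter().all(|row| row.len() == 3));
    assert_eq!(padded[0][0], ("email".to_string(), Cell::Null)); // r1 had no email
    println!("ok");
}
```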

39.4 Migration engine (panproto-mig)

39.4.1 check_existence_n_vertices

Validates a migration’s well-formedness against protocol theory rules.

\(n\) (vertices) fastest median mean
10 1.42 µs 1.46 µs 1.77 µs
100 16.6 µs 17.0 µs 17.5 µs

39.4.2 compile_n_vertices

Compiles a migration specification into CompiledMigration (surviving sets, remap tables, resolvers).

\(n\) (vertices) fastest median mean
10 8.58 µs 8.87 µs 9.02 µs
100 79.1 µs 80.8 µs 81.5 µs

\(\approx 810\;\text{ns/vertex}\) at 100 vertices. Dominated by HashSet/HashMap construction for surviving sets and edge indices.
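The compiled artifact is essentially a pair of lookup structures over the vertex map. A simplified sketch (hypothetical shape; the real CompiledMigration also carries edge indices and resolvers):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified compilation output.
struct Compiled {
    surviving: HashSet<String>,
    remap: HashMap<String, String>, // source vertex -> target vertex
}

// A None target means the vertex is dropped by the migration.
fn compile(vertex_map: &[(&str, Option<&str>)]) -> Compiled {
    let mut surviving = HashSet::new();
    let mut remap = HashMap::new();
    for (src, tgt) in vertex_map {
        if let Some(t) = tgt {
            surviving.insert(src.to_string());
            remap.insert(src.to_string(), t.to_string());
        }
    }
    Compiled { surviving, remap }
}

fn main() {
    let c = compile(&[("v0", Some("w0")), ("v1", None), ("v2", Some("w2"))]);
    assert_eq!(c.surviving.len(), 2);
    assert!(!c.surviving.contains("v1")); // dropped vertex
    assert_eq!(c.remap.get("v0").map(String::as_str), Some("w0"));
    println!("ok");
}
```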

39.4.3 compose — Migration composition

39.4.3.1 Two identity migrations

\(n\) (vertices) fastest median mean
10 4.58 µs 4.77 µs 4.79 µs
50 20.6 µs 21.2 µs 21.2 µs
100 40.9 µs 42.4 µs 44.1 µs

39.4.3.2 Two migrations with renaming (source vertices remapped)

\(n\) (vertices) fastest median mean
10 4.46 µs 4.71 µs 4.73 µs
50 20.7 µs 21.5 µs 21.5 µs

Renaming adds negligible overhead: the precomputed inverse maps resolve hyper-edge remapping in O(1) per entry.
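At the vertex level, composition is a hash-join of the two remaps: a vertex survives \(g \circ f\) iff it survives \(f\) and its image survives \(g\), with each entry resolved in O(1) lookups. A sketch (vertex maps only; the real compose also remaps hyper-edges):

```rust
use std::collections::HashMap;

// Compose two vertex remaps. Entries whose intermediate image is dropped
// by g simply vanish from the composite.
fn compose(f: &HashMap<String, String>, g: &HashMap<String, String>) -> HashMap<String, String> {
    f.iter()
        .filter_map(|(src, mid)| g.get(mid).map(|tgt| (src.clone(), tgt.clone())))
        .collect()
}

fn main() {
    let f: HashMap<_, _> = [("a", "b"), ("x", "y")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let g: HashMap<_, _> = [("b", "c")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let gf = compose(&f, &g);
    assert_eq!(gf.get("a").map(String::as_str), Some("c"));
    assert!(!gf.contains_key("x")); // "y" is dropped by g, so x does not survive
    println!("ok");
}
```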

39.4.3.3 Chain of 5 migrations composed sequentially

Each migration is an identity on a chain schema with \(n\) vertices.

\(n\) (vertices) fastest median mean
10 26.9 µs 27.7 µs 28.0 µs
50 125 µs 131 µs 131 µs

A chain of 5 migrations requires 4 pairwise composes; the observed \(\approx 5\text{-}6\times\) cost of a single compose is consistent with linear scaling in chain length.

39.4.4 lift_wtype — apply compiled migration to instance

39.4.4.1 Simple (5-node chain, identity)

fastest median mean iters/sample
2.85 µs 3.02 µs 3.05 µs 200

39.4.4.2 Contraction (10-node chain, keep 3)

fastest median mean iters/sample
1.55 µs 1.60 µs 1.61 µs 400

39.4.4.3 At scale

\(n\) (nodes) fastest median mean
100 47.1 µs 51.2 µs 51.5 µs
500 238 µs 259 µs 264 µs

At 500 nodes: \(259\;\mu\text{s} \div 500 = 518\;\text{ns/node}\) = 1.93M nodes/sec. This includes the full pipeline: migration compilation lookup, fused restrict BFS, node remapping, arc construction, and WInstance output allocation.

39.5 Lens layer (panproto-lens)

39.5.1 get_identity_lens — forward direction (no complement)

Identity lens on a wide tree with \(n\) fields. All nodes survive; complement is empty.

\(n\) (fields) nodes fastest median mean
5 6 3.17 µs 3.29 µs 3.47 µs
20 21 11.7 µs 12.1 µs 12.8 µs
50 51 26.3 µs 27.1 µs 27.6 µs

39.5.2 get_projection_lens — forward direction (large complement)

Projection lens keeping only 2 of \(n\) fields. Complement stores \(n - 2\) dropped nodes.

\(n\) (fields) kept dropped fastest median mean
10 2 8 3.58 µs 3.75 µs 3.79 µs
50 2 48 13.3 µs 13.6 µs 13.7 µs

Projection is faster than identity because the output tree is smaller (fewer nodes to allocate), even though the complement capture adds work.

39.5.3 get_then_put_round_trip — full bidirectional cycle

Identity lens: get → put → verify the source is restored. Exercises both directions.

\(n\) (fields) fastest median mean
5 6.00 µs 6.17 µs 6.27 µs
20 22.2 µs 23.2 µs 23.9 µs
50 50.7 µs 51.8 µs 54.2 µs

The round trip costs \(\approx 1.9\times\) a standalone get; the put leg is comparable in cost to get because it must merge the complement (dropped nodes, dropped arcs, contraction choices) back into the view.
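The complement mechanism can be sketched on flat records: get splits the source into a view plus a complement, and put merges an (edited) view with the stored complement. This is a toy model of the constant-complement pattern, not the real WInstance machinery:

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, String>;

// get: project the view fields out; everything dropped goes to the complement.
fn get(src: &Record, keep: &[&str]) -> (Record, Record) {
    let mut view = Record::new();
    let mut complement = Record::new();
    for (k, v) in src {
        if keep.contains(&k.as_str()) {
            view.insert(k.clone(), v.clone());
        } else {
            complement.insert(k.clone(), v.clone());
        }
    }
    (view, complement)
}

// put: merge the (possibly edited) view back with the stored complement.
fn put(view: &Record, complement: &Record) -> Record {
    let mut out = complement.clone();
    out.extend(view.iter().map(|(k, v)| (k.clone(), v.clone())));
    out
}

fn main() {
    let src: Record = [("id", "1"), ("name", "a"), ("note", "n")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let (view, complement) = get(&src, &["id", "name"]);
    assert_eq!(view.len(), 2);
    assert_eq!(put(&view, &complement), src); // round trip restores the source
    println!("ok");
}
```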

39.5.4 get_then_put_projection — bidirectional with large complement

Projection lens keeping 2 of \(n\) fields. The complement carries the bulk of the data.

\(n\) (fields) fastest median mean
10 8.17 µs 8.37 µs 8.48 µs
50 33.1 µs 34.3 µs 35.0 µs

39.6 Derived throughput summary

Operation Scale Throughput
find_sort (O(1) index) 10–500 sorts 120–205M lookups/sec
parse_json (flat) 50 fields 1.69M fields/sec
wtype_restrict (deep chain) 100 nodes 2.04M nodes/sec
lift_wtype (full pipeline) 500 nodes 1.93M nodes/sec
functor_restrict 500 rows 11.9M rows/sec
colimit 150 sorts 80K colimits/sec
compose (two migrations) 100 vertices 23.6K composes/sec
Lens round-trip (get + put) 50 fields 19.3K round-trips/sec

39.7 How to reproduce

# Run all benchmarks
cargo bench --workspace

# Run a specific crate's benchmarks
cargo bench -p panproto-gat
cargo bench -p panproto-inst
cargo bench -p panproto-mig
cargo bench -p panproto-lens

# Run a specific benchmark by name filter
cargo bench -p panproto-gat -- find_sort
cargo bench -p panproto-inst -- wtype_restrict_deep

All benchmarks use divan v0.1.21. Each benchmark runs 100 samples with auto-tuned iteration counts (shown in the iters/sample column). Results include fastest, slowest, median, and mean.