17  VCS Engine

The panproto-vcs crate implements schematic version control. Schemas are content-addressed objects stored in a commit DAG, with branches, three-way merge, and data lifting through history.

17.1 Module layout

The VCS engine is organized into focused modules:

crates/panproto-vcs/src/
  lib.rs           -- public API, re-exports
  error.rs         -- VcsError (thiserror, non_exhaustive)
  hash.rs          -- canonical serialization + blake3 content addressing
  object.rs        -- ObjectId, CommitObject, Object enum
  store.rs         -- Store trait, HeadState, ReflogEntry
  fs_store.rs      -- FsStore (.panproto/ on disk)
  mem_store.rs     -- MemStore (HashMap-backed, for tests + WASM)
  refs.rs          -- branches, tags, resolve_ref
  reflog.rs        -- append-only audit log of ref changes
  index.rs         -- staging area + validation pipeline
  dag.rs           -- merge base, path finding, log walk, compose_path
  merge.rs         -- three-way schema merge + conflict detection
  rebase.rs        -- replay commits onto a new base
  cherry_pick.rs   -- apply single commit's migration
  reset.rs         -- soft/mixed/hard HEAD reset
  stash.rs         -- save/restore working state
  bisect.rs        -- binary search for breaking commit
  blame.rs         -- schema element attribution
  auto_mig.rs      -- derive Migration from SchemaDiff
  rename_detect.rs -- heuristic rename detection between versions
  status.rs        -- HEAD vs index vs working file
  repo.rs          -- Repository orchestration (porcelain)
  gc.rs            -- garbage collection

17.2 Object store

ObjectId is a blake3 hash of the canonical MessagePack serialization. Canonical forms use BTreeMap instead of HashMap and exclude precomputed indices (derived data). The store contains:

  • Object::Schema(Schema): a schema snapshot
  • Object::Migration { src, tgt, mapping }: a morphism between two schemas
  • Object::Commit(CommitObject): a point in the DAG
  • Object::Tag(TagObject): an annotated tag storing target object ID, tagger name, timestamp, and message
  • Object::DataSet(DataSetObject): a content-addressed data snapshot bound to a schema version
  • Object::Complement(ComplementObject): complement from data migration for backward restoration
  • Object::Protocol(Protocol): a protocol definition
  • Object::Expr(Expr): a standalone expression
  • Object::EditLog(EditLogObject): a sequence of edits for incremental migration

A CommitObject stores the core schema history:

pub struct CommitObject {
    pub schema_id: ObjectId,
    pub parents: Vec<ObjectId>,       // 0=root, 1=normal, 2=merge
    pub migration_id: Option<ObjectId>,
    pub protocol: String,
    pub author: String,
    pub timestamp: u64,
    pub message: String,
    pub renames: Vec<SiteRename>,
    pub protocol_id: Option<ObjectId>,
    pub data_ids: Vec<ObjectId>,
    pub complement_ids: Vec<ObjectId>,
    pub edit_log_ids: Vec<ObjectId>,  // references to EditLogObject values
}

17.3 Configuration and options

MergeOptions controls merge behavior:

pub struct MergeOptions {
    pub no_commit: bool,   // stage result without committing
    pub ff_only: bool,     // fail if not fast-forwardable
    pub no_ff: bool,       // force merge commit even on fast-forward
    pub squash: bool,      // collapse into a single non-merge commit
    pub message: Option<String>,  // custom commit message
}

CherryPickOptions controls cherry-pick behavior: no_commit stages without committing, record_origin appends the source commit ID to the message.

GcOptions has a single dry_run field. When true, gc::gc reports unreachable objects without deleting them.

17.4 Dag algorithms

All DAG algorithms operate on the commit graph \(G = (V, E)\) where \(V\) is the set of commits and \(E\) is the parent relation.

  • Merge base: Two-frontier BFS. Alternately expand from each side; the first vertex visited by both frontiers is the lowest common ancestor. For commits \(a\) and \(b\), this finds \(\text{LCA}(a, b)\) in \(O(|V|)\) worst case, typically much less.
  • Path finding: BFS backwards from target, reconstruct via parent map. Returns the sequence of commits \((c_0, c_1, \ldots, c_k)\) where \(c_0\) is the source and \(c_k\) is the target.
  • Log walk: Max-heap ordered by timestamp for topological-chronological order. Each commit is visited exactly once (tracked via a HashSet). Merge commits push both parents onto the heap. Complexity: \(O(n \log n)\) for \(n\) commits.
  • Compose path: Given a path \((c_0, \ldots, c_k)\), chain migrations via panproto_mig::compose. The composed migration \(M_{0 \to k} = M_{k-1 \to k} \circ \cdots \circ M_{0 \to 1}\) transforms data from schema \(S_0\) to schema \(S_k\).
  • Is ancestor: BFS reachability check from candidate to target. Returns true if there exists a path in \(G\) from the candidate to the target following parent edges.
NoteMax-Heap Over DFS for Log Walk

DFS visits one branch fully before starting the other, producing non-chronological output for interleaved branches. The timestamp-ordered heap interleaves branches chronologically, matching what users expect from git log.

17.5 Repository operations

Beyond core commits, repo.rs exposes additional operations:

  • Repository::amend(): creates a commit with the same parent(s) as current HEAD but updated schema, migration, and message. The previous HEAD becomes unreachable (collectible by gc).
  • refs::force_delete_branch: deletes a branch regardless of merge status
  • refs::rename_branch: renames a branch ref, updating HEAD if the current branch is renamed
  • refs::create_and_checkout_branch: creates a branch at HEAD and switches to it atomically
  • refs::create_annotated_tag: creates a TagObject in the store and points a tag ref at it
  • refs::create_tag_force: overwrites an existing tag ref
  • stash::stash_apply: re-applies the top stash entry without removing it from the stack
  • stash::stash_show: returns the schema diff between the stash entry and its parent
  • stash::stash_clear: drops all stash entries and removes the refs/stash ref

17.6 Error handling

VcsError is #[non_exhaustive] and includes:

  • ObjectNotFound, RefNotFound, BranchNotFound, TagNotFound
  • BranchAlreadyExists, TagAlreadyExists
  • NotOnBranch, DetachedHead, IndexEmpty, IndexNotEmpty
  • MergeConflict, RebaseConflict, CherryPickConflict
  • FastForwardRequired, NothingToAmend
  • StashEmpty, InvalidRef, CorruptObject
  • RemoteNotSupported

17.7 Three-way merge

Given schemas base, ours, theirs, the merge computes a combined result. Categorically, this is a pushout (colimit) over the shared base.

The algorithm:

  1. diff(base, ours) and diff(base, theirs): the extended diff covers all 13 schema fields (vertices, edges, constraints, hyper-edges, required, NSIDs, variants, orderings, recursion points, usage modes, spans, nominal)
  2. For every element in every field, classify its fate on each side: unchanged, added, removed, or modified
  3. Apply the merge rule: one-sided changes accepted; identical changes deduplicated; incompatible changes flagged as typed conflicts with the base value retained
  4. No tie-breaking: the merge is commutative, meaning merge(base, A, B) == merge(base, B, A)
  5. Rebuild precomputed indices
  6. Derive migrations from ours to merged and theirs to merged via auto_mig::derive_migration
TipCommutativity in Merge

Commutativity means the merge result doesn’t depend on which branch is “ours” vs “theirs.” This eliminates bugs where merge order affects the outcome. It’s a stronger guarantee than git’s text merge, which can produce different conflict markers depending on merge direction.

17.8 Rename detection

The rename_detect module detects renames between schema versions. When auto_mig::derive_migration encounters a vertex absent in one schema and a structurally similar vertex in the other, the detector identifies this as a rename rather than a delete-and-add.

17.8.1 Algorithm

For vertex rename detection:

  1. Collect removed vertices (in old, not in new) and added vertices (in new, not in old)
  2. For each (removed, added) pair, compute structural similarity:
    • Same vertex kind: +0.3
    • Same outgoing edge names: +0.3
    • Same incoming edge names: +0.2
    • Edit distance of vertex ID names ≤ 3: +0.2
  3. Greedy bipartite matching: take highest-scoring pairs first
  4. Return pairs ≥ threshold

For edge label rename detection, the algorithm compares removed and added edges between the same surviving vertex pair, scoring by name edit distance normalized to string length.

17.8.2 Commitobject integration

Detected renames are stored in CommitObject.renames as Vec<SiteRename>. This field is #[serde(default, skip_serializing_if = "Vec::is_empty")], so old commits without renames deserialize correctly.

See Section 18.2 for the full naming and identity architecture.