17 VCS Engine
The panproto-vcs crate implements schematic version control. Schemas are content-addressed objects stored in a commit DAG, with branches, three-way merge, and data lifting through history.
17.1 Module layout
The VCS engine is organized into focused modules:
crates/panproto-vcs/src/
lib.rs -- public API, re-exports
error.rs -- VcsError (thiserror, non_exhaustive)
hash.rs -- canonical serialization + blake3 content addressing
object.rs -- ObjectId, CommitObject, Object enum
store.rs -- Store trait, HeadState, ReflogEntry
fs_store.rs -- FsStore (.panproto/ on disk)
mem_store.rs -- MemStore (HashMap-backed, for tests + WASM)
refs.rs -- branches, tags, resolve_ref
reflog.rs -- append-only audit log of ref changes
index.rs -- staging area + validation pipeline
dag.rs -- merge base, path finding, log walk, compose_path
merge.rs -- three-way schema merge + conflict detection
rebase.rs -- replay commits onto a new base
cherry_pick.rs -- apply single commit's migration
reset.rs -- soft/mixed/hard HEAD reset
stash.rs -- save/restore working state
bisect.rs -- binary search for breaking commit
blame.rs -- schema element attribution
auto_mig.rs -- derive Migration from SchemaDiff
rename_detect.rs -- heuristic rename detection between versions
status.rs -- HEAD vs index vs working file
repo.rs -- Repository orchestration (porcelain)
gc.rs -- garbage collection
17.2 Object store
ObjectId is a blake3 hash of the canonical MessagePack serialization. Canonical forms use BTreeMap instead of HashMap and exclude precomputed indices (derived data). The store contains:
Object::Schema(Schema): a schema snapshotObject::Migration { src, tgt, mapping }: a morphism between two schemasObject::Commit(CommitObject): a point in the DAGObject::Tag(TagObject): an annotated tag storing target object ID, tagger name, timestamp, and messageObject::DataSet(DataSetObject): a content-addressed data snapshot bound to a schema versionObject::Complement(ComplementObject): complement from data migration for backward restorationObject::Protocol(Protocol): a protocol definitionObject::Expr(Expr): a standalone expressionObject::EditLog(EditLogObject): a sequence of edits for incremental migration
A CommitObject stores the core schema history:
pub struct CommitObject {
pub schema_id: ObjectId,
pub parents: Vec<ObjectId>, // 0=root, 1=normal, 2=merge
pub migration_id: Option<ObjectId>,
pub protocol: String,
pub author: String,
pub timestamp: u64,
pub message: String,
pub renames: Vec<SiteRename>,
pub protocol_id: Option<ObjectId>,
pub data_ids: Vec<ObjectId>,
pub complement_ids: Vec<ObjectId>,
pub edit_log_ids: Vec<ObjectId>, // references to EditLogObject values
}17.3 Configuration and options
MergeOptions controls merge behavior:
pub struct MergeOptions {
pub no_commit: bool, // stage result without committing
pub ff_only: bool, // fail if not fast-forwardable
pub no_ff: bool, // force merge commit even on fast-forward
pub squash: bool, // collapse into a single non-merge commit
pub message: Option<String>, // custom commit message
}CherryPickOptions controls cherry-pick behavior: no_commit stages without committing, record_origin appends the source commit ID to the message.
GcOptions has a single dry_run field. When true, gc::gc reports unreachable objects without deleting them.
17.4 Dag algorithms
All DAG algorithms operate on the commit graph \(G = (V, E)\) where \(V\) is the set of commits and \(E\) is the parent relation.
- Merge base: Two-frontier BFS. Alternately expand from each side; the first vertex visited by both frontiers is the lowest common ancestor. For commits \(a\) and \(b\), this finds \(\text{LCA}(a, b)\) in \(O(|V|)\) worst case, typically much less.
- Path finding: BFS backwards from target, reconstruct via parent map. Returns the sequence of commits \((c_0, c_1, \ldots, c_k)\) where \(c_0\) is the source and \(c_k\) is the target.
- Log walk: Max-heap ordered by timestamp for topological-chronological order. Each commit is visited exactly once (tracked via a
HashSet). Merge commits push both parents onto the heap. Complexity: \(O(n \log n)\) for \(n\) commits. - Compose path: Given a path \((c_0, \ldots, c_k)\), chain migrations via
panproto_mig::compose. The composed migration \(M_{0 \to k} = M_{k-1 \to k} \circ \cdots \circ M_{0 \to 1}\) transforms data from schema \(S_0\) to schema \(S_k\). - Is ancestor: BFS reachability check from candidate to target. Returns true if there exists a path in \(G\) from the candidate to the target following parent edges.
DFS visits one branch fully before starting the other, producing non-chronological output for interleaved branches. The timestamp-ordered heap interleaves branches chronologically, matching what users expect from git log.
17.5 Repository operations
Beyond core commits, repo.rs exposes additional operations:
Repository::amend(): creates a commit with the same parent(s) as current HEAD but updated schema, migration, and message. The previous HEAD becomes unreachable (collectible bygc).refs::force_delete_branch: deletes a branch regardless of merge statusrefs::rename_branch: renames a branch ref, updating HEAD if the current branch is renamedrefs::create_and_checkout_branch: creates a branch at HEAD and switches to it atomicallyrefs::create_annotated_tag: creates aTagObjectin the store and points a tag ref at itrefs::create_tag_force: overwrites an existing tag refstash::stash_apply: re-applies the top stash entry without removing it from the stackstash::stash_show: returns the schema diff between the stash entry and its parentstash::stash_clear: drops all stash entries and removes therefs/stashref
17.6 Error handling
VcsError is #[non_exhaustive] and includes:
ObjectNotFound,RefNotFound,BranchNotFound,TagNotFoundBranchAlreadyExists,TagAlreadyExistsNotOnBranch,DetachedHead,IndexEmpty,IndexNotEmptyMergeConflict,RebaseConflict,CherryPickConflictFastForwardRequired,NothingToAmendStashEmpty,InvalidRef,CorruptObjectRemoteNotSupported
17.7 Three-way merge
Given schemas base, ours, theirs, the merge computes a combined result. Categorically, this is a pushout (colimit) over the shared base.
The algorithm:
diff(base, ours)anddiff(base, theirs): the extended diff covers all 13 schema fields (vertices, edges, constraints, hyper-edges, required, NSIDs, variants, orderings, recursion points, usage modes, spans, nominal)- For every element in every field, classify its fate on each side: unchanged, added, removed, or modified
- Apply the merge rule: one-sided changes accepted; identical changes deduplicated; incompatible changes flagged as typed conflicts with the base value retained
- No tie-breaking: the merge is commutative, meaning
merge(base, A, B) == merge(base, B, A) - Rebuild precomputed indices
- Derive migrations from ours to merged and theirs to merged via
auto_mig::derive_migration
Commutativity means the merge result doesn’t depend on which branch is “ours” vs “theirs.” This eliminates bugs where merge order affects the outcome. It’s a stronger guarantee than git’s text merge, which can produce different conflict markers depending on merge direction.
17.8 Rename detection
The rename_detect module detects renames between schema versions. When auto_mig::derive_migration encounters a vertex absent in one schema and a structurally similar vertex in the other, the detector identifies this as a rename rather than a delete-and-add.
17.8.1 Algorithm
For vertex rename detection:
- Collect removed vertices (in old, not in new) and added vertices (in new, not in old)
- For each (removed, added) pair, compute structural similarity:
- Same vertex kind: +0.3
- Same outgoing edge names: +0.3
- Same incoming edge names: +0.2
- Edit distance of vertex ID names ≤ 3: +0.2
- Greedy bipartite matching: take highest-scoring pairs first
- Return pairs ≥ threshold
For edge label rename detection, the algorithm compares removed and added edges between the same surviving vertex pair, scoring by name edit distance normalized to string length.
17.8.2 Commitobject integration
Detected renames are stored in CommitObject.renames as Vec<SiteRename>. This field is #[serde(default, skip_serializing_if = "Vec::is_empty")], so old commits without renames deserialize correctly.
See Section 18.2 for the full naming and identity architecture.