Schema version control semantics

In plain terms

panproto-vcs is git, but for schemas. It tracks a history of schemas the way git tracks a history of source files: commits, branches, tags, merges, diffs, blame. The CLI verbs are the same (init, add, commit, branch, merge, log, diff).

Two things make it different from git applied to the schema files themselves:

  1. The diff and merge operate on the schema, not the text. schema diff does not show you a unified diff of the JSON; it shows you what changed structurally: which vertices were added, which edges renamed, which constraints tightened. Merge does not three-way-merge the bytes; it merges the schema graph at the structural level, so you cannot end up with a syntactically valid but semantically broken schema after a merge.
  2. Data and migrations are versioned alongside the schemas. Every commit records a schema snapshot and (optionally) the data instances that conformed to it; migrations between schemas are stored as their own content-addressed objects, paired with the complements needed to invert them. Branches diverge with their data; merges reconcile both.
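
To make the first point concrete, here is a minimal sketch of a structural diff. It assumes a toy model in which a schema is just a mapping from vertex name to kind; the `StructuralDiff` type and `structural_diff` function are hypothetical names for illustration, not panproto-vcs's API.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralDiff:
    """Toy diff result (hypothetical type, not panproto's)."""
    added: dict = field(default_factory=dict)
    removed: dict = field(default_factory=dict)
    changed: dict = field(default_factory=dict)  # name -> (old kind, new kind)


def structural_diff(old: dict, new: dict) -> StructuralDiff:
    """Compare two schemas by vertex name, ignoring textual layout."""
    diff = StructuralDiff()
    for name, kind in new.items():
        if name not in old:
            diff.added[name] = kind
        elif old[name] != kind:
            diff.changed[name] = (old[name], kind)
    for name, kind in old.items():
        if name not in new:
            diff.removed[name] = kind
    return diff


# The two versions list their vertices in a different order. A text diff
# would flag the reordering; the structural diff does not.
old = {"user.id": "int", "user.name": "string", "user.age": "int"}
new = {"user.name": "string", "user.id": "string", "user.email": "string"}
d = structural_diff(old, new)
print(d)
```

Only the three real changes are reported: one vertex added, one removed, and one whose kind changed. The unchanged `user.name` vertex does not appear even though its position in the file moved.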

The merge operation is where this gets interesting. Three-way text merge fails when both sides edit the same line. The schema-level analogue is two branches that both add a field with the same name but different types. panproto-vcs resolves this with a precisely defined operation: the merge is the pushout of the two branch schemas over their common ancestor. The result is the smallest schema containing both branches’ additions, with the conflict surfaced as an explicit refinement constraint that the user resolves.

The DAG

panproto-vcs is structured exactly like git: a content-addressed DAG of immutable objects.

| Object | What it holds |
| --- | --- |
| FileSchema / SchemaTree / FlatSchema | A schema at a point in time, in per-file, tree, or migration-endpoint form. |
| Migration | A morphism between two schemas, identified by their object IDs. |
| Complement | The complement data needed to invert a data migration. |
| DataSet | A set of instances conforming to a specific schema. |
| CstComplement | The format-preserving CST data for byte-identical reconstruction. |
| Protocol / Theory / TheoryMorphism / Expr / EditLog | Supporting objects referenced by commits and migrations. |
| Commit | A pointer to a schema, an optional pointer to data, a parent commit list, an author, a message. |
| Tag | An annotated tag object pointing to another object. |
| Branch | A mutable reference to a commit; lives under .panproto/refs/heads/. |

Every object is content-addressed with a blake3 hash of its canonical serialisation. Refs (branches under refs/heads/, tags under refs/tags/) live under .panproto/refs/. Objects live under .panproto/objects/. The structural similarity to .git/ is intentional: the existing mental model transfers.
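
The idea of content addressing can be sketched in a few lines. This is an illustration under stated assumptions, not panproto-vcs's implementation: it uses sorted-key JSON as a stand-in for the canonical serialisation, and `hashlib.blake2b` from the Python standard library as a stand-in for blake3, which the standard library does not provide. The `object_id` and `put` helpers are hypothetical.

```python
import hashlib
import json

store = {}  # in-memory stand-in for the on-disk object directory


def object_id(obj: dict) -> str:
    """Hash a canonical serialisation so that equal content gets an equal ID."""
    # Assumption: sorted-key compact JSON plays the role of the canonical form.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: blake2b stands in for blake3 here.
    return hashlib.blake2b(canonical, digest_size=32).hexdigest()


def put(obj: dict) -> str:
    """Store an object under its own hash; storing it twice is a no-op."""
    oid = object_id(obj)
    store[oid] = obj
    return oid


a = put({"kind": "schema", "vertices": ["user", "post"]})
b = put({"vertices": ["user", "post"], "kind": "schema"})  # same content, keys reordered
c = put({"kind": "schema", "vertices": ["user", "post", "tag"]})
print(a == b, a == c, len(store))
```

Because the ID is derived from the canonical form rather than from the bytes as written, the first two objects collapse into one entry, which is what makes the objects immutable and deduplicated.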

Merge as pushout

A three-way merge in git is: take base $B$, ours $O$, theirs $T$, and produce a result that contains the changes from $O$ relative to $B$ and the changes from $T$ relative to $B$. When the changes overlap on the same line, the merge reports a conflict.

The schema analogue: $B$, $O$, and $T$ are schemas; $O$ and $T$ are both descendants of $B$. The merge result $M$ is the pushout of $O$ and $T$ along $B$:

        B ------> O
        |         |
        |         |
        v         v
        T ------> M

The pushout $M$ is the unique smallest schema containing both $O$ and $T$ and respecting their shared structure from $B$. “Unique smallest” is made precise by a universal property: any other schema $M'$ that also contains $O$ and $T$ admits a unique morphism from $M$ to $M'$.
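
For intuition, the square above can be computed directly when schemas are reduced to finite sets of vertex names. The sketch below is a simplification: it handles vertices only (no edges or constraints), and the `pushout` function is a hypothetical helper, not the panproto-vcs routine. It forms the disjoint union of O and T and glues together the two images of every vertex of B.

```python
def pushout(O, T, f, g):
    """Pushout of finite sets along f: B -> O and g: B -> T (dicts keyed by B).

    Returns (M, inj_o, inj_t); elements of M are frozensets of tagged vertices.
    """
    # Start from the disjoint union, tagging each vertex with its side.
    parent = {("O", o): ("O", o) for o in O}
    parent.update({("T", t): ("T", t) for t in T})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Glue f(b) to g(b) for every vertex b of the common ancestor.
    for b in f:
        root_o, root_t = find(("O", f[b])), find(("T", g[b]))
        if root_o != root_t:
            parent[root_o] = root_t

    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    frozen = {root: frozenset(members) for root, members in classes.items()}
    inj_o = {o: frozen[find(("O", o))] for o in O}
    inj_t = {t: frozen[find(("T", t))] for t in T}
    return set(frozen.values()), inj_o, inj_t


# Base has user and post. Ours adds comment; theirs renames post to article
# and adds tag.
O = {"user", "post", "comment"}
T = {"user", "article", "tag"}
f = {"user": "user", "post": "post"}
g = {"user": "user", "post": "article"}
M, inj_o, inj_t = pushout(O, T, f, g)
print(len(M))
```

The result has four vertices: the shared `user`, the glued `post`/`article`, and the two independent additions. The rename does not duplicate the vertex, because both names trace back to the same vertex of the base.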

panproto-vcs does not just compute the pushout: it verifies the universal property. vcs::merge::verify_pushout_universal checks that the merge result mediates uniquely from any alternative cocone, returning the mediator vertex map. If the universal-property check fails, the merge raises PushoutError::UniversalFactorizationFailure rather than producing a wrong result.
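
The shape of such a check can be illustrated on vertex maps alone. The following is a sketch under assumptions, not the body of `vcs::merge::verify_pushout_universal`: the `mediator` function and `FactorizationFailure` exception are hypothetical, and schemas are reduced to sets of vertex names. Given the two legs into a candidate merge result and the two legs of an alternative cocone, it builds the only possible mediating map and fails if that map is not well defined.

```python
class FactorizationFailure(Exception):
    """Hypothetical stand-in for a failed universal-property check."""


def mediator(M, inj_o, inj_t, p, q):
    """Return the vertex map u with u(inj_o(x)) = p(x) and u(inj_t(y)) = q(y)."""
    u = {}
    for inj, leg in ((inj_o, p), (inj_t, q)):
        for vertex, m in inj.items():
            target = leg[vertex]
            # Each vertex of M may be sent to exactly one place.
            if u.setdefault(m, target) != target:
                raise FactorizationFailure(
                    f"{m!r} would have to map to both {u[m]!r} and {target!r}"
                )
    if set(u) != set(M):
        raise FactorizationFailure("a vertex of M is outside both images")
    return u


# Legs into a correct merge result (post and article are glued).
M = {"user", "post", "comment", "tag"}
inj_o = {"user": "user", "post": "post", "comment": "comment"}
inj_t = {"user": "user", "article": "post", "tag": "tag"}

# An alternative cocone that collapses comment and tag into one vertex.
p = {"user": "account", "post": "entry", "comment": "note"}
q = {"user": "account", "article": "entry", "tag": "note"}
u = mediator(M, inj_o, inj_t, p, q)

# A wrong merge result that glued comment and tag cannot mediate to a
# cocone that keeps them apart.
M_bad = {"user", "post", "note"}
bad_o = {"user": "user", "post": "post", "comment": "note"}
bad_t = {"user": "user", "article": "post", "tag": "note"}
p2 = {"user": "user", "post": "post", "comment": "comment"}
q2 = {"user": "user", "article": "post", "tag": "tag"}
try:
    mediator(M_bad, bad_o, bad_t, p2, q2)
    failed = False
except FactorizationFailure:
    failed = True
print(u, failed)
```

The second case is the kind of error the check exists to catch: a result that identifies more than the base justifies still "contains" both branches, but it is not the pushout.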

For the formal pushout construction, the cocone definition, and exactly what is checked, see Pushouts and merge.

Conflicts

A merge conflict arises when the pushout would introduce an inconsistency: two branches add a field with the same name but incompatible types, or one branch removes a vertex the other branch still references. Conflicts are reported as explicit objects (rather than text markers) and resolved by editing the conflict descriptor.
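
A minimal sketch of conflicts as objects, assuming a toy model where each schema is a mapping from field name to type. The `Conflict` descriptor, `merge_fields`, and `apply_resolutions` are hypothetical names for illustration; only the same-name, different-type case is modelled, not dangling references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conflict:
    """Toy conflict descriptor; None means the field is absent on that side."""
    field: str
    base: Optional[str]
    ours: Optional[str]
    theirs: Optional[str]
    resolution: Optional[str] = None  # filled in by the user


def merge_fields(base: dict, ours: dict, theirs: dict):
    """Three-way merge of field types, returning (merged, conflicts)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree
            result = o
        elif o == b:      # only theirs changed
            result = t
        elif t == b:      # only ours changed
            result = o
        else:             # both changed, differently
            conflicts.append(Conflict(name, b, o, t))
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


def apply_resolutions(merged: dict, conflicts: list) -> dict:
    """Fold resolved descriptors back into the merged schema."""
    out = dict(merged)
    for c in conflicts:
        if c.resolution is None:
            raise ValueError(f"unresolved conflict on {c.field!r}")
        out[c.field] = c.resolution
    return out


base = {"id": "int", "name": "string"}
ours = {"id": "int", "name": "string", "score": "int", "email": "string"}
theirs = {"id": "string", "name": "string", "score": "float"}

merged, conflicts = merge_fields(base, ours, theirs)
conflicts[0].resolution = "float"  # the user edits the descriptor
final = apply_resolutions(merged, conflicts)
print(final)
```

The non-overlapping changes (`email` from ours, the `id` retype from theirs) merge cleanly; only `score`, added on both sides with different types, yields a descriptor that must be resolved before the merge completes.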

Data versioning

Commits can carry data instances. When a branch’s schema migrates, the data carried by its commits is automatically lifted forward by the migration’s lens. Branches can therefore diverge in both schema and data; merging both kinds of divergence in one operation is what schema merge does.
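
The role of the lens and its complement can be sketched with a single lossy migration, here one that drops a field. This is a toy model under assumptions: `forward`, `backward`, and `lift_dataset` are hypothetical helpers, and instances are plain dictionaries. The forward direction produces the migrated instance plus the complement; the backward direction uses the complement to restore what the migration discarded.

```python
def forward(instance: dict, dropped: str):
    """Migrate one instance by dropping a field; return (view, complement)."""
    view = {k: v for k, v in instance.items() if k != dropped}
    complement = {dropped: instance[dropped]} if dropped in instance else {}
    return view, complement


def backward(view: dict, complement: dict) -> dict:
    """Invert the migration by re-attaching the complement."""
    return {**view, **complement}


def lift_dataset(data: list, dropped: str):
    """Lift every instance of a dataset through the migration."""
    pairs = [forward(instance, dropped) for instance in data]
    return [view for view, _ in pairs], [comp for _, comp in pairs]


data = [
    {"id": 1, "name": "ada", "nickname": "countess"},
    {"id": 2, "name": "alan"},
]
lifted, complements = lift_dataset(data, "nickname")

# Edit the migrated data, then travel back: the dropped field is restored.
lifted[0]["name"] = "Ada"
restored = [backward(v, c) for v, c in zip(lifted, complements)]
print(restored)
```

Without the complement the migration would be one-way; storing it alongside the migration is what lets data move in both directions across a schema change.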

A consequence: history rewriting (rebase, amend) on a branch carrying data must lift the data through the rewritten history. panproto-vcs does this; the data is not a passive blob.

Two threads sit directly behind panproto-vcs. The categorical-VCS lineage (Mimram and Di Giusto on patches as morphisms with merge as pushout (Mimram and Giusto 2013), Angiuli and colleagues’ homotopical patch theory (Angiuli et al. 2014), Roundy’s Darcs (Roundy 2005)) supplies the “merge is the pushout of the divergent patches against the common ancestor” semantics and the diagnosis of conflicts as failures of the pushout to exist. The schema-evolution lineage (Curino, Moon, and Zaniolo’s PRISM workbench (Curino et al. 2008) and Litt, van Hardenberg, and Henry’s Cambria (Litt et al. 2020; Litt et al. 2021)) supplies the engineering vocabulary: schema-modification operators with forward and backward mappings, quasi-inverses for the operators that lose information, and a directed graph of schema versions connected by lenses. panproto-vcs is the four-artifact unification of these lines, with the protocol theory, schema, data, and lens complement committed together into a single content-addressed DAG. See Related work for the full discussion.

See also