10 Schema Version Control

You already use version control for code. panproto extends that idea to schemas themselves: every schema, migration, and commit gets a blake3 hash based on its structure. Two schemas with the same shape produce the same hash, even if created by different teams. Identity comes from content, not names.

Each commit stores a schema graph $G$, the morphism $f: G_{\mathrm{parent}} \to G$ from its parent, and the complement $C$ for backward migration.

10.1 Thinking like git

panproto’s version control uses a familiar vocabulary:

Git concept	panproto equivalent
Blob (file content)	Schema (graph structure)
Diff/patch	Migration (graph morphism $f: G_1 \to G_2$)
Three-way merge	Structural merge (pushout of schemas)
Content-addressing (SHA)	Content-addressing (blake3)
Commit DAG	Schema evolution DAG
Branch	Named pointer to a schema version
Conflict	Structural incompatibility

The key difference: git merges text and hopes the result compiles. panproto merges structure and guarantees data integrity.

10.2 The object store

Storage mirrors git’s layout.

.panproto/
  objects/<hex[0..2]>/<hex[2..]>   # content-addressed objects
  refs/heads/main                   # branch pointers
  refs/tags/v1.0                    # tag pointers
  HEAD                              # current branch
  logs/                             # reflog (audit trail)
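As a rough illustration of the layout above, the sketch below derives an object path from a content hash. It rests on two assumptions that are not part of panproto: the standard library's `blake2b` stands in for blake3 (which Python does not ship), and sorted-key JSON stands in for whatever canonical serialization panproto really uses. The names `canonical_bytes`, `object_id`, and `object_path` are hypothetical.

```python
import hashlib
import json
from pathlib import PurePosixPath


def canonical_bytes(schema: dict) -> bytes:
    """Serialize a schema so that equal structure yields equal bytes (assumed encoding)."""
    # Sorting keys and fixing separators removes incidental formatting differences.
    return json.dumps(schema, sort_keys=True, separators=(",", ":")).encode("utf-8")


def object_id(schema: dict) -> str:
    # Stand-in digest: blake2b from the standard library, NOT blake3.
    return hashlib.blake2b(canonical_bytes(schema), digest_size=32).hexdigest()


def object_path(hex_id: str) -> PurePosixPath:
    # Mirrors objects/<hex[0..2]>/<hex[2..]>: two hex characters, then the rest.
    return PurePosixPath(".panproto") / "objects" / hex_id[:2] / hex_id[2:]


# Two teams write the same shape with different key order.
team_a = {"vertices": ["post", "text"], "edges": [["post", "text"]]}
team_b = {"edges": [["post", "text"]], "vertices": ["post", "text"]}

print(object_id(team_a) == object_id(team_b))  # same content, same identity
print(object_path(object_id(team_a)))
```

Splitting on the first two hex characters keeps any single directory from accumulating every object, which is the same reason git uses this fan-out.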

10.3 Core commands

The CLI borrows git’s vocabulary where concepts align.

10.3.1 `schema init`

Creates a .panproto/ directory with the object store, refs, and HEAD pointing to main. Unlike git, there is no working tree; the repository tracks one schema at a time.

schema init
schema init my-project
schema init -b develop    # choose a different initial branch name

10.3.2 `schema add`

Stages a schema for the next commit. The command loads the schema JSON, computes a structural diff against HEAD’s schema, auto-derives a migration (the graph morphism), and validates it through the existence checker.

As of v0.6, schema add also type-checks all equations in the schema’s theory and verifies that the auto-derived migration preserves them. Invalid schemas are caught before they enter the repository.

If the diff involves vertex renames or edge contractions that can’t be auto-derived, supply an explicit migration file:

schema add schema.json
schema add schema.json --migration explicit-migration.json
schema add --dry-run schema.json     # preview without writing to the index
schema add --force schema.json       # stage even if validation fails

10.3.3 `schema commit`

Creates a commit object storing the schema’s content hash, the migration’s content hash, parent commit IDs, and metadata. It advances the current branch ref and appends a reflog entry. Every commit carries enough information to migrate instance data forward or backward.

schema commit -m "add verification status field"
schema commit --amend -m "add verification status field (v2)"
schema commit --skip-verify -m "experimental: partial theory"

Equation verification runs again at commit time as a final safety check. The --skip-verify flag bypasses type-checking for experimental work; use it sparingly.

10.3.4 `schema diff`

Computes a structural diff between two schema files. It reports added and removed vertices and edges, kind changes, constraint changes, variant changes, and ordering changes.

Git’s diff operates on lines of text. schema diff operates on the schema graph: it knows that adding a vertex differs from adding an edge, and that removing a coproduct variant is a type error while removing an optional field is lossy.

schema diff old-schema.json new-schema.json
schema diff --stat old-schema.json new-schema.json     # summary statistics
schema diff --staged                                    # diff staged against HEAD
schema diff --detect-renames old-schema.json new-schema.json

The --detect-renames flag uses structural similarity heuristics to identify renamed vertices and edge labels. Detected renames are shown with confidence scores.

10.3.5 `schema merge`

Merges a branch into the current branch by finding the merge base (lowest common ancestor in the DAG), computing diffs from the base to both tips, and applying non-conflicting changes.

Merge is where panproto diverges most from git. Git operates on lines of text and produces conflict markers. panproto computes the structural merge of the two divergent schemas over their common base.¹

One-sided changes are accepted. Identical changes from both sides are deduplicated. Incompatible changes produce typed conflicts: not text markers, but structured objects like BothModifiedVertex, BothModifiedConstraint, DeleteModifyVertex. Fast-forward merges work as in git.

As of v0.6, the merge algorithm uses pullback-based overlap detection to identify shared structure between divergent schemas.

schema merge feature
schema merge --verbose feature        # show overlap computation details
schema merge --no-commit feature      # leave the result staged
schema merge --ff-only feature        # only allow fast-forward
schema merge --no-ff feature          # force a merge commit
schema merge --squash feature         # squash into a single commit
schema merge --abort                  # abort a conflicted merge
schema merge -m "merge feature into main" feature
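The classification rules described in this section (accept one-sided changes, deduplicate identical changes, report incompatible ones as typed conflicts) can be sketched as a three-way merge over a toy schema. This is a minimal sketch, not panproto's algorithm: a schema is modeled as a hypothetical map from vertex name to kind, `merge_vertices` is an invented name, and labeling two different additions of the same vertex as `BothModifiedVertex` is an assumption.

```python
from typing import Optional

Schema = dict[str, str]  # hypothetical model: vertex name -> kind


def merge_vertices(base: Schema, ours: Schema, theirs: Schema):
    """Three-way merge of vertex maps; returns (merged, conflicts)."""
    merged: Schema = {}
    conflicts: list[tuple[str, str]] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(name)
        o: Optional[str] = ours.get(name)
        t: Optional[str] = theirs.get(name)
        if o == t:
            # Unchanged on both sides, or the identical change made twice: deduplicate.
            result = o
        elif o == b:
            # Only their side changed: accept it.
            result = t
        elif t == b:
            # Only our side changed: accept it.
            result = o
        else:
            # Both sides changed the element in incompatible ways.
            deleted_on_one_side = b is not None and (o is None or t is None)
            kind = "DeleteModifyVertex" if deleted_on_one_side else "BothModifiedVertex"
            conflicts.append((kind, name))
            continue
        if result is not None:  # None means the vertex was deleted
            merged[name] = result
    return merged, conflicts


base = {"post": "object", "text": "string", "legacy": "string"}
ours = {**base, "status": "string"}                                  # adds status
theirs = {"post": "object", "text": "bytes", "status": "string"}     # retypes text, drops legacy, adds status

clean_merge = merge_vertices(base, ours, theirs)
conflicted = merge_vertices(base, {**base, "text": "integer", "legacy": "integer"}, theirs)
print(clean_merge)
print(conflicted[1])
```

Because every branch of the classification treats the two sides symmetrically, swapping `ours` and `theirs` yields the same merged map and the same conflicts, which is the commutativity property the footnote describes.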

10.3.6 `schema lift`

Migrates instance data from one schema version to another. The command finds the path between two refs in the DAG, composes all migrations along the path into a single morphism $f: G_1 \to G_n$, and applies the functorial lift to the record.

Git tracks file history but can’t transform file contents based on structural changes. panproto can, because migrations are graph morphisms with a well-defined action on data.

schema lift --migration mig.json --src-schema v1.json --tgt-schema v2.json record.json
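To make the "find the path, compose, then lift" description concrete, here is a small sketch under simplifying assumptions: records are plain dictionaries, each migration is an ordinary function on records, and history is a linear chain of first parents. The commit table and the helpers `rename_field`, `add_field`, `migrations_between`, and `compose_path` are hypothetical and are not panproto's API.

```python
from functools import reduce
from typing import Callable

Record = dict
Migration = Callable[[Record], Record]


def rename_field(old: str, new: str) -> Migration:
    def apply(record: Record) -> Record:
        return {(new if key == old else key): value for key, value in record.items()}
    return apply


def add_field(name: str, default) -> Migration:
    def apply(record: Record) -> Record:
        return {**record, name: default}
    return apply


# Hypothetical history: each commit stores its parent and the migration from that parent.
COMMITS = {
    "c1": {"parent": None, "migration": None},
    "c2": {"parent": "c1", "migration": rename_field("body", "text")},
    "c3": {"parent": "c2", "migration": add_field("verified", False)},
}


def migrations_between(src: str, tgt: str) -> list[Migration]:
    """Collect migrations from src to tgt by walking parents back from tgt."""
    steps: list[Migration] = []
    current = tgt
    while current != src:
        commit = COMMITS[current]
        if commit["parent"] is None:
            raise ValueError(f"{src} is not an ancestor of {tgt}")
        steps.append(commit["migration"])
        current = commit["parent"]
    return list(reversed(steps))  # oldest step first


def compose_path(migrations: list[Migration]) -> Migration:
    """Compose migrations in path order: the first element is applied first."""
    return lambda record: reduce(lambda acc, step: step(acc), migrations, record)


lift = compose_path(migrations_between("c1", "c3"))
lifted = lift({"body": "hello"})
print(lifted)
```

Composing first and applying once mirrors the description above: the record is transformed by a single morphism rather than being rewritten to disk at every intermediate version.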

Exercise: Path independence

If the DAG has two distinct paths from commit $A$ to commit $B$, does schema lift produce the same result on both paths? What guarantees this?

Answer

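Functoriality, together with a commutativity condition on the DAG. Migration morphisms compose associatively: if $f: A \to B$ and $g: B \to C$, then $g \circ f: A \to C$ is well defined, and longer composites do not depend on how they are parenthesized. Associativity alone only makes the composite along a single path unambiguous. It is the DAG’s commutativity constraint that ensures that, for any two paths between the same pair of commits, the composed morphisms are equal. This is a coherence condition on the category of schemas, and because the lift is functorial, equal morphisms act identically on records.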

10.4 Inspecting history

10.4.1 `schema log`

Walks the commit DAG from HEAD backwards, printing each commit’s ID, author, timestamp, message, and schema hash. For merge commits, shows all parent IDs.

schema log
schema log -n 5
schema log --oneline                  # compact one-line format
schema log --graph                    # visualize branch topology
schema log --all                      # show all branches

10.4.2 `schema show`

Inspects any object in the store by ref name or object ID. For commits, shows the schema ID, parent IDs, migration ID, protocol, author, and message. For schemas, shows vertex and edge counts.

schema show main
schema show v1.0
schema show --stat main               # change summary statistics

10.4.3 `schema blame`

Shows which commit introduced a specific schema element: a vertex, an edge, or a constraint. It walks the DAG backwards from HEAD, comparing each commit’s schema to its parent’s. You can ask “who added the user_status vertex?” or “when was the maxLength constraint on text changed?”

schema blame --element-type vertex user_status
schema blame --element-type constraint "text:maxLength"
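The backward walk can be sketched on a linear history. This is an illustrative model only: each schema is reduced to a set of element labels, the history is a hypothetical list ordered oldest first, and `blame` here is a plain function rather than the CLI command.

```python
# Hypothetical linear history, oldest first: (commit id, set of schema elements).
HISTORY = [
    ("c1", {"vertex:post", "vertex:text"}),
    ("c2", {"vertex:post", "vertex:text", "constraint:text:maxLength"}),
    ("c3", {"vertex:post", "vertex:text", "constraint:text:maxLength", "vertex:user_status"}),
]


def blame(history, element):
    """Walk from newest to oldest; return the commit whose parent lacks the element."""
    for index in range(len(history) - 1, -1, -1):
        commit_id, elements = history[index]
        # The root commit has no parent, so its parent schema is treated as empty.
        parent_elements = history[index - 1][1] if index > 0 else set()
        if element in elements and element not in parent_elements:
            return commit_id
    return None  # the element never appears in this history


print(blame(HISTORY, "vertex:user_status"))
print(blame(HISTORY, "constraint:text:maxLength"))
```

Walking newest to oldest means that if an element was removed and later re-added, the sketch reports the most recent introduction.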

10.5 Branching and merging

10.5.1 `schema branch`

Creates, lists, or deletes branches. Branches are lightweight pointers to commit IDs.

schema branch                         # list all branches
schema branch feature                 # create a branch
schema branch -d feature              # delete (safe: must be merged)
schema branch -D feature              # force-delete
schema branch -m old-name new-name    # rename

10.5.2 `schema checkout`

Switches HEAD to the named branch, or detaches HEAD at a specific commit.

schema checkout feature
schema checkout -b feature            # create and switch in one step
schema checkout --detach main         # detach HEAD at a specific ref

10.5.3 `schema rebase`

Replays the current branch’s commits onto another branch. It finds the merge base, collects all commits from the base to HEAD, then replays each one on top of the target via three-way merge.

Each replay step uses structural three-way merge rather than textual merge. This detects schema-level conflicts (like incompatible kind changes) that git rebase would miss.

schema rebase main
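The first two steps of a rebase (find the merge base, then collect the commits to replay) can be sketched on a toy DAG. The parent table and function names are hypothetical, the sketch assumes a single lowest common ancestor, and the replay step itself (the structural three-way merge) is left out.

```python
# Hypothetical DAG: commit id -> list of parent ids.
PARENTS = {
    "a": [], "b": ["a"], "c": ["b"],   # main:    a - b - c
    "d": ["b"], "e": ["d"],            # feature:     b - d - e
}


def ancestors(commit: str) -> set:
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def merge_base(left: str, right: str) -> str:
    """Lowest common ancestor: a shared ancestor that no other shared ancestor descends from."""
    common = ancestors(left) & ancestors(right)
    lowest = [c for c in common
              if not any(c != other and c in ancestors(other) for other in common)]
    return sorted(lowest)[0]  # assumption: exactly one such commit exists


def commits_to_replay(head: str, base: str) -> list:
    """Commits after `base` up to `head`, oldest first, following first parents."""
    chain, current = [], head
    while current != base:
        chain.append(current)
        current = PARENTS[current][0]
    return list(reversed(chain))


base = merge_base("e", "c")
replay = commits_to_replay("e", base)
print(base, replay)
```

Each commit in `replay` would then be merged onto the tip of the target branch in order, so a conflict stops the rebase at the first commit that cannot be applied structurally.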

10.5.4 `schema cherry-pick`

Applies a single commit’s schema change to the current branch. It extracts the diff between the commit and its parent, then three-way merges it onto HEAD.

schema cherry-pick abc1234...
schema cherry-pick -n abc1234...      # apply without committing
schema cherry-pick -x abc1234...      # record source commit in message

Exercise: Merge commutativity

schema merge claims the structural merge is commutative: merging branch A into B produces the same schema as merging B into A. Does this imply the migration morphisms from the base to each merge result are also identical?

Answer

The schemas are identical, but the morphism provenance can differ. Merging A into B produces composition legs $A \to M$ and $B \to M$. Merging B into A produces legs $B \to M'$ and $A \to M'$. If $M = M'$ (same merged schema), the legs are the same pair of morphisms, just listed in opposite order. The provenance metadata differs, but the mathematical content is identical.

10.6 Recovery

10.6.1 `schema reset`

Moves HEAD to a different commit. Three modes match git:

--soft: moves the ref only; the index is unchanged
--mixed (default): moves the ref and clears the index
--hard: moves the ref, clears the index, and overwrites the working schema

All modes append a reflog entry, so the old position is recoverable.

schema reset --soft v1.0
schema reset --hard HEAD~3
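The difference between the three modes is easiest to see as state changes. The sketch below models a repository as three fields plus a reflog; the `Repo` class, the `SCHEMA_OF` table, and the reflog tuple format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SCHEMA_OF = {"c1": "schema-v1", "c2": "schema-v2"}  # hypothetical commit -> schema id


@dataclass
class Repo:
    head: str               # commit the current ref points at
    index: Optional[str]    # staged schema id, if any
    working: str            # the working schema
    reflog: list            # (old, new, reason) entries


def reset(repo: Repo, target: str, mode: str = "mixed") -> None:
    if mode not in {"soft", "mixed", "hard"}:
        raise ValueError(f"unknown reset mode: {mode}")
    # Every mode records the old position first, so it stays recoverable.
    repo.reflog.append((repo.head, target, f"reset --{mode}"))
    repo.head = target
    if mode in {"mixed", "hard"}:
        repo.index = None                    # clear the index
    if mode == "hard":
        repo.working = SCHEMA_OF[target]     # overwrite the working schema


repo = Repo(head="c2", index="staged-v3", working="draft-v3", reflog=[])
reset(repo, "c1", mode="soft")
print(repo)
```

After the soft reset above, only `head` and `reflog` change; rerunning with `mode="hard"` would also clear `index` and set `working` to `schema-v1`.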

10.6.2 `schema reflog`

Shows the history of mutations to a ref. Every commit, merge, checkout, reset, rebase, and cherry-pick appends an entry recording the old and new ref values. If a reset or rebase moves HEAD past a commit you need, the reflog has the old ID.

schema reflog
schema reflog main
schema reflog --all                   # show reflogs for all refs

10.6.3 `schema bisect`

Binary search for the commit that introduced a breaking change. Given a known-good and known-bad commit, bisect finds the path between them and presents the midpoint for testing. It converges in $O(\log n)$ steps.

schema bisect v1.0 HEAD
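The midpoint search can be sketched as an ordinary binary search over the commit path. The sketch assumes the path is ordered from the known-good commit to the known-bad one and that, once a commit is bad, every later commit on the path is bad too. The `bisect` function and the `is_bad` predicate are hypothetical stand-ins for the interactive testing step.

```python
def bisect(path, is_bad):
    """Return (first bad commit, number of commits tested).

    path[0] is known good and path[-1] is known bad.
    """
    lo, hi = 0, len(path) - 1        # invariant: path[lo] is good, path[hi] is bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2         # present the midpoint for testing
        tests += 1
        if is_bad(path[mid]):
            hi = mid                 # the break is at mid or earlier
        else:
            lo = mid                 # the break is after mid
    return path[hi], tests


path = [f"c{i}" for i in range(17)]              # c0 (good) ... c16 (bad)
first_bad, tests = bisect(path, lambda c: int(c[1:]) >= 11)
print(first_bad, tests)
```

With 17 commits on the path, the search tests 4 midpoints before the good and bad ends are adjacent, which matches the $O(\log n)$ bound.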

10.7 Other commands

10.7.1 `schema tag`

Creates, lists, or deletes tags. Tags are immutable pointers to commits, typically used for schema releases.

schema tag v1.0
schema tag -d v1.0
schema tag -a v1.0 -m "release"       # annotated tag

10.7.2 `schema gc`

Marks all objects reachable from branches, tags, and HEAD, then deletes everything else.

schema gc
schema gc --dry-run                   # preview without deleting
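The mark-and-delete behavior is a standard mark-and-sweep pass, sketched below on an in-memory object table. The table format (object ID mapped to the IDs it references) and the function names are assumptions for illustration; the real store lives on disk under `.panproto/objects/`.

```python
def reachable(objects: dict, roots) -> set:
    """Mark phase: everything reachable from the roots (branches, tags, HEAD)."""
    marked, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked or oid not in objects:
            continue
        marked.add(oid)
        stack.extend(objects[oid])   # follow references to parents, schemas, migrations
    return marked


def gc(objects: dict, roots, dry_run: bool = False) -> set:
    """Sweep phase: delete unmarked objects; with dry_run, only report them."""
    garbage = set(objects) - reachable(objects, roots)
    if not dry_run:
        for oid in garbage:
            del objects[oid]
    return garbage


objects = {
    "commit2": ["commit1", "schema2", "migration12"],
    "commit1": ["schema1"],
    "schema1": [], "schema2": [], "migration12": [],
    "orphan_commit": ["orphan_schema"],   # left behind by, say, a hard reset
    "orphan_schema": [],
}
preview = gc(objects, roots=["commit2"], dry_run=True)
count_after_preview = len(objects)
deleted = gc(objects, roots=["commit2"])
print(sorted(deleted), len(objects))
```

The dry run computes the same garbage set without mutating the store, so its output is an exact preview of what the real run removes.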

10.8 Quick start

schema init
schema add schema.json
schema commit -m "initial ATProto schema"

schema branch add-verification
schema checkout add-verification
schema add schema-v2.json
schema commit -m "add verification status field"

schema checkout main
schema merge add-verification

Exercise: Complement storage

Each commit stores a complement $C$ for backward migration. If a forward migration $f: G_1 \to G_2$ adds a required field with no default, what does $C$ contain? Can every forward migration be reversed?

Answer

The complement $C$ is empty for this direction. In a forward migration that adds a required field, the new field’s value is provided (via a default), not discarded. The complement records what `get` threw away, and in the forward direction nothing is thrown away. Reversing this migration (going backward) would discard the added field, and that direction’s complement would store the discarded value. Not every forward migration can be reversed.

10.9 A note on VCS inspiration

panproto’s schema version control draws on two traditions. The first is git itself: content-addressing, DAG-structured history, lightweight branches, and the reflog. The second is patch-theory version control systems like Pijul (https://pijul.org/) and Darcs (http://darcs.net/), which model changes as first-class mathematical objects.²

The payoff is that structure-aware diffs compose better than text diffs. A schema migration composes with another migration to produce a valid migration. A text patch composed with another text patch might produce gibberish. That composability is what lets panproto guarantee data integrity where git’s merge can only guarantee syntactic non-conflict.

10.9.1 Remote commands (planned)

The commands schema remote, schema push, schema pull, schema fetch, and schema clone are reserved for future distributed operations. Currently they return an error indicating that schema repositories are local-only.

¹ The structural merge is a categorical pushout of the two schemas over their common ancestor. For every schema element (vertices, edges, constraints, hyper-edges, variants, orderings, recursion points, usage modes), the merge classifies each side’s change as unchanged, added, removed, or modified. The merge is commutative: swapping the two branches produces an identical result.

² Pijul’s patches form a category where composition is associative. panproto’s migrations are morphisms in a category of schemas. Both systems benefit from the same insight: when your “diffs” are mathematical objects with well-defined composition rules, merge conflicts become algebraic problems rather than heuristic ones.

