28 Project Assembly and Git Bridge
A real codebase is not one file. It is hundreds of files in dozens of directories, each parsed by a different grammar, all woven together by imports and references. The panproto-project crate assembles per-file schemas into a single project-level schema (a categorical coproduct), while panproto-git bridges the gap between panproto’s content-addressed VCS and ordinary git repositories.
28.1 panproto-project
28.1.1 Module map
| Module | Purpose |
|---|---|
| `lib` | `ProjectBuilder`, `ProjectSchema`, coproduct construction |
| `detect` | Language detection by file extension |
| `error` | `ProjectError` with diagnostic support |
28.1.2 Coproduct construction
The project schema is the categorical coproduct of per-file schemas. Each file’s vertices are prefixed with the file path to ensure uniqueness:
src/main.ts::program
src/main.ts::program::$0 (first top-level declaration)
src/utils.ts::program
src/utils.ts::program::$0
README.md (raw_file root vertex)
README.md::line_0
Edges within a file retain their local structure. The coproduct Schema is built by iterating all files’ vertices and edges, applying the prefix.
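The prefixing step can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Schema`, `Edge`, `qualify`, and `coproduct` are hypothetical stand-ins for panproto's real types, and the sketch always joins path and local ID with `::`, so it does not model the `raw_file` root vertex, which the listing above shows as the bare path.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a schema edge.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub src: String,
    pub tgt: String,
    pub label: String,
}

/// Hypothetical, simplified stand-in for a per-file or project schema.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Schema {
    pub vertices: Vec<String>,
    pub edges: Vec<Edge>,
}

/// Prefix a file-local vertex ID with its file path.
fn qualify(path: &str, local: &str) -> String {
    format!("{}::{}", path, local)
}

/// Build the disjoint union of per-file schemas: every vertex and every
/// edge endpoint is prefixed, so IDs from different files cannot collide.
pub fn coproduct(files: &BTreeMap<String, Schema>) -> Schema {
    let mut out = Schema::default();
    for (path, schema) in files {
        for v in &schema.vertices {
            out.vertices.push(qualify(path, v));
        }
        for e in &schema.edges {
            out.edges.push(Edge {
                src: qualify(path, &e.src),
                tgt: qualify(path, &e.tgt),
                label: e.label.clone(),
            });
        }
    }
    out
}
```

Because both endpoints of an edge receive the same prefix, no edge in the result crosses a file boundary, which is what makes the result a disjoint union rather than a merge.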
28.1.3 Fallback behavior
If a language parser fails (e.g., Kotlin’s tree-sitter grammar is ABI-incompatible), the ProjectBuilder falls back to raw_file parsing. This ensures every file in the project is represented, even if structural parsing is unavailable.
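A rough sketch of this fallback, under stated assumptions: `Parsed`, `parse_structural`, `parse_raw`, and `parse_with_fallback` are hypothetical names, and the structural parser here is a stub that only "knows" TypeScript. The real `ProjectBuilder` may organize this differently.

```rust
/// Hypothetical outcome of parsing one file.
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed {
    /// A grammar-backed parse succeeded.
    Structural { language: String, root: String },
    /// Fallback: the file is represented line by line.
    RawFile { root: String, lines: usize },
}

/// Stub structural parser (assumption): succeeds only for "typescript".
fn parse_structural(language: &str, path: &str, _src: &str) -> Result<Parsed, String> {
    match language {
        "typescript" => Ok(Parsed::Structural {
            language: language.to_string(),
            root: format!("{}::program", path),
        }),
        other => Err(format!("no usable grammar for {}", other)),
    }
}

/// Raw parse: always succeeds; the root vertex is the bare file path.
fn parse_raw(path: &str, src: &str) -> Parsed {
    Parsed::RawFile {
        root: path.to_string(),
        lines: src.lines().count(),
    }
}

/// Try the structural parser; on failure or unknown language, fall back
/// to the raw representation so the file is never dropped.
pub fn parse_with_fallback(language: Option<&str>, path: &str, src: &str) -> Parsed {
    match language {
        Some(lang) => parse_structural(lang, path, src)
            .unwrap_or_else(|_| parse_raw(path, src)),
        None => parse_raw(path, src),
    }
}
```

The point of the design is that the fallback path cannot fail, so a broken grammar degrades the level of detail for one file instead of aborting the whole project build.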
28.2 panproto-git
28.2.1 Module map
| Module | Purpose |
|---|---|
| `import` | `import_git_repo`: git DAG to panproto-vcs |
| `export` | `export_to_git`: panproto-vcs to git trees |
| `error` | `GitBridgeError` |
28.2.2 Import pipeline
- `git2::Repository::revwalk` with `TOPOLOGICAL | REVERSE` ordering (parents before children)
- For each commit, `git2::Tree::iter` walks blobs and subtrees
- Each blob is passed to `ProjectBuilder::add_file`
- `ProjectBuilder::build` produces the `ProjectSchema`
- Schema stored via `Store::put(Object::Schema(...))`
- `CommitObject` created with mapped parent IDs, author, timestamp, message
- Commit stored via `Store::put(Object::Commit(...))`
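The parents-before-children ordering matters because each imported commit must refer to the panproto IDs of its parents, which only exist once those parents have been stored. The sketch below illustrates that dependency without `git2`: `GitCommit`, `ImportedCommit`, and `import_in_order` are hypothetical, and the `pp-N` IDs are placeholders for whatever content-addressed ID the store would return.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified git commit: an ID plus parent IDs.
pub struct GitCommit {
    pub oid: String,
    pub parents: Vec<String>,
    pub message: String,
}

/// Hypothetical panproto-side commit produced by the import.
#[derive(Debug, PartialEq, Eq)]
pub struct ImportedCommit {
    pub id: String,
    pub parents: Vec<String>,
    pub message: String,
}

/// Import commits that are already ordered parents-before-children.
/// Fails if a commit appears before one of its parents.
pub fn import_in_order(commits: &[GitCommit]) -> Result<Vec<ImportedCommit>, String> {
    // Maps git commit IDs to the IDs assigned on the panproto side.
    let mut id_map: HashMap<&str, String> = HashMap::new();
    let mut out = Vec::with_capacity(commits.len());
    for (n, c) in commits.iter().enumerate() {
        let mut parents = Vec::with_capacity(c.parents.len());
        for p in &c.parents {
            match id_map.get(p.as_str()) {
                Some(mapped) => parents.push(mapped.clone()),
                None => {
                    return Err(format!("parent {} of {} not yet imported", p, c.oid));
                }
            }
        }
        // Placeholder for the ID the object store would return.
        let id = format!("pp-{}", n);
        id_map.insert(c.oid.as_str(), id.clone());
        out.push(ImportedCommit { id, parents, message: c.message.clone() });
    }
    Ok(out)
}
```

With any other ordering the parent lookup would fail, or the importer would need a second pass to patch parent references after the fact.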
28.2.3 Export pipeline
- Load `CommitObject` and its `Schema` from the panproto-vcs store
- `collect_file_fragments` groups vertices by file prefix, collecting `literal-value` and `interstitial-*` constraints with byte positions
- Per-file fragments are sorted and concatenated into source bytes
- `build_nested_tree` creates recursive git `TreeBuilder` objects for proper directory hierarchy
- Git commit is created with `parent_map` for DAG preservation
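The middle steps of the export (grouping by file prefix, ordering by byte position, and nesting paths into directories) can be sketched without `git2`. Everything here is an illustrative assumption: `Fragment`, `reassemble`, `Dir`, and `build_tree` are hypothetical names, the file path is taken to be everything before the first `::`, and a plain in-memory `Dir` stands in for git's `TreeBuilder`.

```rust
use std::collections::BTreeMap;

/// Hypothetical fragment: source text recovered from a `literal-value`
/// or `interstitial-*` constraint, with the byte offset where it starts.
pub struct Fragment {
    pub vertex: String, // prefixed vertex ID, e.g. "src/main.ts::program::$0"
    pub start_byte: usize,
    pub text: Vec<u8>,
}

/// Group fragments by file prefix, sort each group by byte position,
/// and concatenate the pieces into that file's bytes.
pub fn reassemble(fragments: Vec<Fragment>) -> BTreeMap<String, Vec<u8>> {
    let mut by_file: BTreeMap<String, Vec<Fragment>> = BTreeMap::new();
    for f in fragments {
        // Assumption: the file path is everything before the first "::".
        let path = f.vertex.split("::").next().unwrap_or("").to_string();
        by_file.entry(path).or_default().push(f);
    }
    by_file
        .into_iter()
        .map(|(path, mut frags)| {
            frags.sort_by_key(|f| f.start_byte);
            let bytes: Vec<u8> = frags.into_iter().flat_map(|f| f.text).collect();
            (path, bytes)
        })
        .collect()
}

/// In-memory stand-in for a git tree: files plus nested subdirectories.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Dir {
    pub files: BTreeMap<String, Vec<u8>>,
    pub dirs: BTreeMap<String, Dir>,
}

/// Turn flat "a/b/c.txt" paths into a nested directory structure.
pub fn build_tree(files: BTreeMap<String, Vec<u8>>) -> Dir {
    let mut root = Dir::default();
    for (path, bytes) in files {
        let mut parts: Vec<&str> = path.split('/').collect();
        let name = parts.pop().unwrap_or("").to_string();
        let mut dir = &mut root;
        for part in parts {
            // Descend, creating intermediate directories on demand.
            dir = dir.dirs.entry(part.to_string()).or_default();
        }
        dir.files.insert(name, bytes);
    }
    root
}
```

In the real export each `Dir` would correspond to one tree object written bottom-up, since a git tree can only reference subtrees that already have an ID.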
28.2.4 Design decisions
Why MemStore in tests? Git bridge tests create temporary git repositories via git2::Repository::init in tempfile::tempdir. The panproto side uses MemStore (in-memory, no filesystem) for speed and isolation.
Why not store source bytes in the schema? Binary content can be arbitrarily large. The schema stores a blake3 content hash on chunk vertices; actual bytes are tracked by the VCS object store.
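The split between a small hash in the schema and bulk bytes in the store can be illustrated with a toy content-addressed store. This is only a sketch: `ChunkVertex` and `BlobStore` are hypothetical, the vertex ID is made up, and the standard library's `DefaultHasher` stands in for blake3 (it is not cryptographic and is used here purely to keep the example dependency-free).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a blake3 digest. NOT cryptographic; illustration only.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Hypothetical chunk vertex: the schema keeps only the hash.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkVertex {
    pub id: String,
    pub hash: u64,
}

/// Hypothetical in-memory object store keyed by content hash.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Store bytes under their hash; identical content is stored once.
    pub fn put(&mut self, bytes: &[u8]) -> u64 {
        let h = content_hash(bytes);
        self.blobs.entry(h).or_insert_with(|| bytes.to_vec());
        h
    }

    /// Look up bytes by the hash recorded on a chunk vertex.
    pub fn get(&self, hash: u64) -> Option<&[u8]> {
        self.blobs.get(&hash).map(|v| v.as_slice())
    }

    /// Number of distinct blobs held.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

The schema stays small no matter how large the file is, and identical content referenced from several vertices is stored only once.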