classDiagram
class Panproto {
+load(wasm_path?) Panproto$
+protocol(name) Protocol
+define_protocol(spec) Protocol
+migration(src, tgt) MigrationBuilder
+check_existence(src, tgt, builder) ExistenceReport
+compose(m1, m2) CompiledMigration
+diff(old, new) DiffReport
+close()
+__enter__()
+__exit__()
}
class Protocol {
+name: str
+spec: ProtocolSpec
+schema() SchemaBuilder
+close()
}
class SchemaBuilder {
+vertex(id, kind, opts?) SchemaBuilder
+edge(src, tgt, kind, opts?) SchemaBuilder
+hyper_edge(id, kind, sig, parent) SchemaBuilder
+constraint(vertex, sort, value) SchemaBuilder
+required(vertex, edges) SchemaBuilder
+build() BuiltSchema
}
class BuiltSchema {
+data: SchemaData
+protocol: str
+vertices: Mapping
+edges: Sequence
+close()
}
class MigrationBuilder {
+map(src, tgt) MigrationBuilder
+map_edge(src, tgt) MigrationBuilder
+resolve(src, tgt, edge) MigrationBuilder
+compile() CompiledMigration
}
class CompiledMigration {
+spec: MigrationSpec
+lift(record) LiftResult
+get(record) GetResult
+put(view, complement) LiftResult
+close()
}
Panproto --> Protocol : protocol()
Protocol --> SchemaBuilder : schema()
SchemaBuilder --> BuiltSchema : build()
Panproto --> MigrationBuilder : migration()
MigrationBuilder --> CompiledMigration : compile()
15 panproto: The Python SDK
The panproto PyPI package provides a Python 3.13+ interface to the panproto WASM engine. It mirrors the TypeScript SDK’s fluent, type-safe API while embracing Python idioms: context managers instead of Symbol.dispose, TypedDict subclasses instead of interfaces, PEP 695 type aliases instead of TypeScript unions, and weakref.finalize instead of FinalizationRegistry.
For a quick-reference listing of all public APIs with parameter types, see the tutorial’s API reference appendix.
15.1 Architecture overview
The SDK is organized into nine private modules, each with a focused responsibility; the project structure in the next section lists them.
15.2 Project structure
The SDK lives at sdk/python/ and follows a standard src layout:
sdk/python/
├── pyproject.toml # hatchling build, pyright strict, ruff
├── src/
│ └── panproto/
│ ├── __init__.py # Public re-exports
│ ├── _panproto.py # Panproto entry point class
│ ├── _protocol.py # Protocol + 5 built-in specs
│ ├── _schema.py # SchemaBuilder + BuiltSchema
│ ├── _migration.py # MigrationBuilder + CompiledMigration
│ ├── _lens.py # Cambria-style lens combinators
│ ├── _wasm.py # wasmtime loading + WasmHandle
│ ├── _msgpack.py # MessagePack encode/decode helpers
│ ├── _types.py # TypedDicts + PEP 695 type aliases
│ ├── _errors.py # PanprotoError hierarchy
│ └── panproto_wasm_bg.wasm # Bundled WASM binary
└── tests/
All private modules use the underscore prefix (_wasm.py, _types.py, etc.). The public surface is defined entirely in __init__.py via __all__.
15.3 Building and testing
The SDK uses Hatchling (the Hatch build backend) to build the package. Runtime dependencies are minimal: wasmtime>=29.0.0 and msgpack>=1.1.0.
# Install in development mode
pip install -e "sdk/python[dev]"
# Run tests
pytest sdk/python/tests/
# Type checking (strict mode)
pyright sdk/python/src/
# Lint + format
ruff check sdk/python/src/
ruff format sdk/python/src/
The [dev] extra pulls in pytest>=8.0, ruff>=0.5, and pyright>=1.1.
15.4 WASM interaction layer (_wasm.py)
The _wasm.py module handles WASM binary loading, instance creation, and resource lifecycle. It contains three key components: WasmModule, WasmHandle, and the load_wasm factory.
15.4.1 WasmModule
WasmModule wraps a wasmtime.Instance and exposes the panproto WASM entry points as typed Python methods:
Each entry point method handles the raw call, converts between Python types and WASM integers/byte slices, and wraps any errors in WasmError:
- define_protocol(spec: bytes) -> int: Register a protocol, return a handle.
- build_schema(proto: int, ops: bytes) -> int: Build a schema, return a handle.
- check_existence(src: int, tgt: int, mapping: bytes) -> bytes: Validate a migration.
- compile_migration(src: int, tgt: int, mapping: bytes) -> int: Compile a migration, return a handle.
- lift_record(migration: int, record: bytes) -> bytes: Forward transform.
- get_record(migration: int, record: bytes) -> bytes: Bidirectional get.
- put_record(migration: int, view: bytes, complement: bytes) -> bytes: Bidirectional put.
- compose_migrations(m1: int, m2: int) -> int: Compose two migrations.
- diff_schemas(s1: int, s2: int) -> bytes: Diff two schemas.
- free_handle(handle: int) -> None: Release a WASM resource.
15.4.2 WasmHandle
Every WASM-side resource is wrapped in a WasmHandle, which implements the context-manager protocol for use with with statements:
Two layers of safety prevent handle leaks:
- Context manager (with / explicit .close()): Deterministic cleanup. Called automatically on block exit or manually by the consumer.
- weakref.finalize: Safety net. If a WasmHandle is garbage-collected without being closed, the weak-reference callback calls free_handle to release the WASM resource.
The _free_handle_safe helper wraps the WASM call in contextlib.suppress(Exception) so that finalization never raises, even if the WASM module has already been torn down.
The weakref.finalize callback is a last resort, not a primary mechanism. GC timing is non-deterministic, so relying on it can cause resource exhaustion under load. Always prefer explicit cleanup via with blocks or .close().
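The two cleanup layers can be sketched with the standard library alone. This is a minimal illustration, not the SDK's actual implementation: FakeWasmModule is a hypothetical stand-in for WasmModule, and the attribute names and the closed property are assumptions.

```python
import contextlib
import weakref


class FakeWasmModule:
    """Hypothetical stand-in for the real WasmModule; records freed handles."""

    def __init__(self) -> None:
        self.freed: list[int] = []

    def free_handle(self, handle: int) -> None:
        self.freed.append(handle)


def _free_handle_safe(module: FakeWasmModule, handle: int) -> None:
    # Finalizers must never raise, even if the module is already torn down.
    with contextlib.suppress(Exception):
        module.free_handle(handle)


class WasmHandle:
    """Owns one WASM-side resource id and frees it at most once."""

    __slots__ = ("__weakref__", "_finalizer", "_id")

    def __init__(self, handle_id: int, module: FakeWasmModule) -> None:
        self._id = handle_id
        # The callback must not reference `self`, or the handle could never be collected.
        self._finalizer = weakref.finalize(self, _free_handle_safe, module, handle_id)

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive

    def close(self) -> None:
        # Calling a finalizer runs its callback at most once; later calls are no-ops.
        self._finalizer()

    def __enter__(self) -> "WasmHandle":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


module = FakeWasmModule()
with WasmHandle(1, module) as handle:
    pass  # use the handle here
handle.close()  # idempotent: handle 1 is not freed a second time

leaked = WasmHandle(2, module)
del leaked  # on CPython the refcount drops to zero and the finalizer fires
```

Routing both paths through the same weakref.finalize object is one way to get idempotent cleanup for free: explicit close() and garbage collection cannot both release the handle.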
15.4.3 load_wasm
The load_wasm factory reads a .wasm binary from disk, creates a wasmtime.Engine, Store, Module, and Linker, instantiates the module with WASI imports, and returns a ready WasmModule:
Unlike the TypeScript SDK’s async init(), the Python factory is synchronous; wasmtime-py loads modules on the calling thread.
15.5 Type system (_types.py)
The _types.py module defines all data-carrying types that cross the MessagePack boundary. It mirrors the Rust structures and the TypeScript SDK interfaces using Python 3.13+ conventions:
| Rust | Python |
|---|---|
| HashMap<K, V> | dict[str, V] |
| Option<T> | T \| None |
| Vec<T> | list[T] |
| Result<T, E> | Return value or raised error |
15.5.1 TypedDicts
All structured data types are TypedDict subclasses, since they represent plain dict structures deserialized from MessagePack. Key types include:
- ProtocolSpec: A complete protocol specification with schema theory, instance theory, edge rules, object kinds, and constraint sorts.
- Vertex, Edge, HyperEdge, Constraint: Schema graph elements.
- SchemaData: The full snapshot of a built schema (vertices, edges, hyperedges, constraints, required edges).
- MigrationSpec, LiftResult, GetResult: Migration domain types.
- ExistenceReport, ExistenceError: Existence checking results.
- DiffReport, SchemaChange: Schema diff results.
Optional fields use NotRequired from typing:
15.5.2 PEP 695 type aliases
The module uses type statement syntax (PEP 695) for all type aliases:
The JsonValue alias is a recursive type covering all JSON primitives plus nested sequences and mappings. It replaces Any wherever the SDK must accept or return arbitrary JSON-like data.
Literal union aliases define the closed sets of enum-like strings:
- SchemaChangeKind: The ten atomic change kinds ("vertex-added", "edge-removed", etc.).
- Compatibility: Three-way classification: "fully-compatible", "backward-compatible", "breaking".
- ExistenceErrorKind: Ten structured error kinds emitted by the existence checker.
15.5.3 Wire format types
Internal wire-format types (prefixed with _) match the exact field names expected by Rust’s serde:
- _SchemaOpVertex, _SchemaOpEdge, _SchemaOpHyperEdge, _SchemaOpConstraint, _SchemaOpRequired: The five schema operation variants. The op field acts as a serde internally-tagged discriminant.
- SchemaOp: A PEP 695 union of the five variants.
- MigrationMapping: Wire-format migration mapping with vertex_map, edge_map, and resolver.
15.6 MessagePack boundary (_msgpack.py)
The _msgpack.py module provides four encoding helpers that wrap the msgpack library:
- pack_to_wasm(value): Encode any Packable value to MessagePack bytes for a WASM entry point.
- unpack_from_wasm(data): Decode MessagePack bytes from WASM. Callers narrow the return type as needed.
- pack_schema_ops(ops): Encode a Sequence[SchemaOp] for the build_schema entry point.
- pack_migration_mapping(mapping): Encode a MigrationMapping for compile_migration or check_existence.
The Packable type alias defines the universe of MessagePack-serializable values:
It extends JsonValue with bytes, since MessagePack natively supports binary data (used for opaque complements).
15.7 Protocol definitions (_protocol.py)
A Protocol holds a WASM-side handle to the registered protocol specification and provides the schema() factory:
The define_protocol module-level function serializes a ProtocolSpec to its wire format, sends it to WASM, and wraps the returned handle:
15.7.1 Built-in protocols
Five built-in protocol specs are provided as module-level constants: ATPROTO_SPEC, SQL_SPEC, PROTOBUF_SPEC, GRAPHQL_SPEC, and JSON_SCHEMA_SPEC. They’re auto-registered on first access via panproto.protocol("atproto"). The BUILTIN_PROTOCOLS mapping provides a name-to-spec lookup table:
15.8 Schema Builder (_schema.py)
SchemaBuilder is an immutable fluent builder. Each method returns a new builder instance, leaving the original unchanged. This makes it safe to branch schema definitions:
The builder accumulates SchemaOp objects (matching the Rust BuildOp tagged enum). Each mutation method creates a new instance with the new operation appended. On .build(), it packs the accumulated operations as MessagePack, sends them to WASM, and wraps the returned handle:
15.8.1 BuiltSchema
BuiltSchema is the validated result. It holds both a WasmHandle (for passing to migration functions) and a local SchemaData snapshot (for introspection without crossing the WASM boundary):
The data property returns the full SchemaData TypedDict. Convenience properties protocol, vertices, and edges provide direct access to the most commonly used fields.
15.9 Migration Builder (_migration.py)
MigrationBuilder follows the same immutable fluent pattern. It accumulates vertex mappings, edge mappings, and resolvers:
Three mutation methods build up the migration specification:
- map(src_vertex, tgt_vertex): Map a source vertex to a target vertex.
- map_edge(src_edge, tgt_edge): Map a source edge to a target edge.
- resolve(src_kind, tgt_kind, resolved_edge): Add a resolver for ancestor-contraction ambiguity.
The compile() method packs the mapping and sends it to the compile_migration WASM entry point. The result is a CompiledMigration with three data-path methods:
- lift(record): Forward-only transformation. The hot path: data goes through WASM as MessagePack bytes with no intermediate Python-heap allocation.
- get(record): Bidirectional get: extract a projected view and an opaque complement.
- put(view, complement): Bidirectional put: restore a full record from a (possibly modified) view and the complement from get.
The complement from get() is an opaque bytes object. It captures the data discarded by the forward projection, enabling lossless round-tripping. Treat it as a black box; its internal format is a MessagePack-encoded Rust Complement struct.
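The role of the complement is easiest to see in a toy model. The functions below are pure-Python stand-ins, not the SDK's get and put: the keep parameter is hypothetical, and JSON replaces the real MessagePack encoding purely to keep the sketch dependency-free.

```python
import json


def get(record: dict[str, object], keep: set[str]) -> tuple[dict[str, object], bytes]:
    """Project `record` onto `keep`; pack everything else into an opaque complement."""
    view = {k: v for k, v in record.items() if k in keep}
    dropped = {k: v for k, v in record.items() if k not in keep}
    # JSON stands in for the real MessagePack encoding (assumption for this sketch).
    return view, json.dumps(dropped, sort_keys=True).encode()


def put(view: dict[str, object], complement: bytes) -> dict[str, object]:
    """Rebuild a full record from a (possibly edited) view plus the complement."""
    return {**json.loads(complement), **view}


record = {"text": "hello", "langs": ["en"], "createdAt": "2024-01-01T00:00:00Z"}
view, complement = get(record, keep={"text"})

# Unmodified view: put restores the original record exactly.
restored = put(view, complement)

# Edited view: the change lands in the result, and discarded fields come back untouched.
edited = put({**view, "text": "hello, world"}, complement)
```

Callers only ever pass the complement back to put; nothing in this flow requires decoding it.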
15.9.1 Module-level functions
Two additional functions are exported at module level:
- check_existence(src, tgt, spec, wasm): Validate that a migration specification satisfies all protocol-derived existence conditions.
- compose_migrations(m1, m2, wasm): Compose two compiled migrations into one. The resulting migration is equivalent to applying m1 first, then m2.
15.10 Lens API (_lens.py)
The _lens.py module provides three handle classes for bidirectional schema transformations: ProtolensChainHandle for schema-independent lens families, LensHandle for concrete lenses, and SymmetricLensHandle for symmetric bidirectional sync.
ProtolensChainHandle wraps a WASM-side protolens chain and can be instantiated against a concrete schema to produce a LensHandle:
LensHandle wraps a concrete lens with get, put, and law-checking operations:
Key methods:
- ProtolensChainHandle.auto_generate(schema1, schema2, wasm): Auto-generate a protolens chain between two schemas.
- ProtolensChainHandle.instantiate(schema): Instantiate the chain against a concrete schema.
- ProtolensChainHandle.compose(other): Compose with another chain.
- ProtolensChainHandle.fuse(): Fuse all steps into a single protolens.
- ProtolensChainHandle.lift(morphism_bytes): Lift along a theory morphism.
- LensHandle.auto_generate(schema1, schema2, wasm): Auto-generate and instantiate a lens.
- LensHandle.get(record): Forward projection: extract view and complement.
- LensHandle.put(view, complement): Backward put: restore from view and complement.
- LensHandle.check_laws(instance): Verify GetPut and PutGet laws.
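For readers unfamiliar with the two laws, the sketch below states them as executable checks over plain Python callables. It is not how LensHandle.check_laws is implemented; the function signature and the example lenses are assumptions chosen for illustration.

```python
from collections.abc import Callable
from typing import Any

Get = Callable[[Any], tuple[Any, Any]]  # source -> (view, complement)
Put = Callable[[Any, Any], Any]  # (view, complement) -> source


def check_laws(get: Get, put: Put, source: Any, new_view: Any) -> dict[str, bool]:
    """Check both round-trip laws on one sample (hypothetical helper)."""
    view, complement = get(source)
    # GetPut: putting back an unmodified view reproduces the source.
    get_put = put(view, complement) == source
    # PutGet: after putting a new view, get returns that same view.
    put_get = get(put(new_view, complement))[0] == new_view
    return {"get_put": get_put, "put_get": put_get}


# A lawful lens on (name, age) pairs: the view is the name, the complement is the age.
def get_name(pair: tuple[str, int]) -> tuple[str, int]:
    return pair[0], pair[1]


def put_name(name: str, age: int) -> tuple[str, int]:
    return (name, age)


# An unlawful "lens" whose put ignores the view it is given.
def put_stale(name: str, age: int) -> tuple[str, int]:
    return ("stale", age)


lawful = check_laws(get_name, put_name, ("ada", 36), "grace")
unlawful = check_laws(get_name, put_stale, ("ada", 36), "grace")
```

A check like this only samples the laws on given inputs; it can find violations but does not prove a lens correct.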
15.11 Error hierarchy (_errors.py)
The SDK defines a base exception, PanprotoError, and four subclasses that extend it:
- PanprotoError: Base exception for all panproto errors. Stores the message as an attribute.
- WasmError: Errors from the WASM boundary (load failures, call failures, disposed handles).
- SchemaValidationError: Schema building errors, with a tuple[str, ...] of individual error strings.
- MigrationError: Migration compilation or composition errors.
- ExistenceCheckError: Existence check failures, bundling the full ExistenceReport.
All leaf exception classes are decorated with @final to prevent subclassing. Each exception class uses __slots__ for memory efficiency and provides a custom __repr__ for better debug output.
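A reduced sketch of this layout, with the base class and one leaf, might look as follows. The constructor signatures and the errors attribute name are assumptions; only the class names, @final, __slots__, and the custom __repr__ come from the description above.

```python
from typing import final


class PanprotoError(Exception):
    """Base class; catching it catches every SDK error."""

    __slots__ = ("message",)

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.message!r})"


@final  # a hint for type checkers; not enforced at runtime
class SchemaValidationError(PanprotoError):
    """Leaf error carrying the individual validation messages."""

    __slots__ = ("errors",)

    def __init__(self, message: str, errors: tuple[str, ...]) -> None:
        super().__init__(message)
        self.errors = errors

    def __repr__(self) -> str:
        return f"SchemaValidationError({self.message!r}, errors={self.errors!r})"


try:
    raise SchemaValidationError("schema rejected", ("missing vertex post",))
except PanprotoError as exc:  # the base class catches every leaf
    caught = repr(exc)
```

Catching PanprotoError at an application boundary and the specific leaf classes closer to the call site is the usual way to consume a hierarchy shaped like this.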
15.12 The Panproto class
Panproto is the main entry point. It loads the WASM module and provides the top-level API:
Key methods:
- load(wasm_path?): Synchronous factory. Reads the WASM binary (bundled by default) and returns a ready instance.
- protocol(name): Get or auto-register a built-in protocol ("atproto", "sql", "protobuf", "graphql", "json-schema"). Custom protocols must be registered first with define_protocol.
- define_protocol(spec): Register a custom protocol specification.
- migration(src, tgt): Start building a migration between two schemas.
- check_existence(src, tgt, builder): Validate a migration against protocol-derived existence conditions.
- compose(m1, m2): Compose two compiled migrations into one.
- diff(old, new): Compute a structural diff between two schemas.
Panproto implements the context-manager protocol. When the with block exits, it releases all cached protocol handles:
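The get-or-register cache and its release on exit can be sketched as below. PanprotoSketch and FakeProtocol are hypothetical stand-ins that involve no WASM; the cache attribute name is an assumption.

```python
class FakeProtocol:
    """Hypothetical stand-in for Protocol; only tracks whether it was closed."""

    __slots__ = ("closed", "name")

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


class PanprotoSketch:
    """Toy entry point that caches protocols and closes them on exit."""

    __slots__ = ("_protocols",)

    def __init__(self) -> None:
        self._protocols: dict[str, FakeProtocol] = {}

    def protocol(self, name: str) -> FakeProtocol:
        # Get-or-register: each protocol is created once, then served from the cache.
        if name not in self._protocols:
            self._protocols[name] = FakeProtocol(name)
        return self._protocols[name]

    def close(self) -> None:
        for proto in self._protocols.values():
            proto.close()
        self._protocols.clear()

    def __enter__(self) -> "PanprotoSketch":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


with PanprotoSketch() as pp:
    atproto = pp.protocol("atproto")
    same = pp.protocol("atproto")  # served from the cache
```

After the block exits, any protocol obtained inside it has been closed, so references to it should not be used further.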
15.13 Schema enrichment
The SchemaBuilder supports enrichment through constraints, which encode defaults, coercions, and merge policies:
# Add constraints to encode enriched schema properties
with (
    protocol.schema()
    .vertex("post:body", "object")
    .vertex("post:body.text", "string")
    .edge("post:body", "post:body.text", "prop", {"name": "text"})
    .constraint("post:body.text", "maxLength", "3000")  # refinement type
    .constraint("post:body.text", "minLength", "1")  # lower bound
    .build()
) as schema:
    assert schema.data["constraints"]["post:body.text"][0]["sort"] == "maxLength"
Constraints serve triple duty: they encode refinement types (value bounds), default behaviors (when a field has a known initial value), and coercion hints (when a field can be safely converted between representations). The protocol’s constraint_sorts list determines which constraint sorts are recognized during compatibility checking.
15.14 Migration analysis
The SDK provides analysis capabilities through the diff/classify pipeline:
# Compute structural diff
diff_report = pp.diff(old_schema, new_schema)
# Classify into breaking vs. non-breaking
compat = pp.classify(diff_report)
# Coverage proxy: fraction of changes classified as non-breaking
total_changes = len(compat["breaking"]) + len(compat["non_breaking"])
if total_changes > 0:
    coverage = len(compat["non_breaking"]) / total_changes
For optic classification, the protolens chain structure reveals whether a migration is an isomorphism (lossless, rename-only), a lens (lossy, drops data), or something more complex:
from panproto import ProtolensChainHandle
# Auto-generate protolens
with ProtolensChainHandle.auto_generate(schema1, schema2, wasm) as chain:
    # Fuse to analyze the composed transform
    with chain.fuse() as fused:
        # Serialize to inspect the transform structure
        spec = fused.to_json()
        # spec["complement_constructor"] == "Empty" means isomorphism
        # "DroppedSortData" in complement means lens
15.15 GAT Engine access
The SDK provides direct access to the GAT engine for theory construction and composition:
# Create theories via WASM
theory_spec = {
    "name": "ThGraph",
    "sorts": [
        {"name": "Vertex", "params": []},
        {"name": "Edge", "params": []},
    ],
    "ops": [
        {"name": "src", "inputs": [["e", "Edge"]], "output": "Vertex"},
        {"name": "tgt", "inputs": [["e", "Edge"]], "output": "Vertex"},
    ],
    "eqs": [],
}
with pp.create_theory(theory_spec) as th_graph:
    # Compose theories via colimit
    # (th_constraint and shared are assumed to have been created earlier)
    with pp.colimit_theories(th_graph, th_constraint, shared) as composed:
        pass  # use composed theory
15.16 VCS integration
The SDK exposes version control operations for schemas and data:
# Initialize a repository
with pp.vcs_init("atproto") as repo:
    # Stage and commit schema changes
    pp.vcs_add(repo, schema)
    commit_hash = pp.vcs_commit(repo, "add post schema", "author@example.com")
    # Query history
    log = pp.vcs_log(repo, count=10)
    status = pp.vcs_status(repo)
15.17 Usage pattern
A typical SDK session:
from panproto import Panproto
# Initialize (loads WASM)
with Panproto.load() as pp:
    # Get a protocol
    atproto = pp.protocol("atproto")
    # Build schemas
    with (
        atproto.schema()
        .vertex("post", "record", {"nsid": "app.bsky.feed.post"})
        .vertex("post:body", "object")
        .edge("post", "post:body", "record-schema")
        .build()
    ) as old_schema, (
        atproto.schema()
        .vertex("post", "record", {"nsid": "app.bsky.feed.post"})
        .vertex("post:body", "object")
        .vertex("post:body.tags", "array")
        .edge("post", "post:body", "record-schema")
        .edge("post:body", "post:body.tags", "prop", {"name": "tags"})
        .build()
    ) as new_schema:
        # Compile migration
        with (
            pp.migration(old_schema, new_schema)
            .map("post", "post")
            .map("post:body", "post:body")
            .compile()
        ) as migration:
            # Transform records
            result = migration.lift({"text": "hello"})
The with statement ensures all WASM handles are freed when the block exits. Nested with blocks (or the parenthesized multi-context form shown above) are the Python equivalent of TypeScript’s using keyword. If you need to manage handle lifetimes more flexibly, call .close() manually or use contextlib.ExitStack.
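When the number of handles is only known at runtime, contextlib.ExitStack can take over the cleanup. The sketch below uses a hypothetical FakeSchema with just a close() method, wrapped in contextlib.closing. Real SDK objects implement the context-manager protocol themselves, so they could presumably be passed to enter_context directly.

```python
from contextlib import ExitStack, closing


class FakeSchema:
    """Hypothetical stand-in for any object with a .close() method."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self._log = log

    def close(self) -> None:
        self._log.append(f"closed {self.name}")


log: list[str] = []
with ExitStack() as stack:
    # A dynamic number of resources cannot be expressed as a fixed `with` statement.
    schemas = [
        stack.enter_context(closing(FakeSchema(name, log)))
        for name in ("v1", "v2", "v3")
    ]
    log.append(f"using {len(schemas)} schemas")
# Leaving the block closes everything in reverse order of registration.
```

The reverse-order teardown matches nested with blocks, so resources that depend on earlier ones are released first.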
15.18 Conventions
The Python SDK enforces the following project conventions:
- Python 3.13+: The minimum supported version. The SDK relies on PEP 695 type aliases, structural pattern matching, parenthesized context managers, and other modern language features.
- PEP 695 type statements: All type aliases use the type keyword syntax rather than TypeAlias.
- Strict pyright: typeCheckingMode = "strict" with reportMissingTypeStubs = false. All public and private functions have complete type annotations.
- ruff: Lint and format with target-version = "py313" and line-length = 99. The enabled rule sets include pyflakes, pycodestyle, isort, pep8-naming, pyupgrade, bugbear, builtins shadowing, comprehensions, simplify, type-checking imports, and ruff-specific rules.
- numpy docstrings: All public classes and functions use the numpy docstring convention with Parameters, Returns, Raises, and Examples sections.
- Immutable builders: SchemaBuilder and MigrationBuilder return new instances on every mutation. Internal state uses tuples and frozen dicts.
- __slots__ everywhere: All classes define __slots__ for memory efficiency and to catch attribute typos.
- @final leaf classes: All concrete classes that shouldn’t be subclassed are decorated with @final.
- Underscore-prefixed private modules: All implementation modules use the _ prefix. The public API surface is defined in __init__.py via __all__.
- MIT license: Consistent with the rest of the panproto project.