15 panproto: The Python SDK

The panproto PyPI package provides a Python 3.13+ interface to the panproto WASM engine. It mirrors the TypeScript SDK’s fluent, type-safe API while embracing Python idioms: context managers instead of Symbol.dispose, TypedDict subclasses instead of interfaces, PEP 695 type aliases instead of TypeScript unions, and weakref.finalize instead of FinalizationRegistry.

Tip

For a quick-reference listing of all public APIs with parameter types, see the tutorial’s API reference appendix.

15.1 Architecture overview

The SDK is organized into nine implementation modules (listed in 15.2), each with a focused responsibility. The core classes relate as follows:

classDiagram
    class Panproto {
        +load(wasm_path?) Panproto$
        +protocol(name) Protocol
        +define_protocol(spec) Protocol
        +migration(src, tgt) MigrationBuilder
        +check_existence(src, tgt, builder) ExistenceReport
        +compose(m1, m2) CompiledMigration
        +diff(old, new) DiffReport
        +close()
        +__enter__()
        +__exit__()
    }

    class Protocol {
        +name: str
        +spec: ProtocolSpec
        +schema() SchemaBuilder
        +close()
    }

    class SchemaBuilder {
        +vertex(id, kind, opts?) SchemaBuilder
        +edge(src, tgt, kind, opts?) SchemaBuilder
        +hyper_edge(id, kind, sig, parent) SchemaBuilder
        +constraint(vertex, sort, value) SchemaBuilder
        +required(vertex, edges) SchemaBuilder
        +build() BuiltSchema
    }

    class BuiltSchema {
        +data: SchemaData
        +protocol: str
        +vertices: Mapping
        +edges: Sequence
        +close()
    }

    class MigrationBuilder {
        +map(src, tgt) MigrationBuilder
        +map_edge(src, tgt) MigrationBuilder
        +resolve(src, tgt, edge) MigrationBuilder
        +compile() CompiledMigration
    }

    class CompiledMigration {
        +spec: MigrationSpec
        +lift(record) LiftResult
        +get(record) GetResult
        +put(view, complement) LiftResult
        +close()
    }

    Panproto --> Protocol : protocol()
    Protocol --> SchemaBuilder : schema()
    SchemaBuilder --> BuiltSchema : build()
    Panproto --> MigrationBuilder : migration()
    MigrationBuilder --> CompiledMigration : compile()

15.2 Project structure

The SDK lives at sdk/python/ and follows a standard src layout:

sdk/python/
├── pyproject.toml          # hatchling build, pyright strict, ruff
├── src/
│   └── panproto/
│       ├── __init__.py     # Public re-exports
│       ├── _panproto.py    # Panproto entry point class
│       ├── _protocol.py    # Protocol + 5 built-in specs
│       ├── _schema.py      # SchemaBuilder + BuiltSchema
│       ├── _migration.py   # MigrationBuilder + CompiledMigration
│       ├── _lens.py        # Cambria-style lens combinators
│       ├── _wasm.py        # wasmtime loading + WasmHandle
│       ├── _msgpack.py     # MessagePack encode/decode helpers
│       ├── _types.py       # TypedDicts + PEP 695 type aliases
│       ├── _errors.py      # PanprotoError hierarchy
│       └── panproto_wasm_bg.wasm  # Bundled WASM binary
└── tests/

All private modules use the underscore prefix (_wasm.py, _types.py, etc.). The public surface is defined entirely in __init__.py via __all__.

15.3 Building and testing

The SDK uses hatchling (Hatch's build backend). Runtime dependencies are minimal: wasmtime>=29.0.0 and msgpack>=1.1.0.

# Install in development mode
pip install -e "sdk/python[dev]"

# Run tests
pytest sdk/python/tests/

# Type checking (strict mode)
pyright sdk/python/src/

# Lint + format
ruff check sdk/python/src/
ruff format sdk/python/src/

The [dev] extra pulls in pytest>=8.0, ruff>=0.5, and pyright>=1.1.

15.4 WASM interaction layer (_wasm.py)

The _wasm.py module handles WASM binary loading, instance creation, and resource lifecycle. It contains three key components: WasmModule, WasmHandle, and the load_wasm factory.

15.4.1 WasmModule

WasmModule wraps a wasmtime.Instance and exposes the panproto WASM entry points as typed Python methods.

Each entry point method handles the raw call, converts between Python types and WASM integers/byte slices, and wraps any errors in WasmError:

  • define_protocol(spec: bytes) -> int: Register a protocol, return a handle.
  • build_schema(proto: int, ops: bytes) -> int: Build a schema, return a handle.
  • check_existence(src: int, tgt: int, mapping: bytes) -> bytes: Validate a migration.
  • compile_migration(src: int, tgt: int, mapping: bytes) -> int: Compile a migration, return a handle.
  • lift_record(migration: int, record: bytes) -> bytes: Forward transform.
  • get_record(migration: int, record: bytes) -> bytes: Bidirectional get.
  • put_record(migration: int, view: bytes, complement: bytes) -> bytes: Bidirectional put.
  • compose_migrations(m1: int, m2: int) -> int: Compose two migrations.
  • diff_schemas(s1: int, s2: int) -> bytes: Diff two schemas.
  • free_handle(handle: int) -> None: Release a WASM resource.

15.4.2 WasmHandle

Every WASM-side resource is wrapped in a WasmHandle, which implements the context-manager protocol so it can be used in a with statement.

Two layers of safety prevent handle leaks:

  1. Context manager (with / explicit .close()): Deterministic cleanup. Called automatically on block exit or manually by the consumer.
  2. weakref.finalize: Safety net. If a WasmHandle is garbage-collected without being closed, the weak-reference callback calls free_handle to release the WASM resource.

The _free_handle_safe helper wraps the WASM call in contextlib.suppress(Exception) so that finalization never raises, even if the WASM module has already been torn down.

Important

The weakref.finalize callback is a last resort, not a primary mechanism. GC timing is non-deterministic, so relying on it can cause resource exhaustion under load. Always prefer explicit cleanup via with blocks or .close().

15.4.3 load_wasm

The load_wasm factory reads a .wasm binary from disk, creates a wasmtime.Engine, Store, Module, and Linker, instantiates the module with WASI imports, and returns a ready WasmModule.

Unlike the TypeScript SDK’s async init(), the Python factory is synchronous; wasmtime-py loads modules on the calling thread.

15.5 Type system (_types.py)

The _types.py module defines all data-carrying types that cross the MessagePack boundary. It mirrors the Rust structures and the TypeScript SDK interfaces using Python 3.13+ conventions:

Rust            Python
HashMap<K, V>   dict[str, V]
Option<T>       T | None
Vec<T>          list[T]
Result<T, E>    Return value or raised error

15.5.1 TypedDicts

All structured data types are TypedDict subclasses, since they represent plain dict structures deserialized from MessagePack. Key types include:

  • ProtocolSpec: A complete protocol specification with schema theory, instance theory, edge rules, object kinds, and constraint sorts.
  • Vertex, Edge, HyperEdge, Constraint: Schema graph elements.
  • SchemaData: The full snapshot of a built schema (vertices, edges, hyperedges, constraints, required edges).
  • MigrationSpec, LiftResult, GetResult: Migration domain types.
  • ExistenceReport, ExistenceError: Existence checking results.
  • DiffReport, SchemaChange: Schema diff results.

Optional fields use NotRequired from typing.

15.5.2 PEP 695 Type aliases

The module uses the type statement syntax (PEP 695) for all type aliases.

The JsonValue alias is a recursive type covering all JSON primitives plus nested sequences and mappings. It replaces Any wherever the SDK must accept or return arbitrary JSON-like data.

Literal union aliases define the closed sets of enum-like strings:

  • SchemaChangeKind: The ten atomic change kinds ("vertex-added", "edge-removed", etc.).
  • Compatibility: Three-way classification: "fully-compatible", "backward-compatible", "breaking".
  • ExistenceErrorKind: Ten structured error kinds emitted by the existence checker.

15.5.3 Wire format types

Wire-format types match the exact field names expected by Rust's serde. The per-variant types are private (prefixed with _), while the union and mapping types built from them are public:

  • _SchemaOpVertex, _SchemaOpEdge, _SchemaOpHyperEdge, _SchemaOpConstraint, _SchemaOpRequired: The five schema operation variants. The op field acts as a serde internally-tagged discriminant.
  • SchemaOp: A PEP 695 union of the five variants.
  • MigrationMapping: Wire-format migration mapping with vertex_map, edge_map, and resolver.

15.6 MessagePack boundary (_msgpack.py)

The _msgpack.py module provides four helpers that wrap the msgpack library, three for encoding and one for decoding:

  • pack_to_wasm(value): Encode any Packable value to MessagePack bytes for a WASM entry point.
  • unpack_from_wasm(data): Decode MessagePack bytes from WASM. Callers narrow the return type as needed.
  • pack_schema_ops(ops): Encode a Sequence[SchemaOp] for the build_schema entry point.
  • pack_migration_mapping(mapping): Encode a MigrationMapping for compile_migration or check_existence.

The Packable type alias defines the universe of MessagePack-serializable values.

It extends JsonValue with bytes, since MessagePack natively supports binary data (used for opaque complements).

15.7 Protocol definitions (_protocol.py)

A Protocol holds a WASM-side handle to the registered protocol specification and provides the schema() factory.

The define_protocol module-level function serializes a ProtocolSpec to its wire format, sends it to WASM, and wraps the returned handle in a Protocol.

15.7.1 Built-in protocols

Five built-in protocol specs are provided as module-level constants: ATPROTO_SPEC, SQL_SPEC, PROTOBUF_SPEC, GRAPHQL_SPEC, and JSON_SCHEMA_SPEC. Each is registered automatically the first time it is requested by name on a Panproto instance, for example with pp.protocol("atproto"). The BUILTIN_PROTOCOLS mapping provides the name-to-spec lookup table.

15.8 Schema Builder (_schema.py)

SchemaBuilder is an immutable fluent builder. Each method returns a new builder instance, leaving the original unchanged, which makes it safe to branch schema definitions.

The builder accumulates SchemaOp objects (matching the Rust BuildOp tagged enum). Each mutation method creates a new instance with the new operation appended. On .build(), it packs the accumulated operations as MessagePack, sends them to WASM, and wraps the returned handle in a BuiltSchema.
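The immutable-builder pattern can be sketched with a tuple of accumulated operations. `SchemaBuilderSketch` is hypothetical and omits `.build()` (which, in the SDK, packs the operations and sends them to WASM); the operation dicts only loosely follow the wire format.

```python
class SchemaBuilderSketch:
    """Immutable fluent builder: every method returns a new instance."""

    __slots__ = ("_ops",)

    def __init__(self, ops: tuple[dict[str, str], ...] = ()) -> None:
        self._ops = ops

    def _append(self, op: dict[str, str]) -> "SchemaBuilderSketch":
        # Building a new tuple leaves this builder's ops untouched.
        return SchemaBuilderSketch((*self._ops, op))

    def vertex(self, vertex_id: str, kind: str) -> "SchemaBuilderSketch":
        return self._append({"op": "vertex", "id": vertex_id, "kind": kind})

    def edge(self, src: str, tgt: str, kind: str) -> "SchemaBuilderSketch":
        return self._append({"op": "edge", "src": src, "tgt": tgt, "kind": kind})

    @property
    def ops(self) -> tuple[dict[str, str], ...]:
        return self._ops


# Branching: two schemas share a common prefix without interfering.
base = SchemaBuilderSketch().vertex("post", "record")
with_body = base.vertex("post:body", "object").edge("post", "post:body", "record-schema")
with_title = base.vertex("post:title", "string")
```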

15.8.1 BuiltSchema

BuiltSchema is the validated result. It holds both a WasmHandle (for passing to migration functions) and a local SchemaData snapshot (for introspection without crossing the WASM boundary).

The data property returns the full SchemaData TypedDict. Convenience properties protocol, vertices, and edges provide direct access to the most commonly used fields.

15.9 Migration Builder (_migration.py)

MigrationBuilder follows the same immutable fluent pattern. It accumulates vertex mappings, edge mappings, and resolvers.

Three mutation methods build up the migration specification:

  • map(src_vertex, tgt_vertex): Map a source vertex to a target vertex.
  • map_edge(src_edge, tgt_edge): Map a source edge to a target edge.
  • resolve(src_kind, tgt_kind, resolved_edge): Add a resolver for ancestor-contraction ambiguity.

The compile() method packs the mapping and sends it to the compile_migration WASM entry point. The result is a CompiledMigration with three data-path methods:

  • lift(record): Forward-only transformation, and the hot path. The record is packed to MessagePack once, transformed entirely inside WASM, and decoded once on return, with no intermediate Python objects built in between.
  • get(record): Bidirectional get: extract a projected view and an opaque complement.
  • put(view, complement): Bidirectional put: restore a full record from a (possibly modified) view and the complement from get.

Note

The complement from get() is an opaque bytes object. It captures the data discarded by the forward projection, enabling lossless round-tripping. Treat it as a black box; its internal format is a MessagePack-encoded Rust Complement struct.

15.9.1 Module-level functions

Two additional functions are exported at module level:

  • check_existence(src, tgt, spec, wasm): Validate that a migration specification satisfies all protocol-derived existence conditions.
  • compose_migrations(m1, m2, wasm): Compose two compiled migrations into one. The resulting migration is equivalent to applying m1 first, then m2.

15.10 Lens API (_lens.py)

The _lens.py module provides three handle classes for bidirectional schema transformations: ProtolensChainHandle for schema-independent lens families, LensHandle for concrete lenses, and SymmetricLensHandle for symmetric bidirectional sync.

ProtolensChainHandle wraps a WASM-side protolens chain and can be instantiated against a concrete schema to produce a LensHandle.

LensHandle wraps a concrete lens with get, put, and law-checking operations.

Key methods:

  • ProtolensChainHandle.auto_generate(schema1, schema2, wasm): Auto-generate a protolens chain between two schemas.
  • ProtolensChainHandle.instantiate(schema): Instantiate the chain against a concrete schema.
  • ProtolensChainHandle.compose(other): Compose with another chain.
  • ProtolensChainHandle.fuse(): Fuse all steps into a single protolens.
  • ProtolensChainHandle.lift(morphism_bytes): Lift along a theory morphism.
  • LensHandle.auto_generate(schema1, schema2, wasm): Auto-generate and instantiate a lens.
  • LensHandle.get(record): Forward projection: extract view and complement.
  • LensHandle.put(view, complement): Backward put: restore from view and complement.
  • LensHandle.check_laws(instance): Verify GetPut and PutGet laws.

15.11 Error hierarchy (_errors.py)

The SDK defines a base exception, PanprotoError, and four subclasses:

  • PanprotoError: Base exception for all panproto errors. Stores the message as an attribute.
  • WasmError: Errors from the WASM boundary (load failures, call failures, disposed handles).
  • SchemaValidationError: Schema building errors, with a tuple[str, ...] of individual error strings.
  • MigrationError: Migration compilation or composition errors.
  • ExistenceCheckError: Existence check failures, bundling the full ExistenceReport.

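All leaf exception classes are decorated with @final to prevent subclassing. Each exception class declares __slots__ (although BaseException instances always carry a __dict__, so the memory saving is smaller than for ordinary classes) and provides a custom __repr__ for better debug output.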
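A condensed sketch of such a hierarchy. The constructor signatures, the `report` attribute type, and the exact `__repr__` format are assumptions; only the class names, the `errors` tuple, and the use of `@final` and `__slots__` come from the description above.

```python
from collections.abc import Mapping
from typing import final


class PanprotoError(Exception):
    """Base class for every error the SDK raises."""

    __slots__ = ("message",)

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.message!r})"


@final
class WasmError(PanprotoError):
    __slots__ = ()


@final
class MigrationError(PanprotoError):
    __slots__ = ()


@final
class SchemaValidationError(PanprotoError):
    __slots__ = ("errors",)

    def __init__(self, message: str, errors: tuple[str, ...]) -> None:
        super().__init__(message)
        self.errors = errors

    def __repr__(self) -> str:
        return f"SchemaValidationError({self.message!r}, errors={self.errors!r})"


@final
class ExistenceCheckError(PanprotoError):
    __slots__ = ("report",)

    def __init__(self, message: str, report: Mapping[str, object]) -> None:
        super().__init__(message)
        self.report = report  # the full ExistenceReport in the real SDK
```

Because every class shares one base, `except PanprotoError` catches all SDK failures, while the leaf classes stay available for finer-grained handling.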

15.12 The Panproto class

Panproto is the main entry point. It loads the WASM module and provides the top-level API.

Key methods:

  • load(wasm_path?): Synchronous factory. Reads the WASM binary (bundled by default) and returns a ready instance.
  • protocol(name): Get or auto-register a built-in protocol ("atproto", "sql", "protobuf", "graphql", "json-schema"). Custom protocols must be registered first with define_protocol.
  • define_protocol(spec): Register a custom protocol specification.
  • migration(src, tgt): Start building a migration between two schemas.
  • check_existence(src, tgt, builder): Validate a migration against protocol-derived existence conditions.
  • compose(m1, m2): Compose two compiled migrations into one.
  • diff(old, new): Compute a structural diff between two schemas.

Panproto implements the context-manager protocol. When the with block exits, it releases all cached protocol handles.

15.13 Schema enrichment

The SchemaBuilder supports enrichment through constraints, which encode defaults, coercions, and merge policies:

from panproto import Panproto

with Panproto.load() as pp:
    protocol = pp.protocol("atproto")

    # Add constraints to encode enriched schema properties
    with (
        protocol.schema()
        .vertex("post:body", "object")
        .vertex("post:body.text", "string")
        .edge("post:body", "post:body.text", "prop", {"name": "text"})
        .constraint("post:body.text", "maxLength", "3000")   # refinement type (upper bound)
        .constraint("post:body.text", "minLength", "1")       # refinement type (lower bound)
        .build()
    ) as schema:
        # Constraints are grouped by vertex ID in the local schema snapshot
        assert schema.data["constraints"]["post:body.text"][0]["sort"] == "maxLength"

Constraints serve triple duty: they encode refinement types (value bounds), default behaviors (when a field has a known initial value), and coercion hints (when a field can be safely converted between representations). The protocol’s constraint_sorts list determines which constraint sorts are recognized during compatibility checking.

15.14 Migration analysis

The SDK provides analysis capabilities through the diff/classify pipeline:

# Assumes pp (a Panproto instance) and two BuiltSchema objects, as in 15.17
# Compute structural diff
diff_report = pp.diff(old_schema, new_schema)

# Classify into breaking vs. non-breaking
compat = pp.classify(diff_report)

# Share of changes that are non-breaking (1.0 when nothing changed)
total_changes = len(compat["breaking"]) + len(compat["non_breaking"])
coverage = len(compat["non_breaking"]) / total_changes if total_changes else 1.0

For optic classification, the protolens chain structure reveals whether a migration is an isomorphism (lossless, rename-only), a lens (lossy, drops data), or more complex:

from panproto import ProtolensChainHandle

# Auto-generate protolens
# (schema1 and schema2 are BuiltSchema objects; wasm is the loaded WASM module)
with ProtolensChainHandle.auto_generate(schema1, schema2, wasm) as chain:
    # Fuse to analyze the composed transform
    with chain.fuse() as fused:
        # Serialize to inspect the transform structure
        spec = fused.to_json()
        # spec["complement_constructor"] == "Empty" means isomorphism
        # "DroppedSortData" in complement means lens

15.15 GAT Engine access

The SDK provides direct access to the generalized algebraic theory (GAT) engine for constructing and composing theories:

# Create theories via WASM
theory_spec = {
    "name": "ThGraph",
    "sorts": [
        {"name": "Vertex", "params": []},
        {"name": "Edge", "params": []},
    ],
    "ops": [
        {"name": "src", "inputs": [["e", "Edge"]], "output": "Vertex"},
        {"name": "tgt", "inputs": [["e", "Edge"]], "output": "Vertex"},
    ],
    "eqs": [],
}

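with pp.create_theory(theory_spec) as th_graph:
    # th_constraint and shared are assumed to be theory handles created the same way;
    # shared is the common subtheory along which th_graph and th_constraint are glued.
    # Compose theories via colimit
    with pp.colimit_theories(th_graph, th_constraint, shared) as composed:
        pass  # use composed theory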

15.16 VCS integration

The SDK exposes version control operations for schemas and data:

# Initialize a repository
with pp.vcs_init("atproto") as repo:
    # Stage and commit schema changes
    pp.vcs_add(repo, schema)
    commit_hash = pp.vcs_commit(repo, "add post schema", "author@example.com")

    # Query history
    log = pp.vcs_log(repo, count=10)
    status = pp.vcs_status(repo)

15.17 Usage pattern

A typical SDK session:

from panproto import Panproto

# Initialize (loads WASM)
with Panproto.load() as pp:
    # Get a protocol
    atproto = pp.protocol("atproto")

    # Build schemas
    with (
        atproto.schema()
        .vertex("post", "record", {"nsid": "app.bsky.feed.post"})
        .vertex("post:body", "object")
        .edge("post", "post:body", "record-schema")
        .build()
    ) as old_schema, (
        atproto.schema()
        .vertex("post", "record", {"nsid": "app.bsky.feed.post"})
        .vertex("post:body", "object")
        .vertex("post:body.tags", "array")
        .edge("post", "post:body", "record-schema")
        .edge("post:body", "post:body.tags", "prop", {"name": "tags"})
        .build()
    ) as new_schema:

        # Compile migration
        with (
            pp.migration(old_schema, new_schema)
            .map("post", "post")
            .map("post:body", "post:body")
            .compile()
        ) as migration:
            # Transform records
            result = migration.lift({"text": "hello"})

Tip

The with statement ensures all WASM handles are freed when the block exits. Nested with blocks (or the parenthesized multi-context form shown above) are the Python equivalent of TypeScript’s using keyword. If you need to manage handle lifetimes more flexibly, call .close() manually or use contextlib.ExitStack.

15.18 Conventions

The Python SDK enforces the following project conventions:

  • Python 3.13+: The minimum supported version. The SDK also relies on syntax introduced in earlier releases: PEP 695 type aliases (3.12), structural pattern matching and parenthesized context managers (3.10).
  • PEP 695 type statements: All type aliases use the type keyword syntax rather than TypeAlias.
  • Strict pyright: typeCheckingMode = "strict" with reportMissingTypeStubs = false. All public and private functions have complete type annotations.
  • ruff: Lint and format with target-version = "py313" and line-length = 99. The enabled rule sets include pyflakes, pycodestyle, isort, pep8-naming, pyupgrade, bugbear, builtins shadowing, comprehensions, simplify, type-checking imports, and ruff-specific rules.
  • numpy docstrings: All public classes and functions use the numpy docstring convention with Parameters, Returns, Raises, and Examples sections.
  • Immutable builders: SchemaBuilder and MigrationBuilder return new instances on every mutation. Internal state uses tuples and frozen dicts.
  • __slots__ everywhere: All regular classes define __slots__ for memory efficiency and to catch attribute typos. TypedDicts are the exception, since they describe plain dict values.
  • @final leaf classes: All concrete classes that shouldn’t be subclassed are decorated with @final.
  • Underscore-prefixed private modules: All implementation modules use the _ prefix. The public API surface is defined in __init__.py via __all__.
  • MIT license: Consistent with the rest of the panproto project.