graph LR
A["Source document<br/>(format A)"]
B["Schema graph<br/>(universal IR)"]
C["Schema graph<br/>(optionally transformed)"]
D["Target document<br/>(format B)"]
A -->|"parse_*()"| B
B -->|"migrate / transform"| C
C -->|"emit_*()"| D
11 Cross-Protocol Translation
Every protocol in panproto has two functions: \(\mathrm{parse}_P\) reads a native schema document into the universal Schema graph \(G\), and \(\mathrm{emit}_P\) writes \(G\) back out to native format. The translation pipeline is \(\mathrm{emit}_B \circ\; m \circ\; \mathrm{parse}_A\), where \(m\) is an optional migration. This is pandoc for schemas: a shared intermediate representation connecting every pair of formats.
11.1 The parse / schema / migrate / emit pipeline
Every translation has four stages:
Parse. The function \(\mathrm{parse}_A\) reads native schema notation and constructs a Schema graph \(G\). Constructs with no graph-level equivalent are dropped or converted to best-effort annotations.
Schema graph. The universal intermediate representation is the same Schema graph used for diffs, migrations, lens derivation, and breaking-change detection. It isn’t a separate format; it’s the standard representation.
Migrate / transform. This stage is optional. Apply a panproto migration to \(G\) between parse and emit to rename a field, flatten a nested type, or restructure a hierarchy. Without this stage, the pipeline is direct format translation; with it, you translate and transform simultaneously.
Emit. The function \(\mathrm{emit}_B\) walks the Schema graph and produces native schema notation, assigning field numbers, choosing appropriate type names, and generating the correct syntax.
Every one of panproto’s 76 built-in protocols has both a parse and an emit function. Every pair of protocols is connected by a potential translation path.
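The four stages compose as ordinary function composition. The following is a minimal sketch under stated assumptions, not panproto's API: the SchemaGraph shape, the translate helper, and the two toy formats are hypothetical and exist only to show how \(\mathrm{emit}_B \circ m \circ \mathrm{parse}_A\) fits together, with the migration defaulting to the identity.

```typescript
// Hypothetical, simplified stand-in for the Schema graph (not panproto's real type).
interface Vertex { id: string; kind: string }
interface Edge { from: string; to: string; label: string }
interface SchemaGraph { vertices: Vertex[]; edges: Edge[] }

type Parse = (source: string) => SchemaGraph;
type Migrate = (graph: SchemaGraph) => SchemaGraph;
type Emit = (graph: SchemaGraph) => string;

// emit_B . m . parse_A; when no migration is given, m is the identity.
function translate(parse: Parse, emit: Emit, migrate: Migrate = (g) => g): (source: string) => string {
  return (source) => emit(migrate(parse(source)));
}

// Toy format A: one "Owner.field" pair per line.
const parseToyA: Parse = (source) => {
  const vertices: Vertex[] = [];
  const edges: Edge[] = [];
  source.split("\n").filter((line) => line.trim() !== "").forEach((rawLine) => {
    const line = rawLine.trim();
    const dot = line.indexOf(".");
    const owner = line.slice(0, dot);
    const field = line.slice(dot + 1);
    if (!vertices.some((v) => v.id === owner)) {
      vertices.push({ id: owner, kind: "object" });
    }
    vertices.push({ id: line, kind: "field" });
    edges.push({ from: owner, to: line, label: field });
  });
  return { vertices, edges };
};

// Toy format B: "Owner { field field }" blocks.
const emitToyB: Emit = (graph) =>
  graph.vertices
    .filter((v) => v.kind === "object")
    .map((v) => {
      const fields = graph.edges.filter((e) => e.from === v.id).map((e) => e.label);
      return v.id + " { " + fields.join(" ") + " }";
    })
    .join("\n");

// Toy migration: rename one field label, leaving everything else untouched.
const renameField = (oldName: string, newName: string): Migrate => (graph) => ({
  vertices: graph.vertices,
  edges: graph.edges.map((e) => (e.label === oldName ? { from: e.from, to: e.to, label: newName } : e)),
});

const toySource = "User.user_id\nUser.name";
const direct = translate(parseToyA, emitToyB)(toySource);
const migrated = translate(parseToyA, emitToyB, renameField("name", "display_name"))(toySource);
```

Here direct is a plain format translation, while migrated translates and renames a field in one pass. The parser and emitter never know about each other; they only agree on the graph.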
11.2 A real example: Protobuf to GraphQL
You have gRPC services defined in .proto files and want to expose them through a GraphQL API. Tools like grpc-gateway solve the analogous problem for REST/JSON by generating a proxy, and GraphQL bridges often rely on hand-written glue code. With panproto, the schema translation is structural.
Here is the Protobuf service definition:
syntax = "proto3";
package social.v1;
message UserProfile {
string user_id = 1;
string display_name = 2;
repeated string interests = 3;
ProfileStatus status = 4;
}
enum ProfileStatus {
PROFILE_STATUS_UNSPECIFIED = 0;
ACTIVE = 1;
SUSPENDED = 2;
}
message GetUserRequest {
string user_id = 1;
}
message GetUserResponse {
UserProfile profile = 1;
}
service UserService {
rpc GetUser(GetUserRequest) returns (GetUserResponse);
rpc ListUsers(ListUsersRequest) returns (ListUsersResponse);
}
panproto emits this as GraphQL SDL:
type UserProfile {
userId: String!
displayName: String!
interests: [String!]!
status: ProfileStatus!
}
enum ProfileStatus {
ACTIVE
SUSPENDED
}
type Query {
getUser(userId: String!): UserProfile
listUsers(limit: Int, cursor: String): UserProfileConnection
}
type UserProfileConnection {
edges: [UserProfileEdge!]!
pageInfo: PageInfo!
}
type UserProfileEdge {
node: UserProfile!
cursor: String!
}
The translation pipeline looks like this:
import { parse_proto, emit_graphql, PROTOBUF_SPEC, GRAPHQL_SPEC } from "@panproto/core";
const schema = parse_proto(protoSource, PROTOBUF_SPEC);
const graphqlSdl = emit_graphql(schema, GRAPHQL_SPEC);
What happens during translation:
- Message types become GraphQL type declarations. Field names shift from snake_case to camelCase.
- Enums carry over directly, minus the UNSPECIFIED sentinel (no GraphQL equivalent).
- Field numbers are dropped. GraphQL has no wire-format metadata.
- repeated fields become GraphQL list types ([String!]!).
- RPC methods become Query fields. The request message is flattened into arguments; the response becomes the return type.
- Service definitions drive generation of connection/edge types for list endpoints (Relay pagination pattern).
The structural content (type names, field names, field types, enum variants, nesting relationships) survives. What is lost is Protobuf-specific wire metadata. What is gained is GraphQL-specific idiom.
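Three of the rules above (case conversion, sentinel removal, and list wrapping) are small enough to sketch directly. The helper names, the scalar table, and the rule used to detect the sentinel (a zero-valued variant whose name ends in UNSPECIFIED) are assumptions made for illustration; panproto's emitter may decide these differently.

```typescript
// Hypothetical helpers that mirror the translation rules; not panproto's actual code.
interface ProtoField { name: string; type: string; repeated: boolean }
interface ProtoEnumValue { name: string; number: number }

// snake_case -> camelCase, e.g. "display_name" -> "displayName".
function toCamelCase(snake: string): string {
  return snake.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Assumption: the sentinel is the zero-valued variant whose name ends in "UNSPECIFIED".
function dropSentinel(values: ProtoEnumValue[]): string[] {
  return values
    .filter((v) => !(v.number === 0 && /UNSPECIFIED$/.test(v.name)))
    .map((v) => v.name);
}

// Assumption: a small scalar table; any other type name passes through unchanged.
const SCALARS: { [proto: string]: string } = { string: "String", int32: "Int", bool: "Boolean" };

// Every field is emitted as non-null; repeated fields become non-null lists of non-null items.
function emitField(field: ProtoField): string {
  const known = Object.prototype.hasOwnProperty.call(SCALARS, field.type);
  const base = known ? SCALARS[field.type] : field.type;
  const gqlType = field.repeated ? "[" + base + "!]!" : base + "!";
  return toCamelCase(field.name) + ": " + gqlType;
}

const userProfileFields: string[] = [
  { name: "user_id", type: "string", repeated: false },
  { name: "display_name", type: "string", repeated: false },
  { name: "interests", type: "string", repeated: true },
  { name: "status", type: "ProfileStatus", repeated: false },
].map(emitField);

const profileStatusValues: string[] = dropSentinel([
  { name: "PROFILE_STATUS_UNSPECIFIED", number: 0 },
  { name: "ACTIVE", number: 1 },
  { name: "SUSPENDED", number: 2 },
]);
```

Applied to the UserProfile message and ProfileStatus enum from the example, these helpers reproduce the four field lines and two enum values of the emitted SDL. Field numbers never appear in the output because emitField has no slot for them.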
If you translate Protobuf to GraphQL and then GraphQL to OpenAPI, do you get the same result as translating Protobuf directly to OpenAPI? Under what conditions does \(\mathrm{emit}_C \circ \mathrm{parse}_B \circ \mathrm{emit}_B \circ \mathrm{parse}_A = \mathrm{emit}_C \circ \mathrm{parse}_A\)?
Not in general. The intermediate round-trip through format B can lose information. If B’s theory is a strict subset of A’s, the first translation drops what B cannot represent, and the second translation cannot recover it. The equation holds only when B can represent everything that the Schema graph carries from A. Within a theory group (where protocols share theories), the equation holds.
11.3 Another example: ATProto Lexicon to ActivityPub
You need to bridge Bluesky and Mastodon by translating between ATProto’s Lexicon schema format and ActivityPub’s JSON-LD vocabulary.
ATProto Lexicon (Bluesky post schema):
{
"lexicon": 1,
"id": "app.bsky.feed.post",
"defs": {
"main": {
"type": "record",
"key": "tid",
"record": {
"type": "object",
"required": ["text", "createdAt"],
"properties": {
"text": {
"type": "string",
"maxLength": 3000,
"maxGraphemes": 300
},
"createdAt": { "type": "string", "format": "datetime" },
"reply": { "type": "ref", "ref": "#replyRef" },
"embed": { "type": "union", "refs": [
"app.bsky.embed.images",
"app.bsky.embed.video",
"app.bsky.embed.external",
"app.bsky.embed.record",
"app.bsky.embed.recordWithMedia"
]},
"langs": {
"type": "array",
"items": { "type": "string" },
"maxLength": 3
}
}
}
}
}
}
Translated to an ActivityPub Note (JSON-LD):
{
"@context": "https://www.w3.org/ns/activitystreams",
"type": "Note",
"attributedTo": null,
"content": { "@type": "xsd:string", "maxLength": 300 },
"published": { "@type": "xsd:dateTime" },
"inReplyTo": { "@type": "@id" },
"attachment": {
"@type": "@set",
"items": { "oneOf": ["Image", "Link"] }
},
"contentMap": {
"@type": "@language",
"items": { "@type": "xsd:string" }
}
}
Two function calls handle it:
import { parse_lexicon, emit_activitypub, ATPROTO_SPEC, ACTIVITYPUB_SPEC } from "@panproto/core";
const schema = parse_lexicon(lexiconSource, ATPROTO_SPEC);
const apSchema = emit_activitypub(schema, ACTIVITYPUB_SPEC);
The translation maps:
- text (with maxLength: 3000 bytes and maxGraphemes: 300 grapheme clusters) to content (with maxLength: 300): constraint approximation, since ATProto tracks both byte and grapheme length while ActivityPub has a single length concept
- createdAt to published: direct datetime mapping
- reply.ref to inReplyTo: reference semantics preserved
- embed union (images, video, external links, record embeds) to attachment set with oneOf: union-to-set mapping
- langs array to contentMap language map: structural reinterpretation
The Bluesky-specific dual constraints are approximated to a single maxLength. The ATProto tid key has no ActivityPub equivalent and is dropped. The core content structure (text, timestamps, replies, embeds, language tags) translates faithfully.
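The constraint approximation can be made explicit with a small sketch. It assumes the emitter prefers the grapheme limit when both limits are present, which is consistent with the maxLength: 300 in the output above, and it records what was given up. The type and function names are hypothetical.

```typescript
// Hypothetical types; the field names follow the Lexicon example above.
interface LexiconStringConstraints { maxLength?: number; maxGraphemes?: number }
interface ApproximatedLength { maxLength?: number; lost: string[] }

// Collapse the byte and grapheme limits into the single length the target supports.
// Assumption: the grapheme limit wins when both are present, and the byte limit is reported as lost.
function approximateLength(c: LexiconStringConstraints): ApproximatedLength {
  if (c.maxGraphemes !== undefined) {
    const lost: string[] = c.maxLength !== undefined ? ["maxLength (bytes): " + c.maxLength] : [];
    return { maxLength: c.maxGraphemes, lost };
  }
  if (c.maxLength !== undefined) {
    // Only a byte limit exists; reusing it as the single length is itself an approximation.
    return { maxLength: c.maxLength, lost: [] };
  }
  return { lost: [] };
}

const textLimit = approximateLength({ maxLength: 3000, maxGraphemes: 300 });
const unconstrained = approximateLength({});
```

For the text property, textLimit carries a maxLength of 300 and one entry in lost describing the dropped byte limit. Returning the lost list alongside the result keeps the approximation visible to the caller instead of silent.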
11.4 Round-trip fidelity
Translation is not lossless in general. What survives a round trip (\(\mathrm{parse}_A \to \mathrm{emit}_B \to \mathrm{parse}_B \to \mathrm{emit}_A\)) depends on how much structural territory the two formats share.
Two sources of information loss exist.
Parse loss. When \(\mathrm{parse}_A\) reads a source document, constructs with no graph-level equivalent are dropped. For example, SQL CHECK constraints containing arbitrary expressions may be captured only as opaque annotations.
Emit loss. When \(\mathrm{emit}_B\) writes a target document, constructs in the Schema graph with no target-format equivalent are dropped or approximated. For example, hyperedges are dropped when emitting to a format with no hyperedge concept.
| Translation | What survives | What is typically lost |
|---|---|---|
| JSON Schema to TypeScript | Field names, types, optionality | Constraint keywords (maxLength, pattern) |
| SQL DDL to Parquet | Column names, types, nullability | Foreign keys, CHECK constraints, defaults |
| GraphQL SDL to OpenAPI | Type names, field names, types, interfaces | Directives, resolver-level semantics |
| Protobuf to FlatBuffers | Message names, field names, types | Field numbers, service definitions |
| Protobuf to GraphQL | Type names, field names, types, enums | Field numbers, wire encoding, streaming RPCs |
| ATProto Lexicon to ActivityPub | Content structure, timestamps, references | Grapheme constraints, record keys, NSIDs |
The middle column corresponds to the structural skeleton: the graph of types and relationships. The right column corresponds to format-specific decorations outside the Schema graph’s scope.
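Emit loss for hyperedges can be modelled in a few lines. In this hypothetical toy model, an edge with more than two endpoints stands in for a hyperedge (such as a composite foreign key), and the emit function returns both what the target format can carry and a list of what it dropped. None of these names come from panproto.

```typescript
// Hypothetical toy model: an edge with more than two endpoints is a hyperedge.
interface HyperEdge { label: string; endpoints: string[] }
interface ToyGraph { vertices: string[]; edges: HyperEdge[] }
interface EmitReport { carried: ToyGraph; dropped: string[] }

// Models emit into a format with no hyperedge concept: only binary edges are carried over.
function emitBinaryOnly(graph: ToyGraph): EmitReport {
  const kept = graph.edges.filter((e) => e.endpoints.length <= 2);
  const dropped = graph.edges.filter((e) => e.endpoints.length > 2).map((e) => e.label);
  return { carried: { vertices: graph.vertices.slice(), edges: kept }, dropped };
}

// A SQL-like schema: one ordinary column edge and one composite foreign key.
const sqlLike: ToyGraph = {
  vertices: ["orders", "orders.region", "orders.customer_id", "customers"],
  edges: [
    { label: "column:region", endpoints: ["orders", "orders.region"] },
    { label: "fk:orders_customer", endpoints: ["orders.region", "orders.customer_id", "customers"] },
  ],
};

const firstPass = emitBinaryOnly(sqlLike);
// A second pass starts from what was carried, so the foreign key cannot reappear.
const secondPass = emitBinaryOnly(firstPass.carried);
```

The first pass reports the foreign key as dropped; the second pass has nothing left to drop and nothing to restore. That asymmetry is why a round trip through a less expressive format recovers only the structural skeleton.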
11.5 The theory group advantage
The theory architecture explains why some translations are lossless and others are not.
Every protocol has a pair of theories: a schema theory and an instance theory (Chapter 9). Protocols with closely related theories produce higher-fidelity translations. This gives “structural comparability” a precise meaning.
panproto’s built-in protocols cluster into theory groups based on their schema and instance theories:
| Group | Schema theory family | Instance theory | Representative protocols |
|---|---|---|---|
| A | \(\text{ThCategory}\) | \(\text{ThFunctor}\) | CQL |
| B | \(\text{ThHypergraph} + \text{ThConstraint}\) | \(\text{ThFunctor}\) | SQL DDL, Cassandra, DynamoDB |
| C | \(\text{ThGraph} + \text{ThConstraint} + \text{ThMulti}\) | \(\text{ThWType}\) | JSON Schema, ATProto, Avro, Protobuf, Thrift, FlatBuffers |
| D | \(\text{ThGraph} + \text{ThConstraint} + \text{ThMulti} + \text{ThInterface}\) | \(\text{ThWType}\) | GraphQL SDL, OpenAPI, AsyncAPI |
| E | \(\text{ThGraph} + \text{ThConstraint} + \text{ThMulti}\) | \(\text{ThFunctor}\) | Parquet, Arrow, DataFrame |
Within-group translation is structurally lossless. Two Group C protocols (Avro and Protobuf) share the same schema theory. A Schema graph that is a valid Avro schema is, structurally, also a valid Protobuf schema. The translation reduces to remapping surface syntax: Avro’s "type": "record" becomes Protobuf’s message. The graph structure itself doesn’t change.
Cross-group translation involves structural mismatch. Converting from Group B (SQL, hypergraph) to Group C (Protobuf, multigraph) loses hyperedge structure: foreign keys, composite unique constraints, and multi-column primary keys can’t be expressed in Group C’s multigraph theory. Converting in the other direction loses tree-shaped instance semantics: SQL rows are flat, and nested document structure doesn’t survive without denormalization decisions the engine can’t make.
graph LR
A["Group A<br/>CQL"]
B["Group B<br/>SQL, Cassandra, DynamoDB"]
C["Group C<br/>JSON Schema, Avro,<br/>Protobuf, FlatBuffers"]
D["Group D<br/>GraphQL, OpenAPI,<br/>AsyncAPI"]
E["Group E<br/>Parquet, Arrow"]
C <-->|lossless| C
D <-->|lossless| D
B <-->|lossless| B
C <-.->|"structural loss<br/>(hyperedges dropped)"| B
C <-.->|"structural loss<br/>(no interfaces)"| D
D <-.->|"structural loss<br/>(hyperedges dropped)"| B
B <-.->|"structural loss<br/>(instance theory mismatch)"| E
C <-.->|"structural loss<br/>(instance theory mismatch)"| E
Group membership is determined by which building-block theories a protocol composes (Chapter 9). When two protocols share the same building blocks, their Schema graphs have the same shape, and translation is relabeling. When the building blocks differ, translation must bridge a structural gap, and information is lost there.
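The building-block rule can be encoded as a coarse check over the theory group table. This sketch copies the schema and instance theories for Groups B through E from the table and compares them as plain sets; it assumes that "same building blocks" means set equality plus a matching instance theory, which is a simplification of the colimit construction. The names TheoryGroup and compareGroups are hypothetical.

```typescript
// Theory names are taken from the theory group table; the types and functions are illustrative.
interface TheoryGroup { schema: string[]; instance: string }

const GROUP_B: TheoryGroup = { schema: ["ThHypergraph", "ThConstraint"], instance: "ThFunctor" };
const GROUP_C: TheoryGroup = { schema: ["ThGraph", "ThConstraint", "ThMulti"], instance: "ThWType" };
const GROUP_D: TheoryGroup = { schema: ["ThGraph", "ThConstraint", "ThMulti", "ThInterface"], instance: "ThWType" };
const GROUP_E: TheoryGroup = { schema: ["ThGraph", "ThConstraint", "ThMulti"], instance: "ThFunctor" };

interface Comparison {
  sameTheories: boolean;      // true when translation is expected to be pure relabeling
  onlyInFirst: string[];      // schema building blocks the second group lacks
  onlyInSecond: string[];     // schema building blocks the first group lacks
  instanceMismatch: boolean;  // true when the instance theories differ
}

// Elements of a that do not occur in b.
function difference(a: string[], b: string[]): string[] {
  return a.filter((x) => b.indexOf(x) === -1);
}

function compareGroups(first: TheoryGroup, second: TheoryGroup): Comparison {
  const onlyInFirst = difference(first.schema, second.schema);
  const onlyInSecond = difference(second.schema, first.schema);
  const instanceMismatch = first.instance !== second.instance;
  return {
    sameTheories: onlyInFirst.length === 0 && onlyInSecond.length === 0 && !instanceMismatch,
    onlyInFirst,
    onlyInSecond,
    instanceMismatch,
  };
}

const withinC = compareGroups(GROUP_C, GROUP_C);
const cToD = compareGroups(GROUP_C, GROUP_D);
const bToC = compareGroups(GROUP_B, GROUP_C);
const cToE = compareGroups(GROUP_C, GROUP_E);
```

The results line up with the diagram: withinC reports identical theories, cToD isolates ThInterface, bToC isolates ThHypergraph against ThGraph and ThMulti, and cToE shows identical schema blocks with an instance theory mismatch.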
Group C and Group D differ only by \(\text{ThInterface}\). Does a Group D protocol always embed losslessly into a “Group D without interfaces” representation? Or does the presence of \(\text{ThInterface}\) in the colimit change the structure of sorts that existed before?
Not always. The presence of \(\text{ThInterface}\) in the colimit can introduce new equations and sort identifications. If a Group D schema uses interface types (e.g., a GraphQL Node interface implemented by User and Post), removing \(\text{ThInterface}\) loses the subtyping relationships. The User and Post types would appear as unrelated vertices, erasing the information that they share a common interface.
11.5.1 A note on “lossless”
“Structurally lossless at the schema level” means the graph structure of the schema survives the round trip. It does not mean the round trip produces identical source bytes, or that all protocol-specific metadata is preserved. Protobuf field numbers aren’t part of the multigraph structure; they’re a surface feature of the wire format. A Protobuf-to-Avro-to-Protobuf round trip won’t preserve field numbers; the emit step assigns fresh ones. The schema graph (message names, field names, types, relationships) will be intact.
If your translation requirements include preserving wire-format metadata, carry it as an opaque annotation through the Schema graph. The Schema graph supports arbitrary key-value metadata on vertices and edges; parse functions use this to preserve information meaningful to the source format. A custom emit wrapper can then read and apply these annotations selectively.
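A custom emit wrapper of this kind might look like the following sketch. The annotation key protobuf.field_number, the FieldVertex shape, and the function name are all hypothetical, and annotation values are assumed to be decimal strings. The wrapper reuses preserved field numbers where an annotation exists and assigns fresh, non-colliding numbers elsewhere.

```typescript
// Hypothetical vertex shape with key-value annotations; not panproto's real type.
interface FieldVertex { name: string; type: string; annotations: { [key: string]: string } }

// Hypothetical annotation key that a parse function could use to preserve field numbers.
const FIELD_NUMBER_KEY = "protobuf.field_number";

function emitProtoFields(fields: FieldVertex[]): string[] {
  // First pass: reserve every number preserved from the source.
  const used: { [n: number]: boolean } = {};
  fields.forEach((f) => {
    const raw = f.annotations[FIELD_NUMBER_KEY];
    if (raw !== undefined) used[parseInt(raw, 10)] = true;
  });
  // Second pass: reuse preserved numbers, otherwise take the lowest free one.
  let next = 1;
  return fields.map((f) => {
    const raw = f.annotations[FIELD_NUMBER_KEY];
    let n: number;
    if (raw !== undefined) {
      n = parseInt(raw, 10);
    } else {
      while (used[next]) next++;
      n = next;
      used[n] = true;
    }
    return f.type + " " + f.name + " = " + n + ";";
  });
}

const preserved: { [key: string]: string } = {};
preserved[FIELD_NUMBER_KEY] = "5";
const first: { [key: string]: string } = {};
first[FIELD_NUMBER_KEY] = "1";

const emittedFields = emitProtoFields([
  { name: "user_id", type: "string", annotations: first },
  { name: "display_name", type: "string", annotations: preserved },
  { name: "interests", type: "repeated string", annotations: {} },
]);
```

In this run user_id and display_name keep their annotated numbers (1 and 5), while interests, which has no annotation, receives 2 as the lowest free number. Reserving annotated numbers before assigning fresh ones avoids giving a new field a number that a preserved field still needs.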
11.6 Comparison with pandoc
The pandoc analogy is worth examining in detail, because the structural parallel is exact in some places and breaks down in others.
What’s the same. Both pandoc and panproto translate between multiple formats using a shared intermediate representation. pandoc’s IR is a document abstract syntax tree. panproto’s IR is the Schema graph \(G\). In both systems, all translation goes through the IR; neither system converts directly between pairs of formats.
Both face the same fundamental limitation: information that can’t be represented in the IR is lost. pandoc can’t round-trip Word documents with custom styles and tracked changes. panproto can’t round-trip SQL schemas with arbitrary CHECK expressions.
What’s different. pandoc’s IR is a fixed, hand-designed document AST. Adding a new element type requires changing pandoc’s source code. panproto’s IR isn’t fixed; its structure is determined by the protocol theories. Adding a new structural concept means extending a theory via composition (Chapter 9), not patching the IR definition.
The second difference is in the use of the IR. In pandoc, the IR exists solely for conversion. In panproto, the Schema graph is the primary representation for all operations: diff, migration, existence checking, lens derivation, breaking-change detection, and translation all operate on the same object.
| | pandoc | panproto |
|---|---|---|
| Translates between | Document formats | Schema formats |
| Intermediate representation | Pandoc document AST | Schema graph \(G\) |
| IR defined by | Fixed Haskell data types | Protocol theories (extensible) |
| IR used for | Conversion only | Conversion, diff, migration, lenses, … |
| Loss mechanism | AST has no element for \(X\) | Schema graph has no vertex/edge for \(X\) |
| Format-specific extensions | Custom AST metadata | Theory-specific constraint annotations |
pandoc’s fixed IR means every format author targets the same stable structure. panproto’s extensible IR means theories can grow. If protocol A extends the IR with a new sort via composition, does protocol B (written before that extension) need to be updated? What guarantees backward compatibility of the shared IR?
No. Protocol B targets the building-block theories it was composed from. A new sort added by protocol A via composition lives in A’s theory, not in the shared building blocks. The shared IR (the building-block theories) is stable; extensions happen in individual protocol theories. Backward compatibility of the shared IR is guaranteed by the colimit construction: adding new sorts to a component theory does not alter the existing sorts or operations in the shared base.
11.7 Further reading
The functorial data migration framework underlying cross-protocol translation is due to Spivak (2012). The theory group structure is a consequence of the compositional protocol definitions described in Lynch et al. (2024). The connection between theory morphisms and information-preserving translations is developed in Cartmell (1986).
Cross-protocol translation works because panproto gives every schema element a structural identity. That identity is protocol-relative: two elements in different protocols may refer to “the same thing” under different names. The next chapter formalizes how naming correspondences are established, tracked, and composed.