1 The Schema Migration Problem
2 The Schema Migration Problem
Every schema language has the same problem: data outlives the schema it was written against. The moment a schema changes, a gap opens between old data and new expectations, and something must close it. That something is a migration—a transformation derived from the relationship between the two schema versions.
panproto starts from the observation that schema languages, despite their surface differences, all describe the same kind of structural object: named containers holding typed, constrained members connected by edges. Because the structure is shared, the migration machinery can be shared too.
2.1 Four schemas for a post
To make the pattern concrete, here is the same entity—a social media post—described in four different schema languages.
2.1.1 ATProto Lexicon
Lexicon is a JSON-based schema language with protocol-specific concepts: namespaced identifiers (NSIDs), record types, and grapheme-cluster counting.
{
"lexicon": 1,
"id": "app.bsky.feed.post",
"defs": {
"main": {
"type": "record",
"description": "A post in a social feed.",
"key": "tid",
"record": {
"type": "object",
"required": ["text", "createdAt"],
"properties": {
"text": {
"type": "string",
"maxLength": 3000,
"maxGraphemes": 300
},
"createdAt": {
"type": "string",
"format": "datetime"
},
"tags": {
"type": "array",
"items": {
"type": "string",
"maxLength": 640,
"maxGraphemes": 64
},
"maxLength": 8
}
}
}
}
}
}The NSID app.bsky.feed.post identifies this record type across the AT Protocol network. Inside the record block, the shape language looks a lot like JSON Schema: a required array, a properties object, type annotations on each property. The one thing that is distinctly Lexicon is maxGraphemes, which counts Unicode grapheme clusters rather than bytes or code points.
2.1.2 SQL DDL (PostgreSQL)
CREATE TABLE posts (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
text VARCHAR(3000) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
tags TEXT[]
);SQL brings concepts the other formats lack: identity (PRIMARY KEY), server-side defaults (DEFAULT now()), and write-time enforcement (NOT NULL). The TEXT[] type is PostgreSQL’s array syntax; most other databases would model tags as a join table, which is itself a schema design decision with migration consequences.
2.1.3 Protocol Buffers
syntax = "proto3";
package social.feed;
import "google/protobuf/timestamp.proto";
message Post {
string text = 1;
google.protobuf.Timestamp created_at = 2;
repeated string tags = 3;
}Protobuf’s distinguishing feature is field numbers (= 1, = 2, = 3). They decouple the wire encoding from field names, which means you can rename a field without breaking existing consumers. The trade-off is that you can never reuse a field number. There is no built-in constraint language; if you want maxLength validation, you reach for an external tool like Buf and its protovalidate toolchain.
2.1.4 GraphQL SDL
"""A post in a social feed."""
type Post {
text: String!
createdAt: DateTime!
tags: [String!]
}GraphQL describes API surfaces, not storage. The ! suffix means non-nullable (GraphQL’s analog of required). [String!] is a nullable list of non-nullable strings; nullability is a type-level concern here, not a separate constraint annotation. There is no maxLength in the SDL. Validation happens in resolvers or custom directives.
2.3 Building schemas in panproto
Here is the same post schema built with panproto’s TypeScript SDK for two protocols. The point is not the API surface (which you can look up in the reference); the point is the shape of the construction.
2.3.1 ATProto
import { Panproto } from '@panproto/core';
// start snippet init
const panproto = await Panproto.init();
const atproto = panproto.protocol('atproto');
// end snippet init
// start snippet build-schema
const postSchema = atproto.schema()
.vertex('post', 'record', { nsid: 'app.bsky.feed.post' })
.vertex('post:body', 'object')
.vertex('post:body.text', 'string')
.vertex('post:body.createdAt', 'string')
.edge('post', 'post:body', 'record-schema')
.edge('post:body', 'post:body.text', 'prop', { name: 'text' })
.edge('post:body', 'post:body.createdAt', 'prop', { name: 'createdAt' })
.constraint('post:body.text', 'maxLength', '3000')
.constraint('post:body.text', 'maxGraphemes', '300')
.constraint('post:body.createdAt', 'format', 'datetime')
.build();
// end snippet build-schemaSchema types become vertices. The fields connecting them become edges. Constraints are annotations on vertices. The builder API mirrors Lexicon’s own structure: vertices for the record, the inner object, and each property’s type; edges connecting them; constraints attached where they belong.
2.3.2 SQL
import { Panproto } from '@panproto/core';
const panproto = await Panproto.init();
// start snippet build-schema
const sql = panproto.protocol('sql');
const postTable = sql.schema()
.vertex('posts', 'table')
.vertex('posts.id', 'integer')
.vertex('posts.text', 'string')
.vertex('posts.created_at', 'string')
.edge('posts', 'posts.id', 'column', { name: 'id' })
.edge('posts', 'posts.text', 'column', { name: 'text' })
.edge('posts', 'posts.created_at', 'column', { name: 'created_at' })
.constraint('posts.text', 'maxLength', '3000')
.build();
// end snippet build-schemaSame shape. Different vocabulary. The word “table” replaces “record,” “column” replaces “property,” but the construction logic—declare vertices, connect them with edges, attach constraints—is identical. What varies is the vocabulary of vertex kinds, edge kinds, and constraint types.
That vocabulary is exactly what we will formalize next. The next chapter asks: what, precisely, do these graph structures have in common, and how can we describe the variation between schema languages formally enough for an engine to compute with it?