Research paper from ChatGPT 4o-mini-high "deep research"
We are embarking on a challenging project to create a new hypermedia web protocol. This protocol isn’t a simple single-use API – it spans multiple roles and contexts. Each node in the network can play different roles (for example, acting as a server, a client, a paid service provider, an archiver, or an indexer), and depending on its role, it will expose different APIs. Our goal is to define a unified way to describe all these APIs and the data they handle, so that implementers (in TypeScript, Go, and other languages) can all follow the same strict specification.
In essence, we need a robust type system and an API protocol layer that covers all aspects of our hypermedia network. This system should cover both how data is structured and how it’s exchanged (including real-time updates). The challenge is that our network touches many mediums and patterns:
Multiple Roles, Multiple APIs: A node’s behavior and available endpoints differ depending on whether it’s a client, server, indexer, etc. We need a way to specify each role’s API while reusing common data types across them. A unified schema will prevent inconsistencies, since the same data types will appear in multiple APIs.
Shared Data Types for TypeScript and Go: We have a codebase in both TS and Go. We want to define data models and interfaces once and generate or enforce them in both languages. A single source of truth for types will keep front-end and back-end in sync and reduce errors.
Strictly Structured, Signed Data: Our network stores permanent data in IPFS (using DAG-CBOR encoding), and this data is cryptographically signed. This means we need a well-defined schema for these stored objects – every piece of content should conform to a schema for validity. The data format should be self-describing if possible, so that anyone inspecting an object can figure out its type and validate it against the proper schema. Ideally, one should be able to follow a reference from the data to its schema or documentation easily.
Self-Documenting and Readable: We want the format and protocol to be easy for developers to read and write. This implies using human-readable structures (e.g. JSON-like schema or descriptors) and including metadata that makes the data self-documenting. For instance, if an object has a certain type or field, one should be able to discover what that means (through an ID or link to its schema or an explanation).
Real-Time “Push” Communication: This is not just a request/reply web API. The system needs real-time capabilities – servers and nodes must be able to push data updates immediately to others. Whether it’s via web sockets, subscriptions, or another mechanism, our protocol layer must support streaming updates and asynchronous events. This is true both for node-to-node communication (e.g. propagating new content across the network) and for client-server communication (e.g. live updates to a user’s app).
Automated Documentation: Given the complexity, we want to generate documentation automatically from the schemas. If we define our types and endpoints formally, we should be able to produce human-friendly docs (web pages, READMEs, etc.) that explain the hypermedia data formats and APIs. This will greatly help developers understand and use the protocol.
Browser-Friendly and Offline-Friendly: Client-side web support is a must – browsers should be able to use this protocol (meaning we may need HTTP or WebSocket compatibility, not a binary that only native apps can use). Also, considering the decentralized nature (IPFS), nodes might be offline or data might be fetched from peers. We might even consider embedding schema information into IPFS itself so that if you encounter some data and you’re offline, you could still retrieve its schema from the network. In other words, the system should play well with a distributed environment where not everything is served from a central server.
Current Stack Constraints: We currently use IPLD (InterPlanetary Linked Data) with DAG-CBOR for data serialization and storage. DAG-CBOR is a binary JSON-like format that ensures content is content-addressable (hashes of data are consistent) and is a preferred codec in IPFS. This choice is essentially fixed for our permanent data storage – we aren’t planning to change the data encoding away from DAG-CBOR. We also use gRPC today for communication between front-end and back-end components. gRPC gives us a way to define services and uses Protocol Buffers for data encoding. However, it’s not necessarily the best fit for a web client (which might have trouble with gRPC’s binary protocol or need special proxies), and it doesn’t inherently provide the kind of self-documenting hypermedia interface we’re aiming for. We will examine whether to continue with gRPC or switch to something else.
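To make the storage constraint concrete, here is a minimal sketch of how a record gets encoded and content-addressed with DAG-CBOR today, using the standard @ipld/dag-cbor and multiformats packages. The record shape is purely illustrative (defining the real schemas is exactly the point of this project), and the signing step is omitted.

```typescript
import { encode, code as dagCborCode } from '@ipld/dag-cbor'
import { sha256 } from 'multiformats/hashes/sha2'
import { CID } from 'multiformats/cid'

// Illustrative record shape; the real, schema-validated shapes are what we need to define.
const record = {
  type: 'hypermedia/document',
  title: 'Hello',
  createdAt: '2024-01-01T00:00:00Z',
}

async function contentAddress(value: unknown): Promise<CID> {
  const bytes = encode(value)               // deterministic DAG-CBOR encoding
  const digest = await sha256.digest(bytes) // multihash over the encoded bytes
  return CID.create(1, dagCborCode, digest) // CIDv1 with the dag-cbor codec
}

const cid = await contentAddress(record)
console.log(cid.toString()) // the same record always yields the same CID
```

Whatever schema layer we adopt has to sit on top of exactly this kind of object: deterministic bytes, a CID, and a signature.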
Considering all the above, we’re essentially looking at a very meta project – one that defines how all other pieces talk and what data they exchange. It’s a bit hard to explain because it’s a layer of abstraction above the actual features. One way to see it is that we’re designing an Interface Definition Language (IDL) and a protocol specification for our hypermedia network. This will serve as the foundation for bringing documentation, robustness, and sanity to the evolving system.
What should we call this? It could be described as a schema-driven hypermedia protocol framework. We haven’t decided on a catchy name yet, but it might help to think of it as the “hypermedia schema and API layer” for our IPFS+libp2p network. (For now, we’ll focus on choosing the right technology; a good name can come once we know what it’s built on!)
Evaluating Potential Solutions and Ecosystems
Given these requirements, the big question is: Is there an existing ecosystem or standard that fulfills these needs, or do we need to create a custom solution? We will examine several candidate technologies and approaches one by one:
The AT Protocol Lexicon – A schema and API system from the Bluesky/AT Protocol project, which might align closely with our needs.
GraphQL – A popular query language and type system known for strong typing, flexibility, and real-time support via subscriptions.
gRPC and Protocol Buffers – Our current RPC framework, offering strict types and streaming, and how it might be adapted or improved (e.g. via new tools) to meet our goals.
OpenAPI (Swagger) and JSON Schema – The mainstream way to specify RESTful APIs and data models, with a huge ecosystem of tools (and the possibility of pairing it with AsyncAPI for real-time aspects).
IPLD Schemas or a Custom IDL – Rolling our own schema language, possibly building on IPLD’s existing schema system, to tailor exactly to our environment.
Combination or Other Niche Approaches – (For completeness, considering mixing strategies or other lesser-known frameworks).
Let’s break down how each option stacks up against our requirements:
AT Protocol Lexicons (Bluesky’s Schema System)
One promising existing solution is the AT Protocol’s Lexicon system. The AT Protocol (developed by Bluesky) is a decentralized social networking protocol, and it introduced Lexicons as a way to define both the data types (records) and the API methods (procedures/queries/subscriptions) in a single schema language. This approach sounds very much like what we need: it’s designed for an open network where different parties need to agree on data formats and behaviors.
What Lexicon Offers: Lexicons are written in JSON and are similar to JSON Schema or OpenAPI, but with extensions specific to the AT Protocol’s needs (atproto.com). Each lexicon has a unique ID (a namespaced identifier) and can define: record types (the schema for stored objects), procedures (RPC endpoints for actions, usually POST), queries (read-only endpoints, typically GET), and subscriptions (real-time event streams over WebSockets) (atproto.com). In other words, a single lexicon file can describe a piece of the protocol – for example, a lexicon might define a data model for a “Post” record and also the “getPost” API to fetch it, etc.
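To make that concrete, here is a rough sketch of what two lexicons might look like for our network, written as TypeScript constants holding the lexicon JSON. The IDs and fields are invented for illustration; the overall shape (lexicon version, id, defs, record and query types) follows the published Lexicon format.

```typescript
// Hypothetical record lexicon: the schema a stored object must conform to.
const postRecordLexicon = {
  lexicon: 1,
  id: 'com.example.post',
  defs: {
    main: {
      type: 'record',
      key: 'tid',
      record: {
        type: 'object',
        required: ['text', 'createdAt'],
        properties: {
          text: { type: 'string', maxLength: 3000 },
          createdAt: { type: 'string', format: 'datetime' },
        },
      },
    },
  },
}

// Hypothetical query lexicon: a read-only XRPC endpoint that returns such a record.
const getPostLexicon = {
  lexicon: 1,
  id: 'com.example.getPost',
  defs: {
    main: {
      type: 'query',
      parameters: {
        type: 'params',
        required: ['uri'],
        properties: { uri: { type: 'string' } },
      },
      output: {
        encoding: 'application/json',
        schema: {
          type: 'object',
          required: ['uri', 'value'],
          properties: {
            uri: { type: 'string' },
            value: { type: 'unknown' }, // the record itself, validated against its own $type
          },
        },
      },
    },
  },
}
```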
Crucially, Lexicon enforces that data can be self-describing. In AT Protocol, objects often carry a $type field that tells you which schema (lexicon) they conform to (atproto.com). This means if one node hands some data to another, the receiving side can look at $type and know how to interpret and validate that object (by referring to the lexicon definition of that type) (atproto.com). This is exactly the kind of self-documenting approach we envision – any data can point to an explanation of what it is. Lexicon’s design explicitly notes that records should include the $type field, since records might be circulated outside of their original context and “need to be self-describing” (atproto.com).
For APIs, Lexicon defines a lightweight RPC mechanism called XRPC (basically RESTful calls under the hood). Endpoints are identified by names like com.example.getProfile, and they correspond to HTTP paths (e.g. GET /xrpc/com.example.getProfile) (atproto.com). The lexicon schema for an endpoint specifies its parameters, input schema, output schema, and even possible error codes (atproto.com). There’s also support for subscriptions (event streams pushed by the server over WebSockets), where a lexicon can define the message types that stream out (atproto.com). This aligns with our real-time requirement.
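Because XRPC is plain HTTP underneath, calling an endpoint from a browser needs nothing beyond fetch. A hedged sketch, assuming a node at node.example.com serving the hypothetical com.example.getPost query from above:

```typescript
// Query parameters map to the lexicon's "parameters"; the response body is expected
// to match the lexicon's "output" schema. The host and AT URI are placeholders.
const url = new URL('https://node.example.com/xrpc/com.example.getPost')
url.searchParams.set('uri', 'at://did:example:alice/com.example.post/3k2abc')

const res = await fetch(url, { headers: { Accept: 'application/json' } })
if (!res.ok) {
  // XRPC errors come back as JSON, using error names declared in the lexicon
  throw new Error(`XRPC call failed with status ${res.status}`)
}
const { uri, value } = await res.json()
console.log(uri, value)
```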
Why it Aligns with Our Needs: The Lexicon system was built to solve interoperability in a decentralized network – it’s meant to allow different implementations to agree on behavior (atproto.com). That is very much our problem too. It’s also not as over-generalized as something like RDF; Lexicon is meant to be pragmatic and even supports code generation for static types and validation (atproto.com). In fact, the Bluesky team states that lexicons “enable code-generation with types and validation, which makes life much easier” (atproto.com). Indeed, they have built tools to generate TypeScript interfaces and client libraries directly from lexicon files (atproto.com). For example, if you have a lexicon for com.example.getProfile, you can generate a TS client method so that calling it feels like a normal function call with typed return values (atproto.com). This is great for our goal of having type-safe interfaces in TS and Go – we could write schemas once and generate stubs/clients in both languages.
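As a sketch of what that developer experience could look like for us (the package path and client shape below are assumptions modeled on Bluesky’s generated clients, not an existing API):

```typescript
// Hypothetical output of running a lexicon code generator over our own schemas.
import { HypermediaClient } from './generated/client'

const client = new HypermediaClient('https://node.example.com')

// com.example.getPost becomes a typed method; parameter and return types are
// derived from the lexicon, so mistakes surface at compile time.
const { data } = await client.com.example.getPost({
  uri: 'at://did:example:alice/com.example.post/3k2abc',
})
console.log(data.value)
```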
Another huge plus: Lexicon builds on IPLD and uses DAG-CBOR for binary representation. The AT Protocol data models can be represented in JSON or in CBOR (content-addressable) form (atproto.com). Bluesky’s repo sync and events actually use content-addressable records and CAR files (which come from the IPFS world). This means Lexicon is natively compatible with the idea of storing data in IPFS. Our current storage (IPFS DAG-CBOR) would fit right in, since lexicon-defined objects can be encoded as DAG-CBOR and carry their $type for interpretation (atproto.com). They even have a special cid-link type to represent IPFS content addresses (atproto.com). All this suggests that adopting Lexicon could let us keep using IPFS for data and have the schemas to validate those DAG-CBOR objects.
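Concretely, a lexicon-defined record carries its $type, and references to other content-addressed objects use the cid-link type; in the JSON representation a cid-link appears as a {"$link": ...} object, while in DAG-CBOR it becomes a real IPLD link. A sketch with invented values:

```typescript
// Hypothetical record in its JSON representation. In DAG-CBOR form the $link field
// is encoded as an actual CID, and the whole object can be signed and content-addressed
// exactly as in the earlier encoding sketch.
const post = {
  $type: 'com.example.post', // self-describing: names the lexicon it conforms to
  text: 'Hello hypermedia network',
  createdAt: '2024-01-01T00:00:00Z',
  embed: { $link: 'bafyreia...placeholder' }, // cid-link to another IPFS object (placeholder CID)
}
```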
Tooling and Ecosystem: The AT Protocol ecosystem provides some tooling already. For instance, there’s a TypeScript lexicon parser and code generator (lex-cli) that Bluesky uses to produce its client libraries (docs.bsky.app, atproto.blue). They also have a Python SDK that was auto-generated from lexicons: it includes models, an XRPC client, and even a “firehose” (streaming) client – and it’s explicitly built to allow custom lexicons, not just Bluesky’s (atproto.blue). In the Python SDK docs, they encourage using the code generator for your own lexicon schemas and mention that the SDK provides utilities for CIDs, NSIDs, AT URIs, DAG-CBOR, CAR files, etc. (atproto.blue). This indicates a mature approach where a lot of the heavy lifting (parsing schemas, creating data structures, handling content addressing) is already handled. For Go, there might not be an official lexicon codegen yet, but since Bluesky’s reference implementation had components in Go, it’s possible similar tools exist or can be built.
Potential Downsides: Lexicon is quite new and specific to the AT Protocol. It’s essentially a bespoke solution for that ecosystem. Adopting it would mean learning its schema definition style and perhaps extending or modifying it for any unique needs of our project. The community around it is smaller compared to something like GraphQL or OpenAPI. However, the concepts in Lexicon (JSON Schema-like definitions) are familiar enough, and it’s stated that lexicons could be translated to JSON Schema or OpenAPI if needed (atproto.com). The main question is whether it covers everything we need. From what we see: it covers data records, it covers RPC endpoints, it covers real-time streams, it supports content-addressable data, and it was built with codegen and multi-language use in mind. That checks almost all our boxes.
In summary, AT Proto’s Lexicon provides a ready-made “schema and API language” tailored for a decentralized, content-addressed network with multi-language support. It yields self-describing data ($type fields) and has existing tools for code generation and documentation. This could be an excellent starting point or even the foundation of our system, sparing us from reinventing a schema language from scratch.
GraphQL
Next, let’s consider GraphQL, a very popular technology for APIs. GraphQL is essentially a query language and schema definition system for APIs that was open-sourced by Facebook. It lets you define types (with fields and their types) and operations (queries, mutations, subscriptions) in a schema. Clients can request exactly the data they need with flexible queries, and the system can provide powerful tooling thanks to its strict schema and introspection capabilities.
Strong Typing and Single Schema: GraphQL’s type system could give us a unified view of our data. We can define object types that represent our records (e.g., a Post type with fields like id, content, createdAt), and also define the entry points for operations. GraphQL schemas often have “Query” type for read operations and “Mutation” type for write operations. They also support subscriptions for real-time updates (typically implemented over WebSocket). For instance, we could allow a subscription like onNewPost that pushes new posts to subscribers. This matches our need for push-based data flow.
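For comparison with the lexicon sketches above, here is roughly how the same hypothetical record and its operations might be expressed in GraphQL SDL (all type and field names are illustrative). The schema is kept as an SDL string, as is common in TypeScript servers:

```typescript
const typeDefs = /* GraphQL */ `
  type Post {
    id: ID!
    content: String!
    createdAt: String!
  }

  type Query {
    post(id: ID!): Post            # request/reply read
  }

  type Mutation {
    createPost(content: String!): Post!
  }

  type Subscription {
    onNewPost: Post!               # pushed to subscribers, typically over WebSockets
  }
`
```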
One of GraphQL’s biggest strengths is how self-documenting it is. The schema is part of the server, and GraphQL includes an introspection system that allows clients (or tools) to query the schema itself. In practice, this means you can ask the server “what queries do you support, what types do you have, what fields do they have, and what do those fields mean?” (adhithiravi.medium.com). GraphQL APIs are required to provide this introspection, making them effectively self-documenting APIs. Developers can use tools like GraphiQL or Apollo Explorer to browse the API and see descriptions. This addresses our automated documentation goal: with GraphQL, documentation UIs can be generated on the fly from the live schema, and tools can even do autocompletion and code generation based on the introspected schema (adhithiravi.medium.com). In short, GraphQL’s introspection gives us much of the self-documentation we want out of the box.
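As a quick illustration, any client can ask a GraphQL endpoint to describe itself using the spec-defined __schema meta-field; a minimal sketch, assuming a node exposes GraphQL at /graphql:

```typescript
// Standard introspection query; works against any compliant GraphQL server.
// The endpoint URL is an assumption for illustration.
const res = await fetch('https://node.example.com/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: '{ __schema { queryType { name } types { name kind description } } }',
  }),
})
const { data } = await res.json()
for (const t of data.__schema.types) {
  console.log(t.kind, t.name) // every type the server exposes, including built-ins
}
```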