atproto by example part 1: records and views

Samuel · 5 Mar 2025

Note: this blog post is aimed at a technical audience.

AT Protocol (hereafter referred to as “atproto”) is a groundbreaking new technology by Bluesky (the company) used to build Bluesky (the social app). At this point, a lot has been said about why this is being built and how it works at a high level, but I think it’s long overdue for someone to get their hands dirty and figure out how to build something from the ground up with it.

This post is not going to cover how it works as a whole from the top down. Instead, this series will attempt to build a bottom-up understanding, by explaining and implementing the nitty-gritty. atproto has many parts, many of which may seem strange in isolation, but they all fit together to form a cohesive whole. If you want to get a better understanding of how it works at a high level, I recommend Dan Abramov’s Web Without Walls talk - you’ll probably be pretty lost if you haven’t seen it, so consider it required watching.

https://www.youtube.com/watch?v=F1sJW6nTP6E

I am also not going to start from the absolute beginning. This should be considered a sequel to Quick start guide to building applications on AT Protocol, which is the official guide to building an atproto demo app. We are going to take the Statusphere example app and extend it, using more advanced atproto concepts.

Understanding lexicon

In the Statusphere example, we make a lexicon schema to describe records of type xyz.statusphere.status. It looks something like this:

{
  "lexicon": 1,
  "id": "xyz.statusphere.status",
  "defs": {
    "main": {
      "type": "record",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["status", "createdAt"],
        "properties": {
          "status": {
            "type": "string",
            "minLength": 1,
            "maxGraphemes": 1,
            "maxLength": 32
          },
          "createdAt": {
            "type": "string",
            "format": "datetime"
          }
        }
      }
    }
  }
}

In summary, it describes a record, which is of type object. This object has two properties, a “status” which is a single grapheme long, and a “createdAt” date, both of which are required.
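One detail worth spelling out: in Lexicon, minLength and maxLength count UTF-8 bytes, while maxGraphemes counts user-perceived characters. That is why the schema can allow up to 32 bytes but only one grapheme, since a single emoji can be several bytes long. Here is a minimal sketch of those three checks. The name checkStatusString is made up for illustration, and this is not how the official validator is implemented.

```typescript
// Hypothetical helper: checks the three string constraints from the schema.
// Not the official validator, just an illustration of what each limit measures.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' })
const encoder = new TextEncoder()

function checkStatusString(value: string): boolean {
  // minLength / maxLength are measured in UTF-8 bytes
  const bytes = encoder.encode(value).length
  // maxGraphemes is measured in grapheme clusters ("visible characters")
  const graphemes = Array.from(segmenter.segment(value)).length
  return bytes >= 1 && bytes <= 32 && graphemes <= 1
}

console.log(checkStatusString('🦋')) // true: 4 bytes, 1 grapheme
console.log(checkStatusString('ab')) // false: 2 graphemes
console.log(checkStatusString('')) // false: 0 bytes
```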

This schema describes the shape of records one might encounter on the network. An example might be:

{
  "$type": "xyz.statusphere.status",
  "status": "🦋",
  "createdAt": "2025-03-03T18:43:46.740Z"
}

Here’s a question. Is this a valid status record?

{
  "$type": "xyz.statusphere.status",
  "status": "🦋",
  "createdAt": "2025-03-03T18:43:46.740Z",
  "comment": "a butterfly :)",
  "labels": {
    "$type": "com.atproto.label.defs#selfLabels",
    "values": [
      { "val": "graphic-media" }
    ]
  }
}

The answer is yes! All records are open unions. This means that a record might have more content attached to it than described in the lexicon you have. It’s this way for a few reasons:

The lexicon might evolve in the future to have more fields. Open unions allow forward compatibility - an app will only use the fields it knows about, but will still consider future records that might have more fields to be valid.
Extensibility. Other developers might want to attach extra data to records that use your lexicon (although make sure you namespace your fields!).
You don’t own the data. All the data is out there, in the atmosphere, being published and owned by the end users. Lexicons merely describe the minimum viable shape of the data that your application will accept and consider valid.

In the above example, it could be that a future version of the Statusphere app has added comments and self-labels. Your Statusphere app, however, only needs the status and createdAt fields to be present (and match the format) for the object to be considered valid, even though there’s extra stuff in there - just ignore it and use the status.
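To make "just ignore it" concrete, here is a rough sketch of reading a record while only looking at the fields we know about. The readStatus function and KnownStatus type are hypothetical names, and the checks are deliberately simplified compared to real lexicon validation.

```typescript
// Hypothetical sketch: read only the fields our app knows about and
// ignore anything extra. The checks are deliberately simplified.
type KnownStatus = { status: string; createdAt: string }

function readStatus(record: unknown): KnownStatus | null {
  if (typeof record !== 'object' || record === null) return null
  const { status, createdAt } = record as Record<string, unknown>
  if (typeof status !== 'string' || status.length === 0) return null
  if (typeof createdAt !== 'string' || Number.isNaN(Date.parse(createdAt))) {
    return null
  }
  // Extra fields such as "comment" or "labels" are simply not copied over
  return { status, createdAt }
}

const extended = {
  $type: 'xyz.statusphere.status',
  status: '🦋',
  createdAt: '2025-03-03T18:43:46.740Z',
  comment: 'a butterfly :)',
}
console.log(readStatus(extended)) // only status and createdAt survive
console.log(readStatus({ status: '🦋' })) // null: createdAt is missing
```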

Another case is when a field might have multiple variants, such as the embeds on Bluesky posts. You can see in the schema here, we define what kind of records you can use in the embed: images, video, etc:

https://github.com/bluesky-social/atproto/blob/main/lexicons/app/bsky/feed/defs.json#L16-L25

However, this is an open union, so there could be another kind of embed in there instead - and the post would still be valid. An example of this is when we added video embeds. We added the new record type to the union so that the new versions of the app would know about the new possible variant, but old apps already out there in the wild still wouldn’t know what it was. However, since it’s an open union, they can just ignore the strange new record and render the post without an embed.
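In code, handling an open union usually comes down to switching on $type and having a sensible fallback. The sketch below is an illustration only: describeEmbed and the Embed type are hypothetical, and a real client would render components rather than return strings.

```typescript
// Hypothetical renderer: handles the embed variants it knows and
// falls back gracefully for any $type it has never seen.
type Embed = { $type: string; [key: string]: unknown }

function describeEmbed(embed: Embed | undefined): string {
  if (!embed) return 'no embed'
  switch (embed.$type) {
    case 'app.bsky.embed.images#view':
      return 'render images'
    case 'app.bsky.embed.external#view':
      return 'render link card'
    default:
      // Open union: an unknown variant is not an error, just skip it
      return 'no embed'
  }
}

// An older client that predates video embeds still renders the post
console.log(describeEmbed({ $type: 'app.bsky.embed.video#view' })) // "no embed"
```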

Views

Lexicons can define more than records though. A helpful concept in atproto is “views” - records are the raw data type that’s out there in the real world but might be quite minimal, whereas views are processed data that your app may produce for ease of use. An example of this in bsky.app would be the difference between an app.bsky.feed.post record and an app.bsky.feed.defs#postView - a post record can be nothing but a text field and a date, whereas a PostView is a post record that has been processed by the Bluesky AppView and had helpful metadata attached, such as counting the number of likes and reposts, and attaching the author’s name and avatar.

Statusphere statuses are pretty rough to work with directly! Already, we store them in the database in a non-lexicon format, since we need to save who it is by. Let’s describe a new object called a #statusView, which describes what a status might look like after processing.

{
  "lexicon": 1,
  "id": "xyz.statusphere.defs",
  "defs": {
    "statusView": {
      "type": "object",
      "required": ["uri", "status", "profile", "createdAt"],
      "properties": {
        "uri": { "type": "string", "format": "at-uri" },
        "status": {
          "type": "string",
          "minLength": 1,
          "maxGraphemes": 1,
          "maxLength": 32
        },
        "createdAt": { "type": "string", "format": "datetime" },
        "profile": { "type": "ref", "ref": "#profileView" }
      }
    },
    "profileView": {
      "type": "object",
      "required": ["did", "handle"],
      "properties": {
        "did": { "type": "string", "format": "did" },
        "handle": { "type": "string", "format": "handle" }
      }
    }
  }
}

Note that:

These are not "type": "record", they are "type": "object". This means that we can generate a helper type using codegen, but it won’t generate the helper methods to save a record of this type to the network, like it would for a record - it’s purely for internal use.
We also have to define a #profileView, and reference it in the profile field. This is because lexicon doesn’t allow you to define nested objects.
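To give a feel for what the codegen gets us, here is a hand-written approximation of the types these two definitions correspond to. This is an assumption about the general shape, not the actual generated file, and the values in the example object are placeholders.

```typescript
// Hand-written approximation of the types codegen derives from
// xyz.statusphere.defs. The real generated output may differ.
interface ProfileView {
  did: string
  handle: string
}

interface StatusView {
  uri: string
  status: string
  createdAt: string
  // The "ref" to #profileView becomes a reference to another type
  profile: ProfileView
}

// Placeholder values, for illustration only
const example: StatusView = {
  uri: 'at://did:example:alice/xyz.statusphere.status/3jzfcijpj2z2a',
  status: '🦋',
  createdAt: '2025-03-03T18:43:46.740Z',
  profile: { did: 'did:example:alice', handle: 'alice.example.com' },
}
console.log(example.profile.handle)
```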

We can then generate a StatusView from a status we get from the database like so:

import { XyzStatusphereDefs } from '@statusphere/lexicon'
import { AppContext } from '#/context'
import { Status } from '#/db'

export async function statusToStatusView(
  status: Status,
  ctx: AppContext,
): Promise<XyzStatusphereDefs.StatusView> {
  return {
    uri: status.uri,
    status: status.status,
    createdAt: status.createdAt,
    profile: {
      did: status.authorDid,
      handle: await ctx.resolver
        .resolveDidToHandle(status.authorDid)
        .catch(() => 'invalid.handle'),
    },
  }
}

Queries and Procedures

Lexicon also allows you to define Queries and Procedures. This lets you automatically codegen API routes from your lexicons, on both frontend and backend. The Statusphere example doesn’t need to do this, because it’s a simple Express app with APIs that use FormData and HTML. However, a more complex app might want a strict API contract between its internal services, or even between its services and external third-party services (like Bluesky feed generators). Lexicon lets you define these too!

Let’s consider an API to fetch the list of most recent statuses. We can define parameters, like a limit for the number of statuses we want, and then define the API response. In this case, we can use our new xyz.statusphere.defs#statusView, which means our frontend can use the helpful extra metadata we added to the record!

{
  "lexicon": 1,
  "id": "xyz.statusphere.getStatuses",
  "defs": {
    "main": {
      "type": "query",
      "description": "Get a list of the most recent statuses on the network.",
      "parameters": {
        "type": "params",
        "properties": {
          "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "default": 50
          }
        }
      },
      "output": {
        "encoding": "application/json",
        "schema": {
          "type": "object",
          "required": ["statuses"],
          "properties": {
            "cursor": { "type": "string" },
            "statuses": {
              "type": "array",
              "items": {
                "type": "ref",
                "ref": "xyz.statusphere.defs#statusView"
              }
            }
          }
        }
      }
    }
  }
}
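The minimum, maximum, and default on limit are not just documentation: they describe what a server should accept. Here is a rough sketch of that rule written by hand. The generated XRPC server does its own validation, so resolveLimit is a hypothetical helper that only illustrates the schema.

```typescript
// Hypothetical sketch of how the "limit" parameter could be resolved.
// The generated XRPC server does its own validation; this only
// illustrates the schema's minimum, maximum, and default.
function resolveLimit(raw: string | undefined): number {
  if (raw === undefined) return 50 // "default": 50
  const limit = Number(raw)
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new Error(`limit must be an integer between 1 and 100, got ${raw}`)
  }
  return limit
}

console.log(resolveLimit(undefined)) // 50
console.log(resolveLimit('10')) // 10
```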

Procedures are a similar story - where queries are GET requests, procedures are POST requests. Let’s define one for sending statuses:

{
  "lexicon": 1,
  "id": "xyz.statusphere.sendStatus",
  "defs": {
    "main": {
      "type": "procedure",
      "description": "Send a status into the ATmosphere.",
      "input": {
        "encoding": "application/json",
        "schema": {
          "type": "object",
          "required": ["status"],
          "properties": {
            "status": {
              "type": "string",
              "minLength": 1,
              "maxGraphemes": 1,
              "maxLength": 32
            }
          }
        }
      },
      "output": {
        "encoding": "application/json",
        "schema": {
          "type": "object",
          "required": ["status"],
          "properties": {
            "status": {
              "type": "ref",
              "ref": "xyz.statusphere.defs#statusView"
            }
          }
        }
      }
    }
  }
}
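Under the hood, XRPC methods live at /xrpc/ followed by the lexicon ID, with query parameters in the URL and procedure input in the request body. The sketch below shows roughly what the two requests look like. buildQuery and buildProcedure are hypothetical helpers, https://statusphere.example is a placeholder host, and authentication is left out entirely.

```typescript
// Sketch of how XRPC maps lexicon methods onto HTTP.
// buildQuery and buildProcedure are hypothetical helpers, and
// https://statusphere.example is a placeholder host.
function buildQuery(
  service: string,
  nsid: string,
  params: Record<string, string | number>,
): string {
  const url = new URL(`/xrpc/${nsid}`, service)
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value))
  }
  return url.toString() // sent as a GET request
}

function buildProcedure(service: string, nsid: string, input: unknown) {
  return {
    url: new URL(`/xrpc/${nsid}`, service).toString(),
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(input),
    },
  }
}

const host = 'https://statusphere.example'
console.log(buildQuery(host, 'xyz.statusphere.getStatuses', { limit: 10 }))
// https://statusphere.example/xrpc/xyz.statusphere.getStatuses?limit=10
console.log(buildProcedure(host, 'xyz.statusphere.sendStatus', { status: '🦋' }))
```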

Code generation

Now we have these shiny new lexicons, what can we do with them? The answer is the @atproto/lex-cli package, which allows code generation from lexicons into both frontend and backend code. This is how @atproto/api is generated from the lexicons, for example.

It’s worth noting at this point that I have refactored the Statusphere example into having a separate frontend and backend. The frontend now uses React/Vite, and the backend is still Express but now uses JSON APIs rather than sending HTML. Take a look:

https://github.com/mozzius/statusphere-react

You’ll note that lexicons sit at the root of the monorepo. pnpm lexgen triggers the code generation, which uses lex gen-api to generate a client-side package in @statusphere/lexicon, and lex gen-server to generate backend code in @statusphere/appview. Let’s have a look at what it generated from our new lexicons, and how it helps.

Client codegen

I packaged the frontend codegen into @statusphere/lexicon, so it could be shared. In /packages/client/src/lib/api.ts, we create a StatusphereAgent which has the API routes we defined earlier. For example:

const { data } = await agent.xyz.statusphere.getStatuses({ limit: 10 })

If you’ve used @atproto/api before, you get it. Basically, we can add extra methods to the atproto agent, as defined by our custom lexicons.

Backend codegen

Here’s where it gets interesting. lex gen-server generates an XRPC server using the @atproto/xrpc-server package. These are the matching backend routes for the agent on the frontend. We can define typesafe route handlers for each of our queries and procedures, then add them to our Express server. Here’s the pattern I used, which is similar to the Bluesky AppView (although feel free to plumb the data through differently).

// src/index.ts
import express from 'express'
// this is codegen'd by lex gen-server
import { createServer } from '#/lexicons'
import API from '#/api'

// other setup stuff, such as the ingestor

const app = express()
app.use(express.json())
app.use(express.urlencoded({ extended: true }))

// Create our XRPC server
let server = createServer({
  validateResponse: env.isDevelopment,
  payload: {
    jsonLimit: 100 * 1024, // 100kb
    textLimit: 100 * 1024, // 100kb
    // no blobs
    blobLimit: 0,
  },
})

server = API(server, ctx)

app.use(server.xrpc.router)

// add other routes, start up the server

This is where we add all the individual route handlers to the Express server:

// src/api/index.ts
import { AppContext } from '#/context'
import { Server } from '#/lexicons'
import getStatuses from './lexicons/getStatuses'
import getUser from './lexicons/getUser'
import sendStatus from './lexicons/sendStatus'

export default function (server: Server, ctx: AppContext) {
  getStatuses(server, ctx)
  sendStatus(server, ctx)
  getUser(server, ctx)
  return server
}

And here’s an example of a route handler - in this case, for xyz.statusphere.getStatuses:

// src/api/lexicons/getStatuses.ts
import { AppContext } from '#/context'
import { Server } from '#/lexicons'
import { statusToStatusView } from '#/lib/hydrate'

export default function (server: Server, ctx: AppContext) {
  server.xyz.statusphere.getStatuses({
    handler: async ({ params }) => {
      // Fetch data stored in our SQLite
      const statuses = await ctx.db
        .selectFrom('status')
        .selectAll()
        .orderBy('indexedAt', 'desc')
        .limit(params.limit)
        .execute()

      return {
        encoding: 'application/json',
        body: {
          statuses: await Promise.all(
            statuses.map((status) => statusToStatusView(status, ctx)),
          ),
        },
      }
    },
  })
}

This is super helpful for keeping your frontend and backend in sync - the source of truth is your lexicons. Adding a new route is a matter of defining a new lexicon, running codegen, and adding a new handler, which is straightforward since it’s all already typesafe on both frontend and backend.

Homework

Fork https://github.com/mozzius/statusphere-react and extend it with a new XRPC query! Maybe try adding a query to get the most recent status of a given user, or even get a user’s status history. Play around with adding more detailed lexicon view objects - for example, an emojiStats object which defines how many times an emoji has been posted.
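If you want a starting point for the emojiStats idea, the aggregation itself can be quite small. This is only a sketch over an in-memory array: the EmojiStat shape and countEmoji name are assumptions, and you would still need to write the lexicon and probably do the counting in the database.

```typescript
// Hypothetical starting point for the emojiStats idea: count how often
// each status emoji appears. The EmojiStat shape is an assumption.
type EmojiStat = { emoji: string; count: number }

function countEmoji(statuses: { status: string }[]): EmojiStat[] {
  const counts = new Map<string, number>()
  for (const { status } of statuses) {
    counts.set(status, (counts.get(status) ?? 0) + 1)
  }
  // Most frequently posted emoji first
  return Array.from(counts, ([emoji, count]) => ({ emoji, count })).sort(
    (a, b) => b.count - a.count,
  )
}

console.log(countEmoji([{ status: '🦋' }, { status: '👍' }, { status: '🦋' }]))
```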