standard.horse is a little tool I'm building for working with standard.site documents. It's a pretty interesting example of permissionless interoperability, which is one of the core principles of atproto. Before I dive into the details, I think some background is needed. Let's take it back a step.
What is standard.site?
Wait, let's go further back.
What is a blog?
Okay, you've written some words, and now you want to subject internet users to them. The personal blog is the perfect place for that, so you wrap it in some HTML and put it online.
Congratulations, you now have a blog.
Frankly, if you want to stop here, please feel welcome to. This is still allowed. More power to you.
Reaching an audience
The problem with a free-floating blog is nobody is going to read it. Sure, you can self-promote it, or set up an email list, but even if you build a readership they're going to have to travel to your website to see if you've posted anything new. We need distribution.
The solution is the venerable RSS protocol, beloved by many, unknown to anyone under the age of 25. RSS specifies a structured format where you can list your blogposts, allowing generic software to pull in all sorts of different blogs across the internet into a comprehensible view. This lets people build a feed of their favourite blogs, and their reader software can go pull in all the latest posts automatically. Neat!
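To make "structured format" a bit more concrete, here is a minimal sketch of what a reader does with a feed. The feed below is hand-written for illustration (the blog and its URLs are made up), and real readers handle far more than titles and links.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 feed. The blog and its URLs are invented.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://blog.example/</link>
    <description>Words, subjected to internet users</description>
    <item>
      <title>Second post</title>
      <link>https://blog.example/second</link>
      <pubDate>Tue, 02 Jan 2024 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First post</title>
      <link>https://blog.example/first</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_posts(feed_xml):
    """Return (title, link) for every <item> in the feed, in feed order."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.iter("item")
    ]


for title, link in list_posts(FEED):
    print(f"{title}: {link}")
```

Because every blog exposes the same shape, the same few lines work for any of them, which is the whole trick.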
I learned embarrassingly recently that this is how podcasts work! That's why podcasters always say "find our podcasts wherever you get your podcasts", rather than pointing you to a specific siloed platform. An underappreciated Open Web W.
RSS is not all sunshine and roses though. RSS is pleasingly simple, but at a cost. It's pull-based, rather than push-based, so readers have to go scrape every blog every time they want to check for updates. It's ephemeral, so if a blog goes offline, it's dead, and there's no concept of history. To RSS, it's either listed right now or it doesn't exist.
The solution to ephemerality and general awkwardness is to listen to every RSS feed ever since the beginning of time and cache everything, so you have a consistent index you can search through etc. This is Google Reader and you can find more information about it at killedbygoogle.com.
Enter atproto, stage left
Here's the part where I try and explain atproto for the millionth time without boring everyone to death. Bluesky designed atproto to solve the challenge of decentralised social at scale, and due to its heritage the microblogging modality takes centre stage. However, an interesting way to look at it is as RSS Done Right.
If you were redesigning RSS from scratch, what would you want to see? I think some nice features would be:
- Being able to see every post ever made, so you can search them
- Push-based rather than pull-based, so you can just listen to updates rather than having to re-scrape everything constantly.
- Every host having a consistent set of APIs so you can paginate etc.
- Self-verifying data, so you can trust data you get from untrusted sources.
And well, that's atproto.
- The network is backfillable. For every user, you can get an archive file of their data, so you don't need to have been listening since the start of time to get everything
- Hosts push data out via websockets
- Hosts have nice CRUD APIs you can paginate etc
- All data is signed, so rather than having to subscribe to websockets from every single host on the internet, you can just subscribe to a relay service which does that for you and you can verify that they're not tampering with the data
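As a rough sketch of the "nice CRUD APIs you can paginate" point: every PDS exposes the com.atproto.repo.listRecords endpoint, which returns a page of records plus a cursor for the next page. The helper names below are my own, and the PDS host and DID in the usage note are placeholders, so treat this as an illustration rather than a client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_records_url(pds, repo, collection, cursor=None, limit=50):
    """Build the URL for one page of com.atproto.repo.listRecords."""
    params = {"repo": repo, "collection": collection, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{pds}/xrpc/com.atproto.repo.listRecords?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: plain unauthenticated GET, parsed as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def all_records(pds, repo, collection, fetch=fetch_json):
    """Yield every record in a collection by following the cursor."""
    cursor = None
    while True:
        page = fetch(list_records_url(pds, repo, collection, cursor))
        yield from page["records"]
        cursor = page.get("cursor")
        # Stop when the server stops handing out cursors (or pages).
        if not cursor or not page["records"]:
            break
```

Usage would look something like `all_records("https://pds.example", "did:plc:abc", "site.standard.document")`, with your own PDS host and DID swapped in. The `fetch` parameter exists only so the loop can be exercised without touching the network.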
I think what I'm getting at is that anyone can start a new Google Reader at any time, and they'll still be able to catch up on all the old data as if they were there from the start. Pretty cool stuff.
Wait, weren’t we talking about blogs?
Another nice feature of atproto is that data is strictly typed by its schema language, and anyone can publish their own schemas. So while Bluesky specifies the shape of Bluesky-shaped data, anyone else is free to make new kinds of shapes for their own modalities.
Now, if RSS is a good way to get distribution for your blog posts, and atproto is better RSS, then shouldn't we put blog posts on atproto?
It wasn't long before people started to build blogging platforms on atproto. However, when each app wrote its own schemas, we suddenly found that a protocol that promised to break down the walls between silos was inadvertently building new ones instead! Rather than every blog being interoperable, every blog was locked into the schema format defined by its provider. Not great, and kinda defeating the whole point!
What is standard.site?
A trio of blogging services (Leaflet, pckt and Offprint) got together to solve this. The pitch is:
What if we all share one generic "blog" schema, and then each build on top of it?
And thus standard.site was born.
What's key to understanding it is that it is as non-prescriptive as possible – each service is different, each with its own features and mutual incompatibilities. Rather than strictly defining the platonic ideal blog post format with every feature accounted for (a fool's errand), it is more like a metadata wrapper into which each platform puts its own content.
The atproto schema language uses "open unions" a lot – basically a hole where you can put any other kind of content.
So each service publishes a site.standard.document record with a standardized title, description and tags. But into the unspecced content field they're free to use whichever content representation they'd like. Leaflet puts in a pub.leaflet.content record, which is this modular block system. The others specify their own formats. This allows all three services to innovate in their own lanes while sharing a common record format.
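Here is a hedged sketch of what consuming that open union might look like. Only the title, description, tags and content fields come from the description above; the `example.markdown.content` type name and its `text` field are invented for illustration, and real records may carry more fields than shown. The one atproto convention I'm relying on is that union members are tagged with a `$type` string.

```python
# Example record. The content type name and its "text" field are
# hypothetical; only title/description/tags/content are taken from the post.
doc = {
    "$type": "site.standard.document",
    "title": "Hello, horse",
    "description": "A first post",
    "tags": ["atproto"],
    "content": {
        "$type": "example.markdown.content",
        "text": "# Hello\n\nSome *markdown*.",
    },
}


def render_markdown(content):
    """Renderer for the made-up markdown content type: return the raw text."""
    return content["text"]


# Map each content $type this reader understands to a renderer.
RENDERERS = {"example.markdown.content": render_markdown}


def render(document):
    """Render content we recognise; otherwise fall back to shared metadata."""
    content = document.get("content") or {}
    renderer = RENDERERS.get(content.get("$type"))
    if renderer is None:
        # Unknown content type: the standardized fields are still usable.
        return f"{document['title']}: {document.get('description', '')}"
    return renderer(content)


print(render(doc))
```

The fallback branch is the point: a reader that has never heard of a given content format can still list the post by its title and description.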
And because you don't need permission to do any of this, you can write your own records with your own content format! This blog here uses markpub.at, which is a lightweight wrapper around a big block of markdown text. I don't use any of these providers; instead, my blog just fetches the records from my PDS and renders the markdown.
Enter standard.horse, stage right
Once I finished building my blog, having had a great time picking out fonts etc, I immediately found the drawback of not using one of these blog providers: I don't have a nice editing interface to write posts with. I have to manually curl JSON records, which is known in the industry as "sub-par UX".
So I made an editing interface for my blog. You log in, it fetches your standard.site publications and documents, and you can edit them or publish new ones. Since it's my blog, I just pointed an off-the-shelf markdown editor at the markpub content. This is standard.horse! Fun stuff, I had a blast making it.
Atprotomaxxing
When I announced it, it immediately became apparent that I was one of the only sickos out there hand-rolling their standard.site documents using markpub. Most people would log in, see that they can't edit their leaflets because it doesn't understand their richtext format, and close it, which is fair enough.
And thus I thought to myself: all these richtext formats from the other providers are just blogs at the end of the day – can we just translate them to Markdown, edit them using my Markdown editor, then translate them back?
Turns out the answer is yes – several Claude tokens later and we had a translation layer between each provider's format and Markdown, in both directions, with a test suite ensuring that documents survive the round trip intact. So now you can edit and publish documents for any of the Big Three formats all within standard.horse!
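To give a flavour of the round-trip idea, here is a toy version. The block format below (headings and paragraphs as dicts) is invented for this example and is much simpler than any provider's real schema; it is not the standard.horse code.

```python
import re

# A heading chunk is one line: 1-6 hashes, a space, then the text.
HEADING = re.compile(r"(#{1,6}) (.+)")


def blocks_to_markdown(blocks):
    """Serialise toy blocks to Markdown, one blank line between blocks."""
    parts = []
    for block in blocks:
        if block["type"] == "heading":
            parts.append("#" * block["level"] + " " + block["text"])
        else:
            parts.append(block["text"])
    return "\n\n".join(parts)


def markdown_to_blocks(markdown):
    """Parse Markdown back into toy blocks, splitting on blank lines."""
    blocks = []
    for chunk in markdown.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = HEADING.fullmatch(chunk)
        if match:
            blocks.append({
                "type": "heading",
                "level": len(match.group(1)),
                "text": match.group(2),
            })
        else:
            blocks.append({"type": "paragraph", "text": chunk})
    return blocks


original = [
    {"type": "heading", "level": 1, "text": "Title"},
    {"type": "paragraph", "text": "Some text."},
    {"type": "heading", "level": 2, "text": "Section"},
    {"type": "paragraph", "text": "More text."},
]

# The round-trip property: blocks -> Markdown -> blocks is the identity.
assert markdown_to_blocks(blocks_to_markdown(original)) == original
```

The real formats have far more block types, and anything Markdown can't express needs a decision about how to preserve it, which is exactly what a round-trip test suite is there to catch.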
The beautiful part is that I am not touching any of these services directly at all. I'm just writing to my own PDS in a schema that they'll understand, and they'll ingest it like any other post. This is permissionless interoperability and it's pretty damn cool.
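For the curious, "writing to my own PDS" boils down to a call to the com.atproto.repo.createRecord endpoint. The sketch below only builds the request without sending it. The PDS host, DID and token are placeholders, the record is trimmed to a title, and the plain bearer token is a simplification of how auth works in practice.

```python
import json
from urllib.request import Request


def create_record_request(pds, access_token, repo, collection, record):
    """Build (but do not send) a com.atproto.repo.createRecord request."""
    body = {"repo": repo, "collection": collection, "record": record}
    return Request(
        f"{pds}/xrpc/com.atproto.repo.createRecord",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Simplified: assumes a session access token is already in hand.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Placeholder values throughout; nothing here talks to a real server.
req = create_record_request(
    "https://pds.example",
    "ACCESS_TOKEN",
    "did:plc:abc",
    "site.standard.document",
    {"$type": "site.standard.document", "title": "Hello, horse"},
)
print(req.get_method(), req.full_url)
```

Note that nothing in the request mentions any blogging service: the record lands in your own repo, and whoever understands the schema picks it up from there.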
That's all for today. Please let me know how you find standard.horse if you try it, and let me know if there are any more richtext schemas I should support!
Also, this post is scarily long, I should add auto-saving drafts…
Until next time!