Building a distributed Twitter

With the goings-on at Twitter HQ, Brent Simmons started thinking about a distributed Twitter. Since he's the author of NetNewsWire, a great RSS reader, I'm sure the following has crossed his mind as well, but I'd like to lay out a possible distributed Twitter design based on RSS:

RSS is ideal. It's XML, so it's extensible. It is widely supported. There are libraries for reading it for pretty much every programming language. And it was intended to be polled for new, current information. It also deals in items, and each tweet can simply be one item. And finally, in its simplest form, an RSS feed is just a text file on a server, so implementations can be very simple, and can run on CDNs and other "stupid" web servers, if needed. I'll first go into the technical infrastructure, and then I'll illustrate how this would actually look to the end user.

[Illustration: a few accounts and a directory]

How do we store our tweet database in a distributed fashion?

Every user would have an RSS feed on a web server somewhere. This feed contains all the tweets they posted. The URL of this feed is that user's "user ID": the globally unique name that identifies the user, and the address other users need in order to subscribe to their feed. The feed would contain one tweet per feed item (in the description). @mentions would be encoded as links, with a special attribute indicating that the link refers to an account. If you reply to a tweet, that link would carry an additional attribute (inreplytoitem in the example below) that holds the GUID of the feed item for the referenced tweet.

So, for example:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>RSS Example</title>
    <description>This is an example of an RSS feed</description>
    <link>http://www.domain.com/link.htm</link>
    <lastBuildDate>Mon, 28 Aug 2006 11:12:55 -0400</lastBuildDate>
    <pubDate>Tue, 29 Aug 2006 09:00:00 -0400</pubDate>

    <item>
      <title></title>
      <description>&lt;a href="http://www.example.com/ulistweets.rss" feedaccount="yes" inreplytoitem="17576"&gt;@uli&lt;/a&gt; Remember to bring along that Dr. Who boxset</description>
      <link>http://www.domain.com/link.htm</link>
      <guid isPermaLink="false">1102345</guid>
      <pubDate>Tue, 29 Aug 2006 09:00:00 -0400</pubDate>
    </item>

  </channel>
</rss>

In this example, the GUID is a plain number, but of course, as with any RSS feed, it could be an arbitrary string (or even the URL of the blog post this item was generated from, if someone uses a blogging engine to generate their tweet RSS feed).
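To make the format above concrete, here is a minimal Python sketch of how a client might pull the tweets, @mentions and reply targets out of such a feed. It only uses the standard library. The names parse_tweets and MentionParser are made up for this illustration, and the sample feed is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Trimmed copy of the example feed (assumption: one item is enough to illustrate).
FEED = """<rss version="2.0">
<channel>
<title>RSS Example</title>
<item>
<description>&lt;a href="http://www.example.com/ulistweets.rss" feedaccount="yes" inreplytoitem="17576"&gt;@uli&lt;/a&gt; Remember to bring along that Dr. Who boxset</description>
<guid isPermaLink="false">1102345</guid>
<pubDate>Tue, 29 Aug 2006 09:00:00 -0400</pubDate>
</item>
</channel>
</rss>"""


class MentionParser(HTMLParser):
    """Collects account links (feedaccount="yes") and the plain text of a tweet."""

    def __init__(self):
        super().__init__()
        self.mentions = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("feedaccount") == "yes":
            self.mentions.append({
                "feed": attrs.get("href"),
                "in_reply_to": attrs.get("inreplytoitem"),  # None if not a reply
            })

    def handle_data(self, data):
        self.text_parts.append(data)


def parse_tweets(feed_xml):
    """Return one dict per <item>: its GUID, plain text and account links."""
    root = ET.fromstring(feed_xml)
    tweets = []
    for item in root.iter("item"):
        parser = MentionParser()
        parser.feed(item.findtext("description", default=""))
        parser.close()  # flush any text still buffered by the parser
        tweets.append({
            "guid": item.findtext("guid", default="").strip(),
            "text": "".join(parser.text_parts),
            "mentions": parser.mentions,
        })
    return tweets
```

Calling parse_tweets(FEED) yields a single tweet whose mentions list points at uli's feed and at item 17576 as the tweet being replied to.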

How do I 'follow' someone?

In its most basic form, following someone in this distributed Twitter system simply means subscribing to their RSS feed. You take their URL, add it to your feed reader, and you see the tweets they send. But of course the typical user will have a specialized "distributed Twitter" interface for this: a script on a server somewhere that provides a Twitter-like web interface. It will keep a list of the URLs of all the people you follow, will let you add new ones, and will show you your "personal timeline", i.e. an aggregated list of the tweets of everyone you follow plus your own. Even better, it can vend this personal timeline as RSS again, so you can use any RSS reader to view it.

And of course it will have a little form where you can type in your message, and it will add that message to your personal RSS feed.
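The aggregation step could look roughly like the following Python sketch. It assumes the feeds have already been downloaded and that every item carries a pubDate. The function name build_timeline and the dictionary keys are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def build_timeline(feeds):
    """Merge several feeds into one list of entries, newest first.

    `feeds` maps each followed feed URL to its (already downloaded) RSS text.
    Assumption: every <item> has a <pubDate> in the usual RFC 822 format.
    """
    entries = []
    for feed_url, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            entries.append({
                "feed": feed_url,
                "guid": (item.findtext("guid") or "").strip(),
                "text": item.findtext("description") or "",
                "date": parsedate_to_datetime(item.findtext("pubDate")),
            })
    # pubDate includes a UTC offset, so dates from different servers compare correctly.
    entries.sort(key=lambda entry: entry["date"], reverse=True)
    return entries
```

Vending the result as RSS again would then just be a matter of writing these entries back out as items of a new feed.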

Writing tweets

For the user, writing a tweet would pretty much work like before. But behind the scenes, we would need a bit more smarts: we'd need some way to turn a short name like uli into the actual feed URL, and we'd need to encode this in the RSS feed somehow. Dave Winer had a great idea here: Why not just use DNS? If I write @firstname.lastname.com, my client would automatically know to go to the server firstname.lastname.com and look there for an RSS file, say, microblog.rss. It would then encode the target user's name into the RSS item's description as <a href="http://firstname.lastname.com/microblog.rss" feedaccount="yes">@firstname.lastname.com</a>. That way, every search engine and RSS reader sees it as a link, to users it looks almost like before, and it uniquely identifies that user across the entire internet. The feedaccount attribute helps dedicated "distributed Twitter" clients recognize these links as account/tweet links (as opposed to other links in a post).
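As a rough sketch of that encoding step in Python: the function below finds @host.name mentions in the text the user typed and turns them into account links. The regular expression is a deliberately simple assumption about what counts as a host name, the file name microblog.rss is just the example used above, and encode_mentions is a made-up name.

```python
import html
import re

FEED_FILE = "microblog.rss"  # example file name from the text, not a fixed standard

# "@" followed by a dotted host name. The lookbehind skips e-mail addresses
# such as me@example.com. This is a simplification, not a full host name grammar.
MENTION = re.compile(r"(?<![\w@])@((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})")


def encode_mentions(text):
    """Return the HTML for an item's description, with mentions as account links."""
    parts = []
    last = 0
    for match in MENTION.finditer(text):
        # Escape the ordinary text between mentions so it is safe inside HTML.
        parts.append(html.escape(text[last:match.start()]))
        host = match.group(1)
        parts.append('<a href="http://%s/%s" feedaccount="yes">@%s</a>'
                     % (host.lower(), FEED_FILE, host))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

The result is what goes into the item's description; the XML library writing the feed takes care of escaping it a second time for the RSS file itself.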

Even better: We could integrate various existing services by providing subdomains on their servers. So my Twitter account would be uliwitness.twitter.com, and my App.net account would be uli.app.net. And the scripts on those servers could even know about their home and let me use a shorter version of the name to reference people on the same server (i.e. @uliwitness or @uli). Since the RSS contains a full link, people on other servers reading these tweets will know which @uli is meant, and their clients could even rewrite the linked text so the user always sees the same name for the same account.
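The short-name handling might be sketched like this in Python. Both function names are hypothetical, and the rule "a name without a dot belongs to the home server" is my assumption about how a server would tell short names from full ones.

```python
def expand_account(name, home_domain):
    """Turn what the user typed (e.g. "@uli") into a full account host name."""
    name = name.lstrip("@").lower()
    # Assumption: a name without a dot is a short name on the home server.
    return name if "." in name else "%s.%s" % (name, home_domain)


def display_name(account, home_domain):
    """Show the short form for accounts on the home server, the full name otherwise."""
    suffix = "." + home_domain
    if account.endswith(suffix):
        short = account[:-len(suffix)]
        if short and "." not in short:
            return "@" + short
    return "@" + account
```

On the app.net server, expand_account("@uli", "app.net") gives uli.app.net, while a name such as @uliwitness.twitter.com is left alone because it already contains a dot.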

So, you see, even with an RSS-based, distributed Twitter, you can still follow people, reference them quickly and easily, and view your aggregated timeline. Moreover, since there is a script generating your timeline from the other users' full post histories, you will even keep the ability to filter your timeline: you could ignore tweets by someone even when they are retweeted by someone you're following, filter out replies to messages from users you're not following, or whatever else you can think of.
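A filter like that could be a few lines of Python. This sketch assumes the timeline script has already turned each item into a dictionary; the keys author, retweet_of and reply_to (each holding a feed URL or being absent) are invented for the example.

```python
def filter_timeline(entries, muted=(), following=None):
    """Drop muted authors, retweets of muted authors, and replies to strangers.

    Assumed entry keys (hypothetical): "author", plus optional "retweet_of" and
    "reply_to", each holding the feed URL of the account concerned.
    """
    muted = set(muted)
    kept = []
    for entry in entries:
        if entry["author"] in muted or entry.get("retweet_of") in muted:
            continue
        reply_to = entry.get("reply_to")
        if following is not None and reply_to is not None and reply_to not in following:
            continue  # a reply to someone the user does not follow
        kept.append(entry)
    return kept
```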

What about retweets, DMs and protected accounts?

Retweets work essentially the same way as replies. The Retweet item would contain a copy of the message, and the user name following the "RT" prefix would be a link pointing back at the tweet that is being retweeted. That way you don't have to hit each account whose message has been retweeted (because you already have a copy), but you can. And again, the user just chooses "Retweet" in their script's web interface, and all the magic will happen behind the scenes.
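Here is a hedged Python sketch of what the script might generate when the user clicks "Retweet". The attribute name retweetofitem is purely my invention (the design above only names inreplytoitem for replies), as are the function name and its parameters.

```python
import html
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def make_retweet_item(orig_feed_url, orig_account, orig_guid,
                      orig_description_html, new_guid, when):
    """Build an <item> that copies a tweet and links back to the original.

    `retweetofitem` is a hypothetical attribute name chosen for this sketch.
    """
    link = '<a href="%s" feedaccount="yes" retweetofitem="%s">@%s</a>' % (
        html.escape(orig_feed_url), html.escape(orig_guid), html.escape(orig_account))
    item = ET.Element("item")
    # The original description is already HTML, so it is copied as-is.
    ET.SubElement(item, "description").text = "RT %s %s" % (link, orig_description_html)
    ET.SubElement(item, "guid", isPermaLink="false").text = new_guid
    ET.SubElement(item, "pubDate").text = format_datetime(when)
    return item


item = make_retweet_item(
    "http://www.example.com/ulistweets.rss", "uli", "17576",
    "Doctor Who tonight!", "1102346",
    datetime(2006, 8, 29, 13, 0, tzinfo=timezone.utc))
print(ET.tostring(item, encoding="unicode"))
```

Because the item carries its own copy of the message, a reader never has to fetch the original feed, but the link makes it possible to do so.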

Protected accounts are easy as well: you don't want those messages just lying around, readable by everyone who happens to figure out the URL, so we use HTTP authentication to prevent access. Whenever someone wants to follow our protected account, we have to approve them anyway. So we just have the script generate a username/password pair for them, which gets sent back as a reply in a standardized form (e.g. as a DM).
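Both halves of this can be sketched in Python with the standard library, assuming HTTP Basic authentication is the scheme in use. The function names and the shape of the credentials record are made up for the illustration, and the request is only built here, not sent.

```python
import base64
import secrets
import urllib.request


def issue_credentials(follower_feed_url):
    """Server side: generate a random username/password for an approved follower."""
    return {
        "follower": follower_feed_url,
        "username": "f-" + secrets.token_hex(4),
        "password": secrets.token_urlsafe(16),
    }


def protected_feed_request(feed_url, username, password):
    """Client side: build a request carrying HTTP Basic credentials."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    request = urllib.request.Request(feed_url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request
```

Passing the request to urllib.request.urlopen would fetch the feed. Basic authentication sends the password only base64-encoded, so a protected feed like this would want to be served over HTTPS.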

DMs are essentially like protected accounts. A DM can be implemented as a special, additional RSS feed to which the sender posts direct messages to you. Since you have to follow someone for them to be able to DM you, whenever you check whether they have new messages, you can also check whether they have left new DMs for you. The only problem here is how to generate and distribute the username and password. I don't want to have to generate and keep on file a username/password pair for every person who follows me, just in case I want to DM them one distant day. Moreover, so far following happened completely without me having to do anything; it was just someone else occasionally hitting my RSS feed. Distributing passwords means I suddenly need a script on the server and a database, not just a text file or two.

Asymmetric cryptography to the rescue! To implement DMs, what we can do is publish a public key with every RSS feed. When I want to DM someone, I grab their public key and encrypt my message with it. The only way to read this message is to use the matching private key, which only the destination account's owner has. Of course, the little script that I use to follow someone, show me my timeline etc. will take care of all that transparently behind the scenes, so that, again, I just click "DM" to send a direct message.
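To show the flow only, here is a toy Python sketch using textbook RSA with a tiny, well-known example key pair (p = 61, q = 53). This is emphatically not secure: the key is far too small, there is no padding, and each byte is encrypted on its own. A real implementation would use a vetted cryptography library instead. All names in the sketch are hypothetical.

```python
# Toy key pair for illustration only (n = 61 * 53). Never use keys this small.
PUBLIC_KEY = (3233, 17)     # (n, e): what the recipient would publish with their feed
PRIVATE_KEY = (3233, 2753)  # (n, d): stays with the recipient


def encrypt_dm(message, public_key):
    """Sender: encrypt each UTF-8 byte with the recipient's public key."""
    n, e = public_key
    return [pow(byte, e, n) for byte in message.encode("utf-8")]


def decrypt_dm(cipher, private_key):
    """Recipient: recover the bytes with the private key."""
    n, d = private_key
    return bytes(pow(value, d, n) for value in cipher).decode("utf-8")


cipher = encrypt_dm("Bring the boxset", PUBLIC_KEY)
print(decrypt_dm(cipher, PRIVATE_KEY))
```

The point is the direction of the keys: anyone who can read the feed can encrypt a message for its owner, but only the owner can decrypt it, so no passwords need to be handed out in advance.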

But, wouldn't using RSS unduly burden popular tweeters?

They would have to make sure their hosting provider can cope with their bandwidth demands, yes. But that's how the web already works. You register a domain. Either you have enough bandwidth, or you need to spend money on bandwidth and load balancing. There would be inexpensive hosts for small users (like WordPress.com or Tumblr), and there would be big hosts for big companies with lots of traffic (like Akamai or S3 or Slicehost or whatever...).

Beyond that, I've intentionally chosen RSS as the lowest common denominator. There is nothing keeping you from implementing HTTP 304 "Not Modified" responses or PuSH (PubSubHubbub, as someone in the comments suggests) or similar standards, so that people can get change notifications without having to download the entire feed, and maybe even fetch just the changed feed items. Similarly, you can keep a list of all feeds your users currently subscribe to and periodically pull copies of all messages into your database, for faster search and presentation of the personal timeline. It would almost be like NNTP, where messages get distributed and cached locally across a network of servers, except that your script and your users decide which accounts (with their tweets, DMs etc.) you look at. But even if you don't do that, every client should be able to deal with a simple, stupid RSS feed at the least. It's well-supported, well-known and well-understood.
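For the HTTP 304 part, a polling client might look roughly like this Python sketch, which uses the standard conditional-request headers If-None-Match and If-Modified-Since. The function names are made up, and whether a given server sends ETag or Last-Modified at all is an assumption.

```python
import urllib.error
import urllib.request


def conditional_request(feed_url, etag=None, last_modified=None):
    """Build a GET request that lets the server answer 304 if nothing changed."""
    request = urllib.request.Request(feed_url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(feed_url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = conditional_request(feed_url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as error:
        if error.code == 304:  # urllib reports "Not Modified" as an HTTPError
            return None, etag, last_modified
        raise
```

The client stores the returned ETag and Last-Modified values per feed and passes them back in on the next poll, so an unchanged feed costs only a few headers.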

So. Distributed Twitter.

Yeah. It's definitely a possibility. Most features will map across fairly easily, and if we manage to set up an independent directory and a search engine, the remaining features would be feasible as well. For those who care, here's a rough implementation of parts of this design that I knocked out in PHP a while ago: Chirp distributed Twitter server