~icefox/scalar

A small chat protocol, inspired by Gemini

#Scalar

Scalar is an experimental CBOR-based minimalist chat protocol, intended to be to IRC/XMPP/Matrix something like what Gemini is to HTTP. Basically, take a complicated system (XMPP/Matrix) and make it simple, and/or take an old and simple system (IRC) and bring it up to date with a more modern design.

This is basically an attempt to see what something like this could look like. This README file is as much specification as exists, and there's a minimal, vaguely-incomplete-but-working server and client written in Rust. I don't actually know anything about production-grade chat protocols or network servers, so don't expect that, and my goal was to have something that Worked within a weekend or so.

I don't really intend to keep this maintained long term, as there are other things I have to do with my time that are more important to me. But hopefully this is universal/interesting/useful enough to serve as a place to start if someone else wants to do something similar.

#Goals

  • Make it possible to implement a full-ish client in a weekend, using a decent language and a few protocol libraries.
  • Have useful chat rooms for up to a few hundred users without it being gratuitously inefficient.
  • Make abuse hard -- should be easy for operators/users to prevent spam, malicious bots, etc.

#Anti-goals

  • Gemini's level of minimalism is pretty extreme IMO. I like the dream but I'm not sure I can be quite that hardcore.
  • Persistence. The servers may choose to store messages and allow a backlog to be searched or whatever, like Discord/Matrix/etc, or they may be as transient as IRC and provide basically no history. I think both are useful approaches and it shouldn't be hard to make it possible to do either.
  • Large-scale efficiency. I'm not thinking about millions of users or anything like that.
  • Transmitting large data (arbitrary files), or low-latency data (voice, video).
  • Encryption or forward secrecy should be handled by the transport protocol, not the message protocol. I.e., use TLS or QUIC. Or it should be bundled into the messages themselves, a la OTR.
  • Text based. Text is actually NOT easy to parse or handle. Sorry.
  • HTTP/JSON API. It's complicated and not actually great for pubsub.
  • This is not a full federated your-server-talks-to-every-other-server type thing. That doesn't seem actually very useful for transient chat. Your client can generally get a direct connection to any server on the Internet, that's what the Internet is for. Modern routing, CDN tech and so on makes it unnecessary. A "home server" type arrangement doesn't do much that you can't do with just, you know, a client.

#Design, or, how we achieve those goals

Travels over TCP. QUIC would be nice someday maybe.

The protocol consists of messages, each with a fairly fixed layout and contents and a specific set of valid responses. Each message is prefixed with a two-byte little-endian length, and has a maximum size of 16 KB - 2 bytes. 16 KB is a fine size for text, generally, and also means that if we someday want to make it variable-sized we can extend it to LEB128. Fixed size small buffers are a nice design constraint that makes other things simpler and more robust.
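A minimal sketch of that framing, using only the standard library. The names `write_frame`, `read_frame` and `MAX_PAYLOAD` are made up for illustration, and reading "16 KB - 2 bytes" as 16 * 1024 bytes minus the two-byte prefix is an assumption; the real server and client may do this differently.

```rust
use std::io::{self, Read, Write};

/// Largest payload, assuming "16 KB - 2 bytes" means 16 * 1024 bytes
/// minus the two-byte length prefix. This reading is an assumption.
pub const MAX_PAYLOAD: usize = 16 * 1024 - 2;

/// Writes one frame: a two-byte little-endian length, then the payload.
pub fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    // Safe cast: MAX_PAYLOAD is well below u16::MAX.
    let len = payload.len() as u16;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(payload)
}

/// Reads one frame and returns its payload (the still-encoded CBOR bytes).
pub fn read_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 2];
    input.read_exact(&mut prefix)?;
    let len = u16::from_le_bytes(prefix) as usize;
    if len > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both functions work on anything that implements `Read` or `Write`, so the same code can be pointed at a `TcpStream` or at an in-memory buffer for testing.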

Messages are encoded with CBOR. Specifics of message format are currently Undefined, but CBOR is pretty compact and pretty simple (ish) and pretty flexible. One can argue about the details but I don't see any huge downsides, just tradeoffs. One of the annoying things about CBOR is that the variable-length encoding makes it hard to know how big a message will be until you've actually encoded it. But in practice it's pretty workable. Good enough for experimentation, certainly.

The state machine for communication should hopefully never be more than one request+reply long. There should be something like two pieces of state per connection: what the connection's user ID is, and what topics it's subscribed to. Similarly, each request should only ever have one valid kind of reply.

The server stores basically nothing about the client when the client is not connected. It's up to the client to remember what topics it's interested in, etc.

Unimplemented: Clients have an "ID server" that serves as a point of authority for their authentication, and optionally provides other useful features for the client such as storing metadata or forwarding messages while the client is offline. The necessary part is basically, when a user connects to example.com and says "I am icefox@example2.com, here's an authentication token", example.com connects to example2.com and says "here's the auth token given to me by icefox@example2.com, is it legit?" and example2.com says yea or nay. That way example.com doesn't need to know anything at all about icefox@example2.com, it just needs to know whether example2.com is vaguely trustworthy. How to achieve that is still TBD, but a fediverse/web-of-trust-like model may be doable.

#Current state

The server and client work, to a very minimal degree. Basically a proof of concept with zero features or convenience. There's a lot of things that are not actually done yet, either in the spec or the implementation. These are:

  • Authentication. So far it's just a placeholder for a client saying "this is my username", no checking is done.
  • Message ordering or catch-up. I want this to be done by hashes on messages, but that is not yet done.
  • Error handling needs formalization
  • Wire protocol is currently "whatever cbor_serde decides to give us". Needs to improve but is Good Enough to experiment with for now.
  • Protocol for topic metadata, topic listing etc. are all currently "this would be nice someday"
  • Presence notifications are currently Undefined. Conceptually it's just subscribing to topics, where the topic is "what is the state of this user ID", but a bit of formalization around that would be nice.
  • Topics are currently only chat rooms and the format of everything is kinda loosey-goosey.
  • The protocol lib needs tightening up.
  • The code in general needs robust-ifying.
  • Anything marked TODO needs to be done.

#Actual protocol

#Model

Basically a pubsub model. You have clients and servers. Servers expose topics. Clients connect to a server and ask to subscribe to topics, or publish messages to topics. Servers broadcast messages to clients that are subscribed to the relevant topics.

A topic can be a chat room channel, a client ID, system metadata, or other such things. Basically it's just a location that messages can go to, or can be forwarded from.
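To make the model concrete, here is a rough in-memory sketch of the server side: a table from topic name to the connections subscribed to it. `Broker` and `ConnId` are hypothetical names, topics are plain strings, and nothing here touches the network; it only answers "who should get this event?".

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical connection handle; a real server would map this to a socket.
pub type ConnId = u32;

/// In-memory topic table: topic name -> connections subscribed to it.
#[derive(Default)]
pub struct Broker {
    subs: HashMap<String, HashSet<ConnId>>,
}

impl Broker {
    pub fn subscribe(&mut self, conn: ConnId, topic: &str) {
        self.subs.entry(topic.to_string()).or_default().insert(conn);
    }

    pub fn unsubscribe(&mut self, conn: ConnId, topic: &str) {
        if let Some(set) = self.subs.get_mut(topic) {
            set.remove(&conn);
            // Drop topics nobody listens to, so the table doesn't grow forever.
            if set.is_empty() {
                self.subs.remove(topic);
            }
        }
    }

    /// Forgets a connection entirely, e.g. when it disconnects.
    pub fn drop_connection(&mut self, conn: ConnId) {
        self.subs.retain(|_, set| {
            set.remove(&conn);
            !set.is_empty()
        });
    }

    /// Connections that should receive an event published to `topic`, sorted.
    pub fn recipients(&self, topic: &str) -> Vec<ConnId> {
        let mut out: Vec<ConnId> = self
            .subs
            .get(topic)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default();
        out.sort_unstable();
        out
    }
}
```

Subscriptions are keyed by connection rather than by user, so all of this state disappears when the connection does.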

#Data types

ID: A unique user. Basically an email address style username@example.com

Channel: #channelname@example.com. Can we omit the @example.com if it's redundant? Maybe we shouldn't. What if you ask baz.com to subscribe you to #channelname@example.com? It can't, obviously. So is the domain ever not redundant?

Reserved ID: !name@example.com. Can never be used for a username or channel, and so a useful ID for system-generated messages, topics, etc.

Topic: Channel, ID, or other yet-to-be-defined things that label specific types of metadata associated with an ID or channel. Consider URL's? I... dunno about that. Though you may have metadata such as presence, direct messages etc. associated with an ID or channel? No, KISS.
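The three address forms above could be told apart with something like the following sketch. `Address` and `parse_address` are illustrative names, and it assumes the `@domain` part is always required, which the text above leaves as an open question. It does no validation of the name or domain beyond checking that they are non-empty.

```rust
/// The three address shapes described above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Address {
    /// username@example.com
    Id { name: String, domain: String },
    /// #channelname@example.com
    Channel { name: String, domain: String },
    /// !name@example.com
    Reserved { name: String, domain: String },
}

/// Parses an address, assuming the domain is always present.
pub fn parse_address(s: &str) -> Option<Address> {
    // Exactly one '@', with something on both sides of it.
    let (local, domain) = s.split_once('@')?;
    if local.is_empty() || domain.is_empty() || domain.contains('@') {
        return None;
    }
    let domain = domain.to_string();
    if let Some(name) = local.strip_prefix('#') {
        if name.is_empty() {
            return None;
        }
        Some(Address::Channel { name: name.to_string(), domain })
    } else if let Some(name) = local.strip_prefix('!') {
        if name.is_empty() {
            return None;
        }
        Some(Address::Reserved { name: name.to_string(), domain })
    } else {
        Some(Address::Id { name: local.to_string(), domain })
    }
}
```

Because the sigil decides the variant, a reserved name like `!motd@example.com` can never collide with a username or a channel.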

Datetime: Datetime's are ALL, ALWAYS in UTC time zone, and it's up to the client to translate it into a local time zone. The use of this in message types is discouraged as it is very subjective, but is sometimes necessary or useful.

Text: All text is UTF-8 encoded. Invalid UTF-8 is an error.

Blob?: Now that I think of it, not sure there's anything terribly wrong with sticking small binary data in a message or such? CBOR can distinguish the two, though it's a bit of a pain. Consider it.

TODO: Topics (including ID's) may have metadata attached of some kind. That way topics and ID's can always be easy-to-parse-and-handle ASCII, but users of non-Latin languages can put Unicode in the metadata section and clients can display that as a presentation name instead of a machine-readable ID.

TODO: Topics/channels/ID's need a fixed maximum length, to make framing messages easier.

#Messages

TODO: I'm not sure whether presence notifications are an actual goal, but they're actually really easy with a pubsub model, so they'll probably end up existing sooner or later.

Message types are as follows:

#Message

The general purpose "user saying something" message, sent from an ID to a topic and presumably re-broadcast by the server to other clients subscribed to that topic.

TODO: A server may refuse a message. For example, most people probably shouldn't be able to send messages to channels they're not subscribed to. But admins should be able to.

  • From: ID
  • To: Topic
  • Payload: Text. (Blob?)

Optional fields to think about:

  • Time: Optional datetime? Purely advisory, since different servers/clients may disagree about what time it is or what time a message is sent.
  • In-response-to: A SHA256 hash of a previous message. Can be used to create an unambiguous ordering of messages, easing things like detection and resumption of network hiccups, or making tree-style conversations possible. How it's interpreted or presented is entirely up to the client, though I expect some de facto standards to exist.
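As a sketch of how In-response-to could help a client notice a hiccup: if a message points at a hash the client has never seen, something was missed. The struct layout and the names `Message`, `Digest` and `missing_parents` are assumptions for illustration. The digest is treated as an opaque 32-byte value here, since the standard library has no SHA-256 and exactly what gets hashed is not yet defined.

```rust
use std::collections::HashSet;

/// Opaque 32-byte SHA-256 digest; computing it is out of scope for this sketch.
pub type Digest = [u8; 32];

/// Hypothetical in-memory form of a Message, with the optional field included.
#[derive(Debug, Clone)]
pub struct Message {
    pub from: String,
    pub to: String,
    pub payload: String,
    /// Digest of this message, as computed by the receiver.
    pub digest: Digest,
    pub in_response_to: Option<Digest>,
}

/// Returns the digests that are referenced by some message but not present
/// locally, i.e. the messages this client appears to have missed.
pub fn missing_parents(seen: &[Message]) -> Vec<Digest> {
    let known: HashSet<Digest> = seen.iter().map(|m| m.digest).collect();
    let mut missing: Vec<Digest> = seen
        .iter()
        .filter_map(|m| m.in_response_to)
        .filter(|d| !known.contains(d))
        .collect();
    missing.sort_unstable();
    missing.dedup();
    missing
}
```

A client could feed the result into a Catchup request by hash, if the server supports that.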

TODO: Optional content-type for the data?

#Topics

Server replies with all the topics it knows about.

TODO: This needs to be defined better. The client needs to be able to ask the server how many topics it has, then ask for topics 0..n, n+1..m, m+1.. etc so that the server's replies always fit into the message size limit. This also becomes much simpler if topics have a fixed max length.
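One way the client side of that paging might look, once it knows the topic count: split it into fixed-size ranges and request them one at a time. `page_ranges` is a hypothetical helper, it uses half-open ranges (`0..n`, `n..m`) rather than the inclusive ones written above, and how the page size gets derived from the message size limit is left open.

```rust
use std::ops::Range;

/// Splits `total` topics into consecutive half-open ranges of at most
/// `page_size` entries each, in the order a client would request them.
pub fn page_ranges(total: usize, page_size: usize) -> Vec<Range<usize>> {
    assert!(page_size > 0, "page_size must be at least 1");
    (0..total)
        .step_by(page_size)
        .map(|start| start..start.saturating_add(page_size).min(total))
        .collect()
}
```

With a fixed maximum topic length, `page_size` could be a constant chosen so that a full page always fits in one frame.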

TODO: There MAY be a filter parameter of some kind. Simplest is simple text. Or, can this just be a meta-topic? I dunno.

#Sub

Subscribes to a topic

  • Topic: Channel, user presence, etc

#Unsub

  • Same as Sub, can also have an "all" topic which is basically "close connection"(?).

#Ping/pong

Used for latency/timeout testing. Either server or client may send a ping message, and the other must respond with a pong containing the same datetime as the original ping.

  • Time: A datetime in UTC

#Event

Something has happened on a topic, and a user is being notified about it. This may be a message, a presence change, etc.

  • From: ID
  • To: Topic
  • Payload: Text (Blob?)

#Catchup

A client asking for "events that happened on a topic that I've missed". The client can ask for actual timestamps, or they can give hashes, or they can ask for last N events.

A server does not have to support this. It is perfectly legit for the server to just say "I got nothin, sorry", though it MUST distinguish "I got nothing" from "there is nothing to get".

A more complicated system of negotiation of what the server actually has vs. what the client wants is possible, but out of scope.
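The "I got nothing" versus "there is nothing to get" distinction maps naturally onto an enum. This is only a sketch of the "last N events" case: `CatchupReply` and `last_n` are invented names, and events are plain strings to keep it short.

```rust
/// Possible answers to a Catchup request, in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CatchupReply {
    /// "I got nothin, sorry": the server keeps no history for this topic.
    Unavailable,
    /// History is kept. An empty list means "there is nothing to get".
    Events(Vec<String>),
}

/// Answers a "last N events" request. `history` is `None` when the server
/// does not store events for the topic at all.
pub fn last_n(history: Option<&[String]>, n: usize) -> CatchupReply {
    match history {
        None => CatchupReply::Unavailable,
        Some(events) => {
            let start = events.len().saturating_sub(n);
            CatchupReply::Events(events[start..].to_vec())
        }
    }
}
```

A transient, IRC-style server would always pass `None`; a logging server would pass its stored events for the topic.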

#Ident

A client sends this to identify themselves to a server.

A server also uses this to authenticate themself with a client???

There's some kind of authentication dance here that then happens, details are TODO. Basically a server has to present a challenge that the client responds to, and the server for the particular ID's domain then verifies. However the fuck it works, I'm sleepy.

#State machine

"C:Foo" means "client sends Foo message", "S:Bar" same with "server sends Bar".

  • CONNECT -> C:Ident -> S:Ident -> anything else
  • anything -> C:Bye -> DISCONNECT
  • anything -> S:Bye -> DISCONNECT
  • ERROR -> Bye (optional, may contain an error message) -> DISCONNECT

Most interactions are client-initiated:

  • C:Sub -> S:Meta
  • C:Unsub -> Nothing?
  • C:Message -> S:Event (echo to client, so the client knows it works)
  • C:Topics -> ???
  • C:Catchup -> ??? Nothing in particular? Just a bunch of events?
  • C:Ping -> S:Pong

Server can only really initiate a few things:

  • S:Ping -> C:Pong
  • S:Event -> nothing
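The transitions listed so far are small enough to write down as two functions. This is a sketch of one possible reading, not what the server does: `Kind`, `Phase`, `expected_reply` and `on_client_message` are invented names, the server's Ident reply is folded into the client's Ident step, and every "???" above is simply treated as "no defined reply".

```rust
/// Message kinds named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Ident, Bye, Sub, Unsub, Message, Topics, Catchup, Ping, Pong, Event, Meta,
}

/// Connection phases, as seen by the server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    AwaitingIdent,
    Open,
    Closed,
}

/// The single reply kind a request expects, where the lists above define one.
pub fn expected_reply(request: Kind) -> Option<Kind> {
    match request {
        Kind::Ident => Some(Kind::Ident),
        Kind::Sub => Some(Kind::Meta),
        Kind::Message => Some(Kind::Event),
        Kind::Ping => Some(Kind::Pong),
        // Unsub, Topics and Catchup replies are still open questions.
        // Bye ends the connection; Pong, Event and Meta are not requests.
        _ => None,
    }
}

/// Next phase after the client sends `kind`, or `None` if that message is
/// not allowed in the current phase (an assumption: the text does not say).
pub fn on_client_message(phase: Phase, kind: Kind) -> Option<Phase> {
    match (phase, kind) {
        (Phase::Closed, _) => None,
        (_, Kind::Bye) => Some(Phase::Closed),
        (Phase::AwaitingIdent, Kind::Ident) => Some(Phase::Open),
        (Phase::AwaitingIdent, _) => None,
        (
            Phase::Open,
            Kind::Sub | Kind::Unsub | Kind::Message | Kind::Topics
            | Kind::Catchup | Kind::Ping | Kind::Pong,
        ) => Some(Phase::Open),
        (Phase::Open, _) => None,
    }
}
```

Keeping `expected_reply` as a plain function of the request kind is the "one request, one valid reply" rule in code form.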

We don't really have an "ok" message, now do we. I'm not sure we need it; the protocol itself should be pretty stateless.

Things to consider:

  • C:SetMeta -> S:Meta
  • C:Topics -> Server needs to send multiple lists, and needs some kind of bound. So, client needs to get a count, then ask for a limited range of topics and get a chunk

#Meta-information

The protocol is only one thing, the fluff and infrastructure around it is also important.

Domains will use DNS SRV records to identify servers. The appropriate SRV record for example.com will be _scalar._tcp.example.com. TCP might eventually be QUIC as well, need to work on that and see how things shake out.

There is no canonical port. Ports are just defined in SRV records.
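Deriving the record name from an ID or channel is the easy part, sketched below. `srv_name` is a hypothetical helper; the actual DNS lookup needs a resolver library and is not shown, since the standard library cannot query SRV records.

```rust
/// Builds the SRV record name for the domain part of an ID or channel,
/// e.g. "icefox@example.com" -> "_scalar._tcp.example.com".
pub fn srv_name(address: &str) -> Option<String> {
    let (_, domain) = address.rsplit_once('@')?;
    if domain.is_empty() {
        return None;
    }
    Some(format!("_scalar._tcp.{}", domain))
}
```

The port then comes from whatever the SRV record says, in line with there being no canonical port.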

TODO: Currently all that is unimplemented :3

#Meta-topics

Things that aren't channels or user id's, but still absorb or provide information. Specified by a reserved ID. Often optional?

TODO: Note that TOPICS broadcast messages, while often what we are interested in with these is querying state. For example, "who are the ID's listening to this topic?" Think about this more.

#S2S thoughts

Need a S2S protocol too I guess. Proooooobably the same as the C2S one as much as possible. Clients can send messages directly to each other or ask servers to relay for them, basically. Servers can then relay to one client or to a set of clients.

Maybe we don't really need an S2S protocol since any client can generally talk to any server? Or do we? I dunno, if alice@foo.com subscribes to presence notifications for bob@bar.com then foo.com really doesn't need to talk to bar.com, alice's client just calls up bar.com and says "hey let me know when bob@bar.com is about". All the heavy lifting is done by alice's client, which seems fine. The only thing foo.com needs to be able to do is confirm alice's identity, maybe. Mayyyyyyybe it would be useful for a server to serve as a proxy or reconnector sometimes, if necessary? Log messages, if bar.com decides it doesn't want to log much stuff but alice@foo.com is interested in it?

In Mastodon, a user's home server aggregates and indexes events for you. There, events are global and persistent. Here I'm thinking you don't need it to bother, because you explicitly list the topics you're interested in.

...I'm an idiot. If a user on a.example.com publishes to a topic and a user on b.example.com is subscribed to it, messages need to get from one server to the other somehow. However, for now, defining it to be single server only is okay.

Another case to think about is direct messaging. If foo@example1.com wants to send a message to bar@example2.com, but they are not logged on to any of the same servers, then it is the job of the example1.com server to tell the example2.com server to forward the message. Currently this is out of scope but it should happen someday.

#Notes

Consider both QUIC and TCP encodings.

Ok, good idea, now make it actually minimal. The pubsub model is pretty interesting I gotta admit.

A user is an identity. A connection is an instance of that identity logged in from somewhere. Multiple connections can coexist. Pubs/subs are done by connection, not by user; basically, the client remembers your follow list, not the server. A server may store some configuration information for a client, if they both want it to, but that's kinda out of scope here.

A server is in charge of authenticating a user's identity. If alice@foo.com wants to join #topic@bar.com then bar.com calls up foo.com with some challenge or token that alice@foo.com's client can provide, and foo.com says whether or not it's legit.

Need to look more at IRC, XMPP, Matrix. Then pare it down way too far, because that will probably be just about far enough. SIMPLE, based on SIP, might also be worth looking at.

Transferring large binary data is kinda out of scope. It can be transferred in chunks, if people want. Easier method is to just put it somewhere shared, whether that's a web site or IPFS or whatever, and transmit an identifier. Streaming data, such as audio and video, is drastically out of scope. Maybe make it able to include a content-type for its data, though.

A client does need a way to ask a server for a list of events that happened since some time. That's the main thing it does, catch up its internal state to match the state of servers.

The federation problem can be expressed one way as: If a server becomes too big you need multiple servers grouped together logically. How do those servers share their state with each other? The answer is... I'm not sure it matters. You can have SRV records for multiple servers in whatever domain, or another load balancing mechanism. In IRC this is a problem because identity is unique to each network, but in this system identity is global and attached to a server, like Mastodon. But "federation" doesn't really need to happen, all communication is point to point, and frankly transient; any permanence is up to the user doing this.

So then there's two arguments for a server-to-server protocol: One, efficiency, because N users on two different servers can talk to each other more efficiently if the servers act as relays. I am not actually sure if that matters, so for now I should just leave it out. The second argument would be robustness, on the assumption that if we have alice@foo.com and bob@bar.com, foo.com and bar.com are more reliable than alice's and bob's individual clients. alice may trust foo.com to have better uptime than bar.com. A client talking to one server which is aggregating info for it may have better performance characteristics than a client talking to many servers.

Fuck it, let's just try it out.

Now that I think of it, a server really does two main things for a user that is on that server: Provide proof of authentication, and relay messages going to that user. There's some other things that the server MAY do for a user: store state/metadata, and maybe someday relay stuff if there's an S2S protocol.

#Things not to do YET.

#Meta-topics

  • !wall@domain: Broadcasts a message to all users and channels.
  • !motd@domain: Responds to a query/trigger of some kind with server-specific information set by the server software. May not be a topic, since it's not really subscribed to or published to, but rather a specific message type?

TODO: These are useful but also kinda redundant.

#S2S protocol

Right now I'm just saying that this is single-server-per-domain only.

#Prior art

Other chat protocols:

Inspiration:

  • Gemini: https://gemini.circumlunar.space/
  • oldschool Telnet-only talkers
  • IRC, except IRC's wire protocol is pretty awful and there's a lot of dysfunctional hairy nonsense about it.

#License

This document is in the public domain.

All source code in the rrg/ directory is licensed under GPL v2, just for the lulz. See the file rrg/LICENSE for details.