Scalar is an experimental CBOR-based minimalist chat protocol, intended to be to IRC/XMPP/Matrix something like what Gemini is to HTTP. Basically, take a complicated system (XMPP/Matrix) and make it simple, and/or take an old and simple system (IRC) and bring it up to date with a more modern design.
This is basically an attempt to see what something like this could look like. This README file is as much specification as exists, and there's a minimal, vaguely-incomplete-but-working server and client written in Rust. I don't actually know anything about production-grade chat protocols or network servers, so don't expect that, and my goal was to have something that Worked within a weekend or so.
I don't really intend to keep this maintained long term, as there are other things I have to do with my time that are more important to me. But hopefully this is universal/interesting/useful enough to serve as a place to start if someone else wants to do something similar.
Travels over TCP. QUIC would be nice someday maybe.
The protocol consists of messages, each with a fairly fixed layout and contents and a specific set of valid responses. Each message is prefixed with a two-byte little-endian length, and has a maximum size of 16 KB - 2 bytes. 16 KB is a fine size for text, generally, and also means that if we someday want to make it variable-sized we can extend it to LEB128. Fixed size small buffers are a nice design constraint that makes other things simpler and more robust.
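A minimal sketch of that framing, using only the standard library. It assumes "16 KB" means 16 * 1024 bytes for the whole frame, so the payload limit is that minus the two-byte prefix. The names `MAX_PAYLOAD`, `write_frame`, and `read_frame` are made up for illustration and are not taken from the actual server or client.

```rust
use std::io::{self, Read, Write};

/// Largest payload allowed: 16 KB minus the two-byte length prefix.
/// (Assumption: the 16 KB limit covers the prefix plus the payload.)
pub const MAX_PAYLOAD: usize = 16 * 1024 - 2;

/// Write one frame: a two-byte little-endian length, then the payload.
pub fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    // Safe cast: MAX_PAYLOAD is well below u16::MAX.
    let len = payload.len() as u16;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(payload)
}

/// Read one frame and return its payload (still CBOR-encoded bytes).
pub fn read_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 2];
    input.read_exact(&mut len_bytes)?;
    let len = u16::from_le_bytes(len_bytes) as usize;
    // Reject oversized frames before allocating a buffer for them.
    if len > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because the length is known before the body is read, the receiver can use one fixed-size buffer per connection instead of the `Vec` used here.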
Messages are encoded with CBOR. Specifics of message format are currently Undefined, but CBOR is pretty compact and pretty simple (ish) and pretty flexible. One can argue about the details but I don't see any huge downsides, just tradeoffs. One of the annoying things about CBOR is the variable-length encoding means it's hard to know what the heck you're doing with this until you've actually encoded your message. But in practice it's pretty workable. Good enough for experimentation, certainly.
The state machine for communication should hopefully never be more than one request+reply long. There should be something like two pieces of state per connection: what the connection's user ID is, and what topics it's subscribed to. Similarly, each request should only ever have one valid kind of reply.
The server stores basically nothing about the client when the client is not connected. It's up to the client to remember what topics it's interested in, etc.
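Those two pieces of per-connection state could look roughly like this. The `Connection` type and its method names are hypothetical, and the ID is kept as a plain string because the ID format is not pinned down yet.

```rust
use std::collections::HashSet;

/// Everything the server tracks for one live connection.
/// Dropped entirely when the connection closes.
#[derive(Debug, Default)]
pub struct Connection {
    /// None until the client has identified itself.
    pub user_id: Option<String>,
    /// Topics this connection has asked to receive.
    pub subscriptions: HashSet<String>,
}

impl Connection {
    /// Record the ID the client identified as.
    pub fn identify(&mut self, id: &str) {
        self.user_id = Some(id.to_string());
    }

    /// Returns true if the topic was newly added, false if already present.
    pub fn subscribe(&mut self, topic: &str) -> bool {
        self.subscriptions.insert(topic.to_string())
    }

    pub fn is_subscribed(&self, topic: &str) -> bool {
        self.subscriptions.contains(topic)
    }
}
```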
Unimplemented: Clients have an "ID server" that serves as a point of
authority for their authentication, and optionally provides other useful
features for the client such as storing metadata or forwarding messages
while the client is offline. The necessary part is basically, when a
user connects to
example.com and says "I am
here's an authentication token",
example.com connects to
example2.com and says "here's the auth token given to me by
email@example.com, is it legit?" and
example2.com says yea or nay.
example.com doesn't need to know anything at all about
firstname.lastname@example.org, it just needs to know whether
vaguely trustworthy. How to achieve that is still TBD, but a
fediverse/web-of-trust-like model may be doable.
The server and client work, to a very minimal degree. Basically a proof of concept with zero features or convenience. There's a lot of things that are not actually done yet, either in the spec or the implementation. These are:
cbor_serde decides to give us". Needs to improve but is Good Enough to experiment with for now.
Basically a pubsub model. You have clients and servers. Servers expose topics. Clients connect to a server and ask to subscribe to topics, or publish messages to topics. Servers broadcast messages to clients that are subscribed to the relevant topics.
A topic can be a chat room channel, a client ID, system metadata, or other such things. Basically it's just a location that messages can go to, or can be forwarded from.
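The pubsub core can be sketched as a map from topic name to the set of subscribed connections. This is an illustration under assumptions, not the code in the actual server: `ConnId` and `Broker` are made-up names, and a real server would map each `ConnId` to a socket.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical connection handle.
pub type ConnId = u32;

#[derive(Default)]
pub struct Broker {
    /// Topic name -> connections subscribed to it.
    subscribers: HashMap<String, BTreeSet<ConnId>>,
}

impl Broker {
    /// Subscribe a connection to a topic; repeating it is a no-op.
    pub fn subscribe(&mut self, conn: ConnId, topic: &str) {
        self.subscribers
            .entry(topic.to_string())
            .or_default()
            .insert(conn);
    }

    /// Drop every subscription held by a connection, e.g. on disconnect.
    pub fn disconnect(&mut self, conn: ConnId) {
        for subs in self.subscribers.values_mut() {
            subs.remove(&conn);
        }
    }

    /// Connections that should receive a message published to `topic`,
    /// in ascending order. Unknown topics simply have no recipients.
    pub fn recipients(&self, topic: &str) -> Vec<ConnId> {
        self.subscribers
            .get(topic)
            .map(|subs| subs.iter().copied().collect::<Vec<ConnId>>())
            .unwrap_or_default()
    }
}
```

Subscriptions are keyed by connection rather than by user, so nothing here needs to survive a disconnect.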
ID: A unique user. Basically an email address style
#email@example.com. Can we omit the
it's redundant? Maybe we shouldn't. What if you ask
subscribe you to
#firstname.lastname@example.org? It can't, obviously. So
is the domain ever not redundant?
!email@example.com. Can never be used for a username or
channel, and so a useful ID for system-generated messages, topics, etc.
Topic: Channel, ID, or other yet-to-be-defined things that label specific types of metadata associated with an ID or channel. Consider URL's? I... dunno about that. Though you may have metadata such as presence, direct messages etc. associated with an ID or channel? No, KISS.
Datetime: Datetimes are ALL, ALWAYS in the UTC time zone, and it's up to the client to translate them into a local time zone. The use of datetimes in message types is discouraged as they are very subjective, but is sometimes necessary or useful.
Text: All text is UTF-8 encoded. Invalid UTF-8 is an error.
Blob?: Now that I think of it, not sure there's anything terribly wrong with sticking small binary data in a message or such? CBOR can distinguish the two, though it's a bit of a pain. Consider it.
TODO: Topics (including ID's) may have metadata attached of some kind. That way topics and ID's can always be easy-to-parse-and-handle ASCII, but users of non-Latin languages can put Unicode in the metadata section and clients can display that as a presentation name instead of a machine-readable ID.
TODO: Topics/channels/ID's need a fixed maximum length, to make framing messages easier.
TODO: I'm not sure whether presence notifications are an actual goal, but they're actually really easy with a pubsub model, so they'll probably end up existing sooner or later.
Message types are as follows:
The general purpose "user saying something" message, sent from an ID to a topic and presumably re-broadcast by the server to other clients subscribed to that topic.
TODO: A server may refuse a message. For example, most people probably shouldn't be able to send messages to channels they're not subscribed to. But admins should be able to.
Optional fields to think about:
TODO: Optional content-type for the data?
Server replies with all the topics it knows about.
TODO: This needs to be defined better. The client needs to be able to ask the server how many topics it has, then ask for topics 0..n, n+1..m, m+1.. etc so that the server's replies always fit into the message size limit. This also becomes much simpler if topics have a fixed max length.
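One way the paging could work on the server side, as a rough sketch only. The message layout is undefined, so the per-topic cost below is a made-up estimate (`PER_TOPIC_OVERHEAD` is hypothetical), and `topic_page` is an invented name.

```rust
/// Hypothetical per-topic encoding overhead in bytes; the real value
/// depends on the still-undefined CBOR message layout.
const PER_TOPIC_OVERHEAD: usize = 3;

/// Return the topics starting at index `start` that fit in `budget` bytes,
/// plus the index the client should ask for next (None when finished).
///
/// Note: if a single topic is larger than `budget`, the page comes back
/// empty with `next` unchanged. A fixed maximum topic length rules that out.
pub fn topic_page(topics: &[String], start: usize, budget: usize) -> (&[String], Option<usize>) {
    let begin = start.min(topics.len());
    let mut end = begin;
    let mut used = 0;
    while end < topics.len() {
        let cost = topics[end].len() + PER_TOPIC_OVERHEAD;
        if used + cost > budget {
            break;
        }
        used += cost;
        end += 1;
    }
    let next = if end < topics.len() { Some(end) } else { None };
    (&topics[begin..end], next)
}
```

With a fixed maximum topic length the server could instead promise a constant number of topics per reply, and the client could compute every range up front from the topic count.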
TODO: There MAY be a filter parameter of some kind. Simplest is simple text. Or, can this just be a meta-topic? I dunno.
Subscribes to a topic.
Used for latency/timeout testing. Either server or client may send a ping message, and the other must respond with a pong containing the same datetime as the original ping.
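A sketch of the ping/pong rule. The message shapes are hypothetical, and the datetime is represented as milliseconds since the Unix epoch in UTC, which is an assumption; the real wire encoding is not defined yet.

```rust
/// Hypothetical message shapes, just enough to show the echo rule.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    /// `sent_at` is an assumed representation: UTC milliseconds since the epoch.
    Ping { sent_at: u64 },
    Pong { sent_at: u64 },
}

/// Build the reply to a ping: a pong carrying the same datetime.
/// Any other message gets no automatic reply here.
pub fn reply_to(msg: &Message) -> Option<Message> {
    match msg {
        Message::Ping { sent_at } => Some(Message::Pong { sent_at: *sent_at }),
        _ => None,
    }
}

/// Round-trip time in milliseconds, given a pong and the current time.
/// Returns None for non-pong messages or a pong dated in the future.
pub fn round_trip_ms(pong: &Message, now: u64) -> Option<u64> {
    match pong {
        Message::Pong { sent_at } => now.checked_sub(*sent_at),
        _ => None,
    }
}
```

Since the pong echoes the sender's own timestamp, the sender can measure latency with its own clock and never needs to trust the other side's.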
Something has happened on a topic, and a user is being notified about it. This may be a message, a presence change, etc.
A client asking for "events that happened on a topic that I've missed". The client can ask for actual timestamps, or they can give hashes, or they can ask for last N events.
A server does not have to support this. It is perfectly legit for the server to just say "I got nothin, sorry", though it MUST distinguish "I got nothing" from "there is nothing to get".
A more complicated system of negotiation of what the server actually has vs. what the client wants is possible, but out of scope.
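The "I got nothing" versus "there is nothing to get" distinction maps naturally onto a two-variant reply. This sketch covers only the "last N events" form of the request. `HistoryReply` and `last_n` are invented names, and events are plain strings here purely for illustration.

```rust
/// Hypothetical reply to a history request.
#[derive(Debug, PartialEq)]
pub enum HistoryReply {
    /// "I got nothing": the server keeps no history for this topic.
    Unavailable,
    /// The server does keep history. An empty list means
    /// "there is nothing to get".
    Events(Vec<String>),
}

/// Answer a "last N events" request. `log` is None when the server
/// does not store history for the topic at all.
pub fn last_n(log: Option<&[String]>, n: usize) -> HistoryReply {
    match log {
        None => HistoryReply::Unavailable,
        Some(events) => {
            // saturating_sub keeps the start at 0 when n exceeds the log length.
            let start = events.len().saturating_sub(n);
            HistoryReply::Events(events[start..].to_vec())
        }
    }
}
```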
A client sends this to identify themselves to a server.
A server also uses this to authenticate itself to a client???
There's some kind of authentication dance here that then happens, details are TODO. Basically a server has to present a challenge that the client responds to, and the server for the particular ID's domain then verifies. However the fuck it works, I'm sleepy.
"C:Foo" means "client sends Foo message", "S:Bar" same with "server sends Bar".
Most interactions are client-initiated:
Server can only really initiate a few things:
We don't really have an "ok" message, now do we. I'm not sure we need it, the protocol itself should be pretty stateless.
Things to consider:
The protocol is only one thing, the fluff and infrastructure around it is also important.
Domains will use DNS SRV records to identify servers. The appropriate
SRV record for
example.com will be
might eventually be QUIC as well, need to work on that and see how things
There is no canonical port. Ports are just defined in SRV records.
TODO: Currently all that is unimplemented :3
Things that aren't channels or user id's, but still absorb or provide information. Specified by a reserved ID. Often optional?
TODO: Note that TOPICS broadcast messages, while often what we are interested in with these is querying state. For example, "who are the ID's listening to this topic?" Think about this more.
Need a S2S protocol too I guess. Proooooobably the same as the C2S one as much as possible. Clients can send messages directly to each other or ask servers to relay for them, basically. Servers can then relay to one client or to a set of clients.
Maybe we don't really need an S2S protocol since any client can generally talk to any server? Or do we? I dunno, if firstname.lastname@example.org subscribes to presence notifications for email@example.com then foo.com really doesn't need to talk to bar.com, alice's client just calls up bar.com and says "hey let me know when firstname.lastname@example.org is about". All the heavy lifting is done by alice's client, which seems fine. The only thing foo.com needs to be able to do is confirm alice's identity, maybe. Mayyyyyyybe it would be useful for a server to serve as a proxy or reconnector sometimes, if necessary? Log messages, if bar.com decides it doesn't want to log much stuff but email@example.com is interested in it?
In Mastodon, a user's home server aggregates and indexes events for you. There, events are global and persistent. Here I'm thinking you don't need it to bother, because you explicitly list the topics you're interested in.
...I'm an idiot. If a user on
a.example.com publishes to a topic and
a user on
b.example.com is subscribed to it, messages need to get from
one server to the other somehow. However, for now, defining it to be
single server only is okay.
Another case to think about is direct messaging. If
wants to send a message to
firstname.lastname@example.org, but they are not logged
on to any of the same servers, then it is the job of the
server to tell the
example2.com server to forward the message.
Currently this is out of scope but it should happen someday.
Consider both QUIC and TCP encodings.
Ok, good idea, now make it actually minimal. The pubsub model is pretty interesting I gotta admit.
A user is an identity. A connection is an instance of that identity logged in from somewhere. Multiple connections can coexist. Pubs/subs are done by connection, not by user; basically, the client remembers your follow list, not the server. A server may store some configuration information for a client, if they both want it to, but that's kinda out of scope here.
A server is in charge of authenticating a user's identity. If
email@example.com wants to join
bar.com calls up
foo.com with some challenge or token that
firstname.lastname@example.org's client can
foo.com says whether or not it's legit.
Need to look more at IRC, XMPP, Matrix. Then pare it down way too far, because that will probably be just about far enough. SIMPLE, based on SIP, might also be worth looking at.
Transferring large binary data is kinda out of scope. It can be transferred in chunks, if people want. Easier method is to just put it somewhere shared, whether that's a web site or IPFS or whatever, and transmit an identifier. Streaming data, such as audio and video, is drastically out of scope. Maybe make it able to include a content-type for its data, though.
A client does need a way to ask a server for a list of events that happened since some time. That's the main thing it does, catch up its internal state to match the state of servers.
The federation problem can be expressed one way as: If a server becomes too big you need multiple servers grouped together logically. How do those servers share their state with each other? The answer is... I'm not sure it matters. You can have SRV records for multiple servers in whatever domain, or another load balancing mechanism. In IRC this is a problem because identity is unique to each network, but in this system identity is global and attached to a server, like Mastodon. But "federation" doesn't really need to happen, all communication is point to point, and frankly transient; any permanence is up to the user doing this.
So then there's two arguments for a server-to-server protocol: One, efficiency, because N users on two different servers can talk to each other more efficiently if the servers act as relays. I am not actually sure if that matters, so for now I should just leave it out. The second argument would be robustness, on the assumption that if we have email@example.com and firstname.lastname@example.org, foo.com and bar.com are more reliable than alice's and bob's individual clients. alice may trust foo.com to have better uptime than bar.com. A client talking to one server which is aggregating info for it may have better performance characteristics than a client talking to many servers.
Fuck it, let's just try it out.
Now that I think of it, a server really does two main things for a user that is on that server: Provide proof of authentication, and relay messages going to that user. There's some other things that the server MAY do for a user: store state/metadata, and maybe someday relay stuff if there's an S2S protocol.
!wall@domain: Broadcasts a message to all users and channels.
!motd@domain: Responds to a query/trigger of some kind with server-specific information set by the server software. May not be a topic, since it's not really subscribed to or published to, but rather a specific message type?
TODO: These are useful but also kinda redundant.
Right now I'm just saying that this is single-server-per-domain only.
Other chat protocols:
This document is in the public domain.
All source code in the
rrg/ directory is licensed under GPL v2, just
for the lulz. See the file
rrg/LICENSE for details.