It is an orchestration system based on Redis that consists of zero or more agents and one base station. The base station can issue commands to agents to be executed in parallel using the Redis pub-sub mechanism. The base station is started for each command and terminates once the output has been collected.
The main purpose of this system is to trigger a reconfiguration on multiple servers using another configuration management system, but any command can be implemented using shell scripts or any programming language.
Redis is used as a communication hub where agents will subscribe for commands from the base station and reply with the output of the command back to a Redis list for the base station to consume and display. Redis has to be reachable by the base station and all agents, but there is no need for the base station or agents to be able to communicate directly with each other.
Commands that the base station may request to be executed are any executable (or script) located in the run-directory. The standard output and error streams are collected and provided to the base station via Redis.
For security reasons the script will only get the identity of the agent as the first argument - no custom arguments can be passed from the base station.
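The agent-side behaviour described above can be sketched roughly as follows. This is only an illustration, not the project's actual implementation: the function name `run_command` and the rule of rejecting any name that contains a path component are assumptions made for this example.

```python
import os
import subprocess
from pathlib import Path


def run_command(run_dir: str, name: str, identity: str) -> tuple[int, bytes, bytes]:
    """Run the executable `name` from `run_dir` with the agent identity only."""
    # Assumption: a command is a plain file name, never a path.
    if name in {"", ".", ".."} or Path(name).name != name:
        raise ValueError(f"invalid command name: {name!r}")
    target = Path(run_dir) / name
    if not (target.is_file() and os.access(target, os.X_OK)):
        raise ValueError(f"not an executable in the run-directory: {name!r}")
    # The identity is the first and only argument; no custom arguments.
    proc = subprocess.run([str(target), identity], capture_output=True, check=False)
    return proc.returncode, proc.stdout, proc.stderr
```

Rejecting names with path components keeps a request such as `../something` from reaching files outside the run-directory.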
The security model is designed with the requirement that there is no handshake done during runtime and all cryptographic keys are pre-exchanged out-of-band using a different system. Also, agents are treated as a single communication end-point for the commands, to avoid the need for per-agent encryption or prior knowledge of the presence of the agents on the network. This allows the agents to freely join and leave the network.
Communication between the base station and agents is encrypted end-to-end and based on the libsodium crypto box - Redis only transports encrypted data.
There are two static key pairs used:
The public base station key is used to verify signatures on the commands and therefore authenticate and authorize its execution. The public network key is used for encryption of the commands and authentication of the agents. The authentication is based on the ability of the agents to decrypt the request.
The base station will generate a per-session (command request/agent response) ephemeral key pair that is used to derive a shared secret used to encrypt the data for the agents with the use of the network public key (libsodium crypto box with one ephemeral key pair). The public part of this session key is used to encrypt messages back to the base station. To ensure that the responses come from agents that have access to the network private key, a random challenge is exchanged as part of the command and reply and verified by the base station.
Additionally, to prevent replay and reordering attacks, a sequence numbering scheme is used for command and response messages.
The integrity is provided by libsodium crypto box authenticated encryption (all messages) as well as the base station signature (commands only).
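The challenge and sequence-number checks described above could look roughly like the sketch below. It deliberately leaves out the encryption and signature layers and only shows the bookkeeping on the base station side. The class name `ReplyChecker`, the 32-byte challenge size, and the per-agent counter starting at 0 are assumptions made for illustration; the actual message layout is not described here.

```python
import hmac
import secrets


class ReplyChecker:
    """One session: a random challenge plus per-agent sequence numbers."""

    def __init__(self) -> None:
        # Hypothetical size; the challenge is sent inside the command.
        self.challenge = secrets.token_bytes(32)
        # Agent identity -> next expected sequence number (assumed to start at 0).
        self._next_seq: dict[str, int] = {}

    def accept(self, agent: str, challenge: bytes, seq: int) -> bool:
        # Compare the echoed challenge in constant time.
        if not hmac.compare_digest(challenge, self.challenge):
            return False
        expected = self._next_seq.get(agent, 0)
        # Anything but the next number is a replay, a gap, or a reorder.
        if seq != expected:
            return False
        self._next_seq[agent] = expected + 1
        return True
```

A reply that repeats an already accepted sequence number, skips ahead, or carries a different challenge is rejected.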
There are some limitations to this scheme:
Warning: Due to the atypical requirements the system uses a custom cryptographic design that has not been designed or audited by a professional cryptographer! The above-stated assumptions and guarantees may not hold in theory or may be implemented incorrectly!
Use cb-keygen to generate two pairs of keys and distribute these keys using another system.
The cb command can then be used to set up the agents using the agent subcommand and to send commands using
The connection string for Redis can be specified with the
Each party should have a unique identifier string assigned with the --identity option. Also, all parties must use the --channel option with the same string to be able to communicate.
Agents will run executables from their working directory unless another directory is set with the
Additionally, agents may be assigned zero or more tags with the --tags option. One or more of these tags can be specified when issuing commands to limit which agents will execute them. All tags specified on the command must be present on the agent for it to execute the command. If no tags are specified for the command, all agents will execute it.
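The tag matching rule above amounts to a subset test. A minimal sketch, with the hypothetical function name `should_execute`:

```python
def should_execute(agent_tags: set[str], command_tags: set[str]) -> bool:
    """True when every tag on the command is also present on the agent."""
    # An empty set is a subset of any set, so untagged commands run everywhere.
    return command_tags <= agent_tags


if __name__ == "__main__":
    agent = {"web", "eu"}
    print(should_execute(agent, {"web"}))        # True
    print(should_execute(agent, {"web", "us"}))  # False
    print(should_execute(agent, set()))          # True
```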
The base station will keep track of the agents it has previously seen responses from (unless the --no-discovery switch is specified) and record the encounters in the discovery.toml file located in the default application cache directory. Newly discovered agents or missing agents will be logged as part of the error output stream and reported at the end of the run. If the --fail-missing flag is set, the process will exit with an error if any previously seen agents have not replied.
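The comparison behind this report can be pictured as two set differences. The sketch below works on plain sets of agent identities and does not model the discovery.toml format, which is not described here; the function name `compare_agents` is an assumption.

```python
def compare_agents(
    previously_seen: set[str], replied: set[str]
) -> tuple[set[str], set[str]]:
    """Return (newly discovered, missing) agent identities for one run."""
    new = replied - previously_seen
    missing = previously_seen - replied
    return new, missing


if __name__ == "__main__":
    new, missing = compare_agents({"a1", "a2"}, {"a2", "a3"})
    print(sorted(new))      # ['a3']
    print(sorted(missing))  # ['a1']
```

With --fail-missing set, a non-empty set of missing agents would be what turns the run into an error.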
For more details, use the --help switch on cb-keygen commands and their subcommands.
There are three configurable wait duration settings. They control for how long (in seconds) the station will wait for responses from agents during different stages of processing:
-H, --hello-wait - duration in seconds to wait for the first reply (0 = infinity); default: 2 seconds,
-R, --reply-wait - duration in seconds to wait for the next reply when some agents are active (0 = infinity); default: 120 seconds,
-M, --minimum-wait - minimum duration in seconds to wait for replies after sending the request (0 = 1); default: 4 seconds.
The hello wait duration will expire if no agents reply. This can happen when there are no agents on the channel, no agents match the requested tags, or the agents are busy.
The reply wait duration is used to extend the waiting period after some agents have replied and are running a command. This wait duration will expire if no agent reports any command output or final status message for its duration. It may need to be extended for commands that have long periods when no output is produced.
The minimum wait duration is used to give other agents enough time to send their first reply after all active agents have already completed their commands.
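One possible reading of how the three durations interact is sketched below. The actual state machine may differ: the function `keep_waiting`, its parameters, and the order of the checks are assumptions made for illustration. Only the defaults and the special meaning of 0 come from the option descriptions above.

```python
from typing import Optional


def keep_waiting(
    elapsed: float,                     # seconds since the request was sent
    since_last_reply: Optional[float],  # None until the first reply arrives
    active_agents: int,                 # agents that replied and are still running
    hello_wait: float = 2.0,
    reply_wait: float = 120.0,
    minimum_wait: float = 4.0,
) -> bool:
    """Decide whether the station should keep waiting for replies."""
    if since_last_reply is None:
        # No reply yet: the hello wait applies (0 = wait forever).
        return hello_wait == 0 or elapsed < hello_wait
    if active_agents > 0:
        # Some agents are running: the reply wait applies (0 = wait forever).
        return reply_wait == 0 or since_last_reply < reply_wait
    # All known agents finished: still allow the minimum wait to pass,
    # treating 0 as 1 second as the option description states.
    minimum = minimum_wait if minimum_wait > 0 else 1.0
    return elapsed < minimum
```

For example, with the defaults a run in which a single agent replies and finishes after one second would still wait until four seconds have passed, so that slower agents can send their first reply.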
Commands will exit with status 1 on setup errors.
The base station will exit with different status codes depending on problems encountered during the runtime.
Runtime problems will add a value to the return code so that the list of problems can be extracted from it using bitwise operations:
+2 - if the --fail-missing switch is used and missing agents were detected,
+4 - command execution-related errors were detected (non-zero exit status, abort),
+8 - station reply wait duration expired (see
+16 - agent-related errors were detected (e.g. bad command).
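A calling script can recover the list of problems from the exit status with bitwise tests. The sketch below uses the values listed above; the function name `decode_status` and the wording of the labels are assumptions, and a status of exactly 1 is treated as the setup error case.

```python
# Values taken from the list of runtime problems; labels are illustrative.
RUNTIME_FLAGS = {
    2: "missing agents (--fail-missing)",
    4: "command execution errors",
    8: "reply wait expired",
    16: "agent-related errors",
}


def decode_status(status: int) -> list[str]:
    """Translate a base station exit status into a list of problems."""
    if status == 1:
        return ["setup error"]
    return [label for bit, label in RUNTIME_FLAGS.items() if status & bit]


if __name__ == "__main__":
    print(decode_status(12))  # ['command execution errors', 'reply wait expired']
```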