= karajo
Shulhan <ms@kilabit.info>
14 June 2022
:toc:
:sectanchors:
:sectlinks:

Module karajo implements HTTP workers and a manager, similar to cron but
working only over HTTP.

karajo has a web user interface (WUI) for monitoring the jobs, served at
http://127.0.0.1:31937/karajo by default; the address is configurable.

A single instance of karajo is configured through code or a configuration file
using ini file format.

Features,

* Run jobs at a specific interval
* Preserve the job states on restart
* Pause and resume a specific job
* Receive hooks and run commands
* HTTP APIs to programmatically interact with karajo

Workflow in karajo,

----
                +-----+      +-------+
  INTERNET <--- | Job | ---> | Hook  | <--- INTERNET
                +-----+      +-------+
                                 |
                                 |
                                 v
                        +-----------------+
                        | Commands / Call |
                        +-----------------+
----

[#config]
== Configuration

This section describes the file format used when loading the karajo
environment from a file.

There are three configuration sections: one to configure the server, one to
configure the hooks, and another one to configure one or more jobs to be
executed.

Default values,

* `DefaultMaxRequests`: 1

[#config_server]
=== Server

This section has the following format,

----
[karajo]
name = <string>
listen_address = [<ip>:<port>]
http_timeout = [<duration>]
dir_base = <path>
dir_public = <path>
secret = <string>
max_hook_running = <number>
----

`name`:: Define the name of the service.
+
--
It is used as the title of the web user interface, as the log prefix, as the
file prefix for the job states, and as the file prefix for log files.
If this value is empty, it will be set to "karajo".
--

`listen_address`:: Define the address for the WUI, default to ":31937".

`http_timeout`:: Define the HTTP timeout when executing the job, default to 5
minutes.
The value of this option uses the Go time.Duration format, for example,
30s for 30 seconds, 1m for 1 minute.

`dir_base`:: Define the base directory where the configuration, job states,
and logs are stored.
+
--
This field is optional, default to the current directory.
The directory structure follows the UNIX layout,

----
$DirBase
|
+-- /etc/karajo/karajo.conf
|
+-- /var/lib/karajo/hook/$Job.ID
|
+-- /var/log/karajo +-- /hook/$Hook.ID
|                   |
|                   +-- /job/$Job.ID
|
+-- /var/run/karajo +-- /job/$Job.ID
----

Each job log is stored under the directory /var/log/karajo/job and the job
state under the directory /var/run/karajo/job.
--

`dir_public`:: Define a path to serve to the public.
+
--
While the WUI is served under "/karajo", the directory dir_public is served
under "/".
The dir_public can contain sub directories as long as none of them is named
"karajo".
--

`secret`:: Define the secret to authorize the incoming request through
signature.
+
--
Each request signs the payload (query string or body) with HMAC + SHA-256
using this secret.
The signature is then sent in the HTTP header "x-karajo-sign" as hex.
This field is required.
--

`max_hook_running`:: Define the maximum number of hooks running at the same
time.
This field is optional, default to 1.
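
The signing described for the `secret` option can be sketched in Go as
follows.
This is a minimal sketch, not the karajo implementation: the function name
`signPayload` and the sample secret and payload are made up for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signPayload returns the hex-encoded HMAC-SHA256 of payload, keyed with
// secret. The name is hypothetical; only the algorithm comes from the text.
func signPayload(secret string, payload []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// Hypothetical secret and query string, for illustration only.
	sign := signPayload("s3cret", []byte("id=my-job"))
	fmt.Println("x-karajo-sign:", sign)
}
```

A client would send the returned string as the value of the "x-karajo-sign"
header.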


[#config_hook]
=== Hook

A hook is an HTTP endpoint that runs a function or a list of commands upon
receiving a request.

The hook configuration has the following format,

----
[hook "name"]
description = <string>
path = <string>
header_sign = <string>
secret = <string>
log_retention = <number>
command = <string>
...
command = <string>
----

`name`:: Define the hook name.
The hook name is used for logging, normalized to ID.
This field is required and must be unique among hooks.

`description`:: Define the hook description.
It could be plain text or simple HTML.

`path`:: Set the HTTP path where karajo will listen for requests.
+
--
The `path` is automatically prefixed with "/karajo/hook"; it is not
static.
For example, if it is set to "/my", then the actual path would be
"/karajo/hook/my".
This field is required and must be unique among hooks.
--

`header_sign`:: Define a custom HTTP header from which the signature is read.
Default to "x-karajo-sign" if it is empty.
+
--
For example, to receive WebHook from GitHub, one can set this value to
"X-Hub-Signature-256".
--


`secret`:: Define a string to check the signature of the request.
+
--
Each request signs the body with HMAC + SHA-256 using this secret.
The signature is then sent as hex in the HTTP header "x-karajo-sign", or in
the header named by `header_sign` if it is set.
This field is required.
--

`log_retention`:: Define the maximum number of logs to keep in storage.
This field is optional, default to 5.

`command`:: This option can be defined multiple times.
Each one contains a command to be executed, in order from top to bottom.
+
--
The following environment variables are available inside the command:

* KARAJO_HOOK_COUNTER: contains the current hook counter.
--
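
The signature check described for the hook `secret` can be sketched in Go as
follows.
This is only an illustration of verifying a hex-encoded HMAC-SHA256 over the
request body, not the karajo implementation: the function name `verifySign`,
the sample secret, and the sample body are assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifySign reports whether signHex is the hex-encoded HMAC-SHA256 of body
// computed with secret. The name and signature are hypothetical.
func verifySign(secret string, body []byte, signHex string) bool {
	got, err := hex.DecodeString(signHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	// hmac.Equal compares the two MACs in constant time.
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	// Hypothetical secret and request body.
	secret := "s3cret"
	body := []byte(`{"ref":"refs/heads/main"}`)

	// Compute the signature the way a sender would.
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	sign := hex.EncodeToString(mac.Sum(nil))

	fmt.Println("valid body:", verifySign(secret, body, sign))
	fmt.Println("tampered body:", verifySign(secret, []byte("tampered"), sign))
}
```

The value of `signHex` would be read from the "x-karajo-sign" header, or from
the header named by `header_sign`.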


[#config_job]
=== Job

This section has the following format,

----
[job "name"]
description = <string>
secret = <string>
interval = <duration>
max_requests = <number>

http_method = [GET|POST|PUT|DELETE]
http_url = <URL>
http_request_type = [query|form|json]
http_header = <string ":" string>
http_timeout = <duration>
http_insecure = <bool>
----

`name`:: Define the job name.
Each job must have a unique name or only the first one will be processed.

`description`:: Define the job description.
It could be plain text or simple HTML.

`secret`:: Define a string to sign the request query or body with
HMAC+SHA-256.
The signature is sent in the HTTP header "x-karajo-sign" as a hex string.
This field is optional.

`interval`:: Define the interval at which the job is repeatedly executed.
+
--
This field is required; if it is not set or is invalid, it is set to 30
seconds.
If a job needs to run more often than every 30 seconds, it should be run as
a separate program.
--

`max_requests`:: Define the maximum number of requests executed by karajo.
This field is optional, default to DefaultMaxRequests.

`http_method`:: Define the HTTP method used to request job execution.
It accepts only GET, POST, PUT, or DELETE.
This field is optional, default to GET.

`http_url`:: Define the HTTP URL where the job will be executed.
This field is required.

`http_request_type`:: Define the header Content-Type to be set on the
request.
+
--
It accepts,

* query: no header Content-Type to be set, reserved for future use;
* form: header Content-Type set to "application/x-www-form-urlencoded";
* json: header Content-Type set to "application/json".

The types "form" and "json" are only applicable if the method is POST or PUT.
This field is optional, default to query.

Each job execution sends a parameter named `_karajo_epoch` whose value is the
current server Unix time.
If the request type is `query` then the parameter is inside the query URL.
If the request type is `form` then the parameter is inside the body.
If the request type is `json` then the parameter is inside the body as JSON
object, for example `{"_karajo_epoch":1656750073}`.
--

`http_header`:: Define optional HTTP headers that will be sent when executing
the "http_url".
This option can be declared more than once.

`http_timeout`:: Define the HTTP timeout when executing the job.
+
--
If it is zero, it is set from Environment.HttpTimeout.
To make job run without timeout, set the value to negative.
The value of this option is using the Go time.Duration format, for example,
30s for 30 seconds, 1m for 1 minute, 1h for 1 hour.
--

`http_insecure`:: Can be set to true if the "http_url" is HTTPS with unknown
certificate authority.
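
The placement of the `_karajo_epoch` parameter for each `http_request_type`
can be sketched in Go as follows.
This is a minimal sketch under stated assumptions: the function
`epochPayload` is hypothetical, it ignores the HTTP method, and it is not
taken from the karajo source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// epochPayload returns where the `_karajo_epoch` parameter would be placed
// for a given request type: in the query string or in the body.
// The function name and signature are hypothetical.
func epochPayload(requestType string, epoch int64) (query string, body []byte, err error) {
	vals := url.Values{}
	vals.Set("_karajo_epoch", strconv.FormatInt(epoch, 10))

	switch requestType {
	case "", "query": // "query" is the documented default.
		query = vals.Encode()
	case "form":
		body = []byte(vals.Encode())
	case "json":
		body, err = json.Marshal(map[string]int64{"_karajo_epoch": epoch})
	default:
		err = fmt.Errorf("unknown request type %q", requestType)
	}
	return query, body, err
}

func main() {
	for _, rtype := range []string{"query", "form", "json"} {
		query, body, err := epochPayload(rtype, 1656750073)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: query=%q body=%q\n", rtype, query, body)
	}
}
```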


[#http_api]
== HTTP APIs

The karajo service is an HTTP server.
It provides HTTP APIs to interact with the system.
The following sub-sections describe the request and response of each HTTP
API.

All HTTP responses are encoded in the JSON format, with the following wrapper,

----
{
        "code": <number>,
        "message": <string>,
        "data": <array|object>
}
----

* `code`: the response code, equal to the HTTP status code.
* `message`: the error message that describes why the request failed.
* `data`: the dynamic data, specific to each endpoint.
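
A client can decode this wrapper before looking at the endpoint-specific
`data`.
The following Go sketch shows one way to do it; the type `response`, the
function `parseResponse`, the sample message, and the choice to treat any
code of 400 or above as an error are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response mirrors the JSON wrapper. Data is kept raw so that each endpoint
// can decode it into its own schema later.
type response struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data"`
}

// parseResponse decodes the wrapper and returns an error for codes >= 400.
// Treating codes >= 400 as failure is an assumption of this sketch.
func parseResponse(raw []byte) (*response, error) {
	res := &response{}
	if err := json.Unmarshal(raw, res); err != nil {
		return nil, err
	}
	if res.Code >= 400 {
		return res, fmt.Errorf("karajo: %d: %s", res.Code, res.Message)
	}
	return res, nil
}

func main() {
	// Hypothetical response body for a failed request.
	raw := []byte(`{"code":400,"message":"invalid or empty job ID"}`)
	res, err := parseResponse(raw)
	fmt.Println(res.Code, err)
}
```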

[#http_api_schemas]
=== Schemas

[#schema_environment]
==== Environment

JSON format,

----
{
	"Hooks": {<Hook.Name>: <Hook>, ...},
	"Jobs": {<Job.Name>: <Job>, ...},

	"Name": <string>,
	"ListenAddress": <string>,
	"DirBase": <string>,
	"DirPublic": <string>,

	"HttpTimeout": <number>,
	"IsDevelopment": <boolean>
}
----

* `Hooks`: map of Hook name to Hook.
* `Jobs`: map of Job name to Job.

* `Name`: the karajo server name.
* `ListenAddress`: the address where karajo HTTP server listening for request.
* `DirBase`: The path to directory used as working directory.
* `DirPublic`: The path to directory served to public.

* `HttpTimeout`: default HTTP timeout for jobs, in nanoseconds.
* `IsDevelopment`: true if the current karajo server runs for testing.
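
The configuration file takes durations as Go duration strings, while the JSON
schema reports them in nanoseconds.
The following sketch converts between the two using the standard `time`
package; the helper names `toNanos` and `fromNanos` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// toNanos parses a duration string as written in the configuration file
// (for example "30s" or "5m") and returns it in nanoseconds.
func toNanos(s string) (int64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Nanoseconds(), nil
}

// fromNanos formats a nanosecond count back into a Go duration string.
func fromNanos(n int64) string {
	return time.Duration(n).String()
}

func main() {
	n, err := toNanos("5m")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, fromNanos(n))
}
```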


[#schema_hook]
==== Hook

JSON format,

----
{
	"ID": <string>,
	"Name": <string>,
	"Description": <string>,
	"Path": <string>,
	"LogRetention": <number>,
	"LastStatus": <"success"|"fail">,
	"Commands": [<string>, ...],
	"Logs": [<HookLog>, ...]
}
----


[#schema_hooklog]
==== HookLog

JSON format,

----
{
	"HookID": <string>,
	"Name": <string>,
	"Status": <string>,
	"Content": <base64>,
	"Counter": <number>
}
----

* `HookID`: the ID of hook that own the log.
* `Name`: the name of the log, in the format `HookID.Counter.Status`.
* `Status`: the status of the hook, it is either "success" or "fail".
* `Content`: the content of log.
* `Counter`: the log number.
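
A client written in Go can decode a HookLog as sketched below.
The struct `hookLog` and both helper functions are hypothetical and the
sample values are made up; the sketch relies on `encoding/json` decoding a
base64 string into a `[]byte` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hookLog mirrors the HookLog schema. encoding/json decodes the base64
// string in "Content" into the []byte field automatically.
type hookLog struct {
	HookID  string
	Name    string
	Status  string
	Content []byte
	Counter int64
}

// parseHookLog decodes one HookLog object from JSON.
func parseHookLog(raw []byte) (*hookLog, error) {
	hlog := &hookLog{}
	if err := json.Unmarshal(raw, hlog); err != nil {
		return nil, err
	}
	return hlog, nil
}

// logName builds a log name in the format HookID.Counter.Status.
func logName(hookID string, counter int64, status string) string {
	return fmt.Sprintf("%s.%d.%s", hookID, counter, status)
}

func main() {
	// Hypothetical HookLog; "aGVsbG8=" is the base64 form of "hello".
	raw := []byte(`{"HookID":"my","Name":"my.1.success","Status":"success","Content":"aGVsbG8=","Counter":1}`)
	hlog, err := parseHookLog(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", hlog.Name, hlog.Content)
}
```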


[#http_api_schema_job]
==== Job

JSON format,

----
{
	"LastRun": <string>,
	"Status": <string>,
	"NextRun": <string>,
	"Log": [<string>, ...],

	"ID": <string>,
	"Name": <string>,
	"Description": <string>,

	"HttpMethod": <string>,
	"HttpUrl": <string>,
	"HttpRequestType": <string>,
	"HttpHeaders": [<string>],
	"HttpTimeout": <number>,

	"Interval": <number>,
	"MaxRequests": <number>,
	"NumRequests": <number>,
	"HttpInsecure": <boolean>
}
----

* `LastRun`: date and time when the job last ran, in the format RFC3339.
* `Status`: status of the last job run, it is either "started", "success",
  "failed", or "paused".
* `NextRun`: date and time when the next job will be executed, in the format
  RFC3339.
* `Log`: job logs as array.

* `ID`: unique job ID
* `Name`: human representation of job name.
* `Description`: job description, can be HTML.

* `HttpMethod`: the HTTP method used to invoke the job.
* `HttpUrl`: the URL where the job will be executed.
* `HttpRequestType`: the request type for HTTP.
* `HttpHeaders`: list of strings, in the HTTP header format "Key: Value",
  which will be sent when invoking the job at `HttpUrl`.
* `HttpTimeout`: number of nanoseconds after which the job is considered
  timed out.

* `Interval`: the period, in nanoseconds, at which the job is executed.
* `MaxRequests`: maximum number of job requests that can run at a time.
* `NumRequests`: current number of running job requests.
* `HttpInsecure`: true if the HttpUrl uses the HTTPS scheme with a self-signed
  certificate.
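
Because the time fields use RFC3339 and the duration fields use nanoseconds,
a Job can be decoded in Go straight into `time.Time` and `time.Duration`.
The following is a sketch only: the struct `job` is hypothetical, it covers
just a subset of the fields, and the sample values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// job holds a subset of the Job schema. time.Time decodes RFC3339 strings
// and time.Duration decodes a JSON number as nanoseconds.
type job struct {
	ID          string
	Status      string
	LastRun     time.Time
	NextRun     time.Time
	Interval    time.Duration
	HttpTimeout time.Duration
}

// parseJob decodes one Job object from JSON.
func parseJob(raw []byte) (*job, error) {
	j := &job{}
	if err := json.Unmarshal(raw, j); err != nil {
		return nil, err
	}
	return j, nil
}

func main() {
	// Hypothetical Job with a 30 seconds interval and 5 minutes timeout.
	raw := []byte(`{
		"ID": "my-job",
		"Status": "started",
		"LastRun": "2022-07-02T08:21:13Z",
		"NextRun": "2022-07-02T08:21:43Z",
		"Interval": 30000000000,
		"HttpTimeout": 300000000000
	}`)
	j, err := parseJob(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(j.ID, j.Status, j.Interval, j.HttpTimeout)
	fmt.Println("next run in", j.NextRun.Sub(j.LastRun), "after last run")
}
```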


[#http_api_environment]
=== Get environment

Get the current karajo environment.

**Request**

----
GET /karajo/api/environment
----

**Response**

On success, it will return the Environment object,

----
{
	"code": 200,
	"data": <Environment>
}
----

[#http_api_hook_log]
=== Get hook log

HTTP API to get the hook log by its ID and counter.

**Request**

----
GET /karajo/api/hook/log?id=<hookID>&counter=<logCounter>
----

Parameters,

* `hookID`: the hook ID
* `logCounter`: the log number.

**Response**

On success, it will return the
link:#schema_hooklog[HookLog]
object as JSON.


[#http_api_job]
=== Get job detail

HTTP API to get specific job information by its ID.

**Request**

----
GET /karajo/api/job?id=<string>
----

Parameters,

* `id`: the job ID.

**Response**

On success, it will return the Job schema.

On fail, it will return

* `400`: for invalid or empty job ID


[#http_api_job_log]
=== Get job logs

Get the last logs from specific job by its ID.

**Request**

----
GET /karajo/api/job/logs?id=<string>
----

Parameters,

* `id`: the job ID.

**Response**

On success it will return a list of strings that contains the execution log
and the response from executing the `HttpUrl`.

On fail, it will return

* `400`: invalid or empty job ID.


[#http_api_job_pause]
=== Pause the job

Pause the job execution by its ID.

**Request**

The request is authorized using a signature.

Format,

----
POST /karajo/api/job/pause?id=<id>
x-karajo-sign: <query signature>
----

Parameters,

* `id`: the job ID.

**Response**

On success it will return the Job schema with field `Status` set to `paused`.

On fail it will return

* `400`: invalid or empty job ID.
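
A signed pause request can be built in Go as sketched below.
This is an illustration, not the karajo client: the function
`newPauseRequest`, the base URL, the secret, and the job ID are hypothetical,
and the sketch assumes that the signed payload is the encoded query string
`id=<id>`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/url"
)

// newPauseRequest builds a POST request to the pause API and signs it.
// Assumption: the signature covers the encoded query string.
func newPauseRequest(baseURL, secret, jobID string) (*http.Request, error) {
	query := url.Values{"id": {jobID}}.Encode()

	// Hex-encoded HMAC-SHA256 of the query string.
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(query))
	sign := hex.EncodeToString(mac.Sum(nil))

	req, err := http.NewRequest(http.MethodPost, baseURL+"/karajo/api/job/pause?"+query, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-karajo-sign", sign)
	return req, nil
}

func main() {
	// Hypothetical address, secret, and job ID.
	req, err := newPauseRequest("http://127.0.0.1:31937", "s3cret", "my-job")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("x-karajo-sign:", req.Header.Get("x-karajo-sign"))
}
```

The sketch only builds the request; it can be sent with
`http.DefaultClient.Do(req)`.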


[#http_api_job_resume]
=== Resume the job

HTTP API to resume paused job by its ID.

**Request**

The request is authorized using a signature.

Format,

----
POST /karajo/api/job/resume?id=<id>
x-karajo-sign: <query signature>
----

Parameters,

* `id`: the job ID.

**Response**

On success it will return the Job schema related to the ID with field
`Status` reset back to `started`.


[#example]
== Example

Given the following karajo configuration file named `karajo.conf` inside the
`testdata` directory:

----
include::testdata/karajo.conf[]
----

NOTE: For web viewer, see the actual file in testdata/karajo.conf in this
repository.

Run the `karajo` program,

----
$ karajo -config testdata/karajo.conf
----

And then open http://127.0.0.1:31937/karajo in your web browser to see the job
status and logs.


[#license]
== License

Copyright 2021, M. Shulhan (ms@kilabit.info).

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program.  If not, see <http://www.gnu.org/licenses/>.


[#links]
== Links

link:CHANGELOG.html[CHANGELOG]

https://git.sr.ht/~shulhan/karajo[Source code repository^].