# Borg state configuration

The purpose of this document is to explain the concepts, organization,
and implementation of the state configuration part of borg.

At a high level, defining and applying state configuration has five
steps:

- The client defines a graph describing the state to be applied. At a
  minimum, each node in the graph has a *type* corresponding to the
  kind of resource (file, package, user, etc.) the node represents, a
  *name* that can be used to refer to it (in particular, by other
  nodes), and a (possibly empty) *requirements* list, containing the
  names of the nodes the node under consideration depends on.

- The client serializes the graph and sends it to a borglet (or many
  borglets).

- The borglet receives the graph and deserializes it. The borglet
  then traverses the graph in topological order. For each node, the
  borglet obtains a list of actions it must take in order to make the
  machine's actual state conform to the state the node describes.

- The borglet then executes each of the actions in order. If any
  action fails, no further actions are taken.

- Whether or not an action failed, a description of the actions taken
  is then returned to the client. In the event of failure, an error
  message is also returned, and the action that failed is indicated.

The rest of this document follows roughly that order.

## Defining graphs and nodes

A graph is represented as a map with two keys, `:node` and `:nodemap`.

The value of the `:node` key is a node. A node is a map with the
following structure:

```clojure
{:provides :node-name
 :requires list-of-keywords
 :produces-ops list-of-ops
 :attrs {:type "string identifying the type"
         :attr1 value1
         :attr2 value2
         ...}}
```

Both the `:requires` key and the `:attrs` map can be empty. The
`:provides` key identifies the name of the node and should always be a
keyword. It is an error for a graph to contain two nodes with the same
name (this is not currently checked). The `:requires` key identifies
which other nodes, if any, this node depends on.

The `:nodemap` key contains the actual specification of the graph, and
consists of a map whose keys are names of nodes and whose values are
the corresponding nodes.
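
As a rough illustration of this structure, a two-node graph can be written out as a plain map. The node names, types, and attributes here are made up for the example, and `:produces-ops` is omitted for brevity.

```clojure
;; Hypothetical two-node graph: a file that depends on its parent directory.
;; Names, types, and attributes are illustrative, not taken from borg.
(def example-graph
  (let [dir  {:provides :app-dir
              :requires []
              :attrs    {:type "directory" :path "/srv/app"}}
        file {:provides :app-config
              :requires [:app-dir]
              :attrs    {:type "file" :path "/srv/app/config.edn"}}]
    {:node    file
     :nodemap {:app-dir dir :app-config file}}))

;; Look up the nodes that the graph's main node depends on, by name.
(def dependencies
  (map (:nodemap example-graph)
       (get-in example-graph [:node :requires])))
```

Here `dependencies` holds the single `:app-dir` node, found through the `:nodemap`.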

The `borg.state.util` namespace contains macros that make defining
nodes more convenient.

`node-type` can be used to define a function that creates a node with
a particular type and attributes. It validates the attributes using
validators defined by `validator-maker`.

`node-type` takes as its first argument the name of the function to
create (which will also serve as the type of the node) and as its
remaining arguments key/value pairs describing the attributes nodes of
that type have. Functions defined with `node-type` take one or more
positional arguments specified in the macro call, followed by a map
containing the attributes for the specific node being created. For
instance:

```clojure
(node-type process
           :arguments [pathname]
           :required-attrs [user group env args]
           :optional-attrs [block?]
           :produce-ops [service-down ensure-running])
```

This defines a function, `process`, whose first argument is a pathname
(presumably to the binary to be executed), and whose second argument
is a map that must contain the keys `:user`, `:group`, `:env`,
`:args`, and `:provides` (which is not specified, but is always
required), and can optionally also contain the keys `:block?` and
`:requires` (again, not specified, but always implicitly allowed).

If a validator called `env` (e.g.) has been defined with
`validator-maker`, then the `:env` key in the map passed to `process`
will be validated and possibly transformed. (See the docstring for
`borg.state.util/validator-maker` for more.) For instance, we might
wish to ensure that the environment is specified as a map, in which
case we can call `(validator-maker (env map?))`. Multiple validators
can be specified at once using `make-validators`:

```clojure
(make-validators (env map?)
                 (block? (some-fn true? false? nil?))
                 (user (some-fn string? keyword?) name))
```

This creates validators for `:env` (it must be a map), for `:block?`
(it must be true, false, or nil), and for `:user` (it must be a string
or keyword, and will be converted to a string).
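
The behavior described above (check a predicate, then optionally transform the value) can be sketched in plain Clojure. This is not how `validator-maker` is implemented; `make-validator`, `validators`, and `validate-attrs` are hypothetical names used only for illustration.

```clojure
;; Hypothetical stand-in for a validator: a predicate plus an optional
;; transform applied to values that pass the predicate.
(defn make-validator
  ([pred] (make-validator pred identity))
  ([pred transform]
   (fn [attr value]
     (if (pred value)
       (transform value)
       (throw (ex-info (str "invalid value for " attr)
                       {:attr attr :value value}))))))

;; The same three rules as the make-validators call above.
(def validators
  {:env    (make-validator map?)
   :block? (make-validator (some-fn true? false? nil?))
   :user   (make-validator (some-fn string? keyword?) name)})

(defn validate-attrs
  "Runs each attribute through its validator, if one is defined."
  [attrs]
  (into {}
        (map (fn [[k v]]
               (if-let [validate (validators k)]
                 [k (validate k v)]
                 [k v])))
        attrs))

(validate-attrs {:user :www-data :env {"PATH" "/usr/bin"} :args ["-v"]})
;; => {:user "www-data", :env {"PATH" "/usr/bin"}, :args ["-v"]}
```

Attributes without a validator (here `:args`) pass through untouched, and an invalid value raises an exception rather than being silently dropped.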

NB: The return value from a function defined by `node-type` is a
*graph* in the sense specified at the beginning of this section---a
map containing a `:node` key and a `:nodemap` key. It is *not* just
the node itself.

### Specifying dependencies

When specifying the dependencies of a node under the `:requires` key
of the argument to the node-creating function, either the keyword name
of the required node *or* the graph for the node can be provided.

That is, given the following:

```clojure
(node-type file ...)
(node-type directory ...)
(def d (directory ... {:provides :some-directory ...}))
```

There are two ways to declare a file that depends on `d`: the
`:requires` list can contain `:some-directory` or `d` itself. If a
graph is passed in as a dependency, it is assumed that the dependency
is the node found in the `:node` key, and the returned graph will
contain, in its `:nodemap` map, both nodes.

```clojure
(def f (file ... {:provides :some-file :requires [d]}))
```

If, on the other hand, the requirement is specified using just the
name, the node-creating function won't have access to the required
node itself, so the graphs must be merged. `borg.state.core` has a
`merge-graphs` function which does this. If it is called with two
arguments, it also ensures that, if one of its arguments *n* requires
the argument *m*, *n* is the node found in the `:node` key of the
resulting graph.

```clojure
(def f (file ... {:provides :some-file :requires [:some-directory]}))
(def complete-graph (merge-graphs f d))
```
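
The two-argument behavior can be approximated with plain maps. The sketch below is an assumption about what such a merge might look like, not the code in `borg.state.core`; `merge-two` is a hypothetical name, and `d` and `f` are written out by hand instead of being built with `node-type` functions.

```clojure
;; Hypothetical sketch of two-argument merging. The graph whose node
;; requires the other graph's node supplies the resulting `:node`.
(defn merge-two [g1 g2]
  (let [requires? (fn [a b]
                    (some #{(get-in b [:node :provides])}
                          (get-in a [:node :requires])))
        top       (if (requires? g2 g1) g2 g1)]
    {:node    (:node top)
     :nodemap (merge (:nodemap g1) (:nodemap g2))}))

;; Hand-written graphs standing in for the results of `directory` and `file`.
(def d (let [n {:provides :some-directory :requires [] :attrs {:type "directory"}}]
         {:node n :nodemap {:some-directory n}}))
(def f (let [n {:provides :some-file :requires [:some-directory] :attrs {:type "file"}}]
         {:node n :nodemap {:some-file n}}))

(def complete-graph (merge-two d f))

(get-in complete-graph [:node :provides]) ;; => :some-file
```

The argument order does not matter in this sketch: `(merge-two f d)` also yields a graph whose `:node` is the file, because the file is the node that requires the other.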

## Serializing graphs

Graph serialization is done in two phases:

1.  Each node has the multimethod `to-wire` (defined in
    `borg.state.graph`) called on it. Multimethod dispatch is based on
    the type of the node.
   
    The default implementation of `to-wire` returns its argument
    unchanged. An implementation other than the default is necessary
    if:
   
    - the client has access to something that the borglet doesn't. For
      instance, the `file` node type can be given the path to a local
      file for its template argument, and `to-wire` reads and renders the
      contents of the file, which may not exist on the borglet.

    - the node contains an attribute that can't be serialized to JSON.
      For instance, the `file` node type takes as an attribute a function
      called to generate the context used to render templates. Since
      functions can't be serialized, it needs to be removed from the node
      before being sent to the borglet.

2.  The graph is processed into lists of nodes that do not depend on
    each other (i.e. lists that all belong at the same "layer" of a
    topological sort). These lists are what is actually sent to the
    borglet.
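
Both phases can be sketched together. The code below is illustrative only: `to-wire*` mimics the dispatch described above under a different name, the `:context-fn` attribute is a made-up example of an unserializable attribute, and `layers` is a hypothetical helper rather than borg's own function.

```clojure
;; Phase 1: per-type conversion, dispatching on the node's type.
(defmulti to-wire* (fn [node] (get-in node [:attrs :type])))
(defmethod to-wire* :default [node] node)
;; Drop an attribute that cannot be serialized (here, a function).
(defmethod to-wire* "file" [node]
  (update node :attrs dissoc :context-fn))

;; Phase 2: group nodes into layers so that every node's requirements
;; appear in an earlier layer.
(defn layers
  "Returns a vector of layers of wire-ready nodes.
  Throws on cycles or on requirements missing from `nodemap`."
  [nodemap]
  (loop [remaining nodemap, done #{}, result []]
    (if (empty? remaining)
      result
      (let [ready (filter (fn [[_ node]] (every? done (:requires node)))
                          remaining)]
        (when (empty? ready)
          (throw (ex-info "cycle or missing requirement"
                          {:remaining (keys remaining)})))
        (recur (apply dissoc remaining (map key ready))
               (into done (map key ready))
               (conj result (mapv (comp to-wire* val) ready)))))))

(def nodemap
  {:dir  {:provides :dir  :requires []           :attrs {:type "directory"}}
   :conf {:provides :conf :requires [:dir]       :attrs {:type "file"
                                                         :context-fn (fn [] {})}}
   :svc  {:provides :svc  :requires [:conf :dir] :attrs {:type "process"}}})

(map #(map :provides %) (layers nodemap))
;; => ((:dir) (:conf) (:svc))
```

Nodes within one layer have no ordering guarantee in this sketch, which matches the idea that they are independent of each other.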

## Deserializing graphs

Graph deserialization is basically just the inverse of serialization:

1.  Each node has the multimethod `from-wire` called on it. As with
    `to-wire`, dispatch is by node type, and the default
    implementation returns its argument unchanged. `from-wire` can be
    used to add attributes specific to the receiving borglet to a node
    so that further processing can act as if those attributes were
    there all along.

2.  The mapping of node names to nodes is reconstructed from the
    topological layers.
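
A minimal sketch of the reverse direction, assuming the layers have already been parsed back into Clojure maps with keyword names. `from-wire*` stands in for the real `from-wire`, and `layers->nodemap` is a hypothetical helper.

```clojure
;; Per-type hook applied to every received node; the default is a no-op.
(defmulti from-wire* (fn [node] (get-in node [:attrs :type])))
(defmethod from-wire* :default [node] node)

(defn layers->nodemap
  "Rebuilds the map of node name to node from topological layers."
  [layers]
  (into {}
        (for [layer layers
              node  layer]
          [(:provides node) (from-wire* node)])))

(def received
  [[{:provides :dir  :requires []     :attrs {:type "directory"}}]
   [{:provides :conf :requires [:dir] :attrs {:type "file"}}]])

(keys (layers->nodemap received)) ;; => (:dir :conf)
```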

## Computing actions to take

A state node describes how one aspect of the borglet's machine should
look, but that does not mean that only one thing needs to be done to
get the machine into that state. The multimethod `check-node` maps a
state node to a (possibly empty) list of *actions* to be taken to get
the machine into the state described by the node. `check-node`
dispatches on node type and takes three arguments: the state node
itself, a map of node names to nodes (in case a node needs to examine
its dependencies), and a list of maps of node names to actions already
computed by `check-node` for children of the present node.

The return value of `check-node` can take two forms:

1.  a (possibly empty) list of actions to take. This is transformed
    into a map with a single key, `:actions`.

2.  a map of the form `{:actions actions :shutdown-first shutdown}`,
    where `actions` is a (possibly empty) list of actions to take, and
    `shutdown` is a *single* action to be executed before traversing
    the graph and executing actions as returned in (1) or in the
    `:actions` key.

    `:shutdown-first` actions are executed sequentially in the
    opposite order of the execution of the other actions returned by
    `check-node`. That is, given a dependency relation `A --> B --> C`,
    where the arrows indicate the depends-on relation, if `A`, `B`,
    and `C` all return maps with both `:actions` and `:shutdown-first`
    keys, the shutdown actions will be run in the order `A, B, C`,
    whereas the actions specified by the `:actions` key will be run in
    the order `C, B, A`.

A map with this form is referred to as an "action spec" below.
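
The normalization and the two orderings can be sketched as follows. `->action-spec` is a hypothetical helper, not borg's code, and placeholder keywords stand in for real actions.

```clojure
;; Normalize either return form of `check-node` into an action spec.
(defn ->action-spec [ret]
  (if (map? ret) ret {:actions ret}))

(->action-spec [:a1 :a2]) ;; => {:actions [:a1 :a2]}

;; A depends on B, and B depends on C, so ordinary actions execute in
;; the order C, B, A. The specs are listed in that execution order.
(def specs
  [[:C (->action-spec {:actions [:start-c] :shutdown-first :stop-c})]
   [:B (->action-spec {:actions [:start-b] :shutdown-first :stop-b})]
   [:A (->action-spec {:actions [:start-a] :shutdown-first :stop-a})]])

;; Shutdown actions run first, in the reverse of the execution order...
(def shutdown-order
  (keep (comp :shutdown-first second) (reverse specs)))

;; ...followed by the ordinary actions in execution order.
(def action-order
  (mapcat (comp :actions second) specs))

shutdown-order ;; => (:stop-a :stop-b :stop-c)
action-order   ;; => (:start-c :start-b :start-a)
```

Using `keep` means that nodes whose spec has no `:shutdown-first` key simply contribute nothing to the shutdown phase.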

### Defining actions

In order to support dry-run execution and to facilitate reporting of
what happened/would have happened on the borglet's machine,
determining what needs to be done and actually doing it are separated.
`check-node` consequently returns representations of something to be
done, which are executed at a separate stage.

A macro, `defop`, is provided to facilitate creating functions
returning representations with the expected form. The arguments to
`defop` are the same as the arguments to `defn` (except that only one
arity is allowed, and destructuring in the argument list should be
avoided). `defop` creates a function which, when called, returns a map
containing a map of argument name to argument values, the name of the
function itself, the function body as a list, and a function of zero
arguments that actually executes the body provided to `defop`. It also
defines a function that accepts a *single* argument, a state node, and
destructures its attributes map. Thus `(defop write [pathname
contents] (spit pathname contents))` expands to

```clojure
(defnodefn write [pathname contents]
    {:args (zipmap [:pathname :contents] [pathname contents])
     :body '(fn [] (spit pathname contents))
     :fn (fn [] (spit pathname contents))
     :op 'write})
```

Which expands in turn to

```clojure
(do
  (defn write [pathname contents]
      {:args (zipmap [:pathname :contents] [pathname contents])
       :body '(fn [] (spit pathname contents))
       :fn (fn [] (spit pathname contents))
       :op 'write})
  (defn node-write [{{:keys [pathname contents]} :attrs :as G_1234}]
      {:args (zipmap [:pathname :contents] [pathname contents])
       :body '(fn [] (spit pathname contents))
       :fn (fn [] (spit pathname contents))
       :op 'write}))

All that is really required for something to be an "action" is that it
be a map with an `:fn` key whose value is a nullary function and an
`:op` key whose value is a descriptive name, but the other attributes
in the maps created by `defop` provide some helpful information.
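
For readers curious how a macro could produce maps of this shape, here is a minimal sketch. It is named `defop*` to make clear that it is not borg's `defop`: it supports a single arity, does no argument checking, and leaves out the `node-` variant entirely.

```clojure
(defmacro defop*
  "Defines `op-name` as a function that returns an action map
  instead of running `body` immediately."
  [op-name args & body]
  `(defn ~op-name ~args
     {:args (zipmap ~(mapv keyword args) ~args)  ; argument name -> value
      :body '~(list* 'fn [] body)                ; the body, as data
      :fn   (fn [] ~@body)                       ; the body, as a thunk
      :op   '~op-name}))

(defop* greet [who] (str "hello " who))

;; Calling the generated function builds a description; nothing runs yet.
(def action (greet "borg"))

(:args action)  ;; => {:who "borg"}
(:op action)    ;; => greet
((:fn action))  ;; => "hello borg"
```

Keeping both `:body` (data) and `:fn` (a closure) is what lets a dry run report what would happen without executing anything.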

### Action return values

Actions are expected to return a map with a `:status` key whose value
is either `:ok` or `:error`. If the status is `:ok`, the map *can*
optionally also have a key called `:log` containing output or tracing
information from the run; if the status is `:error`, the map should
also contain a `:reason` key whose value describes why the action
failed.
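
A hand-written action that follows this convention might look like the sketch below. `write-action` is a hypothetical name, and having the `:fn` itself build the status map is an assumption made for the example.

```clojure
(defn write-action
  "Returns an action that writes `contents` to `pathname` when run."
  [pathname contents]
  {:op   'write
   :args {:pathname pathname :contents contents}
   :fn   (fn []
           (try
             (spit pathname contents)
             {:status :ok :log (str "wrote " pathname)}
             (catch java.io.IOException e
               {:status :error :reason (.getMessage e)})))})

;; A temporary file keeps the example free of side effects elsewhere.
(def tmp (.getPath (java.io.File/createTempFile "borg-example" ".txt")))
(def action (write-action tmp "hello"))

;; A dry run can report on the action without calling `:fn`.
(select-keys action [:op :args])

;; Actually executing it returns a status map.
(:status ((:fn action))) ;; => :ok
```

Writing to a path whose parent directory does not exist would instead produce a map with `:status :error` and a `:reason`.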

## Executing actions

The borglet receives from the client a list of lists of nodes such
that each node in a given list is independent of the others in that
list. That list is converted by `check-node` into a list of maps of
node name to action specs (as described in "Computing actions to
take"). A map of node name to action specs will be referred to as an
"action map". The action specs in the values of a given action map are
independent of each other and are run in parallel, while the specific
actions in a given action spec do (or are assumed to) depend on each
other, and are run sequentially.

That is: given inputs `[[node1 node2] [node3 node4]]` (and assuming
that the nodes are named `node1`, `node2`, etc.), we might get the
action maps
`[{node1 {:actions [action1 action2]} node2 {:actions [action3 action4]}}
  {node3 {:actions [action5]} node4 {:actions [action6 action7]}}]`

Within each action spec, if any action returns an error status, no
further actions are processed and an error structure is returned
detailing the step the borglet stopped on, the actions that were
successfully executed, the actions that were planned to be executed,
and the reason for failure. If no action returns an error status, a
success structure is returned that simply contains a log of the
completed actions.
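
Sequential execution of a single action spec can be sketched like this. `run-spec` and the keys of the returned structures (`:failed`, `:completed`, `:planned`) are assumptions made for the example, not borg's actual result format, and `:shutdown-first` is ignored.

```clojure
(defn run-spec
  "Runs the actions of one action spec in order, stopping at the
  first action whose result has :status :error."
  [{:keys [actions]}]
  (loop [todo actions, done []]
    (if-let [action (first todo)]
      (let [result ((:fn action))]
        (if (= :error (:status result))
          {:status    :error
           :failed    (:op action)
           :reason    (:reason result)
           :completed done
           :planned   (mapv :op actions)}
          (recur (rest todo) (conj done (:op action)))))
      {:status :ok :completed done})))

;; Fake actions that only report a fixed status.
(defn fake [op status]
  {:op op
   :fn (fn [] (cond-> {:status status}
                (= status :error) (assoc :reason "boom")))})

(run-spec {:actions [(fake 'a1 :ok) (fake 'a2 :error) (fake 'a3 :ok)]})
;; => {:status :error, :failed a2, :reason "boom",
;;     :completed [a1], :planned [a1 a2 a3]}
```

Because the loop returns as soon as it sees an error, `a3` is never invoked in the example.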

If the result of running any action map is an error, processing stops
and an error structure is returned to the client:
this structure records the action-lists that failed (there may be more
than one, e.g. if both `action1` and `action4`, which are processed in
parallel, fail) and those that completed successfully within a given
list, as well as the log of previous action lists. (That is: if
`action6` fails, the return value will indicate that the failure
occurred after processing `[action1 action2]` and `[action3 action4]`,
while processing `[action5]` and `[action6 action7]`, that `[action5]`
succeeded, and that `action6` failed and `action7` was never
attempted.) Otherwise, a success structure is returned which simply
contains a log of all the actions taken.
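
The overall loop can be sketched as follows, using the `action6` scenario from the paragraph above. Everything here is illustrative: `run-spec` is a compact stand-in for per-spec execution, `run-action-maps` and the result keys (`:previous`, `:current`, `:log`) are made up, and `pmap` is just one way to run the specs of an action map in parallel.

```clojure
;; Compact per-spec runner: sequential, stops at the first error.
(defn run-spec [{:keys [actions]}]
  (reduce (fn [acc {:keys [op] f :fn}]
            (let [{:keys [status reason]} (f)]
              (if (= status :error)
                (reduced (assoc acc :status :error :failed op :reason reason))
                (update acc :completed conj op))))
          {:status :ok :completed []}
          actions))

(defn run-action-maps
  "Runs action maps one after another. Specs within one action map run
  in parallel. Stops after the first action map containing an error."
  [action-maps]
  (loop [[amap & more] action-maps, log []]
    (if (nil? amap)
      {:status :ok :log log}
      (let [results (into {} (pmap (fn [[node-name spec]]
                                     [node-name (run-spec spec)])
                                   amap))]
        (if (some #(= :error (:status %)) (vals results))
          {:status :error :previous log :current results}
          (recur more (conj log results)))))))

(defn ok  [op] {:op op :fn (fn [] {:status :ok})})
(defn bad [op] {:op op :fn (fn [] {:status :error :reason "boom"})})

(def result
  (run-action-maps
   [{:node1 {:actions [(ok 'action1) (ok 'action2)]}
     :node2 {:actions [(ok 'action3) (ok 'action4)]}}
    {:node3 {:actions [(ok 'action5)]}
     :node4 {:actions [(bad 'action6) (ok 'action7)]}}]))

(get-in result [:current :node4 :failed]) ;; => action6
```

In `result`, `:previous` holds the outcome of the first action map, while `:current` shows that `:node3` succeeded and `:node4` stopped at `action6`. Since `pmap` uses Clojure's future thread pool, a standalone script should call `(shutdown-agents)` before exiting.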
