
Conception

actor

An actor is an entity with its own lifetime, whose job is to react to incoming messages. Here "react" means to produce side effects, such as logging or blinking LEDs, or to send new messages.

Here is a diagram of the normal actor life cycle, where an ellipse is an actor state and a box is an actor method.

[diagram: actor lifecycle]

Usually, an actor reacts to user-defined messages when it is in the operational state. An actor has to subscribe to particular message type(s) in order to react to them. The subscription is performed by overriding the initialize() method. As there are framework messages too, a user-defined actor must invoke the ActorBase::initialize() method to let the framework subscribe to them.

An actor does not react to messages when it is in the off state. The actor enters that state when ActorBase::advance_stop() is invoked. An actor can override the advance_stop() method and postpone the ActorBase::advance_stop() invocation until it is ready, e.g. until the underlying hardware is off.

Likewise, the advance_init() method can be overridden and the ActorBase::advance_init() invocation can be postponed until the underlying hardware is ready.

The advance_start() method plays a slightly different role: it is invoked when the supervisor (see below) signals that everything is ready to start, i.e. all of the actor's siblings belonging to the same supervisor are ready, so it is safe to send messages to other actors because they are operational. In other words, it is the cross-actor synchronization point. When overriding it, ActorBase::advance_start() must be invoked to change the state.

All advance_* methods are invoked automatically upon receiving rotor-light messages; it is incorrect to invoke them manually.

An actor is designed to be recyclable, i.e. when it is shut down (off state), it can be started again. If an actor has default values that are changed during its lifetime, the overridden initialize() method is the proper place to reset them.
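The life cycle described above can be sketched with a toy model. This is not the rotor-light API: ToyActorBase, LedActor, hardware_ready, blink_count and on_hardware_ready() are hypothetical names, and the direct calls stand in for the framework messages that normally trigger the advance_* methods. The sketch only illustrates postponing the base-class call and resetting defaults in initialize().

```cpp
#include <cassert>

// Toy model only: the names mirror the text, not the real rotor-light headers.
enum class State { off, initializing, initialized, operational };

struct ToyActorBase {
    State state = State::off;
    virtual ~ToyActorBase() = default;

    // In the real framework the base class subscribes to framework messages here.
    virtual void initialize() { state = State::initializing; }
    virtual void advance_init() { state = State::initialized; }
    virtual void advance_start() { state = State::operational; }
    virtual void advance_stop() { state = State::off; }
};

struct LedActor : ToyActorBase {
    bool hardware_ready = false;  // hypothetical hardware flag
    int blink_count = 0;          // a default that changes during the lifetime

    void initialize() override {
        blink_count = 0;             // reset defaults so the actor can be recycled
        ToyActorBase::initialize();  // always let the base class do its part
    }

    void advance_init() override {
        // Postpone the base call: the state changes only once the hardware is ready.
        if (hardware_ready) {
            ToyActorBase::advance_init();
        }
    }

    // Hypothetical callback, e.g. invoked from a hardware interrupt handler.
    void on_hardware_ready() {
        assert(state == State::initializing);
        hardware_ready = true;
        ToyActorBase::advance_init();
    }

    void advance_stop() override {
        hardware_ready = false;        // switch the hardware off first
        ToyActorBase::advance_stop();  // then confirm the shutdown
    }
};
```

In this sketch the actor stays in the initializing state after advance_init() until on_hardware_ready() fires, which is the postponing behaviour the text describes.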

There might be a case when an actor cannot be initialized, e.g. due to a hardware failure. In that case, ActorBase::initialize() must not be invoked, and the failure should be reported to the supervisor with code like:

send<ctx::thread, message::ChangeStateAck>(0, supervisor->get_id(), id, State::initialized, false);

What to do with the failure is up to the supervisor.

actor summary

Subscribe and initialize the default values in the initialize() method. Start messaging with other actors in advance_start(). Initialize the hardware in the overridden advance_init() method and, when you are done, invoke ActorBase::advance_init(). Similarly, shut down the hardware in the overridden advance_stop() method and, when you are done, invoke ActorBase::advance_stop().

Once everything is ready, all cross-actor communication is performed while the actors are in the operational state.

supervisor

A supervisor is a special kind of actor, which manages (or "orchestrates", if you like) the other actors (including supervisors) that it owns. A supervisor is responsible for initializing, shutting down and synchronizing its actors, and it also handles actor failures.

Supervisor initialization (shutdown) is simple: it becomes initialized (off) when all its child actors are initialized (off). It is important that the supervisor waits until all its children become initialized (off).

When the supervisor is initialized, it advances itself into the operational state and dispatches a start message to all its children. This way it synchronizes the start of the actors: an actor has the guarantee that when it starts sending messages to other actors, they are operational.
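The start synchronization can be illustrated with a small toy model. ToySupervisor, ToyChild and on_child_initialized() are hypothetical names made up for this sketch; in rotor-light the same steps are driven by framework messages rather than direct calls.

```cpp
#include <cassert>
#include <vector>

// Toy model only: not the rotor-light types.
enum class State { off, initializing, initialized, operational };

struct ToyChild {
    State state = State::initializing;
};

struct ToySupervisor {
    State state = State::initializing;
    std::vector<ToyChild*> children;

    // Stands in for receiving an initialization confirmation from a child.
    void on_child_initialized(ToyChild& child) {
        child.state = State::initialized;
        for (const ToyChild* c : children) {
            if (c->state != State::initialized) {
                return;  // at least one child is not ready yet: keep waiting
            }
        }
        // All children are initialized, so the supervisor is initialized too.
        state = State::initialized;
        // It advances itself to operational and "dispatches start" to every child.
        state = State::operational;
        for (ToyChild* c : children) {
            c->state = State::operational;
        }
    }
};
```

No child becomes operational before the last sibling has confirmed its initialization, which is the guarantee the paragraph above relies on.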

fail policies

Dealing with an actor failure is a bit more complex, as it depends on the actor state and on the fail policy defined in the actor.

The full list of fail policies is:

enum class FailPolicy {
    restart        = 0b00000001,
    force_restart  = 0b00000011,
    escalate       = 0b00000100,    // default
    force_escalate = 0b00001100,
    ignore         = 0b00010000,
};

If an actor stops in the operational state, this is the normal case and the supervisor does nothing, unless the actor's policy is force_restart or force_escalate, which trigger the restart and escalate procedures respectively.

The actor restart procedure is simple: the supervisor sends the actor a stop message (if the actor is not already off), then invokes initialize() on it and waits for the initialization confirmation (which is produced via ActorBase::advance_init()).

The actor escalate procedure is simple too: the supervisor stops all its actors and then shuts itself down. If it is the root supervisor, it also exits from the ::process() method, as there are no more messages. Usually that leads to exiting from the main() function of the firmware, which causes a hardware restart.

If an actor fails in the initializing state and its policy is restart, the supervisor performs the actor restart procedure.

If an actor fails in the initializing state and its policy is escalate, the supervisor performs the escalate procedure.

If an actor has the ignore policy and it stops, the supervisor continues its normal activity (e.g. waiting for the initialization or shutdown of other actors).
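The rules above can be condensed into one decision function. The sketch below is a reading of this section only, not the framework's source: ActorState, Reaction, has_bits() and react() are hypothetical names, and only the FailPolicy values are taken from the enum listed earlier.

```cpp
#include <cassert>

enum class FailPolicy {
    restart        = 0b00000001,
    force_restart  = 0b00000011,
    escalate       = 0b00000100,  // default
    force_escalate = 0b00001100,
    ignore         = 0b00010000,
};

// Hypothetical helper types, limited to the two states discussed in the text.
enum class ActorState { initializing, operational };
enum class Reaction { none, restart_actor, escalate };

// True when every bit of `mask` is also set in `policy`.
inline bool has_bits(FailPolicy policy, FailPolicy mask) {
    const int p = static_cast<int>(policy);
    const int m = static_cast<int>(mask);
    return (p & m) == m;
}

// What the supervisor does when a child stops while in the given state.
inline Reaction react(ActorState state_at_stop, FailPolicy policy) {
    // force_* policies apply regardless of the actor state.
    if (policy == FailPolicy::force_restart) return Reaction::restart_actor;
    if (policy == FailPolicy::force_escalate) return Reaction::escalate;
    // ignore: the supervisor carries on with its normal activity.
    if (policy == FailPolicy::ignore) return Reaction::none;
    // Stopping while operational is the normal case.
    if (state_at_stop == ActorState::operational) return Reaction::none;
    // The actor failed while initializing.
    return policy == FailPolicy::restart ? Reaction::restart_actor
                                         : Reaction::escalate;
}
```

Note that the force_* values contain the bits of their plain counterparts (0b11 includes 0b01, 0b1100 includes 0b0100), which has_bits() makes visible.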

NB: the fail policy can be changed by the actor itself during its lifetime, e.g. to escalate the failure after 3 failed initializations.
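A minimal sketch of such a policy switch follows. It assumes the actor keeps its own policy in a field and counts failures itself; ToyRetryingActor, fail_policy, failed_inits and on_init_failure() are hypothetical names, and how the policy is actually stored in rotor-light is not shown in this document.

```cpp
#include <cassert>

enum class FailPolicy {
    restart        = 0b00000001,
    force_restart  = 0b00000011,
    escalate       = 0b00000100,  // default
    force_escalate = 0b00001100,
    ignore         = 0b00010000,
};

// Toy model only: an actor that asks for a restart twice, then escalates.
struct ToyRetryingActor {
    static constexpr int max_failed_inits = 3;

    FailPolicy fail_policy = FailPolicy::restart;  // hypothetical field
    int failed_inits = 0;

    // Called by the actor when its hardware initialization fails.
    // Returns the policy in effect when the failure is reported to the supervisor.
    FailPolicy on_init_failure() {
        ++failed_inits;
        if (failed_inits >= max_failed_inits) {
            fail_policy = FailPolicy::escalate;  // stop retrying, hand over upstream
        }
        return fail_policy;
    }
};
```

With these assumptions the first two failures are reported under restart and the third under escalate.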

The default policy of an actor is escalate.

A "failed to shut down" condition is meaningless for the framework, i.e. the supervisor always waits until it receives the shutdown confirmation from the child actor.

fail policies summary

The force_* policies "ignore" the actor state and instruct the supervisor to perform the appropriate procedure. restart, escalate and ignore make sense only for the operational actor state.

composition

The underlying idea is: if an actor fails to initialize (due to a hardware failure), give it another chance by restarting it afresh. Maybe a few times. Maybe recurrently.

The supervisor was designed to be composable, i.e. it can contain other supervisors as its children, forming a hierarchy of actors.

Why would you need that, instead of having a "flat" list of actors and just messaging? It is needed to handle failure escalation: a supervisor groups related actors, and if one of them fails, it restarts the whole group, maybe a few times, before escalating the failure upstream, i.e. to the root supervisor.

If the root supervisor fails too... well, it seems that we have tried the best we can, and the only radical remedy left is to restart the whole board (i.e. exit from the main() entry point).

message priorities

In rotor_light it is possible to have multiple queues. Queues are indexed from 0, as usual; however, queues are processed in reverse order, so a queue with a higher index is processed before any queue with a lower index. This means that the queue index is actually the queue priority.

using Queue = rl::Queue<Storage, 15, 5>;

In the example above, the master queue consists of two sub-queues: the first (index 0) with storage for 15 messages, and the second (index 1) with storage for 5 messages. As long as there is any message in the second queue, it is processed first; only when the second queue is empty are messages from the first queue processed.
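The processing order can be illustrated with a toy model. ToyQueue is a hypothetical stand-in written for this sketch, using std::deque and plain int messages; it is not how rl::Queue is implemented and only mimics the "higher index first" rule and the per-queue capacities.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model only: one sub-queue per template argument, each with its own capacity.
template <std::size_t... Capacities>
class ToyQueue {
  public:
    static constexpr std::size_t count = sizeof...(Capacities);

    // Puts a message into sub-queue `index`; returns false if the index is
    // out of range or that sub-queue is already full.
    bool put(std::size_t index, int message) {
        const std::size_t capacities[] = {Capacities...};
        if (index >= count) return false;
        if (sub_queues_[index].size() >= capacities[index]) return false;
        sub_queues_[index].push_back(message);
        return true;
    }

    // Takes the next message, scanning from the highest index down to 0.
    // Returns false when all sub-queues are empty.
    bool next(int& out) {
        for (std::size_t i = count; i-- > 0;) {
            if (!sub_queues_[i].empty()) {
                out = sub_queues_[i].front();
                sub_queues_[i].pop_front();
                return true;
            }
        }
        return false;
    }

  private:
    std::array<std::deque<int>, count> sub_queues_;
};

using Queue = ToyQueue<15, 5>;  // mirrors the shape of the example above
```

If messages 100 and 101 are put into sub-queue 0 and message 200 into sub-queue 1, next() yields 200 first and then 100 and 101 in their insertion order.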