123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239 |
- Runtime control for long-running services
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- This project employs a set common conventions for all of its controllable
- services, with few minor variations. Upon startup, the service named foosrv
- creates (binds) a unix domain socket (AF_UNIX)
- /run/ctrl/foosrv
- and listens for incoming connections. A matching control tool, fooctl, is
- meant to be used as an interactive command. When invoked, fooctl connects
- to the socket, sends commands, receives replies, waits for events etc and
- then exits.
- Client connections are supposed to be short in most cases. The client is only
- there to request and observe service state changes. Since the client tool
- is supposed to be used by the (singular) human user, there should be at most
- about one instance running at any given time. To allow for some error recovery,
- most services do accept several control connections at a time, but tend to have
- a pretty low limit on the number of active connections.
- File system interaction
- ~~~~~~~~~~~~~~~~~~~~~~~
- It is up to the system to ensure /run/ctrl exists and is a directory.
- Typically this means `mkdir /run/ctrl` somewhere in early system startup
- scripts, probably shortly after mounting /run as a tmpfs.
- Individual services will abort if bind() fails, for instance because the
- socket in question already exists. This is done mostly because that's how
- syscalls work, but it also helps preventing dual instances of the same
- service running at the same time. The socket dirent is used as a lock to
- prevent the second invocation.
- Whenever possible, the services attempt to unlink their socket on exit
- to allow themselves to be restarted. It may so happen however that a service
- dies without clearing the socket up (think SIGKILL or SIGSEGV), in which
- case it won't be able to restart. There is no handling for such cases in
- this project. If the unbound socket remains in the system after the service
- dies, it has to be removed to let the service restart.
- See "Sticky sockets" below for some discussion of a proper fix to this issue.
- Security and permissions
- ~~~~~~~~~~~~~~~~~~~~~~~~
- File system permissions fully control the access to the sockets. In most
- cases, there are no further check on the service side; if the client can
- connect() to a socket, it can run any commands on that socket.
- Setting socket permissions is tricky, so current approach is to set
- permissions on the directory (e.g. /run/ctrl) statically, and make sure
- file permissions on socket themselves do not interfere but chmod'ing
- them 0777.
- Note this project is written with a particular setup in mind that
- only requires a dedicated wheel group.
- Functional (non-control) sockets
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Some services are expected to provide some IPC for other processes in
- the system, not just the privileged wheel group, as a part of their normal
- operations (think for instance host name resolution service). Sockets for
- this kind of IPC are created in some other directory, not in /run/ctrl:
- /run/comm/resolved <-- functional
- /run/ctrl/resolved <-- control
- In this example, /run/resolved is available for all processes in the system,
- and allows the clients to perform name resolution only, while the control
- socket requires privileged access and allows to reconfigure the resolver
- (set DNS forwarders and so on).
- These functional sockets in general will follow different conventions,
- like for instance allowing large number of concurrent connections, and
- may use different protocols for communication.
- Per-command permissions
- ~~~~~~~~~~~~~~~~~~~~~~~
- The default assumption regarding control sockets is that anyone who can
- connect() can run all the commands there. There is no generic way to restrict
- particular commands.
- The service may require credentials to be passed as a part of its control
- protocol, and do something based on those, but so far it does not look like
- this trick will be used for permissions check.
- Sticky sockets
- ~~~~~~~~~~~~~~
- This is a proposal for a kernel change that would make the scheme described
- above simpler and more reliable. Also, this is the reason for not attempting
- userspace workarounds.
- Linux already allows creating sockets with mknod, even though the resulting
- FS nodes are completely useless. The idea is to make them useful, in a way
- that won't break existing code too much.
- Let's call a local socket with the sticky bit set a "sticky socket".
- This combination is meaningless and should never happen with conventional
- use of local sockets, so we can assign pretty much semantics to it.
- Trying to bind() a sticky socket should
- * fail with EBUSY if it is already bound by another process
- * fail with EPERM unless the calling process owns the socket
- (euid = uid of the socket), or it has CAP_CHOWN
- * succeed otherwise
- With sticky sockets, permission setup becomes simple and straightforward:
- mknod /run/ctrl/wsupp s
- chgrp wifi /run/ctrl/wsupp
- ...
- exec wimon
- The service only needs to know the name of the socket to bind.
- All passwd/group parsing code remains in chown where it belong.
- Wire protocol for control sockets
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The common protocol several long-running services in minibase use for their
- control sockets is a simplified version of generic netlink (GENL) protocol.
- The parties exchange messages over a SEQPACKET connection.
- Each message consists of a command optionally followed by attributes:
- +-----+-----+ Attributes:
- | len | cmd | <-- header +-----+-----+
- +-----+-----+ +-----+-----+ | len | key |
- | len | key | <-- attribute | len | key | +-----+-----+
- +-----+-----+ +-----+-----+ | some _nul |
- | value.... | | 3322 1100 | | l_te rmin |
- | ...... | +-----+-----+ | ated stri |
- +-----+-----+ 0x00112233 | ng\0 | <--+
- | len | key | <-- attribute +-----------+ |
- +-----+-----+ +-----+-----+ |
- | value.. | | len | key | +-----+-----+ |
- +-----+-----+ +-----+-----+ | len | key | |
- | len | key | <-- attribute | 7766 5544 | +-----+-----+ |
- +-----+-----+ | 3322 1100 | | 0F45 A8B3 | |
- | value.... | +-----+-----+ | 1379 | <--+
- | ......... | 0x0011..77 +-----+-----+ |
- +-----+-----+ raw MAC |
- |
- <- 4 bytes -> padded to 4 bytes
- All integers are host-endian. Lengths are in bytes and include respective
- headers, but do not include padding. Attributes are always padded to 4 bytes.
- For string attributes, length includes the terminating \0.
- Attributes may be nested. The payload of the enclosing attribute is then
- a sequence of attributes.
- When stored in memory, the message itself shares format with attributes.
- However, on the wire, the length is not included in the packet payload,
- and goes as metadata through the socket API instead.
- Communication is assumed to be synchronous (request-reply). The service
- replies with .command == 0 on success, .command = (-errno) < 0 on failure.
- There is exactly one reply for each request. It is assumed that the client
- knows what kind of reply to expect for each request issued.
- Replies with .command > 0 are notifications not caused by client requests.
- Pulling notifications out of the stream should leave a valid request-reply
- sequence. It is up to the service to decide whether to use them, and how.
- Clients that do not expect notifications should treat them as protocol errors.
- Commands are service-specific. Negative values are expected to be system-wide
- errno(3) codes. Error messages may include attributes.
- Attribute keys are service-specific. The service defines which keys should be
- used for each command. Current implementation silently ignores unexpected keys.
- The same key may be used several times within the same payload if both parties
- are known to expect this. If multiple uses of the key are not expected,
- the first attribute with the key is used and the rest get silently ignored.
- Integer payloads shorter than 4 bytes should be extended to 4 bytes for
- transmission.
- Differences between nlusctl and GENL
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The protocol was originally called "netlink-based userspace control protocol",
- and there were plans to share some of the code. Since then, the protocols have
- diverged a lot. The general format of the attributes is still the same though.
- nlusctl only runs over point-to-point connections. Netlink (RTNL and GENL)
- apparently have some support for ethernet-like point-to-net communication,
- which would explain .pid field in the request.
- nlusctl does not support asynchronous communication modes. So no .seq and
- no ACK or REQUEST flags.
- There are no multi-part replies in nlusctl. Where applicable, the client
- has to request the next part (entry, packet, whatever) explicitly using
- an index of some sort.
- In GENL, DUMP flags affect the meaning of the .cmd field.
- In nlusctl, distinct values of the .command field are used for this purpose.
- GENL commands have .version field, nlusctl is expected to use distinct .cmd
- values -- if it is going to be needed at all.
- Combining all that, nlusctl drops most of the fields found in GENL headers,
- and removes distinction between struct nlmsg/nlgen/nlerr. This was one of the
- biggest reasons to choose a custom protocol over GENL.
- GENL and nlusctl use different encoding for lists of similar items within
- the same payload. GENL, because of the very weird way they parse the messages
- within the kernel, must use a nested attribute with 0, 1, 2, ... keys:
- [ ATTR_SOMETHING,
- [ 0, value ],
- [ 1, value ],
- [ 2, value ] ]
- In contrast, nlusctl uses top-level multi-keys for this purpose:
- [ ATTR_SOMETHING, value ],
- [ ATTR_SOMETHING, value ],
- [ ATTR_SOMETHING, value ]
- The trick GENL uses needs an extra header and breaks key => type-of-payload
- relation since the enumeration keys may happen to match (and often do match)
- unrelated ATTR_* constants.
- Otherwise the format of the attributes is the same in GENL, RTNL and nlusct.
- This was done intentionally to share as much parsing code as possible.
- Library support
- ~~~~~~~~~~~~~~~
- See ../lib/nlusctl.h and ../lib/nlusctl/
|