Runtime control for long-running services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This project employs a set of common conventions for all of its controllable
services, with a few minor variations. Upon startup, the service named foosrv
creates (binds) a unix domain socket (AF_UNIX)

    /run/ctrl/foosrv

and listens for incoming connections. A matching control tool, fooctl, is
meant to be used as an interactive command. When invoked, fooctl connects
to the socket, sends commands, receives replies, waits for events etc. and
then exits.

Client connections are supposed to be short in most cases. The client is only
there to request and observe service state changes. Since the client tool
is supposed to be used by the (singular) human user, there should be at most
about one instance running at any given time. To allow for some error recovery,
most services do accept several control connections at a time, but tend to have
a pretty low limit on the number of active connections.
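
The client side of this exchange can be sketched as follows. This is only
an illustration in Python, not code from this project: run_command is
a hypothetical helper, the packet contents are left opaque, and the socket
type (SOCK_SEQPACKET) is taken from the wire protocol section of this document.

```python
import socket

def run_command(path, packet, bufsize=4096):
    """Connect to a control socket, send one packet, return one reply."""
    # Leaving the with-block closes the socket, so the connection stays short.
    with socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET) as sock:
        sock.connect(path)         # e.g. "/run/ctrl/foosrv"
        sock.send(packet)          # one send() is one message
        return sock.recv(bufsize)  # one recv() returns at most one message
```

A reply longer than bufsize would be truncated here, so a real client needs
a buffer large enough for the biggest reply the service may send.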

File system interaction
~~~~~~~~~~~~~~~~~~~~~~~
It is up to the system to ensure /run/ctrl exists and is a directory.
Typically this means `mkdir /run/ctrl` somewhere in early system startup
scripts, probably shortly after mounting /run as a tmpfs.

Individual services will abort if bind() fails, for instance because the
socket in question already exists. This is done mostly because that's how
syscalls work, but it also helps prevent two instances of the same service
from running at the same time. The socket dirent is used as a lock to
prevent the second invocation.

Whenever possible, the services attempt to unlink their socket on exit
to allow themselves to be restarted. It may so happen however that a service
dies without cleaning up the socket (think SIGKILL or SIGSEGV), in which
case it won't be able to restart. There is no handling for such cases in
this project. If the stale socket remains in the file system after the
service dies, it has to be removed manually to let the service restart.

See "Sticky sockets" below for some discussion of a proper fix to this issue.
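
The bind-as-lock behaviour and the cleanup on exit can be sketched like this.
It is a minimal Python illustration, not this project's code; the names
bind_control_socket and remove_socket, the backlog value and the use of
atexit are all assumptions made for the example.

```python
import atexit
import os
import socket
import sys

def remove_socket(path):
    """Unlink the socket dirent, ignoring the case where it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

def bind_control_socket(path, backlog=2):
    """Bind and listen on a control socket; exit if the path is taken."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        sock.bind(path)   # fails (EADDRINUSE) if the dirent already exists
    except OSError as e:
        sock.close()
        sys.exit(f"cannot bind {path}: {e.strerror}")
    # Clean up on normal exit only; this does not run on SIGKILL or a crash.
    atexit.register(remove_socket, path)
    sock.listen(backlog)  # low limit on pending control connections
    return sock
```

Closing the socket does not remove the dirent. Until something unlinks it,
a second bind_control_socket() call on the same path exits, which is exactly
the stale socket situation described above.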

Security and permissions
~~~~~~~~~~~~~~~~~~~~~~~~
File system permissions fully control the access to the sockets. In most
cases, there are no further checks on the service side; if the client can
connect() to a socket, it can run any commands on that socket.

Setting socket permissions is tricky, so the current approach is to set
permissions on the directory (e.g. /run/ctrl) statically, and make sure
file permissions on the sockets themselves do not interfere by chmod'ing
them 0777.

Note this project is written with a particular setup in mind that
only requires a dedicated wheel group.
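
A possible shape of this setup, sketched in Python. The function names and
the 0750 directory mode are assumptions made for the example, not something
this project prescribes; in a real system the directory would be owned by
root with the wheel group, which requires privileges to set up.

```python
import os
import socket

def make_ctrl_dir(path, uid, gid, mode=0o750):
    """Create the control directory; its mode decides who may connect."""
    os.mkdir(path)
    os.chown(path, uid, gid)  # e.g. root and the wheel group
    os.chmod(path, mode)      # others cannot even look up sockets inside

def bind_open_socket(path):
    """Bind a control socket and make its own mode irrelevant."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    sock.bind(path)           # the socket is created with mode 0777 & ~umask
    os.chmod(path, 0o777)     # undo the umask; the directory does the gating
    return sock
```

With this arrangement the service never has to look up users or groups;
it only needs the chmod call after bind().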

Functional (non-control) sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some services are expected to provide some IPC for other processes in
the system, not just the privileged wheel group, as a part of their normal
operations (think for instance host name resolution service). Sockets for
this kind of IPC are created in some other directory, not in /run/ctrl:

    /run/comm/resolved   <-- functional
    /run/ctrl/resolved   <-- control

In this example, /run/comm/resolved is available to all processes in the
system and allows the clients to perform name resolution only, while the
control socket requires privileged access and allows reconfiguring the
resolver (setting DNS forwarders and so on).

These functional sockets in general will follow different conventions,
like for instance allowing a large number of concurrent connections, and
may use different protocols for communication.

Per-command permissions
~~~~~~~~~~~~~~~~~~~~~~~
The default assumption regarding control sockets is that anyone who can
connect() can run all the commands there. There is no generic way to restrict
particular commands.

The service may require credentials to be passed as a part of its control
protocol, and do something based on those, but so far it does not look like
this trick will be used for permission checks.

Sticky sockets
~~~~~~~~~~~~~~
This is a proposal for a kernel change that would make the scheme described
above simpler and more reliable. Also, this is the reason for not attempting
userspace workarounds.

Linux already allows creating sockets with mknod, even though the resulting
FS nodes are completely useless. The idea is to make them useful, in a way
that won't break existing code too much.

Let's call a local socket with the sticky bit set a "sticky socket".
This combination is meaningless and should never happen with conventional
use of local sockets, so we can assign pretty much any semantics to it.

Trying to bind() a sticky socket should

    * fail with EBUSY if it is already bound by another process
    * fail with EPERM unless the calling process owns the socket
      (euid = uid of the socket), or it has CAP_CHOWN
    * succeed otherwise
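
The three rules above can be written down as a small decision function.
This is a model of the proposal in Python, not kernel code; the function
name, its arguments and the order in which the two failure cases are checked
are assumptions made for the example.

```python
import errno

def sticky_bind_result(bound_by_other, caller_euid, socket_uid, has_cap_chown):
    """Return 0 on success or the errno a sticky bind() would fail with."""
    if bound_by_other:
        return errno.EBUSY   # another process already holds the socket
    if caller_euid != socket_uid and not has_cap_chown:
        return errno.EPERM   # neither the owner nor CAP_CHOWN
    return 0                 # owner or privileged caller, socket is free
```

Since a dead process no longer holds the socket, a restart after SIGKILL
falls into the last case and succeeds without any manual cleanup.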

With sticky sockets, permission setup becomes simple and straightforward:

    mknod /run/ctrl/wsupp s
    chgrp wifi /run/ctrl/wsupp
    ...
    exec wimon

The service only needs to know the name of the socket to bind.
All passwd/group parsing code remains in chown where it belongs.

Wire protocol for control sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The common protocol several long-running services in minibase use for their
control sockets is a simplified version of the generic netlink (GENL) protocol.
The parties exchange messages over a SEQPACKET connection.

Each message consists of a command optionally followed by attributes:

    +-----+-----+                   Attributes:
    | len | cmd |  <-- header                        +-----+-----+
    +-----+-----+                   +-----+-----+    | len | key |
    | len | key |  <-- attribute    | len | key |    +-----+-----+
    +-----+-----+                   +-----+-----+    | some _nul |
    | value.... |                   | 3322 1100 |    | l_te rmin |
    | ......    |                   +-----+-----+    | ated stri |
    +-----+-----+                    0x00112233      | ng\0      | <--+
    | len | key |  <-- attribute                     +-----------+    |
    +-----+-----+                   +-----+-----+                     |
    | value..   |                   | len | key |    +-----+-----+    |
    +-----+-----+                   +-----+-----+    | len | key |    |
    | len | key |  <-- attribute    | 7766 5544 |    +-----+-----+    |
    +-----+-----+                   | 3322 1100 |    | 0F45 A8B3 |    |
    | value.... |                   +-----+-----+    | 1379      | <--+
    | ......... |                    0x0011..77      +-----+-----+    |
    +-----+-----+                                       raw MAC       |
                                                                      |
    <- 4 bytes ->                                             padded to 4 bytes

All integers are host-endian. Lengths are in bytes and include respective
headers, but do not include padding. Attributes are always padded to 4 bytes.
For string attributes, length includes the terminating \0.

Attributes may be nested. The payload of the enclosing attribute is then
a sequence of attributes.
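
The encoding rules above can be illustrated with a few lines of Python.
This is a sketch, not the project's library: the helper names are made up,
and the 16-bit len and key fields are an assumption read off the diagram
(a 4-byte header), which also matches the attribute header used by GENL.

```python
import struct

def attr(key, payload=b""):
    """Encode one attribute: 4-byte header, payload, padding to 4 bytes."""
    length = 4 + len(payload)   # includes the header, excludes the padding
    pad = -length % 4
    # "=" selects host byte order without extra alignment
    return struct.pack("=HH", length, key) + payload + b"\0" * pad

def attr_int(key, value):
    """Encode a 4-byte signed integer attribute."""
    return attr(key, struct.pack("=i", value))

def attr_str(key, text):
    """Encode a string attribute; the length covers the terminating NUL."""
    return attr(key, text.encode() + b"\0")

def attr_nested(key, *children):
    """Encode an attribute whose payload is a sequence of attributes."""
    return attr(key, b"".join(children))
```

For instance, attr_str(1, "ab") carries a 3-byte payload, so its length field
is 7 while the encoded attribute takes 8 bytes.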

When stored in memory, the message itself shares format with attributes.
However, on the wire, the length is not included in the packet payload,
and goes as metadata through the socket API instead.

Communication is assumed to be synchronous (request-reply). The service
replies with .command == 0 on success, .command == (-errno) < 0 on failure.
There is exactly one reply for each request. It is assumed that the client
knows what kind of reply to expect for each request issued.

Messages from the service with .command > 0 are notifications not caused by
client requests. Pulling notifications out of the stream should leave a valid
request-reply sequence. It is up to the service to decide whether to use them,
and how. Clients that do not expect notifications should treat them as
protocol errors.

Commands are service-specific. Negative values are expected to be negated
system-wide errno(3) codes. Error messages may include attributes.
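
On the client side, the reply rules above boil down to a loop like the one
below. It is a Python sketch with hypothetical names: recv_message stands for
whatever function reads and decodes one message into a (command, attributes)
pair, and ProtocolError is defined here just for the example.

```python
import os

class ProtocolError(Exception):
    """Raised when the service sends something the client did not expect."""

def wait_reply(recv_message, on_notify=None):
    """Read messages until the reply to the pending request arrives."""
    while True:
        cmd, attrs = recv_message()
        if cmd > 0:
            # Notification: not a reply, so keep waiting for the real one.
            if on_notify is None:
                raise ProtocolError(f"unexpected notification {cmd}")
            on_notify(cmd, attrs)
            continue
        if cmd < 0:
            # Failure: the command field carries a negated errno code.
            raise OSError(-cmd, os.strerror(-cmd))
        return attrs  # cmd == 0, success
```

A client that passes no on_notify callback treats any notification as
a protocol error, as suggested above.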

Attribute keys are service-specific. The service defines which keys should be
used for each command. The current implementation silently ignores unexpected
keys.

The same key may be used several times within the same payload if both parties
are known to expect this. If multiple uses of the key are not expected,
the first attribute with the key is used and the rest get silently ignored.

Integer payloads shorter than 4 bytes should be extended to 4 bytes for
transmission.
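
The key handling described above (first match wins, unknown keys are skipped,
repeated keys are collected only when expected) can be sketched as follows.
This is an illustrative Python decoder, not the project's library; the
function names are made up and the 16-bit len and key header fields are
an assumption based on the 4-byte header shown in the diagram.

```python
import struct

def iter_attrs(buf):
    """Yield (key, payload) for each attribute in a payload buffer."""
    off = 0
    while off + 4 <= len(buf):
        length, key = struct.unpack_from("=HH", buf, off)
        if length < 4 or off + length > len(buf):
            raise ValueError("malformed attribute")
        yield key, buf[off + 4 : off + length]
        off += (length + 3) & ~3   # skip the padding to the next attribute

def get_first(buf, key):
    """Single-use key: return the first payload, ignore any repeats."""
    for k, payload in iter_attrs(buf):
        if k == key:
            return payload
    return None

def get_all(buf, key):
    """Multi-use key: return every payload carrying this key, in order."""
    return [payload for k, payload in iter_attrs(buf) if k == key]
```

Keys the caller never asks for are simply never looked at, which gives
the "silently ignored" behaviour for free.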

Differences between nlusctl and GENL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The protocol was originally called "netlink-based userspace control protocol",
and there were plans to share some of the code. Since then, the protocols have
diverged a lot. The general format of the attributes is still the same though.

nlusctl only runs over point-to-point connections. Netlink (RTNL and GENL)
apparently has some support for ethernet-like point-to-net communication,
which would explain the .pid field in the request.

nlusctl does not support asynchronous communication modes. So no .seq and
no ACK or REQUEST flags.

There are no multi-part replies in nlusctl. Where applicable, the client
has to request the next part (entry, packet, whatever) explicitly using
an index of some sort.

In GENL, DUMP flags affect the meaning of the .cmd field.
In nlusctl, distinct values of the .command field are used for this purpose.

GENL commands have a .version field; nlusctl is expected to use distinct .cmd
values instead, if that is going to be needed at all.

Combining all that, nlusctl drops most of the fields found in GENL headers,
and removes the distinction between struct nlmsg/nlgen/nlerr. This was one of
the biggest reasons to choose a custom protocol over GENL.

GENL and nlusctl use different encodings for lists of similar items within
the same payload. GENL, because of the very weird way messages are parsed
within the kernel, must use a nested attribute with 0, 1, 2, ... keys:

    [ ATTR_SOMETHING,
        [ 0, value ],
        [ 1, value ],
        [ 2, value ] ]

In contrast, nlusctl uses top-level multi-keys for this purpose:

    [ ATTR_SOMETHING, value ],
    [ ATTR_SOMETHING, value ],
    [ ATTR_SOMETHING, value ]

The trick GENL uses needs an extra header and breaks the key => type-of-payload
relation since the enumeration keys may happen to match (and often do match)
unrelated ATTR_* constants.
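
The two list encodings can be compared side by side in Python. This is only
an illustration: the value of ATTR_SOMETHING, the helper names and the 16-bit
len and key header fields are assumptions made for the example.

```python
import struct

ATTR_SOMETHING = 5   # hypothetical key value

def attr(key, payload):
    """Encode one attribute: len and key header, payload, 4-byte padding."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, key) + payload + b"\0" * (-length % 4)

def u32(value):
    return struct.pack("=I", value)

def genl_style(values):
    """One enclosing attribute with items keyed 0, 1, 2, ..."""
    inner = b"".join(attr(i, u32(v)) for i, v in enumerate(values))
    return attr(ATTR_SOMETHING, inner)

def nlusctl_style(values):
    """The same key repeated at the top level, once per item."""
    return b"".join(attr(ATTR_SOMETHING, u32(v)) for v in values)
```

For three integers the nested form takes 28 bytes against 24 for the flat
form; the difference is the extra enclosing header. With six or more items,
the nested form would also contain an item keyed 5, the same number as
ATTR_SOMETHING in this example.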

Otherwise the format of the attributes is the same in GENL, RTNL and nlusctl.
This was done intentionally to share as much parsing code as possible.

Library support
~~~~~~~~~~~~~~~
See ../lib/nlusctl.h and ../lib/nlusctl/