Filename: 254-padding-negotiation.txt
Title: Padding Negotiation
Authors: Mike Perry
Created: 01 September 2015
Status: Needs-Revision
0. Overview

This proposal aims to describe mechanisms for requesting various types
of padding from relays.

These padding primitives are general enough to use to defend against both
website traffic fingerprinting and hidden service circuit setup
fingerprinting.
1. Motivation

Tor already supports both link-level padding (via CELL_PADDING cell types)
and circuit-level padding (via RELAY_COMMAND_DROP relay cells).

Unfortunately, there is no way for clients to request padding from relays,
or to request that relays not send them padding in order to conserve
bandwidth. This proposal aims to create a mechanism for clients to do both
of these.

It also establishes consensus parameters to limit the amount of padding
that relays will send, to prevent custom wingnut clients from requesting
too much.
2. Link-level padding

Padding is most useful if it can defend against a malicious or compromised
guard node. However, link-level padding is still useful to defend against
an adversary that can merely observe a Guard node externally, such as for
low-resolution netflow-based attacks (see Proposal 251 [1]).

In that scenario, the primary negotiation mechanism we need is a way for
mobile clients to tell their Guards to stop padding, or to pad less often.
The following Trunnel payload should cover the needed parameters:
  const CHANNELPADDING_COMMAND_STOP = 1;
  const CHANNELPADDING_COMMAND_START = 2;

  /* The start command tells the relay to alter its min and max netflow
     timeout range values, and send padding at that rate (resuming
     if stopped). The stop command tells the relay to stop sending
     link-level padding. */
  struct channelpadding_negotiate {
    u8 version IN [0];
    u8 command IN [CHANNELPADDING_COMMAND_START, CHANNELPADDING_COMMAND_STOP];

    /* Min must not be lower than the current consensus parameter
       nf_ito_low. Ignored if command is stop. */
    u16 ito_low_ms;

    /* Max must not be lower than ito_low_ms. Ignored if command is stop. */
    u16 ito_high_ms;
  };
After the above cell is received, the guard should use the 'ito_low_ms' and
'ito_high_ms' values as the minimum and maximum values (respectively) for
inactivity before it decides to pad the channel. The actual timeout value is
randomly chosen between those two values through an appropriate probability
distribution (see Proposal 251 for the netflow padding protocol).
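
As a rough illustration, the sketch below shows one way a guard might pick
its next inactivity timeout from the negotiated range. The function name,
the plain uniform draw, and the rand()-based RNG are assumptions made for
brevity; Proposal 251 specifies the actual distribution, and tor uses its
own secure RNG.

  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical helper: pick the next netflow padding timeout (in ms)
   * from the range negotiated in a channelpadding_negotiate cell.
   * A uniform draw is used here purely for illustration. */
  static uint16_t
  pick_padding_timeout_ms(uint16_t ito_low_ms, uint16_t ito_high_ms)
  {
    if (ito_high_ms <= ito_low_ms)
      return ito_low_ms;
    return (uint16_t)(ito_low_ms + rand() % (ito_high_ms - ito_low_ms + 1));
  }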
More complicated forms of link-level padding can still be specified using
the primitives in Section 3, by using "leaky pipe" topology to send the
RELAY commands to the Guard node instead of to later nodes in the circuit.
Because the above link-level padding only sends padding cells if the link is
idle, it can be used in combination with the more complicated circuit-level
padding below, without compounding overhead effects.
3. End-to-end circuit padding

For circuit-level padding, we need the ability to schedule a statistical
distribution of arbitrary padding to overlay on top of non-padding traffic
(aka "Adaptive Padding").

The statistical mechanisms that define padding are known as padding
machines. Padding machines can be hardcoded in Tor or specified in the
consensus, and custom research machines can be listed in the torrc.
3.1. Padding Machines

Circuits can have either one or two state machines at both the origin and
at a specified middle hop.

Each state machine can contain up to three states ("Start", "Burst" and
"Gap") governing its behavior, as well as an "END" state. Not all states
need to be used.

Each state of a padding machine specifies either:
  * A histogram describing inter-arrival cell delays; OR
  * A parameterized delay probability distribution for inter-arrival cell
    delays

In either case, the lower bound of the delay probability distribution can be
specified as the start_usec parameter, and/or it can be learned by measuring
the RTT of the circuit at the middle node. For client-side machines, the RTT
measurement is always set to 0. The RTT measurement at the middle node is
calculated by measuring the difference between the time of arrival of a
received cell (ie: away from origin) and the time of arrival of a sent cell
(ie: towards origin). The RTT is continually updated so long as two cells do
not arrive back-to-back in either direction. If the most recent measured RTT
value is larger than our measured value so far, this larger value is used.
If the most recent measured RTT value is lower than our measured value so
far, it is averaged with our current measured value. (We favor longer RTTs
slightly in this way, because circuits are growing away from the middle node
and becoming longer.)
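
To make the update rule concrete, here is a minimal sketch of how a middle
node might fold a newly measured RTT into its running estimate, following
the "keep the larger value, average in the smaller value" rule above. The
function and variable names are hypothetical, not tor's actual circuit
padding code.

  #include <stdint.h>

  /* Hypothetical running RTT estimate, in microseconds. */
  static uint64_t rtt_estimate_usec = 0;

  /* Fold a new RTT measurement into the estimate: larger measurements
   * replace the estimate outright; smaller ones are averaged in, which
   * slightly favors longer RTTs as the circuit grows past the middle node. */
  static void
  update_rtt_estimate(uint64_t measured_usec)
  {
    if (rtt_estimate_usec == 0 || measured_usec > rtt_estimate_usec)
      rtt_estimate_usec = measured_usec;
    else
      rtt_estimate_usec = (rtt_estimate_usec + measured_usec) / 2;
  }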
If the histogram is used, it has an additional special "infinity" bin that
means "infinite delay".

The state can also provide an optional parameterized distribution that
specifies how many total cells (or how many padding cells) can be sent on
the circuit while the machine is in this state, before it transitions to a
new state.
Each state of a padding machine can react to the following cell events:
  * Non-padding cell received
  * Padding cell received
  * Non-padding cell sent
  * Padding cell sent

Additionally, padding machines emit the following internal events to
themselves:
  * Infinity bin was selected
  * The histogram bins are empty
  * The length count for this state was exceeded

Each state of the padding machine specifies a set of these events that cause
it to cancel any pending padding, and a set of events that cause it to
transition to another state, or to transition back to itself.

When an event causes a transition to a state (or back to the same state), a
delay is sampled from the histogram or delay distribution, and a padding
cell is scheduled to be sent after that delay.

If a non-padding cell is sent before the timer expires, the timer is
canceled and a new padding delay is chosen.
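
The sketch below illustrates the event-driven structure described above: a
per-state table of events that cancel padding and events that trigger
transitions, with a fresh delay sampled on every transition. All type and
function names here are hypothetical; this is a simplified model, not tor's
circpad implementation.

  #include <stdint.h>
  #include <stdbool.h>

  /* Cell and internal events a state can react to. */
  typedef enum {
    EV_NONPADDING_RECV, EV_PADDING_RECV,
    EV_NONPADDING_SENT, EV_PADDING_SENT,
    EV_INFINITY,        EV_BINS_EMPTY,    EV_LENGTH_EXCEEDED,
    EV_COUNT
  } padding_event_t;

  typedef enum { STATE_START, STATE_BURST, STATE_GAP, STATE_END } machine_state_t;

  typedef struct {
    bool cancels_padding[EV_COUNT];       /* events that cancel pending padding */
    machine_state_t next_state[EV_COUNT]; /* transition table (may point to self) */
    bool has_transition[EV_COUNT];
  } state_spec_t;

  /* Hypothetical helpers assumed to exist elsewhere in this sketch. */
  extern void cancel_pending_padding(void);
  extern uint64_t sample_delay_usec(machine_state_t state);
  extern void schedule_padding(uint64_t delay_usec);

  static machine_state_t
  handle_event(const state_spec_t *spec, machine_state_t current,
               padding_event_t ev)
  {
    if (spec[current].cancels_padding[ev])
      cancel_pending_padding();

    if (spec[current].has_transition[ev]) {
      machine_state_t next = spec[current].next_state[ev];
      if (next != STATE_END)
        schedule_padding(sample_delay_usec(next)); /* new delay on transition */
      return next;
    }
    return current;
  }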
3.1.1. Histogram Specification

If a histogram is used by a state (as opposed to a fixed parameterized
distribution), then each of the histogram's fields represents a probability
distribution that is encoded into bins of exponentially increasing width.

The first bin of the histogram (bin 0) has 0 width, with a delay value of
start_usec+rtt_estimate (from the machine definition, and the rtt estimate
above).

The remaining bins are exponentially spaced, starting at this offset and
covering the range of the histogram, which is range_usec.

The intermediate bins thus divide the timespan range_usec with offset
start_usec+rtt_estimate, so that smaller bin indexes represent narrower time
ranges, with bin widths doubling up until the last bin. The last bin before
the "infinity bin" thus covers [start_usec+rtt_estimate+range_usec/2,
start_usec+rtt_estimate+range_usec).

This exponentially increasing bin width allows the histograms to most
accurately represent small interpacket delays (where accuracy is needed),
and devote less accuracy to larger timescales (where accuracy is not as
important).
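
As an illustration of this layout, the sketch below computes a bin's lower
edge from start_usec, rtt_estimate, and range_usec, with each finite bin
covering half of the remaining range and the last finite bin covering the
top half. The function name, the num_bins convention (bin 0, the finite
bins, then the infinity bin), and the arithmetic are illustrative
assumptions, not tor's exact code.

  #include <stdint.h>

  /* Hypothetical helper: lower edge (in usec) of histogram bin 'bin'.
   * 'num_bins' counts every bin, including bin 0 and the final infinity
   * bin, so the finite nonzero-width bins are 1 .. num_bins-2. */
  static uint64_t
  bin_lower_edge_usec(uint64_t start_usec, uint64_t rtt_estimate,
                      uint64_t range_usec, unsigned bin, unsigned num_bins)
  {
    const uint64_t offset = start_usec + rtt_estimate;
    const unsigned last_finite = num_bins - 2;

    if (bin <= 1)
      return offset;                /* bin 0 (zero width) and bin 1 start here */
    if (bin > last_finite)
      return offset + range_usec;   /* start of the "infinity" bin */
    /* Widths double with the index: the last finite bin covers the top half
     * of range_usec, the previous one the next quarter, and so on. */
    return offset + (range_usec >> (last_finite - bin + 1));
  }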
To sample the delay time to send a padding packet, perform the following:
  * Select a bin weighted by the number of tokens in its index compared to
    the total.
  * If the infinity bin is selected, do not schedule padding.
  * If bin 0 is selected, schedule padding at exactly its time value.
  * For other bins, uniformly sample a time value between this bin and
    the next bin, and schedule padding then.
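
A minimal sketch of this sampling procedure follows, reusing the
hypothetical bin_lower_edge_usec() helper from above. It assumes at least
one token remains (the "bins empty" event covers the other case) and uses
rand() only for brevity; tor would use its own secure RNG.

  #include <stdint.h>
  #include <stdlib.h>

  #define NO_PADDING UINT64_MAX   /* sentinel: infinity bin, do not pad */

  /* Sample a padding delay (usec) from 'histogram', whose last entry is
   * the infinity bin. */
  static uint64_t
  sample_padding_delay(const unsigned *histogram, unsigned num_bins,
                       uint64_t start_usec, uint64_t rtt_estimate,
                       uint64_t range_usec)
  {
    unsigned total = 0, bin, pick;
    for (bin = 0; bin < num_bins; bin++)
      total += histogram[bin];

    /* Select a bin weighted by its token count. */
    pick = (unsigned)(rand() % total);
    for (bin = 0; pick >= histogram[bin]; bin++)
      pick -= histogram[bin];

    if (bin == num_bins - 1)
      return NO_PADDING;                      /* infinity bin selected */
    if (bin == 0)
      return start_usec + rtt_estimate;       /* zero-width bin: exact value */

    /* Uniformly sample between this bin's edge and the next bin's edge. */
    uint64_t lo = bin_lower_edge_usec(start_usec, rtt_estimate, range_usec,
                                      bin, num_bins);
    uint64_t hi = bin_lower_edge_usec(start_usec, rtt_estimate, range_usec,
                                      bin + 1, num_bins);
    if (hi <= lo)
      return lo;
    return lo + (uint64_t)(rand() % (hi - lo));
  }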
3.1.1.1. Histogram Token Removal

Tokens can optionally be removed from histogram bins whenever a padding or
non-padding packet is sent. With this token removal, the histogram functions
as an overall target delay distribution for the machine while it is in that
state.

If token removal is enabled, when a padding packet is sent, a token is
removed from the bin corresponding to the target delay. When a non-padding
packet is sent, the actual delay from the previous packet is calculated, and
the histogram bin corresponding to that delay is inspected. If that bin has
tokens remaining, it is decremented.

If the bin has no tokens left, the state removes a token from a different
bin, as specified in its token removal rule. The following token removal
options are defined:
  * None -- Never remove any tokens
  * Exact -- Only remove from the target bin; if it is empty, ignore it.
  * Higher -- Remove from the next higher non-empty bin
  * Lower -- Remove from the next lower non-empty bin
  * Closest -- Remove from the closest non-empty bin by bin index
  * Closest_time -- Remove from the closest non-empty bin, by time value

When all bins except the infinity bin are empty in a histogram, the padding
machine emits the internal "bins empty" event to itself.

Bin 0 and the bin before the infinity bin both have special rules for
purposes of token removal. While removing tokens, all values less than bin 0
are treated as part of bin 0, and all values greater than
start_usec+rtt_estimate+range_usec are treated as part of the bin before the
infinity bin. Tokens are not removed from the infinity bin when non-padding
is sent. (They are only removed when an "infinite" delay is chosen.)
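
As a sketch of one of these rules, the function below removes a token under
the "Closest" (by bin index) option: it decrements the target bin if it has
tokens, and otherwise the nearest non-empty finite bin by index. The names
and structure are illustrative assumptions, not tor's implementation.

  #include <stdbool.h>

  /* Remove one token for a send that fell into 'target_bin', using the
   * "Closest" (by index) rule.  The infinity bin (last entry) is never
   * touched here.  Returns false if every finite bin is already empty. */
  static bool
  remove_token_closest(unsigned *histogram, unsigned num_bins,
                       unsigned target_bin)
  {
    const unsigned last_finite = num_bins - 2; /* exclude the infinity bin */
    unsigned dist;

    if (target_bin > last_finite)
      target_bin = last_finite;          /* clamp overly long delays */

    if (histogram[target_bin] > 0) {
      histogram[target_bin]--;
      return true;
    }
    /* Target bin empty: walk outward to the closest non-empty finite bin. */
    for (dist = 1; dist <= last_finite; dist++) {
      if (target_bin >= dist && histogram[target_bin - dist] > 0) {
        histogram[target_bin - dist]--;
        return true;
      }
      if (target_bin + dist <= last_finite && histogram[target_bin + dist] > 0) {
        histogram[target_bin + dist]--;
        return true;
      }
    }
    return false;   /* all finite bins empty: caller emits "bins empty" */
  }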
3.1.2. Delay Probability Distribution

Alternatively, a delay probability distribution can be used instead of a
histogram to sample padding delays.

In this case, the designer also needs to specify the appropriate
distribution parameters, and when a padding cell needs to be scheduled, the
padding subsystem will sample a positive delay value (in microseconds) from
that distribution (where the minimum delay is range_usec+start_usec, as is
the case for histograms).

We currently support the following probability distributions:
Uniform, Logistic, Log-logistic, Geometric, Weibull, Pareto
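
As one example of sampling from a parameterized distribution, the sketch
below draws a Pareto-distributed delay via inverse transform sampling and
adds the machine's minimum delay offset. The parameter names and the use of
drand48() are assumptions for illustration only; each of the listed
distributions would get its own sampler along these lines.

  #include <math.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Draw a Pareto(scale=xm, shape=alpha) delay in usec via inverse
   * transform sampling, offset by the machine's minimum delay.
   * drand48() stands in for a proper RNG in this sketch. */
  static uint64_t
  sample_pareto_delay_usec(double xm, double alpha, uint64_t min_delay_usec)
  {
    double u = drand48();
    if (u <= 0.0)
      u = 1e-12;                       /* avoid division by zero */
    double sample = xm / pow(u, 1.0 / alpha);
    return min_delay_usec + (uint64_t)sample;
  }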
3.2. State Machine Selection

Clients will select which of the defined available padding machines to use
based on the conditions that these machines specify. These conditions
include:
  * How many hops the circuit must be in order for the machine to apply
  * If the machine requires vanguards to be enabled to apply
  * The state the circuit must be in for machines to apply (building,
    relay early cells remaining, opened, streams currently attached)
  * If the circuit purpose matches a set of purposes for the machine
  * If the target hop of the machine supports circuit padding

Clients will only select machines whose conditions fully match a given
circuit.
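
A condensed sketch of that matching step might look like the following. The
condition structure, field names, and flags are all hypothetical; they only
mirror the list of conditions above.

  #include <stdbool.h>

  /* Hypothetical condition set mirroring the list above. */
  typedef struct {
    unsigned min_hops;            /* required circuit length */
    bool requires_vanguards;      /* machine only applies with vanguards on */
    unsigned allowed_circ_states; /* bitmask: building, opened, etc. */
    unsigned allowed_purposes;    /* bitmask of circuit purposes */
  } machine_conditions_t;

  typedef struct {
    unsigned num_hops;
    bool vanguards_enabled;
    unsigned state_flag;          /* single bit for the circuit's current state */
    unsigned purpose_flag;        /* single bit for the circuit's purpose */
    bool target_hop_supports_padding;
  } circuit_info_t;

  static bool
  machine_conditions_match(const machine_conditions_t *cond,
                           const circuit_info_t *circ)
  {
    if (circ->num_hops < cond->min_hops)
      return false;
    if (cond->requires_vanguards && !circ->vanguards_enabled)
      return false;
    if (!(cond->allowed_circ_states & circ->state_flag))
      return false;
    if (!(cond->allowed_purposes & circ->purpose_flag))
      return false;
    if (!circ->target_hop_supports_padding)
      return false;
    return true;                  /* every condition fully matches */
  }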
A machine is represented by a positive number that can be thought of as a
"menu option" through the list of padding machines. The currently supported
padding state machines are:

  [1]: CIRCPAD_MACHINE_CIRC_SETUP
       A padding machine that obscures the initial circuit setup in an
       attempt to hide onion services.
3.3. Machine Negotiation

When a machine is selected, the client uses leaky-pipe delivery to send a
RELAY_COMMAND_PADDING_NEGOTIATE to the target hop of the machine, using the
following Trunnel relay cell payload format:

  /**
   * This command tells the target hop to start or stop the specified
   * padding machine. */
  struct circpad_negotiate {
    u8 version IN [0];
    u8 command IN [CIRCPAD_COMMAND_START, CIRCPAD_COMMAND_STOP];

    /** Machine type is left unbounded because we can specify
     * new machines in the consensus */
    u8 machine_type;
  };

Upon receipt of a RELAY_COMMAND_PADDING_NEGOTIATE cell, the middle node
sends back a RELAY_COMMAND_PADDING_NEGOTIATED cell with the following
format:

  /**
   * This command tells the client whether the requested start or stop of
   * the specified padding machine succeeded. */
  struct circpad_negotiated {
    u8 version IN [0];
    u8 command IN [CIRCPAD_COMMAND_START, CIRCPAD_COMMAND_STOP];
    u8 response IN [CIRCPAD_RESPONSE_OK, CIRCPAD_RESPONSE_ERR];

    /** Machine type is left unbounded because we can specify
     * new machines in the consensus */
    u8 machine_type;
  };

The 'machine_type' field should be the same as the one from the
PADDING_NEGOTIATE cell. This is because, as an optimization, new machines
can be installed at the client side immediately after tearing down an old
machine. If the response machine type does not match the current machine
type, the response was for a previous machine and can be ignored.

If the response field is CIRCPAD_RESPONSE_OK, padding was successfully
negotiated. If it is CIRCPAD_RESPONSE_ERR, the machine is torn down and we
do not pad.
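
A minimal sketch of the client-side handling of that response might look
like this. The constants' values and the function names are hypothetical;
only the two checks (a stale machine_type, and OK versus ERR) come from the
text above.

  #include <stdint.h>

  #define CIRCPAD_RESPONSE_OK  1   /* illustrative values, not normative */
  #define CIRCPAD_RESPONSE_ERR 2

  /* Hypothetical client-side handler for a PADDING_NEGOTIATED response. */
  static void
  handle_padding_negotiated(uint8_t response, uint8_t machine_type,
                            uint8_t current_machine_type,
                            void (*tear_down_machine)(void))
  {
    /* Response for a machine we already tore down: ignore it. */
    if (machine_type != current_machine_type)
      return;

    if (response == CIRCPAD_RESPONSE_ERR) {
      /* Negotiation failed: tear down the machine and do not pad. */
      tear_down_machine();
    }
    /* CIRCPAD_RESPONSE_OK: padding was negotiated; nothing more to do. */
  }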
4. Examples of Padding Machines

In the original WTF-PAD design [2], the state machines are used as follows:

The "Burst" histogram specifies the delay probabilities for sending a
padding packet after the arrival of a non-padding data packet.

The "Gap" histogram specifies the delay probabilities for sending another
padding packet after a padding packet was just sent from this node. This
self-triggering property of the "Gap" histogram allows the construction of
multi-packet padding trains using a simple statistical distribution.

Both the "Gap" and "Burst" histograms have a special "Infinity" bin, which
means "We have decided not to send a packet".

Intuitively, the burst state is used to detect when the line is idle (and
should therefore have few or no tokens in low histogram bins). The lack of
tokens in the low histogram bins causes the system to remain in the burst
state until the actual application traffic either slows, stalls, or has a
gap.

The gap state is used to fill in otherwise idle periods with artificial
payloads from the server (and should have many tokens in low bins, and
possibly some also in higher bins). In this way, the gap state either
generates entirely fake streams of cells, or extends real streams with
additional cells.

The Adaptive Padding Early implementation [3] uses parameterized
distributions instead of histograms, but otherwise uses the states in the
same way.

It should be noted that due to our generalization of these states and their
transition possibilities, more complicated interactions are also possible.
For example, it is possible to simulate circuit extension, so that all
circuits appear to continue to extend up until the RELAY_EARLY cell count is
depleted. It is also possible to create machines that simulate traffic on
unused circuits, or mimic onion service activity on clients that aren't
otherwise using onion services.
5. Security considerations and mitigations

The risks from this proposal are primarily DoS/resource exhaustion, and
side channels.

5.1. Rate limiting

Current research [2,3] indicates that padding should be effective against
website traffic fingerprinting at overhead rates of 50-60%. Circuit setup
behavior can be concealed with far less overhead.

We recommend that the following consensus parameters be used in the event
that the network is being overloaded from padding to such a degree that
padding requests should be ignored:

  * circpad_global_max_padding_pct
    - The maximum percentage of sent padding traffic out of total traffic
      to allow in a tor process before ceasing to pad. Ex: 75 means
      75 padding packets for every 100 non-padding+padding packets.
      This definition is consistent with the overhead values in Proposal
      #265, though it does not take node position into account.

  * circpad_global_allowed_cells
    - The number of padding cells that must be transmitted before the
      global ratio limit is applied.

Additionally, each machine can specify its own per-machine limits for the
allowed cell counters and padding overhead percentages.

When either global or machine limits are reached, padding is no longer
scheduled. The machine simply becomes idle until the overhead drops below
the threshold.
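
A rough sketch of how these two global limits might be checked before
scheduling padding follows. The counter names and the helper are
assumptions; only the "allowed cells first, then ratio" behavior is taken
from the text above.

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical check applied before scheduling a padding cell:
   * below circpad_global_allowed_cells, always allow; above it, stop
   * padding once padding exceeds circpad_global_max_padding_pct percent
   * of all (padding + non-padding) sent cells. */
  static bool
  padding_allowed(uint64_t padding_sent, uint64_t nonpadding_sent,
                  uint64_t allowed_cells, uint64_t max_padding_pct)
  {
    uint64_t total = padding_sent + nonpadding_sent;

    if (padding_sent < allowed_cells)
      return true;                       /* ratio limit not yet applied */
    if (total == 0)
      return true;
    return (padding_sent * 100) / total <= max_padding_pct;
  }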
Finally, the consensus can also be used to specify that clients should use
only machines that are flagged as reduced padding, or disable circuit
padding entirely, with the following two parameters:

  * circpad_padding_reduced=1
    - Tells clients to only use padding machines with the
      'reduced_padding_ok' machine condition flag set.

  * circpad_padding_disabled=1
    - Tells clients to stop circuit padding immediately, and not negotiate
      any further padding machines.
5.2. Overhead accounting

In order to monitor the quantity of padding and decide if we should alter
these limits in the consensus, every node will publish the following values
in a padding-counts line in its extra-info descriptor:

  * read_drop_cell_count
    - The number of RELAY_DROP cells read by this relay.

  * write_drop_cell_count
    - The number of RELAY_DROP cells sent by this relay.

Each of these counters will be rounded to the nearest 10,000 cells. This
rounding parameter will also be listed in the extra-info descriptor line, in
case we change it in a later release.
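
For clarity, rounding a counter to the nearest 10,000 cells is simple
integer arithmetic, sketched below with a hypothetical helper; the rounding
granularity is a parameter here since the text notes it may change.

  #include <stdint.h>

  /* Round 'count' to the nearest multiple of 'granularity' (e.g. 10000)
   * before publishing it in the padding-counts extra-info line. */
  static uint64_t
  round_cell_count(uint64_t count, uint64_t granularity)
  {
    return ((count + granularity / 2) / granularity) * granularity;
  }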
In the future, we may decide to introduce Laplace noise in a similar manner
to the hidden service statistics, to further obscure padding quantities.
5.3. Side channels

In order to prevent relays from introducing side channels by requesting
padding from clients, all of the padding negotiation commands are only valid
in the outgoing (from the client/OP) direction.

Similarly, to prevent relays from sending malicious padding from arbitrary
circuit positions, if RELAY_DROP cells arrive from a hop other than the one
with which padding was negotiated, the cells are counted as invalid for
purposes of the CIRC_BW control port fields, allowing the vanguards addon to
close the circuit upon detecting this activity.
-------------------

1. https://gitweb.torproject.org/torspec.git/tree/proposals/251-netflow-padding.txt
2. https://www.cs.kau.se/pulls/hot/thebasketcase-wtfpad/
3. https://www.cs.kau.se/pulls/hot/thebasketcase-ape/