268-guard-selection.txt 21 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510
  1. Filename: 268-guard-selection.txt
  2. Title: New Guard Selection Behaviour
  3. Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
  4. Created: 2015-10-28
  5. Status: Obsolete
  6. (Editorial note: this was origianlly written as a revision of
  7. proposal 259, but it diverges so substantially that it seemed
  8. better to assign it a new number for reference, so that we
  9. aren't always talking about "The old 259" and "the new 259". -NM)
  10. This proposal has been obsoleted by proposal #271.
  11. §1. Overview
  12. Tor uses entry guards to prevent an attacker who controls some
  13. fraction of the network from observing a fraction of every user's
  14. traffic. If users chose their entries and exits uniformly at
  15. random from the list of servers every time they build a circuit,
  16. then an adversary who had (k/N) of the network would deanonymize
  17. F=(k/N)^2 of all circuits... and after a given user had built C
  18. circuits, the attacker would see them at least once with
  19. probability 1-(1-F)^C. With large C, the attacker would get a
  20. sample of every user's traffic with probability 1.
  21. To prevent this from happening, Tor clients choose a small number of
  22. guard nodes (currently 3). These guard nodes are the only nodes
  23. that the client will connect to directly. If they are not
  24. compromised, the user's paths are not compromised.
  25. But attacks remain. Consider an attacker who can run a firewall
  26. between a target user and the Tor network, and make
  27. many of the guards they don't control appear to be unreachable.
  28. Or consider an attacker who can identify a user's guards, and mount
  29. denial-of-service attacks on them until the user picks a guard
  30. that the attacker controls.
  31. In the presence of these attacks, we can't continue to connect to
  32. the Tor network unconditionally. Doing so would eventually result
  33. in the user choosing a hostile node as their guard, and losing
  34. anonymity.
  35. This proposal outlines a new entry guard selection algorithm, which
  36. addresses the following concerns:
  37. - Heuristics and algorithms for determining how and which guard(s)
  38. is(/are) chosen should be kept as simple and easy to understand
  39. as possible.
  40. - Clients in censored regions or who are behind a fascist firewall
  41. who connect to the Tor network should not experience any
  42. significant disadvantage in terms of reachability or usability.
  43. - Tor should make a best attempt at discovering the most
  44. appropriate behaviour, with as little user input and
  45. configuration as possible.
  46. §2. Design
  47. Alice, an OP attempting to connect to the Tor network, should
  48. undertake the following steps to determine information about the
  49. local network and to select (some) appropriate entry guards. In the
  50. following scenario, it is assumed that Alice has already obtained a
  51. recent, valid, and verifiable consensus document.
  52. The algorithm is divided into four components such that the full
  53. algorithm is implemented by first invoking START, then repeatedly
  54. calling NEXT while adviced it SHOULD_CONTINUE and finally calling
  55. END. For an example usage see §A. Appendix.
  56. Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
  57. is used for the algorithm to be able to tell the caller whether we
  58. consider the work done or not - this can be used to retry primary
  59. guards when we finally are able to connect to a guard after a long
  60. network outage, for example.
  61. This algorithm keeps track of the unreachability status for guards
  62. in state global to the system, so that repeated runs will not have
  63. to rediscover unreachability over and over again. However, this
  64. state does not need to be persisted permanently - it is purely an
  65. optimization.
  66. The algorithm expects several arguments to guide its behavior. These
  67. will be defined in §2.1.
  68. The goal of this algorithm is to strongly prefer connecting to the
  69. same guards we have connected to before, while also trying to detect
  70. conditions such as a network outage. The way it does this is by keeping
  71. track of how many guards we have exposed ourselves to, and if we have
  72. connected to too many we will fall back to only retrying the ones we have
  73. already tried. The algorithm also decides on sample set that should
  74. be persisted - in order to minimize the risk of an attacker forcing
  75. enumeration of the whole network by triggering rebuilding of
  76. circuits.
  77. §2.1. Definitions
  78. Bad guard: a guard is considered bad if it conforms with the function IS_BAD
  79. (see §G. Appendix for details).
  80. Dead guard: a guard is considered dead if it conforms with the function
  81. IS_DEAD (see §H. Appendix for details).
  82. Obsolete guard: a guard is considered obsolete if it conforms with the
  83. function IS_OBSOLETE (see §I. Appendix for details).
  84. Live entry guard: a guard is considered live if it conforms with the function
  85. IS_LIVE (see §D. Appendix for details).
  86. §2.1. The START algorithm
  87. In order to start choosing an entry guard, use the START
  88. algorithm. This takes four arguments that can be used to fine tune
  89. the workings:
  90. USED_GUARDS
  91. This is a list that contains all the guards that have been used
  92. before by this client. We will prioritize using guards from this
  93. list in order to minimize our exposure. The list is expected to
  94. be sorted based on priority, where the first entry will have the
  95. highest priority.
  96. SAMPLED_GUARDS
  97. This is a set that contains all guards that should be considered
  98. for connection. This set should be persisted between runs. It
  99. should be filled by using NEXT_BY_BANDWIDTH with GUARDS as an
  100. argument if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
  101. guards after winnowing out older guards.
  102. N_PRIMARY_GUARDS
  103. The number of guards we should consider our primary
  104. guards. These guards will be retried more frequently and will
  105. take precedence in most situations. By default the primary
  106. guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
  107. When the algorith is used in constrained mode (have bridges or entry
  108. nodes in the configuration file), this value should be 1 otherwise the
  109. proposed value is 3.
  110. DIR
  111. If this argument is set, we should only consider guards that can
  112. be directory guards. If not set, we will consider all guards.
  113. The primary work of START is to initialize the state machine depicted
  114. in §2.2. The initial state of the machine is defined by:
  115. GUARDS
  116. This is a set of all guards from the consensus. It will primarily be used
  117. to fill in SAMPLED_GUARDS
  118. FILTERED_SAMPLED
  119. This is a set that contains all guards that we are willing to connect to.
  120. It will be obtained from calling FILTER_SET with SAMPLED_GUARDS as
  121. argument.
  122. REMAINING_GUARDS
  123. This is a running set of the guards we have not yet tried to connect to.
  124. It should be initialized to be FILTERED_SAMPLED without USED_GUARDS.
  125. STATE
  126. A variable that keeps track of which state in the state
  127. machine we are currently in. It should be initialized to
  128. STATE_PRIMARY_GUARDS.
  129. PRIMARY_GUARDS
  130. This list keeps track of our primary guards. These are guards
  131. that we will prioritize when trying to connect, and will also
  132. retry more often in case of failure with other guards.
  133. It should be initialized by calling algorithm
  134. NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
  135. N_PRIMARY_GUARDS elements.
  136. §2.2. The NEXT algorithm
  137. The NEXT algorithm is composed of several different possibly flows. The
  138. first one is a simple state machine that can transfer between two
  139. different states. Every time NEXT is invoked, it will resume at the
  140. state where it left off previously. In the course of selecting an
  141. entry guard, a new consensus can arrive. When that happens we need
  142. to update the data structures used, but nothing else should change.
  143. Before jumping in to the state machine, we should first check if it
  144. was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
  145. any of the PRIMARY_GUARDS. If this is the case, and we are not in
  146. STATE_PRIMARY_GUARDS, we should save the previous state and set the
  147. state to STATE_PRIMARY_GUARDS.
  148. §2.2.1. The STATE_PRIMARY_GUARDS state
  149. Return each entry in PRIMARY_GUARDS in turn. For each entry, if the
  150. guard should be retried and considered suitable use it. A guard is
  151. considered to eligible to retry if is marked for retry or is live
  152. and id not bad. Also, a guard is considered to be suitable if is
  153. live and, if is a directory it should not be a cache.
  154. If all entries have been tried transition to STATE_TRY_REMAINING.
  155. §2.2.2. The STATE_TRY_REMAINING state
  156. Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
  157. turn.For each entry, if a guard is found return it.
  158. Return each entry from REMAINING_GUARDS in turn.
  159. For each entry, if the guard should be retried and considered
  160. suitable use it and mark it as unreachable. A guard is
  161. considered to eligible to retry if is marked for retry or is live
  162. and id not bad. Also, a guard is considered to be suitable if is
  163. live and, if is a directory it should not be a cache.
  164. If no entries remain in REMAINING_GUARDS, transition to
  165. STATE_PRIMARY_GUARDS.
  166. §2.2.3. ON_NEW_CONSENSUS
  167. First, ensure that all guard profiles are updated with information
  168. about whether they were in the newest consensus or not.
  169. Update the bad status for all guards in USED_GUARDS and SAMPLED_GUARDS.
  170. Remove all dead guards from USED_GUARDS and SAMPLED_GUARDS.
  171. Remove all obsolete guards from USED_GUARDS and SAMPLED_GUARDS.
  172. §2.3. The SHOULD_CONTINUE algorithm
  173. This algorithm takes as an argument a boolean indicating whether the
  174. circuit was successfully built or not.
  175. After the caller have tried to build a circuit with a returned
  176. guard, they should invoke SHOULD_CONTINUE to understand if the
  177. algorithm is finished or not. SHOULD_CONTINUE will always return
  178. true if the circuit failed. If the circuit succeeded,
  179. SHOULD_CONTINUE will always return false, unless the guard that
  180. succeeded was the first guard to succeed after
  181. INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
  182. state to STATE_PRIMARY_GUARDS and return true.
  183. §2.4. The END algorithm
  184. The goal of this algorithm is simply to make sure that we keep track
  185. of successful connections made. This algorithm should be invoked
  186. with the guard that was used to correctly set up a circuit.
  187. Once invoked, this algorithm will mark the guard as used, and make
  188. sure it is in USED_GUARDS, by adding it at the end if it was not there.
  189. §2.5. Helper algorithms
  190. These algorithms are used in the above algorithms, but have been
  191. separated out here in order to make the flow clearer.
  192. NEXT_PRIMARY_GUARD
  193. - Return the first entry from USED_GUARDS that is not in
  194. PRIMARY_GUARDS and that is in the most recent consensus.
  195. - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
  196. REMAINING_GUARDS as the argument.
  197. NEXT_BY_BANDWIDTH
  198. - Takes G as an argument, which should be a set of guards to
  199. choose from.
  200. - Return a randomly select element from G, weighted by bandwidth.
  201. FILTER_SET
  202. - Takes G as an argument, which should be a set of guards to filter.
  203. - Filter out guards in G that don't comply with IS_LIVE (see
  204. §D. Appendix for details).
  205. - If the filtered set is smaller than MINIMUM_FILTERED_SAMPLE_SIZE and G
  206. is smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, expand G and try to
  207. filter out again. G is expanded by adding one new guard at a time using
  208. NEXT_BY_BANDWIDTH with GUARDS as an argument.
  209. - If G is not smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, G should not be
  210. expanded. Abort execution of this function by returning null and report
  211. an error to the user.
  212. §3. Consensus Parameters, & Configurable Variables
  213. This proposal introduces several new parameters that ideally should
  214. be set in the consensus but that should also be possible to
  215. set or override in the client configuration file. Some of these have
  216. proposed values, but for others more simulation and trial needs to
  217. happen.
  218. PRIMARY_GUARDS_RETRY_INTERVAL
  219. In order to make it more likely we connect to a primary guard,
  220. we would like to retry the primary guards more often than other
  221. types of guards. This parameter controls how many minutes should
  222. pass before we consider retrying primary guards again. The
  223. proposed value is 3.
  224. SAMPLE_SET_THRESHOLD
  225. In order to allow us to recognize completely unreachable network,
  226. we would like to avoid connecting to too many guards before switching
  227. modes. We also want to avoid exposing ourselves to too many nodes in a
  228. potentially hostile situation. This parameter, expressed as a
  229. fraction, determines the number of guards we should keep as the
  230. sampled set of the only guards we will consider connecting
  231. to. It will be used as a fraction for the sampled set.
  232. If we assume there are 1900 guards, a setting of 0.02
  233. means we will have a sample set of 38 guards.
  234. This limits our total exposure. Proposed value is 0.02.
  235. MINIMUM_FILTERED_SAMPLE_SIZE
  236. The minimum size of the sampled set after filtering out nodes based on
  237. client configuration (FILTERED_SAMPLED). Proposed value is ???.
  238. MAXIMUM_SAMPLE_SIZE_THRESHOLD
  239. In order to guarantee a minimum size of guards after filtering,
  240. we expand SAMPLED_GUARDS until a limit. This fraction of GUARDS will be
  241. used as an upper bound when expanding SAMPLED_GUARDS.
  242. Proposed value is 0.03.
  243. INTERNET_LIKELY_DOWN_INTERVAL
  244. The number of minutes since we started trying to find an entry
  245. guard before we should consider the network down and consider
  246. retrying primary guards before using a functioning guard
  247. found. Proposed value 5.
  248. §4. Security properties and behavior under various conditions
  249. Under normal conditions, this algorithm will allow us to quickly
  250. connect and use guards we have used before with high likelihood of
  251. working. Assuming the first primary guard is reachable and in the
  252. consensus, this algorithm will deterministically always return that
  253. guard.
  254. Under dystopic conditions (when a firewall is in place that blocks
  255. all ports except for potentially port 80 and 443), this algorithm
  256. will try to connect to 2% of all guards before switching modes to try
  257. dystopic guards. Currently, that means trying to connect to circa 40
  258. guards before getting a successful connection. If we assume a
  259. connection try will take maximum 10 seconds, that means it will take
  260. up to 6 minutes to get a working connection.
  261. When the network is completely down, we will try to connect to 2% of
  262. all guards plus 2% of all dystopic guards before realizing we are
  263. down. This means circa 50 guards tried assuming there are 1900 guards
  264. in the network.
  265. In terms of exposure, we will connect to a maximum of 2% of all
  266. guards plus 2% of all dystopic guards, or 3% of all guards,
  267. whichever is lower. If N is the number of guards, and k is the
  268. number of guards an attacker controls, that means an attacker would
  269. have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
  270. guards selected before we fall back. In real terms, this means an
  271. attacker would need to control over 10% of all guards in order to
  272. have a larger than 50% chance of controlling a guard for any given client.
  273. In addition, since the sampled set changes slowly (the suggestion
  274. here is that guards in it expire every month) it is not possible for
  275. an attacker to force a connection to an entry guard that isn't
  276. already in the users sampled set.
  277. §A. Appendix: An example usage
  278. In order to clarify how this algorithm is supposed to be used, this
  279. pseudo code illustrates the building of a circuit:
  280. ESTABLISH_CIRCUIT:
  281. if chosen_entry_node = NULL
  282. if context = NULL
  283. context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
  284. sampled_guards=[],
  285. options,
  286. n_primary_guards=3,
  287. dir=false,
  288. guards_in_consensus)
  289. chosen_entry_node = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
  290. if not IS_SUITABLE(chosen_entry_node)
  291. try another entry guard
  292. circuit = composeCircuit(chosen_entry_node)
  293. return circuit
  294. ON_FIRST_HOP_CALLBACK(channel):
  295. if !SHOULD_CONTINUE:
  296. ALGO_CHOOSE_ENTRY_GUARD_END(entryGuard)
  297. else
  298. chosen_entry_node = NULL
  299. §B. Appendix: Entry Points in Tor
  300. In order to clarify how this algorithm is supposed to be integrated with
  301. Tor, here are some entry points to trigger actions mentioned in spec:
  302. When establish_circuit:
  303. If *chosen_entry_node* doesn't exist
  304. If *context* exist, populate the first one as *context*
  305. Otherwise, use ALGO_CHOOSE_ENTRY_GUARD_START to initalize a new *context*.
  306. After this when we want to choose_good_entry_server, we will use
  307. ALGO_CHOOSE_ENTRY_GUARD_NEXT to get a candidate.
  308. Use chosen_entry_node to build_circuit and handle_first_hop,
  309. return this circuit
  310. When entry_guard_register_connect_status(should_continue):
  311. if !should_continue:
  312. Call ALGO_CHOOSE_ENTRY_GUARD_END(chosen_entry_node)
  313. else:
  314. Set chosen_entry_node to NULL
  315. When new directory_info_has_arrived:
  316. Do ON_NEW_CONSENSUS
  317. §C. Appendix: IS_SUITABLE helper function
  318. A guard is suitable if it satisfies all of the folowing conditions:
  319. - It's considered to be live, according to IS_LIVE.
  320. - It's a directory cache if a directory guard is requested.
  321. - It's not the chosen exit node.
  322. - It's not in the family of the chosen exit node.
  323. This conforms to the existing conditions in "populate_live_entry_guards()".
  324. §D. Appendix: IS_LIVE helper function
  325. A guard is considered live if it satisfies all of the folowing conditions:
  326. - It's not disabled because of path bias issues (path_bias_disabled).
  327. - It was not observed to become unusable according to the directory or
  328. the user configuration (bad_since).
  329. - It's marked for retry (can_retry) or it's been unreachable for some
  330. time (unreachable_since) but enough time has passed since we last tried
  331. to connect to it (entry_is_time_to_retry).
  332. - It's in our node list, meaninig it's present in the latest consensus.
  333. - It has a usable descriptor (either a routerdescriptor or a
  334. microdescriptor) unless a directory guard is requested.
  335. - It's a general-purpose router unless UseBridges is configured.
  336. - It's reachable by the configuration (fascist_firewall_allows_node).
  337. This conforms to the existing conditions in "entry_is_live()".
  338. A guard is observed to become unusable according to the directory or the
  339. user configuration if it satisfies any of the following conditions:
  340. - It's not in our node list, meaninig it's present in the latest consensus.
  341. - It's not currently running (is_running).
  342. - It's not a bridge and not a configured bridge
  343. (node_is_a_configured_bridge) and UseBridges is True.
  344. - It's not a possible guard and is not in EntryNodes and UseBridges is
  345. False.
  346. - It's in ExcludeNodes. Nevertheless this is ignored when
  347. loading from config.
  348. - It's not reachable by the configuration (fascist_firewall_allows_node).
  349. - It's disabled because of path bias issues (path_bias_disabled).
  350. This conforms to the existing conditions in "entry_guards_compute_status()".
  351. §E. Appendix: UseBridges and Bridges configurations
  352. This is mutually exclusive with EntryNodes.
  353. If options->UseBridges OR options->EntryNodes:
  354. - guards = populate_live_entry_guards() - this is the "bridge flavour" of
  355. IS_SUITABLE as mentioned before.
  356. - return node_sl_choose_by_bandwidth(guards, WEIGHT_FOR_GUARD)
  357. This is "choose a guard from S by bandwidth weight".
  358. UseBridges and Bridges must be set together. Bridges go to bridge_list (via
  359. bridge_add_from_config()), but how is it used?
  360. learned_bridge_descriptor() adds the bridge to the global entry_guards if
  361. UseBridges = True.
  362. We either keep the existing global entry_guards OR incorporate bridges in the
  363. proposal (remove non bridges from USED_GUARDS, and REMAINING_GUARDS = bridges?)
  364. If UseBridges is set as true, we need to fill the SAMPLED_GUARDS
  365. with bridges specified and learned from consensus.
  366. §F. Appendix: EntryNodes configuration
  367. This is mutually exclusive with Bridges.
  368. The global entry_guards will be updated with entries in EntryNodes
  369. (see entry_guards_set_from_config()).
  370. If EntryNodes is set, we need to fill the SAMPLED_GUARDS with
  371. EntryNodes specified in options.
  372. §G. Appendix: IS_BAD helper function
  373. A guard is considered bad if is not included in the newest
  374. consensus.
  375. §H. Appendix: IS_DEAD helper function
  376. A guard is considered dead if it's marked as bad for
  377. ENTRY_GUARD_REMOVE_AFTER period (30 days) unless they have been disabled
  378. because of path bias issues (path_bias_disabled).
  379. §I. Appendix: IS_OBSOLETE helper function
  380. A guard is considered obsolete if it was chosen by an Tor
  381. version we can't recognize or it was chosen more than GUARD_LIFETIME ago.
  382. -*- coding: utf-8 -*-