
Filename: 203-https-frontend.txt
Title: Avoiding censorship by impersonating an HTTPS server
Author: Nick Mathewson
Created: 24 Jun 2012
Status: Obsolete
Note: Obsoleted-by pluggable transports.

Overview:

   One frequently proposed approach for censorship resistance is that
   Tor bridges ought to act like another TLS-based service, and deliver
   traffic to Tor only if the client can demonstrate some shared
   knowledge with the bridge.

   In this document, I discuss some design considerations for building
   such systems, and propose a few possible architectures and designs.

Background:

   Most of our previous work on censorship resistance has focused on
   preventing passive attackers from identifying Tor bridges, or from
   doing so cheaply.  But active attackers exist, and exist in the
   wild: right now, the most sophisticated censors use their anti-Tor
   passive attacks only as a first round of filtering before launching
   a secondary active attack to confirm suspected Tor nodes.

   One idea we've been talking about for a while is that of having a
   service that looks like an HTTPS service unless a client does some
   particular secret thing to prove it is allowed to use it as a Tor
   bridge.  Such a system would still succumb to passive traffic
   analysis attacks (since the packet timings and sizes for HTTPS
   don't look that much like Tor), but it would be enough to beat many
   current censors.

Goals and requirements:

   We should make it impossible for a passive attacker who examines
   only a few packets at a time to distinguish Tor->Bridge traffic
   from an HTTPS client talking to an HTTPS server.

   We should make it impossible for an active attacker talking to the
   server to tell a Tor bridge server from a regular HTTPS server.

   We should make it impossible for an active attacker who can MITM
   the server to learn from the client whether it thought it was
   connecting to an HTTPS server or a Tor bridge.  (This implies that
   an MITM attacker shouldn't be able to learn anything that would
   help it convince the server to act like a bridge.)

   It would be nice to minimize the required code changes to Tor, and
   the required code changes to any other software.

   It would be good to avoid any requirement of close integration with
   any particular HTTP or HTTPS implementation.

   If we're replacing our own profile with that of an HTTPS service,
   we should do so in a way that lets us use the profile of a popular
   HTTPS implementation.

   Efficiency would be good: layering TLS inside TLS is best avoided
   if we can.

Discussion:

   We need an actual web server; HTTP and HTTPS are so complicated
   that there's no practical way to behave in a bug-compatible way
   with any popular webserver short of running that webserver.

   More obviously, we need a TLS implementation (or we can't implement
   HTTPS), and we need a Tor bridge (since that's the whole point of
   this exercise).

   So from a top-level point of view, the question becomes: how shall
   we wire these together?

   There are three obvious ways; I'll discuss them in turn below.

Design #1: TLS in Tor

   Under this design, Tor accepts HTTPS connections, decides which
   ones don't look like the Tor protocol, and relays them to a
   webserver.

                +--------------------------------------+
   +------+ TLS |  +------------+  http  +-----------+ |
   | User |<------>| Tor Bridge |<------>| Webserver | |
   +------+     |  +------------+        +-----------+ |
                |         trusted host/network         |
                +--------------------------------------+

   This approach would let us use a completely unmodified webserver
   implementation, but would require the most extensive changes in
   Tor: we'd need to add yet another flavor to Tor's TLS ice cream
   parlor, and try to emulate a popular webserver's TLS behavior even
   more thoroughly.

   To authenticate, we would need to take a hybrid approach, and begin
   forwarding traffic to the webserver as soon as a webserver might
   respond to the traffic.  This could be pretty complicated, since it
   requires us to have a model of how the webserver would respond to
   any given set of bytes.  As a workaround, we might try relaying
   _all_ input to the webserver, and only replying as Tor in the cases
   where the webserver hasn't replied.  (This would likely create
   recognizable timing patterns, though.)

   The authentication itself could use a system akin to Tor proposals
   189/190, where an early AUTHORIZE cell shows knowledge of a shared
   secret if the client is a Tor client.
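
   As a rough illustration of the kind of shared-secret check such a
   cell could carry (the concrete cell format belongs to proposals
   189/190; the function names and MAC construction below are
   illustrative assumptions, not a specified design):

```python
import hashlib
import hmac
import os

# Illustrative sketch only: a client proves knowledge of a shared
# secret without revealing it, in the spirit of an AUTHORIZE cell.
# The real cell format would come from proposals 189/190.

def make_authorize(secret: bytes) -> bytes:
    """Client side: MAC a fresh nonce with the shared secret."""
    nonce = os.urandom(16)
    mac = hmac.new(secret, nonce, hashlib.sha256).digest()
    return nonce + mac

def check_authorize(cell: bytes, secret: bytes) -> bool:
    """Bridge side: accept only if the MAC matches our secret."""
    nonce, mac = cell[:16], cell[16:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)
```

   A bare MAC like this proves only that someone, at some time, knew
   the secret; a real design would also need replay protection and the
   MITM resistance discussed later in this document.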

Design #2: TLS in the web server

                +----------------------------------+
   +------+ TLS |  +-----------+  tor0   +-----+ |
   | User |<------>| Webserver |<------->| Tor | |
   +------+     |  +-----------+         +-----+ |
                |      trusted host/network      |
                +----------------------------------+

   In this design, we write an Apache module or something that can
   recognize an authenticator of some kind in an HTTP header, or
   recognize a valid AUTHORIZE cell, and respond by forwarding the
   traffic to a Tor instance.

   To avoid the efficiency issue of doing an extra local
   encrypt/decrypt, we need to have the webserver talk to Tor over a
   local unencrypted connection.  (I've denoted this as "tor0" in the
   diagram above.)  For implementation convenience, we might want to
   implement that as a NULL TLS connection, so that the Tor server
   code wouldn't have to change except to allow local NULL TLS
   connections in this configuration.

   For the Tor handshake to work properly here, we'll need a way for
   the Tor instance to know which public key the webserver is
   configured to use.

   We wouldn't need to support the parts of the Tor link protocol used
   to authenticate clients to servers: relays shouldn't be using this
   subsystem at all.

   The Tor client would need to connect and prove its status as a Tor
   client.  If the client uses some means other than AUTHORIZE cells,
   or if we want to do the authentication in a pluggable transport and
   therefore decide to offload the responsibility for TLS itself to
   the pluggable transport, that would scare me: supporting pluggable
   transports that have the responsibility for TLS would make it
   fairly easy to mess up the crypto, and I'd rather not have it be so
   easy to write a pluggable transport that accidentally makes Tor
   less secure.

Design #3: Reverse proxy

                +----------------------------------+
                |  +-------+  http  +-----------+  |
                |  |       |<------>| Webserver |  |
   +------+ TLS |  |       |        +-----------+  |
   | User |<------>| Proxy |                       |
   +------+     |  |       |  tor0  +-----------+  |
                |  |       |<------>|    Tor    |  |
                |  +-------+        +-----------+  |
                |       trusted host/network       |
                +----------------------------------+

   In this design, we write a server-side proxy to sit in front of Tor
   and a webserver, or repurpose some existing HTTPS proxy.  Its role
   will be to do TLS, and then forward connections to Tor or the
   webserver as appropriate.  (In the web world, this kind of thing is
   called a "reverse proxy", so that's the term I'm using here.)

   To avoid fingerprinting, we should choose a proxy that's already in
   common use as a TLS front-end for webservers -- nginx, perhaps.
   Unfortunately, the more popular tools here seem to be pretty
   complex, and the simpler tools less widely deployed.  More
   investigation would be needed.

   The authorization considerations would be as in Design #2 above;
   for the reasons discussed there, it's probably a good idea to build
   the necessary authorization into Tor itself.

   I generally like this design best: it lets us isolate the "check
   for a valid authenticator and/or a valid or invalid HTTP header,
   and react accordingly" question to a single program.
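
   A minimal sketch of what that single program might look like after
   TLS termination.  The authenticator prefix, backend addresses, and
   ports here are all assumptions for illustration; a real
   authenticator would come from proposals 190/191:

```python
import asyncio

# Hypothetical values for illustration only; a real authenticator
# must also satisfy the "no answerable prefix" constraint discussed
# under distinguishability.
AUTH_MAGIC = b"\x00AUTHORIZE"
TOR_BACKEND = ("127.0.0.1", 9001)   # local "tor0" listener
WEB_BACKEND = ("127.0.0.1", 8080)   # plain-HTTP webserver

def choose_backend(first_bytes: bytes):
    """Route streams that begin with the authenticator to Tor."""
    if first_bytes.startswith(AUTH_MAGIC):
        return TOR_BACKEND
    return WEB_BACKEND

async def pipe(reader, writer):
    """Copy bytes one way until EOF, then close the far side."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle(client_r, client_w):
    # Peek at just enough bytes to decide, then splice the client to
    # the chosen backend in both directions.  (A short read here is
    # possible; this is a sketch, not production code.)
    first = await client_r.read(len(AUTH_MAGIC))
    host, port = choose_backend(first)
    back_r, back_w = await asyncio.open_connection(host, port)
    back_w.write(first)                 # replay the peeked bytes
    await back_w.drain()
    await asyncio.gather(pipe(client_r, back_w), pipe(back_r, client_w))
```

   Note that even the number of bytes the dispatcher waits for before
   routing leaks timing, which is exactly the probing surface an
   active attacker would target.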

How to authenticate: the easiest way

   Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
   HTTP header, is an open problem that we should solve in proposals
   190 and 191 and their successors.  I'm calling it out-of-scope
   here; please see those proposals, their attendant discussion, and
   their eventual successors.

How to authenticate: a slightly harder way

   Some proposals in this vein have in the past suggested a special
   HTTP header to distinguish Tor connections from non-Tor
   connections.  This could work too, though it would require
   substantially larger changes on the Tor client's part, would still
   require that the client take measures to avoid MITM attacks, and
   would also require the client to implement a particular browser's
   HTTP profile.
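
   For concreteness, the client side of this variant might build a
   request along the following lines.  The header name X-Bridge-Auth
   and the surrounding headers are assumptions for illustration;
   matching a real browser's profile would be far more involved:

```python
# Illustrative only: the header name and browser-like headers are
# hypothetical, and a real client would have to reproduce a
# particular browser's HTTP profile exactly, header order included.

def build_request(host: str, token: str) -> bytes:
    lines = [
        "GET / HTTP/1.1",
        f"Host: {host}",
        "User-Agent: Mozilla/5.0",      # stand-in for a full profile
        f"X-Bridge-Auth: {token}",      # hypothetical authenticator
        "Connection: keep-alive",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode()
```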

Some considerations on distinguishability

   Against a passive eavesdropper, the easiest way to avoid
   distinguishability in server responses will be to use an actual web
   server or reverse web proxy's TLS implementation.
   (Distinguishability based on client TLS use is another topic
   entirely.)

   Against an active non-MITM attacker, the best probing attacks will
   be ones designed to provoke the system into acting in ways
   different from those in which a webserver would act: responding
   earlier than a web server would respond, or later, or differently.
   We need to make sure that, whatever the front-end program is, it
   answers anything that would qualify as a well-formed or ill-formed
   HTTP request whenever the web server would.  This means, for
   example, that whatever the correct form of client authorization
   turns out to be, no prefix of that authorization can ever be
   something that the webserver would respond to.  With some web
   servers (I believe), that's as easy as making sure that any valid
   authenticator isn't too long, and doesn't contain a CR or LF
   character.  With others, the authenticator would need to be a valid
   HTTP request, with all the attendant difficulty that would raise.
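
   The easy case can be stated as a simple check.  The length bound
   and the list of method prefixes below are illustrative assumptions
   about what a given web server would act on early:

```python
# Sketch of the "no answerable prefix" constraint: a server that only
# responds after seeing a complete CRLF-terminated request line will
# never respond to any prefix of an authenticator that contains no
# CR or LF byte.

HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"TRACE ", b"CONNECT ")

def authenticator_is_safe(auth: bytes, max_len: int = 255) -> bool:
    if len(auth) > max_len:               # illustrative length bound
        return False
    if b"\r" in auth or b"\n" in auth:    # could complete a request line
        return False
    # Extra caution: don't even start out looking like a request.
    return not auth.startswith(HTTP_METHODS)
```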

   Against an attacker who can MITM the bridge, the best attacks will
   be to wait for clients to connect and see how they behave.  In
   this case, the client probably needs to be able to authenticate the
   bridge certificate as presented in the initial TLS handshake -- or
   some other aspect of the TLS handshake if we're feeling insane.  If
   the certificate or handshake isn't as expected, the client should
   behave as a web browser that's just received a bad TLS certificate.
   (The alternative there would be to try to impersonate an HTTPS
   client that has just accepted a self-signed certificate.  But that
   would probably require the Tor client to impersonate a full web
   browser, which isn't realistic.)

Side note: What to put on the webserver?

   To credibly pretend not to be ourselves, we must pretend to be
   something else in particular -- and something not easily
   identifiable or inherently worthless.  We should not, for example,
   have all deployments of this kind use a fixed website, even if that
   website is the default "Welcome to Apache" configuration: a censor
   would probably feel that they weren't breaking anything important
   by blocking all unconfigured websites with nothing on them.

   Therefore, we should probably conceive of a system like this as
   "something to add to your HTTPS website" rather than as a
   standalone installation.

Related work:

   meek [1] is a pluggable transport that uses HTTP for carrying bytes
   and TLS for obfuscation.  Traffic is relayed through a third-party
   server (Google App Engine).  It uses a trick to talk to the third
   party so that it looks like it is talking to an unblocked server.

   meek itself is not really about HTTP at all.  It uses HTTP only
   because it's convenient and because the big Internet services we
   use as cover also use HTTP.  meek uses HTTP as a transport and TLS
   for obfuscation, but the key idea is really "domain fronting",
   where it appears to the censor that you are talking to one domain
   (www.google.com), but behind the scenes you are talking to another
   (meek-reflect.appspot.com).  The meek-server program is an ordinary
   HTTP (not necessarily even HTTPS!) server, whose communication is
   easily fingerprintable; but that doesn't matter, because the censor
   never sees that part of the communication, only the communication
   between the client and the CDN.
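
   The split that makes domain fronting work can be summarized in a
   few lines: the name the censor observes (in DNS and the TLS SNI
   field) differs from the Host header carried inside the encrypted
   channel.  A minimal sketch using the example domains above:

```python
def domain_front(front: str, hidden: str, path: str = "/"):
    """Return (tls_server_name, inner_http_request) for a fronted fetch.

    The censor sees only `front` (DNS lookup, TLS SNI, destination
    IP); the CDN routes on the Host header, which names `hidden` and
    is visible only inside the TLS tunnel.
    """
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {hidden}\r\n"
               f"Connection: close\r\n\r\n").encode()
    return front, request
```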

   One way to think about the difference: if a censor (somehow) learns
   the IP address of a bridge as described in this proposal, it's easy
   and low-cost for the censor to block that bridge by IP address.
   meek aims to make blocking much more expensive: even if you know a
   domain is being used (in part) for circumvention, in order to block
   it you have to block something important like the Google front-end
   or CloudFlare (high collateral damage).

   [1] https://trac.torproject.org/projects/tor/wiki/doc/meek