- Filename: 203-https-frontend.txt
- Title: Avoiding censorship by impersonating an HTTPS server
- Author: Nick Mathewson
- Created: 24 Jun 2012
- Status: Obsolete
- Note: Obsoleted by pluggable transports.
- Overview:
- One frequently proposed approach for censorship resistance is that
- Tor bridges ought to act like another TLS-based service, and deliver
- traffic to Tor only if the client can demonstrate some shared
- knowledge with the bridge.
- In this document, I discuss some design considerations for building
- such systems, and propose a few possible architectures and designs.
- Background:
- Most of our previous work on censorship resistance has focused on
- preventing passive attackers from identifying Tor bridges, or from
- doing so cheaply. But active attackers exist, and exist in the wild:
- right now, the most sophisticated censors use their anti-Tor passive
- attacks only as a first round of filtering before launching a
- secondary active attack to confirm suspected Tor nodes.
- One idea we've been talking about for a while is that of having a
- service that looks like an HTTPS service unless a client does some
- particular secret thing to prove it is allowed to use it as a Tor
- bridge. Such a system would still succumb to passive traffic
- analysis attacks (since the packet timings and sizes for HTTPS don't
- look that much like Tor), but it would be enough to beat many current
- censors.
- Goals and requirements:
- We should make it impossible for a passive attacker who examines only
- a few packets at a time to distinguish Tor->Bridge traffic from an
- HTTPS client talking to an HTTPS server.
- We should make it impossible for an active attacker talking to the
- server to tell a Tor bridge server from a regular HTTPS server.
- We should make it impossible for an active attacker who can MITM the
- server to learn from the client whether it thought it was connecting
- to an HTTPS server or a Tor bridge. (This implies that an MITM
- attacker shouldn't be able to learn anything that would help it
- convince the server to act like a bridge.)
- It would be nice to minimize the required code changes to Tor, and
- the required code changes to any other software.
- It would be good to avoid any requirement of close integration with
- any particular HTTP or HTTPS implementation.
- If we're replacing our own profile with that of an HTTPS service, we
- should do so in a way that lets us use the profile of a popular
- HTTPS implementation.
- Efficiency would be good: layering TLS inside TLS is best avoided if
- we can.
- Discussion:
- We need an actual web server; HTTP and HTTPS are so complicated that
- there's no practical way to behave in a bug-compatible way with any
- popular webserver short of running that webserver.
- More obviously, we need a TLS implementation (or we can't implement
- HTTPS), and we need a Tor bridge (since that's the whole point of
- this exercise).
- So from a top-level point of view, the question becomes: how shall we
- wire these together?
- There are three obvious ways; I'll discuss them in turn below.
- Design #1: TLS in Tor
- Under this design, Tor accepts HTTPS connections, decides which ones
- don't look like the Tor protocol, and relays them to a webserver.
                +--------------------------------------------+
   +------+ TLS |  +------------+  http   +-----------+      |
   | User |<------>| Tor Bridge |<------->| Webserver |      |
   +------+     |  +------------+         +-----------+      |
                |                     trusted host/network   |
                +--------------------------------------------+
- This approach would let us use a completely unmodified webserver
- implementation, but would require the most extensive changes in Tor:
- we'd need to add yet another flavor to Tor's TLS ice cream parlor,
- and try to emulate a popular webserver's TLS behavior even more
- thoroughly.
- To authenticate, we would need to take a hybrid approach, and begin
- forwarding traffic to the webserver as soon as a webserver
- might respond to the traffic. This could be pretty complicated,
- since it requires us to have a model of how the webserver would
- respond to any given set of bytes. As a workaround, we might try
- relaying _all_ input to the webserver, and only replying as Tor in
- the cases where the website hasn't replied. (This would likely
- create recognizable timing patterns, though.)
- The authentication itself could use a system akin to Tor proposals
- 189/190, where an early AUTHORIZE cell shows knowledge of a shared
- secret if the client is a Tor client.
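- As a rough sketch of the demultiplexing step this design needs (the
- cell layout, secret, and helper functions below are illustrative
- assumptions, not the real scheme, which belongs to proposals 189/190
- and their successors), the bridge-side logic might look like:

      import hmac, hashlib, socket

      SHARED_SECRET = b"example-bridge-secret"  # hypothetical shared secret
      AUTH_CELL_LEN = 3 + 32    # circid+command (3) plus an HMAC (32); toy layout only

      def looks_like_authorize(first_bytes):
          # True iff the client's first bytes parse as this toy AUTHORIZE cell.
          if len(first_bytes) < AUTH_CELL_LEN:
              return False
          expected = hmac.new(SHARED_SECRET, first_bytes[:3], hashlib.sha256).digest()
          return hmac.compare_digest(first_bytes[3:AUTH_CELL_LEN], expected)

      def handle_connection(tls_conn):
          # Demultiplex one decrypted connection: Tor link protocol vs. plain HTTP.
          first = tls_conn.recv(AUTH_CELL_LEN)
          if looks_like_authorize(first):
              run_tor_link_protocol(tls_conn, already_read=first)   # stand-in for Tor proper
          else:
              backend = socket.create_connection(("127.0.0.1", 8080))
              backend.sendall(first)                     # replay what we already consumed
              relay_bidirectionally(tls_conn, backend)   # stand-in for the pump loop

- A real implementation would also have to handle short reads and, as
- discussed above, start forwarding to the webserver no later than the
- point at which the webserver itself would have begun to respond.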
- Design #2: TLS in the web server
                +-------------------------------------+
   +------+ TLS |  +-----------+   tor0   +-----+     |
   | User |<------>| Webserver |<-------->| Tor |     |
   +------+     |  +-----------+          +-----+     |
                |              trusted host/network   |
                +-------------------------------------+
- In this design, we write an Apache module or something similar that
- can recognize an authenticator of some kind in an HTTP header, or
- recognize a valid AUTHORIZE cell, and respond by forwarding the
- traffic to a Tor instance.
- To avoid the efficiency issue of doing an extra local
- encrypt/decrypt, we need to have the webserver talk to Tor over a
- local unencrypted connection. (I've denoted this as "tor0" in the
- diagram above.) For implementation convenience, we might want to
- implement that as a NULL TLS connection, so that the Tor server code
- wouldn't have to change except to allow local NULL TLS connections in
- this configuration.
- For the Tor handshake to work properly here, we'll need a way for the
- Tor instance to know which public key the webserver is configured to
- use.
- We wouldn't need to support the parts of the Tor link protocol used
- to authenticate clients to servers: relays shouldn't be using this
- subsystem at all.
- The Tor client would need to connect and prove its status as a Tor
- client. If the client used some means other than AUTHORIZE cells, or
- if we wanted to do the authentication in a pluggable transport and
- therefore offloaded the responsibility for TLS itself to that
- transport, that would scare me: making pluggable transports
- responsible for TLS would make it fairly easy to mess up the crypto,
- and I'd rather not have it be so easy to write a pluggable transport
- that accidentally makes Tor less secure.
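- For concreteness, a sketch of the webserver-side dispatch (the
- X-Tor-Authorize header name, the HMAC construction, and the helper
- functions are hypothetical, not part of any existing module) might
- look like:

      import hmac, hashlib, socket

      SHARED_SECRET = b"example-bridge-secret"  # hypothetical shared secret
      TOR0_ADDR = ("127.0.0.1", 9001)           # local, unencrypted "tor0" listener

      def header_authenticator_ok(headers):
          # Check a hypothetical "X-Tor-Authorize: <nonce-hex>:<hmac-hex>" header.
          value = headers.get("X-Tor-Authorize", "")
          try:
              nonce_hex, mac_hex = value.split(":", 1)
              nonce, mac = bytes.fromhex(nonce_hex), bytes.fromhex(mac_hex)
          except ValueError:
              return False
          expected = hmac.new(SHARED_SECRET, nonce, hashlib.sha256).digest()
          return hmac.compare_digest(mac, expected)

      def dispatch(client_conn, headers, already_read):
          # Inside the webserver: serve normally, or hand the connection to Tor.
          if header_authenticator_ok(headers):
              tor = socket.create_connection(TOR0_ADDR)
              tor.sendall(already_read)                  # replay bytes read past the headers
              relay_bidirectionally(client_conn, tor)    # stand-in for the real relay loop
          else:
              serve_http_normally(client_conn, headers)  # stand-in for the webserver itself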
- Design #3: Reverse proxy
                +---------------------------------------+
                |  +-------+   http   +-----------+     |
                |  |       |<-------->| Webserver |     |
   +------+ TLS |  |       |          +-----------+     |
   | User |<------>| Proxy |                            |
   +------+     |  |       |   tor0   +-----------+     |
                |  |       |<-------->|    Tor    |     |
                |  +-------+          +-----------+     |
                |                 trusted host/network  |
                +---------------------------------------+
- In this design, we write a server-side proxy to sit in front of Tor
- and a webserver, or repurpose some existing HTTPS proxy. Its role
- will be to do TLS, and then forward connections to Tor or the
- webserver as appropriate. (In the web world, this kind of thing is
- called a "reverse proxy", so that's the term I'm using here.)
- To avoid fingerprinting, we should choose a proxy that's already in
- common use as a TLS front-end for webservers -- nginx, perhaps.
- Unfortunately, the more popular tools here seem to be pretty complex,
- and the simpler tools less widely deployed. More investigation would
- be needed.
- The authorization considerations would be as in Design #2 above; for
- the reasons discussed there, it's probably a good idea to build the
- necessary authorization into Tor itself.
- I generally like this design best: it lets us isolate the "Check for
- a valid authenticator and/or a valid or invalid HTTP header, and
- react accordingly" question to a single program.
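- A minimal sketch of such a front-end, assuming a stand-in
- request_carries_authenticator() that applies a check like the one
- sketched under Design #2 and a stand-in relay_bidirectionally() pump
- loop, might look like:

      import socket, ssl, threading

      WEB_BACKEND = ("127.0.0.1", 8080)   # the real website, plain HTTP
      TOR_BACKEND = ("127.0.0.1", 9001)   # local Tor instance, plaintext "tor0" port

      def handle(client_tls):
          # Terminate TLS with the site's ordinary certificate, then route.
          first = client_tls.recv(4096)
          dest = TOR_BACKEND if request_carries_authenticator(first) else WEB_BACKEND
          backend = socket.create_connection(dest)
          backend.sendall(first)
          relay_bidirectionally(client_tls, backend)

      def serve(listen_addr=("0.0.0.0", 443)):
          ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
          ctx.load_cert_chain("site.crt", "site.key")   # the site's normal certificate
          with socket.create_server(listen_addr) as srv:
              while True:
                  raw, _ = srv.accept()
                  conn = ctx.wrap_socket(raw, server_side=True)
                  threading.Thread(target=handle, args=(conn,), daemon=True).start()

- (A real front-end would have to buffer until the end of the request
- headers, exactly as the web server it imitates does, rather than
- trusting a single read; see the distinguishability discussion below.)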
- How to authenticate: The easiest way
- Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
- HTTP header, is an open problem that we should solve in proposals
- 190 and 191 and their successors. I'm calling it out-of-scope here;
- please see those proposals, their attendant discussion, and their
- eventual successors.
- How to authenticate: a slightly harder way
- Some proposals in this vein have in the past suggested a special
- HTTP header to distinguish Tor connections from non-Tor connections.
- This could work too, though it would require substantially larger
- changes on the Tor client's part, would still require the client to
- take measures to avoid MITM attacks, and would also require the
- client to implement a particular browser's HTTP profile.
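- For concreteness, the client side of that header approach might look
- like the sketch below; the X-Tor-Authorize name and the HMAC
- construction are the same hypothetical ones used in the server-side
- sketches above, and a real client would have to copy a particular
- browser's request profile far more carefully than this:

      import hashlib, hmac, os, socket, ssl

      SHARED_SECRET = b"example-bridge-secret"   # hypothetical; learned with the bridge address
      BRIDGE_HOST, BRIDGE_PORT = "bridge.example.com", 443

      def authorize_header_value():
          # Hypothetical authenticator: fresh nonce plus HMAC(secret, nonce).
          nonce = os.urandom(16)
          mac = hmac.new(SHARED_SECRET, nonce, hashlib.sha256).digest()
          return nonce.hex() + ":" + mac.hex()

      def connect_as_client():
          ctx = ssl.create_default_context()
          tls = ctx.wrap_socket(socket.create_connection((BRIDGE_HOST, BRIDGE_PORT)),
                                server_hostname=BRIDGE_HOST)
          request = ("GET / HTTP/1.1\r\n"
                     "Host: " + BRIDGE_HOST + "\r\n"
                     "X-Tor-Authorize: " + authorize_header_value() + "\r\n"
                     "\r\n")
          tls.sendall(request.encode("ascii"))
          return tls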
- Some considerations on distinguishability
- Against a passive eavesdropper, the easiest way to avoid
- distinguishability in server responses will be to use an actual web
- server or reverse web proxy's TLS implementation.
- (Distinguishability based on client TLS use is another topic
- entirely.)
- Against an active non-MITM attacker, the best probing attacks will be
- ones designed to provoke the system into acting in ways different from
- those in which a webserver would act: responding earlier than a web
- server would respond, or later, or differently. We need to make sure
- that, whatever the front-end program is, it answers anything that
- would qualify as a well-formed or ill-formed HTTP request whenever
- the web server would. This must mean, for example, that whatever the
- correct form of client authorization turns out to be, no prefix of
- that authorization is ever something that the webserver would respond
- to. With some web servers (I believe), that's as easy as making sure
- that any valid authenticator isn't too long, and doesn't contain a CR
- or LF character. With others, the authenticator would need to be a
- valid HTTP request, with all the attendant difficulty that would
- raise.
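- In the easy case, the corresponding check on a candidate
- authenticator is only a few lines (the length bound below is
- illustrative):

      MAX_AUTHENTICATOR_LEN = 512   # illustrative bound for "isn't too long"

      def authenticator_is_safe(auth: bytes) -> bool:
          # The properties argued for above: short, single-line, no CR or LF,
          # so no prefix of it completes a request the webserver would answer.
          return (len(auth) <= MAX_AUTHENTICATOR_LEN
                  and b"\r" not in auth
                  and b"\n" not in auth)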
- Against an attacker who can MITM the bridge, the best attacks will be
- to wait for clients to connect and see how they behave. In this
- case, the client probably needs to be able to authenticate the bridge
- certificate as presented in the initial TLS handshake -- or some
- other aspect of the TLS handshake if we're feeling insane. If the
- certificate or handshake isn't as expected, the client should behave
- as a web browser that's just received a bad TLS certificate. (The
- alternative there would be to try to impersonate an HTTPS client that
- has just accepted a self-signed certificate. But that would probably
- require the Tor client to impersonate a full web browser, which isn't
- realistic.)
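- As a sketch of that client behavior (the pinned fingerprint and the
- way it is distributed are assumptions, not something this proposal
- specifies): compare the certificate presented in the handshake
- against a pin learned alongside the bridge address, and on a mismatch
- close the connection without sending anything, as a browser refusing
- a bad certificate would:

      import hashlib, socket, ssl

      EXPECTED_CERT_SHA256 = "0" * 64   # hypothetical pin, learned with the bridge address

      def connect_checking_pin(host, port=443):
          ctx = ssl.create_default_context()
          ctx.check_hostname = False    # the pin, not the CA system, is what we rely on here
          ctx.verify_mode = ssl.CERT_NONE
          tls = ctx.wrap_socket(socket.create_connection((host, port)),
                                server_hostname=host)
          fingerprint = hashlib.sha256(tls.getpeercert(binary_form=True)).hexdigest()
          if fingerprint != EXPECTED_CERT_SHA256:
              tls.close()               # behave like a browser rejecting the certificate
              return None
          return tls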
- Side note: What to put on the webserver?
- To credibly pretend not to be ourselves, we must pretend to be
- something else in particular -- and something not easily identifiable
- or inherently worthless. We should not, for example, have all
- deployments of this kind use a fixed website, even if that website is
- the default "Welcome to Apache" configuration: A censor would
- probably feel that they weren't breaking anything important by
- blocking all unconfigured websites with nothing on them.
- Therefore, we should probably conceive of a system like this as
- "Something to add to your HTTPS website" rather than as a standalone
- installation.
- Related work:
- meek [1] is a pluggable transport that uses HTTP for carrying bytes
- and TLS for obfuscation. Traffic is relayed through a third-party
- server (Google App Engine). It uses a trick to talk to the third
- party so that it looks like it is talking to an unblocked server.
- meek itself is not really about HTTP at all. It uses HTTP only
- because it's convenient and the big Internet services we use as cover
- also use HTTP. meek uses HTTP as a transport, and TLS for
- obfuscation, but the key idea is really "domain fronting," where it
- appears to the censor you are talking to one domain (www.google.com),
- but behind the scenes you are talking to another
- (meek-reflect.appspot.com). The meek-server program is an ordinary
- HTTP (not necessarily even HTTPS!) server, whose communication is
- easily fingerprintable; but that doesn't matter because the censor
- never sees that part of the communication, only the communication
- between the client and CDN.
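- A minimal sketch of the domain-fronting trick (the two hostnames are
- meek's real ones; everything else is illustrative): everything the
- censor can see (the DNS lookup, the IP address, and the TLS SNI)
- names the front domain, while the Host header inside the encrypted
- request names the real destination:

      import socket, ssl

      FRONT = "www.google.com"               # what the censor sees: DNS, IP, SNI
      HIDDEN = "meek-reflect.appspot.com"    # what the request actually addresses

      def fronted_request(payload: bytes):
          ctx = ssl.create_default_context()
          tls = ctx.wrap_socket(socket.create_connection((FRONT, 443)),
                                server_hostname=FRONT)   # SNI = front domain
          request = (b"POST / HTTP/1.1\r\n"
                     b"Host: " + HIDDEN.encode("ascii") + b"\r\n"
                     b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
                     b"\r\n" + payload)
          tls.sendall(request)               # the frontend routes on Host, not on SNI
          return tls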
- One way to think about the difference: if a censor (somehow) learns
- the IP address of a bridge as described in this proposal, it's easy
- and low-cost for the censor to block that bridge by IP address. meek
- aims to make it much more expensive: even if you know a domain is
- being used (in part) for circumvention, in order to block it you
- have to block something important like the Google frontend or
- CloudFlare (high collateral damage).
- 1. https://trac.torproject.org/projects/tor/wiki/doc/meek