- Filename: 302-padding-machines-for-onion-clients.txt
- Title: Hiding onion service clients using padding
- Author: George Kadianakis, Mike Perry
- Created: Thursday 16 May 2019
- Status: Closed
- Implemented-In: 0.4.1.1-alpha
- NOTE: Please look at section 3 of padding-spec.txt now, not this document.
- 0. Overview
- Tor clients use "circuits" to do anonymous communications. There are various
- types of circuits. Some of them are for navigating the normal Internet,
- others are for fetching Tor directory information, others are for connecting
- to onion services, while others are simply for measurements and testing.
- It's currently possible for MITM type of adversaries (like tor-network-level
- and local-area-network adversaries) to distinguish Tor circuit types from
- each other using a wide array of metadata and distinguishers.
- In this proposal, we study various techniques that can be used to
- distinguish client-side onion service circuits and provide WTF-PAD circuit
- padding machines (using prop#254) to hide them against certain adversaries.
- 1. Motivation
- We are writing this proposal for various reasons:
- 1) We believe that in an ideal setting MITM adversaries should not be able
- to distinguish circuit types by inspecting traffic. Tor traffic should
- look amorphous to an outside observer to maximize uncertainty and
- anonymity properties.
- Client-side onion service circuits are an easy target for this proposal,
- because we believe we can improve their privacy with low bandwidth
- overhead.
- 2) We want to start experimenting with the WTF-PAD subsystem of Tor, and
- this use-case provides us with a good testbed.
- 3) We hope that by actually starting to use the WTF-PAD subsystem of Tor, we
- will encourage more researchers to start experimenting with it.
- 2. Scope of the proposal [SCOPE]
- Given the above, this proposal sets out to use the WTF-PAD system to hide
- client-side onion service circuits against the classifiers of the paper by
- Kwon et al. [0], which is discussed in section [KNOWN_DISTINGUISHERS].
- By client-side onion service circuits we refer to these two types of circuits:
- - Client-side introduction circuits: Circuit from client to the introduction point
- - Client-side rendezvous circuits: Circuit from client to the rendezvous point
- Service-side onion service circuits are not in scope for this proposal, and
- this is because hiding those would require more bandwidth and also more
- advanced WTF-PAD features.
- Furthermore, this proposal only aims to cloak the naive distinguishing
- features mentioned in the [KNOWN_DISTINGUISHERS] section, and can by no
- means guarantee that client-side onion service circuits are totally
- indistinguishable by other means.
- The machines specified in this proposal are meant to be lightweight and
- created for a specific purpose. This means that they can be easily extended
- with additional states to do more advanced hiding.
- 3. Known distinguishers against onion service circuits [KNOWN_DISTINGUISHERS]
- Over the past years it's been assumed that motivated adversaries can
- distinguish onion-service traffic from normal Tor traffic given their
- special characteristics.
- As far as we know, there has been relatively little research-level work done
- in this direction. The main article published in this area is the USENIX
- paper "Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden
- Services" by Kwon et al. [0]
- The above paper deals with onion service circuits in sections 3.2 and 5.1.
- It uses the following three "naive" circuit features to distinguish circuits:
- 1) Circuit construction sequence
- 2) Number of incoming and outgoing cells
- 3) Duration of Activity ("DoA")
- All onion service circuits have particularly loud signatures for the above
- characteristics, but WTF-PAD (prop#254) gives us tools to effectively
- silence those signatures to the point where the paper's classifiers won't
- work.
- 4. Hiding circuit features using WTF-PAD
- According to section [KNOWN_DISTINGUISHERS] there are three circuit features
- we are attempting to hide. Here is how we plan to do this using the WTF-PAD
- system:
- 1) Circuit construction sequence
- The USENIX paper uses the directions of the first 10 cells sent in a
- circuit to fingerprint it. Client-side onion service circuits have
- unique circuit construction sequences and hence they can be fingerprinted
- using just the first 10 cells.
- We use WTF-PAD to destroy this feature of onion service circuits by
- carefully sending padding cells (relay DROP cells) during circuit
- construction and making them look exactly like most general tor circuits
- up till the end of the circuit construction sequence.
- 2) Number of incoming and outgoing cells
- The USENIX paper uses the amount of incoming and outgoing cells to
- distinguish circuit types. For example, client-side introduction circuits
- have the same amount of incoming and outgoing cells, whereas client-side
- rendezvous circuits have more incoming than outgoing cells.
- We use WTF-PAD to destroy this feature by changing the number of cells
- sent in introduction circuits. We leave rendezvous circuits as is, since
- the actual rendezvous traffic flow usually closely resembles that of
- normal Tor circuits.
- 3) Duration of Activity ("DoA")
- The USENIX paper uses the period of time during which circuits send and
- receive cells to distinguish circuit types. For example, client-side
- introduction circuits are really short lived, whereas service-side
- introduction circuits are very long lived. OTOH, rendezvous circuits have
- the same median lifetime as general Tor circuits, which is 10 minutes.
- We use WTF-PAD to destroy this feature of client-side introduction
- circuits by setting a special WTF-PAD option, which keeps the circuits
- open for 10 minutes, completely mimicking the DoA of general Tor circuits.
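- To make the three features concrete, here is a minimal sketch of how they
- could be computed from a captured cell trace. The trace format (a list of
- timestamp/direction pairs) and all function names are our own assumptions
- for illustration; they are not taken from the USENIX paper or from Tor.

```python
from typing import List, Tuple

# Hypothetical trace format: (timestamp in seconds, direction), where
# direction is +1 for an outgoing cell and -1 for an incoming cell.
Trace = List[Tuple[float, int]]


def construction_sequence(trace: Trace, n: int = 10) -> List[int]:
    """Feature 1: directions of the first n cells seen on the circuit."""
    return [direction for _, direction in trace[:n]]


def cell_counts(trace: Trace) -> Tuple[int, int]:
    """Feature 2: (outgoing, incoming) cell counts over the whole trace."""
    outgoing = sum(1 for _, direction in trace if direction > 0)
    incoming = sum(1 for _, direction in trace if direction < 0)
    return outgoing, incoming


def duration_of_activity(trace: Trace) -> float:
    """Feature 3: time between the first and the last cell on the circuit."""
    if not trace:
        return 0.0
    timestamps = [ts for ts, _ in trace]
    return max(timestamps) - min(timestamps)


# Made-up trace of an unpadded client-side introduction circuit:
# three EXTEND2/EXTENDED2 round trips, then INTRODUCE1 and INTRODUCE_ACK.
intro_trace: Trace = [
    (0.0, +1), (0.25, -1), (0.5, +1), (0.75, -1),
    (1.0, +1), (1.25, -1), (1.5, +1), (1.75, -1),
]

print(construction_sequence(intro_trace))
print(cell_counts(intro_trace))
print(duration_of_activity(intro_trace))
```

- In this toy trace the incoming and outgoing counts are equal and the
- activity window is very short, which is the kind of signature the padding
- machines below try to remove.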
- 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
- In this section we give an overview of what circuit construction looks like
- to a network or guard-level adversary. We use this knowledge to make the
- right padding machines that can make intro and rend circuits look like these
- general circuits.
- In particular, most general Tor circuits used to surf the web or download
- directory information start with the following sequence of 6 relay cells
- (cells surrounded in [brackets] are outgoing, the others are incoming):
- [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
- When this is done, the client has established a 3-hop circuit and also
- opened a stream to the other end. Usually after this comes a series of DATA
- cells that either fetch pages, establish an SSL connection or fetch
- directory information:
- [DATA] -> [DATA] -> DATA -> DATA
- The above stream of 10 relay cells defines the vast majority of general
- circuits that come out of Tor browser during our testing, and it's what we
- are gonna use to make introduction and rendezvous circuits blend in.
- Please note that in this section we only investigate relay cells and not
- connection-level cells like CREATE/CREATED or AUTHENTICATE/etc. that are
- used during the link-layer handshake. The rationale is that connection-level
- cells depend on the type of guard used and are not an effective fingerprint
- for a network/guard-level adversary.
- 5. WTF-PAD machines
- For the purposes of this proposal we will make use of four WTF-PAD machines
- as follows:
- - Client-side introduction circuit hiding machine (origin-side)
- - Client-side introduction circuit hiding machine (relay-side)
- - Client-side rendezvous circuit hiding machine (origin-side)
- - Client-side rendezvous circuit hiding machine (relay-side)
- In the following sections we will analyze these machines.
- 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
- These two machines are meant to hide client-side introduction circuits. The
- origin-side machine sits on the client and sends padding towards the
- introduction circuit, whereas the relay-side machine sits on the middle-hop
- (second hop of the circuit) and sends padding towards the client. The
- padding from the origin-side machine terminates at the middle-hop and does
- not get forwarded to the actual introduction point.
- Both of these machines only get activated for introduction circuits, and
- only after an INTRODUCE1 cell has been sent out.
- This means that before the machine gets activated our cell flow looks like this:
- [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
- Comparing the above with section [CIRCCONSTRUCTION], we see that the above
- cell sequence matches the one from general circuits up to the first 7 cells.
- However, in normal introduction circuits this is followed by an
- INTRODUCE_ACK and then the circuit gets torn down, which does not match
- the sequence from [CIRCCONSTRUCTION].
- Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
- send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
- cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
- sequence up to the first 10 cells.
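- The matching argument above can be checked mechanically. The sketch below
- parses the bracket notation used in this proposal into cell directions and
- counts how many leading cells of an introduction circuit have the same
- direction as a general circuit. The helper names are hypothetical, and the
- relative order of the two incoming cells is an assumption (it does not
- affect the directions).

```python
from typing import List

# Sequences written in this proposal's notation: [brackets] mean outgoing.
GENERAL = ("[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> "
           "[BEGIN] -> CONNECTED -> [DATA] -> [DATA] -> DATA -> DATA")

INTRO_UNPADDED = ("[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> "
                  "[EXTEND2] -> EXTENDED2 -> [INTRODUCE1] -> INTRODUCE_ACK")

INTRO_PADDED = ("[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> "
                "[EXTEND2] -> EXTENDED2 -> [INTRODUCE1] -> "
                "[PADDING_NEGOTIATE] -> PADDING_NEGOTIATED -> INTRODUCE_ACK")


def directions(sequence: str) -> List[str]:
    """Map each cell in the notation to 'out' (bracketed) or 'in'."""
    cells = [cell.strip() for cell in sequence.split("->")]
    return ["out" if cell.startswith("[") else "in" for cell in cells]


def matching_prefix(a: str, b: str) -> int:
    """Number of leading cells whose directions agree in both sequences."""
    count = 0
    for dir_a, dir_b in zip(directions(a), directions(b)):
        if dir_a != dir_b:
            break
        count += 1
    return count


print(matching_prefix(INTRO_UNPADDED, GENERAL))  # 7
print(matching_prefix(INTRO_PADDED, GENERAL))    # 10
```

- Only cell directions are compared here, since that is what the construction
- sequence feature of the USENIX paper looks at.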
- After that, we continue sending padding from the relay-side machine so as to
- fake a directory download, or an SSL connection setup. We also want to
- continue sending padding so that the connection stays up longer to destroy
- the "Duration of Activity" fingerprint.
- To calculate the padding overhead, we see that the origin-side machine just
- sends a single [PADDING_NEGOTIATE] cell, whereas the relay-side machine
- sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
- that the average overhead of these machines is about 11 padding cells.
- In terms of WTF-PAD terminology, these machines have three states (START,
- OBF, END). They move from the START to OBF state when the first
- non-padding cell is received on the circuit, and they stay in the OBF
- state until all the padding gets depleted. The OBF state is controlled by
- a histogram which specifies the parameters described in the paragraphs
- above. After all the padding finishes, it moves to END state.
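- As a rough illustration of the three-state behaviour described above, here
- is a toy model of such a machine. It is a simplification under our own
- assumptions: the histogram is reduced to a plain budget of padding cells,
- timing is not modelled, and the class and method names are hypothetical
- (this is not the prop#254 API or Tor's implementation).

```python
import random
from enum import Enum


class State(Enum):
    START = "START"
    OBF = "OBF"
    END = "END"


class HidingMachine:
    """Toy START -> OBF -> END machine with a fixed padding budget."""

    def __init__(self, padding_budget: int) -> None:
        self.state = State.START
        self.remaining = padding_budget  # stand-in for the histogram tokens
        self.sent = 0

    def on_nonpadding_received(self) -> None:
        """The first non-padding cell moves the machine from START to OBF."""
        if self.state is State.START:
            self.state = State.OBF if self.remaining > 0 else State.END

    def on_padding_timer(self) -> bool:
        """Send one padding cell if in OBF; return True if a cell was sent."""
        if self.state is not State.OBF:
            return False
        self.remaining -= 1
        self.sent += 1
        if self.remaining == 0:
            self.state = State.END  # padding depleted
        return True


# Relay-side example: the budget is drawn from the 7 to 10 DROP cell range.
rng = random.Random(0)
budget = rng.randint(7, 10)
machine = HidingMachine(budget)

sent_before_activation = machine.on_padding_timer()  # still in START
machine.on_nonpadding_received()
while machine.on_padding_timer():
    pass

print(machine.state, machine.sent)
```

- The real machines sample padding delays from the histogram; the token budget
- above only captures the "pad until depleted, then stop" part of the design.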
- We also set a special WTF-PAD flag which keeps the circuit open even after
- the introduction is performed. In particular, with this feature the circuit
- will stay alive for the same duration as normal web circuits before they
- expire (usually 10 minutes).
- 5.2. Client-side rendezvous circuit hiding machines
- The rendezvous circuit machines apply to client-side rendezvous circuits and
- only after the rendezvous point has been established (REND_ESTABLISHED has
- been received). Up to that point, the following cell sequence has been
- observed on the circuit:
- [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [ESTABLISH_REND] -> REND_ESTABLISHED
- which matches the general circuit construction sequence [CIRCCONSTRUCTION]
- up to the first 6 cells. However after that, normal rendezvous circuits
- receive a RENDEZVOUS2 cell followed by a [BEGIN] and a CONNECTED, which does
- not fit the circuit construction sequence we are trying to imitate.
- Hence our machine gets activated right after REND_ESTABLISHED is received,
- and continues by sending a [PADDING_NEGOTIATE] and a [DROP] cell, before
- receiving a PADDING_NEGOTIATED and a DROP cell, effectively blending into
- the general circuit construction sequence on the first 10 cells.
- After that our machine gets deactivated, and we let the actual rendezvous
- circuit shape the traffic flow. Since rendezvous circuits usually imitate
- general circuits (their purpose is to surf the web), we can expect that they
- will look alike.
- In terms of overhead, this machine is quite light. Each side sends 2 padding
- cells, for a total of 4 padding cells.
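- The padding accounting for the rendezvous machines can be tallied with a
- short sketch. The tuple layout is our own illustration, and we count
- PADDING_NEGOTIATE and PADDING_NEGOTIATED as padding cells, as the text does.

```python
from collections import Counter

# (cell name, direction, is_padding) for a padded client-side rendezvous
# circuit, following the sequence described in this section.
REND_PADDED = [
    ("EXTEND2", "out", False),
    ("EXTENDED2", "in", False),
    ("EXTEND2", "out", False),
    ("EXTENDED2", "in", False),
    ("ESTABLISH_REND", "out", False),
    ("REND_ESTABLISHED", "in", False),
    ("PADDING_NEGOTIATE", "out", True),
    ("DROP", "out", True),
    ("PADDING_NEGOTIATED", "in", True),
    ("DROP", "in", True),
]

# Directions of the first 10 cells of a general circuit [CIRCCONSTRUCTION].
GENERAL_DIRECTIONS = ["out", "in", "out", "in", "out",
                      "in", "out", "out", "in", "in"]

rend_directions = [direction for _, direction, _ in REND_PADDED]
padding_per_side = Counter(
    direction for _, direction, is_padding in REND_PADDED if is_padding
)
total_padding = sum(padding_per_side.values())

print(rend_directions == GENERAL_DIRECTIONS)
print(dict(padding_per_side), total_padding)
```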
- 6. Overhead analysis
- Given the parameters above, intro circuit machines have an overhead of 11
- padding cells, and rendezvous circuit machines have an overhead of 4
- padding cells. This means that for every intro and rendezvous circuit
- there will be an overhead of 15 padding cells on average, which is about
- 7.5KB.
- In the PrivCount paper [1] we learn that the Tor network sees about 12
- million successful descriptor fetches per day. We can use this figure to
- assume that the Tor network also sees about 12 million intro and rendezvous
- circuits per day. Given the 7.5KB overhead of each of these circuits, we get
- that our padding machines incur an additional 94GB overhead per day on the
- network, which is about 3.9GB per hour.
- XXX Isn't this kinda intense????? Using the graphs from metrics we see that
- the Tor network has total capacity of 300 Gbit/s which is about 135000GB per
- hour, so 3.9GB per hour is not that much, but still...
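- The arithmetic in this section can be reproduced with the sketch below. It
- assumes 512-byte cells, decimal gigabytes, and a uniform average over the 7
- to 10 DROP cell range; these are our assumptions, and different choices of
- cell size or rounding shift the result by a few percent, so the output lands
- in the same ballpark as the figures quoted above rather than matching them
- digit for digit.

```python
CELL_BYTES = 512                 # assumed cell size
INTRO_PADDING_CELLS = 11         # per intro circuit (from section 5.1)
REND_PADDING_CELLS = 4           # per rendezvous circuit (from section 5.2)
CIRCUIT_PAIRS_PER_DAY = 12_000_000
NETWORK_CAPACITY_BITS_PER_SEC = 300_000_000_000  # 300 Gbit/s

# Intro machines: one PADDING_NEGOTIATE, one PADDING_NEGOTIATED and
# 7 to 10 DROP cells (assumed uniform, so 8.5 on average).
intro_average = 1 + 1 + (7 + 10) / 2  # 10.5, rounded up to 11 in the text

cells_per_pair = INTRO_PADDING_CELLS + REND_PADDING_CELLS
bytes_per_pair = cells_per_pair * CELL_BYTES

daily_gb = bytes_per_pair * CIRCUIT_PAIRS_PER_DAY / 1e9
hourly_gb = daily_gb / 24

capacity_gb_per_hour = NETWORK_CAPACITY_BITS_PER_SEC / 8 * 3600 / 1e9
share_of_capacity = hourly_gb / capacity_gb_per_hour

print(f"padding per intro+rend pair: {cells_per_pair} cells, {bytes_per_pair} bytes")
print(f"daily overhead: {daily_gb:.2f} GB, hourly: {hourly_gb:.2f} GB")
print(f"network capacity: {capacity_gb_per_hour:.0f} GB per hour")
print(f"share of capacity: {share_of_capacity:.6%}")
```

- Under these assumptions the padding amounts to a few thousandths of a
- percent of the quoted network capacity.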
- 7. Discussion
- 7.1. Alternative approaches
- These machines try to hide onion service client-side circuits by obfuscating
- their looks. This is a reasonable approach, but if the resulting circuits
- look unlike any other Tor circuits, they would still be fingerprintable just
- by that fact.
- Another approach we could take is to make normal client circuits look like
- onion service circuits, or just make normal clients establish fake onion
- service circuits periodically. The hope here is that the adversary won't be
- able to distinguish fake onion service circuits from real ones. This
- approach has not been taken yet, mainly because it requires additional
- WTF-PAD features and poses greater overhead risks.
- 7.2. Future work
- As discussed in [SCOPE], this proposal only aims to hide some very specific
- features of client-side onion service circuits. There is lots of work to be
- done here to see what other features can be used to distinguish such
- circuits, and also what other classifiers can be built using deep learning
- and whatnot.
- ---
- [0]: https://www.usenix.org/node/190967
- https://blog.torproject.org/technical-summary-usenix-fingerprinting-paper
- [1]: "Understanding Tor Usage with Privacy-Preserving Measurement"
- by Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, and Micah Sherr
- In Proceedings of the Internet Measurement Conference 2018 (IMC 2018).