An analysis of Tor's security

Introduction

This page presents the mathematics behind my estimate of how much security Tor provides given a specific dollar budget. This page is for persons trained both in mathematics and in Tor's design.

Assumptions

To convert a dollar budget into a relay count, I divide the budget by $25 per relay (per month), a value that considers networking costs incurred. Feel free to quibble with this exact dollar value — the page allows you to set the value however you see fit. I do not include the cost of personnel to run the system. I am not considering that higher bandwidth relays would garner more circuits. And more. This is a basic model that I expect is not far off, and you can easily download the code and run yourself with different parameters.

If you think these considerations are important or missing from my model, I would reply: Tor provides no analysis at all for its users. It provides no specific dollar-based evaluation of its level of protection for its users. Instead the Tor Project makes vague claims of protection. Please, I welcome your own analysis that would estimate the costs more accurately. I intend to start a discussion, not to provide the final answer.

The point of my analysis is to show that the cost of success for an adversary is quite low overall as a starting point. Tor's protections are, when considering its fundamentals, exceptionally weak and they are not equivalent to the strength of cryptographic encryption (e.g., breaking AES). No one needs to "break encryption" to get around Tor's protections; that's a misconception. Tor instead provides very, very weak protection based on randomly selecting relays. The relays are run by strangers. If you are running Tor you are misplacing your trust in strangers.

Given the budget of the adversary, how do I determine what fraction is spent on guards and exits respectively? Given an adversary budget, I brute-force possible allocations between guards and exits and report the allocation that maximizes success. This runs very quickly. Surely guards and exits have different bandwidth costs and that could be added to the code if desired.

I make an assumption that the adversary is already part of the system. You can tweak the code to evaluate the security of users if instead the budget represents the relays an adversary can add to the system. (The code is javascript in your browser.)

I assume that selection of relays as guards or exits is independent and with replacement. This makes the math slightly easier and is not far off.

Onion Services

Here, the user is running an onion service. The goal of the adversary is to determine the IP address of the onion service.

In this case, the adversary's goal is to be selected at random as a guard node of the onion service. This approach is described in a 2006 paper [pdf]. The adversary sets up a single node to connect to the onion service and sends traffic in a pattern that is recognizable by its own relays. If one of the adversary's guards happens to be selected as the guard of the Onion Service, we have success.

We say the adversary is in control of $x$ guard relays out of $N$ total in the Tor network. The onion service starts by selecting three guards: one primary and two backups. The adversary makes an unlimited number of attempts to connect to the onion service; she needs only one to be a success.

The probability of adversary success, $p$, is the probability that the adversary controls at least one guard relay out of three:

$p = 1 - \bigl(1 - \frac{x}{N}\bigr)^3$.

Secure Drop

In this case, the user is visiting a secure drop site. The goal of the adversary is to determine the IP address of the user connecting to the secure drop site.

The adversary is again in control of $x$ guard relays out of $N$ total in the Tor network. Each time a user connects to the secure drop site, they select a guard relay at random per instructions. The secure drop site has also selected three guards.

This is just a special case of onion services, with two parts. First, the probability that the user selects a guard relay controlled by the adversary for one or more of its $j$ connections is $1 - \bigl(1 - \frac{x}{N}\bigr)^{j}$. Second, the probability that the secure drop site selects one or more guards controlled by the adversary is $1 - \bigl(1 - \frac{x}{N}\bigr)^{3}$.

Since both must be true, the probability of adversary success is the product of the two probabilities:

$p = \bigl(1 - (1 - \tfrac{x}{N})^{j}\bigr)\cdot\bigl(1 - (1 - \tfrac{x}{N})^{3}\bigr)$.

Tor Browser

Here the user is using Tor Browser to connect to the clear web, and the goal of the adversary is to determine what web site they are visiting. This analysis has nothing to do with Onion Sites.

This analysis is the most complicated. To start, the user selects three guards, two of which are backups. Assume the primary guard is used a fraction $\alpha$ of the time (code assumes 90%). Assume the user makes $J$ connections to the same destination.

Let the following notation be used:

Symbol	Description
$C_i$	the event the adversary controls guard $i$, where $i = 1, 2, \text{or } 3$.
$G_{ij}$	the event the user chooses guard $i$ for attempt $j$.
$S_j$	the event the adversary succeeds in attempt $j$.
$p_x$	the probability the adversary controls any given exit relay. Estimated as the fraction current exits that are controlled by the adversary.
$p_g$	the probability the adversary controls any given guard relay. We estimate this prob as the fraction current guards that are controlled by the adversary.
$J$	the total number of circuits attempted.

The probability of adversary success is the probability that for at least one of the $J$ connections the adversary controls both the guard and the exit. The overall probability is the sum over success from all possible combinations of guards and exits:

$P[S] = P[S,C_1,C_2,C_3] + P[S,\neg C_1,C_2,C_3] + P[S,C_1,\neg C_2,C_3] + \ldots$

Each term can be expanded by conditional probability. Consider as an example this one term:

$P[S,\neg C_1,C_2,C_3] = P[S\mid \neg C_1,C_2,C_3]\cdot P[\neg C_1,C_2,C_3]$.

The first factor expands as follows:

$P[S\mid \neg C_1,C_2,C_3]$

$= 1 - P\Bigl(\bigcap_{j}\neg S_j \;\Big|\; \neg C_1,C_2,C_3\Bigr)$

$= 1 - \prod_{j} P[\neg S_j \mid \neg C_1,C_2,C_3]$

$= 1 - P[\neg S_1 \mid \neg C_1,C_2,C_3]^{\,J}$

$= 1 - \bigl(1 - P[S_1 \mid \neg C_1,C_2,C_3]\bigr)^{J}$

Before continuing, let's note a couple things. First, the probability of success is the same for each connection attempt, so we may express the event in terms of the first connection. Hence the derivation switches to $S_1$ below. Second, given that the adversary controls guard $i$ and the user selects guard $i$, the probability of success on that circuit is the probability the adversary controls the exit is:

$P[S_j\mid G_{ij},C_i] = P[S_1\mid G_{i1},C_i] = p_x.$

Now we compute that inner conditional probability.

$P[S_j\mid \neg C_1,C_2,C_3]$

$= \sum_i P[S_j, G_{ij} \mid \neg C_1,C_2,C_3]$

$= \sum_i P[S_j\mid G_{ij}, \neg C_1,C_2,C_3]\;P[G_{ij}]$

$= P[S_1\mid G_{2j}, C_2]\;P[G_{2j}] + P[S_1\mid G_{3j}, C_3]\;P[G_{3j}]$

$= p_x\cdot\bigl(P[G_{2j}] + P[G_{3j}]\bigr)$

With these defintions, we have enough to compute the probability. I used javascript to creating the risk sliders on the main page, and so you can download the source code to see how I did it.

Symbol	Description
\(C_i\)	the event the adversary controls guard \(i\), where \(i = 1, 2, \text{or } 3\).
\(G_{ij}\)	the event the user chooses guard \(i\) for attempt \(j\).
\(S_j\)	the event the adversary succeeds in attempt \(j\).
\(p_x\)	the probability the adversary controls any given exit relay. Estimated as the fraction current exits that are controlled by the adversary.
\(p_g\)	the probability the adversary controls any given guard relay. We estimate this prob as the fraction current guards that are controlled by the adversary.
\(J\)	the total number of circuits attempted.