Ch2-Lecture 7

Entropy and Entanglement Distillation

The theme of the last two lectures has been of a quantum information theoretic nature — we have studied cloning (or rather, lack thereof), entanglement, and non-local correlations. Before progressing to our next main theme of quantum algorithms, we now give a brief taste of more advanced ideas in quantum information. In the process, we will continue getting used to working with quantum states in both the state vector and density operator formalisms.

The main questions we ask in this lecture are the following:

  • How can we quantify the “amount” of entanglement in a composite quantum system?

  • Under what conditions can “less entangled” states be “converted” to “more entangled” states?

The first question will require the foundational concept of entropy, introduced in the context of classical information theory by Claude Shannon in 1948. The entropy is worthy of a lecture in itself, being an extremely important quantity. The second question above is tied to the discovery of entanglement distillation, in which, similar to the age-old process of distilling potable water from salt water (or more fittingly for our analogy, “pure” water from “dirty” water), one can “distill” pure entanglement from noisy entanglement.

Entropy

One of the most influential scientific contributions of the 20th century was the 1948 paper of Claude Shannon, “A Mathematical Theory of Communication”, which single-handedly founded the field of information theory. Roughly, the aim of information theory is to study information transmission and compression. For this, Shannon introduced the notion of entropy, which intuitively quantifies “how random” a data source is, or the “average information content” of the source. It turns out that a quantum generalization of entropy will be vital to quantifying entanglement; as such, we begin by defining and motivating the classical Shannon entropy.

Shannon entropy

Let \(X\) be a discrete random variable taking values from set \( \left\{x_1,\dots,x_n\right\} \), where \( \text{Pr}\left(x_i\right) := \text{Pr}\left(X=x_i\right)\) denotes the probability that \(X\) takes value \(x_i\) . Then, the Shannon entropy \(H(X)\) is defined as

\[ H(X)= \sum_{i=1}^{n} -\text{Pr}\left(x_i\right) \log \left( \text{Pr}\left(x_i\right) \right) \]

Here, the logarithm is taken base 2, and we define \(0 \cdot \log\left(0\right) := 0 \).
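As a quick numerical illustration, the definition can be computed in a few lines of Python (a minimal sketch using numpy; the function name shannon_entropy is our own):

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy H (in bits) of a probability distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                        # enforce the convention 0*log(0) = 0
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([0.5, 0.5]))      # fair coin: 1.0
print(shannon_entropy([1.0, 0.0]))      # deterministic coin: 0.0
```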

Before getting our hands dirty understanding \(H(X)\), let us step back and motivate it via data compression. Suppose we have a data source whose output we wish to transmit from (say) Germany to Canada. Naturally, we may wish to first compress the data, so that we need to transmit as few bits as possible between the two countries. Furthermore, a compression scheme is useless unless we are later able to recover or uncompress the data in Canada. This raises the natural question: How few bits can one transmit, so as to ensure recovery of the data on the receiving end? Remarkably, Shannon’s noiseless coding theorem says that this quantity is governed by the entropy! Roughly, the theorem says that in order to reliably transmit \(N\) i.i.d. (independently and identically distributed) copies of a random variable \(X\) from a random source, it is necessary and sufficient (in the limit of large \(N\)) to instead send approximately \(N H(X)\) bits of communication.

We now explore the sense in which \(H(X)\) indeed quantifies the “randomness” or “uncertainty” of \(X\) by considering two boundary cases. In the first boundary case, \(X\) models a fair coin flip, i.e. \(X\) takes value HEADS or TAILS with probability 1/2 each. Then,

\[ H(X)=- \frac{1}{2} \log\left( \frac{1}{2} \right)- \frac{1}{2} \log\left( \frac{1}{2} \right)= \frac{1}{2} + \frac{1}{2} =1 \]

Therefore, we interpret a fair coin as encoding, on average, one bit of information. Alternatively, in the information transmission setting, we would need to transmit a single bit to convey the answer of the coin flip from Germany to Canada. This intuitively makes sense — since the outcome of the coin flip is completely random, there is no way to guess its outcome in Canada with success probability greater than 1/2 (i.e. a random guess).

Send it after class 1

Suppose random variable \(Y\) models a biased coin, e.g. takes value HEADS and TAILS with probability 1 and 0, respectively. What is \(H(Y)\)?

In the above exercise, there is no “uncertainty”; we know the outcome will be HEADS. Thus, in the communication setting, one can interpret this as saying zero bits of communication are required to transmit the outcome of the coin flip from Germany to Canada (assuming both Germany and Canada know the probabilities of obtaining HEADS and TAILS beforehand). Indeed, the answer to the exercise above is \(H(Y) = 0\).

Send it after class 2

Let random variable \(Z\) take values in set \(\{0, 1, 2\}\) with probabilities \(\{1/4, 1/4, 1/2\}\), respectively. What is \(H(Z)\)?

The entropy formula may look odd at first; the key to understanding where it comes from is the intuition behind the coin flip examples, namely that “when an event is less likely to happen, it reveals more information”. To capture this intuition, Shannon started with a formula for the information content \(I(x_i)\), which for any possible event \(x_i\) of random variable \(X\) is given by

\[ I \left(x_i\right)= \log\left( \frac{1}{ \text{Pr}\left(x_i\right) } \right)=- \log\left( \text{Pr}\left(x_i\right) \right) \]

Since the log function is strictly monotonically increasing, \(I(x_i)\) is strictly decreasing in \(\text{Pr}\left(x_i\right)\) (i.e. \(I(x_i) > I(x_j)\) whenever \(\text{Pr}\left(x_i\right) < \text{Pr}\left(x_j\right)\)); thus, \(I(x_i)\) captures the idea that “rare events yield more information”. But \(I(x)\) also has three other important properties we expect of an “information measure”; here are the first two:

  1. (Information is non-negative) \(I(x) \geq 0\)

  2. (If an event occurs with certainty, said occurrence conveys no information) If \( \text{Pr}\left(x\right) = 1\), then \(I(x) = 0\).

For the third important property, we ask — why did we use the log function? Why not any other monotonically increasing function satisfying properties (1) and (2) above? Recall that, by definition, two random variables \(X\) and \(Y\) are independent if

\[ \text{Pr}\left(X = x \text{ and } Y = y\right)= \text{Pr}\left(X = x\right)\text{Pr}\left(Y = y\right) . \]

Moreover, if \(X\) and \(Y\) are independent, then intuitively one expects the information conveyed by the joint event \(z := (X = x \text{ and } Y = y)\) to be additive, i.e. \(I(z) = I(x) + I(y)\). But this is precisely what the information content I(x) satisfies, due to its use of the log function.

Send it after class 3

Let \(X\) and \(Y\) be independent random variables. Then, for \(z := (X = x \text{ and } Y =y)\), express \(I(z)\) in terms of \(I(x)\) and \(I(y)\).

We can now use the information content to derive the formula for entropy — \(H(X)\) is simply the expected information content over all possible events \( \left\{ x_1, \dots, x_n \right\} \). (Recall here that for random variable \(X\) taking values \(x_i\), its expectation is given by \(E[X] = \sum_{i} \text{Pr}\left(x_i\right)\, x_i\).) This is precisely why, at the start of this section, we referred to \(H(X)\) as the average information content of a data source.
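For concreteness, consider a coin which lands HEADS with probability 3/4 and TAILS with probability 1/4. Its entropy, measured in bits, is exactly the expected information content:

\[ H = \frac{3}{4} I\left(\text{HEADS}\right) + \frac{1}{4} I\left(\text{TAILS}\right) = \frac{3}{4}\log\left(\frac{4}{3}\right) + \frac{1}{4}\log\left(4\right) \approx 0.811 \]

This sits strictly between the deterministic coin (0 bits) and the fair coin (1 bit), matching the intuition that a mildly biased coin is “somewhat” random.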

Von Neumann Entropy

Recall the first aim of this lecture was to use entropy to measure entanglement. For this, we shall require a quantum generalization of the Shannon entropy \(H(X)\), known as the von Neumann entropy \(S( \rho )\), defined for a density operator \( \rho \). To motivate this definition, let us recall the “hierarchy of matrix classes” we introduced in discussing measurements:

  • Hermitian operators, \(\text{Herm} \left(\mathbb{C}^d\right) \) , which generalize the real numbers.

  • Positive semidefinite operators, \(\text{Pos} \left(\mathbb{C}^d\right)\) , which generalize the non-negative real numbers.

  • Orthogonal projection operators, \(\Pi\left(\mathbb{C}^d\right)\), which generalize the set \(\{0, 1\}\).

Note that \(\Pi\left(\mathbb{C}^d\right) \subseteq \text{Pos} \left(\mathbb{C}^d\right) \subseteq \text{Herm} \left(\mathbb{C}^d\right)\) , and that the notion of “generalize” above means the eigenvalues of the operators fall into the respective class the operators generalize. (For example, matrices in \(\text{Pos} \left(\mathbb{C}^d\right)\) have non-negative eigenvalues.) Applying this same interpretation to the set of density operators acting on \(\mathbb{C}^d\), \(D \left(\mathbb{C}^d\right)\) , we thus have that density operators generalize the notion of a probability distribution. Indeed, any probability distribution can be embedded into a diagonal density matrix.
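As a quick numerical sanity check (a sketch using numpy, not a proof), one can embed a distribution on the diagonal and confirm the two defining properties of a density matrix, namely unit trace and positive semidefiniteness:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])        # an arbitrary probability distribution
rho = np.diag(p)                     # embed it as a diagonal matrix

print(np.trace(rho))                 # 1.0, i.e. unit trace
print(np.linalg.eigvalsh(rho))       # [0.2 0.3 0.5], i.e. non-negative eigenvalues
```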

Send it after class 4

Let \( \left\{p_i\right\}^d_{i=1} \) denote a probability distribution. Define diagonal matrix \( \rho \in \mathcal{L} \left( \mathbb{C}^d\right) \) such that \( \rho \left(i,i\right)=p_i \) . Show that \( \rho \) is a density matrix.

Since the eigenvalues \( \lambda_i \left( \rho \right) \) of a density operator \( \rho \in D \left( \mathbb{C} ^d\right) \) form a probability distribution, the natural approach for defining a quantum entropy formula is to apply the classical Shannon entropy to the spectrum of \( \rho \):

\[ S \left( \rho \right):=H \left( \left\{ \lambda _i \left( \rho \right) \right\}_{i=1}^d \right)= \sum_{i=1}^{d}- \lambda_i \left(\rho\right) \log\left(\lambda_i \left(\rho\right)\right) \]

Operator functions. It is important to pause now and take stock of what we have done in defining \(S( \rho )\) : In order to apply a function \(f : \mathbb{R} \mapsto \mathbb{R} \) to a Hermitian matrix \(H \in \text{Herm} \left(\mathbb{C}^d\right)\) , we instead applied \(f\) to the eigenvalues of \(H\). Why does this “work”? Let us look at the Taylor series expansion of \(f\) , which for, e.g., \(f(x) = e^x\) is (the series converges for all \(x\))

\[ e^x= \sum_{n=0}^{\infty} \frac{x^n}{n!}=1+x+ \frac{x^2}{2!}+\frac{x^3}{3!}+\dots \]

This suggests an idea — to define \(e^H\) , perhaps we can substitute \(H\) in the right hand side of the Taylor series expansion of \(e^x\) :

\[ e^H:= I+H+ \frac{H^2}{2!}+\frac{H^3}{3!}+\dots \]

Indeed, this leads to our desired definition: to generalize the function \(f(x) = e^x\) to Hermitian matrices, we apply \(f\) to the eigenvalues of \(H\).
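The following sketch (numpy only; the truncation depth of 30 Taylor terms is an arbitrary choice of ours) checks numerically that the Taylor-series definition of \(e^H\) agrees with applying \(e^x\) to the eigenvalues of \(H\):

```python
import numpy as np

H = np.array([[1.0, 2.0],
              [2.0, -1.0]])                 # a Hermitian matrix

# Definition 1: truncated Taylor series I + H + H^2/2! + H^3/3! + ...
taylor = np.eye(2)
term = np.eye(2)
for n in range(1, 30):
    term = term @ H / n
    taylor = taylor + term

# Definition 2: apply e^x to the eigenvalues (an operator function)
evals, evecs = np.linalg.eigh(H)
op_func = evecs @ np.diag(np.exp(evals)) @ evecs.conj().T

print(np.allclose(taylor, op_func))         # True
```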

Send it after class 5

Let \(H\) have spectral decomposition \(H = \sum_{i=1} \lambda_i \left| \lambda_i \right\rangle\!\left\langle \lambda_i \right|\). Show that

\[ e^H=\sum_{i=1} e^{\lambda_i} \left| \lambda_i \right\rangle\!\left\langle \lambda_i \right| \]

This idea of applying functions \(f : \mathbb{R} \mapsto \mathbb{R} \) to the eigenvalues of Hermitian operators is used so frequently in quantum information that we give such “generalized \(f\) ” a name — operator functions. In the case of \(S( \rho )\), by setting \(f(x) = \log\left(x\right)\), we can rewrite \(S( \rho )\) as

\[ S( \rho )=- \text{Tr}\left( \rho \log\left( \rho \right) \right) \]
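To see the two expressions for \(S(\rho)\) agree numerically, here is a small sketch (numpy; the full-rank example state is an arbitrary choice of ours):

```python
import numpy as np

# An arbitrary full-rank qubit state: rotate diag(3/4, 1/4) by a Hadamard.
Hgate = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
rho = Hgate @ np.diag([0.75, 0.25]) @ Hgate.conj().T

# Way 1: Shannon entropy of the eigenvalues of rho.
lam = np.linalg.eigvalsh(rho)
S1 = -np.sum(lam * np.log2(lam))

# Way 2: -Tr(rho log rho), with log(rho) built as an operator function.
evals, evecs = np.linalg.eigh(rho)
log_rho = evecs @ np.diag(np.log2(evals)) @ evecs.conj().T
S2 = -np.trace(rho @ log_rho).real

print(S1, S2)    # both ~0.811, the entropy of the distribution {3/4, 1/4}
```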

Send it after class 6

  1. Verify \(\sum_{i=1}^{d}- \lambda_i \left(\rho\right) \log\left(\lambda_i \left(\rho\right)\right)=- \text{Tr}\left( \rho \log\left( \rho \right) \right)\)

  2. Let \(f(x) = x^2\). What is \(f(X)\), \(X\) being the Pauli \(X\) operator? Why does this yield the same result as multiplying \(X\) by itself via matrix multiplication?

  3. Let \(f (x) =\sqrt{x}\). For any pure state \( \left|\psi\right\rangle \in \mathbb{C} ^d \) define rank one density operator \( \rho = \left| \psi \right\rangle\!\left\langle \psi \right| \). What is \(\sqrt{\rho}\)?

  4. What is \(\sqrt{Z}\) for the Pauli \(Z\) operator? Is it uniquely defined?

Properties of the von Neumann entropy

Let us now see how properties of \(H(X)\) carry over to \(S( \rho )\). These will prove crucial in our understanding of quantifying entanglement shortly.

  1. When does a quantum state have no entropy? Recall in our biased coin flip example that if an outcome occurs with probability 1 in our distribution, then \(H(X) = 0\). Quantumly, the analogue of this statement is that \(S( \rho ) = 0\) if and only if \( \rho \) is a pure state, i.e. \( \rho = \left| \psi \right\rangle\!\left\langle \psi \right| \) for some \( \left|\psi\right\rangle \in \mathbb{C}^d\) . This is because, recall, a pure state is the special case of a mixed state in which one of the states in the preparation procedure is picked with certainty.

  2. When does a quantum state have maximum entropy? We saw that when \(X\) represents a fair coin flip, \(H(X) = 1\). This is, in fact, the unique distribution maximizing \(H\). Applying this directly to the definition of \(S( \rho )\), we find that \(S( \rho )\) is maximized over all \( \rho \in D \left( \mathbb{C}^2\right)\) if and only if both eigenvalues of \( \rho \) are 1/2. This implies that \( \rho = I/2\). Moreover, this statement generalizes to any dimension \(d \geq 2\) — for \( \rho \in D \left( \mathbb{C}^d\right)\) , \(S( \rho )\) is maximized if and only if \( \rho = I/d\).

  3. Quantum information is non-negative. Since \(H(X) \geq 0\), it immediately follows by definition that \(S( \rho ) \geq 0\).

  4. What is the quantum analogue of independent random variables \(X\) and \(Y\)? Recall that in defining information content, the log function was chosen so as to ensure information is additive when two random variables \(X\) and \(Y\) are independently distributed. The quantum analogue of this has a natural expression: Let \( \rho, \sigma \in D \left( \mathbb{C}^d\right) \) be density matrices. Then, \( \rho \) and \( \sigma \) are independent if their joint state is \( \rho \otimes \sigma \). In the next exercise, you will prove that this indeed preserves our desired additivity property of information for independent quantum states.

There are other important properties of \(S( \rho )\) which we do not wish to focus on at present; for completeness, however, let us briefly mention two more: (1) For arbitrary, possibly entangled, bipartite mixed states \( \rho_{AB} \), \(S \left(\rho_{AB}\right) \leq S \left(\rho_{A}\right)+S \left(\rho_{B}\right) \) (subadditivity), and (2) \(S \left( \sum_{i=1}^{n} p_i \rho_i\right) \geq \sum_{i=1}^{n} p_i S \left(\rho_i\right) \) for any probability distribution \( \left\{p_i\right\}_{i=1}^{n} \) (concavity). Here, and henceforth in this course, we use the shorthand

\[ \rho_A := \text{Tr}_B\left( \rho_{AB}\right) \]
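Since reduced states \( \rho_A = \text{Tr}_B\left( \rho_{AB}\right) \) appear repeatedly below, here is a minimal numpy sketch of the partial trace over the \(B\) system (assuming both systems have dimension \(d\); the helper name partial_trace_B is our own):

```python
import numpy as np

def partial_trace_B(rho_AB, d):
    """Trace out the second d-dimensional system of a (d*d) x (d*d) density matrix."""
    R = rho_AB.reshape(d, d, d, d)       # indices (a, b, a', b')
    return np.einsum('abcb->ac', R)      # sum over b = b'

# Sanity check: for a product state rho (x) sigma, tracing out B returns rho.
rho = np.diag([0.75, 0.25])
sigma = np.array([[0.5, 0.5], [0.5, 0.5]])            # the pure state |+><+|
print(np.allclose(partial_trace_B(np.kron(rho, sigma), 2), rho))   # True
```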

Send it after class 7

  1. Prove that for any pure state \( \left|\psi\right\rangle \), \(S( \left| \psi \right\rangle\!\left\langle \psi \right| ) = 0\).

  2. For \( \rho =I/d\), what is \(S( \rho )\)?

  3. Prove that \(S( \rho \otimes \sigma ) = S( \rho ) + S( \sigma )\)

Quantifying entanglement in composite quantum systems

With the notion of entropy in hand, we return to the following fundamental question. Let \( \rho_{AB} \in D \left( \mathbb{C}^d \otimes \mathbb{C}^d\right) \) be a bipartite quantum state. Can one efficiently determine if \(\rho_{AB}\) is entangled? (Recall this means that \(\rho_{AB}\) cannot be written \(\rho_{AB} = \sum_{i=1} p_i \rho_{A,i} \otimes \rho_{B,i}\) as a convex combination of product states.) Roughly, if one uses \(d\) to encode the size of the input (i.e. the input is the entire \(d^2\times d^2\) matrix representing \(\rho_{AB}\) ), then deciding this question turns out to be NP-hard. This directly implies that quantifying “how much” entanglement is in \(\rho_{AB}\) is also NP-hard. However, there is a special case in which we can do both tasks efficiently — the case of bipartite pure states \( \left|\psi_{AB}\right\rangle \in \mathbb{C}^d \otimes \mathbb{C}^d \) . It is here that the von Neumann entropy plays a role.

In fact, a previous lecture already discussed an efficient test for entanglement for \(\left|\psi_{AB}\right\rangle\) — the latter is entangled if and only if

\[ \rho_A:= \text{Tr}_B\left( \left| \psi_{AB} \right\rangle\!\left\langle \psi_{AB} \right| \right) \]

has rank at least 2. This, in turn, follows because the rank of \( \rho_A \) equals the Schmidt rank of \( \left|\psi_{AB}\right\rangle \), which is at least 2 precisely when \( \left|\psi_{AB}\right\rangle \) is entangled. However, we can say more. Suppose we have Schmidt decomposition

\[ \left|\psi_{AB}\right\rangle= \sum_{i=1}^{d}s_i \left| a_i \right\rangle_A \left| b_i \right\rangle_B \]

Then, intuitively, as with the example of a Bell pair, \( \left|\psi_{AB}\right\rangle \) is “highly entangled” if all the Schmidt coefficients \(s_i\) are approximately equal in magnitude, and \( \left|\psi_{AB}\right\rangle \) is “weakly entangled” if there exists a single \(s_i\) whose magnitude is approximately 1. Do we know of a function which quantifies precisely this sort of behavior on the set \(\{s_i\}\)? Indeed, the entropy function! This notion of \(s_i\) being “spread out” versus “concentrated” is highly reminiscent of our fair versus biased coin flip example for the Shannon entropy. We can therefore use the von Neumann entropy to define an entanglement measure \(E( \left|\psi_{AB}\right\rangle )\) as

\[ E \left(\left|\psi_{AB}\right\rangle\right):=S \left( \rho_A\right) \]
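As a numerical sketch (numpy; the example amplitudes are an arbitrary choice of ours), \(E\) can be computed directly from the Schmidt coefficients of \( \left|\psi_{AB}\right\rangle \), which are the singular values of its coefficient matrix:

```python
import numpy as np

# |psi_AB> = sqrt(0.9)|00> + sqrt(0.1)|11>, written as the 2x2 coefficient
# matrix M with |psi_AB> = sum_{ij} M[i, j] |i>_A |j>_B.
M = np.array([[np.sqrt(0.9), 0.0],
              [0.0, np.sqrt(0.1)]])

s = np.linalg.svd(M, compute_uv=False)    # Schmidt coefficients s_i
p = s ** 2                                # squared Schmidt coefficients (see the exercise below)
p = p[p > 0]
E = -np.sum(p * np.log2(p))
print(E)   # ~0.469: weakly entangled, between 0 (a product state) and 1 (the maximum for two qubits)
```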

Finally, let us close this section with a natural question — does \(E \left(\left|\psi_{AB}\right\rangle\right)\) still measure entanglement when its input is allowed to be a mixed state \( \rho_{AB} \) (as opposed to a pure state \( \left|\psi_{AB}\right\rangle \))? This is the last question in the following exercise.

Send it after class 8

  1. What is \(E \left(\left| \Phi^+\right\rangle_{AB}\right) \) for \( \left|\Phi^+\right\rangle = \frac{1}{\sqrt{2}} \left( \left|00\right\rangle + \left|11\right\rangle \right)\) a Bell state?

  2. What is \(E \left( \left|+\right\rangle_A \left|-\right\rangle_B\right) \)?

  3. Unlike the example of the fair coin, the Schmidt coefficients \(s_i\) of \( \left|\psi_{AB}\right\rangle \) are not probabilities, but amplitudes (i.e. we do not have \( \sum_{i=1} s_i = 1\), but rather \( \sum_{i=1} s_i^2 = 1\)). Show, however, that the notion of probabilities is recovered in the formula \(S( \rho_A)\), i.e. show that the eigenvalues of \( \rho_A\) are precisely the set \( \left\{s_i^2\right\}_{i=1}^d \), which does form a distribution.

  4. Define \(E \left( \rho_{AB} \right):=S \left( \text{Tr}_B\left( \rho_{AB} \right) \right)=S \left( \rho_A \right)\). Recall that the maximally mixed state on two qubits is a product state, i.e. \(I/4 = I/2 \otimes I/2\). Show that \(E(I/4) = 1\). Why does this imply we cannot use \(E\) as an entanglement measure for bipartite mixed states?

Entanglement distillation

Now that we have a notion of how to quantify entanglement in pure states, we can become greedy — under what circumstances is it possible to “increase” the amount of entanglement in a composite quantum system? This is a highly non-trivial question, as fundamental communication tasks such as teleportation require highly entangled Bell pairs as a resource. Unfortunately, experimentally producing such pure states is generally a difficult task due to noise from the environment. In other words, in a lab one is typically able to produce mixed states, as opposed to pure states. Moreover, even if Alice could produce perfect Bell pairs in a lab on Earth, when she sends half of a Bell pair to Bob on Mars, the transmitted qubit will again typically be subject to noise, yielding a shared mixed state \( \rho_{AB}\) between Alice and Bob. Do Alice and Bob have any hope of running the teleportation protocol given \( \rho_{AB}\) ?

Local Operations and Classical Communication (LOCC). To answer the question, it is important to first define the rules of the game. Since Alice and Bob are spatially separated, they are not able to apply joint quantum operations to both systems \(A\) and \(B\), e.g. they cannot apply a non-factorizable unitary \(U_{AB} \in U \left( \mathbb{C}^d \otimes \mathbb{C}^d\right)\) to \( \rho_{AB}\) . However, they can apply local unitaries and measurements, e.g. factorizable unitaries of the form \(U_A \otimes U_B\) for \(U_A, U_B \in U( \mathbb{C}^d)\) (i.e. Alice locally applies \(U_A\) , Bob locally applies \(U_B\) ). They can also pick up the phone and call one another to transmit classical information. Taken together, this set of allowed operations is given a name: Local Operations and Classical Communication (LOCC). The question is thus: Given a shared mixed state \( \rho_{AB}\) , can Alice and Bob use LOCC to “purify” or “distill” Bell states out of \( \rho_{AB}\) ? The answer is sometimes yes, and protocols accomplishing this are called distillation protocols, as they recover “pure” entanglement from “noisy” or mixed-state entanglement.

A simple distillation protocol. We shall discuss a simple distillation protocol, known as the recurrence protocol. Given as input a mixed two-qubit state \( \rho_{AB} \in D \left( \mathbb{C}^2 \otimes \mathbb{C}^2 \right) \), our aim is to distill the Bell state known as the singlet, \(\left|\Psi^-\right\rangle = \frac{1}{\sqrt{2}} \left( \left|01\right\rangle - \left|10\right\rangle \right)\); note \(\left(I \otimes Y\right) \left|\Psi^-\right\rangle = -i \left|\Phi^+\right\rangle \), i.e. \( \left|\Phi^+\right\rangle \) up to a global phase, making it easy to convert one Bell state into the other for the teleportation scheme. There is a required precondition for the protocol to work — the input state \( \rho_{AB}\) must have sufficient initial overlap with \( \left|\Psi^-\right\rangle \), i.e.

\[ F ( \rho_{AB} ) := \left\langle \Psi^- \right| \rho_{AB}\left| \Psi^- \right\rangle > \frac{1}{2} \]

In other words, in transmitting half of \( \left|\Psi^-\right\rangle \) from Alice to Bob, the resulting mixed state \( \rho_{AB}\) should not have deviated “too far” from \( \left|\Psi^-\right\rangle \). Henceforth, we shall use shorthand \(F\) to denote \(F ( \rho_{AB} )\) for brevity.

Suppose Alice and Bob share two copies of \( \rho_{AB}\) ; let us label them \( \rho_{A_1B_1}\) and \( \rho_{A_2B_2}\) , where Alice holds systems \(A_1 , A_2\) , and Bob holds \(B_1 , B_2\) . Each round of the distillation protocol proceeds as follows.

  1. (Twirling operation) Alice picks a Pauli operator \(U\) from set \(\{I, X, Y, Z\}\) uniformly at random, and communicates this choice to Bob. They each locally apply operator \(\sqrt{U}\) to \( \rho_{A_iB_i}\) for \(i \in \{1, 2\}\) (note that \(\sqrt{U}\) is defined using the notion of operator functions), obtaining

    \[ \sigma_{A_iB_i}:= \left(\sqrt{U} \otimes\sqrt{U} \right) \rho_{A_iB_i}\left(\sqrt{U}^\dagger \otimes\sqrt{U}^\dagger \right) \]

    This random choice of Pauli and its subsequent local application is together called the twirling map \(\Phi:D \left( \mathbb{C}^2 \otimes \mathbb{C}^2\right) \mapsto D \left( \mathbb{C}^2 \otimes \mathbb{C}^2\right) \), and is not a unitary map (due to the random choice over Pauli operators); nevertheless, it can clearly be implemented given the ability to flip random coins and apply arbitrary single qubit gates. (The formal framework for studying such operations is via Trace Preserving Completely Positive Maps, and is beyond the scope of this course.) The nice thing about the twirling operation is that, for any input \( \rho_{AB}\) , \(\Phi \left(\rho_{AB}\right) \) can be diagonalized in the Bell basis, i.e. can be written

    \[ \Phi \left(\rho_{AB}\right) = F \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right|+ \frac{1-F}{3} \left| \Psi^+ \right\rangle\!\left\langle \Psi^+ \right| + \frac{1-F}{3} \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| + \frac{1-F}{3} \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right| \]

    for Bell basis \( \left\{ \left|\Phi^+\right\rangle , \left|\Phi^-\right\rangle , \left|\Psi^+\right\rangle , \left|\Psi^-\right\rangle \right\} \), where \(F\) is given above

  2. (Convert from \( \left|\Psi^-\right\rangle \) to \( \left|\Phi^+\right\rangle \)) Alice applies Pauli \(Y\) to her half of each state to obtain the states:

    \[ \sigma_{A_iB_i}=\frac{1-F}{3} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1-F}{3} \left| \Psi^+ \right\rangle\!\left\langle \Psi^+ \right| + F \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| +\frac{1-F}{3} \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right| \]

    This shifts most of the weight in \(\Phi\left( \rho_{A_iB_i} \right)\) from \( \left|\Psi^-\right\rangle \) to \( \left|\Phi^+\right\rangle \), since \(F > 1/2\).

  3. (Application of CNOT gates) Alice applies a CNOT gate with qubit \(A_1\) as the control and \(A_2\) as the target. Bob does the same for \(B_1\) and \(B_2\) .

  4. (Local measurements) Alice and Bob each locally measure \(A_2\) and \(B_2\) in the standard basis, obtaining outcomes \(a\) and \(b\) in \(\{0, 1\}\), respectively. They pick up the phone to compare their measurement results \(a\) and \(b\). If \(a = b\), they keep the remaining composite system on \(AB\), denoted \( \sigma^\prime_{A_1B_1} \). Otherwise if \(a \neq b\), they throw out all systems and start again.

  5. (Convert from \( \left|\Phi^+\right\rangle \) to \( \left|\Psi^-\right\rangle \)) Alice applies Pauli \(Y\) to her half of \(\sigma^\prime_{A_1B_1}\) to convert its \( \left|\Phi^+\right\rangle \) component back to \( \left|\Psi^-\right\rangle \).

To get a better sense of this protocol in action, let us apply it to a concrete example. Suppose Alice sends half of the singlet to Bob, and along the way, the state \( \left|\Psi^-\right\rangle \) is injected with some completely random noise, denoted by the identity matrix:

\[ \rho_{AB}= \frac{2}{3} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1}{12} I = \frac{3}{4} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1}{12} \left| \Psi^+ \right\rangle\!\left\langle \Psi^+ \right| + \frac{1}{12} \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| + \frac{1}{12} \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right| \]

where the second equality follows by recalling that the identity matrix is diagonal in any orthonormal basis, including the Bell basis. In particular, \(F \left( \rho_{AB} \right) = 3/4 > 1/2\), so the precondition for the protocol is met. (The noise-inducing channel above is formally known as the depolarizing channel in quantum information theory.)
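This Bell-basis decomposition is easy to verify numerically (a numpy sketch; the ordering and sign conventions for the Bell vectors are our own):

```python
import numpy as np

s = 1 / np.sqrt(2)
phi_p = s * np.array([1, 0, 0, 1.0])     # |Phi+>
phi_m = s * np.array([1, 0, 0, -1.0])    # |Phi->
psi_p = s * np.array([0, 1, 1, 0.0])     # |Psi+>
psi_m = s * np.array([0, 1, -1, 0.0])    # |Psi->

rho = (2/3) * np.outer(psi_m, psi_m) + (1/12) * np.eye(4)

for name, v in [("Psi-", psi_m), ("Psi+", psi_p), ("Phi+", phi_p), ("Phi-", phi_m)]:
    print(name, v @ rho @ v)             # 3/4 for |Psi->, 1/12 for the other three
```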

Steps 3 and 4 are a bit messier. The intuition here is that the CNOT entangles \( \sigma_{A_1B_1}\) with \( \sigma_{A_2B_2}\) , and the measurement forces the bits in the second system (formerly in state \( \sigma_{A_2B_2}\) ) to match; via the entanglement just created, this increases the weight on the terms where bits match, i.e. \( \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| \) and \( \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right| \). Thus, the final Step 5 will yield a state with higher overlap on the desired singlet state \( \left|\Psi^-\right\rangle \).

To run through the full technical analysis for Steps 3 and 4 would be tedious, so we analyze one term of the computation for brevity. Before Step 3 is run, Alice and Bob share state

\[ \sigma_{A_1B_1} \otimes \sigma_{A_2B_2} \in D \left( \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes\mathbb{C}^2 \otimes\mathbb{C}^2 \right) \]

where recall Alice holds qubits \(A_1\) , \(A_2\) , and Bob holds \(B_1\) , \(B_2\) . Since each \(\sigma_{A_iB_i}\) has 4 terms in its mixture, the tensor product has 16 terms. By linearity, Step 3 then applies gates CNOT\(_{A_1A_2}\) and CNOT\(_{B_1B_2}\) to each of these 16 terms, where our notational convention is that CNOT\(_{12}\) has qubit 1 as the control and qubit 2 as the target.

Finally, let us briefly state what this protocol does. A careful but tedious analysis yields that with probability at least 1/4, this protocol maps the input state \( \rho_{A_1B_1} \otimes \rho_{A_2B_2}\) to an output state \( \sigma^\prime_{A_1B_1} \) such that (for \(F_\rho := F ( \rho_{A_1B_1} )\))

\[ F ( \sigma^\prime_{A_1B_1} )= \frac{F_\rho^2+ \frac{1}{9} \left(1-F_\rho\right)^2 }{F_\rho^2+ \frac{2}{3}F_\rho \left(1-F_\rho\right)+ \frac{5}{9} \left(1-F_\rho\right)^2 } \]

So long as \(F_\rho > 1/2\), one can show \(F ( \sigma^\prime_{A_1B_1} )>F_\rho\) ; thus, recursively applying this protocol (using many pairs of input states \( \rho_{AB}\) ) improves our overlap with our desired target state of \( \left|\Psi^-\right\rangle \).
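To make the analysis concrete, here is a self-contained numerical sketch (numpy; the qubit ordering \(A_1B_1A_2B_2\) and all helper names are our own) which builds the Bell-diagonal state after Step 2 and simulates Steps 3 to 5, comparing the result against the closed-form expression above:

```python
import numpy as np

s = 1 / np.sqrt(2)
phi_p = s * np.array([1, 0, 0, 1.0])    # |Phi+>
phi_m = s * np.array([1, 0, 0, -1.0])   # |Phi->
psi_p = s * np.array([0, 1, 1, 0.0])    # |Psi+>
psi_m = s * np.array([0, 1, -1, 0.0])   # |Psi->

def cnot(n, control, target):
    """CNOT on n qubits (qubit 0 = most significant bit, matching np.kron ordering)."""
    U = np.zeros((2**n, 2**n))
    for x in range(2**n):
        bits = [(x >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        y = sum(b << (n - 1 - k) for k, b in enumerate(bits))
        U[y, x] = 1
    return U

def one_round(F):
    """Fidelity with the singlet after one successful round, plus the success probability."""
    G = (1 - F) / 3
    # State of each pair after Step 2: weight F on |Phi+>, weight G on the rest.
    sigma = (F * np.outer(phi_p, phi_p) + G * np.outer(phi_m, phi_m)
             + G * np.outer(psi_p, psi_p) + G * np.outer(psi_m, psi_m))
    rho4 = np.kron(sigma, sigma)                       # qubits ordered A1, B1, A2, B2
    # Step 3: CNOT_{A1 -> A2} and CNOT_{B1 -> B2}.
    U = cnot(4, 0, 2) @ cnot(4, 1, 3)
    rho4 = U @ rho4 @ U.T
    # Step 4: measure A2, B2 in the standard basis; keep only matching outcomes.
    P = np.kron(np.eye(4), np.diag([1.0, 0, 0, 1.0]))  # projector onto a = b
    kept = P @ rho4 @ P
    p_success = np.trace(kept).real
    out = np.einsum('aibi->ab', kept.reshape(4, 4, 4, 4)) / p_success   # trace out A2, B2
    # Step 5 maps the |Phi+> component back to |Psi->, so report the |Phi+> overlap.
    return (phi_p @ out @ phi_p).real, p_success

F = 0.75
F_sim, p = one_round(F)
F_formula = (F**2 + (1 - F)**2 / 9) / (F**2 + 2*F*(1 - F)/3 + 5*(1 - F)**2 / 9)
print(F_sim, F_formula, p)   # the two fidelities agree, F increases, and p > 1/4
```

For the example above (\(F_\rho = 3/4\)), the closed-form expression gives \(F ( \sigma^\prime_{A_1B_1} ) \approx 0.79\) with success probability roughly \(0.72\), and the simulation should reproduce both values.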

Send it after class 9

  1. Show that \(\sqrt{Z} \otimes \sqrt{Z}\) maps \( \left|\Phi^+\right\rangle \) to \( \left|\Phi^-\right\rangle \) and vice versa. Using the additional identities that \(\sqrt{X} \otimes \sqrt{X}\) maps \( \left|\Phi^+\right\rangle \) to \( \left|\Psi^+\right\rangle \) and vice versa, and \(\sqrt{Y} \otimes \sqrt{Y}\) maps \( \left|\Phi^-\right\rangle \) to \( \left|\Psi^+\right\rangle \) and vice versa, show that the twirling map leaves \(\rho_{AB}= \frac{2}{3} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1}{12} I = \frac{3}{4} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1}{12} \left| \Psi^+ \right\rangle\!\left\langle \Psi^+ \right| + \frac{1}{12} \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| + \frac{1}{12} \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right|\) invariant.

  2. Show that applying \(Y_A \otimes I_B\) to \( \rho_{AB}\) in Step 2 yields

    \[ \sigma_{AB}= \frac{1}{12} \left| \Psi^- \right\rangle\!\left\langle \Psi^- \right| + \frac{1}{12} \left| \Psi^+ \right\rangle\!\left\langle \Psi^+ \right| + \frac{3}{4} \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right| + \frac{1}{12} \left| \Phi^- \right\rangle\!\left\langle \Phi^- \right| \]
  3. Let us analyze one of these 16 terms: \( \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right|_{A_1B_1} \otimes \left| \Phi^+ \right\rangle\!\left\langle \Phi^+ \right|_{A_2B_2}\). Show that

\[ (\text{CNOT}_{A_1A_2} \otimes \text{CNOT}_{B_1B_2} ) \left|\Phi^+\right\rangle_{A_1B_1} \otimes \left|\Phi^+\right\rangle_{A_2B_2} = \frac{1}{2} \left( \left|0000\right\rangle+\left|0011\right\rangle+\left|1100\right\rangle +\left|1111\right\rangle\right)_{A_1B_1A_2B_2} \]

If Alice and Bob run Step 4 on this state and obtain matching outcomes \(a = b\), what does the state on qubits \(A_1B_1\) collapse to?