

Mixed-state entanglement from local randomized measurements

Andreas Elben,1, 2, ∗ Richard Kueng,3, ∗ Hsin-Yuan (Robert) Huang,4, 5 Rick van Bijnen,1, 2 Christian Kokail,1, 2 Marcello Dalmonte,6, 7 Pasquale Calabrese,6, 7, 8 Barbara Kraus,9 John Preskill,4, 5, 10, 11 Peter Zoller,1, 2 and Benoît Vermersch1, 2, 12

1 Center for Quantum Physics, University of Innsbruck, Innsbruck A-6020, Austria
2 Institute for Quantum Optics and Quantum Information of the Austrian Academy of Sciences, Innsbruck A-6020, Austria
3 Institute for Integrated Circuits, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
4 Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA
5 Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA
6 The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy
7 SISSA, via Bonomea 265, 34136 Trieste, Italy
8 INFN, via Bonomea 265, 34136 Trieste, Italy
9 Institute for Theoretical Physics, University of Innsbruck, A–6020 Innsbruck, Austria
10 Walter Burke Institute for Theoretical Physics, Caltech, Pasadena, CA, USA
11 AWS Center for Quantum Computing, Pasadena, CA, USA
12 Univ. Grenoble Alpes, CNRS, LPMMC, 38000 Grenoble, France

We propose a method for detecting bipartite entanglement in a many-body mixed state based on estimating moments of the partially transposed density matrix. The estimates are obtained by performing local random measurements on the state, followed by post-processing using the classical shadows framework. Our method can be applied to any quantum system with single-qubit control. We provide a detailed analysis of the required number of experimental runs, and demonstrate the protocol using existing experimental data [Brydges et al, Science 364, 260 (2019)].

Engineered quantum many-body systems exist in today's laboratories as Noisy Intermediate Scale Quantum Devices (NISQ) [1]. This provides us with novel opportunities to study and quantify entanglement – a fundamental concept in both quantum information theory [2] and many-body quantum physics [3, 4]. For pure (or nearly-pure) states, entanglement has been detected by measuring the second Rényi entropy [5–10]. This has been achieved via, for instance, many-body quantum interference [7–9, 11, 12] (see also [13, 14]) and randomized measurements [10, 15–18]. However, many states of interest are actually highly mixed – either because of decoherence, or because they describe interesting subregions of a larger, globally entangled, system. Developing protocols which detect and quantify mixed-state entanglement on intermediate scale quantum devices is thus an outstanding challenge.

∗ These authors contributed equally.

Below we propose and experimentally demonstrate conditions for mixed-state entanglement and measurement protocols based on the positive partial transpose (PPT) condition [2, 5, 19]. Consider two partitions A and B described by a (reduced) density matrix $\rho_{AB}$. The well-known PPT condition checks whether the partially transposed (PT) density matrix $\rho_{AB}^{T_A}$ [20] is positive semidefinite, i.e. all eigenvalues are non-negative. If the PPT condition is violated – i.e. $\rho_{AB}^{T_A}$ does have negative eigenvalues – A and B must be entangled. It is possible to turn the PPT condition into a quantitative entanglement measure. The negativity $\mathcal{N}(\rho_{AB}) = \sum_{\lambda<0} |\lambda|$, with $\lambda$ the eigenvalues of $\rho_{AB}^{T_A}$, is positive if and only if the underlying state $\rho_{AB}$ violates the PPT condition [21]. While applicable to mixed states, computing the negativity requires accurately estimating the full spectrum of $\rho_{AB}^{T_A}$. We bypass this challenge by considering moments of the partially transposed density matrix (PT-moments) instead:

$$p_n = \mathrm{Tr}\big[(\rho_{AB}^{T_A})^n\big] \quad \text{for } n = 1, 2, 3, \ldots \tag{1}$$

These were first studied in quantum field theory to quantify correlations in many-body systems [22]. Clearly, $p_1 = \mathrm{Tr}(\rho_{AB}) = 1$, while $p_2$ is equal to the purity $\mathrm{Tr}[\rho_{AB}^2]$ (see Table 1 in the Supplemental Material [23] (SM) for a visual derivation). Hence, $p_3$ is the lowest PT-moment that captures meaningful information about the partial transpose (see also Ref. [24]).
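As an illustration of Eq. (1), the following minimal sketch evaluates PT-moments and the negativity for a small density matrix given explicitly as a NumPy array. The function names (`partial_transpose`, `pt_moment`, `negativity`) and the Bell-state example are illustrative assumptions and are not taken from the paper; the approach uses the full matrix and is therefore only practical for a few qubits.

```python
import numpy as np

def partial_transpose(rho, dim_a, dim_b):
    """Transpose subsystem A of an operator on A (x) B (A is the first tensor factor)."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)      # indices (a, b, a', b')
    return r.transpose(2, 1, 0, 3).reshape(dim_a * dim_b, dim_a * dim_b)

def pt_moment(rho, dim_a, dim_b, n):
    """p_n = Tr[(rho^{T_A})^n], Eq. (1)."""
    rho_ta = partial_transpose(rho, dim_a, dim_b)
    return np.trace(np.linalg.matrix_power(rho_ta, n)).real

def negativity(rho, dim_a, dim_b):
    """Sum of |lambda| over the negative eigenvalues of rho^{T_A}."""
    evals = np.linalg.eigvalsh(partial_transpose(rho, dim_a, dim_b))
    return -evals[evals < 0].sum()

# Illustrative input: the two-qubit Bell state (|00> + |11>)/sqrt(2).
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(psi, psi.conj())

p1, p2, p3 = (pt_moment(bell, 2, 2, n) for n in (1, 2, 3))
```

For this Bell state the partial transpose has eigenvalues 1/2, 1/2, 1/2 and -1/2, so $p_1 = p_2 = 1$, $p_3 = 1/4$, and the negativity is 1/2.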

In this letter, we first show that the first three PT-moments can be used to define a simple yet powerful test for bipartite entanglement:

$$\rho_{AB} \in \mathrm{PPT} \;\Longrightarrow\; p_3 \geq p_2^2. \tag{2}$$

The $p_3$-PPT condition is the contrapositive of this assertion: if $p_3 < p_2^2$, then $\rho_{AB}$ violates the PPT condition [see Fig. 1a)] and must therefore be entangled (see SM [23] for the proof). Similar to the PPT condition, the $p_3$-PPT condition applies to mixed states and is completely independent of the state in question. This is a key distinction from entanglement witnesses [25, 26], which can be more powerful, but which usually require detailed prior information about the state. While in general weaker than the full PPT condition, the $p_3$-PPT condition relies on comparing two comparatively simple functionals and outperforms other state-independent entanglement detection protocols, like comparing purities of various nested subsystems [5, 7–10, 23]. As shown in the SM [23], the $p_3$-PPT condition becomes equivalent to the PPT condition for Werner states (in this case, it is a necessary and sufficient condition for bipartite entanglement [27]).

[Figure 1, panels a) to d): a) nested sets separable ⊂ PPT ⊂ $p_3$-PPT; b) measurement protocol; c) ratio $p_2^2/p_3$ versus t (ms) for partitions [1], [2]; [1, 2], [3, 4]; [1, 2, 3], [4, 5, 6]; d) ratio $p_2^2/p_3$ versus t (ms) for partitions [1, 2, 3], [4, 5, 6]; [1, 2, 3], [6, 7, 8]; [1, 2, 3], [8, 9, 10].]

FIG. 1. Protocol and illustrations. a) The $p_3$-PPT condition can be used to demonstrate mixed-state bipartite entanglement with PT-moments. Separable states are PPT states and also fulfill the $p_3$-PPT condition. Thus, quantum states which violate the $p_3$-PPT condition must be bipartite entangled [see also Eq. (2)]. b) In our protocol, PT-moments are measured by applying local random unitaries followed by computational basis measurements. c-d) Violation of the $p_3$-PPT condition, i.e. $p_2^2 > p_3$, is experimentally observed for connected c) and disconnected (separated by d = 0, 2, 4 spins) d) partitions A and B at various times t after a quantum quench [10]. Dots: experimental results. Error bars: Jackknife estimates of statistical errors. Lines: numerical simulations including the decoherence model presented in Ref. [10].

arXiv:2007.06305v2 [quant-ph] 13 Nov 2020
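The Werner-state statement above can be checked numerically in the simplest case. The sketch below is an illustration only: it assumes the two-qubit Werner family $p\,|\psi^-\rangle\langle\psi^-| + (1-p)\,\mathbb{I}/4$ (the SM treats Werner states more generally), and the helper names are hypothetical. It compares the full PPT test with the $p_3$-PPT test of Eq. (2) along the family.

```python
import numpy as np

def werner_two_qubit(p):
    """Assumed parametrization: p |psi-><psi-| + (1 - p) I/4 with 0 <= p <= 1."""
    singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
    return p * np.outer(singlet, singlet) + (1 - p) * np.eye(4) / 4

def pt_spectrum(rho):
    """Eigenvalues of the partial transpose (first qubit transposed)."""
    rho_ta = rho.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)
    return np.linalg.eigvalsh(rho_ta)

def violates_ppt(rho, tol=1e-12):
    """True if rho^{T_A} has a negative eigenvalue."""
    return pt_spectrum(rho).min() < -tol

def violates_p3_ppt(rho, tol=1e-12):
    """True if p_3 < p_2^2, i.e. the p3-PPT condition of Eq. (2) is violated."""
    lam = pt_spectrum(rho)
    p2, p3 = np.sum(lam**2), np.sum(lam**3)
    return p3 < p2**2 - tol

# Scan the family on a grid that avoids the boundary point p = 1/3.
grid = np.linspace(0.0, 1.0, 21)
agreement = [violates_ppt(werner_two_qubit(p)) == violates_p3_ppt(werner_two_qubit(p))
             for p in grid]
```

On this grid both tests flag exactly the states with p > 1/3, which is the known entanglement threshold for this two-qubit family.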

The second main contribution of this letter is a measurement protocol to determine PT-moments in NISQ devices. Crucially, we employ randomized measurements implemented with local (single-qubit) random unitaries [see Fig. 1b)], which are readily available in NISQ devices and have already been successfully applied to measure entanglement entropies, many-body state fidelities, and out-of-time-ordered correlators [10, 28–30]. In contrast to previous proposals for measuring PT-moments, our protocol does not rely on many-body interference between identical state copies [6, 24, 31], or on using global entangling random unitaries [32] built from interacting Hamiltonians [16, 33–35]. Instead, it only requires single-qubit control, and allows for the estimation of many distinct PT-moments from the same data. In particular, arbitrary orders $n \geq 2$ and arbitrary (connected, as well as disconnected) partitions A, B can be measured.

While the experimental setup for our measurement protocol is reminiscent of quantum state tomography [36–39], there are fundamental differences regarding the required number of measurements (as independent state copies), and the way the measured data is processed. Without strong assumptions on the state [37, 38], performing tomography to infer an $\epsilon$-approximation of an unknown density matrix $\rho_{AB}$ (e.g. in order to subsequently compute $\epsilon$-approximations of $p_n$) requires (at least) order $2^{|AB|}\,\mathrm{rank}(\rho_{AB})/\epsilon^2$ measurements [40, 41]. In the high accuracy regime ($\epsilon \ll 1$), our direct estimation protocol instead only requires order $2^{|AB|}/\epsilon^2$ measurements. For highly mixed states – the central topic of this work – this discrepancy heralds a significant reduction in measurement resources. Furthermore, we predict PT-moments through a 'direct' and (multi-)linear postprocessing of the measurement data represented as 'classical shadows' [18]. Thus, data processing is cheap – both in memory and runtime – and can be massively parallelized. Similar to previous measurement [10, 15, 16, 18, 29, 30, 42–44] and entanglement detection [45–49] protocols based on randomized measurements, this is another distinct advantage over tomography, which typically requires expensive data-processing algorithms [36] or training a neural network [38].

Finally, we demonstrate our measurement protocol and the $p_3$-PPT condition experimentally in the context of the quantum simulation of many-body systems. Here, PT-moments have been shown to reveal universal properties of quantum phases of matter [22, 50–53] and their transitions [22, 50, 54, 55]. Out of equilibrium, PT-moments allow one to understand the dynamical process of thermalization [56–59], and the fate of (many-body) localization in the presence of decoherence [60]. In this work, we analyze the data of Ref. [10] corresponding to the out-of-equilibrium dynamics in a spin model with long-range interactions, which was implemented in a 10-qubit trapped-ion quantum simulator. In particular, we certify the presence of mixed-state entanglement via the $p_3$-PPT condition [see Fig. 1(c-d) and the details below]. Furthermore, we monitor the time evolution of $p_3$ and observe dynamical signatures of entanglement spreading and thermalization [56, 57].

Protocol– The experimental ingredients to measure PT-moments build on resources similar to the ones presented in Ref. [16] and realized in Ref. [10] to measure Rényi entropies. The key new element is the post-processing of the experimental data [18]. As shown in Fig. 1, the quantum state of interest is realized in a system of N qubits. In the partitions A and B, consisting of $|A|$ and $|B|$ spins, respectively, a randomized measurement is performed by applying random local unitaries $u = u_1 \otimes \cdots \otimes u_{|AB|}$, with the $u_i$ independent single-qubit rotations sampled from a unitary 3-design [33, 61], and a subsequent projective measurement in the computational basis with outcome $k = (k_1, \ldots, k_{|AB|})$. This is subsequently repeated with M different random unitaries such that a data set of M bitstrings $k^{(r)}$ with $r = 1, \ldots, M$ is collected.
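The data acquisition just described can be mimicked classically for a few qubits. The sketch below is a hedged illustration and does not reproduce the experimental procedure of Ref. [10]: it assumes Haar-random single-qubit unitaries (which form a 3-design), one measurement shot per unitary, and a density matrix small enough to store explicitly. The names `haar_unitary_2x2` and `randomized_measurement` are hypothetical.

```python
import numpy as np

def haar_unitary_2x2(rng):
    """Haar-random single-qubit unitary from the QR decomposition of a Gaussian matrix."""
    z = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))          # rescale column phases to obtain the Haar measure

def randomized_measurement(rho, n_qubits, n_runs, rng):
    """Simulate M = n_runs runs; returns unitaries (M, n, 2, 2) and bitstrings (M, n)."""
    unitaries = np.empty((n_runs, n_qubits, 2, 2), dtype=complex)
    bitstrings = np.empty((n_runs, n_qubits), dtype=int)
    for r in range(n_runs):
        u_full = np.ones((1, 1), dtype=complex)
        for i in range(n_qubits):
            unitaries[r, i] = haar_unitary_2x2(rng)
            u_full = np.kron(u_full, unitaries[r, i])   # u = u_1 (x) ... (x) u_n
        # Born probabilities <k| u rho u^dagger |k> in the computational basis
        probs = np.clip(np.diag(u_full @ rho @ u_full.conj().T).real, 0.0, None)
        probs /= probs.sum()
        outcome = rng.choice(2**n_qubits, p=probs)
        # qubit 0 is the most significant bit of the basis-state index
        bitstrings[r] = [(outcome >> (n_qubits - 1 - i)) & 1 for i in range(n_qubits)]
    return unitaries, bitstrings

rng = np.random.default_rng(0)
rho00 = np.zeros((4, 4), dtype=complex)
rho00[0, 0] = 1.0                        # illustrative input state |00><00|
us, ks = randomized_measurement(rho00, n_qubits=2, n_runs=5, rng=rng)
```

The pair `(us, ks)` plays the role of the recorded unitaries and bitstrings $k^{(r)}$; storing the per-qubit unitaries rather than the full tensor product keeps the record size linear in the number of qubits.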

From this data set, the PT-moments $p_n$ can be estimated without having to reconstruct the density matrix $\rho_{AB}$, and with a significantly smaller number of experimental runs M than required for full quantum state tomography. To obtain such estimates, we rely on two observations. First, each outcome $k^{(r)}$ can be used to define an unbiased estimator

$$\hat{\rho}_{AB}^{(r)} = \bigotimes_{i \in AB} \Big[ 3\, (u_i^{(r)})^\dagger\, |k_i^{(r)}\rangle \langle k_i^{(r)}|\, u_i^{(r)} - \mathbb{I}_2 \Big] \tag{3}$$

of the density matrix $\rho_{AB}$, i.e. $\mathbb{E}[\hat{\rho}_{AB}^{(r)}] = \rho_{AB}$ with the expectation value taken over the unitary ensemble and projective measurements [17, 18, 62, 63]. Second, the PT-moments $p_n$ can be viewed as an expectation value of an n-copy observable $\overrightarrow{\Pi}_A \overleftarrow{\Pi}_B$ evaluated on n copies of the original density matrix $\rho_{AB}$,

$$p_n = \mathrm{Tr}\Big[ \overrightarrow{\Pi}_A \overleftarrow{\Pi}_B\, \rho_{AB}^{\otimes n} \Big]. \tag{4}$$

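Here, $\overrightarrow{\Pi}_A$ and $\overleftarrow{\Pi}_B$ are n-copy cyclic permutation operators,

$$\overrightarrow{\Pi}_A\, |k_A^{[1]}, k_A^{[2]}, \ldots, k_A^{[n]}\rangle = |k_A^{[n]}, k_A^{[1]}, \ldots, k_A^{[n-1]}\rangle, \qquad \overleftarrow{\Pi}_B\, |k_B^{[1]}, k_B^{[2]}, \ldots, k_B^{[n]}\rangle = |k_B^{[2]}, \ldots, k_B^{[n]}, k_B^{[1]}\rangle,$$

that act on the partitions A and B, respectively.

Estimators of the PT-moments $p_n$ can now be derived from Eqs. (3) and (4) using U-statistics [64]. Replacing $\rho_{AB}^{\otimes n}$ with $\hat{\rho}_{AB}^{(r_1)} \otimes \cdots \otimes \hat{\rho}_{AB}^{(r_n)}$, where $r_1 \neq r_2 \neq \ldots \neq r_n$, corresponding to independently sampled random unitaries $u^{(r_1)}, \ldots, u^{(r_n)}$, we define the U-statistic

$$\hat{p}_n = \frac{1}{n!} \binom{M}{n}^{-1} \sum_{r_1 \neq r_2 \neq \ldots \neq r_n} \mathrm{Tr}\Big[ \overrightarrow{\Pi}_A \overleftarrow{\Pi}_B\, \hat{\rho}_{AB}^{(r_1)} \otimes \cdots \otimes \hat{\rho}_{AB}^{(r_n)} \Big]. \tag{5}$$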

It follows from the defining properties of U-statistics that $\hat{p}_n$ is an unbiased estimator of $p_n$, i.e. $\mathbb{E}[\hat{p}_n] = p_n$ with the expectation value taken over the unitary ensemble and projective measurements [64]. Its variance governs the statistical errors arising from finite M. Furthermore, a quick inspection of Eqs. (3) and (4) reveals that the summands in Eq. (5) completely factorize into contractions of single-qubit matrices,

$$\mathrm{Tr}\Big[ \overrightarrow{\Pi}_A \overleftarrow{\Pi}_B\, \hat{\rho}_{AB}^{(r_1)} \otimes \cdots \otimes \hat{\rho}_{AB}^{(r_n)} \Big] = \prod_{i \in A} \mathrm{Tr}\big[ \hat{\rho}_i^{(r_1),T} \cdots \hat{\rho}_i^{(r_n),T} \big] \prod_{i \in B} \mathrm{Tr}\big[ \hat{\rho}_i^{(r_1)} \cdots \hat{\rho}_i^{(r_n)} \big],$$

with $\hat{\rho}_{AB}^{(r)} = \bigotimes_{i \in AB} \hat{\rho}_i^{(r)}$ as in Eq. (3). Thus, given M observed bitstrings $k^{(r)}$, one can determine $\hat{p}_n$ with classical data processing scaling as $M^n |AB|$, without storing exponentially large matrices on the classical post-processing device.
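The post-processing described in Eqs. (3) and (5) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the sum in Eq. (5) is read as running over all ordered tuples of distinct run indices (so the prefactor reduces to a plain average), and the brute-force loop over tuples costs on the order of $M^n |AB|$ small matrix products, which is only practical for small M.

```python
import itertools
from functools import reduce
import numpy as np

def snapshot_factors(unitaries, bits):
    """Single-qubit factors 3 u^dagger |k><k| u - I of Eq. (3); shape (n_qubits, 2, 2)."""
    factors = []
    for u, k in zip(unitaries, bits):
        ket = u.conj().T[:, k]                       # u^dagger |k>
        factors.append(3 * np.outer(ket, ket.conj()) - np.eye(2))
    return np.array(factors)

def pt_moment_estimate(snapshots, qubits_a, qubits_b, n):
    """U-statistic of Eq. (5), evaluated with the single-qubit factorization."""
    total, count = 0.0, 0
    for idx in itertools.permutations(range(len(snapshots)), n):
        term = 1.0
        for i in qubits_a:                           # transposed factors on partition A
            term *= np.trace(reduce(np.matmul, [snapshots[r][i].T for r in idx]))
        for i in qubits_b:                           # plain factors on partition B
            term *= np.trace(reduce(np.matmul, [snapshots[r][i] for r in idx]))
        total += term
        count += 1
    return (total / count).real                      # average over ordered distinct tuples
```

Given recorded single-qubit unitaries and bitstrings, one would build `snapshots = [snapshot_factors(u_r, k_r) for each run r]` and call, for instance, `pt_moment_estimate(snapshots, [0], [1], 3)` for a two-qubit system with A the first qubit and B the second.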

Statistical errors– As demonstrated in Fig. 1(c,d), PT-moments can be inferred using a finite number of experimental runs M. Here, we investigate in detail the statistical errors arising from the finite value of M.

[Figure 2, panels a) and b): statistical error of the estimated $p_2$ (a) and $p_3$ (b) versus $M/2^{|AB|}$ on logarithmic axes, for $|AB| = 2, 4, 6, 8$.]

FIG. 2. Statistical errors for the GHZ state. Dashed lines represent scalings of $\propto 1/M$ and $\propto 1/\sqrt{M}$. In both cases, the number of measurements to estimate $p_2$ a) and $p_3$ b) with accuracy 0.1 is of the order of $100 \times 2^{|AB|}$.

Analytically, we bound statistical errors based on the variance of the multi-copy observable in question. For $p_2 = \mathrm{Tr}[(\rho_{AB}^{T_A})^2]$, our analysis reveals that the error decay rate depends on the number of measurements M. In the large M regime, the error is proportional to $2^{|AB|} p_2/\sqrt{M}$. This error bound is multiplicative – i.e. the size of the error is proportional to the size of the target $p_2$ – and $1/\sqrt{M}$ captures the expected decay rate for an estimation procedure that relies on empirical averaging. For small and intermediate values of M, the estimation error is instead bounded by $8 \times 2^{1.5|AB|}/M$. While this is worse in terms of constants, the error decays at a much faster rate proportional to $1/M$. Qualitatively similar results apply for estimating $p_3 = \mathrm{Tr}[(\rho_{AB}^{T_A})^3]$, but there can be three decay regimes. For large M, the estimation error is bounded by $2^{|AB|} p_2^2/\sqrt{M}$. This again captures the asymptotically optimal rate $1/\sqrt{M}$ associated with empirical averaging, but the constant is suppressed by $p_2^2$, not $p_3$ itself. For intermediate M, the error decay rate is proportional to $1/M$, while an even faster rate $\propto 1/M^{3/2}$ governs the error decay for small M. We refer to the SM for detailed statements and proofs.
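As a rough worked example of how the two regimes for $p_2$ connect, one can equate the two expressions quoted above. This is only a heuristic: it assumes a unit prefactor in the large-M expression, which the text states only up to proportionality, and the symbol $M_\times$ for the crossover point is introduced here for illustration.

```latex
\frac{8 \times 2^{1.5|AB|}}{M_\times} = \frac{2^{|AB|}\, p_2}{\sqrt{M_\times}}
\;\Longrightarrow\;
\sqrt{M_\times} = \frac{8 \times 2^{0.5|AB|}}{p_2}
\;\Longrightarrow\;
M_\times = \frac{64 \times 2^{|AB|}}{p_2^{2}} .
```

Under this assumption the crossover scales as $2^{|AB|}$ for fixed purity, and for a pure state ($p_2 = 1$) it lies at $M_\times = 64 \times 2^{|AB|}$.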

Now, we test these predictions numerically by simulating the experimental protocol for various values of M in systems with $N = |AB|$ qubits where a pure GHZ state $\rho = |\phi_{\mathrm{GHZ}}\rangle\langle\phi_{\mathrm{GHZ}}|$ is prepared. Here, A corresponds to the first N/2 qubits, and B is the complement. The results are shown in Fig. 2 and support our analytical error bounds. They highlight in particular that the number of measurement repetitions necessary to achieve a desired accuracy of $\sim 0.1$ scales as $2^{|AB|}$. This enables the estimation of PT-moments in state-of-the-art platforms with high repetition rates. These findings are discussed and confirmed for the ground state of the transverse Ising model in the SM [23].

[Figure 3, panels a) to d): $p_2$ (a, b) and $p_3$ (c, d) on logarithmic axes, versus partition size $|AB|$ from 1 to 8 (a, c) and versus distance d from 0 to 4 (b, d), at times 0 ms, 1 ms, 2 ms, 3 ms, 4 ms and 5 ms.]

FIG. 3. Reconstruction of $p_2 = \mathrm{Tr}[(\rho_{AB}^{T_A})^2]$ and $p_3 = \mathrm{Tr}[(\rho_{AB}^{T_A})^3]$ from experimental data [10]. A and B are parts of a total system of 10 qubits. In a) and c), we take $A = [1, \ldots, \lfloor |AB|/2 \rfloor]$ and $B = [\lfloor |AB|/2 \rfloor + 1, \ldots, |AB|]$. In b) and d), we take $A = \{1, 2, 3\}$ and $B = \{4 + d, 5 + d, 6 + d\}$ with $d = 0, 1, \ldots, 4$. Dots are obtained with the shadow estimator [Eq. (5), second and third order], crosses with the direct estimator (second order) of Ref. [10]. Different colors correspond to different times after the quantum quench, with purple [0 ms] corresponding to the initial product state. For each time, M = 500 unitaries and P = 150 measurements per unitary were used. Lines: theory simulation including decoherence [10]. The ratio $p_2^2/p_3$, detecting entanglement according to the $p_3$-PPT condition, is shown in Fig. 1 c) and d).

PT-moments in a trapped-ion quantum simulator– Below, we discuss the experimental demonstration of the measurement of PT-moments in a trapped-ion quantum simulator. To this end, we evaluate data taken in the context of Ref. [10]. Here, the Rényi entropy growth in quench dynamics was investigated. The system, consisting in total of N = 10 qubits, was initialized in the Néel state $|{\uparrow\downarrow\uparrow\downarrow\ldots}\rangle$, and time-evolved with

$$H_{XY} = \hbar \sum_{i<j} J_{ij} \left( \sigma_i^+ \sigma_j^- + \sigma_i^- \sigma_j^+ \right) + \hbar B \sum_i \sigma_i^z \tag{6}$$

with $\sigma_i^z$ the third spin-1/2 Pauli operator, $\sigma_i^+$ ($\sigma_i^-$) the spin-raising (lowering) operators acting on spin i, and $J_{ij} \approx J_0/|i - j|^\alpha$ the coupling matrix with an approximate power-law decay $\alpha \approx 1.24$ and $J_0 = 420\,\mathrm{s}^{-1}$. After time evolution, randomized measurements were performed, using M = 500 random unitaries and P = 150 projective measurements per random unitary.
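For small chains, the Hamiltonian of Eq. (6) can be written out as a dense matrix. The sketch below is an illustration under stated assumptions: it uses the exact power law $J_{ij} = J_0/|i-j|^\alpha$ in place of the approximate experimental couplings, works in units where $H_{XY}/\hbar$ is a rate in $\mathrm{s}^{-1}$, and treats the field `b` as a placeholder because its value is not given in this text. The helper names are hypothetical.

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)     # sigma^z, with |up> = (1, 0)
SP = np.array([[0, 1], [0, 0]], dtype=complex)      # sigma^+ = |up><down|
SM = SP.conj().T                                     # sigma^-

def site_op(op, i, n):
    """Embed a single-spin operator at site i (0-based) of an n-spin chain."""
    out = np.ones((1, 1), dtype=complex)
    for site in range(n):
        out = np.kron(out, op if site == i else np.eye(2, dtype=complex))
    return out

def h_xy(n, j0=420.0, alpha=1.24, b=0.0):
    """H_XY / hbar of Eq. (6) in 1/s; b is a hypothetical placeholder value."""
    h = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n):
        h += b * site_op(SZ, i, n)
        for j in range(i + 1, n):
            hop = site_op(SP, i, n) @ site_op(SM, j, n)
            h += j0 / abs(i - j) ** alpha * (hop + hop.conj().T)
    return h

def neel_state(n):
    """Basis vector |up, down, up, down, ...> with up = 0 and down = 1."""
    index = int("".join("01"[i % 2] for i in range(n)), 2)
    psi = np.zeros(2**n, dtype=complex)
    psi[index] = 1.0
    return psi

h4 = h_xy(4, b=1.0)
total_sz = sum(site_op(SZ, i, 4) for i in range(4))
```

The hopping terms conserve the total magnetization, so `h4` commutes with `total_sz`; a time-evolved state could then be obtained from the eigendecomposition of `h4`, which is feasible up to roughly ten spins with dense matrices.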

From this data, PT-moments can be inferred [65], with results presented in Fig. 3. For the purity $p_2$ [panels a) and b)], we observe good agreement with theory for partitions of up to 8 qubits, in particular the rise of $p_2$ for partition sizes approaching the total system size, which is expected for such nearly pure states. For 9- and 10-qubit partitions, the data is not shown since the relative statistical error of the estimated data points approaches unity [66]. We note, however, that the measured $p_2$ is slightly underestimated. This is due to imperfect realizations of the random unitaries, which tend to reduce the estimated overlap $\mathrm{Tr}(\rho_{r_1}\rho_{r_2})$. This effect is also present when measuring cross-platform fidelities [29]. For the third PT-moment $p_3$ [panels c) and d)], we observe the same kind of agreement between theory and experimental measurements. In particular, at large partition sizes, the protocol is able to measure small values of $p_3$ with high precision. These small values are indeed fundamental to detecting entanglement: a PPT-violating state has a negative eigenvalue which reduces the value of $p_3$ in comparison with the purity $p_2$. This effect is mathematically captured by the $p_3$-PPT condition and allowed us to detect PPT violation, and thus entanglement, for many-body mixed states [see Fig. 1c)]. In the SM [23], we present additional simulations showing the power of the $p_3$-PPT condition, in comparison with the negativity and the condition based on purities of nested subsystems.

The third PT-moment $p_3$ not only allows one to detect mixed-state entanglement. It can also be used to study the dynamics of entanglement in various many-body quantum systems [22, 54–56, 60]. Here, we analyze the behavior of the dimensionless ratio $R_3 = -\log_2[p_3/\mathrm{Tr}(\rho_{AB}^3)]$, which, as shown in quantum field theory, follows the same universal behavior as the negativity during evolution with a local Hamiltonian [56]. We remark, however, that $R_3$ is only well-defined for states with $p_3 > 0$ (Werner states in large dimensions are a counter-example [23]). Furthermore, $R_3$ is not an entanglement monotone [60]. It vanishes for all product states, but can still be strictly positive for certain separable states [2, 60].
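The ratio $R_3$ defined above is straightforward to evaluate for an explicitly given density matrix. The following sketch is illustrative only (the function name `r3_ratio` and the two test states are assumptions); it raises an error when $p_3 \leq 0$, where $R_3$ is not defined.

```python
import numpy as np

def r3_ratio(rho, dim_a, dim_b):
    """R_3 = -log2[ p_3 / Tr(rho^3) ], defined only for p_3 > 0."""
    rho_ta = rho.reshape(dim_a, dim_b, dim_a, dim_b).transpose(2, 1, 0, 3).reshape(rho.shape)
    p3 = np.trace(np.linalg.matrix_power(rho_ta, 3)).real
    tr_rho3 = np.trace(np.linalg.matrix_power(rho, 3)).real
    if p3 <= 0:
        raise ValueError("R_3 is only defined for states with p_3 > 0")
    return -np.log2(p3 / tr_rho3)

# Illustrative inputs: a product state and a maximally entangled two-qubit state.
product = np.zeros((4, 4))
product[0, 0] = 1.0                                   # |00><00|
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
bell = np.outer(psi, psi)

r3_product = r3_ratio(product, 2, 2)
r3_bell = r3_ratio(bell, 2, 2)
```

For the product state the partial transpose leaves the state unchanged, so $R_3 = 0$; for the Bell state $p_3 = 1/4$ and $\mathrm{Tr}(\rho^3) = 1$, giving $R_3 = 2$.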

Fig. 4 illustrates the time evolution of $R_3$ for (a) connected and (b) disconnected subsystems AB, respectively. The peaks of $R_3$ that appear have been predicted and analyzed for various one-dimensional quantum systems subject to local interactions [56, 57] (and have also been studied in the context of Rényi mutual information [67, 68]). They can be understood in terms of propagating quasi-particles which describe collective excitations in the system [56, 57]. In this picture, entanglement between two partitions A and B is induced by the presence of entangled pairs of quasi-particles shared between A and B. For each pair, the individual quasi-particles propagate in opposite directions and, in the course of the time evolution, start to entangle partitions that are more and more separated [56, 57]. In particular, for two adjacent partitions (a), $R_3$ increases at early times, which is consistent with the picture of shared pairs of entangled quasi-particles entering the two partitions immediately. After a certain time $R_3$ reaches a maximum and starts to decrease, which can be understood as the time when the quasi-particles start to 'escape' the region AB. For separated partitions (b), the peaks are delayed due to the finite speed of propagation of the quasi-particles. In addition, their maximum value is lowered because of the finite lifetimes of quasi-particles. The latter feature is characteristic of chaotic (non-integrable) thermalizing systems [67] and is in our case further enhanced by decoherence.

[Figure 4, panels a) and b): $-\log_2\{p_3/\mathrm{Tr}[(\rho_{AB})^3]\}$ versus t (ms), from 0 to 10 ms. Connected partitions A, B: [1], [2]; [1, 2], [3, 4]; [1, 2, 3], [4, 5, 6]. Disconnected partitions A, B: [1, 2], [3, 4]; [1, 2], [4, 5]; [1, 2], [5, 6]; [1, 2], [6, 7].]

FIG. 4. Evolution of the ratio $R_3$ from experimental data [10]. (a) Connected partitions. (b) Disconnected partitions separated by d = 0, 1, 2, 3 spins. Different colors correspond to different partitions AB. Dots are obtained with the shadow estimator Eq. (5) using experimental data [10]. Solid (dashed) lines: theory simulation of unitary dynamics (including decoherence [10]).

Conclusion– Our protocol extends the paradigm of randomized measurements, yielding the first direct measurement of PT-moments in a many-body system. U-statistics provides the key ingredient there and enables us to harness a remarkable advantage over state tomography in terms of statistical errors. At a fundamental level, it is therefore natural to investigate how to access new important physical quantities based on random measurement data, and with significant savings in terms of measurement and classical postprocessing over existing methods. This approach can be used to derive protocols to directly infer entanglement measures (including non-polynomial functions of the density matrix), such as the von Neumann entropy and the negativity.

ACKNOWLEDGMENTS

We are grateful to Alireza Seif who pointed out interesting error scaling effects for classical shadows in a Scirate comment addressing Ref. [18]. We thank M. Knap, S. Nezami, F. Pollmann and E. Wybo for discussions and valuable suggestions, as well as M. Joshi for the careful reading and comments on the manuscript. T. Brydges, P. Jurcevic, C. Maier, B. Lanyon, R. Blatt, and C. Roos have generously shared the experimental data of Ref. [10]. Simulations were performed with the QuTiP library [69]. Research in Innsbruck is supported by the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 817482 (PASQuanS) and No. 731473 (QuantERA via QTFLAG), and by the Simons Collaboration on Ultra-Quantum Matter, which is a grant from the Simons Foundation (651440, P.Z.). B. K. acknowledges financial support from the Austrian Academy of Sciences via the Innovation Fund 'Research, Science and Society', the SFB BeyondC (Grant No. F7107-N38), and the Austrian Science Fund (FWF) grant DKALM: W1259-N27. Research at Caltech is supported by the Kortschak Scholars Program, the US Department of Energy (DE-SC0020290), the US Army Research Office (W911NF-18-1-0103), and the US National Science Foundation (PHY-1733907). The Institute for Quantum Information and Matter is an NSF Physics Frontiers Center. Research in Trieste is partly supported by the European Research Council (grant No 758329 and 771536) and by the Italian Ministry of Education under the FARE programme. BV acknowledges funding from the Austrian Science Fund (FWF, P. 32597N).

[1] J. Preskill, Quantum 2, 79 (2018).

[2] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. 81, 865 (2009).

[3] J. Eisert, M. Cramer, and M. B. Plenio, Rev. Mod. Phys. 82, 277 (2010).

[4] P. Calabrese and J. Cardy, J. Stat. Mech. Theor. Exp. 2016, 064003 (2016).

[5] R. Horodecki and M. Horodecki, Phys. Rev. A 54, 1838 (1996).

[6] P. Horodecki, Phys. Rev. Lett. 90, 167901 (2003).

[7] R. Islam, R. Ma, P. M. Preiss, M. Eric Tai, A. Lukin, M. Rispoli, and M. Greiner, Nature 528, 77 (2015).

[8] A. M. Kaufman, M. E. Tai, A. Lukin, M. Rispoli, R. Schittko, P. M. Preiss, and M. Greiner, Science 353, 794 (2016).

[9] N. M. Linke, S. Johri, C. Figgatt, K. A. Landsman, A. Y. Matsuura, and C. Monroe, Phys. Rev. A 98 (2018).

[10] T. Brydges, A. Elben, P. Jurcevic, B. Vermersch, C. Maier, B. P. Lanyon, P. Zoller, R. Blatt, and C. F. Roos, Science 364, 260 (2019).

[11] C. M. Alves and D. Jaksch, Phys. Rev. Lett. 93, 110501 (2004).

[12] A. J. Daley, H. Pichler, J. Schachenmayer, and P. Zoller, Phys. Rev. Lett. 109, 020505 (2012).

[13] J. Cardy, Phys. Rev. Lett. 106, 150404 (2011).

[14] D. A. Abanin and E. Demler, Phys. Rev. Lett. 109, 020504 (2012).

[15] S. J. van Enk and C. W. J. Beenakker, Phys. Rev. Lett. 108, 110503 (2012).

[16] A. Elben, B. Vermersch, M. Dalmonte, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 120, 050406 (2018).

[17] A. Elben, B. Vermersch, C. F. Roos, and P. Zoller, Phys. Rev. A 99, 052323 (2019).

[18] H.-Y. Huang, R. Kueng, and J. Preskill, Nat. Phys. (2020).

[19] A. Peres, Phys. Rev. Lett. 77, 1413 (1996).

[20] The partial transpose (PT) operation, acting on subsystem $A$, is defined as $(|k_A, k_B\rangle\langle l_A, l_B|)^{T_A} = |l_A, k_B\rangle\langle k_A, l_B|$, where $\{|k_A, k_B\rangle\}$ is a product basis of the joint system $AB$.

[21] G. Vidal and R. F. Werner, Phys. Rev. A 65, 032314 (2002).

[22] P. Calabrese, J. Cardy, and E. Tonni, Phys. Rev. Lett. 109, 130502 (2012).

[23] See Supplemental Material.

[24] J. Gray, L. Banchi, A. Bayat, and S. Bose, Phys. Rev. Lett. 121, 150503 (2018).

[25] B. M. Terhal, Phys. Lett. A 271, 319 (2000).

[26] O. Gühne and N. Lütkenhaus, Phys. Rev. Lett. 96, 170502 (2006).

[27] J. Watrous, The Theory of Quantum Information (Cambridge University Press, 2018).

[28] X. Mi, B. Vermersch, A. Elben, P. Roushan, Y. Chen, P. Zoller, and V. Smelyanskiy, Bulletin of the American Physical Society 65 (2020).

[29] A. Elben, B. Vermersch, R. van Bijnen, C. Kokail, T. Brydges, C. Maier, M. K. Joshi, R. Blatt, C. F. Roos, and P. Zoller, Phys. Rev. Lett. 124, 010504 (2020).

[30] M. K. Joshi, A. Elben, B. Vermersch, T. Brydges, C. Maier, P. Zoller, R. Blatt, and C. F. Roos, Phys. Rev. Lett. 124, 240505 (2020).

[31] E. Cornfeld, E. Sela, and M. Goldstein, Phys. Rev. A 99, 062309 (2019).

[32] Y. Zhou, P. Zeng, and Z. Liu, Phys. Rev. Lett. 125, 200502 (2020).

[33] C. Dankert, R. Cleve, J. Emerson, and E. Livine, Phys. Rev. A 80, 012304 (2009).

[34] Y. Nakata, C. Hirche, M. Koashi, and A. Winter, Phys. Rev. X 7, 021006 (2017).

[35] B. Vermersch, A. Elben, M. Dalmonte, J. I. Cirac, and P. Zoller, Phys. Rev. A 97, 023604 (2018).

[36] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Phys. Rev. Lett. 105, 150401 (2010).

[37] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett, O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, Nat. Comm. 1, 149 (2010).

[38] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Nat. Phys. 14, 447 (2018).

[39] M. Guţă, J. Kahn, R. Kueng, and J. A. Tropp, J. Phys. A 53, 204001 (2020).

[40] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu, IEEE Transactions on Information Theory 63, 5628 (2017).

[41] R. O'Donnell and J. Wright, in Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, STOC '16 (ACM, New York, NY, USA, 2016) pp. 899–912.

[42] B. Vermersch, A. Elben, L. M. Sieberer, N. Y. Yao, and P. Zoller, Physical Review X 9, 021061 (2019).

[43] A. Elben, J. Yu, G. Zhu, M. Hafezi, F. Pollmann, P. Zoller, and B. Vermersch, Sci. Adv. 6, eaaz3666 (2020).

[44] Z.-P. Cian, H. Dehghani, A. Elben, B. Vermersch, G. Zhu, M. Barkeshli, P. Zoller, and M. Hafezi, (2020), arXiv:2005.13543.

[45] M. C. Tran, B. Dakić, F. Arnault, W. Laskowski, and T. Paterek, Phys. Rev. A 92, 050301 (2015).

[46] M. C. Tran, B. Dakić, W. Laskowski, and T. Paterek, Phys. Rev. A 94, 042302 (2016).

[47] A. Ketterer, N. Wyderka, and O. Gühne, Phys. Rev. Lett. 122, 120505 (2019).

[48] W.-H. Zhang, C. Zhang, Z. Chen, X.-X. Peng, X.-Y. Xu, P. Yin, S. Yu, X.-J. Ye, Y.-J. Han, J.-S. Xu, G. Chen, C.-F. Li, and G.-C. Guo, Phys. Rev. Lett. 125, 030506 (2020).

[49] L. Knips, J. Dziewior, W. Kłobus, W. Laskowski, T. Paterek, P. J. Shadbolt, H. Weinfurter, and J. D. A. Meinecke, npj Quantum Information 6 (2020).

[50] P. Calabrese, J. Cardy, and E. Tonni, J. Stat. Mech. Theor. Exp. 2013, P02008 (2013).

[51] P. Ruggiero, V. Alba, and P. Calabrese, Phys. Rev. B 94, 035152 (2016).

[52] Y. Javanmard, D. Trapin, S. Bera, J. H. Bardarson, and M. Heyl, New J. Phys. 20, 083032 (2018).

[53] X. Turkeshi, P. Ruggiero, and P. Calabrese, Phys. Rev. B 101, 064207 (2020).

[54] C.-M. Chung, V. Alba, L. Bonnes, P. Chen, and A. M. Läuchli, Phys. Rev. B 90, 064401 (2014).

[55] K.-H. Wu, T.-C. Lu, C.-M. Chung, Y.-J. Kao, and T. Grover, arXiv:1912.03313.

[56] A. Coser, E. Tonni, and P. Calabrese, J. Stat. Mech. Theor. Exp. 2014, P12017 (2014).

[57] V. Alba and P. Calabrese, EPL 126, 60001 (2019).

[58] J. Kudler-Flam, M. Nozaki, S. Ryu, and M. T. Tan, J. High Energy Phys. 2020, 31 (2020).

[59] V. Alba and F. Carollo, arXiv:2002.09527.

[60] E. Wybo, M. Knap, and F. Pollmann, arXiv:2004.13072.

[61] D. Gross, K. Audenaert, and J. Eisert, J. Math. Phys. 48, 052104 (2007).

[62] M. Ohliger, V. Nesme, and J. Eisert, New J. Phys. 15, 015024 (2013).

[63] M. Paini and A. Kalev, arXiv:1910.10543.

[64] W. Hoeffding, in Breakthroughs in Statistics (Springer, 1992) pp. 308–334.

[65] Theoretically, an ideal distribution of the measurement budget $M \cdot P$ would consist in setting $P = 1$, i.e. sampling new random unitaries for each experimental run. Experimentally, it might however be beneficial to use $P \geq 1$. In this situation, we replace the estimators defined in Eq. (3) with $\hat{\rho}^{(r)} = \sum_{s=1}^{P} \hat{\rho}^{(r,s)}$, where $\hat{\rho}^{(r,s)} = \bigotimes_i \big(3 (u_i^{(r)})^\dagger |k_i^{(r,s)}\rangle\langle k_i^{(r,s)}| u_i^{(r)} - \mathbb{I}_2\big)$ and $k_i^{(r,s)}$ is the outcome of measurement $s$ obtained after the application of unitary $r$.

[66] The distribution of the measurement budget $M \cdot P$ in Ref. [10] into unitaries $M$ and projective measurements per unitary $P$ has been optimized for the purity estimator presented in Ref. [10] (denoted $\tilde{p}_2$ in this note), which differs from the estimator $\hat{p}_2$ defined in Eq. (5). Thus, for the present data set [10] with $M = 500$ and $P = 150$, the statistical uncertainty of $\tilde{p}_2$ is smaller than that of $\hat{p}_2$, which performs best for $P = O(1)$ and a correspondingly larger number of unitaries $M$.

[67] V. Alba and P. Calabrese, Phys. Rev. B 100, 115150 (2019).

[68] S. Maity, S. Bandyopadhyay, S. Bhattacharjee, and A. Dutta, Phys. Rev. B 101, 180301 (2020).

[69] J. Johansson, P. Nation, and F. Nori, Comp. Phys. Comm. 184, 1234 (2013).

[70] K. Życzkowski, Open Systems and Information Dynamics 10, 297 (2003).

[71] S. Rana, Phys. Rev. A 87, 054301 (2013).

[72] J. M. Landsberg, Tensors: geometry and applications (American Mathematical Society (AMS), 2012) p. 439.

[73] J. C. Bridgeman and C. T. Chubb, J. Phys. A 50, 223001 (2017).

[74] R. Kueng, "Quantum and classical information processing with tensors (lecture notes)," (2019), Caltech course notes: https://iqim.caltech.edu/classes.

Appendix A: The p3-PPT condition

In this section we present, prove and discuss the $p_3$-PPT condition. The $p_3$-PPT condition is the contrapositive of the following statement about moments of positive semidefinite matrices with unit trace.

Proposition 1. For every positive semidefinite matrix $X$ with unit trace ($\mathrm{Tr}(X) = 1$) it holds that

$$\mathrm{tr}(X^2)^2 \leq \mathrm{tr}(X^3). \tag{A1}$$

Note that Eq. (A1) resembles the following well-known monotonicity relation among Rényi entropies (see e.g., Ref. [70]):

$$S_3(\rho) \leq S_2(\rho) \quad \text{for} \quad S_n(\rho) = \frac{1}{1-n} \log_2\big(\mathrm{tr}(\rho^n)\big). \tag{A2}$$

However, this relation only applies to density matrices, i.e. positive semidefinite matrices with unit trace. The $p_3$-PPT condition, in contrast, is designed to test the absence of positive semidefiniteness. Hence, it is crucial to have a condition that does not break down if the matrix in question has negative eigenvalues. Rel. (A1) (and its direct proof provided in the next subsection) do achieve this goal, while an argument based on monotonicity relations between Rényi entropies can break down, because the logarithm of non-positive numbers is not properly defined.

1. Proof of the p3-PPT condition

Let $X$ be a Hermitian $d \times d$ matrix with eigenvalue decomposition $X = \sum_{i=1}^{d} \lambda_i |x_i\rangle\langle x_i|$. For $p \geq 1$, we introduce the Schatten-$p$ norms

$$\|X\|_p = \Big(\sum_{i=1}^{d} |\lambda_i|^p\Big)^{1/p} = \mathrm{Tr}\big(|X|^p\big)^{1/p},$$

where $|X| = \sqrt{X^2} = \sum_{i=1}^{d} |\lambda_i| |x_i\rangle\langle x_i|$ denotes the (matrix-valued) absolute value. The Schatten-$p$ norms encompass most widely used matrix norms in quantum information. Concrete examples are the trace norm ($p = 1$), the Hilbert-Schmidt/Frobenius norm ($p = 2$) and the operator/spectral norm ($p = \infty$). Each Schatten-$p$ norm corresponds to the usual vector $\ell_p$-norm of the vector of eigenvalues $\lambda = (\lambda_1, \ldots, \lambda_d)^T \in \mathbb{R}^d$:

$$\|\lambda\|_{\ell_p} = \Big(\sum_{i=1}^{d} |\lambda_i|^p\Big)^{1/p} \quad \text{for } p \geq 1. \tag{A3}$$

Hence, Schatten-$p$ norms inherit many desirable properties from their vector-norm counterparts. Here, we shall use vector norm relations to derive a relation among Schatten-$p$ norms. It is based on Hölder's inequality that relates the inner product

$$\langle v, w \rangle = \sum_{i=1}^{d} v_i w_i \quad \text{for } v, w \in \mathbb{R}^d \tag{A4}$$

to a combination of $\ell_p$ norms.

Fact 1 (Hölder's inequality for vector norms). Fix $p, q \geq 1$ such that $1/p + 1/q = 1$. Then,

$$|\langle v, w \rangle| \leq \sum_{i=1}^{d} |v_i w_i| \leq \|v\|_{\ell_p} \|w\|_{\ell_q} \tag{A5}$$

for any $v, w \in \mathbb{R}^d$.

The well-known Cauchy-Schwarz inequality is a special case of this fact. Set $p = q = 2$ to conclude

$$|\langle v, w \rangle| \leq \|v\|_{\ell_2} \|w\|_{\ell_2} = \langle v, v \rangle^{1/2} \langle w, w \rangle^{1/2}. \tag{A6}$$

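At the heart of our proof for the $p_3$-PPT condition is a simple relation between Schatten-$p$ norms of orders $p = 1, 2, 3$.

Lemma 1. The following norm relation holds for every Hermitian matrix $X$:

$$\|X\|_2^4 \leq \|X\|_1 \|X\|_3^3.$$

Proof. Let $\lambda = (\lambda_1, \ldots, \lambda_d)^T$ be the $d$-dimensional vector of eigenvalues of $X$. Apply Hölder's inequality with $p = 3$, $q = 3/2$ to the inner product of this vector of eigenvalues with itself:

$$\mathrm{Tr}(X^2) = \langle \lambda, \lambda \rangle \leq \|\lambda\|_{\ell_3} \|\lambda\|_{\ell_{3/2}} = \|X\|_3 \|\lambda\|_{\ell_{3/2}}. \tag{A7}$$

Next, we apply Cauchy-Schwarz to the remaining $\ell_{3/2}$-norm:

$$\begin{aligned}
\|\lambda\|_{\ell_{3/2}} &= \Big(\sum_{i=1}^{d} |\lambda_i|^{3/2}\Big)^{2/3} = \Big(\sum_{i=1}^{d} |\lambda_i| |\lambda_i|^{1/2}\Big)^{2/3} \\
&\leq \Big(\Big(\sum_{i=1}^{d} |\lambda_i|^2\Big)^{1/2} \Big(\sum_{i=1}^{d} |\lambda_i|^{2/2}\Big)^{1/2}\Big)^{2/3} \\
&= \|\lambda\|_{\ell_2}^{2/3} \|\lambda\|_{\ell_1}^{1/3} = \|X\|_2^{2/3} \|X\|_1^{1/3}.
\end{aligned}$$

Inserting this relation into Eq. (A7) reveals

$$\|X\|_2^2 \leq \|X\|_2^{2/3} \|X\|_1^{1/3} \|X\|_3,$$

which is equivalent to the claim (take the 3rd power and divide by $\|X\|_2^2$).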

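Proposition 1 is an immediate consequence of Lemma 1 and elementary properties of positive semidefinite matrices. Recall that a Hermitian $d \times d$ matrix is positive semidefinite (psd) if every eigenvalue is nonnegative. This in turn ensures $|X| = X$ and, by extension, $\|X\|_p = \mathrm{Tr}(X^p)^{1/p}$ for all $p \geq 1$. For a psd matrix with unit trace, Lemma 1 therefore reads $\mathrm{tr}(X^2)^2 = \|X\|_2^4 \leq \|X\|_1 \|X\|_3^3 = \mathrm{tr}(X)\, \mathrm{tr}(X^3) = \mathrm{tr}(X^3)$, which is Eq. (A1).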
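As a quick numerical sanity check of Lemma 1 and Proposition 1, the sketch below works directly with lists of eigenvalues, which is sufficient because every quantity involved depends only on the spectrum. The function names, the random seed and the tolerance `tol` are illustrative assumptions and do not appear in the original text.

```python
import random

def schatten_norm(eigs, p):
    """Schatten-p norm of a Hermitian matrix, computed from its eigenvalues."""
    return sum(abs(x) ** p for x in eigs) ** (1.0 / p)

def lemma1_gap(eigs):
    """Return ||X||_1 ||X||_3^3 - ||X||_2^4, which Lemma 1 says is >= 0."""
    n1, n2, n3 = (schatten_norm(eigs, p) for p in (1, 2, 3))
    return n1 * n3 ** 3 - n2 ** 4

def violates_a1(eigs, tol=1e-12):
    """True if tr(X^2)^2 > tr(X^3), i.e. Eq. (A1) fails for this spectrum."""
    p2 = sum(x ** 2 for x in eigs)
    p3 = sum(x ** 3 for x in eigs)
    return p2 ** 2 > p3 + tol

random.seed(0)  # arbitrary seed, only for reproducibility

# Lemma 1 holds for arbitrary real spectra (Hermitian X, any sign pattern).
for _ in range(1000):
    spectrum = [random.uniform(-1, 1) for _ in range(5)]
    assert lemma1_gap(spectrum) >= -1e-12

# Eq. (A1) holds for unit-trace psd spectra (probability vectors) ...
for _ in range(1000):
    weights = [random.random() for _ in range(5)]
    probs = [w / sum(weights) for w in weights]
    assert not violates_a1(probs)

# ... but can fail once a negative eigenvalue is present (trace is still 1).
print(violates_a1([-0.25, 0.25, 0.5, 0.5]))  # True
```

The last line illustrates the contrapositive use of Proposition 1: a unit-trace spectrum that violates Eq. (A1) cannot belong to a positive semidefinite matrix.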

2. Discussion and potential generalizations

The $p_3$-PPT condition tests the absence of positive semidefiniteness based on moments $\mathrm{Tr}(X^p)$ of order $p = 1, 2, 3$. It is natural to wonder whether higher order moments allow the construction of more refined tests. It is possible to show that every positive semidefinite matrix $X$ with unit trace must obey

$$\mathrm{tr}(X^{p-1})^{p-1} \leq \mathrm{tr}(X^p)^{p-2} \quad \text{for all } p > 2. \tag{A8}$$

As this is a direct extension of the $p_3$-PPT condition ($p = 3$), we omit the proof. Unfortunately, we found numerically that these direct extensions actually produce weaker tests for the absence of positive semidefiniteness, i.e. there exist matrices $X$ that violate the $p_3$-PPT condition but satisfy Rel. (A8) for higher moments $p \geq 4$. This is not completely surprising, since Rel. (A8) compares (powers of) neighboring matrix moments with order $(p-1)$ and $p$. As $p$ increases, these matrix moments suppress contributions of small eigenvalues ever more strongly. In the case of partially transposed quantum states, the eigenvalues are required to sum up to one and must be contained in the interval $[-1/2, 1]$ [71]. Thus, the negative eigenvalues can never dominate the spectrum and high matrix moment tests for the existence of negative eigenvalues suffer from suppression effects.

This observation suggests that powerful tests for negative eigenvalues should involve all matrix moments $\mathrm{tr}(X^p)$ up to a certain order $p_{\max}$. It is useful to change perspective in order to reason about potential improvements. The $p_3$-PPT condition checks whether the following inequality is true:

$$F_3(X) = -\mathrm{tr}(X^3) + \mathrm{tr}(X^2)^2 > 0. \tag{A9}$$

For matrices $X$ with unit trace, we can reinterpret the matrix function $F_3(X)$ as a sum of (identical) degree-3 polynomials applied to all eigenvalues $\lambda_1, \ldots, \lambda_d$ of $X$. Set $p_2 = \mathrm{tr}(X^2)$ and use $\mathrm{tr}(X) = \sum_{i=1}^{d} \lambda_i = 1$ to conclude

$$\begin{aligned}
F_3(X) &= -\mathrm{tr}(X^3) + 2 p_2\, \mathrm{tr}(X^2) - p_2^2\, \mathrm{tr}(X) \\
&= \sum_{i=1}^{d} \big(-\lambda_i^3 + 2 p_2 \lambda_i^2 - p_2^2 \lambda_i\big) \\
&= \sum_{i=1}^{d} -\lambda_i (\lambda_i - p_2)^2 =: \sum_{i=1}^{d} f_3(\lambda_i).
\end{aligned} \tag{A10}$$

Note that the polynomial

$$f_3(x) = -x (x - p_2)^2 \quad \text{for } x \in \mathbb{R} \tag{A11}$$

depends on $p_2$ and, by extension, also on the matrix $X$. We will come back to this aspect later. For now, we point out that, regardless of the actual value of $p_2$, this polynomial has three interesting properties:

$$f_3(x) \leq 0 \ \text{ if } x > 0, \qquad f_3(0) = 0, \qquad f_3(x) > 0 \ \text{ if } x < 0. \tag{A12}$$

FIG. A.1. Comparison of $f_3(x) = -x(x - p_2)^2$ with the negated rectifier function $r(-x) = \max\{-x, 0\}$ for different values of $p_2$ (curves shown for $p_2 = 0.25$, $p_2 = 0.5$ and $p_2 = 1$) in the relevant interval $[-1/2, 1]$ [71].

These properties reflect the behavior of another well-known function, the (negated) rectifier function (ReLU):

$$r(-x) = \max\{0, -x\} = \begin{cases} 0 & \text{if } x \geq 0, \\ |x| & \text{if } x < 0. \end{cases} \tag{A13}$$

See Figure A.1 for a visual comparison. Applying the (negated) rectifier function to the eigenvalues of $X$ would recover the negativity:

$$\mathcal{N}(X) = \sum_{\lambda_i < 0} |\lambda_i| = \sum_{i=1}^{d} r(-\lambda_i). \tag{A14}$$

Hence, it is instructive to interpret $F_3(X)$ as a polynomial approximation to the (non-analytic) negativity function.
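The identity (A10) and its relation to the negativity (A14) can be checked on a small example. The following sketch, again phrased in terms of eigenvalue lists, evaluates $F_3$ once from the moments and once as a sum of $f_3$ over the spectrum. The helper names and the example spectrum are illustrative assumptions.

```python
def f3(x, p2):
    """Degree-3 polynomial of Eq. (A11)."""
    return -x * (x - p2) ** 2

def F3_from_moments(eigs):
    """F_3(X) = -tr(X^3) + tr(X^2)^2, as in Eq. (A9)."""
    p2 = sum(x ** 2 for x in eigs)
    p3 = sum(x ** 3 for x in eigs)
    return -p3 + p2 ** 2

def F3_from_polynomial(eigs):
    """Sum of f3 over the spectrum, Eq. (A10); assumes sum(eigs) == 1."""
    p2 = sum(x ** 2 for x in eigs)
    return sum(f3(x, p2) for x in eigs)

def negativity(eigs):
    """Negated rectifier applied to every eigenvalue, Eq. (A14)."""
    return sum(max(-x, 0.0) for x in eigs)

eigs = [-0.25, 0.25, 0.5, 0.5]  # illustrative unit-trace spectrum
print(F3_from_moments(eigs), F3_from_polynomial(eigs), negativity(eigs))
```

Both evaluations of $F_3$ agree for unit-trace spectra, and a positive value signals the presence of a negative eigenvalue, while the negativity quantifies its weight.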

On the level of polynomials, the condition $f_3(x) \leq 0$ whenever $x > 0$ is most important. It implies that positive eigenvalues of $X$ can never increase the value of $F_3(X) = \sum_{i=1}^{d} f_3(\lambda_i)$. In particular, $F_3(X) \leq 0$ whenever $X$ is positive semidefinite, as stated in Proposition 1. The $p_3$-PPT condition is sound, i.e. it has no false positives.

Conversely, $f_3(x) > 0$ for $x < 0$ implies that $F_3(X)$ can become positive if $X$ has negative eigenvalues. Hence, the $p_3$-PPT condition is not vacuous. It is capable of detecting negative eigenvalues in many, but not all, unit-trace matrices $X$.

Let us now return to the (matrix-dependent) parameter choice in Eq. (A11). In principle, every polynomial of the form $f_3^{(a)}(x) = -x(x - a)^2$ with $a \in \mathbb{R}$ obeys the important structure constraints (A12) and therefore produces a sound test for negative eigenvalues. For fixed $X$, the associated matrix polynomial evaluates to

$$F_3^{(a)}(X) = -\mathrm{tr}(X^3) + 2a\, \mathrm{tr}(X^2) - a^2\, \mathrm{tr}(X). \tag{A15}$$

We can optimize this expression over the parameter $a \in \mathbb{R}$ to make the test as strong as possible. The optimal choice is $a_\sharp = \mathrm{tr}(X^2)/\mathrm{tr}(X)$ and produces a matrix polynomial that obeys $F_3^{(a_\sharp)}(X) \geq \max_{a \in \mathbb{R}} F_3^{(a)}(X)$ for $X$ fixed. If $X$ also has unit trace, the optimal parameter becomes $a_\sharp = p_2$ and produces the $p_3$-PPT condition (A9).
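The optimization over $a$ can be illustrated numerically. The sketch below evaluates Eq. (A15) on a grid of parameters and compares against the closed-form optimum $a_\sharp = \mathrm{tr}(X^2)/\mathrm{tr}(X)$. It assumes $\mathrm{tr}(X) > 0$, so that (A15) is a concave quadratic in $a$; the function names and the grid are illustrative choices.

```python
def F3_a(eigs, a):
    """Matrix polynomial of Eq. (A15), evaluated from the spectrum of X."""
    t1 = sum(eigs)
    t2 = sum(x ** 2 for x in eigs)
    t3 = sum(x ** 3 for x in eigs)
    return -t3 + 2 * a * t2 - a * a * t1

def optimal_a(eigs):
    """Closed-form maximizer tr(X^2)/tr(X); assumes tr(X) > 0."""
    return sum(x ** 2 for x in eigs) / sum(eigs)

eigs = [-0.25, 0.25, 0.5, 0.5]  # illustrative unit-trace spectrum
a_sharp = optimal_a(eigs)

# No parameter on a coarse grid beats the closed-form optimum.
grid = [k / 100 for k in range(-200, 201)]
assert all(F3_a(eigs, a_sharp) >= F3_a(eigs, a) - 1e-12 for a in grid)
print(a_sharp, F3_a(eigs, a_sharp))
```

For unit-trace spectra the optimum coincides with $p_2$, so `F3_a(eigs, a_sharp)` reproduces the left-hand side of the $p_3$-PPT condition (A9).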

This construction of PPT conditions readily extends to higher order polynomials $f_p(x) = a_p x^p + \cdots + a_1 x + a_0$. Increasing the degree $p$ produces more expressive ansatz functions that can approximate the (negated) rectifier function, and its core properties, ever more accurately. Viewed from this angle, it becomes apparent that measuring more matrix moments can produce stronger tests for detecting negative eigenvalues. However, it is not so obvious how to choose the parameters $a_p, \ldots, a_0$ "optimally", or what "optimally" actually means in this context. Some well-known polynomial approximations of the rectifier function $r(-x)$, like Taylor expansions of $s(-x) = \ln(1 + e^{-x})$ (the "softplus" function), are not well-suited for this task, because $s(-x) > 0$ even for $x > 0$. This in turn would imply that the associated test condition may not be sound. We believe that a thorough analysis of these questions is timely and interesting, but would go beyond the scope of this work. We intend to address it in future research.

Appendix B: p3-PPT condition for Werner States

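Werner states are bipartite quantum states in a Hilbert space $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$ with dimensions $d_A = d_B \equiv d$, defined as

$$\rho_W = \alpha \binom{d+1}{2}^{-1} \Pi_+ + (1 - \alpha) \binom{d}{2}^{-1} \Pi_- \tag{B1}$$

with parameter $\alpha \in [0, 1]$ and $\Pi_\pm = \frac{1}{2}(\mathbb{I} \pm \Pi_{12})$ projectors onto the symmetric $\mathcal{H}_+$ and anti-symmetric $\mathcal{H}_-$ subspaces of $\mathcal{H} = \mathcal{H}_+ \oplus \mathcal{H}_-$, respectively [27]. Here, $\Pi_{12} = \sum_{i,j=1}^{d} |i\rangle\langle j| \otimes |j\rangle\langle i|$ is the swap operator. We note that the eigenvalues of $\rho_W$ are thus given as $\lambda_+ = \alpha \binom{d+1}{2}^{-1}$ with multiplicity $\binom{d+1}{2}$ and $\lambda_- = (1 - \alpha) \binom{d}{2}^{-1}$ with multiplicity $\binom{d}{2}$. The reduced state $\rho_A$ of qudit $A$ is given by $\rho_A = \mathrm{Tr}_B[\rho_W] = \mathbb{I}_A / d$.

Using furthermore that $\Pi_\pm^{T_A} = \frac{1}{2}\big(\Delta_1 \pm (d \pm 1) \Delta_0\big)$ with $\Delta_0 = |\phi_+\rangle\langle\phi_+|$ being a projector onto the maximally entangled state and $\Delta_1 = \mathbb{I} - \Delta_0$ [27], we find

$$\rho_W^{T_A} = \frac{2\alpha - 1}{d} \Delta_0 + \frac{1 + d - 2\alpha}{d} \frac{\Delta_1}{d^2 - 1} \tag{B2}$$

with eigenvalues $\lambda_0 = (2\alpha - 1)/d$ with multiplicity 1 and $\lambda_1 = (1 + d - 2\alpha)/\big(d(d^2 - 1)\big)$ with multiplicity $d^2 - 1$.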

We note that, for any $d$, $\lambda_0 < 0$ for $0 \leq \alpha < 1/2$. Thus, using the PPT condition, we find that $\rho_W$ is entangled for $0 \leq \alpha < 1/2$. Using the explicit expression of the eigenvalues, we can furthermore determine $\mathrm{Tr}\big[(\rho^{T_A})^n\big]$ for any $n$. We find for all local dimensions $d$

$$\mathrm{Tr}\big[(\rho^{T_A})^2\big]^2 > \mathrm{Tr}\big[(\rho^{T_A})^3\big] \quad \text{for } 0 \leq \alpha < \tfrac{1}{2}. \tag{B3}$$

Thus, for Werner states the $p_3$-PPT condition is equivalent to the full PPT condition. It can be furthermore shown that Werner states are separable for $\alpha \geq 1/2$ [27]. Thus, for Werner states, the $p_3$-PPT condition is a necessary and sufficient condition for bipartite entanglement. This also holds true for "isotropic" states of the form $\rho = \alpha \mathbb{I}/d^2 + (1 - \alpha)|\phi_+\rangle\langle\phi_+|$, which are closely related.
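The statement (B3) can be verified with exact rational arithmetic from the eigenvalues listed below Eq. (B2). The following sketch uses Python's `fractions` module to avoid rounding issues at the boundary $\alpha = 1/2$; the function names and the scanned parameter grid are illustrative assumptions.

```python
from fractions import Fraction

def werner_pt_spectrum(d, alpha):
    """Eigenvalues and multiplicities of the partially transposed Werner state."""
    alpha = Fraction(alpha)
    lam0 = (2 * alpha - 1) / d
    lam1 = (1 + d - 2 * alpha) / (d * (d * d - 1))
    return [(lam0, 1), (lam1, d * d - 1)]

def pt_moment(d, alpha, n):
    """PT moment Tr[(rho_W^{T_A})^n], computed exactly from the spectrum."""
    return sum(mult * lam ** n for lam, mult in werner_pt_spectrum(d, alpha))

def p3_ppt_detects(d, alpha):
    """True if p_2^2 > p_3, i.e. the p3-PPT condition signals entanglement."""
    return pt_moment(d, alpha, 2) ** 2 > pt_moment(d, alpha, 3)

for d in (2, 3, 4, 5):
    for k in range(21):
        alpha = Fraction(k, 20)
        assert pt_moment(d, alpha, 1) == 1  # unit trace
        assert p3_ppt_detects(d, alpha) == (alpha < Fraction(1, 2))
```

On this grid the $p_3$-PPT condition fires exactly for $\alpha < 1/2$, in line with (B3) and with the separability of Werner states for $\alpha \geq 1/2$.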

We note that Werner states can have non-positive PT-moments. For local dimension $d > 3$ there exists a parameter interval $[0, \alpha^*)$ such that the associated Werner state (B1) obeys $p_3 = \mathrm{Tr}\big[(\rho_W^{T_A})^3\big] < 0$ for all $\alpha \in [0, \alpha^*)$. This highlights that the logarithm of PT-moments, appearing also in the ratio $R_3 = -\log_2(p_3/\mathrm{Tr}[\rho^3])$, need not be properly defined, justifying the corresponding claim made in Appendix A. It is difficult to use entropic arguments for reasoning about relations between (logarithmic) PT-moments.

Finally, we remark that $R_3$ is not an entanglement monotone, as shown in Ref. [60]. For separable Werner states with $1/2 \leq \alpha < 1/2 + 1/(2d)$, it holds that $0 < p_3 < \mathrm{Tr}[\rho^3]$. Thus, $R_3 = -\log_2(p_3/\mathrm{Tr}[\rho^3])$ can be greater than zero, even for separable states. Since $R_3$ equals zero for all product states, it is not an entanglement monotone [2].

Appendix C: Comparison of entanglement conditions for quench dynamics

In this section, we compare the diagnostic power of the full PPT condition, the $p_3$-PPT condition and a condition based on purities of nested subsystems to detect bipartite entanglement of mixed states. Specifically, given a reduced density matrix $\rho_{AB}$ in a bipartite system $AB$, we consider:

1. the PPT condition, detecting bipartite entanglement between $A$ and $B$ for a strictly positive negativity $\mathcal{N}(\rho_{AB}) = \sum_{\lambda < 0} |\lambda| > 0$, with $\lambda$ the spectrum of $\rho_{AB}^{T_A}$ [2].

2. the $p_3$-PPT condition, detecting bipartite entanglement between $A$ and $B$ for $1 - p_3/p_2^2 > 0$.

3. a condition based on the purity of nested subsystems, detecting bipartite entanglement between $A$ and $B$ for $\mathrm{Tr}[\rho_A^2] < \mathrm{Tr}[\rho_{AB}^2]$ with $\rho_A = \mathrm{Tr}_B[\rho_{AB}]$ the reduced density matrix of subsystem $A$ [2].

The latter 'purity' condition was used in previous experimental works measuring the second Rényi entropy [7–10] to reveal bipartite entanglement of weakly mixed states.
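To make conditions 2 and 3 concrete, the sketch below evaluates both indicators for a two-qubit example using plain nested lists. The test state (a Bell state mixed with white noise), the function names and the index convention `a * dB + b` for the basis state $|a, b\rangle$ are assumptions made for illustration; they are not taken from the quench simulations discussed in this appendix.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def partial_transpose_A(rho, dA, dB):
    """Transpose subsystem A; the basis index of |a, b> is a * dB + b."""
    dim = dA * dB
    out = [[0.0] * dim for _ in range(dim)]
    for a in range(dA):
        for b in range(dB):
            for a2 in range(dA):
                for b2 in range(dB):
                    # (|a,b><a2,b2|)^{T_A} = |a2,b><a,b2|, cf. note [20]
                    out[a2 * dB + b][a * dB + b2] = rho[a * dB + b][a2 * dB + b2]
    return out

def reduced_state_A(rho, dA, dB):
    """Partial trace over subsystem B."""
    return [[sum(rho[a * dB + b][a2 * dB + b] for b in range(dB))
             for a2 in range(dA)] for a in range(dA)]

def p3_ppt_indicator(rho, dA, dB):
    """1 - p3/p2^2; a positive value signals entanglement (condition 2)."""
    pt = partial_transpose_A(rho, dA, dB)
    pt2 = matmul(pt, pt)
    p2, p3 = trace(pt2), trace(matmul(pt2, pt))
    return 1 - p3 / p2 ** 2

def purity_indicator(rho, dA, dB):
    """1 - Tr[rho_A^2]/Tr[rho_AB^2]; positive signals entanglement (condition 3)."""
    rho_A = reduced_state_A(rho, dA, dB)
    return 1 - trace(matmul(rho_A, rho_A)) / trace(matmul(rho, rho))

def noisy_bell_state(q):
    """q |phi+><phi+| + (1 - q) I/4 on two qubits (illustrative test state)."""
    rho = [[(1 - q) / 4 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for i in (0, 3):
        for j in (0, 3):
            rho[i][j] += q / 2
    return rho

rho = noisy_bell_state(0.5)
print(p3_ppt_indicator(rho, 2, 2) > 0, purity_indicator(rho, 2, 2) > 0)
```

For this mixed but entangled example the $p_3$-PPT indicator is positive while the purity indicator is negative, which mirrors the qualitative statement that the purity condition is only useful for weakly mixed states.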

To test these conditions, we consider here, as an example, quantum states generated via quench dynamics in interacting spin models. Specifically, we study quenches in the $XY$-model with long-range interactions, as defined in Eq. (6) of the main text, in a total system with $N = 10$ spins. The initial separable product state is a Néel state $|{\uparrow\downarrow\uparrow\downarrow} \ldots\rangle$.

FIG. C.1. Comparing conditions for bipartite entanglement between two subsystems $A$ and $B$ for states generated with quench dynamics governed by $H_{XY}$ arising from an initial Néel state in a total system with $N = 10$ spins. Panels: (a) $A = [4, 5]$, $B = [6, 7]$; (b) $A = [3, 4, 5]$, $B = [6, 7, 8]$; (c) $A = [2, 3, 4, 5]$, $B = [6, 7, 8, 9]$. Plotted quantities: $\mathcal{N}(\rho_{AB})$, $1 - p_3/p_2^2$ and $1 - \mathrm{Tr}[\rho_A^2]/\mathrm{Tr}[\rho_{AB}^2]$ as functions of time $t$ [ms] ($J_0 t$). Modeling the experiment of Ref. [10], we chose $J_0 = 420\,\mathrm{s}^{-1}$, $\alpha = 1.24$, while other parameter choices lead to similar results. In all panels, and for all quantities, a strictly positive value signals bipartite entanglement.

As shown in Fig. C.1, the negativity (red lines) detects bipartite entanglement for all partition sizes and all times after the quench. The $p_3$-PPT condition (blue lines) performs similarly for the partitions considered in panels (b) and (c) and is thus able to detect bipartite entanglement for highly mixed states $\rho_{AB}$, whose purity decreases to 0.3 in panel (b) at late times. The $p_3$-PPT condition fails, however, to detect the entanglement of the nearly completely mixed states of small partitions $|AB| = 4$ at late times, displayed in panel (a). This can be attributed to the fact that the $p_3$-PPT condition only relies on low order PT-moments. The purity condition (green lines) is only useful for the detection of entanglement for large partitions $AB$ with $|AB| = 8$ (panel (c)). These remain weakly mixed during the entire time evolution, since the total system of $N = 10$ spins is described here by a pure state.

Appendix D: Error bars for PT moment predictions

Let us first review the data acquisition procedure. To obtain meaningful information about an $N$-qubit state $\rho$, we first perform a collection of random single-qubit rotations: $\rho \mapsto u \rho u^\dagger$, where $u = u_1 \otimes \cdots \otimes u_N$ and each $u_i$ is chosen from a unitary 3-design. Subsequently, we perform computational basis measurements and store the outcome:

$$\rho \mapsto u \rho u^\dagger \mapsto |k_1, \ldots, k_N\rangle. \tag{D1}$$

Here, $k_1, \ldots, k_N \in \{0, 1\}$ denote the measurement outcomes on qubits $1, \ldots, N$. As shown in [17, 18, 63], the outcome of this measurement provides a (single-shot) estimate for the unknown state:

$$\hat{\rho} = \bigotimes_{i=1}^{N} \big[3 (u_i)^\dagger |k_i\rangle\langle k_i| u_i - \mathbb{I}_2\big]. \tag{D2}$$

This tensor product is a random matrix (the unitaries $u_i$, as well as the observed outcomes $k_i$, are random) that produces the true underlying state in expectation:

$$\mathbb{E}[\hat{\rho}] = \rho. \tag{D3}$$

Thus, the result of a (randomly selected) single-shot measurement provides a classical snapshot (D2) that reproduces the true underlying state in expectation. This desirable feature extends to density matrices of subsystems. Let $AB \subset \{1, \ldots, N\}$ be a subset of $|AB| \leq N$ qubits and let $\rho_{AB} = \mathrm{Tr}_{\neg AB}(\rho)$ be the associated reduced density matrix. Then,

$$\hat{\rho}_{AB} = \mathrm{Tr}_{\neg AB}(\hat{\rho}) = \bigotimes_{i \in AB} \big[3 (u_i)^\dagger |k_i\rangle\langle k_i| u_i - \mathbb{I}_2\big] \tag{D4}$$

obeys $\mathbb{E}[\hat{\rho}_{AB}] = \rho_{AB}$.
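The unbiasedness property (D3) can be checked exactly for a single qubit. The sketch below replaces the random unitary by a uniformly random choice among the three Pauli measurement bases, which is an assumption made here for simplicity (it is one common way to realize the required single-qubit randomization, but the text above only requires a unitary 3-design). Instead of sampling, it enumerates all bases and outcomes with their Born probabilities; all names and the example state are illustrative.

```python
import math

S = 1 / math.sqrt(2)
# Eigenvectors of the Pauli X, Y and Z operators: the states u^dagger |k>.
PAULI_BASES = {
    "X": [(S, S), (S, -S)],
    "Y": [(S, 1j * S), (S, -1j * S)],
    "Z": [(1, 0), (0, 1)],
}

def projector(v):
    """Rank-one projector |v><v| as a 2x2 nested list."""
    return [[v[a] * v[b].conjugate() for b in range(2)] for a in range(2)]

def born_probability(rho, v):
    """<v|rho|v> for a single-qubit state rho."""
    return sum(v[a].conjugate() * rho[a][b] * v[b]
               for a in range(2) for b in range(2)).real

def snapshot(v):
    """Single-qubit snapshot 3|v><v| - I_2, cf. Eq. (D2)."""
    P = projector(v)
    return [[3 * P[a][b] - (1 if a == b else 0) for b in range(2)]
            for a in range(2)]

def mean_snapshot(rho):
    """Exact expectation of the snapshot over bases (prob. 1/3) and outcomes."""
    mean = [[0j, 0j], [0j, 0j]]
    for vectors in PAULI_BASES.values():
        for v in vectors:
            weight = born_probability(rho, v) / 3.0
            snap = snapshot(v)
            for a in range(2):
                for b in range(2):
                    mean[a][b] += weight * snap[a][b]
    return mean

rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]  # illustrative qubit state
mean = mean_snapshot(rho)
assert all(abs(mean[a][b] - rho[a][b]) < 1e-12 for a in range(2) for b in range(2))
```

Because the snapshot (D2) is a tensor product of such single-qubit factors with independent randomness, the same property carries over to multi-qubit subsystems as in Eq. (D4).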

This feature can be used to estimate linear properties of the subsystem in question: $o = \mathrm{Tr}(O \rho_{AB})$. Perform $M$ independent repetitions of the data acquisition procedure and use them to create a collection of (independent) snapshots $\hat{\rho}_{AB}^{(1)}, \ldots, \hat{\rho}_{AB}^{(M)}$, a "classical shadow" [18], and form the empirical average of subsystem properties:

$$\hat{o} = \frac{1}{M} \sum_{r=1}^{M} \mathrm{Tr}\big(O \hat{\rho}_{AB}^{(r)}\big). \tag{D5}$$

Convergence to the target value $o = \mathbb{E}[\hat{o}] = \mathrm{Tr}(O \rho_{AB})$ is controlled by the variance. Chebyshev's inequality asserts

$$\Pr\big[|\hat{o} - o| \geq \epsilon\big] \leq \frac{\mathrm{Var}[\hat{o}]}{\epsilon^2} = \frac{\mathrm{Var}\big[\mathrm{Tr}(O \hat{\rho}_{AB})\big]}{M \epsilon^2}. \tag{D6}$$

The remaining (single-shot) variance obeys the following useful relation.

Fact 2 (Proposition 3 in [18]). Fix a subsystem $AB$ and a linear function $\mathrm{Tr}(O \rho_{AB})$. Then, the single-shot variance associated with $\hat{\rho}_{AB}$ defined in Eq. (D4) obeys

$$\mathrm{Var}\big[\mathrm{Tr}(O \hat{\rho}_{AB})\big] \leq 2^{|AB|}\, \mathrm{Tr}(O^2). \tag{D7}$$

This inequality is true for any underlying state $\rho$ and bounds the variance in terms of the subsystem dimension $d_{AB} = 2^{|AB|}$ and the Hilbert-Schmidt norm (squared) of the observable $O$. Thus, roughly $M \approx 2^{|AB|}\, \mathrm{Tr}(O^2)/\epsilon^2$ measurement repetitions are needed to predict $o$ up to accuracy $\epsilon$.

1. Predicting quadratic properties (p2)

The formalism introduced above readily extends to predictions of higher order polynomials. The special case of quadratic functions has already been addressed in Refs. [10, 15–17], and Ref. [18] (for the present formalism). The key idea is to represent a quadratic function in $\rho$ as a linear function on the tensor product $\rho \otimes \rho$:

$$o = \mathrm{Tr}(O \rho_{AB} \otimes \rho_{AB}). \tag{D8}$$

This function can be approximated by replacing $\rho \otimes \rho$ with a symmetric tensor product of two distinct snapshots $\hat{\rho}^{(i)}, \hat{\rho}^{(j)}$ ($i \neq j$):

$$\frac{1}{2!} \sum_{\pi \in S_2} \hat{\rho}_{AB}^{(\pi(i))} \otimes \hat{\rho}_{AB}^{(\pi(j))} = \frac{1}{2} \Big(\hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)} + \hat{\rho}_{AB}^{(j)} \otimes \hat{\rho}_{AB}^{(i)}\Big). \tag{D9}$$

There are $\binom{M}{2}$ different ways of combining a collection of $M$ snapshots $\hat{\rho}^{(1)}, \ldots, \hat{\rho}^{(M)}$ in this fashion. We can predict $o = \mathrm{Tr}(O \rho_{AB} \otimes \rho_{AB})$ by forming the empirical average over all of them:

$$\hat{o} = \binom{M}{2}^{-1} \sum_{i<j} \mathrm{Tr}\Big(O\, \frac{1}{2!} \sum_{\pi \in S_2} \hat{\rho}_{AB}^{(\pi(i))} \otimes \hat{\rho}_{AB}^{(\pi(j))}\Big) = \binom{M}{2}^{-1} \sum_{i<j} \mathrm{Tr}\Big(O_{(s)}\, \hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)}\Big). \tag{D10}$$

Here, we have implicitly defined the symmetrization $O_{(s)}$ of the original target function $O$. This ansatz is a special case of Hoeffding's U-statistics estimator [64]. Averaging boosts convergence to the desired expectation $\mathbb{E}[\hat{o}] = o$ and the speed of convergence is controlled by the variance (D6).

Restriction to subsystems is also possible. Suppose that $O$ only acts nontrivially on a subsystem $AB$ of both state copies. Then,

$$\hat{o} = \binom{M}{2}^{-1} \sum_{i<j} \mathrm{Tr}\Big(O_{(s)}\, \hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)}\Big) \tag{D11}$$

and the effective problem dimension becomes $d_{AB}^2 = 4^{|AB|}$. The tensor product structure (D2) of the individual snapshots allows for generalizing linear variance bounds to this setting. Simply view $\hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)}$ as a single snapshot of the quantum state $\rho_{AB} \otimes \rho_{AB}$. Fact 2 then ensures

$$\mathrm{Var}\Big[\mathrm{Tr}\Big(O_{(s)}\, \hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)}\Big)\Big] \leq 4^{|AB|}\, \mathrm{Tr}\big(O_{(s)}^2\big). \tag{D12}$$

The full variance of $\hat{o}$ is controlled in part by this relation, but also features linear variance terms [18, App. 6.A]. Rather than reviewing this argument in full generality, let us focus on the task at hand: computing the variance associated with predicting the PT-moment of order two. Fix a bipartite subsystem $AB$ of interest and rewrite $p_2$ as

$$p_2 = \mathrm{Tr}\big((\rho_{AB}^{T_A})^2\big) = \mathrm{Tr}\big(\rho_{AB}^2\big) = \mathrm{Tr}\big(\Pi_{AB}\, \rho_{AB} \otimes \rho_{AB}\big). \tag{D13}$$

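Here, $\Pi_{AB}$ denotes the swap operator that permutes the entire subsystems $AB$ within two copies of the global system. We refer to Table I below for a visual derivation of this well-known relation. The swap operator is symmetric under permuting tensor factors, Hermitian ($\Pi_{AB}^\dagger = \Pi_{AB}$) and orthogonal ($\Pi_{AB}^2 = \mathbb{I}_{AB}$). These properties ensure that the associated general estimator (D11) can be simplified considerably:

$$\hat{p}_2 = \binom{M}{2}^{-1} \sum_{i<j} \mathrm{Tr}\Big(\Pi_{AB}\, \hat{\rho}_{AB}^{(i)} \otimes \hat{\rho}_{AB}^{(j)}\Big) = \binom{M}{2}^{-1} \sum_{i<j} \mathrm{Tr}\Big(\hat{\rho}_{AB}^{(i)} \hat{\rho}_{AB}^{(j)}\Big). \tag{D14}$$

By construction, $\mathbb{E}[\hat{p}_2] = p_2 = \mathrm{Tr}(\rho_{AB}^2)$ and the speed of convergence is controlled by the variance. This variance decomposes into a linear and a quadratic part. We expand the definition of the variance:

$$\begin{aligned}
\mathrm{Var}[\hat{p}_2] &= \mathbb{E}\big[(\hat{p}_2 - \mathbb{E}[\hat{p}_2])^2\big] = \mathbb{E}\big[\hat{p}_2^2\big] - \mathbb{E}[\hat{p}_2]^2 \\
&= \binom{M}{2}^{-2} \sum_{i<j} \sum_{k<l} \Big(\mathbb{E}\Big[\mathrm{Tr}\big(\hat{\rho}_{AB}^{(i)} \hat{\rho}_{AB}^{(j)}\big)\, \mathrm{Tr}\big(\hat{\rho}_{AB}^{(k)} \hat{\rho}_{AB}^{(l)}\big)\Big] - \mathrm{Tr}(\rho_{AB}^2)^2\Big).
\end{aligned} \tag{D15}$$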

The size and nature of each contribution depend on the relation between the indices $i, j, k, l$ [64]:

1. all indices are distinct: distinct indices label independent snapshots. In this case the expectation value factorizes completely and produces $\mathbb{E}\big[\mathrm{Tr}(\hat{\rho}_{AB}^{(i)} \hat{\rho}_{AB}^{(j)})\, \mathrm{Tr}(\hat{\rho}_{AB}^{(k)} \hat{\rho}_{AB}^{(l)})\big] = \mathrm{Tr}(\rho_{AB}^2)^2$. This is completely offset by the subtraction of the expectation value squared. Hence, terms where all indices are distinct do not contribute to the variance.

2. exactly two indices coincide: In this case, the expectation value partly factorizes, e.g. $\mathbb{E}\big[\mathrm{Tr}(\hat{\rho}_{AB}^{(i)} \hat{\rho}_{AB}^{(j)})\, \mathrm{Tr}(\hat{\rho}_{AB}^{(k)} \hat{\rho}_{AB}^{(j)})\big] = \mathbb{E}\big[\mathrm{Tr}(\rho_{AB}\, \hat{\rho}_{AB}^{(j)})^2\big]$ for $i \neq k$ and $j = l$. Such index combinations produce a linear variance term $\mathrm{Var}[\mathrm{Tr}(O \hat{\rho})]$ with $O = \rho_{AB}$. The entire sum contains $\binom{M}{2}\binom{2}{1}\binom{M-2}{2-1} = \binom{M}{2}\, 2(M - 2)$ terms of this form.

3. two pairs of indices coincide: there are $\binom{M}{2}\binom{2}{2}\binom{M-2}{2-2} = \binom{M}{2}$ contributions of this form and each of them produces a quadratic variance $\mathrm{Var}\big[\mathrm{Tr}(O\, \hat{\rho}_{AB} \otimes \hat{\rho}_{AB}')\big]$ with $O = \Pi_{AB}$ (swap).
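The combinatorial counts in the list above can be confirmed by brute-force enumeration for small $M$. The sketch below counts ordered pairs of index subsets by the number of indices they share; the same helper also covers the triples that appear in the cubic case. The function name and the chosen values of $M$ are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb

def overlap_counts(M, k):
    """Count ordered pairs of k-subsets of {0, ..., M-1} by shared indices."""
    subsets = list(combinations(range(M), k))
    counts = Counter()
    for s in subsets:
        for t in subsets:
            counts[len(set(s) & set(t))] += 1
    return counts

M = 6
pairs = overlap_counts(M, 2)
assert pairs[1] == comb(M, 2) * 2 * (M - 2)   # exactly two indices coincide
assert pairs[2] == comb(M, 2)                 # two pairs of indices coincide
assert sum(pairs.values()) == comb(M, 2) ** 2

M = 7
triples = overlap_counts(M, 3)
for c in range(4):
    # number of pairs of triples with exactly c indices in common
    assert triples[c] == comb(M, 3) * comb(3, c) * comb(M - 3, 3 - c)
```

The enumeration scales as $\binom{M}{k}^2$ and is only meant as a check of the closed-form counts, not as part of any estimator.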

We conclude that the variance of $\hat{p}_2$ decomposes into linear and quadratic terms. These can be controlled via Rel. (D7) and Rel. (D12), respectively:

$$\begin{aligned}
\mathrm{Var}[\hat{p}_2] &= \binom{M}{2}^{-1} \Big(2(M - 2)\, \mathrm{Var}\big[\mathrm{Tr}(\rho_{AB}\, \hat{\rho}_{AB})\big] + \mathrm{Var}\Big[\mathrm{Tr}\big(\Pi_{AB}\, \hat{\rho}_{AB}^{(1)} \otimes \hat{\rho}_{AB}^{(2)}\big)\Big]\Big) \\
&\leq \frac{4(M - 2)\, 2^{|AB|}}{M(M - 1)}\, \mathrm{Tr}\big(\rho_{AB}^2\big) + \frac{2 \times 4^{|AB|}}{M(M - 1)}\, \mathrm{Tr}\big(\Pi_{AB}^2\big) \\
&\leq 4 \Big(\frac{2^{|AB|} p_2}{M}\Big) + 4 \Big(\frac{2^{1.5|AB|}}{M}\Big)^2.
\end{aligned} \tag{D16}$$

Chebyshev's inequality (D6) allows us to translate this insight into an error bound.

Lemma 2 (Error bound for estimating $p_2$). Fix a subsystem $AB$ of interest and suppose that we wish to estimate $p_2 = \mathrm{Tr}\big((\rho_{AB}^{T_A})^2\big)$. For $\epsilon, \delta > 0$, a total of

$$M \geq 8 \max\Big\{\frac{2^{|AB|} p_2}{\epsilon^2 \delta},\ \frac{2^{1.5|AB|}}{\epsilon \sqrt{\delta}}\Big\} \tag{D17}$$

snapshots suffice to ensure that the estimator (D14) obeys $|\hat{p}_2 - p_2| \leq \epsilon$ with probability at least $1 - \delta$.
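For orientation, the bound (D17) is easy to evaluate. The following sketch computes the sufficient number of snapshots for given subsystem size, target purity, accuracy and failure probability, and reports which of the two terms dominates. The function name, its return convention and the example parameters are assumptions made for illustration.

```python
import math

def snapshots_for_p2(n_qubits, p2, eps, delta):
    """Evaluate the right-hand side of Eq. (D17), rounded up to an integer.

    Returns the snapshot number M and the name of the dominating term.
    """
    linear_term = 2 ** n_qubits * p2 / (eps ** 2 * delta)
    quadratic_term = 2 ** (1.5 * n_qubits) / (eps * math.sqrt(delta))
    dominant = "linear" if linear_term >= quadratic_term else "quadratic"
    return math.ceil(8 * max(linear_term, quadratic_term)), dominant

# Illustrative parameters: |AB| = 4 qubits, accuracy 0.1, failure probability 0.1.
for p2 in (0.5, 0.01):
    print(p2, snapshots_for_p2(4, p2, eps=0.1, delta=0.1))
```

For the larger purity the first term of (D17) sets the requirement, while for the smaller purity the second term takes over, which is the regime discussed in the paragraph that follows.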

It is worthwhile to briefly discuss this two-pronged error bound. Asymptotically, i.e. for $M \to \infty$, the approximation error decays at a rate proportional to $1/\sqrt{M}$. This is the expected asymptotic decay rate for an estimation procedure that relies on empirical averaging (Monte Carlo). The actual rate is also multiplicative, i.e. the approximation error is proportional to the target $p_2$. In the practically more relevant, non-asymptotic setting, things can look strikingly different. For small and moderate sample sizes $M$, the variance bound (D16) is dominated by the next-to-leading order term ($2^{1.5|AB|} > 2^{|AB|} p_2$, especially if $p_2$ is small). Lemma 2 captures this discrepancy and heralds an error decay rate proportional to $1/M$ in this regime.

Finally, we point out that the dependence on $\delta$ in Eq. (D17) can be considerably improved by using median of means estimation [18]: split the total data into equally sized chunks, construct independent estimators and take their median. For this procedure, a sampling rate proportional to $\log(1/\delta)$ suffices. Moreover, median of means is much more robust towards outlier corruption and allows for using the same data to predict purities of many different subsystems simultaneously. This, however, comes at the price of somewhat larger constants in the error bound (D17) and heralds a tradeoff. In statistical terms, median of means estimation dramatically increases confidence levels $(1 - \delta)$ at the cost of slightly larger error bars (confidence intervals). This tradeoff becomes advantageous when one attempts to predict very many properties from a single data set.
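The median of means idea can be sketched generically. The helper below operates on plain scalar samples rather than on the U-statistics estimators of this appendix, so it only illustrates the splitting and the robustness against outliers; the function name, the chunking rule (leftover samples are dropped) and the toy data are assumptions.

```python
import statistics

def median_of_means(samples, n_chunks):
    """Split samples into n_chunks equal chunks, average each, take the median.

    Samples that do not fit into equally sized chunks are discarded.
    """
    size = len(samples) // n_chunks
    if size == 0:
        raise ValueError("need at least one sample per chunk")
    means = [sum(samples[c * size:(c + 1) * size]) / size
             for c in range(n_chunks)]
    return statistics.median(means)

# Toy data: 99 well-behaved samples and a single corrupted outlier.
data = [1.0] * 99 + [1e6]
print(sum(data) / len(data))       # the plain mean is ruined by the outlier
print(median_of_means(data, 10))   # the median of means is unaffected
```

In the setting of this appendix, each chunk would instead hold an independent estimator such as (D14), and the number of chunks would be chosen proportional to $\log(1/\delta)$.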

2. Predicting cubic properties ($p_3$ and $\mathrm{Tr}(\rho_{AB}^3)$)

Cubic properties can be predicted in much the same fashion as quadratic properties [18]. Write $o = \mathrm{Tr}(O \rho_{AB} \otimes \rho_{AB} \otimes \rho_{AB})$ and approximate $\rho \otimes \rho \otimes \rho$ by a symmetric tensor product of three distinct snapshots $\hat{\rho}_{AB}^{(i)}, \hat{\rho}_{AB}^{(j)}, \hat{\rho}_{AB}^{(k)}$:

$$\frac{1}{3!} \sum_{\pi \in S_3} \hat{\rho}_{AB}^{(\pi(i))} \otimes \hat{\rho}_{AB}^{(\pi(j))} \otimes \hat{\rho}_{AB}^{(\pi(k))}. \tag{D18}$$

There are $\binom{M}{3}$ different ways of combining a collection of $M$ (independent) snapshots $\hat{\rho}_{AB}^{(1)}, \ldots, \hat{\rho}_{AB}^{(M)}$ in this fashion. We estimate the cubic function $o$ by averaging over all of them (U-statistics [64]):

$$\hat{o} = \binom{M}{3}^{-1} \sum_{i<j<k} \mathrm{Tr}\Big(O\, \frac{1}{3!} \sum_{\pi \in S_3} \hat{\rho}^{(\pi(i))} \otimes \hat{\rho}^{(\pi(j))} \otimes \hat{\rho}^{(\pi(k))}\Big). \tag{D19}$$

Once more, the variance controls the rate of convergence to the desired target value $\mathbb{E}[\hat{o}] = \mathrm{Tr}(O \rho \otimes \rho \otimes \rho)$. This variance decomposes into a linear, a quadratic and a cubic part. The argument is a straightforward generalization of the analysis from the previous subsection. Rather than repeating the steps in full generality, we directly focus on the 3rd order PT-moment $p_3$ of a subsystem $AB$:

$$p_3 = \mathrm{Tr}\big((\rho_{AB}^{T_A})^3\big). \tag{D20}$$

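For notational simplicity, we suppress the subscript $AB$ indicating the subsystem of interest and label the shadows by lower-case indices: $\hat{\rho}_{AB}^{(i)} \mapsto \hat{\rho}_i$ for $i = 1, \ldots, M$. Due to the cyclicity of the trace, the U-statistics estimator simplifies to

$$\begin{aligned}
\binom{M}{3} \hat{p}_3 &= \sum_{i<j<k} \mathrm{Tr}\Big(\frac{1}{3!} \sum_{\pi \in S_3} \hat{\rho}_{\pi(i)}^{T_A} \hat{\rho}_{\pi(j)}^{T_A} \hat{\rho}_{\pi(k)}^{T_A}\Big) \\
&= \sum_{i<j<k} \frac{1}{2} \Big(\mathrm{Tr}\big(\hat{\rho}_i^{T_A} \hat{\rho}_j^{T_A} \hat{\rho}_k^{T_A}\big) + \mathrm{Tr}\big(\hat{\rho}_j^{T_A} \hat{\rho}_i^{T_A} \hat{\rho}_k^{T_A}\big)\Big),
\end{aligned} \tag{D21}$$

where we have moved the normalization factor $\binom{M}{3}^{-1}$ to the left hand side in order to increase readability. When computing the variance, we need to consider two sums over triples of distinct indices in $\{1, \ldots, M\}$. If all indices are distinct, the overall contribution vanishes. Otherwise the contribution depends on the number $c \in \{1, 2, 3\}$ of indices the triples have in common. The number of distinct choices for two triples with exactly $c$ integers in common is $\binom{M}{3}\binom{3}{c}\binom{M-3}{3-c}$ and we infer

$$\begin{aligned}
\binom{M}{3} \mathrm{Var}[\hat{p}_3] &= \binom{3}{1}\binom{M-3}{2}\, \mathrm{Var}\Big[\mathrm{Tr}\big((\rho^{T_A})^2 \hat{\rho}_1^{T_A}\big)\Big] \\
&\quad + \binom{3}{2}\binom{M-3}{1}\, \mathrm{Var}\Big[\mathrm{Tr}\Big(\rho^{T_A}\, \frac{1}{2}\big(\hat{\rho}_1^{T_A} \hat{\rho}_2^{T_A} + \hat{\rho}_2^{T_A} \hat{\rho}_1^{T_A}\big)\Big)\Big] \\
&\quad + \mathrm{Var}\Big[\frac{1}{2}\Big(\mathrm{Tr}\big(\hat{\rho}_1^{T_A} \hat{\rho}_2^{T_A} \hat{\rho}_3^{T_A}\big) + \mathrm{Tr}\big(\hat{\rho}_2^{T_A} \hat{\rho}_1^{T_A} \hat{\rho}_3^{T_A}\big)\Big)\Big] \\
&\leq \binom{M}{3} \Big(\frac{9}{M} L + \frac{18}{M^2} Q + \frac{12}{M^3} C\Big).
\end{aligned} \tag{D22}$$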


Here, $\hat\rho_1, \hat\rho_2, \hat\rho_3$ denote independent, random realizations of the snapshot $\hat\rho$ and we have introduced place-holders for linear ($L$), quadratic ($Q$) and cubic ($C$) contributions, respectively. For the task at hand, these contributions can be bounded individually and depend on the subsystem size $|AB|$:

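1. linear contribution: set $O = (\rho_{AB}^{T_A})^2$ for notational brevity. We can use $\mathrm{Tr}(O \rho^{T_A}) = \mathrm{Tr}(O^{T_A} \rho)$ to absorb the partial transpose in the linear function. Rel. (D7) then ensures

$$L \leq 2^{|AB|} \mathrm{Tr}(\rho^2)^2, \tag{D23}$$

where we have also used $\mathrm{Tr}((O^{T_A})^2) = \mathrm{Tr}(O^2)$, as well as $\mathrm{Tr}(O^2) = \|O\|_2^2 \leq \|O\|_1^2 = \mathrm{Tr}(O)^2$, because $O = (\rho^{T_A})^2$ is psd, and $\mathrm{Tr}(O) = \mathrm{Tr}((\rho^{T_A})^2) = \mathrm{Tr}(\rho^2)$.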

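2. quadratic contribution: We can bring $\frac{1}{2}\big( \mathrm{Tr}(\rho^{T_A} \hat\rho_1^{T_A} \hat\rho_2^{T_A}) + \mathrm{Tr}(\rho^{T_A} \hat\rho_2^{T_A} \hat\rho_1^{T_A}) \big)$ into the canonical form $\mathrm{Tr}\big( O \hat\rho^{(1)}_{AB} \otimes \hat\rho^{(2)}_{AB} \big)$ by introducing

$$O = \frac{1}{2} \big( \Pi_A (\rho \otimes I_{AB}) \Pi_B + \Pi_B (\rho \otimes I_{AB}) \Pi_A \big). \tag{D24}$$

We refer to Table I for a visual derivation. Here, $\Pi_A$ and $\Pi_B$ are permutation operators that swap the two $A$- and $B$-systems, respectively. Rel. (D12) then ensures

$$Q \leq 2^{2|AB|} \mathrm{Tr}(O^2) \leq 2^{3|AB|} \mathrm{Tr}(\rho^2). \tag{D25}$$

The final estimate follows from exploiting $\Pi_A^2 = \Pi_B^2 = I_{AB}$, as well as $\mathrm{Tr}(\rho^2 \otimes I_{AB}^2) = 2^{|AB|} \mathrm{Tr}(\rho^2)$.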

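3. cubic contribution: We can bring the cubic function $\frac{1}{2}\big( \mathrm{Tr}(\hat\rho_1^{T_A} \hat\rho_2^{T_A} \hat\rho_3^{T_A}) + \mathrm{Tr}(\hat\rho_2^{T_A} \hat\rho_1^{T_A} \hat\rho_3^{T_A}) \big)$ into the canonical form $\mathrm{Tr}(O \hat\rho_1 \otimes \hat\rho_2 \otimes \hat\rho_3)$ by introducing

$$O = \frac{1}{2} \Big( \overrightarrow{\Pi}_A \otimes \overleftarrow{\Pi}_B + \overrightarrow{\Pi}_A^{\dagger} \otimes \overleftarrow{\Pi}_B^{\dagger} \Big), \tag{D26}$$

see Table I below. Here, $\overrightarrow{\Pi}_A$ is a cyclic permutation that exchanges all $A$-systems in a "forward" fashion ($A_1 \mapsto A_2$, $A_2 \mapsto A_3$, $A_3 \mapsto A_1$), while $\overleftarrow{\Pi}_B$ is another cyclic permutation that exchanges all $B$-systems in a "backwards" fashion ($B_3 \mapsto B_2$, $B_2 \mapsto B_1$, $B_1 \mapsto B_3$). A straightforward extension of Rel. (D12) to cubic functions implies

$$C \leq 2^{3|AB|} \mathrm{Tr}(O^2) \leq 2^{6|AB|}, \tag{D27}$$

because permutations are orthogonal ($\Pi \Pi^{\dagger} = I$) and $\mathrm{Tr}(O^2)$ is dominated by $\mathrm{Tr}(I_{AB} \otimes I_{AB} \otimes I_{AB}) = 2^{3|AB|}$.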
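As a sanity check on the counting argument behind (D22), the following sketch enumerates all pairs of index triples for a small $M$ by brute force and compares the tallies with $\binom{M}{3}\binom{3}{c}\binom{M-3}{3-c}$. The function names `overlap_counts` and `prefactors` are illustrative and not part of the original analysis, and the comparison with the simplified weights $9/M$, $18/M^2$, $12/M^3$ is only evaluated for the sample value of $M$ chosen here.

```python
from itertools import combinations
from math import comb


def overlap_counts(M):
    """Count ordered pairs of index triples by the number c of shared indices."""
    triples = list(combinations(range(M), 3))
    counts = [0, 0, 0, 0]  # counts[c] for c = 0, 1, 2, 3
    for s in triples:
        for t in triples:
            counts[len(set(s) & set(t))] += 1
    return counts


def prefactors(M):
    """Weights of the c = 1, 2, 3 terms after dividing (D22) by binom(M, 3)."""
    return [comb(3, c) * comb(M - 3, 3 - c) / comb(M, 3) for c in (1, 2, 3)]


M = 8
counts = overlap_counts(M)
formula = [comb(M, 3) * comb(3, c) * comb(M - 3, 3 - c) for c in range(4)]
print(counts == formula)

# Compare with the simplified weights 9/M, 18/M^2, 12/M^3 for one sample M.
M = 20
print(prefactors(M), [9 / M, 18 / M**2, 12 / M**3])
```

The $c = 0$ tally is included only to confirm that all $\binom{M}{3}^2$ pairs are accounted for; it does not contribute to the variance.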

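Inserting these bounds into the variance formula for $\hat{p}_3$ reveals

$$\mathrm{Var}[\hat{p}_3] \leq \frac{9}{M} L + \frac{18}{M^2} Q + \frac{12}{M^3} C \leq 9 \frac{2^{|AB|}}{M} \mathrm{Tr}(\rho^2)^2 + 18 \frac{2^{3|AB|}}{M^2} \mathrm{Tr}(\rho^2) + 12 \frac{2^{6|AB|}}{M^3}. \tag{D28}$$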

Combining this insight with Chebyshev's inequality (D6) produces a suitable error bound. Recall that $p_2 = \mathrm{Tr}((\rho^{T_A})^2) = \mathrm{Tr}(\rho^2) \in [2^{-|AB|}, 1]$ denotes the purity of the subsystem in question.

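Lemma 3 (Error bound for estimating $p_3$). Fix a subsystem $AB$ of interest and suppose that we wish to estimate $p_3 = \mathrm{Tr}((\rho^{T_A})^3)$. For $\epsilon, \delta > 0$, a total of

$$M \geq 39 \max\Big\{ \frac{2^{|AB|} p_2^2}{\epsilon^2 \delta}, \frac{2^{1.5|AB|} p_2}{\epsilon \sqrt{\delta}}, \frac{2^{2|AB|}}{\epsilon^{2/3} \delta^{1/3}} \Big\} \tag{D29}$$

snapshots suffice to ensure that the estimator (D21) obeys $|\hat{p}_3 - p_3| \leq \epsilon$ with probability at least $1 - \delta$.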
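To get a feeling for the numbers, the sketch below evaluates the right hand side of (D29) exactly as stated in the lemma and reports which of the three terms attains the maximum. The function name `snapshots_sufficient` and the labels `"linear"`, `"quadratic"`, `"cubic"` are illustrative choices, and $|AB|$ is read as the number of qubits in the subsystem.

```python
def snapshots_sufficient(n_qubits, p2, eps, delta):
    """Evaluate the sample bound (D29); returns (bound, name of the largest term)."""
    d = 2.0 ** n_qubits  # Hilbert space dimension 2^{|AB|}
    terms = {
        "linear": d * p2**2 / (eps**2 * delta),
        "quadratic": d**1.5 * p2 / (eps * delta**0.5),
        "cubic": d**2 / (eps ** (2.0 / 3.0) * delta ** (1.0 / 3.0)),
    }
    dominant = max(terms, key=terms.get)
    return 39.0 * terms[dominant], dominant


# Example: a pure two-qubit subsystem and a maximally mixed eight-qubit one.
print(snapshots_sufficient(n_qubits=2, p2=1.0, eps=0.1, delta=0.1))
print(snapshots_sufficient(n_qubits=8, p2=2.0**-8, eps=0.1, delta=0.1))
```

The returned value is a sufficient condition derived from a worst-case variance bound, so it should be read as an upper estimate rather than a prediction of the number of runs actually needed.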

This bound on the sampling rate provides different error decay rates for different regimes. For $M \to \infty$, the first term in the maximum dominates and the error decays at an asymptotically unavoidable rate proportional to $1/\sqrt{M}$. Conversely, for very small sample sizes $M$, the third term dominates and yields a much faster decay rate proportional to $1/M^{3/2}$. In the intermediate regime, the second term may dominate and lead to an inverse linear decay rate $1/M$ instead. The dependence on the error parameter $\delta$ can once more be considerably improved (from $1/\delta$ to $\log(1/\delta)$) by using median of means estimation. This refinement also allows for using the same data to predict the cubic PT-moment of very many subsystems simultaneously [18].
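The regimes can be made concrete by evaluating the three terms of the variance bound (D28) for different sample sizes. The sketch below does this and also applies the standard Chebyshev inequality $\Pr[|\hat{p}_3 - p_3| \geq \epsilon] \leq \mathrm{Var}[\hat{p}_3]/\epsilon^2$ to the sum of the terms. The function names and the example parameters ($|AB| = 4$ qubits, $p_2 = 1$) are illustrative assumptions.

```python
def variance_bound_terms(M, n_qubits, p2):
    """The three terms on the right hand side of (D28), with Tr(rho^2) = p2."""
    d = 2.0 ** n_qubits
    return {
        "linear": 9.0 * d * p2**2 / M,
        "quadratic": 18.0 * d**3 * p2 / M**2,
        "cubic": 12.0 * d**6 / M**3,
    }


def chebyshev_failure_bound(M, n_qubits, p2, eps):
    """Upper bound on Pr[|estimate - p3| >= eps] from Chebyshev and (D28)."""
    variance = sum(variance_bound_terms(M, n_qubits, p2).values())
    return min(1.0, variance / eps**2)


for M in (10, 10**3, 10**6):
    terms = variance_bound_terms(M, n_qubits=4, p2=1.0)
    largest = max(terms, key=terms.get)
    print(M, largest, chebyshev_failure_bound(M, 4, 1.0, eps=0.1))
```

Since the error scales like the square root of the variance, a dominant term proportional to $1/M^3$, $1/M^2$ or $1/M$ corresponds to an error decay of $1/M^{3/2}$, $1/M$ or $1/\sqrt{M}$, respectively.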

Finally, we point out that the estimation error for $s_3 = \|\rho\|_3^3 = \mathrm{Tr}(\rho^3)$ can be bounded in exactly the same fashion. For $\epsilon, \delta > 0$, a sampling rate $M$ that obeys Rel. (D29) also ensures that the U-statistics estimator

$$\hat{s}_3 = \binom{M}{3}^{-1} \sum_{i<j<k} \frac{1}{2} \Big( \mathrm{Tr}\big( \hat\rho_i \hat\rho_j \hat\rho_k \big) + \mathrm{Tr}\big( \hat\rho_j \hat\rho_i \hat\rho_k \big) \Big) \tag{D30}$$

obeys $|\hat{s}_3 - s_3| \leq \epsilon$ with probability at least $1 - \delta$. The proof is almost identical to the $p_3$-analysis and we leave it as an exercise for the dedicated reader.
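The estimators (D21) and (D30) differ only in whether the snapshots are partially transposed first. A minimal sketch, assuming the snapshots are available as dense matrices on $A \otimes B$ with $A$ as the first tensor factor, is given below. The helper names `partial_transpose_A`, `cubic_ustat`, `estimate_p3` and `estimate_s3` are hypothetical, and the brute-force loop over all triples is chosen for clarity, not efficiency.

```python
from itertools import combinations
from math import comb

import numpy as np


def partial_transpose_A(rho, dim_a, dim_b):
    """Transpose only the A indices of an operator on A (x) B."""
    t = rho.reshape(dim_a, dim_b, dim_a, dim_b).transpose(2, 1, 0, 3)
    return t.reshape(dim_a * dim_b, dim_a * dim_b)


def cubic_ustat(mats):
    """Average of (Tr(m_i m_j m_k) + Tr(m_j m_i m_k)) / 2 over all i < j < k."""
    M = len(mats)
    if M < 3:
        raise ValueError("need at least three snapshots")
    total = 0.0
    for i, j, k in combinations(range(M), 3):
        total += 0.5 * (
            np.trace(mats[i] @ mats[j] @ mats[k])
            + np.trace(mats[j] @ mats[i] @ mats[k])
        )
    return (total / comb(M, 3)).real


def estimate_p3(snapshots, dim_a, dim_b):
    """U-statistics estimator (D21) for the third PT-moment."""
    return cubic_ustat([partial_transpose_A(s, dim_a, dim_b) for s in snapshots])


def estimate_s3(snapshots):
    """U-statistics estimator (D30) for Tr(rho^3)."""
    return cubic_ustat(list(snapshots))


# Algebra check with identical "snapshots" equal to a Bell state (not random data).
bell = np.zeros((4, 4))
bell[0, 0] = bell[0, 3] = bell[3, 0] = bell[3, 3] = 0.5
print(estimate_p3([bell] * 4, 2, 2), estimate_s3([bell] * 4))
```

The cost of this direct evaluation grows like $M^3$, so it is only practical for moderate numbers of snapshots.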

3. Additional numerical simulations

Here, we complement Fig. 2 of the MT by showing in Fig. D.1 statistical errors in the estimation of $p_2$ and $p_3$ for the ground state of the transverse Ising model $H = J\big(\sum_i \sigma^x_i \sigma^x_{i+1} + \sigma^z_i\big)$ at criticality. We observe the same scaling behavior as in the case of the GHZ state. For $p_2$ [panel a)], there are indeed two regimes with different decay rates ($1/M$ and $1/\sqrt{M}$). For $p_3$ [panel b)], the latter two decay rates $1/M$ and $1/\sqrt{M}$ are also clearly visible. In contrast, the early regime decay rate is not as pronounced. This is likely due to limited system sizes: $1/M^{3/2}$ does appropriately capture the decay of red dots (largest system size considered) in the top left corner, but seems to be absent in decay rates for smaller system sizes.
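For orientation, the exact values of $p_2$ and $p_3$ that the estimators target can be computed by exact diagonalization for small chains. The sketch below is not the simulation code behind Fig. D.1; it assumes an open chain of $N = 6$ sites, $J = 1$, and the subsystem $AB$ formed by the first two sites ($A$ = site 0, $B$ = site 1). All function names are illustrative.

```python
from functools import reduce

import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def site_op(op, site, n):
    """Embed a single-site operator at position `site` of an n-site chain."""
    ops = [I2] * n
    ops[site] = op
    return reduce(np.kron, ops)


def ising_hamiltonian(n, J=1.0):
    """H = J (sum_i sx_i sx_{i+1} + sz_i), open boundaries (an assumption)."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H += site_op(SX, i, n) @ site_op(SX, i + 1, n)
    for i in range(n):
        H += site_op(SZ, i, n)
    return J * H


def partial_transpose_A(rho, dim_a, dim_b):
    """Transpose only the A indices of an operator on A (x) B."""
    t = rho.reshape(dim_a, dim_b, dim_a, dim_b).transpose(2, 1, 0, 3)
    return t.reshape(dim_a * dim_b, dim_a * dim_b)


H = ising_hamiltonian(6)
energies, states = np.linalg.eigh(H)
psi = states[:, 0]  # ground state (lowest eigenvalue comes first)

# Reduced state of the first two sites: trace out the remaining four.
m = psi.reshape(4, -1)
rho_ab = m @ m.conj().T

rho_pt = partial_transpose_A(rho_ab, 2, 2)
p2 = np.trace(rho_pt @ rho_pt).real
p3 = np.trace(rho_pt @ rho_pt @ rho_pt).real
print(p2, p3)
```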


[Figure D.1: two log-log panels, a) Err. $p_2$ and b) Err. $p_3$, each plotted against $M/2^{|AB|}$ for $|AB| = 2, 4, 6, 8$.]

FIG. D.1. Statistical errors for the ground state of the transverse field Ising model. Dashed lines represent scalings of $\propto 1/M$ and $\propto 1/\sqrt{M}$. In all cases, the number of measurements to estimate $p_2$ a) and $p_3$ b) with accuracy 0.1 is of the order of $100 \times 2^{|AB|}$.

Appendix E: Auxiliary results and wiring diagrams

The arguments from the previous subsections make use of identities satisfied by traces of partial transposes of bipartite operators. Wiring diagrams, also known as tensor network diagrams, provide a useful pictorial calculus for deriving such identities. We refer the interested reader to Refs. [72–74] for a thorough introduction and content ourselves here with a concise overview that will suffice for the purposes at hand. The wiring formalism represents operators as boxes with lines emanating from them. These lines represent contra- (on the left) and co-variant indices (on the right):

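$$X = \sum_{i,j} [X]_{ij} \, |i\rangle\langle j| = \text{[diagram: box $X$ with line $i$ on the left and line $j$ on the right]}. \tag{E1}$$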

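Two operators $X$ and $Y$ can be multiplied to produce another operator. This corresponds to an index contraction and is represented in the following fashion:

$$XY = \sum_{i,k} \Big( \sum_j [X]_{ij} [Y]_{jk} \Big) |i\rangle\langle k| = \text{[diagram: boxes $X$ and $Y$ joined by line $j$, with open lines $i$ (left) and $k$ (right)]}. \tag{E2}$$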

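Transposition exchanges outgoing (contravariant) and incoming (covariant) indices

$$X^T = \sum_{i,j} [X]_{ij} \, |j\rangle\langle i| = \text{[diagram: box $X$ with line $j$ on the left and line $i$ on the right]}, \tag{E3}$$

while the trace pairs up both indices and sums over them:

$$\mathrm{Tr}(X) = \sum_i [X]_{ii} = \text{[diagram: box $X$ with both lines joined into a loop $i$]} = \text{[diagram: box $X$ with circles at both line ends]}. \tag{E4}$$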

We abbreviate this loop (contraction of leftmost and rightmost indices) by putting two circles at the end points of lines that should be contracted. This notation is not standard, but will considerably increase the readability of more complex contraction networks.
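The index contractions in (E2) to (E4) translate directly into Einstein-summation calls. The following sketch uses NumPy's `einsum` as one possible realization; the random test matrices are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
Y = rng.normal(size=(3, 3))

# (E2): contracting the shared index j joins the two boxes.
product = np.einsum("ij,jk->ik", X, Y)

# (E3): transposition exchanges the two open indices.
transpose = np.einsum("ij->ji", X)

# (E4): the trace closes the two lines of X into a loop.
trace = np.einsum("ii->", X)

print(np.allclose(product, X @ Y), np.allclose(transpose, X.T), np.isclose(trace, np.trace(X)))
```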

This basic formalism readily extends to tensor products if we arrange tensor product factors in parallel. For instance, a bipartite operator features two parallel lines on the left and on the right:

$$X_{AB} = \text{[diagram: box $X_{AB}$ with an upper line $A$ and a lower line $B$ on each side]}. \tag{E5}$$

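The upper lines represent the system $A$, while the lower lines represent system $B$. Two important bipartite operators are the identity $I$ (do nothing) and the swap operator $\Pi$ that exchanges the systems:

$$I = \text{[diagram: two parallel straight lines]} \quad \text{and} \quad \Pi = \text{[diagram: two crossing lines]}. \tag{E6}$$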

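Rules for multiplying and contracting operators readily extend to the tensor setting. This allows us to reformulate well-known expressions pictorially. For instance,

$$\mathrm{Tr}(XY) = \text{[diagram: $X$ and $Y$ in series, closed into a loop]} = \text{[diagram: $X$ above $Y$, with crossing lines closed into loops]} \tag{E7}$$

$$= \mathrm{Tr}(\Pi \, X \otimes Y). \tag{E8}$$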
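The identity $\mathrm{Tr}(XY) = \mathrm{Tr}(\Pi \, X \otimes Y)$ is easy to confirm numerically. The sketch below builds the swap operator explicitly for a small local dimension; the helper name `swap_operator` and the choice $d = 3$ are illustrative.

```python
import numpy as np


def swap_operator(d):
    """Matrix of the swap on C^d (x) C^d: maps |j>|i> to |i>|j>."""
    P = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            P[i * d + j, j * d + i] = 1.0
    return P


d = 3
rng = np.random.default_rng(0)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Pi = swap_operator(d)

lhs = np.trace(X @ Y)
rhs = np.trace(Pi @ np.kron(X, Y))
print(np.isclose(lhs, rhs))
```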

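The wiring formalism is also exceptionally well-suited to capture partial operations, like the partial transpose:

$$X_{AB}^{T_A} = \text{[diagram: box $X_{AB}$ with the two $A$ lines exchanged and the $B$ lines unchanged]}. \tag{E9}$$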

These elementary rules can be used to visually represent more complicated expressions, like a trace of multiple partial transposes. The wiring formalism provides a pictorial representation for such objects and a visual framework for modifying them. In particular, it is possible to bend, as well as untangle, index lines and rearrange tensor factors at will. Table I collects several such modifications that are important for the arguments above.
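Two consequences of these rules that are used in Appendix D, namely $\mathrm{Tr}(X^{T_A} Y^{T_A}) = \mathrm{Tr}(XY)$ and $\mathrm{Tr}(O \rho^{T_A}) = \mathrm{Tr}(O^{T_A} \rho)$, can be checked numerically with a few lines. The sketch assumes that $A$ is the first tensor factor; the helper name `partial_transpose_A` and the dimensions are illustrative.

```python
import numpy as np


def partial_transpose_A(x, dim_a, dim_b):
    """Transpose only the A indices of an operator on A (x) B."""
    t = x.reshape(dim_a, dim_b, dim_a, dim_b).transpose(2, 1, 0, 3)
    return t.reshape(dim_a * dim_b, dim_a * dim_b)


dim_a, dim_b = 2, 3
d = dim_a * dim_b
rng = np.random.default_rng(0)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))


def pt(m):
    return partial_transpose_A(m, dim_a, dim_b)


# Partial transposition preserves the trace of a product of two operators.
check_inner = np.isclose(np.trace(pt(X) @ pt(Y)), np.trace(X @ Y))
# It can be moved from one factor to the other inside a trace.
check_move = np.isclose(np.trace(X @ pt(Y)), np.trace(pt(X) @ Y))
# Applying it twice gives back the original operator.
check_involution = np.allclose(pt(pt(X)), X)
print(check_inner, check_move, check_involution)
```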


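| expression | diagram representation | diagram reformulation | modified expression |
|---|---|---|---|
| $\mathrm{Tr}(X_{AB}^{T_A} Y_{AB}^{T_A})$ | [diagram] | [diagram] | $\mathrm{Tr}(X_{AB} Y_{AB})$ |
| $\mathrm{Tr}(X_{AB} Y_{AB})$ | [diagram] | [diagram] | $\mathrm{Tr}(\Pi_B \Pi_A X_{AB} \otimes Y_{AB})$; $\Pi_B, \Pi_A$: swaps |
| $\mathrm{Tr}(\rho_{AB}^{T_A} X_{AB}^{T_A} Y_{AB}^{T_A})$ | [diagram] | [diagram] | $\mathrm{Tr}(\Pi_B (\rho_{AB} \otimes I_{AB}) \Pi_A X_{AB} \otimes Y_{AB})$ |
| $\mathrm{Tr}(\rho_{AB}^{T_A} Y_{AB}^{T_A} X_{AB}^{T_A})$ | [diagram] | [diagram] | $\mathrm{Tr}(\Pi_A (\rho_{AB} \otimes I_{AB}) \Pi_B X_{AB} \otimes Y_{AB})$ |
| $\mathrm{Tr}(X_{AB}^{T_A} Y_{AB}^{T_A} Z_{AB}^{T_A})$ | [diagram] | [diagram] | $\mathrm{Tr}(\overrightarrow{\Pi}_B \overleftarrow{\Pi}_A X_{AB} \otimes Y_{AB} \otimes Z_{AB})$; $\overrightarrow{\Pi}_B, \overleftarrow{\Pi}_A$: cycle permutations |
| $\mathrm{Tr}(Y_{AB}^{T_A} X_{AB}^{T_A} Z_{AB}^{T_A})$ | [diagram] | [diagram] | $\mathrm{Tr}(\overleftarrow{\Pi}_A \overrightarrow{\Pi}_B X_{AB} \otimes Y_{AB} \otimes Z_{AB})$; $\overleftarrow{\Pi}_A \overrightarrow{\Pi}_B = (\overrightarrow{\Pi}_B \overleftarrow{\Pi}_A)^{\dagger}$ |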

TABLE I. Reformulations of relevant tensor product expressions: The variance bounds in Sub. D 1 and Sub. D 2 are contingent on bringing certain expressions into canonical form, i.e. $\mathrm{Tr}(O X_{AB} \otimes Y_{AB})$ for bilinear functions and $\mathrm{Tr}(O' X_{AB} \otimes Y_{AB} \otimes Z_{AB})$ for trilinear ones. This table supports visual derivations for these reformulations. Expressions of interest (far left) are first translated into wiring diagrams (center left). Subsequently, the rules of wiring calculus are used to re-arrange the diagrams (center right). Translating them into formulas (far right) produces equivalent expressions that respect the desired structure.
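The bilinear rows of Table I can also be verified numerically. The sketch below checks the second, third and fourth rows for random operators, assuming the two copies are ordered as $(A_1, B_1, A_2, B_2)$ and that $\Pi_A$ ($\Pi_B$) swaps the two $A$ ($B$) factors. The helper names `swap_factors` and `random_operator` are hypothetical, and unequal dimensions for $A$ and $B$ are chosen so that mixing up the two systems would be detected.

```python
import numpy as np


def partial_transpose_A(x, dim_a, dim_b):
    """Transpose only the A indices of an operator on A (x) B."""
    t = x.reshape(dim_a, dim_b, dim_a, dim_b).transpose(2, 1, 0, 3)
    return t.reshape(dim_a * dim_b, dim_a * dim_b)


def swap_factors(dims, i, j):
    """Permutation matrix exchanging tensor factors i and j (equal dimensions)."""
    n, total = len(dims), int(np.prod(dims))
    axes = list(range(2 * n))
    axes[i], axes[j] = axes[j], axes[i]  # permute the row (output) indices
    eye = np.eye(total).reshape(list(dims) + list(dims))
    return eye.transpose(axes).reshape(total, total)


def random_operator(dim, rng):
    return rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))


dim_a, dim_b = 2, 3
d = dim_a * dim_b
rng = np.random.default_rng(1)
rho, X, Y = (random_operator(d, rng) for _ in range(3))

dims = [dim_a, dim_b, dim_a, dim_b]  # ordering (A1, B1, A2, B2)
pi_a = swap_factors(dims, 0, 2)
pi_b = swap_factors(dims, 1, 3)
rho_id = np.kron(rho, np.eye(d))  # rho_AB (x) I_AB
XY = np.kron(X, Y)


def pt(m):
    return partial_transpose_A(m, dim_a, dim_b)


row2 = np.isclose(np.trace(X @ Y), np.trace(pi_b @ pi_a @ XY))
row3 = np.isclose(np.trace(pt(rho) @ pt(X) @ pt(Y)), np.trace(pi_b @ rho_id @ pi_a @ XY))
row4 = np.isclose(np.trace(pt(rho) @ pt(Y) @ pt(X)), np.trace(pi_a @ rho_id @ pi_b @ XY))
print(row2, row3, row4)
```

Averaging the third and fourth rows reproduces the operator $O$ of (D24). The trilinear rows are not checked here because they depend on the orientation conventions of the cyclic permutations, which this sketch does not attempt to fix.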