



TensorFlow Quantum: A Software Framework for Quantum Machine Learning

Michael Broughton,1, 9, ∗ Guillaume Verdon,1, 2, 8, 10, † Trevor McCourt,1, 11 Antonio J. Martinez,1, 8, 12

Jae Hyeon Yoo,3 Sergei V. Isakov,4 Philip Massey,5 Murphy Yuezhen Niu,1 Ramin Halavati,6

Evan Peters,8, 10, 13 Martin Leib,14 Andrea Skolik,14, 15, 16, 17 Michael Streif,14, 16, 17, 18 David Von Dollen,19

Jarrod R. McClean,1 Sergio Boixo,1 Dave Bacon,7 Alan K. Ho,1 Hartmut Neven,1 and Masoud Mohseni1, ‡

1 Google Research, Venice, CA 90291
2 X, Mountain View, CA 94043
3 Google Research, Seoul, South Korea
4 Google Research, Zurich, Switzerland
5 Google, New York, NY
6 Google, Munich, Germany
7 Google Research, Seattle, WA 98103
8 Institute for Quantum Computing, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
9 School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
10 Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
11 Department of Mechanical & Mechatronics Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
12 Department of Physics & Astronomy, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
13 Fermi National Accelerator Laboratory, P.O. Box 500, Batavia, IL 60510
14 Data:Lab, Volkswagen Group, Ungererstr. 69, 80805 München, Germany
15 Ludwig Maximilian University, 80539 München, Germany
16 Quantum Artificial Intelligence Laboratory, NASA Ames Research Center (QuAIL)
17 USRA Research Institute for Advanced Computer Science (RIACS)
18 University Erlangen-Nürnberg (FAU), Institute of Theoretical Physics, Staudtstr. 7, 91058 Erlangen, Germany
19 Volkswagen Group Advanced Technologies, San Francisco, CA 94108

(Dated: March 9, 2020)

We introduce TensorFlow Quantum (TFQ), an open source library for the rapid prototyping of hybrid quantum-classical models for classical or quantum data. This framework offers high-level abstractions for the design and training of both discriminative and generative quantum models under TensorFlow and supports high-performance quantum circuit simulators. We provide an overview of the software architecture and building blocks through several examples and review the theory of hybrid quantum-classical neural networks. We illustrate TFQ functionalities via several basic applications including supervised learning for quantum classification, quantum control, and quantum approximate optimization. Moreover, we demonstrate how one can apply TFQ to tackle advanced quantum learning tasks including meta-learning, Hamiltonian learning, and sampling thermal states. We hope this framework provides the necessary tools for the quantum computing and machine learning research communities to explore models of both natural and artificial quantum systems, and ultimately discover new quantum algorithms which could potentially yield a quantum advantage.

CONTENTS

I. Introduction
   A. Quantum Machine Learning
   B. Hybrid Quantum-Classical Models
   C. Quantum Data
   D. TensorFlow Quantum

II. Software Architecture & Building Blocks
   A. Cirq
   B. TensorFlow
   C. Technical Hurdles in Combining Cirq with TensorFlow
   D. TFQ architecture
      1. Design Principles and Overview
      2. The Abstract TFQ Pipeline for a specific hybrid discriminator model
      3. Hello Many-Worlds: Binary Classifier for Quantum Data
   E. TFQ Building Blocks
      1. Quantum Computations as Tensors
      2. Composing Quantum Models
      3. Sampling and Expectation Values
      4. Differentiating Quantum Circuits
      5. Simplified Layers
   F. Quantum Circuit Simulation with qsim
      1. Comment on the Simulation of Quantum Circuits
      2. Gate Fusion with qsim
      3. Hardware-Specific Instruction Sets
      4. Benchmarks




III. Theory of Hybrid Quantum-Classical Machine Learning
   A. Quantum Neural Networks
   B. Sampling and Expectations
   C. Gradients of Quantum Neural Networks
      1. Finite difference methods
      2. Parameter shift methods
      3. Stochastic Parameter Shift Gradient Estimation
   D. Hybrid Quantum-Classical Computational Graphs
      1. Hybrid Quantum-Classical Neural Networks
   E. Autodifferentiation through hybrid quantum-classical backpropagation

IV. Basic Quantum Applications
   A. Hybrid Quantum-Classical Convolutional Neural Network Classifier
      1. Background
      2. Implementations
   B. Hybrid Machine Learning for Quantum Control
      1. Time-Constant Hamiltonian Control
      2. Time-dependent Hamiltonian Control
   C. Quantum Approximate Optimization
      1. Background
      2. Implementation

V. Advanced Quantum Applications
   A. Meta-learning for Variational Quantum Optimization
   B. Vanishing Gradients and Adaptive Layerwise Training Strategies
      1. Random Quantum Circuits and Barren Plateaus
      2. Layerwise quantum circuit learning
   C. Hamiltonian Learning with Quantum Graph Recurrent Neural Networks
      1. Motivation: Learning Quantum Dynamics with a Quantum Representation
      2. Implementation
   D. Generative Modelling of Quantum Mixed States with Hybrid Quantum-Probabilistic Models
      1. Background
      2. Variational Quantum Thermalizer
      3. Quantum Generative Learning from Quantum Data

VI. Closing Remarks

VII. Acknowledgements

References

I. INTRODUCTION

A. Quantum Machine Learning

Machine learning (ML) is the construction of algorithms and statistical models which can extract information hidden within a dataset. By learning a model from a dataset, one then has the ability to make predictions on unseen data from the same underlying probability distribution. For several decades, research in machine learning was focused on models that can provide theoretical guarantees for their performance [1–4]. However, in recent years, methods based on heuristics have become dominant, partly due to an abundance of data and computational resources [5].

Deep learning is one such heuristic method which has seen great success [6, 7]. Deep learning methods are based on learning a representation of the dataset in the form of networks of parameterized layers. These parameters are then tuned by minimizing a function of the model outputs, called the loss function. This function quantifies the fit of the model to the dataset.

In parallel to the recent advances in deep learning, there has been a significant growth of interest in quantum computing in both academia and industry [8]. Quantum computing is the use of engineered quantum systems to perform computations. Quantum systems are described by a generalization of probability theory allowing novel behavior such as superposition and entanglement, which are generally difficult to simulate with a classical computer [9]. The main motivation to build a quantum computer is to access efficient simulation of these uniquely quantum mechanical behaviors. Quantum computers could one day accelerate computations for chemical and materials development [10], decryption [11], optimization [12], and many other tasks. Google’s recent achievement of quantum supremacy [13] marked the first glimpse of this promised power.

How may one apply quantum computing to practical tasks? One area of research that has attracted considerable interest is the design of machine learning algorithms that inherently rely on quantum properties to accelerate their performance. One key observation that has led to the application of quantum computers to machine learning is their ability to perform fast linear algebra on a state space that grows exponentially with the number of qubits. These quantum accelerated linear-algebra based techniques for machine learning can be considered the first generation of quantum machine learning (QML) algorithms tackling a wide range of applications in both supervised and unsupervised learning, including principal component analysis [14], support vector machines [15], k-means clustering [16], and recommendation systems [17]. These algorithms often admit exponentially faster solutions compared to their classical counterparts on certain types of quantum data. This has led to a significant surge of interest in the subject [18]. However, to apply these algorithms to classical data, the data must first be embedded into quantum states [19], a process whose scalability is under debate [20]. Additionally, the scaling of these algorithms changes when they are applied to classical data, mostly rendering the quantum advantage polynomial [21]. Continuing debates around speedups and assumptions make it prudent to look beyond classical data for applications of quantum computation to machine learning.

With the availability of Noisy Intermediate-Scale Quantum (NISQ) processors [22], the second generation of QML has emerged [8, 12, 18, 23–43]. In contrast to the first generation, this new trend in QML is based on heuristic methods which can be studied empirically due to the increased computational capability of quantum hardware. This is reminiscent of how machine learning evolved towards deep learning with the advent of new computational capabilities [44]. These new algorithms use parameterized quantum transformations called parameterized quantum circuits (PQCs) or Quantum Neural Networks (QNNs) [23, 42]. In analogy with classical deep learning, the parameters of a QNN are then optimized with respect to a cost function via either black-box optimization heuristics [45] or gradient-based methods [46], in order to learn a representation of the training data. In this paradigm, quantum machine learning is the development of models, training strategies, and inference schemes built on parameterized quantum circuits.

B. Hybrid Quantum-Classical Models

Near-term quantum processors are still fairly small and noisy, thus quantum models cannot disentangle and generalize quantum data using quantum processors alone. NISQ processors will need to work in concert with classical co-processors to become effective. We anticipate that investigations into various possible hybrid quantum-classical machine learning algorithms will be a productive area of research and that quantum computers will be most useful as hardware accelerators, working in symbiosis with traditional computers. In order to understand the power and limitations of classical deep learning methods, and how they could possibly be improved by incorporating parameterized quantum circuits, it is worth defining key indicators of learning performance:

Representation capacity: the model architecture has the capacity to accurately replicate, or extract useful information from, the underlying correlations in the training data for some value of the model’s parameters.

Training efficiency: minimizing the cost function via stochastic optimization heuristics should converge to an approximate minimum of the loss function in a reasonable number of iterations.

Inference tractability: the ability to run inference on a given model in a scalable fashion is needed in order to make predictions in the training or test phase.

Generalization power: the cost function for a given model should yield a landscape where typically initialized and trained networks find approximate solutions which generalize well to unseen data.

In principle, any or all combinations of these attributes could be susceptible to possible improvements by quantum computation. There are many ways to combine classical and quantum computations. One well-known method is to use classical computers as outer-loop optimizers for QNNs. When training a QNN with a classical optimizer in a quantum-classical loop, the overall algorithm is sometimes referred to as a Variational Quantum-Classical Algorithm. Some recently proposed architectures of QNN-based variational quantum-classical algorithms include Variational Quantum Eigensolvers (VQEs) [28, 47], Quantum Approximate Optimization Algorithms (QAOAs) [12, 27, 48, 49], Quantum Neural Networks (QNNs) for classification [50, 51], Quantum Convolutional Neural Networks (QCNN) [52], and Quantum Generative Models [53]. Generally, the goal is to optimize over a parameterized class of computations to either generate a certain low-energy wavefunction (VQE/QAOA), learn to extract non-local information (QNN classifiers), or learn how to generate a quantum distribution from data (generative models). It is important to note that in the standard model architecture for these applications, the representation typically resides entirely on the quantum processor, with classical heuristics participating only as optimizers for the tunable parameters of the quantum model. Various forms of gradient descent are the most popular optimization heuristics, but an obstacle to the use of gradient descent is the effect of barren plateaus [51], which generally arises when a network lacking structure is randomly initialized. Strategies for overcoming these issues are discussed in detail in section V B.

While the use of classical processors as outer-loop optimizers for quantum neural networks is promising, the reality is that near-term quantum devices are still fairly noisy, thus limiting the depth of quantum circuits achievable with acceptable fidelity. This motivates allowing as much of the model as possible to reside on classical hardware. Several applications of quantum computation have ventured beyond the scope of typical variational quantum algorithms to explore this combination. Instead of training a purely quantum model via a classical optimizer, one then considers scenarios where the model itself is a hybrid between quantum computational building blocks and classical computational building blocks [54–57] and is typically trained via gradient-based methods. Such scenarios leverage a new form of automatic differentiation that allows the backwards propagation of gradients in between parameterized quantum and classical computations. The theory of such hybrid backpropagation will be covered in section III C.

In summary, a hybrid quantum-classical model is a learning heuristic in which both the classical and quantum processors contribute to the indicators of learning performance defined above.



C. Quantum Data

Although it is not yet proven that heuristic QML could provide a speedup on practical classical ML applications, there is some evidence that hybrid quantum-classical machine learning applications on “quantum data” could provide a quantum advantage over classical-only machine learning for reasons described below. Abstractly, any data emerging from an underlying quantum mechanical process can be considered quantum data. This can be the classical data resulting from quantum mechanical experiments, or data which is directly generated by a quantum device and then fed into an algorithm as input. A quantum or hybrid quantum-classical model will be at least partially represented by a quantum device, and therefore have the inherent capacity to capture the characteristics of a quantum mechanical process. Concretely, we list practical examples of classes of quantum data, which can be routinely generated or simulated on existing quantum devices or processors:

Quantum simulations: These can include output states of quantum chemistry simulations used to extract information about chemical structures and chemical reactions. Potential applications include material science, computational chemistry, computational biology, and drug discovery. Another example is data from quantum many-body systems and quantum critical systems in condensed matter physics, which could be used to model and design exotic states of matter which exhibit many-body quantum effects.

Quantum communication networks: Machine learning in this class of systems will be related to distilling small-scale quantum data; e.g., to discriminate among non-orthogonal quantum states [42, 58], with application to design and construction of quantum error correcting codes for quantum repeaters, quantum receivers, and purification units.

Quantum metrology: Quantum-enhanced high precision measurements such as quantum sensing and quantum imaging are inherently done on probes that are small-scale quantum devices and could be designed or improved by variational quantum models.

Quantum control: Variationally learning hybrid quantum-classical algorithms can lead to new optimal open or closed-loop control [59], calibration, and error mitigation, correction, and verification strategies [60] for near-term quantum devices and quantum processors.

Of course, this is not a comprehensive list of quantum data. We hope that, with proper software tooling, researchers will be able to find applications of QML in all of the above areas and other categories of applications beyond what we can currently envision.

D. TensorFlow Quantum

Today, exploring new hybrid quantum-classical models is a difficult and error-prone task. The engineering effort required to manually construct such models, develop quantum datasets, and set up training and validation stages decreases a researcher’s ability to iterate and discover. TensorFlow has accelerated the research and understanding of deep learning in part by automating common model building tasks. Development of software tooling for hybrid quantum-classical models should similarly accelerate research and understanding for quantum machine learning.

To develop such tooling, the requirement of accommodating a heterogeneous computational environment involving both classical and quantum processors is key. This computational heterogeneity suggested the need to expand TensorFlow, which is designed to distribute computations across CPUs, GPUs, and TPUs [61], to also encompass quantum processing units (QPUs). This project has evolved into TensorFlow Quantum. TFQ is an integration of Cirq with TensorFlow that allows researchers and students to simulate QPUs while designing, training, and testing hybrid quantum-classical models, and eventually run the quantum portions of these models on actual quantum processors as they come online. A core contribution of TFQ is seamless backpropagation through combinations of classical and quantum layers in hybrid quantum-classical models. This allows QML researchers to directly harness the rich set of tools already available in TF and Keras.

The remainder of this document describes TFQ and a selection of applications demonstrating some of the challenges TFQ can help tackle. In section II, we introduce the software architecture of TFQ. We highlight its main features including batched circuit execution, automated expectation estimation, estimation of quantum gradients, hybrid quantum-classical automatic differentiation, and rapid model construction, all from within TensorFlow. We also present a simple “Hello, World” example for binary quantum data classification on a single qubit. By the end of section II, we expect most readers to have sufficient knowledge to begin development with TFQ. For readers who are interested in a more theoretical understanding of QNNs, we provide in section III an overview of QNN models and hybrid quantum-classical backpropagation. For researchers interested in applying TFQ to their own projects, we provide various applications in sections IV and V. In section IV, we describe hybrid quantum-classical CNNs for binary classification of quantum phases, hybrid quantum-classical ML for quantum control, and MaxCut QAOA. In the advanced applications section V, we describe meta-learning for quantum approximate optimization, discuss issues with vanishing gradients and how we can overcome them by adaptive layer-wise learning schemes, Hamiltonian learning with quantum graph networks, and quantum mixed state generation via classical energy-based models.

We hope that TFQ enables the machine learning and quantum computing communities to work together more closely on important challenges and opportunities in the near-term and beyond.



II. SOFTWARE ARCHITECTURE & BUILDING BLOCKS

As stated in the introduction, the goal of TFQ is to bridge the quantum computing and machine learning communities. Google already has well-established products for these communities: Cirq, an open source library for invoking quantum circuits [62], and TensorFlow, an end-to-end open source machine learning platform [61]. However, the emerging community of quantum machine learning researchers requires the capabilities of both products. The prospect of combining Cirq and TensorFlow then naturally arises.

First, we review the capabilities of Cirq and TensorFlow. We confront the challenges that arise when one attempts to combine both products. These challenges inform the design goals relevant when building software specific to quantum machine learning. We provide an overview of the architecture of TFQ and describe a particular abstract pipeline for building a hybrid model for classification of quantum data. Then we illustrate this pipeline via the exposition of a minimal hybrid model which makes use of the core features of TFQ. We conclude with a description of our performant C++ simulator for quantum circuits and provide benchmarks of performance on two complementary classes of random and structured quantum circuits.

A. Cirq

Cirq is an open-source framework for invoking quantum circuits on near term devices [62]. It contains the basic structures, such as qubits, gates, circuits, and measurement operators, that are required for specifying quantum computations. User-specified quantum computations can then be executed in simulation or on real hardware. Cirq also contains substantial machinery that helps users design efficient algorithms for NISQ machines, such as compilers and schedulers. Below we show example Cirq code for calculating the expectation value of Z1Z2 for a Bell state:

(q1, q2) = cirq.GridQubit.rect(1, 2)
c = cirq.Circuit(cirq.H(q1), cirq.CNOT(q1, q2))
ZZ = cirq.Z(q1) * cirq.Z(q2)
bell = cirq.Simulator().simulate(c).final_state
expectation = ZZ.expectation_from_wavefunction(
    bell, dict(zip([q1, q2], [0, 1])))

Cirq uses SymPy [63] symbols to represent free parameters in gates and circuits. You replace free parameters in a circuit with specific numbers by passing a Cirq ParamResolver object with your circuit to the simulator.

Below we construct a parameterized circuit and simulate the output state for θ = 1:

theta = sympy.Symbol('theta')
c = cirq.Circuit(cirq.Rx(theta).on(q1))
resolver = cirq.ParamResolver({theta: 1})
results = cirq.Simulator().simulate(c, resolver)

B. TensorFlow

TensorFlow is a language for describing computations as stateful dataflow graphs [61]. Describing machine learning models as dataflow graphs is advantageous for performance during training. First, it is easy to obtain gradients of dataflow graphs using backpropagation [64], allowing efficient parameter updates. Second, independent nodes of the computational graph may be distributed across independent machines, including GPUs and TPUs, and run in parallel. These computational advantages established TensorFlow as a powerful tool for machine learning and deep learning.

TensorFlow constructs this dataflow graph using tensors for the directed edges and operations (ops) for the nodes. For our purposes, a rank n tensor is simply an n-dimensional array. In TensorFlow, tensors are additionally associated with a data type, such as integer or string. Tensors are a convenient way of thinking about data; in machine learning, the first index is often reserved for iteration over the members of a dataset. Additional indices can indicate the application of several filters, e.g., in convolutional neural networks with several feature maps.

In general, an op is a function mapping input tensors to output tensors. Ops may act on zero or more input tensors, always producing at least one tensor as output. For example, the addition op ingests two tensors and outputs one tensor representing the elementwise sum of the inputs, while a constant op ingests no tensors, taking the role of a root node in the dataflow graph. The combination of ops and tensors gives the backend of TensorFlow the structure of a directed acyclic graph. A visualization of the backend structure corresponding to a simple computation in TensorFlow is given in Fig. 1.

Figure 1. A simple example of the TensorFlow computational model. Two tensor inputs A and B are added and then multiplied against a third tensor input C, before flowing on to further nodes in the graph. Blue nodes are tensor injections (ops), arrows are tensors flowing through the computational graph, and orange nodes are tensor transformations (ops). Tensor injections are ops in the sense that they are functions which take in zero tensors and output one tensor.
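As a concrete illustration (not taken from the TFQ paper or codebase), the computation drawn in Fig. 1 can be written in a few lines of TensorFlow; the tensor names A, B, and C below are our own placeholders:

import tensorflow as tf

# The dataflow of Fig. 1: two input tensors are added, and the result is
# multiplied elementwise by a third input tensor.
A = tf.constant([1.0, 2.0])
B = tf.constant([3.0, 4.0])
C = tf.constant([5.0, 6.0])
output = (A + B) * C  # tf.Tensor([20. 36.], shape=(2,), dtype=float32)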

It is worth noting that this tensorial data format is not to be confused with Tensor Networks [65], which are a mathematical tool used in condensed matter physics and quantum information science to efficiently represent many-body quantum states and operations. Recently, libraries for building such Tensor Networks in TensorFlow have become available [66], and we refer the reader to the corresponding blog post for a better understanding of the difference between TensorFlow tensors and the tensor objects in Tensor Networks [67].

The recently announced TensorFlow 2 [68] takes the dataflow graph structure as a foundation and adds high-level abstractions. One new feature is the Python function decorator @tf.function, which automatically converts the decorated function into a graph computation. Also relevant is the native support for Keras [69], which provides the Layer and Model constructs. These abstractions allow the concise definition of machine learning models which ingest and process data, all backed by dataflow graph computation. The increasing levels of abstraction and heterogeneous hardware backing which together constitute the TensorFlow stack can be visualized with the orange and gray boxes in our stack diagram in Fig. 4. The combination of these high-level abstractions and efficient dataflow graph backend makes TensorFlow 2 an ideal platform for data-driven machine learning research.
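For instance, an illustrative snippet (ours, not from the paper) using the @tf.function decorator mentioned above might look as follows; the function name and argument values are arbitrary:

import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph computation
def affine(x, w, b):
    # A toy computation: matrix multiply followed by a bias add.
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[0.5], [0.25]])
b = tf.Variable([0.1])
y = affine(x, w, b)  # executed as a graph computation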

C. Technical Hurdles in Combining Cirq with TensorFlow

There are many ways one could imagine combining the capabilities of Cirq and TensorFlow. One possible approach is to let graph edges represent quantum states and let ops represent transformations of the state, such as applying circuits and taking measurements. This approach can be called the “states-as-edges” architecture. We show in Fig. 2 how to reformulate the Bell state preparation and measurement discussed in section II A within this proposed architecture.

Figure 2. The states-as-edges approach to embedding quantum computation in TensorFlow. Blue nodes are input tensors, arrows are tensors flowing through the graph, and orange nodes are TF Ops transforming the simulated quantum state. Note that the above is not the architecture used in TFQ but rather an alternative which was considered; see Fig. 3 for the equivalent diagram for the true TFQ architecture.

This architecture may at first glance seem like an attractive option as it is a direct formulation of quantum computation as a dataflow graph. However, this approach is suboptimal for several reasons. First, in this architecture, the structure of the circuit being run is static in the computational graph, thus running a different circuit would require the graph to be rebuilt. This is far from ideal for variational quantum algorithms which learn over many iterations with a slightly modified quantum circuit at each iteration. A second problem is the lack of a clear way to embed such a quantum dataflow graph on a real quantum processor: the states would have to remain held in quantum memory on the quantum device itself, and the high latency between classical and quantum processors makes sending transformations one-by-one prohibitive. Lastly, one needs a way to specify gates and measurements within TF. One may be tempted to define these directly; however, Cirq already has the necessary tools and objects defined which are most relevant for the near-term quantum computing era. Duplicating Cirq functionality in TF would lead to several issues, requiring users to re-learn how to interface with quantum computers in TFQ versus Cirq, and adding to the maintenance overhead by needing to keep two separate quantum circuit construction frameworks up-to-date as new compilation techniques arise. These considerations motivate our core design principles.

D. TFQ architecture

1. Design Principles and Overview

To avoid the aforementioned technical hurdles and in order to satisfy the diverse needs of the research community, we have arrived at the following four design principles:

1. Differentiability. As described in the introduction, gradient-based methods leveraging autodifferentiation have become the leading heuristic for optimization of machine learning models. A software framework for QML must support differentiation of quantum circuits so that hybrid quantum-classical models can participate in backpropagation.

2. Circuit Batching. Learning on quantum data requires re-running parameterized model circuits on each quantum data point. A QML software framework must be optimized for running large numbers of such circuits. Ideally, the semantics should match established TensorFlow norms for batching over data.

3. Execution Backend Agnostic. Experimental quantum computing often involves reconciling perfectly simulated algorithms with the outputs of real, noisy devices. Thus, QML software must allow users to easily switch between running models in simulation and running models on real hardware, such that simulated results and experimental results can be directly compared.

4. Minimalism. Cirq provides an extensive set of tools for preparing quantum circuits. TensorFlow provides a very complete machine learning toolkit through its hundreds of ops and Keras high-level API, with a massive community of active users. Existing functionality in Cirq and TensorFlow should be used as much as possible. TFQ should serve as a bridge between the two that does not require users to re-learn how to interface with quantum computers or re-learn how to solve problems using machine learning.



Figure 3. The TensorFlow graph generated to calculate the expectation value of a parameterized circuit. The symbol values can come from other TensorFlow ops, such as from the outputs of a classical neural network. The output can be passed on to other ops in the graph; here, for illustration, the output is passed to the absolute value op.

First, we provide a bottom-up overview of TFQ to provide intuition on how the framework functions at a fundamental level. In TFQ, circuits and other quantum computing constructs are tensors, and converting these tensors into classical information via simulation or execution on a quantum device is done by ops. These tensors are created by converting Cirq objects to TensorFlow string tensors, using the tfq.convert_to_tensor function. This function takes in a cirq.Circuit or cirq.PauliSum object and creates a string tensor representation. The cirq.Circuit objects may be parameterized by SymPy symbols.

These tensors are then converted to classical information via state simulation, expectation value calculation, or sampling. TFQ provides ops for each of these computations. The following code snippet shows how a simple parameterized circuit may be created using Cirq, and its Z expectation evaluated at different parameter values using the tfq expectation value calculation op. We feed the output into the tf.math.abs op to show that tfq ops integrate natively with tf ops.

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
c = cirq.Circuit(cirq.X(qubit) ** theta)
c_tensor = tfq.convert_to_tensor([c] * 3)
theta_values = tf.constant([[0.0], [1.0], [2.0]])
m = cirq.Z(qubit)
paulis = tfq.convert_to_tensor([m] * 3)
expectation_op = tfq.get_expectation_op()
output = expectation_op(
    c_tensor, ['theta'], theta_values, paulis)
abs_output = tf.math.abs(output)

We supply the expectation op with a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to use for those symbols, and a tensor of operators to measure with respect to. Given this, it outputs a tensor of expectation values. The graph this code generates is given by Fig. 3.

The expectation op is capable of running circuits on a simulated backend, which can be a Cirq simulator or our native TFQ simulator qsim (described in detail in section II F), or on a real device. This is configured on instantiation.


Figure 4. The software stack of TFQ, showing its interactions with TensorFlow, Cirq, and computational hardware. At the top of the stack is the data to be processed. Classical data is natively processed by TensorFlow; TFQ adds the ability to process quantum data, consisting of both quantum circuits and quantum operators. The next level down the stack is the Keras API in TensorFlow. Since a core principle of TFQ is native integration with core TensorFlow, in particular with Keras models and optimizers, this level spans the full width of the stack. Underneath the Keras model abstractions are our quantum layers and differentiators, which enable hybrid quantum-classical automatic differentiation when connected with classical TensorFlow layers. Underneath the layers and differentiators, we have TensorFlow ops, which instantiate the dataflow graph. Our custom ops control quantum circuit execution. The circuits can be run in simulation mode, by invoking qsim or Cirq, or eventually will be executed on QPU hardware.

The expectation op is fully differentiable. Given that there are many ways to calculate the gradient of a quantum circuit with respect to its input parameters, TFQ allows expectation ops to be configured with one of many built-in differentiation methods using the tfq.Differentiator interface, such as finite differencing, parameter shift rules, and various stochastic methods. The tfq.Differentiator interface also allows users to define their own gradient calculation methods for their specific problem if they desire.
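As a rough sketch of how such a configuration might look, one could attach a parameter-shift differentiator to the analytic expectation op; the generate_differentiable_op call below is our assumption about the differentiator interface rather than a definitive usage:

import tensorflow_quantum as tfq

# Sketch: wrap the analytic expectation op so that its gradients are computed
# with the parameter-shift rule. generate_differentiable_op is assumed here;
# consult the differentiators documentation for the exact interface.
diff = tfq.differentiators.ParameterShift()
ps_expectation_op = diff.generate_differentiable_op(
    analytic_op=tfq.get_expectation_op())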

The tensor representation of circuits and Paulis along with the execution ops are all that are required to solve any problem in QML. However, as a convenience, TFQ provides an additional op for in-graph circuit construction. This was found to be convenient when solving problems where most of the circuit being run is static and only a small part of it is being changed during training or inference. This functionality is provided by the tfq.tfq_append_circuit op. It is expected that all but the most dedicated users will never touch these low-level ops, and instead will interface with TFQ using our tf.keras.layers that provide a simplified interface.

The tools provided by TFQ can interact with both core TensorFlow and, via Cirq, real quantum hardware.




Figure 5. Abstract pipeline for inference and training of a hybrid discriminative model in TFQ. Here, Φ represents the quantum model parameters and θ represents the classical model parameters.

The functionality of all three software products and the interfaces between them can be visualized with the help of a “software-stack” diagram, shown in Fig. 4.

In the next section, we describe an example of an abstract quantum machine learning pipeline for a hybrid discriminator model that TFQ was designed to support. Then we illustrate the TFQ pipeline via a Hello Many-Worlds example, which involves building the simplest possible hybrid quantum-classical model for a binary classification task on a single qubit. More detailed information on the building blocks of TFQ features will be given in section II E.

2. The Abstract TFQ Pipeline for a specific hybrid discriminator model

Here, we provide a high-level abstract overview of the computational steps involved in the end-to-end pipeline for inference and training of a hybrid quantum-classical discriminative model for quantum data in TFQ.

(1) Prepare Quantum Dataset: In general, this might come from a given black-box source. However, as current quantum computers cannot import quantum data from external sources, the user has to specify quantum circuits which generate the data. Quantum datasets are prepared using unparameterized cirq.Circuit objects and are injected into the computational graph using tfq.convert_to_tensor.

(2) Evaluate Quantum Model: Parameterized quantum models can be selected from several categories based on knowledge of the quantum data’s structure. The goal of the model is to perform a quantum computation in order to extract information hidden in a quantum subspace or subsystem. In the case of discriminative learning, this information is the hidden label parameters. To extract a quantum non-local subsystem, the quantum model disentangles the input data, leaving the hidden information encoded in classical correlations, thus making it accessible to local measurements and classical post-processing. Quantum models are constructed using cirq.Circuit objects containing SymPy symbols, and can be attached to quantum data sources using the tfq.AddCircuit layer.

(3) Sample or Average: Measurement of quantum states extracts classical information, in the form of samples from a classical random variable. The distribution of values from this random variable generally depends on both the quantum state itself and the measured observable. As many variational algorithms depend on mean values of measurements, TFQ provides methods for averaging over several runs involving steps (1) and (2). Sampling or averaging are performed by feeding quantum data and quantum models to the tfq.Sample or tfq.Expectation layers.

(4) Evaluate Classical Model: Once classical information has been extracted, it is in a format amenable to further classical post-processing. As the extracted information may still be encoded in classical correlations between measured expectations, classical deep neural networks can be applied to distill such correlations. Since TFQ is fully compatible with core TensorFlow, quantum models can be attached directly to classical tf.keras.layers.Layer objects such as tf.keras.layers.Dense.

(5) Evaluate Cost Function: Given the results of classical post-processing, a cost function is calculated. This may be based on the accuracy of classification if the quantum data was labeled, or other criteria if the task is unsupervised. Wrapping the model built in stages (1) through (4) inside a tf.keras.Model gives the user access to all the losses in the tf.keras.losses module.

(6) Evaluate Gradients & Update Parameters: After evaluating the cost function, the free parameters in the pipeline are updated in a direction expected to decrease the cost. This is most commonly performed via gradient descent. To support gradient descent, TFQ exposes derivatives of quantum operations to the TensorFlow backpropagation machinery via the tfq.differentiators.Differentiator interface. This allows both the quantum and classical models’ parameters to be optimized against quantum data via hybrid quantum-classical backpropagation. See section III for details on the theory.
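As a minimal sketch of what step (6) looks like in code (the one-qubit circuit and operator below are our own illustrative choices, not from the paper), gradients of an expectation value with respect to a TensorFlow variable can be taken with a standard tf.GradientTape:

import cirq, sympy
import tensorflow as tf
import tensorflow_quantum as tfq

# Illustrative one-qubit example: a single symbol 'theta' controls the circuit.
q = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
circuit = tfq.convert_to_tensor([cirq.Circuit(cirq.Rx(theta)(q))])
op = tfq.convert_to_tensor([cirq.Z(q)])

theta_value = tf.Variable([[0.5]])  # classical parameter feeding the circuit
expectation_layer = tfq.layers.Expectation()

with tf.GradientTape() as tape:
    exp = expectation_layer(circuit,
                            symbol_names=['theta'],
                            symbol_values=theta_value,
                            operators=op)
grads = tape.gradient(exp, theta_value)  # backpropagated through the quantum op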

In the next section, we illustrate this abstract pipeline by applying it to a specific example. While simple, the example is the minimum instance of a hybrid quantum-classical model operating on quantum data.

3. Hello Many-Worlds: Binary Classifier for Quantum Data

Binary classification is a basic task in machine learning that can be applied to quantum data as well. As a minimal example of a hybrid quantum-classical model, we present here a binary classifier for regions on a single qubit.



Figure 6. Quantum data represented on the Bloch sphere. States in category a are blue, while states in category b are orange. The vectors are the states around which the samples were taken. The parameters used to generate this data are: θa = 1, θb = 4, and N = 200.

In this task, two random vectors in the X-Z plane of the Bloch sphere are chosen. Around these two vectors, we randomly sample two sets of quantum data points; the task is to learn to distinguish the two sets. An example quantum dataset of this type is shown in Fig. 6. The following can all be run in-browser by navigating to the Colab example notebook at

research/binary_classifier/binary_classifier.ipynb

Additionally, the code in this example can be copy-pasted into a python script after installing TFQ.

To solve this problem, we use the pipeline shown in Fig. 5, specialized to one-qubit binary classification. This specialization is shown in Fig. 7.

The first step is to generate the quantum data. We can use Cirq for this task. The common imports required for working with TFQ are shown below:

import cirq, random, sympy
import numpy as np
import tensorflow as tf
import tensorflow_quantum as tfq

The function below generates the quantum dataset; labels use a one-hot encoding:

def generate_dataset(
        qubit, theta_a, theta_b, num_samples):
    q_data = []
    labels = []
    blob_size = abs(theta_a - theta_b) / 5
    for _ in range(num_samples):
        coin = random.random()
        spread_x, spread_y = np.random.uniform(
            -blob_size, blob_size, 2)
        if coin < 0.5:
            label = [1, 0]
            angle = theta_a + spread_y
        else:
            label = [0, 1]
            angle = theta_b + spread_y
        labels.append(label)
        q_data.append(cirq.Circuit(
            cirq.Ry(-angle)(qubit),
            cirq.Rx(-spread_x)(qubit)))
    return (tfq.convert_to_tensor(q_data),
            np.array(labels))

Figure 7. (1) Quantum data to be classified. (2) Parameterized rotation gate, whose job is to remove superpositions in the quantum data. (3) Measurement along the Z axis of the Bloch sphere converts the quantum data into classical data. (4) Classical post-processing is a two-output SoftMax layer, which outputs probabilities for the data to come from category a or category b. (5) Categorical cross entropy is computed between the predictions and the labels. The Adam optimizer [70] is used to update both the quantum and classical portions of the hybrid model.

We can generate a dataset and the associated labels after picking some parameter values:

qubit = cirq.GridQubit(0, 0)
theta_a = 1
theta_b = 4
num_samples = 200
q_data, labels = generate_dataset(
    qubit, theta_a, theta_b, num_samples)

As our quantum parametric model, we use the simplest case of a universal quantum discriminator [42, 60], a single parameterized rotation (linear) and measurement along the Z axis (non-linear):

theta = sympy.Symbol('theta')
q_model = cirq.Circuit(cirq.Ry(theta)(qubit))
q_data_input = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
expectation = tfq.layers.PQC(
    q_model, cirq.Z(qubit))
expectation_output = expectation(q_data_input)

The purpose of the rotation gate is to minimize the superposition from the input quantum data such that we can get maximum useful information from the measurement. This quantum model is then attached to a small classifier NN to complete our hybrid model. Notice in the code below that quantum layers can appear among classical layers inside a standard Keras model:

classifier = tf.keras.layers.Dense(
    2, activation=tf.keras.activations.softmax)
classifier_output = classifier(
    expectation_output)
model = tf.keras.Model(inputs=q_data_input,
                       outputs=classifier_output)

We can train this hybrid model on the quantum data defined earlier. Below we use as our loss function the cross entropy between the labels and the predictions of the classical NN; the Adam optimizer is chosen for parameter updates.

optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.1)



loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer=optimizer, loss=loss)
history = model.fit(
    x=q_data, y=labels, epochs=50)

Finally, we can use our trained hybrid model to classify new quantum datapoints:

test_data, _ = generate_dataset(
    qubit, theta_a, theta_b, 1)
p = model.predict(test_data)[0]
print(f"prob(a)={p[0]:.4f}, prob(b)={p[1]:.4f}")

This section provided a rapid introduction to just the code needed to complete the task at hand. The following section reviews the features of TFQ in a style closer to an API reference.

E. TFQ Building Blocks

Having provided a minimum working example in the previous section, we now seek to provide more details about the components of the TFQ framework. First, we describe how quantum computations specified in Cirq are converted to tensors for use inside the TensorFlow graph. Then, we describe how these tensors can be combined in-graph to yield larger models. Next, we show how circuits are simulated and measured in TFQ. The core functionality of the framework, differentiation of quantum circuits, is then explored. Finally, we describe our more abstract layers, which can be used to simplify many QML workflows.

1. Quantum Computations as Tensors

As pointed out in section II A, Cirq already contains the language necessary to express quantum computations, parameterized circuits, and measurements. Guided by principle 4, TFQ should allow direct injection of Cirq expressions into the computational graph of TensorFlow. This is enabled by the tfq.convert_to_tensor function. We saw the use of this function in the quantum binary classifier, where a list of data generation circuits specified in Cirq was wrapped in this function to promote them to tensors. Below we show how a quantum data point, a quantum model, and a quantum measurement can be converted into tensors:

q0 = cirq.GridQubit(0, 0)
q_data_raw = cirq.Circuit(cirq.H(q0))
q_data = tfq.convert_to_tensor([q_data_raw])
theta = sympy.Symbol('theta')
q_model_raw = cirq.Circuit(
    cirq.Ry(theta).on(q0))
q_model = tfq.convert_to_tensor([q_model_raw])
q_measure_raw = 0.5 * cirq.Z(q0)
q_measure = tfq.convert_to_tensor(
    [q_measure_raw])

This conversion is backed by our custom serializers. Once a Circuit or PauliSum is serialized, it becomes a tensor of type tf.string. This is the reason for the use of tf.keras.Input(shape=(), dtype=tf.dtypes.string) when creating inputs to Keras models, as seen in the quantum binary classifier example.

2. Composing Quantum Models

After injecting quantum data and quantum models into the computational graph, a custom TensorFlow operation is required to combine them. In support of guiding principle 2, TFQ implements the tfq.layers.AddCircuit layer for combining tensors of circuits. In the following code, we use this functionality to combine the quantum data point and quantum model defined in subsection II E 1:

add_op = tfq.layers.AddCircuit()
data_and_model = add_op(q_data, append=q_model)

To quantify the performance of a quantum model on a quantum dataset, we need the ability to define loss functions. This requires converting quantum information into classical information. This conversion process is accomplished by either sampling the quantum model, which stochastically produces bitstrings according to the probability amplitudes of the model, or by specifying a measurement and taking expectation values.

3. Sampling and Expectation Values

Sampling from quantum circuits is an important use case in quantum computing. The recently achieved milestone of quantum supremacy [13] is one such application, in which the difficulty of sampling from a quantum model was used to gain a computational edge over classical machines.

TFQ implements tfq.layers.Sample, a Keras layer which enables sampling from batches of circuits in support of design objective 2. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, and a tensor of values to substitute for the symbols in the circuit. Given these, the Sample layer produces a tf.RaggedTensor of shape [batch_size, num_samples, n_qubits], where the n_qubits dimension is ragged to account for the possibly varying circuit size over the input batch of quantum data. For example, the following code takes the combined data and model from section II E 2 and produces a tensor of size [1, 4, 1] containing four single-bit samples:

sample_layer = tfq.layers.Sample()
samples = sample_layer(
    data_and_model, symbol_names=['theta'],
    symbol_values=[[0.5]], repetitions=4)



Though sampling is the fundamental interface between quantum and classical information, differentiability of quantum circuits is much more convenient when using expectation values, as gradient information can then be backpropagated (see section III for more details).

In the simplest case, expectation values are simply averages over samples. In quantum computing, expectation values are typically taken with respect to a measurement operator M. This involves sampling bitstrings from the quantum circuit as described above, applying M to the list of bitstring samples to produce a list of numbers, then taking the average of the result. TFQ provides two related layers with this capability:

In contrast to sampling (which is by default in the standard computational basis, the Z eigenbasis of all qubits), taking expectation values requires defining a measurement. As discussed in section II A, these are first defined as cirq.PauliSum objects and converted to tensors. TFQ implements tfq.layers.Expectation, a Keras layer which enables the extraction of measurement expectation values from quantum models. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to substitute for the symbols in the circuit, and a tensor of operators to measure with respect to them. Given these inputs, the layer outputs a tensor of expectation values. Below, we show how to take an expectation value of the measurement defined in section II E 1:

expectation_layer = tfq.layers.Expectation()
expectations = expectation_layer(
    circuit=data_and_model,
    symbol_names=['theta'],
    symbol_values=[[0.5]],
    operators=q_measure)

In Fig. 3, we illustrate the dataflow graph which backs the expectation layer, when the parameter values are supplied by a classical neural network. The expectation layer is capable of using either a simulator or a real device for execution, and this choice is simply specified at run time. While Cirq simulators may be used for the backend, TFQ provides its own native TensorFlow simulator written in performant C++. A description of our quantum circuit simulation code is given in section II F.

Having converted the output of a quantum model into classical information, the results can be fed into subsequent computations. In particular, they can be fed into functions that produce a single number, allowing us to define loss functions over quantum models in the same way we do for classical models.
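For example (purely illustrative, reusing the expectations tensor from the snippet above and an arbitrary target value of our own choosing), a mean-squared-error style loss can be written directly with standard TensorFlow ops:

import tensorflow as tf

# Treat the expectation values from the layer above as model outputs and
# compare them to a target value with a mean squared error.
target = tf.constant([[0.0]])
loss = tf.reduce_mean(tf.square(expectations - target))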

4. Differentiating Quantum Circuits

We have taken the first steps towards implementation of quantum machine learning, having defined quantum models over quantum data and loss functions over those models. As described in both the introduction and our first guiding principle, differentiability is the critical machinery needed to allow training of these models. As described in section II B, the architecture of TensorFlow is optimized around backpropagation of errors for efficient updates of model parameters; one of the core contributions of TFQ is integration with TensorFlow's backpropagation mechanism. TFQ implements this functionality with our differentiators module. The theory of quantum circuit differentiation will be covered in section III C; here, we overview the software that implements the theory.

Since there are many ways to calculate gradients of quantum circuits, TFQ provides the tfq.differentiators.Differentiator interface. Our Expectation and SampledExpectation layers rely on classes inheriting from this interface to specify how TensorFlow should compute their gradients. While advanced users can implement their own custom differentiators by inheriting from the interface, TFQ comes with several built-in options, two of which we highlight here. These two methods are instances of the two main categories of quantum circuit differentiators: the finite difference methods and the parameter shift methods.

The first class of quantum circuit differentiators is the finite difference methods. This class samples the primary quantum circuit for at least two different parameter settings, then combines them to estimate the derivative. The ForwardDifference differentiator provides the most basic version of this. For each parameter in the circuit, the circuit is sampled at the current setting of the parameter. Then, each parameter is perturbed separately and the circuit resampled.

For the 2-local circuits implementable on near-term hardware, methods more sophisticated than finite differences are possible. These methods involve running an ancillary quantum circuit, from which the gradient of the primary circuit with respect to some parameter can be directly measured. One specific method is gate decomposition and parameter shifting [71], implemented in TFQ as the ParameterShift differentiator. For in-depth discussion of the theory, see section III C 2.

The differentiation rule used by our layers is specified through an optional keyword argument. Below, we show the expectation layer being called with our parameter shift rule:

diff = tfq.differentiators.ParameterShift()
expectation = tfq.layers.Expectation(
    differentiator=diff)
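As a brief sketch of how such a layer is then differentiated in practice (reusing the data_and_model and q_measure tensors from above, and assuming the parameter values are stored in a tf.Variable), the gradient with respect to the circuit parameter can be obtained with a standard tf.GradientTape:

import tensorflow as tf

values = tf.Variable([[0.5]])
with tf.GradientTape() as tape:
    exp = expectation(
        data_and_model,
        symbol_names=['theta'],
        symbol_values=values,
        operators=q_measure)
# The gradient is computed using the parameter-shift rule configured above.
grads = tape.gradient(exp, values)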

For further discussion of the trade-offs when choosing between differentiators, see the gradients tutorial on our GitHub website:

docs/tutorials/gradients.ipynb



5. Simplified Layers

Some workflows do not require control as sophisticated as our Expectation, Sample, and SampledExpectation layers allow. For these workflows we provide the PQC and ControlledPQC layers. Both of these layers allow parameterized circuits to be updated by hybrid backpropagation without the user needing to provide the list of symbols associated with the circuit. The PQC layer provides automated Keras management of the variables in a parameterized circuit:

q = cirq.GridQubit(0, 0)
(a, b) = sympy.symbols("a b")
circuit = cirq.Circuit(
    cirq.Rz(a)(q), cirq.Rx(b)(q))
outputs = tfq.layers.PQC(circuit, cirq.Z(q))
quantum_data = tfq.convert_to_tensor([
    cirq.Circuit(), cirq.Circuit(cirq.X(q))])
res = outputs(quantum_data)
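A common follow-on step, sketched below under the assumption that the circuit and readout defined above are reused, is to wrap the PQC layer in a tf.keras.Model so that its managed variables are trained with a standard Keras optimizer:

import tensorflow as tf

circuit_input = tf.keras.Input(shape=(), dtype=tf.string)
pqc_output = tfq.layers.PQC(circuit, cirq.Z(q))(circuit_input)
model = tf.keras.Model(inputs=circuit_input, outputs=pqc_output)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.MeanSquaredError())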

When the variables in a parameterized circuit will be controlled completely by other user-specified machinery, for example by a classical neural network, then the user can call our ControlledPQC layer:

outputs = tfq.layers.ControlledPQC(
    circuit, cirq.Z(q))
model_params = tf.convert_to_tensor(
    [[0.5, 0.5], [0.25, 0.75]])
res = outputs([quantum_data, model_params])

Notice that the call is similar to that for PQC, except that we provide parameter values for the symbols in the circuit. These two layers are used extensively in the applications highlighted in the following sections.

F. Quantum Circuit Simulation with qsim

Concurrently with TFQ, we are open sourcing qsim, a software package for simulating quantum circuits on classical computers. We have adapted its C++ implementation to work inside TFQ's TensorFlow ops. The performance of qsim derives from two key ideas that can be seen in the literature on classical simulators for quantum circuits [72, 73]. The first idea is the fusion of gates in a quantum circuit with their neighbors to reduce the number of matrix multiplications required when applying the circuit to a wavefunction. The second idea is to create a set of matrix multiplication functions specifically optimized for the application of two-qubit gates to state vectors, to take maximal advantage of gate fusion. We discuss these points in detail below. To quantify the performance of qsim, we also provide an initial benchmark comparing qsim to Cirq. We note that qsim is significantly faster on both random and structured circuits. We further note that the qsim benchmark times include the full TFQ software stack of serializing a circuit to ProtoBuffs in Python, conversion of ProtoBuffs to C++ objects inside the dataflow graph for our custom TensorFlow ops, and the relaying of results back to Python.

Figure 8. Visualization of the qsim fusing algorithm. After contraction, only two-qubit gates remain, increasing the speed of subsequent circuit simulation.

1. Comment on the Simulation of Quantum Circuits

To motivate the qsim fusing algorithm, consider how circuits are applied to states in simulation. Suppose we wish to apply two gates $G_1$ and $G_2$ to our initial state $|\psi\rangle$, and suppose these gates act on the same two qubits. Since gate application is associative, we have $(G_2 G_1)|\psi\rangle = G_2(G_1|\psi\rangle)$. However, as the number of qubits $n$ supporting $|\psi\rangle$ grows, the left side of the equality becomes much faster to compute. This is because applying a gate to a state requires broadcasting the parameters of the gate to all $2^n$ elements of the state vector, so that each gate application incurs a cost scaling as $2^n$. In contrast, multiplying two-qubit matrices incurs a small constant cost. This means a simulation of a circuit will be fastest if we pre-multiply as many gates as possible, while keeping the matrix size small, before applying them to a state. This pre-multiplication is called gate fusion and is accomplished with the qsim fusion algorithm.
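The associativity underlying this argument can be checked directly; the toy NumPy sketch below (using random, not unitary, matrices purely for illustration) confirms that fusing two two-qubit gates before application gives the same result as applying them one at a time, while the fused product is a single constant-size 4 x 4 multiplication independent of $n$.

import numpy as np

rng = np.random.default_rng(0)
# Two random 4x4 "gates" acting on the same pair of qubits.
G1 = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
G2 = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # two-qubit state

one_at_a_time = G2 @ (G1 @ psi)   # two full-state applications in a real simulator
fused = (G2 @ G1) @ psi           # one 4x4 product, then a single application
assert np.allclose(one_at_a_time, fused)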

2. Gate Fusion with qsim

The core idea of the fusion algorithm is to multiply each two-qubit gate in the circuit with all of the one-qubit gates around it. Suppose we are given a circuit with $N$ qubits and $T$ timesteps. The circuit can be interpreted as a two-dimensional lattice: the row index $n$ labels the qubit, and the column index $t$ labels the timestep. Fusion is initialized by setting a counter $c_n = 0$ for each $n \in \{1, \ldots, N\}$. Then, for each timestep, check each row for a two-qubit gate $A$; when such a gate is found, the search pauses and $A$ becomes the anchor gate for a round of gate fusion. Suppose $A$ is supported on qubits $n_a$ and $n_b$ and resides at timestep $t$. By construction, on row $n_a$ all gates from timestep $c_{n_a}$ to $t-1$ are single-qubit gates; thus we can multiply them all together and into $A$. We then continue down row $n_a$ starting at timestep $t+1$, multiplying single-qubit gates into $A$, until we encounter another two-qubit gate, at time $t_a$. The fusing on this row stops, and we set $c_{n_a} = t_a$. Then we do the same for row $n_b$. After this process, all one-qubit gates that can be reached from $A$ have been absorbed. If at this point $c_{n_a} = c_{n_b}$, then the gate at timestep $t_a$ is a two-qubit gate which is also supported on qubits $n_a$ and $n_b$; thus this gate can also be multiplied into the anchor gate $A$. If a two-qubit gate is absorbed in this manner, the process of absorbing one-qubit gates starts again from $t_a + 1$. This continues until either $c_{n_a} \neq c_{n_b}$, which means a two-qubit gate has been encountered which is not supported on both $n_a$ and $n_b$, or $c_{n_a} = c_{n_b} = T$, which means the end of the circuit has been reached on these two qubits. Finally, gate $A$ has completed its fusion process, and we continue our search through the lattice for unprocessed two-qubit anchor gates. The qsim fusion algorithm is illustrated on a simple three-qubit circuit in Fig. 8. After this process of gate fusion, we can run simulators optimized for the application of two-qubit gates. These simulators are described next.

3. Hardware-Specific Instruction Sets

With the given quantum circuit fused into the minimal number of two-qubit gates, we need simulators optimized for applying $4 \times 4$ matrices to state vectors. TFQ offers three instruction set tiers, each taking advantage of increasingly modern CPU commands for increasing simulation speed. The simulators are adaptive to the user's available hardware. The first tier is a general simulator designed to work on any CPU architecture. The next uses the SSE2 instruction set [74], and the fastest uses the modern AVX2 instruction set [75]. These three available instruction sets, combined with the fusing algorithm discussed above, ensure users have maximal performance when running algorithms in simulation. The next section illustrates this power with benchmarks comparing the performance of TFQ to the performance of parallelized Cirq running in simulation mode. In the future, we hope to expand the range of custom simulation hardware supported to include GPU and TPU integration.

4. Benchmarks

Here, we demonstrate the performance of TFQ, backed by qsim, relative to Cirq on two benchmark simulation tasks. As detailed above, the performance difference is due to circuit pre-processing via gate fusion combined with performant C++ simulators. The benchmarks were performed on a desktop equipped with an Intel(R) Xeon(R) W-2135 CPU running at 3.70 GHz. This CPU supports the AVX2 instruction set, which at the time of writing is the fastest hardware tier supported by TFQ. In the following simulations, TFQ uses a single core, while Cirq simulation is allowed to parallelize over all available cores using the Python multiprocessing module and the map function.

The first benchmark task is simulation of 500 supremacy-style circuits batched 50 at a time. These circuits were generated using the Cirq function cirq.generate_boixo_2018_supremacy_circuits_v2.


Figure 9. Performance of TFQ (orange) and Cirq (blue) on simulation tasks. The plots show the base 10 logarithm of the total time to solution (in seconds) versus the number of qubits simulated. (a) Simulation of 500 random circuits, taken in batches of 50; circuits were of depth 40. At the largest tested size of 20 qubits, we see approximately 7 times savings in simulation time versus Cirq. (b) Simulation of structured circuits. The smaller scale of entanglement makes these circuits more amenable to compaction via the qsim Fusion algorithm. At the largest tested size of 20 qubits, we see TFQ is up to 100 times faster than parallelized Cirq.

These circuits are only tractable for benchmarking due to the small numbers of qubits involved here. These circuits have very little structure, involving many interleaved two-qubit gates to generate entanglement as quickly as possible. In summary, at the largest benchmarked problem size of 20 qubits, qsim achieves an approximately 7-fold improvement in simulation time over Cirq. The performance curves are shown in Fig. 9. We note that the interleaved structure of these circuits means little gate fusion is possible, so that this performance boost is mostly due to the speed of our C++ simulators.

When the simulated circuits have more structure, the Fusion algorithm allows us to achieve a larger performance boost by reducing the number of gates that ultimately need to be simulated. The circuits for this task are a factorized version of the supremacy-style circuits which generate entanglement only on small subsets of the qubits. In summary, for these circuits, we find a roughly 100 times improvement in simulation time in TFQ versus Cirq. The performance curves are shown in Fig. 9.

Thus we see that in addition to our core functionality of implementing native TensorFlow gradients for quantum circuits, TFQ also provides a significant boost in performance over Cirq when running in simulation mode. Additionally, as noted before, this performance boost is achieved despite the additional overhead of serialization between the TensorFlow frontend and qsim proper.



III. THEORY OF HYBRID QUANTUM-CLASSICAL MACHINE LEARNING

In the previous section, we reviewed the building blocks required for use of TFQ. In this section, we consider the theory behind the software. We define quantum neural networks as products of parameterized unitary matrices. Samples and expectation values are defined by expressing the loss function as an inner product. With quantum neural networks and expectation values defined, we can then define gradients. We finally combine quantum and classical neural networks and formalize hybrid quantum-classical backpropagation, one of the core components of TFQ.

A. Quantum Neural Networks

A Quantum Neural Network ansatz can generally be written as a product of layers of unitaries in the form

U(\theta) = \prod_{\ell=1}^{L} V^{\ell} U^{\ell}(\theta^{\ell}),   (1)

where the $\ell$th layer of the QNN consists of the product of $V^{\ell}$, a non-parametric unitary, and $U^{\ell}(\theta^{\ell})$, a unitary with variational parameters (note the superscripts here represent indices rather than exponents). The multi-parameter unitary of a given layer can itself generally be comprised of multiple unitaries $\{U_j^{\ell}(\theta_j^{\ell})\}_{j=1}^{M_\ell}$ applied in parallel:

U^{\ell}(\theta^{\ell}) \equiv \bigotimes_{j=1}^{M_\ell} U_j^{\ell}(\theta_j^{\ell}).   (2)

Finally, each of these unitaries $U_j^{\ell}$ can be expressed as the exponential of some generator $g_j^{\ell}$, which itself can be any Hermitian operator on $n$ qubits (thus expressible as a linear combination of $n$-qubit Paulis),

U_j^{\ell}(\theta_j^{\ell}) = e^{-i\theta_j^{\ell} g_j^{\ell}}, \qquad g_j^{\ell} = \sum_{k=1}^{K_{j\ell}} \beta_k^{j\ell} P_k,   (3)

where $P_k \in \mathcal{P}_n$, here $\mathcal{P}_n$ denotes the Paulis on $n$ qubits [76], and $\beta_k^{j\ell} \in \mathbb{R}$ for all $k, j, \ell$. For a given $j$ and $\ell$, in the case where all the Pauli terms commute, i.e. $[P_k, P_m] = 0$ for all $m, k$ such that $\beta_m^{j\ell}, \beta_k^{j\ell} \neq 0$, one can simply decompose the unitary into a product of exponentials of each term,

U_j^{\ell}(\theta_j^{\ell}) = \prod_k e^{-i\theta_j^{\ell}\beta_k^{j\ell} P_k}.   (4)

Otherwise, in instances where the various terms do not commute, one may apply a Trotter-Suzuki decomposition of this exponential [77], or other quantum simulation methods [78].

Figure 10. High-level depiction of a multilayer quantum neural network (also known as a parameterized quantum circuit), at various levels of abstraction. (a) At the most detailed level we have both fixed and parameterized quantum gates; any parameterized operation is compiled into a combination of parameterized single-qubit operations. (b) Many singly-parameterized gates $W_j(\theta_j)$ form a multi-parameterized unitary $V_\ell(\vec{\theta}_\ell)$ which performs a specific function. (c) The product of multiple unitaries $V_\ell$ generates the quantum model $U(\theta)$ shown in (d).

Note that in the above case, where the unitary of a given parameter is decomposable as the product of exponentials of Pauli terms, one can explicitly express the layer as

U_j^{\ell}(\theta_j^{\ell}) = \prod_k \left[\cos(\theta_j^{\ell}\beta_k^{j\ell})\, I - i \sin(\theta_j^{\ell}\beta_k^{j\ell})\, P_k\right].   (5)

The above form will be useful for our discussion of gradients of expectation values below.
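To make the layered structure of Eq. (1) concrete, the sketch below builds an illustrative ansatz in Cirq (an assumed example, not a construction prescribed by TFQ) in which each layer consists of parameterized single-qubit exponentials playing the role of $U^{\ell}(\theta^{\ell})$, followed by a fixed entangling unitary $V^{\ell}$ made of CZ gates:

import cirq
import sympy

def qnn_ansatz(qubits, n_layers):
    """Illustrative layered ansatz: parametric rotations + fixed CZ ladder."""
    params = sympy.symbols('theta0:{}'.format(n_layers * len(qubits)))
    circuit = cirq.Circuit()
    for l in range(n_layers):
        # Parameterized part U^l(theta^l): single-qubit Pauli exponentials.
        for j, q in enumerate(qubits):
            circuit += cirq.rx(params[l * len(qubits) + j])(q)
        # Non-parametric part V^l: a fixed ladder of CZ entangling gates.
        for a, b in zip(qubits, qubits[1:]):
            circuit += cirq.CZ(a, b)
    return circuit, list(params)

circuit, symbols = qnn_ansatz(cirq.GridQubit.rect(1, 4), n_layers=2)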

B. Sampling and Expectations

To optimize the parameters of an ansatz from equation (1), we need a cost function to optimize. In the case of standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian,

f(\theta) = \langle H \rangle_{\theta} \equiv \langle \Psi_0 | U^{\dagger}(\theta) H U(\theta) | \Psi_0 \rangle,   (6)

where $|\Psi_0\rangle$ is the input state to the parameterized circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators, e.g. in the form

H = \sum_{k=1}^{N} \alpha_k h_k \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h},   (7)

where we defined a vector of coefficients $\boldsymbol{\alpha} \in \mathbb{R}^N$ and a vector of $N$ operators $\boldsymbol{h}$. Often this decomposition is chosen such that each of these sub-Hamiltonians is in the $n$-qubit Pauli group, $h_k \in \mathcal{P}_n$. The expectation value of this Hamiltonian is then generally evaluated via quantum



expectation estimation, i.e. by taking the linear combination of expectation values of each term,

f(\theta) = \langle H \rangle_{\theta} = \sum_{k=1}^{N} \alpha_k \langle h_k \rangle_{\theta} \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h}_{\theta},   (8)

where we introduced the vector of expectations $\boldsymbol{h}_{\theta} \equiv \langle \boldsymbol{h} \rangle_{\theta}$. In the case of non-commuting terms, the various expectation values $\langle h_k \rangle_{\theta}$ are estimated over separate runs.

Note that, in practice, each of these quantum expectations is estimated via sampling of the output of the quantum computer [28]. Even assuming a perfect fidelity of quantum computation, sampling measurement outcomes of eigenvalues of observables from the output of the quantum computer to estimate an expectation will have some non-negligible variance for any finite number of samples. Assuming each of the Hamiltonian terms of equation (7) admits a Pauli operator decomposition as

h_j = \sum_{k=1}^{J_j} \gamma_k^{j} P_k,   (9)

where the $\gamma_k^{j}$'s are real-valued coefficients and the $P_k$'s are Pauli operators [76], then to get an estimate of the expectation value $\langle h_j \rangle$ within an accuracy $\varepsilon$, one needs to take a number of measurement samples scaling as $\sim \mathcal{O}(\|h_j\|_*^2/\varepsilon^2)$, where $\|h_j\|_* = \sum_{k=1}^{J_j} |\gamma_k^{j}|$ is the Pauli coefficient norm of each Hamiltonian term. Thus, to estimate the expectation value of the full Hamiltonian (7) accurately within a precision $\tilde{\varepsilon} = \varepsilon \sum_k |\alpha_k|^2$, we would need on the order of $\sim \mathcal{O}\big(\tfrac{1}{\varepsilon^2}\sum_k |\alpha_k|^2 \|h_k\|_*^2\big)$ measurement samples in total [26, 79], as we would need to measure each expectation independently if we are following the quantum expectation estimation trick of (8).

This is in sharp contrast to classical methods for gradients involving backpropagation, where gradients can be estimated to numerical precision, i.e. within a precision $\varepsilon$ with $\sim \mathcal{O}(\mathrm{PolyLog}(\varepsilon))$ overhead. Although there have been attempts to formulate a backpropagation principle for quantum computations [54], these methods also rely on the measurement of a quantum observable, thus also requiring $\sim \mathcal{O}(\tfrac{1}{\varepsilon^2})$ samples.

As we will see in the following section III C, estimating gradients of quantum neural networks on quantum computers involves the estimation of several expectation values of the cost function for various values of the parameters. One trick that was recently pointed out [46, 80], and which has proven successful both theoretically and empirically for estimating such gradients, is the stochastic selection of various terms in the quantum expectation estimation. This can greatly reduce the number of measurements needed per gradient update; we will cover this in subsection III C 3.

C. Gradients of Quantum Neural Networks

Now that we have established how to evaluate the loss function, let us describe how to obtain gradients of the cost function with respect to the parameters. Why should we care about gradients of quantum neural networks? In classical deep learning, the most common family of optimization heuristics for the minimization of cost functions are gradient-based techniques [81–83], which include stochastic gradient descent and its variants. To leverage gradient-based techniques for the learning of multilayered models, the ability to rapidly differentiate error functionals is key. For this, the backwards propagation of errors [84] (colloquially known as backprop) is a now canonical method to progressively calculate gradients of parameters in deep networks. In its most general form, this technique is known as automatic differentiation [85], and has become so central to deep learning that this feature of differentiability is at the core of several frameworks for deep learning, including of course TensorFlow (TF) [86], JAX [87], and several others.

To be able to train hybrid quantum-classical models (section III D), the ability to take gradients of quantum neural networks is key. Now that we understand the greater context, let us describe a few techniques below for the estimation of these gradients.

1. Finite difference methods

A simple approach is to use finite-difference methods, for example the central difference method,

\partial_k f(\theta) = \frac{f(\theta + \varepsilon\Delta_k) - f(\theta - \varepsilon\Delta_k)}{2\varepsilon} + \mathcal{O}(\varepsilon^2),   (10)

which, in the case where there are $M$ continuous parameters, involves $2M$ evaluations of the objective function, each evaluation varying the parameters by $\varepsilon$ in some direction, thereby giving us an estimate of the gradient of the function with a precision $\mathcal{O}(\varepsilon^2)$. Here $\Delta_k$ is a unit-norm perturbation vector in the $k$th direction of parameter space, $(\Delta_k)_j = \delta_{jk}$. In general, one may use lower-order methods, such as forward difference with $\mathcal{O}(\varepsilon)$ error from $M+1$ objective queries [23], or higher-order methods, such as a five-point stencil method, with $\mathcal{O}(\varepsilon^4)$ error from $4M$ queries [88].
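The sketch below is a straightforward NumPy implementation of the central-difference estimate in Eq. (10) for a generic objective (the toy function at the end is purely illustrative); note the $2M$ objective evaluations it requires.

import numpy as np

def central_difference_gradient(f, theta, eps=1e-3):
    """Estimate the gradient of f at theta with the central difference rule."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        delta = np.zeros_like(theta)
        delta[k] = eps
        grad[k] = (f(theta + delta) - f(theta - delta)) / (2 * eps)
    return grad  # 2M evaluations of f in total

# Toy objective standing in for an expectation value f(theta).
print(central_difference_gradient(lambda t: np.sum(np.sin(t)), [0.1, 0.2]))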

2. Parameter shift methods

As recently pointed out in various works [80, 89], given knowledge of the form of the ansatz (e.g. as in (3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (3) (or, alternatively, if the Hamiltonian has a spectrum $\pm\lambda$ for some positive $\lambda$). For multi-term Hamiltonians, a method to



obtain the analytic gradients using a linear combination of unitaries is proposed in [89]. Here, instead, we will simply use a change of variables and the chain rule to obtain analytic gradients of parametric unitaries of the form (5) without the need for ancilla qubits or additional unitaries.

For a parameter of interest $\theta_j^{\ell}$ appearing in a layer $\ell$ in a parametric circuit of the form (5), consider the change of variables $\eta_k^{j\ell} \equiv \theta_j^{\ell}\beta_k^{j\ell}$; then from the chain rule of calculus [90], we have

\frac{\partial f}{\partial \theta_j^{\ell}} = \sum_k \frac{\partial f}{\partial \eta_k^{j\ell}} \frac{\partial \eta_k^{j\ell}}{\partial \theta_j^{\ell}} = \sum_k \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}}.   (11)

Thus, all we need to compute are the derivatives of the cost function with respect to $\eta_k^{j\ell}$. Due to this change of variables, we need to reparameterize our unitary $U(\theta)$ from (1) as

U^{\ell}(\theta^{\ell}) \mapsto U^{\ell}(\eta^{\ell}) \equiv \bigotimes_{j \in \mathcal{I}_\ell} \Big(\prod_k e^{-i\eta_k^{j\ell} P_k}\Big),   (12)

where $\mathcal{I}_\ell \equiv \{1, \ldots, M_\ell\}$ is an index set for the QNN layers. One can then expand each exponential in the above in a similar fashion to (5):

e^{-i\eta_k^{j\ell} P_k} = \cos(\eta_k^{j\ell})\, I - i \sin(\eta_k^{j\ell})\, P_k.   (13)

As can be shown from this form, the analytic derivative of the expectation value $f(\eta) \equiv \langle \Psi_0 | U^{\dagger}(\eta) H U(\eta) | \Psi_0 \rangle$ with respect to a component $\eta_k^{j\ell}$ can be reduced to the following parameter shift rule [46, 80, 91]:

\partial_{\eta_k^{j\ell}} f(\eta) = f(\eta + \tfrac{\pi}{4}\Delta_k^{j\ell}) - f(\eta - \tfrac{\pi}{4}\Delta_k^{j\ell}),   (14)

where $\Delta_k^{j\ell}$ is a vector representing a unit-norm perturbation of the variable $\eta_k^{j\ell}$ in the positive direction. We thus see that this shift in parameters can generally be much larger than the shifts used in numerical differentiation, as in equation (10). In some cases this is useful, as one does not have to resolve as fine-grained a difference in the cost function as an infinitesimal shift, hence requiring fewer runs to achieve a sufficiently precise estimate of the value of the gradient.
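As a quick numerical sanity check of Eq. (14) (a toy single-qubit example, not TFQ code), take a single Pauli generator $P = X$ and cost operator $H = Z$, so that $f(\eta) = \langle 0| e^{+i\eta X} Z e^{-i\eta X} |0\rangle = \cos(2\eta)$; the $\pi/4$ shift rule then reproduces the analytic derivative $-2\sin(2\eta)$ exactly:

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
psi0 = np.array([1, 0], dtype=complex)

def f(eta):
    # e^{-i eta X} = cos(eta) I - i sin(eta) X, as in Eq. (13).
    U = np.cos(eta) * np.eye(2) - 1j * np.sin(eta) * X
    state = U @ psi0
    return np.real(state.conj() @ Z @ state)

eta = 0.3
analytic = -2 * np.sin(2 * eta)
shift_rule = f(eta + np.pi / 4) - f(eta - np.pi / 4)   # Eq. (14)
assert np.isclose(analytic, shift_rule)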

Note that in order to compute through the chain rule in (11) for a parametric unitary as in (3), we need to evaluate the expectation function $2K_{j\ell}$ times to obtain the gradient of the parameter $\theta_j^{\ell}$. Thus, in some cases where each parameter generates an exponential of many terms, although the gradient estimate is of higher precision, obtaining an analytic gradient can be too costly in terms of required queries to the objective function. To remedy this additional overhead, Harrow et al. [80] proposed to stochastically select terms according to a distribution weighted by the coefficients of each term in the generator, and to perform gradient descent from these stochastic estimates of the gradient. Let us review this stochastic gradient estimation technique as it is implemented in TFQ.

3. Stochastic Parameter Shift Gradient Estimation

Consider the full analytic gradient from (14) and (11): if we have $M_\ell$ parameters in layer $\ell$ and $L$ layers, there are $\sum_{\ell=1}^{L} M_\ell$ terms of the following form to estimate:

\frac{\partial f}{\partial \theta_j^{\ell}} = \sum_{k=1}^{K_{j\ell}} \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}} = \sum_{k=1}^{K_{j\ell}} \Big[\sum_{\pm} \pm\, \beta_k^{j\ell}\, f(\eta \pm \tfrac{\pi}{4}\Delta_k^{j\ell})\Big].   (15)

These terms come from the components of the gradient vector itself, which has dimension equal to the total number of free parameters in the QNN, $\dim(\theta)$. For each of these components, i.e. for the $j$th component of the $\ell$th layer, there are $2K_{j\ell}$ parameter-shifted expectation values to evaluate; thus in total there are $\sum_{\ell=1}^{L} 2K_{j\ell} M_\ell$ parameterized expectation values of the cost Hamiltonian to evaluate.

For practical implementation of this estimation procedure, we must expand this sum further. Recall that, as the cost Hamiltonian generally will have many terms, for each quantum expectation estimation of the cost function for some value of the parameters, we have

f(\theta) = \langle H \rangle_{\theta} = \sum_{m=1}^{N} \alpha_m \langle h_m \rangle_{\theta} = \sum_{m=1}^{N} \sum_{q=1}^{J_m} \alpha_m \gamma_q^{m} \langle P_q^{m} \rangle_{\theta},   (16)

which has $\sum_{j=1}^{N} J_j$ terms. Thus, if we consider that all the terms in (15) are of the form of (16), we see that we have a total of $\sum_{m=1}^{N}\sum_{\ell=1}^{L} 2 J_m K_{j\ell} M_\ell$ expectation values to estimate for the gradient. Note that one of these sums comes from the total number of appearances of parameters in front of Paulis in the generators of the parameterized quantum circuit; the second sum comes from the various terms in the cost Hamiltonian in the Pauli expansion.

As the cost of accurately estimating all these terms one by one, and subsequently linearly combining the values to yield an estimate of the total gradient, may be prohibitively expensive in terms of the number of runs, one can instead stochastically estimate this sum by randomly picking terms according to their weighting [46, 80].

One can sample a distribution over the appearances of a parameter in the QNN, $k \sim \Pr(k|j,\ell) = |\beta_k^{j\ell}| / (\sum_{o=1}^{K_{j\ell}} |\beta_o^{j\ell}|)$; one then estimates the two parameter-shifted terms corresponding to this index in (15) and averages over samples. We consider this case to be simply stochastic gradient estimation for the gradient component corresponding to the parameter $\theta_j^{\ell}$. One can go even further in this spirit: for each of these sampled expectation values, one can also sample terms from (16) according to a similar distribution determined by the magnitude of the Pauli expansion coefficients, sampling the indices $q, m \sim \Pr(q,m) = |\alpha_m \gamma_q^{m}| / (\sum_{d=1}^{N} \sum_{r=1}^{J_d} |\alpha_d \gamma_r^{d}|)$ and estimating the expectation $\langle P_q^{m} \rangle_{\theta}$ for the appropriate parameter-shifted values sampled from the terms of (15) according to the procedure outlined above.



Figure 11. High-level depiction of a quantum-classical neural network. Blue blocks represent Deep Neural Network (DNN) function blocks and orange boxes represent Quantum Neural Network (QNN) function blocks. Arrows represent the flow of information during the feedforward (inference) phase. For an example of the interface between quantum and classical neural network blocks, see Fig. 12.

This is considered doubly stochastic gradient estimation. In principle, one could go one step further and, per iteration of gradient descent, randomly sample indices representing subsets of parameters for which we will estimate the gradient component, setting the gradient components corresponding to the non-sampled indices to 0 for the given iteration. The distribution we sample in this case is given by $\theta_j^{\ell} \sim \Pr(j,\ell) = \sum_{k=1}^{K_{j\ell}} |\beta_k^{j\ell}| / (\sum_{u=1}^{L} \sum_{i=1}^{M_u} \sum_{o=1}^{K_{iu}} |\beta_o^{iu}|)$. This is, in a sense,

akin to the SPSA algorithm [92], in that it is a gradient-based method with a stochastic mask. The above component sampling, combined with doubly stochastic gradient descent, yields what we consider to be triply stochastic gradient descent. This is equivalent to simultaneously sampling $j, \ell, k, q, m \sim \Pr(j,\ell,k,q,m) = \Pr(k|j,\ell)\Pr(j,\ell)\Pr(q,m)$ using the probabilities outlined in the paragraphs above, where $j$ and $\ell$ index the parameter and layer, $k$ is the index from the sum in equation (15), and $q$ and $m$ are the indices of the sum in equation (16).

In TFQ, all three of the stochastic averaging methods above can be turned on or off independently for stochastic parameter-shift gradients. See the details in the Differentiator module of TFQ on GitHub.
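The core sampling step shared by these methods is simply drawing indices with probability proportional to the magnitude of the corresponding coefficients; a minimal NumPy sketch (with made-up coefficients) is shown below.

import numpy as np

# Hypothetical generator coefficients beta_k for one parameter.
beta = np.array([0.7, -0.2, 0.1])
probs = np.abs(beta) / np.abs(beta).sum()

# Draw a term index k with probability |beta_k| / sum_o |beta_o|, as in the text.
rng = np.random.default_rng()
k = rng.choice(len(beta), p=probs)
# The two pi/4-shifted expectations for this term would then be estimated,
# and the results averaged over repeated draws.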

Now that we have reviewed various ways of obtaining gradients of expectation values, let us consider how to go beyond basic variational quantum algorithms and consider fully hybrid quantum-classical neural networks. As we will see, our general framework of gradients of cost Hamiltonians will carry over.

D. Hybrid Quantum-Classical Computational Graphs

1. Hybrid Quantum-Classical Neural Networks

Now, we are ready to formally introduce the notion of Hybrid Quantum-Classical Neural Networks (HQCNNs). HQCNNs are meta-networks comprised of quantum and classical neural network-based function blocks composed with one another in the topology of a directed graph. We can consider this a rendition of a hybrid quantum-classical computational graph where the inner workings (variables, component functions) of various functions are abstracted into boxes (see Fig. 11 for a depiction of such a graph). The edges then simply represent the flow of classical information through the meta-network of quantum and classical functions. The key will be to construct parameterized (differentiable) functions $f_{\theta} : \mathbb{R}^M \to \mathbb{R}^N$ from expectation values of parameterized quantum circuits, and then to create a meta-graph of quantum and classical computational nodes from these blocks. Let us first describe how to create these functions from expectation values of QNNs.

As we saw in equations (6) and (7), we get a differentiable cost function $f : \mathbb{R}^M \to \mathbb{R}$ from taking the expectation value of a single Hamiltonian at the end of the parameterized circuit, $f(\theta) = \langle H \rangle_{\theta}$. As we saw in equations (7) and (8), to compute this expectation value, as the readout Hamiltonian is often decomposed into a linear combination of operators, $H = \boldsymbol{\alpha} \cdot \boldsymbol{h}$ (see (7)), the function is itself a linear combination of expectation values of multiple terms (see (8)), $\langle H \rangle_{\theta} = \boldsymbol{\alpha} \cdot \boldsymbol{h}_{\theta}$, where $\boldsymbol{h}_{\theta} \equiv \langle \boldsymbol{h} \rangle_{\theta} \in \mathbb{R}^N$ is a vector of expectation values. Thus, before the scalar value of the cost function is evaluated, QNNs naturally are evaluated as a vector of expectation values, $\boldsymbol{h}_{\theta}$.

Hence, if we would like the QNN to become more like a classical neural network block, i.e. mapping vectors to vectors $f : \mathbb{R}^M \to \mathbb{R}^N$, we can obtain a vector-valued differentiable function from the QNN by considering it as a function of the parameters which outputs a vector of expectation values of different operators,

f : \theta \mapsto \boldsymbol{h}_{\theta},   (17)

where

(\boldsymbol{h}_{\theta})_k = \langle h_k \rangle_{\theta} \equiv \langle \Psi_0 | U^{\dagger}(\theta) h_k U(\theta) | \Psi_0 \rangle.   (18)

We represent such a QNN-based function in Fig. 12. Note that, in general, each of these $h_k$'s could itself be comprised of multiple terms,

h_k = \sum_{t=1}^{N_k} \gamma_t m_t,   (19)

hence, one can perform Quantum Expectation Estimation to estimate the expectation of each term as $\langle h_k \rangle = \sum_t \gamma_t \langle m_t \rangle$.



Note that, in some cases, instead of the expectation values of the set of operators $\{h_k\}_{k=1}^{M}$, one may instead want to relay the histogram of measurement results obtained from multiple measurements of the eigenvalues of each of these observables. This case can also be phrased as a vector of expectation values, as we will now show. First, note that the histogram of the measurement results of some $h_k$ with eigendecomposition $h_k = \sum_{j=1}^{r_k} \lambda_{jk} |\lambda_{jk}\rangle\langle\lambda_{jk}|$ can be considered as a vector of expectation values where the observables are the eigenstate projectors $|\lambda_{jk}\rangle\langle\lambda_{jk}|$. Instead of obtaining a single real number from the expectation value of $h_k$, we can obtain a vector $\boldsymbol{h}_k \in \mathbb{R}^{r_k}$, where $r_k = \mathrm{rank}(h_k)$ and the components are given by $(\boldsymbol{h}_k)_j \equiv \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\theta}$. We are then effectively considering the categorical (empirical) distribution as our vector.

Now, if we consider measuring the eigenvalues of multiple observables $\{h_k\}_{k=1}^{M}$ and collecting the measurement result histograms, we get a 2-dimensional tensor $(\boldsymbol{h}_{\theta})_{jk} = \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\theta}$. Without loss of generality, we can flatten this array into a vector of dimension $\mathbb{R}^R$, where $R = \sum_{k=1}^{M} r_k$. Thus, considering vectors of expectation values is a relatively general way of representing the output of a quantum neural network. In the limit where the set of observables considered forms an informationally complete set of observables [93], the array of measurement outcomes would fully characterize the wavefunction, albeit at an overhead exponential in the number of qubits.

We should mention that in some cases, instead of expectation values or histograms, single samples from the output of a measurement can be used for direct feedback-control on the quantum system, e.g. in quantum error correction [76]. At least in the current implementation of TFQuantum, since quantum circuits are built in Cirq and this feature is not supported in the latter, such scenarios are currently out of scope. Mathematically, in such a scenario, one could then consider the QNN and measurement as a map from quantum circuit parameters $\theta$ to the conditional random variable $\Lambda_{\theta}$ valued over $\mathbb{R}^{N_k}$ corresponding to the measured eigenvalues $\lambda_{jk}$, with a probability density $\Pr[(\Lambda_{\theta})_k = \lambda_{jk}] = p(\lambda_{jk}|\theta)$ which corresponds to the measurement statistics distribution induced by the Born rule, $p(\lambda_{jk}|\theta) = \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\theta}$. This QNN-and-single-measurement map from the parameters to the conditional random variable, $f : \theta \mapsto \Lambda_{\theta}$, can be considered as a classical stochastic map (a classical conditional probability distribution over output variables given the parameters). In the case where only expectation values are used, this stochastic map reduces to a deterministic node through which we may backpropagate gradients, as we will see in the next subsection. In the case where this map is used dynamically per-sample, it remains a stochastic map, and though there exist some algorithms for backpropagation through stochastic nodes [94], these are not currently supported natively in TFQ.

E. Autodifferentiation through hybrid quantum-classical backpropagation

As described above, hybrid quantum-classical neural network blocks take as input a set of real parameters $\theta \in \mathbb{R}^M$, apply a circuit $U(\theta)$, and take a set of expectation values of various observables, $(\boldsymbol{h}_{\theta})_k = \langle h_k \rangle_{\theta}$.

The result of this parameter-to-expectation-value map is a function $f : \mathbb{R}^M \to \mathbb{R}^N$ which maps parameters to a real-valued vector, $f : \theta \mapsto \boldsymbol{h}_{\theta}$.

This function can then be composed with other parameterized function blocks comprised of either quantum or classical neural network blocks, as depicted in Fig. 11.

To be able to backpropagate gradients through general meta-networks of quantum and classical neural network blocks, we simply have to figure out how to backpropagate gradients through a quantum parameterized block function when it is composed with other parameterized block functions. Due to the partial ordering of the quantum-classical computational graph, we can focus on how to backpropagate gradients through a QNN in a simplified scenario where we combine all function blocks that precede and postcede the QNN block into monolithic function blocks. Namely, we can consider a scenario where we have $f_{\mathrm{pre}} : \boldsymbol{x}_{\mathrm{in}} \mapsto \theta$ ($f_{\mathrm{pre}} : \mathbb{R}^{\mathrm{in}} \to \mathbb{R}^M$) as the block preceding the QNN, the QNN block as $f_{\mathrm{qnn}} : \theta \mapsto \boldsymbol{h}_{\theta}$ ($f_{\mathrm{qnn}} : \mathbb{R}^M \to \mathbb{R}^N$), the post-QNN block as $f_{\mathrm{post}} : \boldsymbol{h}_{\theta} \mapsto \boldsymbol{y}_{\mathrm{out}}$ ($f_{\mathrm{post}} : \mathbb{R}^N \to \mathbb{R}^{N_{\mathrm{out}}}$), and finally the loss function for training the entire network being computed from this output, $\mathcal{L} : \mathbb{R}^{N_{\mathrm{out}}} \to \mathbb{R}$. The composition of functions from the input data to the output loss is then the sequentially composed function $(\mathcal{L} \circ f_{\mathrm{post}} \circ f_{\mathrm{qnn}} \circ f_{\mathrm{pre}})$. This scenario is depicted in Fig. 12.

Now, let us describe the process of backpropagation through this composition of functions. As is standard in backpropagation of gradients through feedforward networks, we begin with the loss function evaluated at the output units and work our way back through the several layers of functional composition of the network to get the gradients. The first step is to obtain the gradient of the loss function $\partial \mathcal{L}/\partial \boldsymbol{y}_{\mathrm{out}}$ and to use classical (regular) backpropagation of gradients to obtain the gradient of the loss with respect to the output of the QNN, i.e. we get $\partial(\mathcal{L} \circ f_{\mathrm{post}})/\partial \boldsymbol{h}$ via the usual use of the chain rule for backpropagation, i.e., the contraction of the Jacobian with the gradient of the subsequent layer, $\partial(\mathcal{L} \circ f_{\mathrm{post}})/\partial \boldsymbol{h} = \frac{\partial \mathcal{L}}{\partial \boldsymbol{y}} \cdot \frac{\partial f_{\mathrm{post}}}{\partial \boldsymbol{h}}$.

Now, let us label the evaluated gradient of the loss function with respect to the QNN's expectation values as

\boldsymbol{g} \equiv \left.\frac{\partial(\mathcal{L} \circ f_{\mathrm{post}})}{\partial \boldsymbol{h}}\right|_{\boldsymbol{h}=\boldsymbol{h}_{\theta}}.   (20)



We can now consider this value as effectively a constant. Let us then define an effective backpropagated Hamiltonian with respect to this gradient as

H_{\boldsymbol{g}} \equiv \sum_k g_k h_k,

where $g_k$ are the components of (20). Notice that expectations of this Hamiltonian are given by

\langle H_{\boldsymbol{g}} \rangle_{\theta} = \boldsymbol{g} \cdot \boldsymbol{h}_{\theta},

and so, the gradients of the expectation value of this Hamiltonian are given by

\frac{\partial}{\partial \theta_j} \langle H_{\boldsymbol{g}} \rangle_{\theta} = \frac{\partial}{\partial \theta_j} (\boldsymbol{g} \cdot \boldsymbol{h}_{\theta}) = \sum_k g_k \frac{\partial h_{\theta,k}}{\partial \theta_j},

which is exactly the Jacobian of the QNN function $f_{\mathrm{qnn}}$ contracted with the backpropagated gradient of previous layers. Explicitly,

\partial(\mathcal{L} \circ f_{\mathrm{post}} \circ f_{\mathrm{qnn}})/\partial \theta = \frac{\partial}{\partial \theta} \langle H_{\boldsymbol{g}} \rangle_{\theta}.

Thus, by taking gradients of the expectation value of the backpropagated effective Hamiltonian, we can get the gradients of the loss function with respect to QNN parameters, thereby successfully backpropagating gradients through the QNN. Further backpropagation through the preceding function block $f_{\mathrm{pre}}$ can be done using standard classical backpropagation by using this evaluated QNN gradient.

Note that for a given value of $\boldsymbol{g}$, the effective backpropagated Hamiltonian is simply a fixed Hamiltonian operator; as such, taking gradients of the expectation of a single multi-term operator can be achieved by any of the methods for taking gradients of QNNs described earlier in this section.

Backpropagation through parameterized quantum circuits is enabled by our Differentiator interface.

We offer finite difference, regular parameter shift, and stochastic parameter shift gradients, while the general interface allows users to define custom gradient methods.
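To tie the above together in code, the following is a minimal sketch (the circuit, sizes, and layer choices are illustrative assumptions, not the paper's models) of the hybrid graph of Fig. 12: a classical Dense layer plays the role of $f_{\mathrm{pre}}$ and produces the circuit parameters, ControlledPQC evaluates the expectation values $\boldsymbol{h}_{\theta}$, and a final Dense layer acts as $f_{\mathrm{post}}$; Keras then backpropagates through all three blocks using the configured differentiator.

import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

q = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
qnn_circuit = cirq.Circuit(cirq.rx(theta)(q))

circuit_input = tf.keras.Input(shape=(), dtype=tf.string)   # quantum data
classical_input = tf.keras.Input(shape=(1,))                # classical data

controller = tf.keras.layers.Dense(1)(classical_input)      # f_pre: produces theta
expectation = tfq.layers.ControlledPQC(
    qnn_circuit, cirq.Z(q))([circuit_input, controller])    # f_qnn: h_theta
output = tf.keras.layers.Dense(1)(expectation)              # f_post: y_out

model = tf.keras.Model(
    inputs=[circuit_input, classical_input], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')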

IV. BASIC QUANTUM APPLICATIONS

The following examples show how one may use the various features of TFQ to reproduce and extend existing results in second generation quantum machine learning. Each application has an associated Colab notebook which can be run in-browser to reproduce any results shown. Here, we use snippets of code from those example notebooks for illustration; please see the example notebooks for full code details.

Figure 12. Example of inference and hybrid backpropagation at the interface of a quantum and classical part of a hybrid computational graph. Here we have classical deep neural networks (DNN) both preceding and postceding the quantum neural network (QNN). In this example, the preceding DNN outputs a set of parameters $\theta$ which are then used by the QNN as parameters for inference. The QNN outputs a vector of expectation values (estimated through several runs) whose components are $(\boldsymbol{h}_{\theta})_k = \langle h_k \rangle_{\theta}$. This vector is then fed as input to another (postceding) DNN, and the loss function $\mathcal{L}$ is computed from this output. For backpropagation through this hybrid graph, one first backpropagates the gradient of the loss through the postceding DNN to obtain $g_k = \partial \mathcal{L}/\partial h_k$. Then, one takes the gradient of the following functional of the output of the QNN: $f_{\theta} = \boldsymbol{g} \cdot \boldsymbol{h}_{\theta} = \sum_k g_k h_{\theta,k} = \sum_k g_k \langle h_k \rangle_{\theta}$ with respect to the QNN parameters $\theta$ (which can be achieved with any of the methods for taking gradients of QNNs described in previous subsections of this section). This completes the backpropagation of gradients of the loss function through the QNN; the preceding DNN can use the now computed $\partial \mathcal{L}/\partial \theta$ to further backpropagate gradients to preceding nodes of the hybrid computational graph.

A. Hybrid Quantum-Classical Convolutional Neural Network Classifier

To run this example in the browser through Colab, follow the link:

docs/tutorials/qcnn.ipynb

1. Background

Supervised classification is a canonical task in classical machine learning. Similarly, it is also one of the most well-studied applications for QNNs [15, 30, 42, 95, 96]. As such, it is a natural starting point for our exploration of applications for quantum machine learning. Discriminative machine learning with hierarchical models can be understood as a form of compression to isolate the information containing the label [97]. In the case of quantum data, the hidden classical parameter (real scalar in the case of regression, discrete label in the case of classification) can be embedded in a non-local subsystem or subspace of the quantum system. One then has to perform some disentangling quantum transformation to extract information from this non-local subspace.



To choose an architecture for a neural network model, one can draw inspiration from the symmetries in the training data. For example, in computer vision, one often needs to detect corners and edges regardless of their position in an image; we thus postulate that a neural network to detect these features should be invariant under translations. In classical deep learning, an example of such translationally-invariant neural networks are convolutional neural networks. These networks tie parameters across space, learning a shared set of filters which are applied equally to all portions of the data.

To the best of the authors' knowledge, there is no strong indication that we should expect a quantum advantage for the classification of classical data using QNNs in the near term. For this reason, we focus on classifying quantum data as defined in section I C. There are many kinds of quantum data with translational symmetry. One example of such quantum data are cluster states. These states are important because they are the initial states for measurement-based quantum computation [98, 99]. In this example we will tackle the problem of detecting errors in the preparation of a simple cluster state. We can think of this as a supervised classification task: our training data will consist of a variety of correctly and incorrectly prepared cluster states, each paired with their label. This classification task can be generalized to condensed matter physics and beyond, for example to the classification of phases near quantum critical points, where the degree of entanglement is high.

Since our simple cluster states are translationally invariant, we can extend the spatial parameter-tying of convolutional neural networks to quantum neural networks, using recent work by Cong et al. [52], which introduced a Quantum Convolutional Neural Network (QCNN) architecture. QCNNs are essentially a quantum circuit version of a MERA (Multiscale Entanglement Renormalization Ansatz) network [100]. MERA has been extensively studied in condensed matter physics. It is a hierarchical representation of highly entangled wavefunctions. The intuition is that as we go higher in the network, the wavefunction's entanglement gets renormalized (coarse grained) and simultaneously a compressed representation of the wavefunction is formed. This is akin to the compression effects encountered in deep neural networks [101].

Here we extend the QCNN architecture to include classical neural network postprocessing, yielding a Hybrid Quantum Convolutional Neural Network (HQCNN). We perform several low-depth quantum operations in order to begin to extract hidden parameter information from a wavefunction, then pass the resulting statistical information to a classical neural network for further processing. Specifically, we will apply one layer of the hierarchy of the QCNN. This allows us to partially disentangle the input state and obtain statistics about values of multi-local observables. In this strategy, we deviate from the original construction of Cong et al. [52]. Indeed, we are more in the spirit of classical convolutional networks, where there are several families of filters, or feature maps, applied in a translationally-invariant fashion. Here, we apply a single QCNN layer followed by several feature maps. The outputs of these feature maps can then be fed into classical convolutional network layers, or, in this particular simplified example, directly into fully-connected layers.

Target problems:

1. Learn to extract classical information hidden in correlations of a quantum system

2. Utilize shallow quantum circuits via hybridization with classical neural networks to extract information

Required TFQ functionalities:

1. Hybrid quantum-classical network models
2. Batch quantum circuit simulator
3. Quantum expectation based back-propagation
4. Fast classical gradient-based optimizer

2. Implementations

As discussed in section II D 2, the first step in the QML pipeline is the preparation of quantum data. In this example, our quantum dataset is a collection of correctly and incorrectly prepared cluster states on 8 qubits, and the task is to classify these states. The dataset preparation proceeds in two stages; in the first stage, we generate a correctly prepared cluster state:

def cluster_state_circuit(bits):
    circuit = cirq.Circuit()
    circuit.append(cirq.H.on_each(bits))
    for this_bit, next_bit in zip(
            bits, bits[1:] + [bits[0]]):
        circuit.append(
            cirq.CZ(this_bit, next_bit))
    return circuit

Errors in cluster state preparation will be simulated by applying Rx(θ) gates that rotate a qubit about the X-axis of the Bloch sphere by some amount 0 ≤ θ ≤ 2π. These excitations will be labeled 1 if the rotation is larger than some threshold, and -1 otherwise. Since the correctly prepared cluster state is always the same, we can think of it as the initial state in the pipeline and append the excitation circuits corresponding to various error states:

cluster_state_bits = cirq.GridQubit.rect(1, 8)
excitation_input = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
cluster_state = tfq.layers.AddCircuit()(
    excitation_input,
    prepend=cluster_state_circuit(cluster_state_bits))

Note how excitation_input is a standard Keras data ingester. The datatype of the input is string to account for our circuit serialization mechanics described in section II E 1.

Having prepared our dataset, we begin construction of our model.



Figure 13. The quantum portion of our classifiers, shown on 4 qubits. The combination of quantum convolution (blue) and quantum pooling (orange) reduces the system size from 4 qubits to 2 qubits.

The quantum portion of all the models we consider in this section will be made of the same operations: quantum convolution and quantum pooling. A visualization of these operations on 4 qubits is shown in Fig. 13. Quantum convolution layers are enacted by applying a two-qubit unitary $U(\vec{\theta})$ with a stride of 1. In analogy with classical convolutional layers, the parameters of the unitaries are tied, so that the same operation is applied to every nearest-neighbor pair of qubits. Pooling is achieved using a different two-qubit unitary $G(\vec{\phi})$ designed to disentangle, allowing information to be projected from 2 qubits down to 1. The code below defines the quantum convolution and quantum pooling operations:

def quantum_conv_circuit(bits, syms):
    circuit = cirq.Circuit()
    for a, b in zip(bits[0::2], bits[1::2]):
        circuit += two_q_unitary([a, b], syms)
    for a, b in zip(bits[1::2], bits[2::2] + [bits[0]]):
        circuit += two_q_unitary([a, b], syms)
    return circuit

def quantum_pool_circuit(srcs, snks, syms):
    circuit = cirq.Circuit()
    for src, snk in zip(srcs, snks):
        circuit += two_q_pool(src, snk, syms)
    return circuit

In the code, two_q_unitary constructs a general parameterized two-qubit unitary [102], while two_q_pool represents a CNOT with general one-qubit unitaries on the control and target qubits, allowing for variational selection of the control and target basis.

With the quantum portion of our model defined, we move on to the third and fourth stages of the QML pipeline, measurement and classical post-processing. We consider three classifier variants, each containing a different degree of hybridization with classical networks:

1. Purely quantum CNN
2. Hybrid CNN in which the outputs of a truncated QCNN are fed into a standard densely connected neural net
3. Hybrid CNN in which the outputs of multiple truncated QCNNs are fed into a standard densely connected neural net


Figure 14. Architecture of the purely quantum CNN for detecting excited cluster states.

The first model we construct uses only quantum operations to decorrelate the inputs. After preparing the cluster state dataset on N = 8 qubits, we repeatedly apply the quantum convolution and pooling layers until the system size is reduced to 1 qubit. We average the output of the quantum model by measuring the expectation of Pauli-Z on this final qubit. Measurement and parameter control are enacted via our tfq.layers.PQC object. The code for this model is shown below:

readout_operators = cirq.Z(
    cluster_state_bits[-1])
quantum_model = tfq.layers.PQC(
    create_model_circuit(cluster_state_bits),
    readout_operators)(cluster_state)
qcnn_model = tf.keras.Model(
    inputs=[excitation_input],
    outputs=[quantum_model])

In the code, create_model_circuit is a function which applies the successive layers of quantum convolution and quantum pooling. A simplified version of the resulting model on 4 qubits is shown in Fig. 14.

With the model constructed, we turn to training and validation. These steps can be accomplished using standard Keras tools. During training, the model output on each quantum datapoint is compared against the label; the cost function used is the mean squared error between the model output and the label, where the mean is taken over each batch from the dataset. The training and validation code is shown below:

qcnn_model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.losses.mse)

(train_excitations, train_labels,
 test_excitations, test_labels
 ) = generate_data(cluster_state_bits)

history = qcnn_model.fit(
    x=train_excitations,
    y=train_labels,
    batch_size=16,
    epochs=25,
    validation_data=(
        test_excitations, test_labels))

In the code, the generate_data function builds the excitation circuits that are applied to the initial cluster state input, along with the associated labels.




Figure 15. A simple hybrid architecture in which the outputs of a truncated QCNN are fed into a classical neural network.

The loss plots for both the training and validation datasets can be generated by running the associated example notebook.

We now consider a hybrid classifier. Instead of using quantum layers to pool all the way down to 1 qubit, we can truncate the QCNN and measure a vector of operators on the remaining qubits. The resulting vector of expectation values is then fed into a classical neural network. This hybrid model is shown schematically in Fig. 15.

This can be achieved in TFQ with a few simple modifications to the previous model, implemented with the code below:

# Build multi-readout quantum layer
readouts = [cirq.Z(bit) for bit in
            cluster_state_bits[4:]]
quantum_model_dual = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)

# Build classical neural network layers
d1_dual = tf.keras.layers.Dense(8)(
    quantum_model_dual)
d2_dual = tf.keras.layers.Dense(1)(d1_dual)
hybrid_model = tf.keras.Model(
    inputs=[excitation_input],
    outputs=[d2_dual])

In the code, multi_readout_model_circuit applies just one round of convolution and pooling, reducing the system size from 8 to 4 qubits. This hybrid model can be trained using the same Keras tools as the purely quantum model. Accuracy plots can be seen in the example notebook.

The third architecture we will explore creates three independent quantum filters, and combines the outputs from all three with a single classical neural network. This architecture is shown in Fig. 16. This multi-filter architecture can be implemented in TFQ as below:

# Build 3 quantum filters
QCNN_1 = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)
QCNN_2 = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)
QCNN_3 = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)

# Feed all QCNNs into a classical NN
concat_out = tf.keras.layers.concatenate(
    [QCNN_1, QCNN_2, QCNN_3])
dense_1 = tf.keras.layers.Dense(8)(concat_out)
dense_2 = tf.keras.layers.Dense(1)(dense_1)
multi_qconv_model = tf.keras.Model(
    inputs=[excitation_input],
    outputs=[dense_2])

Figure 16. A hybrid architecture in which the outputs of 3 separate truncated QCNNs are fed into a classical neural network.

We find that for the same optimization settings, the purely quantum model trains the slowest, while the three-quantum-filter hybrid model trains the fastest. This data is shown in Fig. 17. This demonstrates the advantage of exploring hybrid quantum-classical architectures for classifying quantum data.

Figure 17. Mean squared error loss as a function of training epoch for three different hybrid classifiers. We find that the purely quantum classifier trains the slowest, while the hybrid architecture with multiple quantum filters trains the fastest.



B. Hybrid Machine Learning for Quantum Control

To run this example in the browser through Colab, follow the link:

research/control/control.ipynb

Recently, neural networks have been successfully deployed for solving quantum control problems, ranging from optimizing gate decompositions and error correction subroutines to continuous Hamiltonian controls. To fully leverage the power of neural networks without being hobbled by possible computational overhead, it is essential to obtain a deeper understanding of the connection between various neural network representations and different types of quantum control dynamics. We demonstrate tailoring machine learning architectures to the underlying quantum dynamics using TFQ in [103]. As a summary of how the unique functionalities of TFQ ease quantum control optimization, we list the problem definition and required TFQ toolboxes as follows.

Target problems:

1. Learning quantum dynamics
2. Optimizing quantum control signals with regard to a cost objective
3. Error mitigation in realistic quantum devices

Required TFQ functionalities:

1. Hybrid quantum-classical network model
2. Batch quantum circuit simulator
3. Quantum expectation-based backpropagation
4. Fast classical optimizers, both gradient based and non-gradient based

We exemplify the importance of appropriately choosing the right neural network architecture for the corresponding quantum control problems with two simple but realistic control optimizations. The two types of controls we have considered cover the full range of quantum dynamics: constant Hamiltonian evolution vs. time-dependent Hamiltonian evolution. In the first problem, we design a DNN to machine-learn (noise-free) control of a single qubit. In the second problem, we design an RNN with long-term memory to learn a stochastic non-Markovian control noise model.

1. Time-Constant Hamiltonian Control

If the underlying system Hamiltonian is time-invariant, the task of quantum control can be simplified with open-loop optimization. Since the optimal solution is independent of instance-by-instance control actualization, control optimization can be done offline. Let x be the input to a controller, which produces some control vector g = F(x). This control vector actuates a system which then produces a vector output H(g). For a set of control inputs x_j and desired outputs y_j, we can then define the controller error as e_j(F) = |y_j − H(F(x_j))|. The optimal control problem can then be defined as minimizing ∑_j e_j(F) over F. This optimal controller will produce the optimal control vector g*_j given x_j.

Figure 18. Architecture for the hybrid quantum-classical neural network model for learning a quantum control decomposition.

This problem can be solved exactly if H and the relationships between g*_j and x_j are well understood and invertible. Alternatively, one can find an approximate solution using a parameterized controller F in the form of a feed-forward neural network. We can calculate a set of control pairs of size N: {x_i, y_i} with i ∈ [N]. We can input x_i into F, which is parameterized by the weight matrices W_i and biases b_i of each ith layer. A successful training will find network parameters W_i and b_i such that for any given input x_i the network outputs g_i which leads to a system output H(g_i) ≈ y_i. This architecture is shown schematically in Fig. 18.

There are two important reasons behind the use of supervised learning for practical control optimization. Firstly, not all time-invariant Hamiltonian control problems permit analytical solutions, so inverting the control error function map can be computationally costly. Secondly, realistic deployment of even a time-constant quantum controller faces stochastic fluctuations due to noise in the classical electronics and systematic errors which cause the behavior of the system H to deviate from ideal. Deploying supervised learning with experimentally measured control pairs will therefore enable finding robust quantum control solutions in the face of systematic control offsets and electronic noise. However, this necessitates seamlessly connecting the output of a classical neural network with the execution of a quantum circuit in the laboratory. This functionality is offered by TFQ through ControlledPQC.

We showcase the use of supervised learning with hybrid quantum-classical feed-forward neural networks in TFQ for the single-qubit gate decomposition problem. A general unitary transform on one qubit can be specified by the exponential

U(φ, θ1, θ2) = e−iφ(cos θ1Z+sin θ1(cos θ2X+sin θ2Y )). (21)

However, it is often preferable to enact single-qubit unitaries using rotations about a single axis at a time. Therefore, given a single-qubit gate specified by the vector of three rotations φ, θ1, θ2, we want to find the control sequence that implements this gate in the form

U(β, γ, δ) = e^{iα} e^{-iβZ/2} e^{-iγY/2} e^{-iδZ/2}.  (22)

This, namely the Bloch theorem, gives the optimal decomposition of any unknown single-qubit unitary into hardware-friendly gates which only involve rotation about a single fixed axis at a time.

The first step in the training involves preparing the training data. Since quantum control optimization only focuses on the performance in hardware deployment, the control inputs and outputs have to be chosen such that they are experimentally observable. We define the vector of expectation values yi = [⟨X⟩_xi, ⟨Y⟩_xi, ⟨Z⟩_xi] of all single-qubit Pauli operators given by the quantum state prepared by the associated input xi:

|ψ_0^i⟩ = U_0^i |0⟩,  (23)
|ψ⟩_x = e^{-iβZ/2} e^{-iγY/2} e^{-iδZ/2} |ψ_0^i⟩,  (24)
⟨X⟩_x = ⟨ψ|_x X |ψ⟩_x,  (25)
⟨Y⟩_x = ⟨ψ|_x Y |ψ⟩_x,  (26)
⟨Z⟩_x = ⟨ψ|_x Z |ψ⟩_x.  (27)

Assuming we have prepared the training dataset, each entry consists of an input vector xi = [φ, θ1, θ2] derived from randomly drawn g, a unitary U_0^i that prepares the initial state, and the associated expectation values yi = [⟨X⟩_xi, ⟨Y⟩_xi, ⟨Z⟩_xi].

Now we are ready to define the hybrid quantum-classical neural network model in Keras with the TensorFlow API. To start, we first define the quantum part of the hybrid neural network, which is a simple quantum circuit of three single-qubit gates, as follows.

control_params = sympy.symbols('theta_1:4')  # three control angles: beta, gamma, delta

qubit = cirq.GridQubit(0, 0)
control_circ = cirq.Circuit(
    cirq.Rz(control_params[2])(qubit),
    cirq.Ry(control_params[1])(qubit),
    cirq.Rz(control_params[0])(qubit))

We are now ready to finish off the hybrid network by defining the classical part, which maps the target params to the control vector g = (β, γ, δ). Assuming we have defined the vector of observables ops, the code to build the model is:

circ_in = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
x_in = tf.keras.Input((3,))
d1 = tf.keras.layers.Dense(128)(x_in)
d2 = tf.keras.layers.Dense(128)(d1)
d3 = tf.keras.layers.Dense(64)(d2)
g = tf.keras.layers.Dense(3)(d3)
# Feed the learned control vector g into the controlled PQC.
exp_out = tfq.layers.ControlledPQC(
    control_circ, ops)([circ_in, g])

Now we are ready to put everything together to define and train a model in Keras. The two-axis control model is defined as follows:

Figure 19. Mean squared error on the training and validation datasets, each of size 5000, as a function of training epoch.

model = tf.keras.Model(
    inputs=[circ_in, x_in], outputs=exp_out)

To train this hybrid supervised model, we define an optimizer, which in our case is the Adam optimizer, with an appropriately chosen loss function:

model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')

We finish off by training on the prepared supervised data in the standard way:

history_two_axis = model.fit(...)
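A plausible full invocation (using hypothetical tensor names, not the notebook's exact code) would pass the converted circuit tensor together with the classical inputs and the measured expectation values:

# Hypothetical names: circuit_tensor holds the state-preparation circuits converted
# with tfq.convert_to_tensor, train_x the [phi, theta_1, theta_2] vectors, and
# train_y the measured Pauli expectation values.
history_two_axis = model.fit(
    x=[circuit_tensor, train_x],
    y=train_y,
    epochs=100,
    validation_split=0.1)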

The training converges after around 100 epochs, as seen in Fig. 19, which also shows excellent generalization to validation data.

2. Time-dependent Hamiltonian Control

Now we consider a second kind of quantum control problem, where the actuated system H is allowed to change in time. If the system is changing with time, the optimal control g* is also generally time varying. Generalizing the discussion of section IV B 1, we can write the time-varying control error given the time-varying controller F(t) as e_j(F(t), t) = |y_j − H(F(x_j, t), t)|. The optimal control can then be written as g*(t) = g* + δ(t).

This task is significantly harder than the problem discussed in section IV B 1, since we need to learn the hidden variable δ(t), which can result in potentially highly complex real-time system dynamics. We showcase how TFQ provides the perfect toolbox for such difficult control optimization with an important and realistic problem: learning and thus compensating for low-frequency noise.

One of the main contributions to time-drifting errors in realistic quantum control is 1/f^α-like errors, which encapsulate errors in the Hamiltonian amplitudes whose frequency spectrum has a large component in the low-frequency regime. The origin of such low-frequency noise remains largely controversial. Mathematically, we can parameterize the low-frequency noise in the time domain with the amplitude of the Pauli Z Hamiltonian on each qubit as:

H_low(t) = α t^e Z.  (28)

Figure 20. Mean squared error of LSTM predictions on 500 randomly generated inputs.

A simple phase control signal is given by ω(t) = ω_0 t with the Hamiltonian H_0(t) = ω_0 t Z. The time-dependent wavefunction is then given by

|ψ(t_i)⟩ = T[e^{-i ∫_0^{t_i} (H_low(t) + H_0(t)) dt}] |+⟩.  (29)

We can attempt to learn the noise parameters α and e by training a recurrent neural network to perform time-series prediction on the noise. In other words, given a record of expectation values ⟨ψ(t_i)| X |ψ(t_i)⟩ for t_i ∈ {0, δt, 2δt, . . . , T} obtained on state |ψ(t)⟩, we want the RNN to predict the future observables ⟨ψ(t_i)| X |ψ(t_i)⟩ for t_i ∈ {T, T + δt, T + 2δt, . . . , 2T}.

There are several possible ways to build such an RNN. One option is to record several time series on a device a priori, then train and test an RNN offline. Another option is an online method, which would allow for real-time controller tuning. The offline method will be briefly explained here, leaving the details of both methods to the notebook associated with this example.

First, we can use TFQ or Cirq to prepare several time series for testing and validation. The function below performs this task using TFQ:

def generate_data(end_time, timesteps, omega_0,
                  exponent, alpha):
    t_steps = np.linspace(0, end_time, timesteps)
    q = cirq.GridQubit(0, 0)
    phase = sympy.symbols("phaseshift")
    c = cirq.Circuit(cirq.H(q),
                     cirq.Rz(phase)(q))
    ops = [cirq.X(q)]
    # Accumulated control phase plus the low-frequency drift term of Eq. (28).
    phases = t_steps*omega_0 + \
        alpha*t_steps**(exponent + 1)/(exponent + 1)
    return tfq.layers.Expectation()(
        c,
        symbol_names=[phase],
        symbol_values=np.transpose([phases]),
        operators=ops)
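A hypothetical invocation (parameter values here are illustrative only, not the notebook's) that prepares a batch of noise realizations might look like:

# Illustrative only: draw random drift parameters for each realization.
noise_realizations = [
    generate_data(end_time=1.0, timesteps=100, omega_0=1.0,
                  exponent=np.random.uniform(-1.2, -0.8),
                  alpha=np.random.uniform(0.5, 1.5))
    for _ in range(50)]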

We can use this function to prepare many realizations of the noise process, which can be fed to an LSTM defined using tf.keras as below:

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(
        rnn_units,
        recurrent_initializer='glorot_uniform',
        batch_input_shape=[batch_size, None, 1]),
    tf.keras.layers.Dense(1)])

We can then train this LSTM on the realizations and evaluate the success of our training using validation data, on which we calculate the prediction accuracy. A typical example of this is shown in Fig. 20. The LSTM converges quickly, reaching the accuracy allowed by the expectation value measurements within 30 epochs.
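As a minimal sketch (not the notebook's exact code), assuming hypothetical tensors past_series of shape [batch_size, timesteps, 1] and next_values of shape [batch_size, 1] built from the realizations above (with val_past, val_next the analogous validation tensors), training proceeds with the usual Keras calls:

# Shapes must be consistent with the batch_input_shape fixed above.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.MeanSquaredError())
history = model.fit(x=past_series, y=next_values,
                    validation_data=(val_past, val_next),
                    batch_size=batch_size,
                    epochs=30)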

C. Quantum Approximate Optimization

To run this example in the browser through Colab, follow the link:

research/qaoa/qaoa.ipynb

1. Background

In this section, we introduce the basics of Quantum Approximate Optimization Algorithms and show how to implement a basic case of this class of algorithms in TFQ. In the advanced applications section V, we explore how to apply meta-learning techniques [104] to the optimization of the parameters of the algorithm.

The Quantum Approximate Optimization Algorithm was originally proposed to solve instances of the MaxCut problem [12]. The QAOA framework has since been extended to encompass multiple problem classes related to finding the low-energy states of Ising Hamiltonians, including higher-order and continuous-variable Hamiltonians [48].

In general, the goal of the QAOA for binary variables is to find approximate minima of a pseudo-Boolean function f on n bits, f(z), z ∈ {−1, 1}^n. This function is often an mth-order polynomial of binary variables for some positive integer m, e.g., f(z) = ∑_{p∈{0,1}^m} α_p z^p, where z^p = ∏_{j=1}^{n} z_j^{p_j}. QAOA has been applied to NP-hard problems such as MaxCut [12] or Max-3-Lin-2 [105]. The case where this polynomial is quadratic (m = 2) has been extensively explored in the literature. It should be noted that there have also been recent advances using quantum-inspired machine learning techniques, such as deep generative models, to produce approximate solutions to such problems [106, 107]. These 2-local problems will be the main focus in this example. In this tutorial, we first show how to utilize TFQ to solve a MaxCut instance with QAOA with p = 1.

The QAOA approach to optimization starts in an initial product state |ψ0⟩^⊗n; a tunable gate sequence then produces a wavefunction with a high probability of being measured in a low-energy state (with respect to a cost Hamiltonian).


Let us define our parameterized quantum circuit ansatz. The canonical choice is to start with a uniform superposition |ψ0⟩^⊗n = |+⟩^⊗n = (1/√(2^n)) ∑_{x∈{0,1}^n} |x⟩, hence a fixed state. The QAOA unitary itself then consists of applying

U(η, γ) = ∏_{j=1}^{P} e^{-iη_j H_M} e^{-iγ_j H_C},  (30)

onto the starter state, where H_M = ∑_{j∈V} X_j is known as the mixer Hamiltonian, and H_C ≡ f(Z) is our cost Hamiltonian, which is a function of the Pauli operators Z = {Z_j}_{j=1}^n. The resulting state is given by |Ψ_ηγ⟩ = U(η, γ) |+⟩^⊗n, which is our parameterized output. We define the energy to be minimized as the expectation value of the cost Hamiltonian H_C ≡ f(Z), where Z = {Z_j}_{j=1}^n, with respect to the parameterized output state.

Target problems:

1. Train a parameterized quantum circuit for a discrete optimization problem (MaxCut)

2. Minimize a cost function of a parameterized quantum circuit

Required TFQ functionalities:

1. Conversion of simple circuits to TFQ tensors

2. Evaluation of gradients for quantum circuits

3. Use of gradient-based optimizers from TF

2. Implementation

For the MaxCut QAOA, the cost Hamiltonian function f is a second-order polynomial of the form

H_C = f(Z) = ∑_{{j,k}∈E} (1/2)(I − Z_j Z_k),  (31)

where G = {V, E} is the graph for which we would like to find the MaxCut: the largest subset of edges (cut set) such that the vertices at the ends of these edges belong to different subsets of a partition of the vertices into two disjoint sets [12].

To train the QAOA, we simply optimize the expectation value of our cost Hamiltonian with respect to our parameterized output to find (approximately) optimal parameters: η*, γ* = argmin_{η,γ} L(η, γ), where L(η, γ) = ⟨Ψ_ηγ| H_C |Ψ_ηγ⟩ is our loss. Once trained, we use the QPU to sample the probability distribution of measurements of the parameterized output state at optimal angles in the standard basis, x ∼ p(x) = |⟨x|Ψ_{η*γ*}⟩|², and pick the lowest-energy bitstring from those samples as our approximate optimum found by the QAOA.

Let us walk through how to implement such a basic QAOA in TFQ. The first step is to generate an instance of the MaxCut problem. For this tutorial we generate a random 3-regular graph with 10 nodes using NetworkX [108].

# generate a 3-regular graph with 10 nodes

maxcut_graph = nx.random_regular_graph(n=10,d=3)

The next step is to allocate 10 qubits, to define the Hadamard layer generating the initial superposition state, the mixing Hamiltonian H_M, and the cost Hamiltonian H_C.

# define 10 qubits
cirq_qubits = cirq.GridQubit.rect(1, 10)

# create layer of hadamards to initialize the superposition
# state over all computational basis states
hadamard_circuit = cirq.Circuit()
for node in maxcut_graph.nodes():
    qubit = cirq_qubits[node]
    hadamard_circuit.append(cirq.H.on(qubit))

# define the two parameters for one block of QAOA
qaoa_parameters = sympy.symbols('a b')

# define the mixing and the cost Hamiltonians
mixing_ham = 0
for node in maxcut_graph.nodes():
    qubit = cirq_qubits[node]
    mixing_ham += cirq.PauliString(cirq.X(qubit))

cost_ham = maxcut_graph.number_of_edges()/2
for edge in maxcut_graph.edges():
    qubit1 = cirq_qubits[edge[0]]
    qubit2 = cirq_qubits[edge[1]]
    cost_ham += cirq.PauliString(1/2*(cirq.Z(qubit1)*cirq.Z(qubit2)))

With this, we generate the unitaries representing the quantum circuit:

# generate the qaoa circuit
qaoa_circuit = tfq.util.exponential(
    operators=[cost_ham, mixing_ham],
    coefficients=qaoa_parameters)

Subsequently, we use these ingredients to build our model. We note that in this case QAOA has no input data and labels, as we have mapped our graph to the QAOA circuit. To use the TFQ framework we specify the Hadamard circuit as input and convert it to a TFQ tensor. We may then construct a tf.keras model using our QAOA circuit and cost in a TFQ PQC layer, and use a single instance sample for training the variational parameters of the QAOA, with the Hadamard gates as an input layer and a target value of 0 for our loss function, as this is the theoretical minimum of this optimization problem.

This translates into the following code:

# define the model and training data
model_circuit, model_readout = qaoa_circuit, cost_ham
input_ = [hadamard_circuit]
input_ = tfq.convert_to_tensor(input_)
optimum = [0]

# Build the Keras model.
optimum = np.array(optimum)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(), dtype=tf.dtypes.string))
model.add(tfq.layers.PQC(model_circuit, model_readout))

To optimize the parameters of the ansatz state, we use a classical optimization routine. In general, it would be possible to use pre-calculated parameters [109] or to implement optimization routines tailored to QAOA [110]. For this tutorial, we choose the Adam optimizer implemented in TensorFlow. We also choose the mean absolute error as our loss function.

model.compile(loss=tf.keras.losses.mean_absolute_error,
              optimizer=tf.keras.optimizers.Adam())
history = model.fit(input_, optimum, epochs=1000, verbose=1)
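Once training has converged, the optimized angles can be read out of the PQC layer and used to draw candidate bitstrings. A hedged sketch of this final sampling step (our own illustration, not shown in the paper's snippet) is:

# Read the trained QAOA angles out of the PQC layer and sample the full circuit
# (Hadamard layer followed by the QAOA ansatz) in the computational basis.
optimal_params = model.trainable_variables[0].numpy()
full_circuit = hadamard_circuit + qaoa_circuit
bitstring_samples = tfq.layers.Sample()(
    full_circuit,
    symbol_names=list(qaoa_parameters),
    symbol_values=[optimal_params],
    repetitions=1000)
# The lowest-energy bitstring among the samples is the approximate MaxCut solution.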

V. ADVANCED QUANTUM APPLICATIONS

The following applications represent how we have applied TFQ to accelerate the discovery of new quantum algorithms. The examples presented in this section are newer research compared to the previous section; as such, they have not had as much time for feedback from the community. We include them here as they are demonstrations of the sort of advanced QML research that can be accomplished by combining several building blocks provided in TFQ. As many of these examples involve the building and training of hybrid quantum-classical models and advanced optimizers, such research would be much more difficult to implement without TFQ. In our researchers' experience, the performance gains and the ease of use of TFQ decreased the time-to-working-prototype from weeks to days or even hours when compared to using alternative tools.

Finally, as we would like to provide users with advanced examples to see TFQ in action for research use-cases beyond basic implementations, along with the examples presented in this section are several notebooks accessible on Github:

github.com/tensorflow/quantum/tree/research

We encourage readers to read the section below for an overview of the theory and use of TFQ functions, and we encourage avid readers who want to experiment with the code to visit the full notebooks.

A. Meta-learning for Variational Quantum Optimization

To run this example in the browser through Colab, follow the link:

research/metalearning qaoa/metalearning qaoa.ipynb

Figure 21. Quantum-classical computational graph for the meta-learning optimization of the recurrent neural network (RNN) optimizer and a quantum neural network (QNN) over several optimization iterations. The hidden state of the RNN is represented by h; we represent the flow of data used to evaluate the meta-learning loss function. This meta-loss function L is a functional of the history of expectation value estimate samples y = {y_t}_{t=1}^T; it is not directly dependent on the RNN parameters ϕ. TFQ's hybrid quantum-classical backpropagation then becomes key to train the RNN to learn to optimize the QNN, which in our particular example was the QAOA. Figure taken from [45], originally inspired from [104].

In section IV C we showed how to implement the basic QAOA in TFQ and optimize it with a gradient-based optimizer; we can now explore how to leverage classical neural networks to optimize QAOA parameters.

In recent works, the use of classical recurrent neural networks to learn to optimize the parameters [45] (or gradient descent hyperparameters [111]) was proposed. As the choice of parameters after each iteration of quantum-classical optimization can be seen as the task of generating a sequence of parameters which converges rapidly to an approximate optimum of the landscape, we can use a type of classical neural network that is naturally suited to generating sequential data, namely recurrent neural networks. This technique was derived from work by DeepMind [104] for optimization of classical neural networks and was extended to apply to quantum neural networks [45].

The application of such learning-to-learn techniques to quantum neural networks was first proposed in [45]. In this work, an RNN (long short-term memory; LSTM) gets fed the parameters of the current iteration and the value of the expectation of the cost Hamiltonian of the QAOA, as depicted in Fig. 21. More precisely, the RNN receives as input the previous QNN query's estimated cost function expectation y_t ∼ p(y|θ_t), where y_t is the estimate of ⟨H⟩_t, as well as the parameters for which the QNN was evaluated, θ_t. The RNN at this time step also receives information stored in its internal hidden state from the previous time step, h_t. The RNN itself has trainable parameters ϕ, and hence it applies the parameterized mapping

h_{t+1}, θ_{t+1} = RNN_ϕ(h_t, θ_t, y_t),  (32)


which generates a new suggestion for the QNN parameters as well as a new hidden state. Once this new set of QNN parameters is suggested, the RNN sends it to the QPU for evaluation and the loop continues.

The RNN is trained over random instances of QAOA problems selected from an ensemble of possible QAOA MaxCut problems. See the notebook for full details on the meta-training dataset of sampled problems.

The loss function we chose for our experiments is the observed improvement at each time step, summed over the history of the optimization:

L(ϕ) = E_{f,y}[ ∑_{t=1}^{T} min{f(θ_t) − min_{j<t} f(θ_j), 0} ].  (33)

The observed improvement at time step t is given by the difference between the proposed value, f(θ_t), and the best value obtained over the history of the optimization until that point, min_{j<t} f(θ_j).
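As a small illustrative sketch (not taken from the notebook), the inner sum of Eq. (33) for a single problem instance could be computed from a hypothetical tensor cost_history holding the sequence of f(θ_t) estimates:

import tensorflow as tf

def observed_improvement_loss(cost_history):
    # cost_history: 1-D tensor of f(theta_t) estimates, t = 1..T, for one instance.
    losses = []
    for t in range(1, int(cost_history.shape[0])):
        best_so_far = tf.reduce_min(cost_history[:t])
        # The contribution is negative only when the new proposal improves on the best so far.
        losses.append(tf.minimum(cost_history[t] - best_so_far, 0.0))
    return tf.reduce_sum(losses)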

In our particular example in this section, we will consider a time horizon of 5 time steps, hence the RNN will have to learn to very rapidly approximately optimize the parameters of the QAOA. Results are featured in Fig. 22. The details of the implementation are available in the Colab. Here is an overview of the problem that was tackled and the TFQ features that facilitated this implementation:

Target problems:

1. Learning to learn with quantum neural networks via classical neural networks

2. Building a neural-network-based optimizer for QAOA

3. Lowering the number of iterations needed to optimize QAOA

Required TFQ functionalities:

1. Hybrid quantum-classical networks and hybrid backpropagation

2. Batching training over quantum data (QAOA problem instances)

3. Integration with TF for the classical RNN

B. Vanishing Gradients and Adaptive Layerwise Training Strategies

1. Random Quantum Circuits and Barren Plateaus

When using parameterized quantum circuits for a learning task, inevitably one must choose an initial configuration and a training strategy that is compatible with that initialization. In contrast to problems with more known structure, such as specific quantum simulation tasks [24] or optimizations [112], the structure of circuits used for learning may need to be more adaptive or general to encompass unknown data distributions. In classical machine learning, this problem can be partially addressed by using a network with sufficient expressive power and random initialization of the network parameters.

Figure 22. The path chosen by the RNN optimizer on a 12-qubit MaxCut problem after being trained on a set of random 10-qubit MaxCut problems. We see that the neural network learned to generalize its heuristic to larger system sizes, as originally pointed out in [45].

Unfortunately, it has been proven that due to fundamental limits on quantum readout complexity, in combination with the geometry of the quantum space, an exact duplication of this strategy is doomed to fail [113]. In particular, in analogy to the vanishing gradients problem that has plagued deep classical networks and historically slowed their progress [114], an exacerbated version of this problem appears in randomly initialized quantum circuits of sufficient depth. This problem, also known as the problem of barren plateaus, refers to the overwhelming volume of quantum space with an exponentially small gradient, making straightforward training impossible if one enters one of these dead regions. The rate of this vanishing increases exponentially with the number of qubits and depends on whether the cost function is global or local [115]. While strategies have been developed to deal with the challenges of vanishing gradients classically [116], the combination of differences in readout complexity and other constraints of unitarity make direct implementation of these fixes challenging. In particular, the readout of information from a quantum system has a complexity of O(1/ε^α), where ε is the desired precision and α is some small integer, while the complexity of the same task classically often scales as O(log 1/ε) [117]. This means that for a vanishingly small gradient (e.g. 10^-7), a classical algorithm can easily obtain at least some signal, while a quantum-classical one may diffuse essentially randomly until ~10^14 samples have been taken. This has fundamental consequences for the methods one uses to train, as we detail below. The requirement on depth to reach these plateaus is only that a portion of the circuit approximates a unitary 2-design, which can occur at a depth of O(n^{1/d}), where n is the number of qubits and d is the dimension of the connectivity of the quantum circuit, possibly requiring as little depth as O(log(n)) in the all-to-all case [118]. One may imagine that a solution to this problem could be to simply initialize a circuit to the identity to avoid this problem, but this incurs some subtle challenges. First, such a fixed initialization will tend to bias results on general datasets. This challenge has been studied in the context of more general block-identity initialization schemes [119].

Perhaps the more insidious way this problem arises is that training with a method like stochastic gradient descent (or sophisticated variants like Adam) on the entire network can accidentally lead one onto a barren plateau if the learning rate is too high. This is due to the fact that the barren plateau argument is one of volume of space and quantum-classical information exchange, and random diffusion in parameter space will tend to lead one onto a plateau. This means that even the most clever initialization can be thwarted by the impact of this phenomenon on the training process. In practice this severely limits the learning rate and hence the training efficiency of QNNs.

For this reason, one can consider training on subsets of the network which do not have the ability to completely randomize during a random walk. This layerwise learning strategy allows one to use larger learning rates and improves training efficiency in quantum circuits [120]. We advocate the use of these strategies in combination with appropriately designed local cost functions in order to circumvent the dramatically worse problems with objectives like fidelity [113, 115]. TFQ has been designed to make experimenting with both of these strategies straightforward for the user, as we now document. For an example of barren plateaus, see the notebook at the following link:

docs/tutorials/barren plateaus.ipynb

2. Layerwise quantum circuit learning

So far, the network training methods demonstrated in section IV have focused on simultaneous optimization of all network parameters, or end-to-end training. As alluded to in the section on the barren plateau effect (V B), this type of strategy, when combined with a network of sufficient depth, can force reduced learning rates, even with clever initialization. While this may not be the optimal strategy for every realization of a quantum network, TFQ is designed to easily facilitate testing of this idea in conjunction with different cost functions to enhance efficiency. An alternative strategy that has been beneficial is layerwise learning (LL) [120], where the number of trained parameters is altered on the fly. In this section, we will learn to alter the architecture of a circuit while it trains, and restrict attention to blocks of a size insufficient to randomize onto a plateau. Among other things, this type of learning strategy can help us avoid initialization on, or drifting throughout training of our QNN onto, a barren plateau [113, 120]. In [121], it is also shown that gradient-based algorithms are more successful in finding global minima with overparameterized circuits, and that shallow circuits approach this limit when increasing in size. It is not necessarily clear when this transition occurs, so LL is a cost-efficient strategy to approach good local minima.

Target problems:

1. Dynamically building circuits for arbitrary learning tasks

2. Manipulating circuit structure and parameters during training

3. Reducing the number of trained parameters and circuit depth

4. Avoiding initialization on, or drifting to, a barren plateau

Required TFQ functionalities:

1. Parameterized circuit layers

2. Keras weight manipulation interface

3. Parameter shift differentiator for exact gradient computation

To run this example in the browser through Colab, follow the link:

research/layerwise learning/layerwise learning.ipynb

As an example to show how this functionality may be explored in TFQ, we will look at randomly generated layers as shown in section V B, where one layer consists of a randomly chosen rotation gate around the X, Y, or Z axis on each qubit, followed by a ladder of CZ gates over all qubits.

def create_layer(qubits, layer_id):
    # create symbols for trainable parameters
    symbols = [
        sympy.Symbol(f'{layer_id}-{str(i)}')
        for i in range(len(qubits))]
    # build layer from random gates
    gates = [
        random.choice([
            cirq.Rx, cirq.Ry, cirq.Rz])(
                symbols[i])(q)
        for i, q in enumerate(qubits)]
    # add connections between qubits
    for control, target in zip(
            qubits, qubits[1:]):
        gates.append(cirq.CZ(control, target))
    return gates, symbols

We assume that we don't know the ideal circuit structure to solve our learning problem, so we start with the shallowest circuit possible and let our model grow from there. In this case we start with one initial layer, and add a new layer after it has trained for 10 epochs. First, we need to specify some variables:

# number of qubits and layers in our circuit

n_qubits = 6

n_layers = 8


# define data and readout qubits

data_qubits = cirq.GridQubit.rect(1, n_qubits)

readout = cirq.GridQubit (0, n_qubits -1)

readout_op = cirq.Z(readout)

# symbols to parametrize circuit

symbols = []

layers = []

weights = []

We use the same training data as specified in the TFQ MNIST classifier example notebook available in the TFQ Github repository, which encodes a downsampled version of the digits into binary vectors. Ones in these vectors are encoded as local X gates on the corresponding qubit in the register, as shown in [23]. For this reason, we also use the readout procedure specified in that work, where a sequence of XHX gates is added to the readout qubit at the end of the circuit. Now we train the circuit, layer by layer:

for layer_id in range(n_layers):
    circuit = cirq.Circuit()
    layer, layer_symbols = create_layer(
        data_qubits, f'layer_{layer_id}')
    layers.append(layer)
    circuit += layers
    symbols += layer_symbols

    # set up the readout qubit
    circuit.append(cirq.X(readout))
    circuit.append(cirq.H(readout))
    circuit.append(cirq.X(readout))
    readout_op = cirq.Z(readout)

    # create the Keras model
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(
        shape=(), dtype=tf.dtypes.string))
    model.add(tfq.layers.PQC(
        model_circuit=circuit,
        operators=readout_op,
        differentiator=ParameterShift(),
        initializer=tf.keras.initializers.Zeros))
    model.compile(
        loss=tf.keras.losses.squared_hinge,
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=0.01))

    # Update model parameters and add
    # new 0 parameters for new layers.
    model.set_weights(
        [np.pad(weights, (n_qubits, 0))])
    model.fit(x_train,
              y_train,
              batch_size=128,
              epochs=10,
              verbose=1,
              validation_data=(x_test, y_test))
    qnn_results = model.evaluate(x_test, y_test)

    # store weights after training a layer
    weights = model.get_weights()[0]

In general, one can choose many different configurations of how many layers should be trained in each step. One can also control which layers are trained by manipulating the symbols we feed into the circuit and keeping track of the weights of previous layers. The number of layers, the number of layers trained at a time, the epochs spent on a layer, and the learning rate are all hyperparameters whose optimal values depend on both the data and the structure of the circuits being used for learning. This example is meant to exemplify how TFQ can be used to easily explore these choices to maximize the efficiency and efficacy of training. See our notebook linked above for the complete implementation of these features. Using TFQ to explore this type of learning strategy relieves us of manually implementing training procedures and optimizers, and of autodifferentiation with the parameter shift rule. It also lets us readily use the rich functionality provided by TensorFlow and Keras. Implementing and testing all of the functionality needed in this project by hand could take up to a week, whereas all this effort reduces to a couple of lines of code with TFQ, as shown in the notebook. Additionally, it lets us speed up training by using the integrated qsim simulator as shown in section II F 4. Last but not least, TFQ provides a thoroughly tested and maintained QML framework which greatly enhances the reproducibility of our research.

C. Hamiltonian Learning with Quantum Graph Recurrent Neural Networks

1. Motivation: Learning Quantum Dynamics with a Quantum Representation

Quantum simulation of time evolution was one of the original envisioned applications of quantum computers when they were first proposed by Feynman [9]. Since then, quantum time evolution simulation methods have seen several waves of great progress, from the early days of Trotter-Suzuki methods, to methods of qubitization and randomized compiling [78], and recently to quantum variational methods for approximate time evolution [122].

The reason that quantum simulation has been such a focus of the quantum computing community is because we have some indications to believe that quantum computers can demonstrate a quantum advantage when evolving quantum states through unitary time evolution; the classical simulation overhead scales exponentially with the depth of the time evolution.

As such, it is natural to consider whether such a potential quantum simulation advantage can be extended to the realm of quantum machine learning as an inverse problem: given access to some black-box dynamics, can we learn a Hamiltonian such that time evolution under this Hamiltonian replicates the unknown dynamics? This is known as the problem of Hamiltonian learning, or quantum dynamics learning, which has been studied in the literature [60, 123]. Here, we use a Quantum Neural Network-based approach to learn the Hamiltonian of a quantum dynamical process, given access to quantum states at various time steps.

As was pointed out in the barren plateaus section V B, attempting to do QML with no prior on the physics of the system or no imposed structure of the ansatz hits the quantum version of the no free lunch theorem; the network has too high a capacity for the problem at hand and is thus hard to train, as evidenced by its vanishing gradients. Here, instead, we use a highly structured ansatz, from work featured in [124]. First of all, given that we know we are trying to replicate quantum dynamics, we can structure our ansatz to be based on Trotter-Suzuki evolution [77] of a learnable parameterized Hamiltonian. This effectively performs a form of parameter tying in our ansatz between several layers representing time evolution. In a previous example on quantum convolutional networks (IV A 1), we performed parameter tying for spatial translation invariance, whereas here we will assume the dynamics remain constant through time and perform parameter tying across time, hence it is akin to a quantum form of recurrent neural networks (RNN). More precisely, as it is a parameterization of a Hamiltonian evolution, it is akin to a quantum form of recently proposed models in classical machine learning called Hamiltonian neural networks [125].

Beyond the quantum RNN form, we can impose further structure. We can consider a scenario where we know we have a one-dimensional quantum many-body system. As Hamiltonians of physical systems have local couplings, we can use our prior assumptions of locality in the Hamiltonian and encode this as a graph-based parameterization of the Hamiltonian. As we will see below, by using a Quantum Graph Recurrent Neural Network [124] implemented in TFQ, we will be able to learn the effective Hamiltonian topology and coupling strengths quite accurately, simply from having access to quantum states at different times and employing mini-batched gradient-based training.

Before we proceed, it is worth mentioning that the approach featured in this section is quite different from the learning of quantum dynamics using a classical RNN featured in the previous example of section IV B. As sampling the output of a quantum simulation at different times can become exponentially hard, we can imagine that for large systems the quantum RNN dynamics learning approach could have primacy over the classical RNN approach, thus potentially demonstrating a quantum advantage of QML over classical ML for this problem.

Target problems:

1. Preparing low-energy states of a quantum system

2. Learning Quantum Dynamics using a Quantum Neural Network Model

Required TFQ functionalities:

1. Quantum compilation of exponentials of Hamiltonians

2. Training multi-layered quantum neural networks with shared parameters

3. Batching QNN training data (input-output pairs and time steps) for supervised learning of a quantum unitary map

2. Implementation

Please see the tutorial notebook for full code details:

research/qgrnn ising/qgrnn ising.ipynb

Here we provide an overview of our implementation. We can define a general Quantum Graph Neural Network as a repeating sequence of exponentials of a Hamiltonian defined on a graph, U_qgnn(η, θ) = ∏_{p=1}^{P} [∏_{q=1}^{Q} e^{-iη_{pq} H_q(θ)}], where the H_q(θ) are generally 2-local Hamiltonians whose coupling topology is that of an assumed graph structure.

In our Hamiltonian learning problem, we aim to learn a target H_target, which will be an Ising model Hamiltonian with J_jk as couplings and B_v as the site bias term of each spin, i.e., H_target = ∑_{j,k} J_jk Z_j Z_k + ∑_v B_v Z_v + ∑_v X_v, given access to pairs of states at different times that were subjected to the target time evolution operator U(T) = e^{-i H_target T}.

We will use a recurrent form of QGNN, using Hamiltonian generators H_1(θ) = ∑_{v∈V} α_v X_v and H_2(θ) = ∑_{{j,k}∈E} θ_jk Z_j Z_k + ∑_{v∈V} φ_v Z_v, with trainable parameters¹ {θ_jk, φ_v, α_v}, for our choice of graph structure prior G = {V, E}. The QGRNN then resembles applying a Trotterized time evolution of a parameterized Ising Hamiltonian H(θ) = H_1(θ) + H_2(θ), where P is the number of Trotter steps. This is a good parameterization to learn the effective Hamiltonian of the black-box dynamics, as we know from quantum simulation theory that Trotterized time evolution operators can closely approximate the true dynamics in the limit |η_jk| → 0 while P → ∞.

For our TFQ software implementation, we can initialize the Ising model and QGRNN model parameters as random values on a graph. It is very easy to construct this kind of graph-structured Hamiltonian by using the Python NetworkX library.

N = 6
dt = 0.01

# Target Ising model parameters
G_ising = nx.cycle_graph(N)
ising_w = [dt * np.random.random() for _ in G_ising.edges]
ising_b = [dt * np.random.random() for _ in G_ising.nodes]

Because the target Hamiltonian and its nearest-neighbor graph structure are unknown to the QGRNN, we need to initialize a new random graph prior for our QGRNN. In this example we will use a random 4-regular graph with a cycle as our prior. Here, params is a list of the trainable parameters of the QGRNN.

1 For simplicity we set α_v to constant 1's in this example.

# QGRNN model parameters
G_qgrnn = nx.random_regular_graph(n=N, d=4)
qgrnn_w = [dt] * len(G_qgrnn.edges)
qgrnn_b = [dt] * len(G_qgrnn.nodes)

theta = ['theta_{}'.format(e) for e in G_qgrnn.edges]
phi = ['phi_{}'.format(v) for v in G_qgrnn.nodes]
params = theta + phi

Now that we have the graph structure and the weights of the edges and nodes, we can construct a Cirq-based Hamiltonian operator which can be directly calculated in Cirq and TFQ. To create a Hamiltonian using cirq.PauliSum's or cirq.PauliString's, we need to assign appropriate qubits to them. Let's assume Hamiltonian() is the Hamiltonian preparation function that generates the cost Hamiltonian from the interaction weights and the mixer Hamiltonian from the bias terms. We can bring in qubits for the Ising and QGRNN models by using cirq.GridQubit.

qubits_ising = cirq.GridQubit.rect(1, N)
qubits_qgrnn = cirq.GridQubit.rect(1, N, 0, N)
ising_cost, ising_mixer = Hamiltonian(
    G_ising, ising_w, ising_b, qubits_ising)
qgrnn_cost, qgrnn_mixer = Hamiltonian(
    G_qgrnn, qgrnn_w, qgrnn_b, qubits_qgrnn)
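The Hamiltonian() helper is defined in the notebook; a minimal sketch of what it might look like, assuming the Ising form given above (weighted ZZ couplings and Z biases for the cost, X terms for the mixer), is:

def Hamiltonian(graph, edge_weights, node_biases, qubits):
    # Cost: weighted ZZ terms on the graph edges plus Z bias terms on the nodes.
    cost = cirq.PauliSum()
    for w, (u, v) in zip(edge_weights, graph.edges):
        cost += w * cirq.Z(qubits[u]) * cirq.Z(qubits[v])
    for b, v in zip(node_biases, graph.nodes):
        cost += b * cirq.Z(qubits[v])
    # Mixer: a transverse X term on every node.
    mixer = cirq.PauliSum()
    for v in graph.nodes:
        mixer += cirq.PauliString(cirq.X(qubits[v]))
    return cost, mixer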

To train the QGRNN, we need to create an ensemble of states which are to be subjected to the unknown dynamics. We chose to prepare low-energy states by first performing a Variational Quantum Eigensolver (VQE) [28] optimization to obtain an approximate ground state. Following this, we can apply different amounts of simulated time evolution onto this state to obtain a varied dataset. This emulates having a physical system in a low-energy state and randomly picking the state at different times. First things first, let us build a VQE model:

def VQE(H_target, q):
    # Parameters
    x = ['x{}'.format(i) for i, _ in enumerate(q)]
    z = ['z{}'.format(i) for i, _ in enumerate(q)]
    symbols = x + z
    circuit = cirq.Circuit()
    circuit.append(cirq.X(q_)**sympy.Symbol(x_)
                   for q_, x_ in zip(q, x))
    circuit.append(cirq.Z(q_)**sympy.Symbol(z_)
                   for q_, z_ in zip(q, z))

Now that we have a parameterized quantum circuit, we can minimize the expectation value of the given Hamiltonian. Again, we can construct a Keras model with Expectation. Because the expectation values are computed per operator, we need to sum them up at the end.

    circuit_input = tf.keras.Input(
        shape=(), dtype=tf.string)
    output = tfq.layers.Expectation()(
        circuit_input,
        symbol_names=symbols,
        operators=tfq.convert_to_tensor(
            [H_target]))
    output = tf.math.reduce_sum(
        output, axis=-1, keepdims=True)

Finally, we can get the approximate lowest-energy states of the VQE model by compiling and training the above Keras model.²

    model = tf.keras.Model(
        inputs=circuit_input, outputs=output)
    adam = tf.keras.optimizers.Adam(
        learning_rate=0.05)

    low_bound = -np.sum(np.abs(ising_w + ising_b)) - N
    inputs = tfq.convert_to_tensor([circuit])
    outputs = tf.convert_to_tensor([[low_bound]])
    model.compile(optimizer=adam, loss='mse')
    model.fit(x=inputs, y=outputs,
              batch_size=1, epochs=100)

    params = model.get_weights()[0]
    res = {k: v for k, v in zip(symbols, params)}
    return cirq.resolve_parameters(circuit, res)

Now that the VQE function is built, we can generate the initial quantum data input, with low-energy states near the ground state of the target Hamiltonian, for both our data and our input state to the QGRNN.

H_target = ising_cost + ising_mixer

low_energy_ising = VQE(H_target , qubits_ising)

low_energy_qgrnn = VQE(H_target , qubits_qgrnn)

The QGRNN is fed the same input data as the true process. We will use gradient-based training over minibatches of randomized timesteps chosen for our QGRNN and the target quantum evolution. We will thus need to aggregate the results among the different timestep evolutions to train the QGRNN model. To create these time evolution exponentials, we can use the tfq.util.exponential function to exponentiate our target and QGRNN Hamiltonians³:

exp_ising_cost = tfq.util.exponential(
    operators=ising_cost)
exp_ising_mix = tfq.util.exponential(
    operators=ising_mixer)
exp_qgrnn_cost = tfq.util.exponential(
    operators=qgrnn_cost, coefficients=params)
exp_qgrnn_mix = tfq.util.exponential(
    operators=qgrnn_mixer)

Here we randomly pick 15 timesteps and apply the Trotterized time evolution operators using our above constructed exponentials. We obtain a quantum dataset {(|ψ_{T_j}⟩, |φ_{T_j}⟩)}_{j=1..M}, where M is the number of data points, or batch size (in our case we chose M = 15), with |ψ_{T_j}⟩ = U_target^j |ψ_0⟩ and |φ_{T_j}⟩ = U_qgrnn^j |ψ_0⟩.

2 Here is a tip for training: by setting the output target value to a theoretical lower bound, we can minimize our expectation value within the Keras model fit framework. That is, we can use the inequality ⟨H_target⟩ = ∑_{jk} J_jk⟨Z_j Z_k⟩ + ∑_v B_v⟨Z_v⟩ + ∑_v ⟨X_v⟩ ≥ −∑_{jk} |J_jk| − ∑_v |B_v| − N.

3 Here, we use the terminology cost and mixer Hamiltonians, as the Trotterization of an Ising model time evolution is very similar to a QAOA, and thus we borrow nomenclature from this analogous QNN.

def trotterize(inp, depth, cost, mix):
    add = tfq.layers.AddCircuit()
    outp = add(cirq.Circuit(), append=inp)
    for _ in range(depth):
        outp = add(outp, append=cost)
        outp = add(outp, append=mix)
    return outp

batch_size = 15
T = np.random.uniform(0, T_max, batch_size)
depth = [int(t/dt)+1 for t in T]

true_states = []
pred_states = []
for P in depth:
    true_states.append(
        trotterize(low_energy_ising, P,
                   exp_ising_cost, exp_ising_mix))
    pred_states.append(
        trotterize(low_energy_qgrnn, P,
                   exp_qgrnn_cost, exp_qgrnn_mix))

Now we have both quantum data from (1) the true time evolution of the target Ising model and (2) the predicted data state from the QGRNN. In order to maximize the overlap between these two wavefunctions, we can aim to maximize the fidelity between the true state and the state output by the QGRNN, averaged over random choices of time evolution. To evaluate the fidelity between two quantum states (say |A⟩ and |B⟩) on a quantum computer, a well-known approach is to perform the swap test [126]. In the swap test, an additional observer qubit is used: by putting this qubit in a superposition and using it as the control for a Fredkin gate (controlled-SWAP), followed by a Hadamard on the observer qubit, the observer qubit's expectation value encodes the fidelity of the two states, |⟨A|B⟩|². Thus, right after the fidelity swap test, we can measure the swap-test qubit with the Pauli Z operator using Expectation, ⟨Z_test⟩, and then calculate the average fidelity (inner product) between a batch of two sets of quantum data states, which can be used as our classical loss function in TensorFlow.
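The SwapTestFidelity class referenced below is defined in the notebook; a minimal sketch of the underlying swap-test circuit construction (our own illustration, not the notebook's code) is:

def swap_test_circuit(qubits_a, qubits_b, test_qubit):
    # Hadamard on the observer qubit, controlled-SWAPs between the two registers,
    # then a final Hadamard; <Z> on the observer qubit then equals |<A|B>|^2.
    circuit = cirq.Circuit(cirq.H(test_qubit))
    for qa, qb in zip(qubits_a, qubits_b):
        circuit.append(cirq.CSWAP(test_qubit, qa, qb))
    circuit.append(cirq.H(test_qubit))
    return circuit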

# Check class SwapTestFidelity in the notebook.
fidelity = SwapTestFidelity(
    qubits_ising, qubits_qgrnn, batch_size)

state_true = tf.keras.Input(shape=(),
                            dtype=tf.string)
state_pred = tf.keras.Input(shape=(),
                            dtype=tf.string)

fid_output = fidelity(state_true, state_pred)
fid_output = tfq.layers.Expectation()(
    fid_output,
    symbol_names=symbols,
    operators=fidelity.op)

model = tf.keras.Model(
    inputs=[state_true, state_pred],
    outputs=fid_output)

Here, we introduce the average fidelity and implement this with a custom Keras loss function:

L(θ, φ) = 1 − (1/B) ∑_{j=1}^{B} |⟨ψ_{T_j}|φ_{T_j}⟩|² = 1 − (1/B) ∑_{j=1}^{B} ⟨Z_test⟩_j

def average_fidelity(y_true, y_pred):
    # y_pred holds the swap-test <Z_test> values; K is tf.keras.backend.
    return 1 - K.mean(y_pred)

Again, we can use Keras model fit. To feed a batch of quantum data, we can use tf.concat because the quantum circuits are already tf.Tensors. In this case, we know that the lower bound of the fidelity is 0, but y_true is not used in our custom loss function average_fidelity. We set the learning rate of the Adam optimizer to 0.05.

y_true = tf.concat(true_states, axis=0)
y_pred = tf.concat(pred_states, axis=0)

model.compile(
    loss=average_fidelity,
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.05))

model.fit(x=[y_true, y_pred],
          y=tf.zeros([batch_size, 1]),
          batch_size=batch_size,
          epochs=500)

The full results are displayed in the notebook; we see for this example that our time-randomized gradient-based optimization of our parameterized class of quantum Hamiltonian evolutions ends up learning the target Hamiltonian and its couplings to a high degree of accuracy.

Figure 23. Left: True (target) Ising Hamiltonian, with edges representing couplings and nodes representing biases. Middle: randomly chosen initial graph structure and parameter values for the QGRNN. Right: learned Hamiltonian from the trained QGRNN.

D. Generative Modelling of Quantum Mixed States with Hybrid Quantum-Probabilistic Models

1. Background

Often in quantum mechanical systems one encounters so-called mixed states, which can be understood as probabilistic mixtures over pure quantum states [127]. Typical cases where such mixed states arise are when looking at finite-temperature quantum systems, open quantum systems, and subsystems of pure quantum mechanical systems. As the ability to model mixed states is thus key to understanding quantum mechanical systems, in this section we focus on models that learn to represent and mimic the statistics of quantum mixed states.

As mixed states are a combination of a classical probability distribution and quantum wavefunctions, their statistics can exhibit both classical and quantum forms of correlations (e.g., entanglement). As such, if we wish to learn a representation of such a mixed state which can generatively model its statistics, one can expect that a hybrid representation combining classical probabilistic models and quantum neural networks can be ideal. Such a decomposition is ideal for near-term noisy devices, as it reduces the overhead of representation on the quantum device, leading to lower-depth quantum neural networks. Furthermore, the quantum layers provide a valuable addition in representation power to the classical probabilistic model, as they allow the addition of quantum correlations to the model.

Thus, in this section, we cover some examples where one learns to generatively model mixed states using a hybrid quantum-probabilistic model [48]. Such models use a parameterized ansatz of the form

ρ_θφ = U(φ) ρ_θ U†(φ),   ρ_θ = ∑_x p_θ(x) |x⟩⟨x|,  (34)

where U(φ) is a unitary quantum neural network with parameters φ and p_θ(x) is a classical probabilistic model with parameters θ. We call ρ_θφ the visible state and ρ_θ the latent state. Note that the latent state is effectively a classical distribution over the standard basis states, and its only parameters are those of the classical probabilistic model.

As we shall see below, there are methods to train both networks simultaneously. In terms of software implementation, as we have to combine probabilistic models and quantum neural networks, we will use a combination of TensorFlow Probability [128] along with TFQ. A first class of applications we will consider is the task of generating a thermal state of a quantum system given its Hamiltonian. A second class of applications is, given several copies of a mixed state, to learn a generative model which replicates the statistics of the state.

Target problems:

1. Incorporating probabilistic and quantum models

2. Variational Quantum Simulation of Quantum Thermal States

3. Learning to generatively model mixed states from data

Required TFQ functionalities:

1. Integration with TF Probability [128]

2. Sample-based simulation of quantum circuits

3. Parameter shift differentiator for gradient computation

2. Variational Quantum Thermalizer

The full notebook for the implementations below is available at:

research/vqt qmhl/vqt qmhl.ipynb

Consider the task of preparing a thermal state: given a Hamiltonian H and a target inverse temperature β = 1/T, we want to variationally approximate the state

σ_β = (1/Z_β) e^{-βH},   Z_β = tr(e^{-βH}),  (35)

using a state of the form presented in equation (34). That is, we aim to find a value of the hybrid model parameters {θ*, φ*} such that ρ_{θ*φ*} ≈ σ_β. In order to converge to this approximation via optimization of the parameters, we need a loss function to optimize which quantifies the statistical distance between these quantum mixed states. If we aim to minimize the discrepancy between states in terms of the quantum relative entropy D(ρ_θφ‖σ_β) = −S(ρ_θφ) − tr(ρ_θφ log σ_β) (where S(ρ_θφ) = −tr(ρ_θφ log ρ_θφ) is the entropy), then, as described in the full paper [57], we can equivalently minimize the free energy⁴ and hence use it as our loss function:

L_fe(θ, φ) = β tr(ρ_θφ H) − S(ρ_θφ).  (36)

The first term is simply the expectation value of the energy of our model, while the second term is the entropy. Due to the structure of our quantum-probabilistic model, the entropy of the visible state is equal to the entropy of the latent state, which is simply the classical entropy of the distribution, S(ρ_θφ) = S(ρ_θ) = −∑_x p_θ(x) log p_θ(x). This comes in quite useful during the optimization of our model.

Let us implement a simple example of the VQT model which minimizes free energy to achieve an approximation of the thermal state of a physical system. Let us consider a two-dimensional Heisenberg spin chain

H_heis = ∑_{⟨ij⟩_h} J_h S_i · S_j + ∑_{⟨ij⟩_v} J_v S_i · S_j,  (37)

where h (v) denote horizontal (vertical) bonds, while ⟨·⟩ represent nearest-neighbor pairings. First, we define this Hamiltonian on a grid of qubits:

def get_bond(q0, q1):
    return cirq.PauliSum.from_pauli_strings([
        cirq.PauliString(cirq.X(q0), cirq.X(q1)),
        cirq.PauliString(cirq.Y(q0), cirq.Y(q1)),
        cirq.PauliString(cirq.Z(q0), cirq.Z(q1))])

def get_heisenberg_hamiltonian(qubits, jh, jv):
    heisenberg = cirq.PauliSum()
    # Apply horizontal bonds
    for r in qubits:
        for q0, q1 in zip(r, r[1::]):
            heisenberg += jh * get_bond(q0, q1)
    # Apply vertical bonds
    for r0, r1 in zip(qubits, qubits[1::]):
        for q0, q1 in zip(r0, r1):
            heisenberg += jv * get_bond(q0, q1)
    return heisenberg
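A hypothetical usage of this helper on a 2x2 grid of qubits (the system size used for the results at the end of this section), with illustrative coupling values:

# Hypothetical usage: rows of GridQubits forming a 2x2 lattice.
qubits = [[cirq.GridQubit(r, c) for c in range(2)] for r in range(2)]
heisenberg_ham = get_heisenberg_hamiltonian(qubits, jh=1.0, jv=1.0)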

4 More precisely, the loss function here is in fact the inverse temperature multiplied by the free energy, but this detail is of little import to our optimization.


For our QNN, we consider a unitary consisting of general single-qubit rotations and powers of controlled-NOT gates. Our code returns the associated symbols so that these can be fed into the Expectation op:

def get_rotation_1q(q, a, b, c):
    return cirq.Circuit(
        cirq.X(q)**a, cirq.Y(q)**b, cirq.Z(q)**c)

def get_rotation_2q(q0, q1, a):
    return cirq.Circuit(
        cirq.CNotPowGate(exponent=a)(q0, q1))

def get_layer_1q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, q in enumerate(qubits):
        a, b, c = sympy.symbols(
            "a{2}_{0}_{1} b{2}_{0}_{1} c{2}_{0}_{1}"
            .format(layer_num, n, L_name))
        layer_symbols += [a, b, c]
        circuit += get_rotation_1q(q, a, b, c)
    return circuit, layer_symbols

def get_layer_2q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, (q0, q1) in enumerate(zip(qubits[::2],
                                     qubits[1::2])):
        a = sympy.symbols("a{2}_{0}_{1}".format(
            layer_num, n, L_name))
        layer_symbols += [a]
        circuit += get_rotation_2q(q0, q1, a)
    return circuit, layer_symbols
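
A minimal sketch of how these layer helpers might be assembled into a full ansatz; the layer count, flattened qubit list, and name tag below are illustrative assumptions rather than the notebook's exact construction:

def get_model_circuit(qubits_flat, num_layers, name):
    # Alternate single-qubit and two-qubit layers, collecting all symbols
    # so they can later be bound by the Expectation op.
    circuit = cirq.Circuit()
    all_symbols = []
    for layer in range(num_layers):
        c1, s1 = get_layer_1q(qubits_flat, layer, name)
        c2, s2 = get_layer_2q(qubits_flat, layer, name)
        circuit += c1
        circuit += c2
        all_symbols += s1 + s2
    return circuit, all_symbols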

It will be convenient to consider a particular class of probabilistic models for which estimating the gradient with respect to the model parameters is straightforward. This class of models is called exponential families or energy-based models (EBMs). If our parameterized probabilistic model is an EBM, then it is of the form:

p_\theta(x) = \frac{1}{Z_\theta} e^{-E_\theta(x)}, \qquad Z_\theta \equiv \sum_{x\in\Omega} e^{-E_\theta(x)}. \quad (38)
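
To make equation (38) concrete, here is a small brute-force check (purely illustrative; the energy function and bit count are assumptions, and numpy is assumed imported as np as elsewhere in this section) that enumerates Ω for a few bits and normalizes by Z_θ:

import itertools

def ebm_probabilities(energy_fn, num_bits):
    # Enumerate all bitstrings, weight each by exp(-E), normalize by Z_theta.
    states = list(itertools.product([0, 1], repeat=num_bits))
    weights = np.array([np.exp(-energy_fn(np.array(s))) for s in states])
    return states, weights / weights.sum()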

For gradients of the VQT free energy loss function with respect to the QNN parameters, ∂_φ L_fe(θ,φ) = β ∂_φ tr(ρ_θφ H), this is simply the gradient of an expectation value; hence we can use TFQ parameter shift gradients or any other method for estimating the gradients of QNNs outlined in previous sections.

As for gradients of the classical probabilistic model, one can readily derive that they are given by the following covariance:

\partial_\theta \mathcal{L}_{\text{fe}} = \mathbb{E}_{x\sim p_\theta(x)}\!\left[\left(E_\theta(x)-\beta H_\phi(x)\right)\nabla_\theta E_\theta(x)\right] - \left(\mathbb{E}_{x\sim p_\theta(x)}\!\left[E_\theta(x)-\beta H_\phi(x)\right]\right)\left(\mathbb{E}_{y\sim p_\theta(y)}\!\left[\nabla_\theta E_\theta(y)\right]\right), \quad (39)

where H_φ(x) ≡ 〈x| U†(φ) H U(φ) |x〉 is the expectation value of the Hamiltonian at the output of the QNN with the standard basis element |x〉 as input. Since the energy function is a neural network, it and its gradients are easy to evaluate, so the above gradient is straightforward to estimate via sampling of the classical probabilistic model and the output of the QPU.
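
The covariance structure of equation (39) translates directly into a sample-based estimator. The sketch below is illustrative and assumes the per-sample quantities have already been collected: energies holds E_θ(x) for each sampled bitstring, energy_grads stacks ∇_θ E_θ(x) row-wise, and h_expectations holds H_φ(x) as measured on the QPU or simulator.

def estimate_ebm_gradient(energies, energy_grads, h_expectations, beta):
    # Per-sample weights (E_theta(x) - beta * H_phi(x)).
    weights = energies - beta * h_expectations
    # E[(E - beta*H) * grad E] over the sampled bitstrings.
    weighted_term = tf.reduce_mean(
        weights[:, tf.newaxis] * energy_grads, axis=0)
    # E[E - beta*H] * E[grad E], subtracted to form the covariance of Eq. (39).
    product_term = tf.reduce_mean(weights) * tf.reduce_mean(
        energy_grads, axis=0)
    return weighted_term - product_term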

For our classical latent probability distribution p_θ(x), as a first simple case, we can use a product of independent Bernoulli distributions, p_θ(x) = ∏_j p_{θ_j}(x_j) = ∏_j θ_j^{x_j} (1 − θ_j)^{1−x_j}, where x_j ∈ {0, 1} are binary values. We can re-phrase this distribution as an energy-based model to take advantage of the EBM form and gradient formula above. We move the parameters into an exponential, so that the probability of a bitstring becomes p_θ(x) = ∏_j e^{θ_j x_j}/(e^{θ_j} + e^{−θ_j}).

Since this distribution is a product of independent variables, it is easy to sample from. We can use the TensorFlow Probability library [128] to produce samples from this distribution, using the tfp.distributions.Bernoulli object:

def bernoulli_bit_probability(b):
    return np.exp(b)/(np.exp(b) + np.exp(-b))

def sample_bernoulli(num_samples, biases):
    prob_list = []
    for bias in biases.numpy():
        prob_list.append(
            bernoulli_bit_probability(bias))
    latent_dist = tfp.distributions.Bernoulli(
        probs=prob_list, dtype=tf.float32)
    return latent_dist.sample(num_samples)
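
Because the entropy term in equation (36) reduces to the classical entropy of the latent distribution, for this factorized Bernoulli model it can be computed in closed form. A minimal sketch, reusing bernoulli_bit_probability from above:

def bernoulli_entropy(biases):
    # Closed-form entropy of a product of independent Bernoulli variables.
    probs = np.array([bernoulli_bit_probability(b)
                      for b in biases.numpy()])
    return float(-np.sum(probs * np.log(probs)
                         + (1.0 - probs) * np.log(1.0 - probs)))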

After getting samples from our classical probabilistic model, we take gradients of our QNN parameters. Because TFQ implements gradients for its expectation ops, we can use tf.GradientTape to obtain these derivatives. Note that below we use tf.tile to give our Hamiltonian operator and visible state circuit the correct dimensions:

bitstring_tensor = sample_bernoulli(
    num_samples, vqt_biases)
with tf.GradientTape() as tape:
    tiled_vqt_model_params = tf.tile(
        [vqt_model_params], [num_samples, 1])
    sampled_expectations = expectation(
        tiled_visible_state,
        vqt_symbol_names,
        tf.concat([bitstring_tensor,
                   tiled_vqt_model_params], 1),
        tiled_H)
    energy_losses = beta*sampled_expectations
    energy_losses_avg = tf.reduce_mean(
        energy_losses)
vqt_model_gradients = tape.gradient(
    energy_losses_avg, [vqt_model_params])
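
To monitor the full objective and apply the update, one could combine the sampled energy term with the analytic entropy sketched earlier and hand the QNN gradient to a standard Keras optimizer. This is only a sketch; it assumes vqt_model_params is a tf.Variable and the learning rate is illustrative:

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
# (Beta times the) free energy of Eq. (36): beta*<H> minus the classical entropy.
free_energy_estimate = energy_losses_avg - bernoulli_entropy(vqt_biases)
# QNN parameter update; the EBM biases are updated separately via Eq. (39).
optimizer.apply_gradients(
    zip(vqt_model_gradients, [vqt_model_params]))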

Putting these pieces together, we train our model to output thermal states of the 2D Heisenberg model on a 2x2 grid. The result after 100 epochs is shown in Fig. 24.

A great advantage of this approach to optimization of the probabilistic model is that the partition function Z_θ does not need to be estimated. As such, more general and more expressive models beyond factorized distributions can be used for the probabilistic modelling of the latent classical distribution. In the advanced section of the notebook, we show how to use a Boltzmann machine as our energy-based model. Boltzmann machines are EBMs where, for a bitstring x ∈ {0, 1}^n, the energy is defined as E(x) = −∑_{i,j} w_{ij} x_i x_j − ∑_i b_i x_i.


It is worth noting that our factorized Bernoulli distribution is in fact a special case of the Boltzmann machine, one where only the so-called bias terms in the energy function are present: E(x) = −∑_i b_i x_i. In the notebook, we start with this simpler Bernoulli example of the VQT; the resulting density matrix converges to the known exact result for this system, as shown in Fig. 24. We also provide a more advanced example with a general Boltzmann machine. In the latter example, we picked a fully visible, fully-connected classical Ising model energy function, and used MCMC with Metropolis-Hastings [129] to sample from the energy function, as sketched below.
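
For concreteness, here is a minimal sketch of such an Ising-type energy function and a single-spin-flip Metropolis-Hastings sampler; the weights, biases, and step count are illustrative assumptions, and the notebook's actual sampler may differ:

def boltzmann_energy(x, w, b):
    # E(x) = -sum_ij w_ij x_i x_j - sum_i b_i x_i for a bitstring x.
    return float(-x @ w @ x - b @ x)

def metropolis_sample(w, b, num_steps, rng=None):
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=len(b))
    for _ in range(num_steps):
        proposal = x.copy()
        flip = rng.integers(len(b))
        proposal[flip] = 1 - proposal[flip]
        delta = boltzmann_energy(proposal, w, b) - boltzmann_energy(x, w, b)
        # Accept energy-lowering flips; accept uphill flips with prob e^(-delta).
        if delta <= 0 or rng.random() < np.exp(-delta):
            x = proposal
    return x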

Figure 24. Final density matrix output by the VQT algorithm run with a factorized Bernoulli distribution as the classical latent distribution, trained via a gradient-based optimizer. See the notebook for details.

3. Quantum Generative Learning from Quantum Data

Now that we have seen how to prepare thermal states from a given Hamiltonian, we can consider how to generatively model mixed quantum states using quantum-probabilistic models in the case where we are given several copies of a mixed state rather than a Hamiltonian. That is, we are given access to a data mixed state σ_D, and we would like to find optimal parameters θ*, φ* such that ρ_θ*φ* ≈ σ_D, where the model state is of the form described in (34). Furthermore, for reasons of convenience which will become apparent below, it is useful to posit that our classical probabilistic model is of the form of an energy-based model as in equation (38).

If we aim to minimize the quantum relative entropy between the data and our model (in reverse compared to the VQT), i.e., D(σ_D‖ρ_θφ), then it suffices to minimize the quantum cross entropy as our loss function

\mathcal{L}_{\text{xe}}(\theta,\phi) \equiv -\mathrm{tr}(\sigma_D \log \rho_{\theta\phi}).

By using the energy-based form of our latent classical probability distribution, as can be readily derived (see [57]), the cross entropy is given by

\mathcal{L}_{\text{xe}}(\theta,\phi) = \mathbb{E}_{x\sim\sigma_\phi(x)}[E_\theta(x)] + \log Z_\theta,

where σ_φ(x) ≡ 〈x| U†(φ) σ_D U(φ) |x〉 is the distribution obtained by feeding the data state σ_D through the inverse QNN circuit U†(φ) and measuring in the standard basis.

As this is simply an expectation value of a state propagated through a QNN, for gradients of the loss with respect to the QNN parameters we can use standard TFQ differentiators, such as the parameter shift rule presented in section III. As for the gradient with respect to the EBM parameters, it is given by

\partial_\theta \mathcal{L}_{\text{xe}}(\theta,\phi) = \mathbb{E}_{x\sim\sigma_\phi(x)}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{y\sim p_\theta(y)}[\nabla_\theta E_\theta(y)].

Let us implement a scenario where we are given the output density matrix from our last VQT example as data, and see whether we can learn to replicate its statistics from data rather than from the Hamiltonian. For simplicity, we focus on the Bernoulli EBM defined above. We can efficiently sample bitstrings from our learned classical distribution and feed them through the learned VQT unitary to produce our data state. These VQT parameters are assumed fixed; they represent a quantum data source for QMHL.

We use the same ansatz for our QMHL unitary as we did for the VQT: layers of single-qubit rotations and exponentiated CNOTs. We apply our QMHL model unitary to the output of our VQT to produce the pulled-back data distribution. Then, we take expectation values of our current best estimate of the modular Hamiltonian:

def get_qmhl_weights_grad_and_biases_grad(
        ebm_deriv_expectations, bitstring_list,
        biases):
    bare_qmhl_biases_grad = tf.reduce_mean(
        ebm_deriv_expectations, 0)
    c_qmhl_biases_grad = ebm_biases_derivative_avg(
        bitstring_list)
    return tf.subtract(bare_qmhl_biases_grad,
                       c_qmhl_biases_grad)

Note that we use the tf.GradientTape functionality to obtain the gradients of the QMHL model unitary. This functionality is enabled by our TFQ differentiators module.

The classical model parameters can be updated according to the gradient formula above, as sketched below. See the VQT notebook for the results of this training.
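
A minimal sketch of one such update step is given here; qmhl_biases being a tf.Variable and the learning rate are assumptions for illustration, with the inputs matching the helper defined above:

learning_rate = 0.01
qmhl_biases_grad = get_qmhl_weights_grad_and_biases_grad(
    ebm_deriv_expectations, bitstring_list, qmhl_biases)
# Gradient descent step on the EBM bias parameters.
qmhl_biases.assign_sub(learning_rate * qmhl_biases_grad)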

VI. CLOSING REMARKS

The rapid development of quantum hardware represents an impetus for the equally rapid development of quantum applications. In October 2017, the Google AI Quantum team and collaborators released its first software library, OpenFermion, to accelerate the development of quantum simulation algorithms for chemistry and materials sciences. Likewise, TensorFlow Quantum is intended to accelerate the development of quantum machine learning algorithms for a wide array of applications.


Quantum machine learning is a very new and exciting field, so we expect the framework to change with the needs of the research community and the availability of new quantum hardware. We have open-sourced the framework under the commercially friendly Apache 2 license, allowing future commercial products to embed TFQ royalty-free. If you would like to participate in our community, visit us at:

https://github.com/tensorflow/quantum/

VII. ACKNOWLEDGEMENTS

The authors would like to thank Google Research for supporting this project. M.B., G.V., T.M., and A.J.M. would like to thank the Google Quantum AI Lab for their hospitality and support during their respective internships, as well as fellow team members for several useful discussions, in particular Matt Harrigan, John Platt, and Nicholas Rubin. The authors would also like to thank Achim Kempf from the University of Waterloo for sponsoring this project. M.B. and J.Y. would like to thank the Google Brain team for supporting this project, in particular Francois Chollet, Yifei Feng, David Rim, Justin Hong, and Megan Kacholia. G.V. would like to thank Stefan Leichenauer, Jack Hidary and the rest of the Quantum@X team for support during his Quantum Residency. G.V. acknowledges support from NSERC. D.B. is an Associate Fellow in the CIFAR program on Quantum Information Science. A.S. and M.S. were supported by the USRA Feynman Quantum Academy funded by the NAMS R&D Student Program at NASA Ames Research Center and by the Air Force Research Laboratory (AFRL), NYSTEC-USRA Contract (FA8750-19-3-6101). X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company).

[1] K. P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
[2] J. A. Suykens and J. Vandewalle, Neural Processing Letters 9, 293 (1999).
[3] S. Wold, K. Esbensen, and P. Geladi, Chemometrics and Intelligent Laboratory Systems 2, 37 (1987).
[4] A. K. Jain, Pattern Recognition Letters 31, 651 (2010).
[5] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Curran Associates, Inc., 2012) pp. 1097-1105.
[7] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning, Vol. 1 (MIT Press, Cambridge, 2016).
[8] J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
[9] R. P. Feynman, International Journal of Theoretical Physics 21, 467 (1982).
[10] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferova, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, et al., Chemical Reviews 119, 10856 (2019).
[11] P. W. Shor, in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE, 1994) pp. 124-134.
[12] E. Farhi, J. Goldstone, and S. Gutmann, "A quantum approximate optimization algorithm," (2014), arXiv:1411.4028 [quant-ph].
[13] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature 574, 505 (2019).
[14] S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Physics 10, 631-633 (2014).
[15] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[16] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," (2013), arXiv:1307.0411 [quant-ph].
[17] I. Kerenidis and A. Prakash, "Quantum recommendation systems," (2016), arXiv:1603.08675.
[18] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
[19] V. Giovannetti, S. Lloyd, and L. Maccone, Physical Review Letters 100, 160501 (2008).
[20] S. Arunachalam, V. Gheorghiu, T. Jochym-O'Connor, M. Mosca, and P. V. Srinivasan, New Journal of Physics 17, 123010 (2015).
[21] E. Tang, (2018), 10.1145/3313276.3316310, arXiv:1807.04271.
[22] J. Preskill, "Quantum computing in the NISQ era and beyond," (2018), arXiv:1801.00862.
[23] E. Farhi and H. Neven, arXiv preprint arXiv:1802.06002 (2018).
[24] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 4213 (2014).
[25] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, arXiv preprint arXiv:1806.06871 (2018).
[26] D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev. A 92, 042303 (2015).
[27] L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, arXiv preprint arXiv:1812.01041 (2018).
[28] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[29] S. Hadfield, Z. Wang, B. O'Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, arXiv preprint arXiv:1709.03489 (2017).
[30] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, npj Quantum Information 4, 1 (2018).


[31] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, Quantum 3, 140 (2019).
[32] M. Schuld and N. Killoran, Physical Review Letters 122, 040504 (2019).
[33] S. McArdle, T. Jones, S. Endo, Y. Li, S. Benjamin, and X. Yuan, arXiv preprint arXiv:1804.03023 (2018).
[34] M. Benedetti, E. Grant, L. Wossnig, and S. Severini, New Journal of Physics 21, 043023 (2019).
[35] B. Nash, V. Gheorghiu, and M. Mosca, arXiv preprint arXiv:1904.01972 (2019).
[36] Z. Jiang, J. McClean, R. Babbush, and H. Neven, arXiv preprint arXiv:1812.08190 (2018).
[37] G. R. Steinbrecher, J. P. Olson, D. Englund, and J. Carolan, arXiv preprint arXiv:1808.10047 (2018).
[38] M. Fingerhuth, T. Babej, et al., arXiv preprint arXiv:1810.13411 (2018).
[39] R. LaRose, A. Tikku, E. O'Neel-Judy, L. Cincio, and P. J. Coles, arXiv preprint arXiv:1810.10506 (2018).
[40] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[41] H. Situ, Z. Huang, X. Zou, and S. Zheng, Quantum Information Processing 18, 230 (2019).
[42] H. Chen, L. Wossnig, S. Severini, H. Neven, and M. Mohseni, arXiv preprint arXiv:1805.08654 (2018).
[43] G. Verdon, M. Broughton, and J. Biamonte, arXiv preprint arXiv:1712.05304 (2017).
[44] M. Mohseni et al., Nature 543, 171 (2017).
[45] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, "Learning to learn with quantum neural networks via classical neural networks," (2019), arXiv:1907.05415.
[46] R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Fahrmann, B. Meynard-Piganeau, and J. Eisert, arXiv preprint arXiv:1910.01155 (2019).
[47] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[48] G. Verdon, J. M. Arrazola, K. Bradler, and N. Killoran, arXiv preprint arXiv:1902.00409 (2019).
[49] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel, arXiv preprint arXiv:1904.09314 (2019).
[50] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," (2018), arXiv:1802.06002 [quant-ph].
[51] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 1 (2018).
[52] I. Cong, S. Choi, and M. D. Lukin, Nature Physics 15, 1273-1278 (2019).
[53] S. Lloyd and C. Weedbrook, Physical Review Letters 121, 040502 (2018).
[54] G. Verdon, J. Pye, and M. Broughton, arXiv preprint arXiv:1806.09729 (2018).
[55] J. Romero and A. Aspuru-Guzik, arXiv preprint arXiv:1901.00848 (2019).
[56] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, C. Blank, K. McKiernan, and N. Killoran, arXiv preprint arXiv:1811.04968 (2018).
[57] G. Verdon, J. Marks, S. Nanda, S. Leichenauer, and J. Hidary, arXiv preprint arXiv:1910.02071 (2019).
[58] M. Mohseni, A. M. Steinberg, and J. A. Bergou, Phys. Rev. Lett. 93, 200403 (2004).
[59] M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, npj Quantum Information 5, 1 (2019).
[60] J. Carolan, M. Mohseni, J. P. Olson, M. Prabhu, C. Chen, D. Bunandar, M. Y. Niu, N. C. Harris, F. N. C. Wong, M. Hochberg, S. Lloyd, and D. Englund, Nature Physics (2020).
[61] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," (2016), arXiv:1603.04467 [cs.DC].
[62] Google, "Cirq: A Python framework for creating, editing, and invoking noisy intermediate scale quantum circuits," (2018).
[63] A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, PeerJ Computer Science 3, e103 (2017).
[64] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986).
[65] J. Biamonte and V. Bergholm, "Tensor networks in a nutshell," (2017), arXiv:1708.00006 [quant-ph].
[66] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Leichenauer, "TensorNetwork: A Library for Physics and Machine Learning," (2019), arXiv:1905.01330 [physics.comp-ph].
[67] C. Roberts and S. Leichenauer, "Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations," (2019).
[68] The TensorFlow Authors, "Effective TensorFlow 2," (2019).
[69] F. Chollet et al., "Keras," https://keras.io (2015).
[70] D. P. Kingma and J. Ba, arXiv preprint arXiv:1412.6980 (2014).
[71] G. E. Crooks, arXiv preprint arXiv:1905.13311 (2019).
[72] M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik, ArXiv e-prints (2016), arXiv:1601.07195 [quant-ph].
[73] T. Häner and D. S. Steiger, ArXiv e-prints (2017), arXiv:1704.01127 [quant-ph].
[74] Intel, "Intel Instruction Set Extensions Technology," (2020).
[75] Mark Buxton, "Haswell New Instruction Descriptions Now Available!" (2011).
[76] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).
[77] M. Suzuki, Physics Letters A 146, 319 (1990).
[78] E. Campbell, Physical Review Letters 123 (2019), 10.1103/physrevlett.123.070503.
[79] N. C. Rubin, R. Babbush, and J. McClean, New Journal of Physics 20, 053020 (2018).
[80] A. Harrow and J. Napp, arXiv preprint arXiv:1901.05374 (2019).
[81] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proceedings of the IEEE 86, 2278 (1998).


[82] L. Bottou, in Proceedings of COMPSTAT'2010 (Springer, 2010) pp. 177-186.
[83] S. Ruder, arXiv preprint arXiv:1609.04747 (2016).
[84] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, in Proceedings of the 1988 Connectionist Models Summer School, Vol. 1 (CMU, Pittsburgh, PA: Morgan Kaufmann, 1988) pp. 21-28.
[85] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, The Journal of Machine Learning Research 18, 5595 (2017).
[86] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al.
[87] R. Frostig, M. J. Johnson, and C. Leary.
[88] M. Abramowitz and I. Stegun, Appl. Math. Ser. 55 (2006).
[89] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, arXiv preprint arXiv:1811.11184 (2018).
[90] I. Newton, The Principia: Mathematical Principles of Natural Philosophy (Univ of California Press, 1999).
[91] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018).
[92] S. Bhatnagar, H. Prasad, and L. Prashanth.
[93] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Physical Review Letters 105 (2010), 10.1103/physrevlett.105.150401.
[94] J. Schulman, N. Heess, T. Weber, and P. Abbeel, in Advances in Neural Information Processing Systems (2015) pp. 3528-3536.
[95] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Nature 567, 209 (2019).
[96] W. Huggins, P. Patil, B. Mitchell, K. B. Whaley, and E. M. Stoudenmire, Quantum Science and Technology 4, 024001 (2019).
[97] N. Tishby and N. Zaslavsky, "Deep learning and the information bottleneck principle," (2015), arXiv:1503.02406 [cs.LG].
[98] P. Walther, K. J. Resch, T. Rudolph, E. Schenck, H. Weinfurter, V. Vedral, M. Aspelmeyer, and A. Zeilinger, Nature 434, 169 (2005).
[99] M. A. Nielsen, Reports on Mathematical Physics 57, 147 (2006).
[100] G. Vidal, Physical Review Letters 101 (2008), 10.1103/physrevlett.101.110501.
[101] R. Shwartz-Ziv and N. Tishby, "Opening the black box of deep neural networks via information," (2017), arXiv:1703.00810 [cs.LG].
[102] P. B. M. Sousa and R. V. Ramos, "Universal quantum circuit for n-qubit quantum gate: A programmable quantum gate," (2006), arXiv:quant-ph/0602174 [quant-ph].
[103] M. Y. Niu, L. Li, M. Mohseni, and I. Chuang, to be published.
[104] Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas, arXiv preprint arXiv:1611.03824 (2016).
[105] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1412.6062 (2014).
[106] G. S. Hartnett and M. Mohseni, "A probability density theory for spin-glass systems," (2020), arXiv:2001.00927.
[107] G. S. Hartnett and M. Mohseni, "Self-supervised learning of generative spin-glasses with normalizing flows," (2020), arXiv:2001.00585.
[108] A. Hagberg, P. Swart, and D. S. Chult, Exploring network structure, dynamics, and function using NetworkX, Tech. Rep. (Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2008).
[109] M. Streif and M. Leib, arXiv preprint arXiv:1908.08862 (2019).
[110] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D. Lukin, "Quantum optimization for maximum independent set using Rydberg atom arrays," (2018), arXiv:1808.10816.
[111] M. Wilson, S. Stromswold, F. Wudarski, S. Hadfield, N. M. Tubman, and E. Rieffel, "Optimizing quantum heuristics with meta-learning," (2019), arXiv:1908.03185 [quant-ph].
[112] E. Farhi, J. Goldstone, and S. Gutmann, ArXiv e-prints (2014), arXiv:1411.4028 [quant-ph].
[113] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9 (2018), 10.1038/s41467-018-07090-4.
[114] X. Glorot and Y. Bengio, in AISTATS (2010) pp. 249-256.
[115] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv preprint arXiv:2001.00550 (2020).
[116] Y. Bengio, in Neural Networks: Tricks of the Trade (Springer, 2012) pp. 437-478.
[117] E. Knill, G. Ortiz, and R. D. Somma, Physical Review A 75, 012328 (2007).
[118] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Nature Physics 14, 595 (2018).
[119] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum 3, 214 (2019).
[120] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, in preparation.
[121] B. T. Kiani, S. Lloyd, and R. Maity, (2020).
[122] C. Cirstoiu, Z. Holmes, J. Iosue, L. Cincio, P. J. Coles, and A. Sornborger, "Variational fast forwarding for quantum simulation beyond the coherence time," (2019), arXiv:1910.04292 [quant-ph].
[123] N. Wiebe, C. Granade, C. Ferrie, and D. Cory, Physical Review Letters 112 (2014), 10.1103/physrevlett.112.190501.
[124] G. Verdon, T. McCourt, E. Luzhnica, V. Singh, S. Leichenauer, and J. Hidary, "Quantum graph neural networks," (2019), arXiv:1909.12264 [quant-ph].
[125] S. Greydanus, M. Dzamba, and J. Yosinski, "Hamiltonian neural networks," (2019), arXiv:1906.01563 [cs.NE].
[126] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[127] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
[128] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, A. Alemi, M. Hoffman, and R. A. Saurous, "TensorFlow Distributions," (2017), arXiv:1711.10604 [cs.LG].
[129] C. P. Robert, "The Metropolis-Hastings algorithm," (2015), arXiv:1504.01896 [stat.CO].