
Autoencoders and Jet Physics

Jennifer Thompson, Universität Heidelberg

Based on: QCD or What? T. Heimel, G. Kasieczka, T. Plehn, JT, arXiv:1808.08979

IAS HKUST workshop, 11/01/2019

Motivation: New Physics Searches

● Particle physics is becoming very data-driven
➔ Even more so at future colliders

● Can make the most of this
➔ With model independent searches
➔ With data driven methods
➔ With anomaly detection

● Unsupervised machine learning

arXiv:1808.08979, T. Heimel, G. Kasieczka, T. Plehn, JT

arXiv:1808.08992, M. Farina, Y. Nakai, D. Shih

arXiv:1807.10261v2, J. Hajer, Y-Y. Li, T. Liu, H. Wang

arXiv:1708.02949, E. Metodiev, B. Nachman, J. Thaler

● In this talk: Focus on autoencoders

[Diagram: SM events and hidden BSM together form the “background” events that are passed to the anomaly detector]

Autoencoder Anomaly Search

Idea: Train a network to detect generic anomalies

[Figure: average QCD jet image; colour scale: pT [GeV]]

So it’s used to seeing this (average QCD image) ... but not this

Preprocessing from arXiv:1803.00107, S. Macaluso and D. Shih

Image credit: Michel Luchman

Search Strategy

Idea: Train a network to detect generic anomalies

[Figure: jet image (colour scale: pT [GeV]) passed to the Anomaly Detector, which answers: QCD or What?]

Network Architecture

Train
● Background only
● Compress to latent space
● Learn reconstruction
● Loss = $\sum_{\text{pixels}} \left( k_T^{\text{input}} - k_T^{\text{out}} \right)^2$

Apply (test)
● Background + signal
● Reconstruction
✔ Background
✗ Signal

[Figure: input jet image and reconstructed jet image; colour scale: pT [GeV]]
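The train/apply logic above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the `LinearAutoencoder` class is a hypothetical stand-in (a projection onto leading principal components) and not the network used in the talk, and the toy "background" and "signal" images are invented. Only `reconstruction_loss` mirrors the pixel-sum loss quoted on the slide.

```python
import numpy as np

def reconstruction_loss(img_in, img_out):
    """Sum over pixels of (kT_input - kT_out)^2, one value per image."""
    diff = np.asarray(img_in, dtype=float) - np.asarray(img_out, dtype=float)
    return np.sum(diff.reshape(diff.shape[0], -1) ** 2, axis=1)

class LinearAutoencoder:
    """Hypothetical stand-in for a trained network: encode by projecting
    onto the leading principal components, decode by projecting back."""

    def __init__(self, bottleneck):
        self.bottleneck = bottleneck

    def fit(self, images):
        flat = images.reshape(images.shape[0], -1)
        self.mean_ = flat.mean(axis=0)
        _, _, vt = np.linalg.svd(flat - self.mean_, full_matrices=False)
        self.components_ = vt[: self.bottleneck]  # rows span the latent space
        return self

    def reconstruct(self, images):
        flat = images.reshape(images.shape[0], -1) - self.mean_
        latent = flat @ self.components_.T          # compress to latent space
        decoded = latent @ self.components_ + self.mean_
        return decoded.reshape(images.shape)

rng = np.random.default_rng(0)

# Toy background: mixtures of three fixed patterns plus tiny noise (assumption).
patterns = rng.random((3, 8, 8))
background = np.einsum("nk,kij->nij", rng.random((500, 3)), patterns)
background += 1e-3 * rng.standard_normal(background.shape)

# Toy signal: unrelated random images the autoencoder never saw (assumption).
signal = rng.random((100, 8, 8))

ae = LinearAutoencoder(bottleneck=3).fit(background)   # train: background only
loss_bkg = reconstruction_loss(background, ae.reconstruct(background))
loss_sig = reconstruction_loss(signal, ae.reconstruct(signal))

print("mean background loss:", loss_bkg.mean())
print("mean signal loss:    ", loss_sig.mean())
```

The reconstruction loss then acts as the anomaly score: images that resemble the training sample are reconstructed well, anything else is not.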

A Note on Data Format

● So far we have considered images
● We also looked at constituent 4-vectors
● Implemented with the CoLa/LoLa framework (arXiv:1707.08966v3, A. Butter, G. Kasieczka, T. Plehn, and M. Russell)
● Different architecture
  ● CoLa and LoLa layers at start
  ● No convolutional layers

[Figure: jet image (colour scale: pT [GeV]) and network diagram with CoLa and LoLa layers]

CoLa and LoLa Framework

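CoLa: simplified jet algorithm, trainable combinations

$$k_{\mu,i} \to \tilde{k}_{\mu,i} = k_{\mu,j}\, C_{ji}$$

LoLa (traditional)

$$\tilde{k}_{\mu,j} \to \hat{k}_{\mu,j} = \begin{pmatrix} m_j^2 \\ p_{T,j} \\ w_{jm}^{(E)} E_m \\ w_{jm}^{(d)} d_{jm}^2 \end{pmatrix}$$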

CoLa and LoLa Framework

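CoLa

$$k_{\mu,i} \to \tilde{k}_{\mu,i} = k_{\mu,j}\, C_{ji}$$

LoLa, traditional: not invertible

$$\tilde{k}_{\mu,j} \to \hat{k}_{\mu,j} = \begin{pmatrix} m_j^2 \\ p_{T,j} \\ w_{jm}^{(E)} E_m \\ w_{jm}^{(d)} d_{jm}^2 \end{pmatrix}$$

LoLa, for the autoencoder: must be invertible

$$\tilde{k}_{\mu,j} = \begin{pmatrix} \tilde{E}_j \\ \tilde{k}_{1,j} \\ \tilde{k}_{2,j} \\ \tilde{k}_{3,j} \end{pmatrix} \to \begin{pmatrix} \tilde{E}_j \\ \tilde{k}_{1,j} \\ \tilde{k}_{2,j} \\ \tilde{k}_{3,j} \\ \sqrt{m_j^2} \end{pmatrix}$$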
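As a rough illustration of the two layers, the sketch below writes CoLa as a matrix product over constituents and the invertible LoLa variant as "keep the 4-vector, append the mass". The function names, the array layout (rows are components E, k1, k2, k3; columns are constituents) and the hand-picked matrix `C` are assumptions for this example. In a real network `C` would be trainable, and clipping slightly negative $m^2$ values is a numerical safeguard added here, not something stated on the slide.

```python
import numpy as np

def cola(k, C):
    """Combination layer: k_tilde[mu, i] = sum_j k[mu, j] * C[j, i].

    k has shape (4, N) with rows (E, k1, k2, k3); C has shape (N, M).
    """
    return k @ C

def invariant_mass_sq(k):
    """m^2 = E^2 - |k|^2 for each column of a (4, M) array."""
    E, k1, k2, k3 = k
    return E**2 - k1**2 - k2**2 - k3**2

def lola_invertible(k_tilde):
    """Append sqrt(m_j^2) as a fifth row.

    The original four components are kept, so the map can be undone
    by simply dropping the last row.
    """
    m2 = np.clip(invariant_mass_sq(k_tilde), 0.0, None)  # guard against rounding
    return np.vstack([k_tilde, np.sqrt(m2)[None, :]])

# Two massless back-to-back constituents (toy input).
k = np.array([
    [1.0, 1.0],    # E
    [0.0, 0.0],    # k1
    [0.0, 0.0],    # k2
    [1.0, -1.0],   # k3
])

# Hand-picked C: pass both constituents through and add their sum.
C = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

out = lola_invertible(cola(k, C))
print(out.shape)   # (5, 3)
print(out[4])      # masses of the three combinations: [0. 0. 2.]
```

Dropping the fifth row of `out` recovers the CoLa output exactly, which is the invertibility the autoencoder needs for its decoder.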

Tops vs QCD

Use public data set for tops vs QCD: https://goo.gl/XGYju3

● Pythia sample with Delphes detector simulation, 14 TeV
● 14 TeV hadronic tops
● $550\,\text{GeV} < p_{T,j} < 650\,\text{GeV}$

[Figure: two jet images; colour scale: pT [GeV]]

[Figure: normalized distributions of jet mass [GeV] and of the number of constituents, for QCD and top]

The Latent Space

Very stable w.r.t. number of constituents

Plateau with bottleneck size

[Figures: background rejection 1/εB vs signal efficiency εS, for 40 to 120 constituents and for bottleneck sizes 5 to 25; normalized distribution for QCD and top; AUC (area under ROC curve) and validation loss vs bottleneck size]

Test application: Tops vs QCD


[Figure: background rejection 1/εB vs signal efficiency εS]

Without seeing top events: AUC ~ 0.9
● Constituents, bottleneck 6: AUC 0.93
● Images, bottleneck 32: AUC 0.89
● Constituents outperform images

Supervised
● Sees both top and QCD events: AUC ~ 0.98
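The quantities quoted above (signal efficiency εS, background rejection 1/εB and AUC) can all be derived from the per-event anomaly score. The sketch below is a minimal NumPy version, assuming larger scores mean "less QCD-like". The function names and the toy scores are invented for illustration.

```python
import numpy as np

def roc_curve(score_bkg, score_sig):
    """Scan thresholds from high to low; events with score >= threshold are tagged."""
    score_bkg = np.asarray(score_bkg, dtype=float)
    score_sig = np.asarray(score_sig, dtype=float)
    thresholds = np.sort(np.concatenate([score_bkg, score_sig]))[::-1]
    eps_s = np.array([(score_sig >= t).mean() for t in thresholds])
    eps_b = np.array([(score_bkg >= t).mean() for t in thresholds])
    # Start the curve at (eps_b, eps_s) = (0, 0).
    return np.concatenate([[0.0], eps_s]), np.concatenate([[0.0], eps_b])

def auc(eps_s, eps_b):
    """Area under the ROC curve via the trapezoid rule."""
    return float(np.sum(np.diff(eps_b) * (eps_s[1:] + eps_s[:-1]) / 2.0))

def background_rejection(eps_s, eps_b, target_eps_s):
    """1/eps_B at the first working point that reaches the target signal efficiency."""
    idx = int(np.argmax(eps_s >= target_eps_s))
    return np.inf if eps_b[idx] == 0 else 1.0 / eps_b[idx]

# Toy anomaly scores (assumption): signal tends to score higher.
score_bkg = np.array([0.1, 0.2, 0.3, 0.4])
score_sig = np.array([0.35, 0.5, 0.6, 0.7])

eps_s, eps_b = roc_curve(score_bkg, score_sig)
print(auc(eps_s, eps_b))                         # 0.9375
print(background_rejection(eps_s, eps_b, 1.0))   # 4.0
```

For the unsupervised case the score would be the autoencoder loss, for the supervised comparison the classifier output.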

Does the Network use Mass?

[Figure: normalized jet mass [GeV] distributions for the 100%, 70%, 40%, 10% and 5% least QCD-like events]

[Figure: normalized jet mass [GeV] distributions for QCD and top, with the low QCD jet mass marked]

The higher the jet mass, the less QCD-like

✔ QCD jet-mass spectrum learnt

✗ Want to learn more than jet mass
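The "least QCD-like" selections shown above amount to keeping a fixed fraction of events with the largest autoencoder loss and then looking at their jet mass. A minimal sketch, assuming per-event `loss` and `mass` arrays. The helper name and the toy data, in which the loss is made to grow with mass, are invented to mimic the behaviour described on the slide.

```python
import numpy as np

def least_qcd_like(loss, fraction):
    """Boolean mask for the given fraction of events with the largest loss."""
    threshold = np.quantile(loss, 1.0 - fraction)
    return loss >= threshold

rng = np.random.default_rng(0)

# Toy events (assumption): the loss rises with jet mass, plus some noise.
mass = rng.uniform(0.0, 300.0, size=10_000)           # jet mass in GeV
loss = 1e-3 * mass + 0.02 * rng.standard_normal(mass.size)

mask = least_qcd_like(loss, 0.05)
print("selected events:", mask.sum())
print("mean mass, all events:     ", mass.mean())
print("mean mass, 5% least QCD-like:", mass[mask].mean())
```

If the selected sample is shifted to high mass, as in this toy, the network is effectively cutting on jet mass, which is the sculpting the slide warns about.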

Decorrelating from the Jet Mass

We can use an adversary to decorrelate

● Introduce a second network

● Input = difference between initial and final image
● Predict the (binned) jet mass
● $L_{\text{adv}} = \left( \tilde{M}\left( \left| k_{T,i}^{\text{adv}} - k_{T,i}^{\text{auto}} \right| \right) - M \right)^2$, with $\tilde{M}$ the reconstructed mass and $M$ the true mass

[Figure labels: mass shaping; smooth distribution]

More adversarial training in particle physics: arXiv:1807.08763, C. Englert, P. Galler, P. Harris, M. Spannowsky

Balancing Loss Functions

Loss = $L_{\text{auto}} - \lambda L_{\text{adv}}$: How do we set λ?

- Scan with the background-only hypothesis

- Want both loss functions to be of a similar size

[Figure: normalized jet mass [GeV] distributions for 3% tops and QCD, full sample and 5% least QCD-like, for three values of λ]

● $\lambda = 1 \times 10^{-3}$: AUC = 0.63
● $\lambda = 5 \times 10^{-4}$: AUC = 0.70
● $\lambda = 1 \times 10^{-4}$: AUC = 0.78

Trade-off between performance and mass-shaping
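To make the balance between the two terms concrete, the sketch below evaluates $L_{\text{auto}}$, $L_{\text{adv}}$ and the combined loss on toy numbers. The function names are hypothetical, the losses are averaged over a batch for convenience, and `balanced_lambda` is only one possible starting point for a scan (the ratio that makes both terms equal in size). The talk itself only states that λ is scanned on background and that the two terms should be of similar size.

```python
import numpy as np

def autoencoder_loss(img_in, img_out):
    """Batch mean of the pixel-summed squared reconstruction error."""
    return float(np.mean(np.sum((img_in - img_out) ** 2, axis=(1, 2))))

def adversary_loss(mass_pred, mass_true):
    """Batch mean of the squared difference between predicted and true mass."""
    return float(np.mean((mass_pred - mass_true) ** 2))

def total_loss(l_auto, l_adv, lam):
    """Autoencoder objective: reconstruct well while making the adversary fail."""
    return l_auto - lam * l_adv

def balanced_lambda(l_auto, l_adv):
    """Heuristic start value (assumption): lambda * L_adv equals L_auto."""
    return l_auto / l_adv

# Toy batch (assumption): 4 images of 2x2 pixels and 4 jet masses in GeV.
img_in = np.ones((4, 2, 2))
img_out = np.full((4, 2, 2), 0.9)
mass_true = np.array([50.0, 80.0, 120.0, 170.0])
mass_pred = mass_true + 20.0

l_auto = autoencoder_loss(img_in, img_out)
l_adv = adversary_loss(mass_pred, mass_true)
lam = balanced_lambda(l_auto, l_adv)

print("L_auto =", l_auto, " L_adv =", l_adv, " lambda =", lam)
print("total loss at this lambda:", total_loss(l_auto, l_adv, lam))
```

Because $L_{\text{adv}}$ carries units of mass squared while $L_{\text{auto}}$ is a sum over small pixel values, a balanced λ is naturally much smaller than one, in line with the values of order $10^{-4}$ to $10^{-3}$ quoted above.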

Adversarial Autoencoder


[Figure: background rejection 1/εB vs signal efficiency εS, with the typical area of interest marked]

● Constituents, bottleneck 15, $\lambda = 1 \times 10^{-3}$: AUC 0.67
● Images, bottleneck 32, $\lambda = 5 \times 10^{-4}$: AUC 0.70

Works well for images

Hard for constituents: explicit mass-dependence

Focus on images

The Adversarial Autoencoder in a Bump Hunt

1) Train on background

● Monte Carlo

● Data ➔ Background region

2) Decorrelate from jet mass

3) Bump hunt in jet mass
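The last step can be illustrated with a deliberately crude per-bin excess scan. This is not the procedure of the talk: a real bump hunt would fit a smooth background shape, whereas here the reference histogram is simply rescaled to the selected sample. The function name, the falling toy spectrum and the injected peak at 175 GeV are all assumptions for this example. The scan only makes sense if the anomaly cut does not sculpt the mass distribution, which is what the decorrelation in step 2 is for.

```python
import numpy as np

def bump_scan(mass_selected, mass_reference, bins):
    """Per-bin excess of selected events over a rescaled reference shape."""
    obs, edges = np.histogram(mass_selected, bins=bins)
    ref, _ = np.histogram(mass_reference, bins=bins)
    expected = ref * (obs.sum() / ref.sum())   # normalise to the selected sample
    z = np.zeros_like(expected, dtype=float)
    filled = expected > 0
    z[filled] = (obs[filled] - expected[filled]) / np.sqrt(expected[filled])
    return edges, obs, expected, z

rng = np.random.default_rng(1)

def toy_background(n):
    """Toy falling mass spectrum in GeV (assumption, not a QCD model)."""
    m = rng.exponential(80.0, size=n)
    return m[m < 300.0]

reference = toy_background(100_000)             # background-only region
selected = np.concatenate([
    toy_background(5_000),                      # background passing the cut
    rng.normal(175.0, 10.0, size=300),          # injected toy resonance
])

bins = np.linspace(0.0, 300.0, 31)              # 10 GeV bins
edges, obs, expected, z = bump_scan(selected, reference, bins)

peak_bin = int(np.argmax(z))
print("largest excess in bin starting at", edges[peak_bin], "GeV, z =", z[peak_bin])
```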

[Figure: normalized jet mass [GeV] distribution, $\lambda = 5 \times 10^{-4}$, AUC = 0.70]

Question: can we train in the signal region?

Training on the Signal Region

Train on 97% QCD, 3% tops

● Not all events encoded
➔ Bottleneck too small

● Network learns QCD
➔ Dominant component

● Still see emergent peak

[Figure: normalized jet mass [GeV] distribution, $\lambda = 5 \times 10^{-4}$, AUC = 0.65]

Train on any background-dominated region

Dark Showers

● Dark showers have many possible collider signatures

➔ Increased hadronic activity, displaced vertices, $E_T^{\text{miss}}$

● We consider a dark SU(3) symmetry

➔ Implemented in Pythia

● 2 benchmark points (fixed 200 GeV dark quark mass)

➔ 100 GeV dark meson mass

➔ 10 GeV dark meson mass

● Dark meson able to decay back to SM

● $575\,\text{GeV} < p_{T,j} < 625\,\text{GeV}$

[Diagram labels: $q$, $\bar{q}$, $q_v$, $\bar{q}_v$, $S$]

Dark Shower vs QCD


[Figures: normalized distributions for QCD and for dark showers with m = 10 GeV and m = 100 GeV, including the jet mass [GeV]]

100 GeV meson mass = small mass peak

Similar to QCD

Dark Shower Results

[Figure: normalized jet mass [GeV] distributions for QCD, 3% m = 100 GeV and 3% m = 10 GeV; full sample and 5% least QCD-like]

[Figure: background rejection 1/εB for adversarial and non-adversarial autoencoders, m = 100 GeV and m = 10 GeV]

Discrimination power in challenging processes

Conclusions

● Autoencoders allow for
  ● model-independent searches
  ● training entirely on data

● Adversarial autoencoders allow jet mass-decorrelation
  ● Bump hunt in jet mass

● See discrimination power in
  ● Tops vs QCD
  ● Dark showers
