CMS Physics Analysis Summary - CERNcds.cern.ch/record/2195743/files/BTV-15-002-pas.pdf · 2016. 7....

Available on the CERN CDS information server CMS PAS BTV-15-002 CMS Physics Analysis Summary Contact: [email protected] 2016/07/01 Identification of double-b quark jets in boosted event topologies The CMS Collaboration Abstract Searches for new physics at the LHC require distinguishing the merged decay products of resonances produced with high transverse momentum from jets that originate from single partons. We present an algorithm that aims to reconstruct the bb decay mode of such resonances. The algorithm is applicable to any resonance with a mass close to the W/Z/H boson mass and with high enough energy for its decay products to be clustered in a single jet within a cone of size R = 0.8. An example is the 125 GeV Higgs boson, which is the focus of this document. The efficiency and the mistag rate from top quark jets have been measured using the 2.6 fb−1 dataset collected with the CMS experiment at 13 TeV in 2015.


  • Available on the CERN CDS information server CMS PAS BTV-15-002

    CMS Physics Analysis Summary

    Contact: [email protected] 2016/07/01

    Identification of double-b quark jets in boosted event topologies

    The CMS Collaboration

    Abstract

    Searches for new physics at the LHC require distinguishing the merged decay products of resonances produced with high transverse momentum from jets that originate from single partons. We present an algorithm that aims to reconstruct the bb decay mode of such resonances. The algorithm is applicable to any resonance with a mass close to the W/Z/H boson mass and with high enough energy for its decay products to be clustered in a single jet within a cone of size R = 0.8. An example is the 125 GeV Higgs boson, which is the focus of this document. The efficiency and the mistag rate from top quark jets have been measured using the 2.6 fb−1 dataset collected with the CMS experiment at 13 TeV in 2015.



    1 Introduction

    As the CERN Large Hadron Collider (LHC) explores a new energy regime, searches for physics beyond the Standard Model will probe particles produced with a momentum considerably higher than their mass, which affects the event topology in a very specific way. The decay products of such boosted objects will be highly collimated, such that they could end up merged within a single “fat jet”. Highly boosted objects represent a challenge for the standard jet identification algorithms and for object identification and isolation criteria.

    The discovery of the Higgs boson at 125 GeV [1, 2] is a major milestone in our understanding of the standard model. Because of the large predicted branching fraction for the H→bb decay (≈58%), its coupling to b quarks is one of the most interesting to study. For transverse momenta, pT, of the Higgs boson above about 300 GeV, the two b quark jets merge into a single jet (“H jet”) for a jet cone size of R = 0.8, and the approach to reconstructing the Higgs boson in this topology differs from considering the individual smaller-size jets separately. The decaying object is reconstructed within a single fat jet. Then, the composite nature of such a jet is revealed by analyzing its substructure. Several phenomenological studies have explored H→bb tagging algorithms (or “H tagging”) using jet substructure [3], though ultimately the optimal performance comes from using both the substructure information of the fat jet and the track and vertex information related to the b hadron lifetime. The approach presented here exploits both the jet substructure and the b tagging information, aiming to identify the two b hadrons from the bb pair within the same fat jet.

    An algorithm designed to identify the boosted H→bb signal can be used in many different processes: resonant HH and VH production, as well as searches for the H boson in the ttH, VH and VBF production modes and searches for boosted mono-H, t’ and b’ in the tH and bH final states. The flexibility to operate the H tagger in many different topologies and kinematic regimes is ensured by avoiding a strong dependence of the performance on the fat jet pT and mass. In principle the algorithm should also be able to identify Z→bb, as well as any hypothetical particle with a mass close to the W/Z/H boson mass that decays into a bb pair.

    2 Strategy

    In LHC Run I two different approaches to identify boosted H→bb candidates were explored and used at CMS: the fat jet and the subjet b tagging [4, 5]. Both approaches are based on the standard b tagging algorithms, which take advantage of the tracking and vertexing information and are designed to identify jets from single b quarks. In the first approach the standard b tagging algorithms are applied to the fat jet but with the track and vertex association criteria relaxed due to a larger jet cone size, while in the second approach the subjets are first defined and then the standard b tagging is applied to each of the subjets. The performance of the fat jet b tagging is inherently limited by the fact that the algorithm is not designed to identify signal jets containing two b quarks. On the other hand, the subjet b tagging, with its focus on individual subjets, does not fully profit from the global properties of the fat jets containing two b hadrons. The two approaches are therefore complementary to an extent. As shown below, the fat jet b tagging performs better in the high efficiency regime, mostly relying on the presence of displaced tracks, while the subjet b tagging performs better in the high purity regime, relying heavily on the reconstruction of secondary vertices associated to the subjets. Furthermore, at high pT the subjets start to overlap, causing the standard b tagging techniques to break down due to double-counting of tracks and secondary vertices when computing the subjet b-tag discriminants. In Run I the subjet b tagging was mainly used at CMS to identify boosted H→bb candidates [6–8]. While this approach is successful, its complementarity with the fat jet b tagging is a clear indication that further improvements can be achieved.

    In this document we present a novel approach to identifying boosted H→bb candidates, which tries to fully exploit the presence of two b quarks inside a fat jet and their topology in relation to the jet substructure, namely the fact that the b hadron flight directions are strongly correlated with the energy flows of the two subjets. To discriminate bb pairs originating from a heavy resonance from QCD jets initiated by single partons, we have developed a dedicated multivariate (MVA) tagging algorithm, named “double-b tagger”, implemented and optimized using the TMVA package [9]. To reconstruct b hadron decay vertices, we apply the Inclusive Vertex Finder (IVF) algorithm [4, 10], which identifies secondary vertices independently of the jet clustering. We reconstruct the decay chains of the two b hadrons by associating reconstructed secondary vertices to the subjet axes represented by the τ-axes defined in Section 4. No other substructure variable or quantity is employed. As shown in Section 5, we find that this novel approach greatly improves the ability to identify boosted Higgs bosons with respect to previously used methods.

    In Section 6 efficiency measurements for the double-b tagger performed in data are presented. Due to the small cross section for producing events with boosted H→bb or Z→bb jets [11], the double-b tagger efficiency is measured using QCD multijet events enriched in jets from gluon splitting to bb (g→bb) with a topology similar to that of boosted H→bb jets.

    One of the major backgrounds for analyses selecting boosted H or Z bosons decaying to bb pairs is tt production. The misidentification rate for boosted top quark jets faking H jets is measured in data, as described in Section 7, using a data sample enriched in semileptonic tt events.

    3 CMS detector and event samples

    The central feature of the Compact Muon Solenoid (CMS) apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the superconducting solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass/scintillator hadron calorimeter (HCAL). Muons are measured in gas-ionization detectors embedded in the steel return yoke outside the solenoid. Extensive forward calorimetry (|η| > 3) complements the coverage provided by the barrel (|η| < 1.3) and endcap (1.3 < |η| < 3) detectors. The first level (L1) of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4 µs. The high level trigger (HLT) processor farm further decreases the event rate from around 100 kHz to around 300 Hz, before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [12].

    Simulated Monte Carlo (MC) samples of heavy resonances decaying to two Higgs bosons (X→HH) have been used as a source of H jets. This simple topology is optimal for this study since no other objects are present in the final state and, as the mass of the resonance increases, the H bosons are produced with larger boost. An example of such a signal is the KK-Graviton [13], which is produced through gluon fusion and has spin 2. Several mass points (800 GeV–3.5 TeV) are considered in order to cover a large enough phase space to study the pT dependence.


    QCD multijet events, used in Sections 5 and 6, are simulated using PYTHIA [14] for different p̂T bins and combined to cover a broad kinematic range.

    Several MC samples are necessary for the evaluation of the mistag rate from top quarks detailed in Section 7. Top quark pair events are simulated with the next-to-leading-order generator POWHEG v2 [15–18]. This generator is also used for the electroweak production of single top quarks in the tW channel [19]. The MC@NLO generator is used for the s- and t-channel processes of single top quark production [20] and for the Z+jets backgrounds. The W+jets events are generated with MADGRAPH [21]. The MLM matching scheme is used, allowing up to four additional partons in the matrix element [22]. All samples are interfaced to PYTHIA for the showering.

    All events are generated using the parton distribution functions (PDF) from the NNPDF 3.0 PDF sets [23], while for the showering the underlying event tune CUETP8M1 [24] is used. To simulate accurately the LHC luminosity conditions during the 2015 data taking period, additional pp interactions overlapping with the event of interest in the same bunch crossing, denoted as pileup events, are added in the simulated samples to reproduce the pileup distribution measured in data.

    Data corresponding to an integrated luminosity of 2.6 fb−1 at √s = 13 TeV with 25 ns bunch spacing in 2015 are used. They have been collected with single jet triggers with pT thresholds of 200, 260, 320 and 400 GeV in order to measure the double-b tagger efficiency. All triggers except the one with the highest threshold have been prescaled to limit the trigger rates, which means that the event samples they recorded correspond to a lower integrated luminosity. Triggers with different pT thresholds are combined to gain efficiency, taking trigger prescale factors into account. Apart from the prescaling, the trigger efficiency is more than 99% in the phase space selected for this study. Collision events recorded with a single muon trigger, requiring pT(µ) > 45 GeV and |η(µ)| < 2.1, are used for the mistagging measurement from top quark jets.

    4 Event reconstruction and fat jet identification

    Stable particles are identified with the particle-flow (PF) algorithm [25, 26], which reconstructs each individual particle with an optimized combination of information from the various elements of the CMS detector.

    Events are required to have at least one reconstructed vertex consistent with a pp interaction. The vertex with the highest sum of the transverse momentum squared of the associated physics objects is considered to be the primary interaction vertex.

    Muons are reconstructed within |η| < 2.4 by selection criteria based on the compatibility of the track reconstructed by means of the silicon tracker only and of the combination of the hits in both the silicon tracker and the muon spectrometer [27]. Additional requirements are based on the compatibility of the trajectory with the primary vertex and on the number of hits observed in the tracker and muon systems. The muon isolation requirement is computed using the reconstructed tracks within ∆R = √((∆η)² + (∆Φ)²) < 0.3 from the muon direction, excluding the muon itself.
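The cone requirement above reduces to a distance computation in (η, Φ) space. A minimal Python sketch follows, assuming tracks are given as (pT, η, Φ) tuples that already exclude the muon track, and assuming the isolation quantity is the scalar pT sum inside the cone. The function names and the choice of a pT sum are illustrative assumptions, not taken from the CMS software.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2 wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def track_isolation(muon_eta, muon_phi, tracks, max_dr=0.3):
    """Sum the pT of tracks within max_dr of the muon direction.

    `tracks` is an iterable of (pt, eta, phi) tuples and is assumed
    (hypothetically) to exclude the muon's own track.
    """
    return sum(
        pt
        for pt, eta, phi in tracks
        if delta_r(muon_eta, muon_phi, eta, phi) < max_dr
    )
```

The wrap in `delta_phi` matters for objects on either side of Φ = ±π, which would otherwise appear far apart.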

    Jets are reconstructed from particle-flow candidates using the anti-kT clustering algorithm [28], with a distance parameter of R = 0.8 (AK8), as implemented in the FASTJET package [29, 30]. Jet energy corrections, as a function of pseudorapidity and transverse momentum of the jet, are applied [31]. Jet identification criteria are also applied to reject fake jets from detector noise and jets originating from primary vertices not associated with the hard interaction [32]. We select jets in the event requiring |η| < 2.4, so that they fall within the tracker acceptance.

    As the mass of the H boson is larger than the invariant mass of a typical QCD jet, the jet mass is critical to distinguish an H jet from a QCD jet. In physics analyses the jet mass may be used to define sidebands for the background modeling. To avoid a jet mass dependence of the tagger performance, the mass is therefore not used as an input to the multivariate discriminant but only to select the jets used for the algorithm training.

    The bulk of the H jet mass arises from the kinematics of the two jet cores that correspond to the two b quarks. In contrast, the QCD jet mass arises mostly from soft gluon radiation. Due to contributions from initial state radiation, the underlying event and pileup, the reconstructed jet mass can be far higher than the mass of the initial parton. This effect is exacerbated by using a large distance parameter for jet reconstruction.

    Jet grooming methods such as filtering [3], trimming [33] and pruning [34] help to remove the softer radiation. This shifts the jet mass of QCD jets to smaller values, while maintaining the jet mass for H jets close to the H boson mass. We adopt pruning as the technique to remove soft and wide-angle radiation. We use the pruned jet mass to select jets to be used for this study. The mass window is 70–200 GeV, to cover a range around the Higgs boson mass within the resolution.

    Substructure information enclosed in the N-subjettiness [35], τN, is usually exploited in several CMS physics results involving boosted bosons or top quarks. It is a generalized jet shape observable, which is computed under the assumption that the jet has N subjets, and it is the pT-weighted distance between each jet constituent and its nearest subjet axis (∆R):

    \[ \tau_N = \frac{1}{d_0} \sum_k p_{T,k} \, \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\} \qquad (1) \]

    where k runs over all constituent PF candidates. The normalization factor is d0 = ∑k pT,k R0, and R0 is the original jet distance parameter (0.8). The τN observable has a small value if the jet is consistent with having N or fewer subjets. The ratio τ2/τ1 is useful for discrimination between H jets with two subjets and QCD jets consistent with a single subjet, as it tends to smaller values for H jets. The subjet axes are obtained by reclustering the jet constituents using the kT algorithm and undoing the last step of the sequential recombination. These kT subjet axes are then used as a starting point for the N-subjettiness minimization. The N-subjettiness axes (to which we also refer as τ-axes) are an integral part of the N-subjettiness computation. We do not exploit τ2/τ1 as a discriminating variable, but we do use the τ-axes to estimate the directions of the b and anti-b quarks, as illustrated in the drawing in Fig. 1.
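Equation (1) can be evaluated directly once the axes are known. The Python sketch below assumes constituents are given as (pT, η, Φ) tuples and takes the N axes as fixed inputs. It does not perform the kT reclustering or the axis minimization described above, and the function names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def n_subjettiness(constituents, axes, r0=0.8):
    """Evaluate tau_N of Eq. (1) for fixed axes.

    constituents: list of (pt, eta, phi) tuples
    axes:         list of N (eta, phi) tuples, assumed already determined
    r0:           jet distance parameter entering the normalization d0
    """
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(delta_r(eta, phi, ax_eta, ax_phi) for ax_eta, ax_phi in axes)
        for pt, eta, phi in constituents
    )
    return weighted / d0
```

For a toy jet made of two equal-pT constituents at Φ = ±0.2, a single central axis gives τ1 = 0.25, while two axes placed on the constituents give τ2 = 0, which illustrates why a two-prong jet has a small τ2/τ1.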

    5 Double-b tagger algorithm

    Several observables exploiting the distinctive properties of b hadrons are employed as input variables for the CSVv2 [36] algorithm used in the CMS collaboration. Following that example, we have adapted their definition to deal with the bb topology. We substitute the jet axis information with the two τ-axes to resolve the two b hadron decay chains we expect for the H→bb signal.

    [Figure 1 drawing; labels: subjets, fatjet, double-b, τ-axis1, τ-axis2]

    Figure 1: Schematic comparison of the fat jet and subjet b tagging approaches and the presented double-b tagger.

    5.1 Discriminating variables

    We present here the discriminating variables that are used as input to the MVA algorithm to distinguish between the signal H→bb jets and the background from inclusive QCD jets. The variables rely on reconstructed tracks, secondary vertices (SV) as well as the two-SV system. Since the angular separation between the decay products of a resonance depends on the momentum and the mass of the resonance, one of the guiding principles in the selection of input variables, in order to keep the algorithm as general as possible, is that the variables do not have a strong dependence on the jet pT and the jet mass.

    Tracks with pT > 1 GeV are associated to jets in a cone ∆R […] 0.8 GeV, secondary vertices are identified through the Inclusive Vertex Finder (IVF) [4, 10] algorithm. This algorithm is not seeded from tracks associated to the reconstructed jets, but it uses as input the collection of reconstructed tracks in the event. The reconstructed secondary vertices are associated to jets in a cone ∆R […]

    • […] tracks ordered in decreasing SIP, to further discriminate against single b quark and light flavor jets from QCD when one or both SV are not reconstructed due to IVF inefficiencies;

    • The measured IP significance in the plane transverse to the beam axis, 2D SIP, of the first two tracks (first track) that raises the SV invariant mass above the bottom (charm) threshold of 5.2 (1.5) GeV;

    • The number of SV associated to the jet;

    • The significance of the 2D distance between the primary vertex and the secondary vertex, flight distance, for the SV with the smallest 3D flight distance uncertainty, for each of the two τ-axes;

    • The ∆R between the SVs with the smallest 3D flight distance uncertainty and its τ-axis, for each of the two τ-axes;

    • The relative pseudorapidity, ηrel, of the tracks from all SVs with respect to their τ-axis for the three leading tracks ordered in increasing ηrel, for each of the two τ-axes;

    • The total SV mass, defined as the total mass of all SVs associated to a given τ-axis, for each of the two τ-axes;

    • The ratio of the total SV energy, defined as the total energy of all SVs associated to a given τ-axis, and the total energy of all the tracks associated to the fat jet that are consistent with the primary vertex, for each of the two τ-axes;

    • The information related to the two-SV system, the z variable, defined as:

    \[ z = \Delta R(\mathrm{SV}_0, \mathrm{SV}_1) \cdot \frac{p_{T,\mathrm{SV}_1}}{m(\mathrm{SV}_0, \mathrm{SV}_1)} \qquad (2) \]

    where SV0 and SV1 are SVs with the smallest 3D flight distance uncertainty. The z variable helps to reject the bb background from gluon splitting by relying on the different kinematic properties compared to the bb pair from the decay of a massive resonance.
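As an illustration of Eq. (2), the Python sketch below computes z from two secondary vertices. It assumes each SV is summarized by a (pT, η, Φ, mass) tuple, that ∆R is taken between the two SV momentum directions, and that m(SV0, SV1) is the invariant mass of the summed four-momenta. These representation choices and the function names are assumptions made for the example, not details given in the text.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(sv0, sv1):
    """Invariant mass of the summed four-momenta of two SVs."""
    e, px, py, pz = (a + b for a, b in zip(four_vector(*sv0), four_vector(*sv1)))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def z_variable(sv0, sv1):
    """z = DeltaR(SV0, SV1) * pT(SV1) / m(SV0, SV1), following Eq. (2)."""
    mass = invariant_mass(sv0, sv1)
    if mass <= 0.0:
        raise ValueError("two-SV invariant mass must be positive")
    return delta_r(sv0[1], sv0[2], sv1[1], sv1[2]) * sv1[0] / mass
```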

    We select as discriminating variables all those with enough classifier separation (a default output of TMVA), that show small correlation with the other inputs and improve the QCD background discrimination by at least 5%. In total 27 variables are used as input to the multivariate discriminant. The most discriminating variables are the SIP for the most displaced tracks, the vertex energy ratio for SV0, and the 2D SIP for the first track above bottom threshold. In Fig. 2 distributions for some discriminating input variables are shown for the signal H→bb jets and the background QCD jets. In particular g→bb and single b quark production are shown separately as well as the light flavor jet contribution. The secondary vertex multiplicity and the vertex energy ratio for SV0, along with the SIP of the first track above bottom threshold, show a good separation between the H→bb jets and different QCD jet components. The z variable shows good discrimination against the g→bb contribution.

    Several variables related to the presence and properties of soft leptons arising from the b hadron decay have also been investigated. Despite a small gain in performance, the soft lepton variables were excluded from the final list of input variables since they could introduce undesired biases in the performance measurement in data, where µ-tagged jets from QCD multijet events are used.

    [Figure 2: four panels with a.u. on the vertical axis. Horizontal axes: 2D SIP for first track above b threshold; number of SV; SV0 energy ratio; z variable. Legend: QCD, single b; QCD, gluon splitting to bb; QCD, light flavor; H(bb). Labels: CMS Simulation Preliminary (13 TeV); AK8, pT > 300 GeV; 70 < m < 200 GeV.]

    Figure 2: Distributions of 2D IP significance for the most displaced track raising the SV invariant mass above the bottom quark threshold, number of secondary vertices associated to the AK8 jet, the vertex energy ratio for SV0, and the z variable. H→bb jets from simulated samples of KK-Graviton decaying to HH are compared with QCD jets containing zero, one or two b quarks. AK8 jets are selected with pT > 300 GeV and pruned jet mass 70 < m < 200 GeV. The distributions are normalized to unit area.

    5.2 Performance

    A boosted decision tree (BDT), implemented using the TMVA package [9], is trained on simulated signal jets using the aforementioned discriminating variables and its output is used to separate signal from background jets. We select signal and background jets in the 70–200 GeV pruned jet mass window and 300–2500 GeV jet pT range. For a reliable comparison of H signal and QCD background, both are simulated with similar pT shapes: the mean and RMS of the pT distributions are 710 and 317 GeV for H jets, and 669 and 306 GeV for QCD jets, respectively.

    We compare the performance of the double-b tagger with the fat jet and subjet b tagging algorithms [36]. Both fat jet and subjet b tagging are based on the CSVv2 algorithm [36].

    The mistag rate is provided for the inclusive QCD jets in Fig. 3 and also separately for the g→bb component. At the same signal efficiency, the mistag rate is uniformly lower by about a factor of 2 compared to the subjet b tagging approach. Given the different kinematic properties expected for a bb pair originating from the decay of a massive resonance compared to gluon splitting, the mistag rate for the gluon splitting background is reduced from 60% to 50% at 80% signal efficiency and from 20% to 10% at 35% signal efficiency compared to the subjet approach. The performance curves for the subjet approach show a discontinuity, which is due to the unphysical value of the CSVv2 discriminant for at least one subjet when no tracks are associated to it.

    In each case the double-b tagger outperforms the fat jet and subjet b tagging approaches. At high pT the new tagger improves the signal identification efficiency even more with respect to the other two taggers, which is an important gain for searches for heavy resonances where very high pT jets are expected.

    In Fig. 4 the signal efficiencies and mistag rates for the double-b tagger are reported as a function of jet pT for three operating points: loose (discriminant value > 0.3), medium (> 0.6) and tight (> 0.9), which correspond to 80%, 70% and 35% signal efficiency, respectively, for a jet pT of about 1000 GeV. The mistag rate is rather flat across the pT range considered, while the signal efficiency decreases with increasing pT, as expected from the degradation of the tracking performance inside high pT jets.
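The operating points quoted above are simple thresholds on the discriminant, so the signal efficiency and mistag rate at each point are pass fractions. A minimal Python sketch, assuming the discriminant values of signal and background jets are available as plain lists; the names `WORKING_POINTS`, `pass_fraction` and `efficiency_table` are hypothetical.

```python
# Thresholds on the double-b discriminant as quoted in the text.
WORKING_POINTS = {"loose": 0.3, "medium": 0.6, "tight": 0.9}


def pass_fraction(discriminants, threshold):
    """Fraction of jets whose discriminant value exceeds the threshold."""
    values = list(discriminants)
    if not values:
        raise ValueError("need at least one jet")
    return sum(1 for value in values if value > threshold) / len(values)


def efficiency_table(signal_scores, background_scores):
    """Map each operating point to (signal efficiency, mistag rate)."""
    return {
        name: (
            pass_fraction(signal_scores, cut),
            pass_fraction(background_scores, cut),
        )
        for name, cut in WORKING_POINTS.items()
    }
```

Applying the same function in bins of jet pT would give curves of the kind reported in Fig. 4.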

    6 Efficiency measurement in data

    The efficiency of the double-b tagger is measured in the data sample, described in Section 3, consisting of high pT jets enriched in bb from gluon splitting. In order to select topologies as similar as possible to a signal jet, we require an AK8 jet with pT > 300 GeV and pruned mass > 50 GeV. We require the jet to be matched to at least two muons, each with pT > 7 GeV and |η| < 2.4. Each pruned subjet is required to have at least one muon among its constituents and within ∆R < 0.4 from the subjet axis (“double-muon tagged”). An alternative selection that requires at least one muon is also examined as a cross check for the measurement (“single-muon tagged”). While this single-muon selection allows for a larger dataset in which to perform the tagger efficiency measurement, the gluon splitting topology in this inclusive phase space is less signal-like relative to the double-muon selection. Thus, to maximize the similarity between the g→bb and the H→bb topology, the measurement is performed requiring double-muon tagged jets.

    [Figure 3: four panels comparing double-b-tag, Subjet CSVv2 and Fatjet CSVv2 for AK8 jets with 70 < m < 200 GeV. Horizontal axis: Tagging efficiency (H→bb). Vertical axis: Mistagging efficiency for 300 < pT < 500 GeV, 500 < pT < 800 GeV and 800 < pT < 2000 GeV; Mistagging efficiency (g→bb) for 800 < pT < 2000 GeV. CMS Simulation Preliminary (13 TeV).]

    Figure 3: Comparison of the performance of the double-b tagger, the minimum CSVv2 value among the two subjet b tag scores, and the fat jet b tag, which exploits the CSVv2 algorithm. The tagging efficiency for signal is evaluated using boosted H→bb jets from simulation. The mistag rate is evaluated for simulated QCD jets containing zero, one or two b quarks. Top-left for all jets with 300 < pT […] 800 GeV the mistag rate evaluated for g→bb.

    [Figure 4: two panels versus pT (GeV). Left: Tagging Efficiency (H→bb). Right: Mistagging Efficiency. Curves: Double-b Loose, Double-b Medium, Double-b Tight. CMS Simulation Preliminary (13 TeV); 70 < m < 200 GeV.]

    Figure 4: Signal efficiency (left) and mistag rate (right) as a function of jet pT after a selection on the double-b tagger for the Loose, Medium and Tight operating points. Simulated H→bb jets from KK-Graviton decaying to HH (left) and QCD jets containing zero, one or two b quarks (right) are used. AK8 jets are selected with pT > 300 GeV and pruned jet mass 70 < m < 200 GeV.

    The comparison between the data and the simulated samples of the variables that are used as inputs to the double-b tagger shows good agreement, as can be seen in Fig. 5. In Fig. 6 we also report the double-b tagger output in data and simulated events. The total number of entries in the simulation is normalized to the observed number of entries in data. Overall the agreement between data and simulation is fairly good.

    The efficiency of the double-b tagger is measured in data and MC for three different operating points as defined in Section 5.2. The measurement relies on the Jet Probability (JP) discriminant, for which the expected simulated distributions (“templates”) are different for the various jet flavors. The fraction of b (from gluon splitting) jets is estimated by fitting the data distribution of the JP variable with the templates. This so-called Lifetime Tagging (LT) method [37] is also used to perform the measurement of the b jet identification efficiency scale factors for the standard anti-kT R = 0.4 (AK4) jets [36].

    The QCD MC sample is split into events containing b quark jets arising from gluon splitting and those (from b, c, light parton) which are not associated to this process, by requiring at least two generator level b hadrons clustered inside the jet. An example of fitted distributions for the JP discriminant in data is presented in Fig. 7.
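The idea of extracting a flavor fraction from templates can be illustrated with a deliberately simplified two-template fit. The Python sketch below assumes binned histograms as lists of counts and fits a single fraction f with a weighted least-squares estimate, using the model N [f s_i + (1 − f) b_i] with unit-normalized templates s and b. The fit actually used in the analysis is not specified in this section and may differ (for example a binned likelihood with more templates); all names here are hypothetical.

```python
def normalise(hist):
    """Scale a histogram (list of bin contents) to unit area."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("template must have positive area")
    return [x / total for x in hist]


def fit_signal_fraction(data, signal_template, background_template):
    """Estimate the fraction f of the signal template in the data.

    Minimises sum_i (d_i - N*(f*s_i + (1-f)*b_i))^2 / max(d_i, 1),
    which has a closed-form solution because the model is linear in f.
    """
    n_total = float(sum(data))
    s = normalise(signal_template)
    b = normalise(background_template)
    numerator = 0.0
    denominator = 0.0
    for d, s_i, b_i in zip(data, s, b):
        weight = 1.0 / max(d, 1.0)      # approximate Poisson variance of the bin
        shape = n_total * (s_i - b_i)   # derivative of the model with respect to f
        numerator += weight * (d - n_total * b_i) * shape
        denominator += weight * shape * shape
    if denominator == 0.0:
        raise ValueError("templates are identical; fraction is undetermined")
    return min(1.0, max(0.0, numerator / denominator))
```

With data built as an exact 30%/70% mixture of the two templates, the function returns 0.3; with real data the result fluctuates and an uncertainty estimate would be needed, which this sketch omits.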

    The resulting data/MC efficiency scale factors (SFs) are presented in Fig. 8 and listed in Tables 1–3 for the double-muon tagged selection. The measurement is done for jets with pT up to 700 (500) GeV for loose and medium (tight) operating points, which is driven by the size of the available data sample. Jets with larger pT are included in the last pT bin with an additional contribution up to ≃ 20% to the total number of jets selected in this bin.
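A scale factor is the ratio of the efficiency measured in data to the one found in simulation. The Python sketch below assumes, as is common for template-fit methods of this kind but not spelled out in this section, that the efficiency in data is built from the fitted fractions of b-from-gluon-splitting jets before and after applying the tagger. The function names and the input numbers in the usage note are illustrative only.

```python
def tagging_efficiency(n_tagged, frac_tagged, n_all, frac_all):
    """Efficiency from jet yields and fitted signal fractions.

    n_all, frac_all:       jet count and fitted signal fraction before tagging
    n_tagged, frac_tagged: the same quantities after the tagger requirement
    (assumed form: efficiency = frac_tagged * n_tagged / (frac_all * n_all))
    """
    denominator = frac_all * n_all
    if denominator <= 0.0:
        raise ValueError("no signal jets before tagging")
    return (frac_tagged * n_tagged) / denominator


def scale_factor(eff_data, eff_mc):
    """Data/MC efficiency scale factor."""
    if eff_mc <= 0.0:
        raise ValueError("MC efficiency must be positive")
    return eff_data / eff_mc
```

For example, 1000 jets with a fitted fraction of 0.3 before tagging and 200 jets with a fitted fraction of 0.9 after tagging give an efficiency of 0.6; with a simulated efficiency of 0.64 the scale factor would be about 0.94.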

    As several background processes are being varied as a combined template in the fit procedure, the results could be sensitive to the prediction of the flavor composition of this background sample. The uncertainty on the scale factor due to the template definition is estimated by conservatively varying the normalization of each background contribution by ± 50%. As a cross check, the scale factor derivation is also performed by using all the background contributions as individual templates in the fit. The background template normalization variation contributes up to 5% as a systematic uncertainty on the scale factor.

    Uncertainties on jet energy scale (JES) corrections are included as shape systematics on the JP

    [Figure 5: four data/MC comparison panels, each with a Data/MC ratio panel, for double-muon-tagged AK8 jets with pT > 300 GeV in the muon enriched multijet sample. Horizontal axes: 2D SIP for first track above b threshold; number of SV; SV0 energy ratio; z variable. Legend: Data; uds quark or gluon; c quark; c from gluon splitting; b quark; b from gluon splitting. CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns).]

    Figure 5: Distributions of 2D IP significance for the most displaced track raising the SV invariant mass above the b quark threshold, number of secondary vertices associated to the AK8 jet, the vertex energy ratio for SV0, and the z variable. Data and simulated events are shown for the double-muon tagged jets selection. Simulated events are normalized to the yield observed in data; the overflow is in the last bin. The bottom panel in each figure shows the ratio of the number of events observed in data to that of the MC prediction.

    [Figure 6: one panel with a Data/MC ratio panel below. x-axis: double-b tagger discriminant (−1 to 1); y-axis: AK8 jets / 0.1. Markers: L, M, T. Legend: Data; uds quark or gluon; c quark; c from gluon splitting; b quark; b from gluon splitting. Labels: CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns); double-muon-tagged AK8 jets, muon-enriched multijet sample, pT (AK8 jets) > 300 GeV.]

    Figure 6: Double-b tagger discriminant distribution in data and simulated samples for the double-muon tagged jets selection. Simulated events are normalized to the yield observed in data. The loose, medium and tight operating points are also reported. The bottom panel shows the ratio of the number of events observed in data to that of the MC prediction.

    [Figure 7: two panels. x-axis: JP discriminant; y-axis: Jets / 0.1. Left legend: Data; g → bb; b + g → cc; c + light. Right legend: Data; g → bb; other flavors. Labels: CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns); 500 < pT < 600 GeV.]

    Figure 7: Comparison of the JP discriminant distribution for the data and the sum of the fitted templates for all selected jets (left) and those jets passing the loose double-b tagger requirement (right) with pT between 500 and 600 GeV. The shaded area represents the statistical and systematic (refer to the text for details) uncertainties on MC templates. Double-muon tagged AK8 jets are used for this measurement. The overflow is included in the last bin.

    discriminant and their impact on the scale factor measurement is negligible. Systematic uncertainties due to the mismodeling of the track multiplicity and of the b fragmentation function contribute at most 5% and 2%, respectively. Those associated with pileup, the c quark fragmentation function, the fragmentation rate of a c quark to the various D mesons, the branching ratios of c hadrons to muons, and the KS and Λ production fractions are found to be negligible.

    We estimate the effect of the residual shape differences in the double-b tagger discriminant distribution between simulated H → bb and g → bb jet topologies. We compute a set of weights that match gluon splitting jets to H jets in the distributions of the vertex energy ratio for SV0 and of the z variable. These weights are then applied to both data and simulated events, and the SFs are measured again. The SFs computed with and without these weights agree within the uncertainty, which validates the assumption that gluon splitting is a good proxy for the signal in the selected phase space.
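
    The reweighting step can be illustrated with per-bin shape weights. This is a minimal sketch assuming one-dimensional histograms with identical binning; whether the analysis derives the weights separately per variable or jointly in two dimensions is not stated in the text, and the function names are hypothetical.

```python
def normalise(hist):
    """Scale a histogram (list of bin contents) to unit area."""
    total = float(sum(hist))
    return [h / total for h in hist]

def shape_weights(target_hist, source_hist):
    """Per-bin weights that map the source shape onto the target shape.

    Here the target would be the H -> bb distribution and the source the
    g -> bb distribution of the same variable. Bins that are empty in the
    source get weight 1, a convention chosen for this sketch.
    """
    target = normalise(target_hist)
    source = normalise(source_hist)
    return [t / s if s > 0 else 1.0 for t, s in zip(target, source)]

def apply_weights(hist, weights):
    """Reweight a histogram bin by bin."""
    return [h * w for h, w in zip(hist, weights)]
```

    After applying the weights, the normalised source histogram reproduces the normalised target histogram, which is the property the cross check relies on.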

    The SFs derived using double-muon and single-muon tagged jets are compatible, though the double-muon SFs have larger uncertainties due to the limited size of the data sample. In both cases the Data/MC SFs are compatible with unity within uncertainties.

    Table 1: Loose double-b tag efficiency (ε) and Data/MC efficiency ratio (SF). Uncertainties are both statistical and systematic for the SF and data efficiency, while for the MC efficiency only the statistical uncertainty is reported. Jets with pT > 700 GeV are included in the last bin.

    pT (GeV)    300 - 400      400 - 500      500 - 600      600 - 700
    ε (Data)    0.79 ± 0.07    0.78 ± 0.09    0.70 ± 0.14    0.66 ± 0.17
    ε (MC)      0.83 ± 0.01    0.79 ± 0.01    0.77 ± 0.01    0.68 ± 0.01
    SF          0.95 ± 0.08    0.98 ± 0.12    0.91 ± 0.18    0.97 ± 0.25
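
    The relation between the efficiencies and the scale factor in Table 1 can be checked with a short calculation. The sketch below takes the rounded efficiencies from the table and assumes uncorrelated uncertainties combined in quadrature, which is a simplification; because of rounding and possible correlations it is not expected to reproduce the quoted values exactly.

```python
import math

def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Return SF = eff_data / eff_mc and its uncertainty.

    The relative uncertainties are added in quadrature, assuming the two
    efficiencies are uncorrelated (an assumption of this sketch).
    """
    sf = eff_data / eff_mc
    rel = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return sf, sf * rel

# Rounded loose operating point efficiencies from Table 1:
# (eff_data, err_data, eff_mc, err_mc) per pT bin in GeV.
table1 = {
    "300-400": (0.79, 0.07, 0.83, 0.01),
    "400-500": (0.78, 0.09, 0.79, 0.01),
    "500-600": (0.70, 0.14, 0.77, 0.01),
    "600-700": (0.66, 0.17, 0.68, 0.01),
}

for pt_bin, values in table1.items():
    sf, err = scale_factor(*values)
    print(f"{pt_bin} GeV: SF = {sf:.2f} +/- {err:.2f}")
```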

    7 Mistag rate measurement from top quark jets in data

    We evaluate the differences between data and MC in the misidentification rate for top quark jets faking H jets in tt production. These studies are based on the single-lepton tt final state, where one top quark decays leptonically and the other hadronically. Reconstructed muons are

    [Figure 8: three panels (Loose, Medium and Tight double-b tag). x-axis: pT [GeV]; y-axis: Data/Simulation SF (0.5 to 1.4). Legend: double muon (stat, stat ⊕ syst); single muon (stat, stat ⊕ syst). Labels: CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns).]

    Figure 8: Data/MC efficiency ratio (SF) for the loose, medium and tight double-b tagger requirements obtained with the single- and double-muon tagged selections. Central values of the scale factors are artificially shifted along the x-axis for better visibility.


    Table 2: Medium double-b tag efficiency (ε) and Data/MC efficiency ratio (SF). Uncertainties are both statistical and systematic for the SF and data efficiency, while for the MC efficiency only the statistical uncertainty is reported. Jets with pT > 700 GeV are included in the last bin.

    pT (GeV)    300 - 400      400 - 500      500 - 600      600 - 700
    ε (Data)    0.70 ± 0.07    0.70 ± 0.09    0.60 ± 0.12    0.58 ± 0.12
    ε (MC)      0.75 ± 0.01    0.70 ± 0.01    0.64 ± 0.01    0.55 ± 0.01
    SF          0.92 ± 0.09    0.99 ± 0.12    0.94 ± 0.19    1.05 ± 0.21

    Table 3: Tight double-b tag efficiency (ε) and Data/MC efficiency ratio (SF). Uncertainties are both statistical and systematic for the SF and data efficiency, while for the MC efficiency only the statistical uncertainty is reported. Jets with pT > 500 GeV are included in the last bin.

    pT (GeV)    300 - 400      400 - 500
    ε (Data)    0.43 ± 0.04    0.36 ± 0.05
    ε (MC)      0.47 ± 0.01    0.39 ± 0.01
    SF          0.90 ± 0.09    0.91 ± 0.14

    required to have pT > 50 GeV and |η| < 2.1. The event selection requires exactly one isolated muon and at least one AK4 jet in the same hemisphere of the event; additional muons in the event are vetoed. These requirements account for the leptonic decay of the top quark. The isolated muon is used to divide each event into two hemispheres: the hadronic hemisphere, |φ − φµ| > 2π/3, and the leptonic hemisphere, |φ − φµ| < 2π/3. The hadronic decay of the top quark is selected by requiring an AK8 jet with large transverse momentum in the hadronic hemisphere.
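
    The hemisphere definition translates into a simple azimuthal-angle test with boundary 2π/3. The sketch below wraps the angle difference into [−π, π) before taking its absolute value; assigning the exact boundary value to the leptonic hemisphere is a choice made here, since the text leaves that case unspecified.

```python
import math

# Boundary between the hemispheres, 2*pi/3 in azimuthal angle.
HEMISPHERE_BOUNDARY = 2.0 * math.pi / 3.0

def delta_phi(phi1, phi2):
    """Azimuthal angle difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def hemisphere(phi_object, phi_muon):
    """Classify an object relative to the isolated muon direction."""
    if abs(delta_phi(phi_object, phi_muon)) > HEMISPHERE_BOUNDARY:
        return "hadronic"
    return "leptonic"
```

    For example, an AK8 jet back-to-back with the muon falls in the hadronic hemisphere, while an AK4 jet close to the muon in φ falls in the leptonic one.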

    At least two AK4 jets with pT > 30 GeV and |η| < 2.4 are required in the event. One AK8 jet with pT > 300 GeV, |η| < 2.4 and mass between 70 GeV and 200 GeV is required in the hadronic hemisphere of the event. This jet represents the probe jet, used to evaluate the misidentification rate for the double-b tagger discriminant. An additional requirement τ3/τ2 550 GeV. For the tight working point only the inclusive scale factor achieves a significant statistical precision, but the pT-dependent scale factors are also reported for completeness. The scale factors are summarized in Table 4. The systematic

    [Figure 9: plot with L, M and T operating-point markers.]

    Figure 9: Double-b tagger discriminant distribution for the jet associated to the boosted top quark hadronic decay in selected semi-leptonic tt events. Simulated events are normalized to the yield observed in data. The bottom panel shows the ratio of the number of events observed in data to that of the MC prediction.

    uncertainties are propagated only to the pT-inclusive scale factors. The measured SFs for the mistag rate from top quarks are close to one within the uncertainty. The level of agreement between data and simulation is comparable to that of the other b tagging approaches pursued in the CMS Collaboration [36], and to the level of agreement observed in the muon-enriched multijet sample shown in Fig. 6.

    Table 4: Mistag scale factors from top quark jets for the three operating points of the double-b tagger and for different pT ranges. The reported uncertainties are statistical only. For the pT-inclusive scale factor (pT > 300 GeV) both statistical and systematic uncertainties are reported. Jets with pT > 700 (500) GeV are included in the last bin.

    pT bin (GeV)       300 - 400    400 - 550    550 - 700    inclusive (300 - 700 GeV)
    loose double-b
      ε (Data)         0.40±0.04    0.40±0.05    0.47±0.09    0.41±0.03
      ε (MC)           0.33±0.01    0.36±0.01    0.34±0.01    0.34±0.01
      SF               1.24±0.13    1.12±0.13    1.40±0.32    1.20±0.09 (stat.) ±0.05 (syst.)
    medium double-b
      ε (Data)         0.26±0.04    0.25±0.04    0.25±0.03    0.26±0.03
      ε (MC)           0.23±0.01    0.25±0.01    0.22±0.01    0.24±0.01
      SF               1.14±0.16    1.01±0.17    1.13±0.39    1.09±0.11 (stat.) ±0.05 (syst.)

    pT bin (GeV)       300 - 400    400 - 500    inclusive (300 - 500 GeV)
    tight double-b
      ε (Data)         0.10±0.02    0.08±0.02    0.06±0.01
      ε (MC)           0.07±0.01    0.06±0.01    0.09±0.02
      SF               1.54±0.36    1.41±0.39    1.49±0.27 (stat.) ±0.05 (syst.)


    8 Conclusions

    We have presented the “double-b tagging” algorithm, which aims to identify the bb decay mode of resonances produced with high transverse momentum and detected as single fat jets, and to distinguish them from jets initiated by single partons. The Higgs boson is the focus of this document, but the algorithm is applicable to any resonance with a mass close to the W/Z/H boson mass and pT above 300 GeV. We show that this new tagger outperforms the previous techniques for distinguishing H jets from the QCD background. At the same signal efficiency, the mistag rate is lower by a factor of 2 compared to, for example, the subjet b tagging approach [4, 5]. Given the different kinematic properties expected for a bb pair originating from the decay of a massive resonance compared to gluon splitting, the mistag rate for the gluon splitting background is reduced from 60% to 50% for the loose operating point and from 20% to 10% for the tight operating point compared to the subjet approach. The efficiency and the mistag rate from top quark jets have been measured in data, and correction factors for simulated jets have been derived for three different operating points for jets with pT between 300 and 700 GeV (500 GeV for the tight working point). With the increased integrated luminosity in 2016, the uncertainty on the scale factor measurements will be reduced and a higher pT range will be covered.

    References

    [1] CMS Collaboration, “Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC”, Phys. Lett. B 716 (2012) 30–61, doi:10.1016/j.physletb.2012.08.021, arXiv:1207.7235.

    [2] ATLAS Collaboration, “Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC”, Phys. Lett. B 716 (2012) 1–29, doi:10.1016/j.physletb.2012.08.020, arXiv:1207.7214.

    [3] J. M. Butterworth, A. R. Davison, M. Rubin, and G. P. Salam, “Jet substructure as a new Higgs search channel at the LHC”, Phys. Rev. Lett. 100 (2008) 242001, doi:10.1103/PhysRevLett.100.242001, arXiv:0802.2470.

    [4] CMS Collaboration, “Performance of b tagging at √s = 8 TeV in multijet, ttbar and boosted topology events”, CMS Physics Analysis Summary CMS-PAS-BTV-13-001 (2013).

    [5] CMS Collaboration, “Performance of b tagging in boosted topology events”, CMS Performance Note CMS-DP-2014-031 (2014).

    [6] CMS Collaboration, “Search for vector-like T quarks decaying to top quarks and Higgs bosons in the all-hadronic channel using jet substructure”, JHEP 06 (2015) 080, doi:10.1007/JHEP06(2015)080, arXiv:1503.01952.

    [7] CMS Collaboration, “Search for pair-produced vector-like B quarks in proton-proton collisions at √s = 8 TeV”, (2015). arXiv:1507.07129. Submitted to Phys. Rev. D.

    [8] CMS Collaboration, “Search for heavy resonances decaying to two Higgs bosons in final states containing four b quarks”, (2016). arXiv:1602.08762. Submitted to Eur. Phys. J. C.

    [9] A. Hoecker et al., “TMVA: Toolkit for Multivariate Data Analysis”, PoS ACAT (2007) 040, arXiv:physics/0703039.


    [10] CMS Collaboration, “Measurement of BB̄ Angular Correlations based on Secondary Vertex Reconstruction at √s = 7 TeV”, JHEP 03 (2011) 136, doi:10.1007/JHEP03(2011)136, arXiv:1102.3194.

    [11] LHC Higgs Cross Section Working Group Collaboration, “Handbook of LHC Higgs Cross Sections: 3. Higgs Properties”, doi:10.5170/CERN-2013-004, arXiv:1307.1347.

    [12] CMS Collaboration, “The CMS experiment at the CERN LHC”, JINST 3 (2008) S08004, doi:10.1088/1748-0221/3/08/S08004.

    [13] L. Randall and R. Sundrum, “A Large mass hierarchy from a small extra dimension”, Phys. Rev. Lett. 83 (1999) 3370–3373, doi:10.1103/PhysRevLett.83.3370, arXiv:hep-ph/9905221.

    [14] T. Sjöstrand et al., “An Introduction to PYTHIA 8.2”, Comput. Phys. Commun. 191 (2015) 159–177, doi:10.1016/j.cpc.2015.01.024, arXiv:1410.3012.

    [15] P. Nason, “A New method for combining NLO QCD with shower Monte Carlo algorithms”, JHEP 11 (2004) 040, doi:10.1088/1126-6708/2004/11/040, arXiv:hep-ph/0409146.

    [16] S. Frixione, P. Nason, and C. Oleari, “Matching NLO QCD computations with Parton Shower simulations: the POWHEG method”, JHEP 11 (2007) 070, doi:10.1088/1126-6708/2007/11/070, arXiv:0709.2092.

    [17] S. Alioli, P. Nason, C. Oleari, and E. Re, “A general framework for implementing NLO calculations in shower Monte Carlo programs: the POWHEG BOX”, JHEP 06 (2010) 043, doi:10.1007/JHEP06(2010)043, arXiv:1002.2581.

    [18] S. Frixione, P. Nason, and G. Ridolfi, “A Positive-weight next-to-leading-order Monte Carlo for heavy flavour hadroproduction”, JHEP 09 (2007) 126, doi:10.1088/1126-6708/2007/09/126, arXiv:0707.3088.

    [19] E. Re, “Single-top Wt-channel production matched with parton showers using the POWHEG method”, Eur. Phys. J. C 71 (2011) 1547, doi:10.1140/epjc/s10052-011-1547-z, arXiv:1009.2450.

    [20] J. Alwall et al., “The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations”, JHEP 07 (2014) 079, doi:10.1007/JHEP07(2014)079, arXiv:1405.0301.

    [21] J. Alwall et al., “MadGraph 5 : Going Beyond”, JHEP 06 (2011) 128, doi:10.1007/JHEP06(2011)128, arXiv:1106.0522.

    [22] M. L. Mangano, M. Moretti, F. Piccinini, and M. Treccani, “Matching Matrix Elements and Shower Evolution for Top-Quark Production in Hadronic Collisions”, JHEP 01 (2007) 013, doi:10.1088/1126-6708/2007/01/013, arXiv:hep-ph/0611129.

    [23] NNPDF Collaboration, “Parton distributions for the LHC Run II”, JHEP 04 (2015) 040, doi:10.1007/JHEP04(2015)040, arXiv:1410.8849.

    [24] CMS Collaboration, “Event generator tunes obtained from underlying event and multiparton scattering measurements”, Eur. Phys. J. C 76 (2016), no. 3, 155, doi:10.1140/epjc/s10052-016-3988-x, arXiv:1512.00815.


    [25] CMS Collaboration, “Particle-Flow Event Reconstruction in CMS and Performance for Jets, Taus, and MET”, CMS Physics Analysis Summary CMS-PAS-PFT-09-001 (2009).

    [26] CMS Collaboration, “Commissioning of the Particle-Flow reconstruction in Minimum-Bias and Jet Events from pp Collisions at 7 TeV”, CMS Physics Analysis Summary CMS-PAS-PFT-10-002 (2010).

    [27] CMS Collaboration, “Performance of CMS muon reconstruction in pp collision events at √s = 7 TeV”, JINST 7 (2012) P10002, doi:10.1088/1748-0221/7/10/P10002, arXiv:1206.4071.

    [28] M. Cacciari, G. P. Salam, and G. Soyez, “The Anti-k(t) jet clustering algorithm”, JHEP 04 (2008) 063, doi:10.1088/1126-6708/2008/04/063, arXiv:0802.1189.

    [29] M. Cacciari, G. P. Salam, and G. Soyez, “FastJet User Manual”, Eur. Phys. J. C 72 (2012) 1896, doi:10.1140/epjc/s10052-012-1896-2, arXiv:1111.6097.

    [30] M. Cacciari and G. P. Salam, “Dispelling the N^3 myth for the kt jet-finder”, Phys. Lett. B 641 (2006) 57–61, doi:10.1016/j.physletb.2006.08.037, arXiv:hep-ph/0512210.

    [31] CMS Collaboration, “Determination of jet energy calibration and transverse momentum resolution in CMS”, JINST 6 (2011) P11002, doi:10.1088/1748-0221/6/11/P11002, arXiv:1107.4277.

    [32] CMS Collaboration, “Pileup Jet Identification”, CMS Physics Analysis Summary CMS-PAS-JME-13-005 (2013).

    [33] D. Krohn, J. Thaler, and L.-T. Wang, “Jet Trimming”, JHEP 02 (2010) 084, doi:10.1007/JHEP02(2010)084, arXiv:0912.1342.

    [34] S. D. Ellis, C. K. Vermilion, and J. R. Walsh, “Recombination Algorithms and Jet Substructure: Pruning as a Tool for Heavy Particle Searches”, Phys. Rev. D 81 (2010) 094023, doi:10.1103/PhysRevD.81.094023, arXiv:0912.0033.

    [35] J. Thaler and K. Van Tilburg, “Identifying Boosted Objects with N-subjettiness”, JHEP 03 (2011) 015, doi:10.1007/JHEP03(2011)015, arXiv:1011.2268.

    [36] CMS Collaboration, “Identification of b quark jets at the CMS Experiment in the LHC Run 2”, CMS Physics Analysis Summary CMS-PAS-BTV-15-001 (2016).

    [37] CMS Collaboration, “Identification of b-quark jets with the CMS experiment”, JINST 8 (2013) P04013, doi:10.1088/1748-0221/8/04/P04013, arXiv:1211.4462.

    [38] CMS Collaboration, “Measurement of the differential cross section for top quark pair production in pp collisions at √s = 8 TeV”, Eur. Phys. J. C 75 (2015), no. 11, 542, doi:10.1140/epjc/s10052-015-3709-x, arXiv:1505.04480.


    A Additional performance curves

    [Figure 10: three panels. x-axis: Tagging efficiency (H → bb); y-axis: Mistagging efficiency (udscg), logarithmic scale. Curves: double-b-tag; Subjet CSVv2; Fatjet CSVv2. Labels: CMS Simulation Preliminary (13 TeV); AK8, 70 < m < 200 GeV; 300 < pT < 500 GeV, 500 < pT < 800 GeV, and 800 < pT < 2500 GeV.]

    Figure 10: Comparison of the performance of the double-b tagger, the minimum CSVv2 value among the two subjets b tag scores, and fat jet b tag which exploits CSVv2 algorithm. The tagging efficiency for signal is evaluated using boosted H → bb jets from simulation. The mistag rate is evaluated for simulated QCD jets containing zero b quark. Top-left for all jets with 300 < pT

    [Figure 11: two panels. x-axis: Tagging efficiency (H → bb); y-axis: Mistagging efficiency (g → bb). Curves: double-b-tag; Subjet CSVv2; Fatjet CSVv2. Labels: CMS Simulation Preliminary (13 TeV); AK8, 70 < m < 200 GeV; 300 < pT < 500 GeV and 500 < pT < 800 GeV.]

    Figure 11: Comparison of the performance of the double-b tagger, the minimum CSVv2 value among the two subjets b tag scores, and fat jet b tag which exploits CSVv2 algorithm. The tagging efficiency for signal is evaluated using boosted H → bb jets from simulation. The mistag rate is evaluated for simulated QCD jets containing two b quarks. Left for all jets with 300 < pT

    [Figure 12: three panels. x-axis: Tagging efficiency (H → bb); y-axis: Mistagging efficiency (b), logarithmic scale. Curves: double-b-tag; Subjet CSVv2; Fatjet CSVv2. Labels: CMS Simulation Preliminary (13 TeV); AK8, 70 < m < 200 GeV; 300 < pT < 500 GeV, 500 < pT < 800 GeV, and 800 < pT < 2000 GeV.]

    Figure 12: Comparison of the performance of the double-b tagger, the minimum CSVv2 value among the two subjets b tag scores, and fat jet b tag which exploits CSVv2 algorithm. The tagging efficiency for signal is evaluated using boosted H → bb jets from simulation. The mistag rate is evaluated for simulated QCD jets containing one b quark. Top-left for all jets with 300 < pT

    B Data and MC comparison for the single-muon tagged selection

    The data and the simulated samples are compared in Fig. 13 for the variables used as inputs to the double-b tagger, for the single-muon tagged selection. Fig. 14 shows the double-b tagger output in data and simulated events. The total number of entries in the simulation is normalized to the observed number of entries in data. Overall, the agreement between data and simulation is fairly good.

    [Figure 13: four panels, each with a Data/MC ratio panel below. x-axes: 2D SIP for first track above b threshold; number of SV; SV0 energy ratio; z variable. y-axes: Tracks / 2; SVs / 1; SVs / 0.2; Jets / 2. Legend: Data; uds quark or gluon; c quark; c from gluon splitting; b quark; b from gluon splitting. Labels: CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns); muon-tagged AK8 jets, muon-enriched multijet sample, pT (AK8 jets) > 300 GeV.]

    Figure 13: Distributions of 2D IP significance for the most displaced track raising the SV invariant mass above the b quark threshold, number of secondary vertices associated to the AK8 jet, the vertex energy ratio for SV0, and the z variable. Data and simulated events are shown for the single-muon tagged jets selection. Simulated events are normalized to the yield observed in data; the overflow is in the last bin. The bottom panel in each figure shows the ratio of the number of events observed in data to that of the MC prediction.

    [Figure 14: one panel with a Data/MC ratio panel below. x-axis: double-b tagger discriminant (−1 to 1); y-axis: AK8 jets / 0.1. Markers: L, M, T. Legend: Data; uds quark or gluon; c quark; c from gluon splitting; b quark; b from gluon splitting. Labels: CMS Preliminary, 2.6 fb−1 (13 TeV, 25 ns); muon-tagged AK8 jets, muon-enriched multijet sample, pT (AK8 jets) > 300 GeV.]

    Figure 14: Double-b tagger discriminant distribution in data and simulated samples for the single-muon tagged jets selection. Simulated events are normalized to the yield observed in data. The loose, medium and tight operating points are also reported. The bottom panel shows the ratio of the number of events observed in data to that of the MC prediction.
