Getting Started with IT Service Intelligence

Copyright © 2016 Splunk Inc. Getting Started with IT Service Intelligence Ahmed Kira – Sr. Sales Engineer, SF Bay Area

Transcript of Getting Started with IT Service Intelligence


Agenda


ITSI Core Concepts

What is a Service?

Service: Requests in, Responses out

In ITSI, a Service is a logical group of technology components that a user deems need to be monitored together.

It can often be generalized as a “black box” to which we send requests and from which we expect responses.

What is a Service?

Technical Services: DNS, Auth, Web (each takes Requests and returns Responses)

Services can be lower level (technical)…

What is a Service?

Technical Services: DNS, Auth, Web (each takes Requests and returns Responses)

Business Services: Customer Transactions, Support Desk (each takes Requests and returns Responses)

Services can also be higher level (business)…

What is a Service?

Diagram labels: Packet Network, Hypervisor and Hosts, RBMDBs, Storage Tier, API Services, Web Services, Customer Transactions, Mobile, API/Middleware, Partner Portal, DNS

Services can encompass multiple tiers of the IT domain. Services may also depend upon other services.
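The idea that services depend on other services can be sketched as a small dependency map. This is only an illustration: the edges in DEPENDS_ON and the helper impacted_by are hypothetical, loosely named after the tiers on the slide, and are not how ITSI stores its service topology.

```python
# Hypothetical dependency map; the specific edges are assumptions for illustration.
DEPENDS_ON = {
    "Customer Transactions": ["Web Services", "API Services"],
    "Web Services": ["DNS", "Hypervisor and Hosts"],
    "API Services": ["Storage Tier", "Hypervisor and Hosts"],
    "DNS": ["Packet Network"],
    "Hypervisor and Hosts": ["Packet Network"],
    "Storage Tier": [],
    "Packet Network": [],
}


def impacted_by(failed, depends_on=DEPENDS_ON):
    """Return every service that directly or indirectly depends on `failed`."""
    impacted = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for service, deps in depends_on.items():
            if current in deps and service not in impacted:
                impacted.add(service)
                frontier.append(service)
    return impacted
```

Calling impacted_by("DNS") walks upward from DNS to the services that rely on it, which is the kind of question a dependency model is meant to answer during an outage.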

What is a KPI?

DNS (Requests in, Responses out)
● KPI: Number of requests
● KPI: Error rate
● KPI: Average response time
● KPI: Server CPU load
● KPI: Server network I/F errors

Customer Transactions (Requests in, Responses out)
● KPI: Number of transactions
● KPI: Error rate
● KPI: Average response time
● KPI: Count of Incident Tickets
● KPI: Synthetic Transx Health

KPIs and Health scores constitute the means by which Services are monitored.

Key Performance Indicators (KPIs)

A Key Performance Indicator (KPI) is a Splunk saved search created within the ITSI UI that helps monitor a specific field like CPU, Memory, Number of Errors, and so on. KPIs are contained within Services.

Service Health Scores

A Health score is a score from 0-100 (0 being critical and 100 being normal) that helps determine the health of a Service. It is calculated once every minute, based on the importance and status (e.g. green, orange, red) of each KPI in the Service.
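To make "importance and status" concrete, here is a minimal sketch that treats the health score as an importance-weighted average of per-status scores. The slides do not give the actual formula, so both the weighting scheme and the numbers in STATUS_SCORE are assumptions, not ITSI's calculation.

```python
# Assumed status-to-score mapping; these values are illustrative, not ITSI's.
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


def health_score(kpis):
    """kpis: list of (importance, status) pairs. Returns a score from 0 to 100."""
    total_weight = sum(importance for importance, _ in kpis)
    if total_weight == 0:
        return 100.0  # nothing weighted, so nothing to report as unhealthy
    weighted = sum(importance * STATUS_SCORE[status] for importance, status in kpis)
    return weighted / total_weight
```

Under this sketch, a high-importance KPI going red pulls the score down further than a low-importance one, which matches the intent described on the slide.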

Let’s Talk Entities

● Entities are the relevant components that support a service (often but not always hosts)

● Select the correct entities with filters, ANDs, ORs

● Entity list can come from a CMDB, a spreadsheet, a Splunk search…
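Selecting entities with ANDs and ORs can be pictured as a rule made of OR-ed groups, where each group is a set of AND-ed field tests. The entity rows, field names and helper functions below are hypothetical and only illustrate the filtering logic, not the ITSI entity rule format.

```python
# Hypothetical inventory rows, as they might come from a CMDB export or spreadsheet.
ENTITIES = [
    {"host": "web01", "role": "web", "site": "sfo"},
    {"host": "web02", "role": "web", "site": "nyc"},
    {"host": "db01", "role": "db", "site": "sfo"},
]


def matches(entity, rule):
    """rule: list of OR-ed groups; each group is a dict of AND-ed field == value tests."""
    return any(
        all(entity.get(field) == value for field, value in group.items())
        for group in rule
    )


def select(entities, rule):
    """Return the host names of the entities that satisfy the rule."""
    return [entity["host"] for entity in entities if matches(entity, rule)]
```

For example, the rule [{"role": "web", "site": "sfo"}, {"role": "db"}] reads as "(role is web AND site is sfo) OR (role is db)".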

Service Decomposition

Service Decomposition in ITSI

Identify a high-value business service

Identify the process flow and underlying sub-services (Web -> Middleware -> DB -> Middleware -> Web)

For each sub-service, identify KPIs that will show health and status (Requests, response time, errors, OS health…)

For each KPI, define a Splunk search

ITSI Setup Walkthrough

Typical ITSI configuration

● Create Services & entities
– Import via CSV or Splunk Search or Clone
– Manual

● Define KPIs
– Select from available template, Data Model based KPIs, or Splunk Search

● Populate Glass Tables

● Extra capabilities
– Adaptive Thresholding
– Anomaly Detection
– Multi KPI Alerting
– Custom Correlation Searches
– Notable Events

Adaptive Thresholds

What if your KPI data looks like this?

Static thresholds will not work…

Adaptive Thresholding works beautifully with cyclical (and other dynamic) data
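Why a static threshold fails on cyclical data can be shown with a small sketch that learns a separate upper threshold for each hour of the day. This uses a plain mean plus k standard deviations per hour; it is a stand-in technique chosen for illustration, not the algorithm ITSI uses, and the function name and parameter k are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_thresholds(history, k=2.0):
    """history: iterable of (hour_of_day, value) samples.

    Returns {hour: upper_threshold}, one threshold per hour seen in the history.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: mean(vals) + k * pstdev(vals) for hour, vals in by_hour.items()}
```

With quiet nights and busy afternoons in the history, the same reading can be alarming at 03:00 and perfectly normal at 15:00, which a single static line cannot express.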

Anomaly Detection

● Machine Learning

● Works well for data with patterns

● Requires some “training”

● Can detect entities that aren’t behaving the same as the others, not just a single KPI having issues
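The last point, spotting an entity that behaves differently from its peers, can be sketched with a simple median-based outlier test. This is a basic statistical stand-in rather than the machine learning ITSI applies; the function name and the cutoff of 3.0 are assumptions.

```python
from statistics import median


def outlier_entities(values, cutoff=3.0):
    """values: {entity: latest KPI value}.

    Flags entities whose distance from the group median exceeds `cutoff`
    times the median absolute deviation (MAD).
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # All typical values are identical, so any different value stands out.
        return [e for e, v in values.items() if v != med]
    return [e for e, v in values.items() if abs(v - med) / mad > cutoff]
```

The comparison is against the peer group at the same moment, so an entity can be flagged even when its value would look unremarkable against a fixed threshold.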

Use ITSI as a Manager of Managers processing 3rd party ALERTS

http://docs.splunk.com/Documentation/ITSI/2.4.1/User/Ingestthird-partyalertsasnotableevents

Notable Events

Wrap Up - Review

● High-value services can be decomposed and modeled in ITSI, using machine data from the relevant systems

● Services and KPIs can be created in minutes, with sophisticated thresholding techniques to distinguish “normal” from “not normal”

● Glass Tables allow service health and KPI metrics to be displayed in a way that makes sense to specific groups, such as Executive Leadership, Business Service Owners, the NOC, DevOps & Others

● Deep Dives allow KPIs to be compared side-by-side across any time range, accelerating root cause analysis and significantly reducing MTTR

● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable events and a means to manage them

● …and it’s fun to build!

Want to explore on your own?

Sign up for your very own seven-day free sandbox! http://splunk.com/ITSI

Then click:

You’ll find a Sandbox Guide in the Dashboards! In the ITSI app of your sandbox, go to Search -> Dashboards -> Splunk ITSI Sandbox Guide

Sign Up for a Glass Table Exercise

Harness the creativity and domain knowledge of your organization to unlock the value of data and solve an important service problem through a joint service intelligence workshop with key stakeholders

Define methods for:

• Proactive service monitoring

• Reduced risk and failures

• Faster issue resolution

• Increased business performance

What is it?

• 1 Day Onsite Workshop

• Tightly linked with value

• Collaborative approach

• Build your own Splunk ITSI Glass Table……


• 5,000+ IT and Business Professionals
• 175+ Sessions
• 80+ Customer Speakers

PLUS Splunk University
• Three days: Sept 23-25, 2017
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP

SEPT 25-28, 2017
Walter E. Washington Convention Center
Washington, D.C.
CONF.SPLUNK.COM

The 8th Annual Splunk Worldwide Users’ Conference

Thank You

Backup Content

Service Decomposition in ITSI

CLICK “Glass Tables”

CLICK (open in new tab) “Buttercup Games Business Process (IN PROGRESS)”

CLICK (open in new tab) “Buttercup Games Online Store”

Service Decomp: The Business Processes

Service Decomp: End-To-End Process Flow

New Requirements!

● Create a new KPI for the DB Service:
– Network Utilization

● Modify the Executive Glass Table in order to show off the services you slave over

“WE only have about 15 min TO DO WHAT???!!???”

Think about how long this would take you today?

Configuration of DB Service

Click Configure > Click Services

A KPI in 5 minutes? Absolutely!

Click New – Generic KPI

Select Data Model
● Host Operating System
● Network
● # bytes
● Next

KPIs Continued….

Splunk Builds Searches for you – Oh Yeah, that’s happening :)

● Select Yes for Split by & Filter options
● Select host for Entity Lookup & Alias options
● Click Next

Almost There…

Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Click Next

● Unit: Bps
● Click Next
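The settings above (split by host, Entity Calculation: Average, Service/Agg Calculation: Average, Calculation Window: Last Minute) can be pictured with the sketch below. It assumes the aggregate is taken over the per-entity averages, which is one plausible reading and not confirmed by the slides; the function name and sample shape are hypothetical, and in ITSI the generated Splunk search does this work.

```python
from collections import defaultdict
from statistics import mean


def kpi_values(samples):
    """samples: (host, bytes_per_second) pairs that fell inside the last minute.

    Returns (per_entity_average, service_aggregate).
    """
    per_host = defaultdict(list)
    for host, value in samples:
        per_host[host].append(value)
    # Entity Calculation: Average, one value per host.
    entity_values = {host: mean(vals) for host, vals in per_host.items()}
    # Service/Agg Calculation: Average, here assumed to run over the entity values.
    aggregate = mean(list(entity_values.values()))
    return entity_values, aggregate
```

Run once per minute, this yields one value per entity plus one service-level value, which is what the per-entity and aggregate thresholds are later applied to.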

Final Steps…

Set your thresholds:
● Aggregate (All)
● Per Entity

● Click “Add Threshold” TWICE
● Make the Neapolitan ice cream colors Yellow, Green, Yellow
● Drag the sliders around in order to get the current data graph entirely inside the Green (normal) band
● Click Finish
● Other options are also available, including adaptive thresholds and anomaly detection
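Adding two thresholds splits the value range into three bands, which is why the colors come out as Yellow, Green, Yellow. A minimal sketch of that mapping follows; the function name and the slider positions in the usage line are made up for illustration.

```python
from bisect import bisect_right


def band(value, boundaries, colors):
    """boundaries: sorted slider positions; colors: one more entry than boundaries.

    Returns the color of the band that `value` falls into.
    """
    if len(colors) != len(boundaries) + 1:
        raise ValueError("need exactly one more color than boundaries")
    return colors[bisect_right(boundaries, value)]
```

With sliders at 20 and 80, band(50, [20, 80], ["yellow", "green", "yellow"]) lands in the green band, while readings that are unusually low or unusually high both land in yellow.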

ITSI Demo – Troubleshooting

Name that KPI!

From the list of KPIs, select your new one (at the bottom)
● Click on the little pencil next to the name
● Call it “Network Utilization”, with your username up front
● Click on Save at bottom right when finished!

Let’s Fix that Glass Table

Clone the Glass Table

Return to Saved Glass Tables page (click on Glass Tables in the upper menu bar)

CLICK Edit for “Buttercup Games Business Process (IN PROGRESS)”
• Select Clone
• Title: Add your username to the front
• Permissions: Shared in App
• Click Clone Page

• Click on your new Glass Table from the list, to view it

Edit & Have Fun!

Click on Edit in the upper right corner of your Glass Table

Use the “Services” panel on the left to select Individual KPIs, or Aggregate Service Health Scores
• Choose 2 KPIs from Online Store that would be useful in the “Order Process” section
• Drag the selected widgets onto the canvas, positioning in the gray oval

• What’s the difference between the [icon] and [icon] tools at the top left?

More Fun with the Glass Table Editor…

Use the Configurations panel on the right to edit a selected widget
• Can change the visualization type, drilldown behavior, and other settings

• You should hit Save frequently
• I wonder what Auto Layout does?
• (YIKES!) Revert All Changes might be helpful

Finishing up…

• Add a Service Health Score widget for Online Store under Buttercup

• Choose a Viz Type with a sparkline graph, then resize to make it look pretty

• Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store

• Bonus Points: Make the label bigger, more readable

• Click Save
• View when done

A Troubleshooting Exercise

Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Business Process”
● Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase
● The calls began coming in at around ten minutes after the hour.
● In the upper right corner of the Glass Table, change the time picker from Now to XX:10:00.0, where XX is the appropriate hour. For example, if it is currently 14:05, set the time picker to 13:10:00.0, then Apply
● This is how we can “time travel” back to see conditions at a particular outage – oh yeah!
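Working out "the appropriate hour" for the time picker is simple arithmetic: take ten past the current hour, and step back one hour if that moment has not happened yet. A small sketch, with a hypothetical function name:

```python
from datetime import datetime, timedelta


def outage_time(now):
    """Return the most recent XX:10:00 that is at or before `now`."""
    candidate = now.replace(minute=10, second=0, microsecond=0)
    if candidate > now:
        # Ten past this hour is still in the future, so use the previous hour.
        candidate -= timedelta(hours=1)
    return candidate
```

For the slide's example, outage_time(datetime(2017, 1, 1, 14, 5)) gives 13:10:00 on the same day, and the subtraction also handles the rollover just after midnight.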

A Troubleshooting Exercise, cont’d

● The Online Store seems to be degraded, just as Customer Care reported. Click on the widget under Buttercup to drill down further

● The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs at the far left (Revenue, etc)

● Based on this view of all the relevant services, where do you think the root cause lies?

● Which service should we troubleshoot first?
● Click on Health widget for that service, to drill down to a Deep Dive

Deep Dive

● Deep Dive shows multiple KPIs and Health Scores in parallel “swim lanes”.

● The Health Score for this Service is the top swim lane. Can you see when it begins to degrade from 100%?

● Mousing over this point in time, can you spot the KPI with the leading fault indication, i.e., what failed first?
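Spotting "what failed first" by eye in the swim lanes corresponds to finding the KPI whose first threshold breach is earliest. The sketch below shows that comparison on plain lists; the data shapes, the upper-limit thresholds and the function name are assumptions for illustration, not an ITSI feature.

```python
def first_to_fail(series, thresholds):
    """series: {kpi: [(minute, value), ...]}; thresholds: {kpi: upper limit}.

    Returns the KPI with the earliest breach, or None if nothing breached.
    """
    first_breach = {}
    for kpi, points in series.items():
        for minute, value in sorted(points):
            if value > thresholds[kpi]:
                first_breach[kpi] = minute
                break  # only the first breach of each KPI matters
    if not first_breach:
        return None
    return min(first_breach, key=first_breach.get)
```

The KPI returned here is a starting point for root cause analysis rather than a verdict, since the earliest symptom is not always the cause.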

Multi-KPI Alerts and Notable Events

● Click on Notable Events Review
● Multiple KPIs and Health scores can be combined in sophisticated ways to create Multi-KPI alerts

● When a Multi-KPI alert fires, one of the outcomes is the creation of a Notable Event

● Notable Events allow NOC personnel and others to triage and coordinate event management efforts
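One simple way to combine several KPIs into a single alert is to fire only when every KPI named in a rule is in a triggering severity at the same time. The sketch below shows that logic; the rule format, the returned dictionary and the function name are hypothetical and much simpler than what ITSI supports.

```python
def multi_kpi_alert(statuses, rule):
    """statuses: {kpi_name: severity}; rule: {kpi_name: set of triggering severities}.

    Returns a notable-event-like dict when every KPI in the rule is in a
    triggering severity, otherwise None.
    """
    if all(statuses.get(kpi) in bad for kpi, bad in rule.items()):
        return {"title": "Multi-KPI alert", "kpis": sorted(rule)}
    return None
```

Requiring several conditions at once is what cuts alert noise: a single KPI turning red briefly does not produce an event unless the related KPIs agree.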

Service Analyzer

● Click on Service Analyzer > Default Service Analyzer

● Back where we started!
● This view shows a “no-frills” list of services (top) and hottest KPIs (bottom)

● Provides a quick jumping off point into Deep Dives and the Notable Events Review

● It is useful for NOCs and others who need a high-level situational view