Designing the Ideal Video Streaming QoE Analysis Tool 140618
Transcript of Designing the Ideal Video Streaming QoE Analysis Tool 140618
WhitePaper
Follow us on
Designing the ideal video streaming QoE analysis tool
Contents 1. Introduction .......................................................................................................................................................................... 1
2. Factors that impact QoE ................................................................................................................................................... 2
Encoding Profile ................................................................................................................................................................. 2
Network Conditions ........................................................................................................................................................... 3
Devices and Players ........................................................................................................................................................... 4
Combinations of Factors .................................................................................................................................................. 5
3. Measuring QoE – what are the output metrics? ........................................................................................................ 6
4. What does the ideal tool need to do? ........................................................................................................................... 7
5. Framework design ............................................................................................................................................................... 7
6. Video capture and analysis .......................................................................................................................................... 9
7. Summary ........................................................................................................................................................................ 10
1. Introduction EurofinsDigitalTestinghasbuiltapurposedesignedframeworkforconductingvideostreamingQualityofExperience(QoE)analysis.Inthiswhite-paperwedescribetheproblemanalysis,requirementsdefinitionanddesignprocessthatresultedintheresultingtoolthatwehavesinceusedonmultipleprojects.
OurbasicgoalwastoplaybackarangeofvideostreamsonarangeofdevicesandmeasuretheresultingQoE.However,thisisnotassimpleasitsounds.Inthisarticlewestartbyexaminingallthefactors(orinputs)thatcanultimatelyaffecttheenduser’sQoEwhenwatchingstreamedvideo,andhowtheseinputscanberepeatedlycontrolled.
WethenconsiderwhatQoEmetrics(oroutputs)shouldbemeasuredandputthisaltogethertodeduceasetofrequirementsthatshouldbemetbyanidealQoEanalysisframework.
Finally,wedescribehowweengineeredaframeworktomeettheserequirements,resultinginatoolthatcanmeasureQoEfordifferentcontent,networkconditionsanddevices/appsinawaythatisautomatedandeasilyvisualisedwhileprovidingdeepofflineanalysiscapabilities.
WhitePaper
Follow us on
2. Factors that impact QoE ThefactorsthataffectQoEcanbesummarisedinthefollowingpicture:
Figure1:OverviewoffactorsthatimpactQoE
Broadlyspeaking,thesefactorscanbeputintothreecategories:encodingprofile,end-to-endnetworkconditions,andthedevice/applicationthecontentisplayedbackon.Let’sconsidereachoftheseinturn.
Encoding Profile
Forourpurposesan“encodingprofile”referstotheentiresetofaudiovideodataandparametersdescribingasinglecontentasset(containingaudioandvideo,inmultipleadaptationsetsinthecaseofABRcontent).ThisistypicallystoredasasinglefragmentedMP4filebutdoesn’thavetobe.Anencodingprofilecanvaryinmanyways:
a) Contentcharacteristics–e.g.sport/movie/news/cartoonb) Codec–e.g.H.264/H.265/VP9/AV1c) Encodervendorchoice–identicalcodecscanhaveverydifferentperformanceaccordingto
theencoderusedd) ABRformat:
• HLSvsMPEG-DASH• Segmentlength
e) Numberandbitrateofrepresentationsorvariants–i.e.thedesignofthebitrateladderf) Videoresolutiong) Videoframe-rateh) Otherelementarystreamencodingparameters,including:
• Profile/Level• GOPstructure• CBR/VBR
Typically,higherbitratesareusedforhigherpictureresolutionsandframeratesandareimplicitlyassociatedwithhigherQoE.However,it’salsowidelyunderstoodthatthereisanon-linear
WhitePaper
Follow us on
relationshipbetweenbitrateandvideoqualityforagivenvideoresolutionandthat,aboveacertainthreshold,
simplyincreasingthebitrateavailabletoencodethevideoresultsininsignificantincreasesinpicturequality.Thismeanstherearesituationswhere,ifthereismorebandwidthavailable,simplyallocatingmorebitratebaseduponthecurrentprofile/resolution/frame-rateusedintheencodingprofilemaynotresultinnoticeablyimprovedQoEand,instead,perhapsahigherresolutionpictureshouldbeoffered.Ofcourse,mostoftheabovevideofactorshaveanaudioequivalent.FormostAVcontent,thevideobitrateisanorderofmagnitudegreaterthantheaudiobitratesohasgreaterimpactonQoE,whichiswhywefocussedonvaryingandcontrollingvideocharacteristic
Network Conditions
InanABRstreamingscenario,theselectionofvideostreamtoconsumefromasetofvideostreamsofvaryingbitratesisbasedontheprevailingnetworkconditionsatthetimeofviewing.Fluctuationsinthenetworkconditionscancausethequalityoftheviewingexperiencetodegradeorimprove.Thereisnosimplemodeloftheeffectdifferentnetworkimpairmentshaveonanygivenviewingeventsincethealgorithmusedbydifferentplayerswillresponddifferently,andtheimpactonqualityisoftenhighlynon-linear.
Thesignificantfactorsthatimpactnetworkconditionsforoneclientplayingonestreamincludethefollowing:
a) Effectivebandwidth/throughput(Mb/s)b) Packetlatency,jitter,reorderingandlossc) HTTPconnectionerrorrate
Allstagesintheend-to-endnetworkconnectionarerelevant:theperformanceoftheoriginserver,theCDN,theISP’slastmileconnection,andtheconsumer’shomenetworkallcombinetodeterminetheoverallnetworkconditions.Ourgoalwastobeabletomodelandcontroltheend-to-endnetworkcharacteristics,notanyindividualstage.
Thefactorslistedaboveallvaryovertimeandthenatureofthisvariationisacriticalfactorinmodellingreal-worldbehaviour.Weareabletosimulateawiderangeofbehaviourthroughaccesstoaverylargeandrichdatasetofreal-worldend-to-endnetworkconditions.
SincethemajorityofABRvideoistodaycarriedviaHTTPoverTCP,weassumedthatpacketlevelbehaviour(i.e.thesecondbulletpointabove)ishiddenbytheTCPlayerandweconstrainedjusttheeffectiveavailablebandwidth,ratherthanindividuallycontrolledfactorssuchaslatencyandjitter.InthecaseofmanyCDNstheHTTPconnectionerrorrate(i.e.thenumberoftimesaclientreceivesanHTTP404responseandhastore-requestasegment)issosmallthatitdoesnothaveamaterialimpactonQoE.
Thetimevaryingbitratethroughputavailabletotheclientwecallthe“networkprofile”
WhitePaper
Follow us on
Devices and Players
ThedeviceandviewingenvironmentusedtowatchthecontenthasaclearimpactontheQoEduetothefollowingfactors.
a) Decodercapabilities–formatsupportb) Deviceresourceload(CPU/memory)c) Displaysized) Displayresolutione) Otherdisplayproperties:
• Pixelrefreshspeed• Displayfilteringand“pixelimprovement”algorithms• Brightness• Viewingangle
f) ABRimplementationandbufferingalgorithmg) Networkstackcharacteristicsh) Viewingdistance&lightingconditions
Factorsa)andb)arecharacteristicsofthedevicesprocessinghardware,whilec),d)ande)arecharacteristicsofthedevice’sdisplayhardware.
Sometimethecontentmightnotbesupportedbythedeviceatallbecausethemediacodeclevelofthecontentisnotsupportedbythehardwarecomponentswithinthedevice(especiallywhenconsideringmobiledevicessuchasphonesandtablets).ThisisaparticularproblemonAndroiddeviceswherethemandatedlevelofcodecsupportisquitelowrelativetootherdevices,butitisknownthatmanydeviceshavehigherlevelsofcodecsupport.
Whiletheend-userdevicehardwareanddisplayitselfisobviouslyassociatedwithaviewer’sQoE,theimpactofthevideoplayerclientsoftwareusedtowatchthecontentisoftennotconsideredexplicitly.Factorsf)andg)abovearebothdeterminedbythedevice’ssoftware.Theplayersoftware,whetherattheapplicationlayerorpartofthebuilt-inmediaplayer,hasamaterialimpactontheQoEbecauseinABRvideoconsumptiontheplayerisresponsibleforchoosingwhichvariantbitratetostream,whentoswitch,andwhatbufferingstrategytoadopt.ABRalgorithmscanvaryveryconsiderablybetweenplayerapplicationsrunningonthesamedevice.
UnderstandinghowthesoftwareimpactstheQoEandaffectsthechoiceofencodingprofilesiscomplicatedbythefactthattherearedifferenttypesofplayeragentsoftwareandtheabilityofanOTTserviceprovidertoaffectthebehaviouroftheplayerdependsonthetypeofplayerinuse.Broadlyspeakingtherearetwoprimarycategoriesofplayer:
Embedded.Thisiswherethechoiceofbitratetostreamandpresentisdeterminedbyanunderlyingcapabilityoftheplatformtheplayerapplicationisrunningon.InthecaseofAppleiOS,HLSplaybackandbitrateselectionarecontrolledbyiOSitselfandtheappauthorcan’taffecttheadaptationbehaviourdirectly(ignoringoneortwoexceptionalcases,suchasNetflix).InthecaseofAndroid,thereisnativeHLSplaybackavailableviatheMediaPlayerAPI,butitislimitedbasedonthespecificversionofAndroid.Anotherexamplewouldbefor
WhitePaper
Follow us on
HbbTVdeviceswhichsupportMPEG-DASHviaprovidingthemanifest(MPD)directlytoanHTML5videoelementandtheunderlyingmiddlewarecontrolstheadaptationalgorithm;inotherwords,althoughit'sanHTMLapplication,HTMLMediaStreamingExtensions(MSE)arenotused.
Applicationcontrolled.Thisiswheretheserviceprovider’sapplicationitselfcontrolstheselectionofthebitrateandhasanalgorithmthatdetermineswhentoswitchprofile.Examplesinclude:HTMLbasedapplicationsusingMSE,e.g.theopensourcedash.jsandHLS.jsplayers:nativemediaplayerlibrariesthatcanbeembeddedintheapplication,e.g.theopen-sourceExoPlayerapplicationforAndroid;andcommerciallyavailablethird-partyplayerapplicationsthattheserviceprovidercancustomisetheUIfor.
InthisproposalthecombinationofdeviceandplayersoftwareisconsideredasasinglevariabledimensionwhenconsideringimpactonQoE.
Combinations of Factors
AllofthefactorsdescribedwithinthesethreecategoriescometogethertoaffecttheQoE.TheycanbevisualisedasdimensionsofsearchspacewhereeachpointinthatspacecorrespondstodifferentinputconditionsthatwillresultindifferentoutputQoE.
Inputconditions ResultingoutputQoEmetricsDevice/AppCombinations
EncodingProfiles
NetworkProfiles
QoEmetric1
QoEmetric2
…
DA1 EP1NP1 …
NPo
…NP1 … NPo
EPnNP1 … NPn
…
EP1NP1 … NPo
…NP1 … NPo
EPnNP1 … NPn
DAm
EP1NP1 … NPo
…NP1 … NPo
Figure2:AsearchspacerepresentationofthefactorsthatimpactQoE.Thetablerowsindicatethetestrunresultsfor‘m’device/appcombinations,‘n’encodingprofilesand‘o’networkprofiles.
NetworkProfiles
Device/Apps
Encoding Profiles
WhitePaper
Follow us on
Abriefconsiderationofallthefactorslistedabovequicklysuggeststhattestingallpossiblecombinationsleadstoavastsearchspaceduetothecombinatorialexplosion.Forexample,thefollowingexperimentwouldresultin25,000testruns!
• 50encodingprofiles,from,say,5contenttypes,2codectypes,5bitrateladderdesigns.
• 10networkprofiles• 50device/playercombinations,from,say,10deviceswith5differentweb-player
applications
Evenarelativelynarrowexperiment–e.g.examiningwhichsoftwareplayerondeviceXperformsbestforaspecifictypeofcontentandcodec–canrequirehundredsoftestrunstoenablestatisticallyrobustevaluation.Forthisreason,enablingefficientandrepeatablerunningofalargenumberoftestrunswasafundamentalrequirementofourtool.
3. Measuring QoE – what are the output metrics? ThefollowingmeasuresareallrelevanttoassessingtheQoEofstreamingvideo.
• Videostart-uptime:timefrominitiationofplaybacktofirstvideoframesbeingdisplayed.• Bufferingevents:thenumber,durationandlocationofeventsduringplaybackwherethe
videoispausedornotshowingwhilefurthersegmentsareacquired.• Switchingevents:thenumberandlocationofeventswheretheplayeradaptsfromone
bitratetoanother.• Frame-by-frameobjectivevideoquality.Tocalculatethis,weusedtheSSIMPLUSalgorithm–
afull-reference,objectivevideoqualitymetric.Wemadethischoiceafterevaluatinganumberofalternatives,includingVMAF,andthereasonforthechoiceofSSIMPLUShasbeenexplainedinaseparatearticle.
• Audioquality,includingaudio-videosync.
MeasuringABRQoEisanactiveandrelativelyimmatureresearchtopicandthereisnoagreedwayofcombiningthesemeasurementstoasimpleoverallQoEmetric.Worse,forsomeofthemetricsresearchexperimentshavecontradictoryconclusionashowevenasinglemeasureimpactsQoE.Forexample,considerthefrequencyandvisibilityofswitchingevents.Someresearchershaveconcludedthatswitchingeventsshouldbeavoidedasfaraspossibleleadingto“widelyspaced”encodingladderswithfewerrepresentations.OtherresearchershaveshownthatbarelyperceptibleswitchingeventshavelittleimpactonQoE,leadingto“tightlypacked”encodingladderswithahighernumberofrepresentations,whereeachadjacentrepresentationhasaqualitydifferencethatisonlyjustdistinguishable.Inpracticeitseemsthatencodingandstoragecostsarethemoreimportantfactorindeterminingthischoice:OTTproviderswithhighvalue,relativelystaticcatalogues(e.g.Netflix)willprefer“tightlypacked”ladders,whereasthosewithfastturnoverorlivecontent(e.g.broadcastercatch-upservices)arelikelytoprefer“widelyspaced”ladders.
WhitePaper
Follow us on
TheimpactonQoEofthenumberanddurationofbufferingeventsisequallypoorlyunderstood.Whilelessbufferingisobviouslydesirable,thereisnoconclusiveresearchonwhether,say,5x6sbufferingeventsisbetterorworsethanasingle60sevent.
4. What does the ideal tool need to do? So,havingevaluatedtheimpactfactorsthatneedtobevaried(networkconditions,encodingprofiles,device/players),andtheoutputmetricsthatneedtobemeasured,wearrivedatthefollowingsetofrequirementsforouranalysisframework.
• Automated.Thisistheonlywaytocostefficientlyrunhundredsofexperiments:itissimplynotviabletousemanualexecution.Additionally,measuringQoEaccuratelyandinarepeatablefashionmakestestautomationthebestchoice.
• Deviceagnostic.Theframeworkmustbeusableonanytypeofdevice(set-top-box,HDMIstick,mobilephone,tablet,computer,smartTV)andawidevarietyofscreentypeswithoutrelyingonframecaptureandanalysistechniquesthatworkonsomedevicesbutnotonothers.
• Contentandencoderagnostic.Wewantedthesolutiontobeagnosticofthecodec,encodingandpackagingplatformwithoutrelyingonparticulartricksorfeaturesofaparticularencoderpipeline.Thiswastomakesurewecouldevaluatedifferentcodecs,encodersandABRformatswithouthavingtore-architecttheframework.
• Networkreproducibility.Topreciselyandrepeatedlycontrolnetworkconditionsmeanstheframeworkhastorunwithinadedicatedlocalnetwork,whereavarietyoftypicalnetworkconditionscanbeaccuratelyreproduced.
• Powerfulresultsvisualisation.Allthemeasurementsmustbeautomaticallycapturedintoaspreadsheettoallowdetailedandcomprehensiveanalysis.However,wealsowantedtoprovideeasy,visualandintuitivebrowsingofindividualtestruns,allowingtheusertoseehowmetricsvaryovertimewhilealsobeingabletoseethecorrespondingvideothatwasbeingdisplayedontheclientdevice.Itwasalsoimportanttoenabledeepanalysisoftheresultswithouthavingtogobackandre-runaparticulartest–meaningthatallrelevanttestdate,suchasnetworktrafficlogs,shouldbestoredforlateruse.
• Accurate.Theframeworkmustofferframeaccurateanalysisoftheresultswhendeducingmetricslikeamountofbufferingtime,switchingeventsandstart-uptimes.Gatheringthesemetricswithsuchhighprecisionisnottrivial.
5. Framework design Theaboverequirementsresultedinthefollowingframeworkdesign,showninthediagramonthenextpage.
WhitePaper
Follow us on
Figure3:Schematicrepresentationoftestframework
Eachtestrunconsistsofplayingbackasinglepieceofvideocontentonasingletestdevice,whilemodifyingpropertiesofthenetworklinkbetweenthevideoserverandthedevice-under-test.Atestruncanbebrokenintothefollowingsteps:
• UsingAppium,SeleniumorIR,thedeviceisdriventoruntheappandplayavideofile.Theprecisesequenceofcontrolstepsdependsuponthedeviceinquestionandtheappbeingused.Carefultimesynchronisationismaintainedbetweenthedevicecontrolscript,thenetworkemulator,andthevideocaptureunit.
• Thenetworkprofileshapingscriptisexecutedwhichvariesthebandwidthforthedurationofthetestasrequired.Thenetworkshapingunitisalsoresponsibleforloggingallnetworktrafficduringthetests.
• ThecameraorHDMIcapturerecordsthedisplayscreenofthedevice-under-testfortwicetheknowndurationofthecontent(toallowforanybufferingevents).Therecordedvideoisstoredforeverytestrunforsubsequentanalysisandforusewithintheresultsvisualisation.
• Thedevicestartstoplaybackthelocalvideocontentandcontinuesuntilplaybackiscomplete.• Therecordedvideoisstoredandanalysedoffline,givingalistofalltheframe/representations
playedateverypointoftimeduringthecontentplayback.• Thisdataisusedtocalculateasetofqualitymetrics(listedabove)andstorethemina
spreadsheet.AtthesametimeanHTML/JavaScriptvisualisationofalltheresultsiscreatedandhostedinthecloudtoenablevisualisationandanalysisoftheresultsfromanylocation.
TestRunDescription
Device/appscript
Encodingprofile
Networkprofile
NetworkShaping
VideoCapture
VideoServer
DeviceControl
Testcontrol&sync
ResultsAnalysis
ViaIR,AppiumorSelenium
LAN
Devicesundertest
Camera
HLSsegmentsoverHTTP
Resultsandvisualisation
HDMICapture
WhitePaper
Follow us on
Thesestepsarethenrepeatedforeachtestrun,withtheURLofthevideocontentandthenetworkprofileadjustedtowhatisrequiredforthattestrun.Betweeneachtestrunthedeviceisrebootedandtheapplication
isrestartedtoensurethatallcachesarecleared.
6. Video capture and analysis EachsourcevideotypehastwosmallQRcodeembeddedinthesourcecontentthatvariesforeveryframeandforeveryrepresentation/variant.UsingacamerarecordingorHDMICapturefromthedevice/appplayingthestreamedvideo,werobustlyandautomaticallydetecteveryframethatwasplayedoutonthedevice/appundertestanddeterminewhichrepresentationwithintheABRformattheframewastakenfrom.Thisallowsfordetectionofdroppedframes,blankframesorbuffering,repeatedframes,anddetectionofswitchingevents.
ThecapturedvideoandcontainedQRcodesareonlyusedtocalculatetheexactframethatisbeingdisplayedatagivenmomentintime.Thequalityofthevideodisplayedbythedeviceisnotdeterminedusingtherecordedvideo.Instead,eachencodingofthesourcecontentispre-processedtodeterminetheSSIMPLUSscoreofeveryframewithineveryrepresentation/variant.WhenanalysingthetestrunthisSSIMPLUSscoreisthenlookedupforeverycapturedframe,meaningthatthequalityoftherecordedvideohasnoimpactontheSSIMPLUSscores.
Theresultsetforeachtestrunisveryrich,containinginformationabouthowthedisplayedframe,SSIMPLUSscore,availablebandwidthandutilisedbandwidthvaryovertime.Tofacilitateinterpretationofthisinformation,displayedalongsidethevideoandtheinferredQoEmetricstheresultscanbepresentedinaninteractivemannerinanywebbrowser.Byhoveringthemouseoverthetimeline,theusercanseethevalueofvariouspropertiesatthatmomentintime,aswellastheprecisevideoframethatwascapturedatthattime.
Figure4onthenextpageshowsatypicalresultsvisualisationforasingletestrun.
WhitePaper
Follow us on
Figure4:Atypicalresultsvisualisationforasingletestrun
7. Summary OurQoEanalysistoolhassucceededinfulfillingourdesignrequirementsandhasbeensuccessfullyusedtoevaluatevideostreamingQoEperformanceacrossmultipleprojects.Thishasinvolvedmorethan20device/appcombinations,10encodingprofilesandover10networkprofiles;leadingtohundredsofindividualtestruns.AlreadythishasledourcustomerstohaveagreaterdeeperinsightintohowtheirsystemchoicesimpactQoE–inmanycasesreachingconclusionsthatarenon-obviousandcounter-intuitive.
Ifyouwouldliketoseeademonstrationofthesystem,[email protected],youcanexploreasubsetoftheresultsbyvisitinghttps://ott-qoe.eurofins-digitaltesting.com
Follow us on