Jun. 2, 2008 Student: Shang-Yu Yeh (葉尚諭) Advisor: Dr. Hsueh-Ming Hang (杭學鳴)
Coding Efficiency and Quality Improvement for MPEG Surround Encoding

My Work
  Design MPEG Surround encoding algorithms
    Subset coding mode
    Parameter band stride
    Parameter sets
    Adaptive smoothing
  Implementation in the reference software

Outline
  MPEG Surround Introduction
    Spatial Hearing
    MPEG Surround Encoder
    MPEG Surround Decoder
  Proposed Procedures and Experimental Results
  Conclusion and Future Work
  Demo

Spatial Hearing
  Describes how humans localize sound sources in the horizontal plane
  Interaural Level Difference (ILD)
  Interaural Time Difference (ITD)
  Interaural Coherence (IC)

MPEG Surround
  Low-bitrate parametric coding technology for multi-channel audio signals
  Backward compatible with stereo equipment
  Standardization
    Call for Proposals (CfP) on Spatial Audio Coding (SAC) in March 2004
    Finalized in July 2006 (ISO/IEC 23003-1)

MPEG Surround Encoder
  Captures the spatial image of the multi-channel audio
  Generates a mono/stereo downmix
  The downmix is coded by a conventional audio encoder (e.g., MP3, AAC)

MPEG Surround Decoder
  Synthesizes the multi-channel output signal
  Backward compatibility

Downmix and Parameter Extraction
  Two elementary blocks are combined into hierarchical structures
    R-OTT box (Reverse One-To-Two box): 2-channel input, 1-channel output plus spatial parameters
    R-TTT box (Reverse Two-To-Three box): 3-channel input, 2-channel output plus spatial parameters

Parameter Sets and Bands
  Parameter sets: grouping of time slots
  Parameter bands: grouping of subbands

R-OTT Box
  Creates a mono downmix from a stereo input
  Extracts the relevant spatial parameters
    Channel Level Difference (CLD): level difference between the two channels
    Inter-Channel Coherence (ICC): correlation between the two channels
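
The slide's own CLD and ICC equations are not reproduced here. As a minimal sketch, assuming the commonly used definitions (CLD as the energy ratio of the two channels in dB, ICC as the real part of the normalized cross-correlation), the two parameters for one parameter band could be computed as follows. The function name `cld_icc`, the `eps` guard, and the flat list of subband samples are illustrative assumptions, not the reference software's interface.

```python
import math

def cld_icc(x1, x2, eps=1e-12):
    """Return (CLD in dB, ICC) for two equal-length sequences of subband samples.

    Assumed definitions (not taken from the slide):
      CLD = 10 * log10(E1 / E2)
      ICC = Re(sum(x1 * conj(x2))) / sqrt(E1 * E2)
    """
    e1 = sum(abs(a) ** 2 for a in x1) + eps   # energy of channel 1
    e2 = sum(abs(b) ** 2 for b in x2) + eps   # energy of channel 2
    # Cross-correlation; complex() lets real-valued samples work as well.
    cross = sum(complex(a) * complex(b).conjugate() for a, b in zip(x1, x2))
    cld = 10.0 * math.log10(e1 / e2)
    icc = complex(cross).real / math.sqrt(e1 * e2)
    return cld, icc

# Channel 1 has four times the energy of channel 2 and is fully coherent with it.
cld, icc = cld_icc([2, 2], [1, 1])
print(round(cld, 2), round(icc, 2))
```

In an encoder the samples would be those of one parameter band within one parameter set, so one CLD/ICC pair is produced per band and set.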

R-TTT Box
  Creates a stereo downmix from three input channels
  Two ways to reconstruct the third signal
    Prediction mode: 2 channel prediction coefficients (CPCs) and ICC
    Energy mode: 2 CLDs

Quantization and Entropy Coding Schemes
  Quantization: fine and coarse
  Entropy coding: differential coding + Huffman tables
    Differential coding along frequency (DF) and along time (DT)
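
To make the two differential schemes concrete, here is a minimal sketch operating on quantized parameter indices for one parameter set. It only covers the differencing step; the Huffman stage is omitted. The function names are hypothetical and the exact handling of the first band in the reference software may differ.

```python
def diff_freq(indices):
    """DF: keep the first band as-is, code the rest as differences to the previous band."""
    if not indices:
        return []
    return [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]

def undo_diff_freq(diffs):
    """Inverse of diff_freq: a running sum restores the original indices."""
    out, acc = [], 0
    for d in diffs:
        acc += d
        out.append(acc)
    return out

def diff_time(current, previous):
    """DT: code each band as the difference to the same band in the previous parameter set."""
    return [c - p for c, p in zip(current, previous)]

def undo_diff_time(diffs, previous):
    """Inverse of diff_time."""
    return [d + p for d, p in zip(diffs, previous)]

prev_set = [3, 3, 4, 1]
cur_set = [3, 4, 4, 2]
print(diff_freq(cur_set))            # differences along frequency
print(diff_time(cur_set, prev_set))  # differences along time
```

When neighbouring bands or successive parameter sets are correlated, the differences cluster around zero, which is what lets the Huffman tables assign short codewords.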

Outline
  MPEG Surround Introduction
  Proposed Procedures and Experimental Results
    Subset coding mode
    Parameter band stride
    Parameter sets
    Adaptive smoothing
  Conclusion and Future Work
  Demo

New Encoder Structure
  4 additional modules

Subset Coding Mode
  4 coding modes for each parameter subset
    Default (0)
    Keep (1)
    Interpolation (2)
    Lossless (3)
  The reference software implements only the Lossless mode

Subset Coding Mode
  Flow chart
    Search modes 0, 1, and 2 for the one with the least error
    Compare that error with a threshold; if it is exceeded, use the Lossless mode (3)
    Exploits correlation along time
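
The decision in the flow chart can be sketched as below. This is an illustration under stated assumptions: the squared-error measure, the `candidates` dictionary (what the decoder would reconstruct for modes 0 to 2 without any coded data), and the function names are all hypothetical; only the mode numbering and the "least error, then threshold" order come from the slides.

```python
DEFAULT, KEEP, INTERPOLATE, LOSSLESS = 0, 1, 2, 3

def squared_error(a, b):
    """Hypothetical error measure: sum of squared differences per parameter band."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def choose_mode(target, candidates, threshold):
    """Pick the coding mode for one parameter subset.

    target:     parameters the encoder would like the decoder to use
    candidates: dict mapping mode 0/1/2 to the decoder's reconstruction in that mode
    threshold:  largest error accepted before falling back to lossless coding
    """
    best_mode = min(candidates, key=lambda m: squared_error(target, candidates[m]))
    best_err = squared_error(target, candidates[best_mode])
    if best_err <= threshold:
        return best_mode, best_err   # modes 0..2 cost no parameter bits
    return LOSSLESS, 0               # code the data explicitly

target = [1, 2, 3]
candidates = {DEFAULT: [0, 0, 0], KEEP: [1, 2, 3], INTERPOLATE: [1, 2, 4]}
print(choose_mode(target, candidates, threshold=0.5))
```

A larger threshold trades reconstruction accuracy for bitrate, since fewer subsets end up in the Lossless mode.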

Experimental Results
  Only the Lossless mode costs bits
  The bitrate reduction can be estimated: [equation]

  Bitrate reduction (%), theoretical (theo) vs. experimental (exp), for 1, 2, and 4 parameter sets (ps)
  Test sequence  theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
  1              59.04     44.57    51.80     34.64    35.28     20.51
  2              75.74     55.46    77.66     55.16    74.59     52.79
  3              66.85     47.98    58.86     39.41    40.01     23.66
  4              59.53     44.06    50.97     34.11    33.40     19.08
  5              63.37     47.04    59.24     40.96    42.06     26.05

Experimental Results
  Comparisons: [figure]

Experimental Results
  2 phenomena
    Theoretical results are larger than experimental results
      Differential coding schemes
    As the number of parameter sets increases, the theoretical and experimental results decrease
      Probability distributions

Experimental Results
  Distributions of DT data: [figure: pdfs of CLD and ICC]

  Standard deviation of DT data
                      CLD            ICC
                      ps=1   ps=4    ps=1   ps=4
  Standard deviation  1.77   2.13    0.84   1.21

Parameter Band Stride
  The parameter bands themselves cannot be adjusted
  The frequency resolution is adjusted by the parameter band stride
  4 strides for each parameter subset
  Number of parameter groups = ceil(parameter bands / stride), e.g., 14 bands with stride 5 give 3 groups

  Parameter groups using different strides
  Parameter bands  Stride 1  Stride 2  Stride 5  Stride 28
  4                4         2         1         1
  5                5         3         1         1
  7                7         4         2         1
  10               10        5         2         1
  14               14        7         3         1
  20               20        10        4         1
  28               28        14        6         1
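
The stride table follows directly from a ceiling division, which the short sketch below reproduces. Grouping consecutive bands with a possibly shorter last group is an assumption about how the groups are laid out; the function names are illustrative.

```python
STRIDES = (1, 2, 5, 28)
BAND_CONFIGS = (4, 5, 7, 10, 14, 20, 28)

def num_parameter_groups(num_bands, stride):
    """Ceiling of num_bands / stride, using integer arithmetic only."""
    return -(-num_bands // stride)

def group_bands(num_bands, stride):
    """Assumed layout: consecutive bands share a group; the last group may be shorter."""
    return [list(range(start, min(start + stride, num_bands)))
            for start in range(0, num_bands, stride)]

# Rebuild the table of parameter groups per stride.
for bands in BAND_CONFIGS:
    print(bands, [num_parameter_groups(bands, s) for s in STRIDES])

# 14 parameter bands with stride 5 collapse into 3 groups.
print(group_bands(14, 5))
```

Only one value per group is coded, so a larger stride lowers the parameter bitrate at the cost of frequency resolution.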

Parameter Band Stride
  Exploits correlation in frequency
  Combined with the pairing decision
  Flow chart
    2 successive lossless subsets
    1 single subset

Parameter Band Stride
  4 possible results
    2 successive subsets in a pair with the same stride (>1)
    2 successive subsets using different strides (>1)
    2 successive subsets in a pair with stride = 1
    1 subset coded individually

Experimental Results
  The bitrate can be estimated by: [equation]

  Bitrate reduction (%), theoretical (theo) vs. experimental (exp), for 1, 2, and 4 parameter sets (ps)
  Test sequence  theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
  1              45.28     23.29    36.10     15.58    27.64     9.95
  2              51.78     21.87    48.57     20.96    46.28     19.54
  3              40.04     17.16    34.61     13.99    28.87     10.71
  4              45.99     23.52    39.89     18.65    32.51     13.18
  5              44.93     23.02    38.65     18.29    31.87     13.37

Experimental Results
  2 phenomena
    Theoretical results are larger than experimental results
      Differential coding schemes
    As the number of parameter sets increases, the theoretical and experimental results decrease
      Probability distributions

Experimental Results
  Distributions of DF data: [figure: pdfs of CLD and ICC]

  Standard deviation of DF data
                      CLD            ICC
                      ps=1   ps=4    ps=1   ps=4
  Standard deviation  2.8    3.02    1.47   1.75

Comparisons of the 2 Modules
  Using the coding mode is more efficient than using the parameter band stride
  Compare the DT and DF data

  Standard deviation of DT data
  Test sequence  ICC ps=1  ICC ps=4  CLD ps=1  CLD ps=4
  1              1.07      1.31      1.37      1.66
  2              0.81      0.86      2.15      1.4
  3              0.79      1.17      1.94      1.75
  4              0.84      1.21      1.77      2.13
  5              0.92      1.14      1.46      1.61

  Standard deviation of DF data
  Test sequence  ICC ps=1  ICC ps=4  CLD ps=1  CLD ps=4
  1              1.85      2.14      2.95      3.27
  2              1.85      1.98      4.14      4.24
  3              1.61      1.89      3.14      3.39
  4              1.47      1.75      2.8       3.02
  5              1.77      1.99      2.94      3.19

Comparisons of the 2 Modules
  The estimates for the parameter band stride module are more overestimated than those for the coding mode module
    Differential coding schemes

Experimental Results: Combined with Coding Mode
  Bitrate reduction percentage: 25~55%
  Complexity: 0.13%

  Bitrate reduction (%)
  Test sequence  ps1    ps2    ps4
  1              54.14  42.08  27.06
  2              58.36  57.38  55.36
  3              50.37  43.07  29.04
  4              52.75  42.20  27.36
  5              54.52  48.06  33.40

Time Resolution
  Describes the number of parameters for each parameter band
  2 kinds of framing
    Fixed framing: divided into equal parts
    Variable framing: arbitrary divisions
  1~8 parameter sets
  Requires a dynamic decision

Time Resolution
  A border exists where the parameters differ greatly
  Calculate the differences between the backward and forward extractions
  Divide at the time slots with the larger differences

Time Resolution
  [figure: flow chart]
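
One way to read the border search is sketched below: for every time slot, a parameter is extracted from a window looking backward and from a window looking forward, and slots where the two differ strongly become parameter set borders. This is a rough illustration only. The averaging windows, the local-peak rule, the threshold, and the function names are assumptions; the cap of 8 parameter sets comes from the slide.

```python
def border_scores(values, window):
    """Score each slot by |forward mean - backward mean| of a per-slot parameter."""
    scores = [0.0] * len(values)
    for t in range(window, len(values) - window + 1):
        backward = sum(values[t - window:t]) / window
        forward = sum(values[t:t + window]) / window
        scores[t] = abs(forward - backward)
    return scores

def pick_borders(values, window, threshold, max_sets=8):
    """Return time slots chosen as borders: local score peaks above the threshold."""
    scores = border_scores(values, window)
    peaks = [t for t in range(1, len(scores) - 1)
             if scores[t] > threshold
             and scores[t] >= scores[t - 1]
             and scores[t] > scores[t + 1]]
    # max_sets parameter sets need at most max_sets - 1 borders; keep the strongest.
    peaks.sort(key=lambda t: scores[t], reverse=True)
    return sorted(peaks[:max_sets - 1])

# A parameter that jumps from 0 to 10 halfway through a 16-slot frame.
slots = [0] * 8 + [10] * 8
print(pick_borders(slots, window=2, threshold=5.0))
```

A stationary frame yields no borders and therefore a single parameter set, which matches the motivation for deciding the time resolution dynamically.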

Experimental Results
  Waveforms: [figure]

Experimental Results
  Additional bitrate
  Test sequence           1     2     3     4      5
  Additional bitrate (%)  4.09  4.83  6.34  24.78  4.00
  Complexity: iterations of 9 (parameters) x 32 (slots) x 71 (hybrid bands) x 2L (window)

Parameter Smoothing
  Compensates for artifacts caused by coarse quantization
  Performed at the decoder side
  1st-order IIR filter

Parameter Smoothing
  Flow chart
    Compare the smoothed coarse-quantized parameters with the fine-quantized parameters
    Choose the configuration with the least error
    4 candidate smoothing time constants: 64, 128, 256, 512
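
The encoder-side choice of smoothing configuration can be sketched as follows. The first-order recursion is the generic form y[n] = a*x[n] + (1-a)*y[n-1]; the mapping from a time constant to the coefficient (`hop / tau`), the `hop` value, and the squared-error measure are assumptions made for illustration, not the standard's formulas.

```python
SMOOTH_TIMES = (64, 128, 256, 512)   # candidate time constants

def smooth(params, alpha):
    """First-order IIR smoothing: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    if not params:
        return []
    out, prev = [], params[0]
    for p in params:
        prev = alpha * p + (1.0 - alpha) * prev
        out.append(prev)
    return out

def choose_smoothing(coarse, fine, hop=32):
    """Return (tau, error); tau is None when leaving smoothing off is best.

    coarse: coarsely quantized parameter track (what the decoder receives)
    fine:   finely quantized track used as the reference
    hop:    hypothetical spacing between successive values, in the units of tau
    """
    best = None
    for tau in (None,) + SMOOTH_TIMES:
        alpha = 1.0 if tau is None else min(1.0, hop / tau)
        err = sum((s - f) ** 2 for s, f in zip(smooth(coarse, alpha), fine))
        if best is None or err < best[1]:
            best = (tau, err)
    return best

coarse = [0, 4, 0, 4, 0, 4]   # coarse quantization toggling around the true value
fine = [2, 2, 2, 2, 2, 2]
print(choose_smoothing(coarse, fine))
```

Because the filter itself runs in the decoder, the encoder only has to signal which configuration it picked.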

Experimental Results
  Waveforms: [figure]

Experimental Results
  Bitrate variations
  Test sequence                            1       2      3      4       5
  Bitrate change % (cf. coarse quantized)  0.51    0.55   0.69   0.64    0.53
  Bitrate change % (cf. fine quantized)    -11.53  -7.37  -7.03  -10.93  -11.00
  Complexity: 0.4%

Conclusion
  Implemented several encoding procedures in the reference software
  Exploited correlation along the time axis and the frequency axis
    Bitrate reduction: 25~55%
    Theoretical estimation
  Adaptive time resolution and parameter smoothing

Future Work
  Modify the error measures
    Different band weightings
    Different parameter weightings
  Find a more precise quality evaluation for fine-tuning
  Some other tools
    Residual coding, temporal shaping, etc.

Appendix

Filter Banks
  2 stages
    Stage 1: uniform 64-band QMF filter bank
    Stage 2: the lowest 3 QMF bands are split further to improve the low-frequency resolution
      QMF band 0: 6 sub-subbands
      QMF bands 1 and 2: 2 sub-subbands each
    71 bands in total
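
The band count of the two-stage filter bank can be checked with a few lines of arithmetic. The split table mirrors the 6/2/2 sub-subband counts of the lowest three QMF bands; the frame size of 2048 samples used for the slot count is an assumption, and the names are illustrative.

```python
QMF_BANDS = 64
# Sub-subbands per split QMF band: band 0 -> 6, bands 1 and 2 -> 2 each.
SPLITS = {0: 6, 1: 2, 2: 2}

def hybrid_band_count(qmf_bands=QMF_BANDS, splits=SPLITS):
    """Each QMF band counts once unless the second stage splits it further."""
    return sum(splits.get(band, 1) for band in range(qmf_bands))

def slots_per_frame(frame_size=2048, qmf_bands=QMF_BANDS):
    """Each time slot holds one sample per QMF band (frame size is assumed)."""
    return frame_size // qmf_bands

print(hybrid_band_count())   # 61 unsplit bands + 6 + 2 + 2
print(slots_per_frame())
```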

OTT Box
  Synthesizes two output channels from a mono downmix and the spatial parameters: [equation]

R-TTT Box (2/2)
  Prediction mode: 2 CPCs and 1 ICC: [equation]
  Energy mode: 2 CLDs: [equation]

TTT Box
  Prediction mode
    With residual signal -> 2 CPCs
    Without residual signal -> use the ICC to compensate for the energy loss
  Energy mode
    Energy reconstruction

Experimental Results
  [figure]

Bitrate reduction (%) without any error

  DataMode: dm0_xxx
  Input    theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
  Input01  11.45     3.58     11.15     2.49     11.14     2.34
  Input02  40.60     24.03    39.56     20.63    38.36     19.81
  Input03  14.23     5.49     13.28     3.76     13.12     3.50
  Input04  11.45     3.64     11.14     2.52     11.12     2.28
  Input05  12.03     4.10     11.46     2.71     11.39     2.54

  DataMode: dmx_000
  Input    theo_ps1  exp_ps1  theo_ps2  exp_ps2  theo_ps4  exp_ps4
  Input01  10.57     1.03     9.59      0.39     9.13      0.40
  Input02  33.67     10.54    32.68     10.76    32.11     10.58
  Input03  13.88     2.39     12.00     1.32     11.13     1.21
  Input04  9.47      0.66     9.32      0.47     9.07      0.40
  Input05  10.10     1.06     9.41      0.46     8.92      0.43

Reference Software Encoder
  Parameter sets = 1
  Parameter bands = 20
  Tree structures: 5151, 5152, 525
  Time slots: 16, 32
  Fine quantization
  Differential coding in T/F, PCM + 1D Huffman

DT Distributions
  [figure: DT distributions of CLD and ICC for test sequences 1, 2, 3, and 5]

Prediction Mode of R-TTT Box
  2 ways of decoding
    With residual signal: [equation]
    Without residual signal: use the ICC to compensate for the energy loss
  How to decide appropriate CPCs and ICC?

Prediction Mode of R-TTT Box
  [equations]

Prediction Mode of R-TTT Box
  Choose the CPCs to make the prediction more precise
    Residual energy -> 0 means a good prediction
  Not verified yet, since the coder is not considered

[Figure: MPEG Surround encoder block diagram: T/F transform, downmix, spatial parameter estimation, F/T transform, audio encoder, compressed audio bitstream, spatial parameters]
[Figure: MPEG Surround decoder block diagram: compressed audio bitstream, audio decoder (legacy decoding), T/F transform, surround synthesis with spatial parameters, F/T transform]