

Codec and GOP Identification in Double Compressed Videos

P. Bestagini, Member, IEEE, S. Milani, Member, IEEE, M. Tagliasacchi, Member, IEEE, S. Tubaro, Senior Member, IEEE

Abstract—Video content is routinely acquired and distributed in digital compressed format. In many cases, the same video content is encoded multiple times. This is the typical scenario that arises when a video, originally encoded directly by the acquisition device, is then re-encoded, either after an editing operation, or when uploaded to a sharing website. The analysis of the bitstream reveals details of the last compression step (i.e., the codec adopted and the corresponding encoding parameters), while masking the previous compression history. Therefore, in this paper we consider a processing chain of two coding steps, and we propose a method that exploits coding-based footprints to identify both the codec and the size of the Group Of Pictures (GOP) used in the first coding step. This sort of analysis is useful in video forensics, when the analyst is interested in determining the characteristics of the originating source device, and in video quality assessment, since quality is determined by the whole compression history. The proposed method relies on the fact that lossy coding is an (almost) idempotent operation. That is, re-encoding a video sequence with the same codec and coding parameters produces a sequence that is similar to the former. As a consequence, if the second codec in the chain does not significantly alter the sequence, it is possible to analyze this sort of similarity to identify the first codec and the adopted GOP size. The method was extensively validated on a very large dataset of video sequences generated by encoding content with a diversity of codecs (MPEG-2, MPEG-4, H.264/AVC, DIRAC) and different encoding parameters. In addition, a proof of concept showing that the proposed method can also be used on videos downloaded from YouTube is reported.

Index Terms—Video forensics, video codec, coding-based footprints, GOP identification

I. INTRODUCTION

Due to the increasing availability of inexpensive digital devices, camcorders are becoming widespread on the market, being embedded in virtually all smartphones. Moreover, thanks to the ubiquitous availability of high-speed Internet connections and the increasing use of video sharing web sites (e.g., YouTube, Vimeo, etc.), many users upload video sequences to the web. At the same time, digital videos might undergo several editing steps during their lifetime. For example, after acquiring a sequence, a user might manipulate it to enhance its quality. Alternatively, after having downloaded a sequence

P. Bestagini, M. Tagliasacchi and S. Tubaro are with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milan, Italy (e-mail: paolo.bestagini / marco.tagliasacchi / [email protected]).

S. Milani is with the Department of Information Engineering, University of Padova, Via Gradenigo 6/b, 35131, Padova, Italy (e-mail: [email protected]).

This work is an extension of previous work published at ICASSP’12 [1] and EUVIP’13 [2] by the same authors.

from a website, a user might apply different kinds of transformations, including, e.g., cropping, scaling, color adjustment, text/logo insertion, etc. After any editing operation, video sequences are usually encoded, since uncompressed video would lead to a huge amount of data to be stored or transmitted. Therefore, it is very likely that a video sequence available online has been compressed multiple times.

When a sequence is decoded to the pixel domain and then re-encoded, any information regarding the previous coding step is apparently lost, since the previous compression history cannot be simply obtained by parsing the bitstream of the last coding step. However, video coding is lossy in most cases, since it encompasses the use of quantization, which is a non-invertible operation. Thus, each coding step is bound to leave characteristic traces, or footprints, on the compressed video, which can be leveraged as an asset to reconstruct its compression history. This sort of information is very useful in video forensics [3], since an analyst might be able to estimate the characteristics of the previous coding steps (e.g., the codec adopted and the corresponding encoding parameters, namely, GOP structure, quantization parameters, coding modes, etc.). A forensic analyst can successfully exploit the knowledge of the compression history in many ways: i) it might serve as a potential clue for source device identification, as it might, or might not, be compatible with some acquisition device models; ii) it might be used for video splicing detection, in those cases in which sequences originally encoded separately are then spliced together (and re-encoded) into the same sequence; iii) it might be adopted to detect removed/inserted frames, since this might alter the original GOP structure. In addition, there are other fields besides video forensics in which the knowledge of the compression history is highly valuable. For example, in video quality assessment, it is interesting to automatically estimate the quality of a sequence in a no-reference set-up, i.e., without the availability of the original sequence. Although many no-reference video quality metrics do exploit directly the information contained in the bitstream of the last coding step, this might not be representative of the actual visual quality, e.g., when a content is first compressed at low rate, and then re-compressed at a higher rate [4].

In this paper we address the problem of reconstructing the compression history of video sequences. We consider the case of double compressed video sequences, and we focus on the identification of the coding-based footprints left by the first encoder. Specifically, we aim at determining the codec adopted in the first step, namely the coding standard, and the corresponding GOP size. Although multiple video compression steps can be encountered in some situations, double compression is arguably the most common use case, which arises any time user generated content is acquired (and compressed) by a device and subsequently uploaded to a video sharing web site.

Unlike double image compression, in which the same codec (namely JPEG) is typically assumed to be used in both coding steps, video compression is often performed with codecs defined by different standards, thus complicating the analysis. In our study, we considered widely adopted video coding standards that follow the conventional hybrid DCT/DPCM architecture, namely MPEG-2 [5], MPEG-4 part 2 (MPEG-4) [6] and MPEG-4 part 10 - H.264/AVC (AVC) [7]. In addition, we extended the analysis to include a wavelet-based codec, DIRAC [8].

The proposed method is based on the idea that quantization is an idempotent operation. Indeed, re-quantizing a scalar value with the same quantizer produces exactly the same value. Somewhat similarly, in the specific case of video coding, re-encoding a previously compressed video sequence adopting the same codec and coding parameters produces as output a sequence that is similar to the former. Therefore, our idea is that, in principle, it is possible to re-encode the available (double compressed) video sequence with different codecs, testing different encoding parameters for each of them, and looking for the configuration of codec/parameters which minimizes the distortion introduced in the re-encoded sequence. As a matter of fact, instead of performing a full search over all coding parameters (i.e., GOP, use of filters, macro-block structure, etc.), we propose an algorithm that is able to loop over just two of them (i.e., the codec and the quantization step). The same method is also used to determine the GOP size of the first coding step, by looking for those frames which were originally encoded using intra-frame coding (I-frames).
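The recompress-and-observe search can be sketched in a drastically simplified toy setting, where each candidate "codec" is reduced to a plain scalar quantizer with its own step size; a real implementation would invoke actual codec implementations instead. The codec names and step sizes below are hypothetical:

```python
import math

def quantize(x, step):
    """Uniform scalar quantizer: step * floor(x / step)."""
    return step * math.floor(x / step)

# Toy model: the sequence was first "encoded" with codec A, then lightly
# re-encoded; re-encoding once more with the true first codec introduces
# the least extra distortion.
codecs = {"A": 8, "B": 5, "C": 11}              # hypothetical codec -> step size
source = [3.1, 27.6, 54.9, 81.2, 99.7]
x1 = [quantize(v, codecs["A"]) for v in source]  # first coding step
x2 = [quantize(v, 2) for v in x1]                # mild second coding step

def distortion(seq, step):
    """Total distortion introduced by re-quantizing seq with the given step."""
    return sum(abs(v - quantize(v, step)) for v in seq)

identified = min(codecs, key=lambda c: distortion(x2, codecs[c]))
print(identified)  # -> A
```

The minimum-distortion criterion picks codec A because its output samples are (almost) fixed points of its own quantizer, which is exactly the idempotency argument developed in Section IV.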

As the proposed method is based on re-encoding the video sequence under analysis, the computational time strongly depends on the size of the search space of the re-encoding parameters (i.e., the number of tested codecs and quantization steps). However, this is not considered to be an issue in most forensic applications, where the amount of videos to analyze is limited (e.g., even just one suspect video). Nevertheless, in this manuscript we also show that it is possible to significantly shrink the parameters' search space, strongly reducing the computational complexity of the proposed algorithm without impairing its accuracy. This reduction enables the algorithm to be used also in different applications, where efficiency is a compelling requirement.

Our method was validated on a very large dataset of double compressed sequences, which was generated with a wide range of codecs and encoding parameters. The results reveal the possibility of identifying the coding standard and the GOP size of the first coding step, depending on the strength of the second coding step. Specifically, when the distortion introduced by the second coding step is not significantly stronger than the distortion introduced by the first codec, both the coding standard and the GOP size can be reliably estimated. Moreover, in order to show that the proposed method can be applied in a typical real-world scenario, we tested it on a set of video sequences uploaded to and downloaded from YouTube. Also in this case, the codec of the first compression step was correctly identified.

The rest of this paper is organized as follows. Section II discusses the related work, focusing on similar ideas recently applied to the case of still images. Section III introduces the problem addressed in this paper, and Section IV the idea of exploiting the idempotency property of quantization. Section V illustrates in detail the proposed method to identify both the coding standard and the GOP size. A thorough experimental validation is presented in Section VI, and Section VII concludes the paper, providing insights on open problems and future challenges.

II. RELATED WORK

The study of the compression history has been widely addressed in the past literature for the case of still images [9]. The problem of determining whether an image was compressed with JPEG was originally studied in [10] and further explored in [11], proposing a method to estimate the quantization matrix by looking at the periodicities in the distribution of the DCT coefficients. In the case of forgeries, the image is often compressed twice, thus stimulating work aimed at determining whether an image (or part of it) is single vs. double compressed [12]. The solution to this problem was recently extended to the challenging case in which the image is cropped [13], resized [14], or contrast enhanced [15] between the first and second compression. Although most images are compressed using JPEG, in [16] the authors show how to discriminate between different block-wise transform image codecs (e.g., DCT-based, or DWT-based). A theoretical analysis of the footprint left by transform coding, a key element in most image coding architectures, was presented in [17] and later extended in [18] to the case of double compression.

The study of the compression history of video sequences has been explored only more recently [3]. This has to do with the intrinsic difficulty of dealing with video sequences rather than still images. First, in the case of video, there is no single coding standard that is universally used. Conversely, different standards are being adopted depending on the scenario, so that sequences might be encoded with legacy MPEG-2 and MPEG-4 encoders, as well as with the more recent AVC and HEVC [19] standards. Second, the number of degrees of freedom when configuring a video encoder is significantly larger than in JPEG. These include, for example, the choice of the GOP structure, the coding mode decision rules, the motion estimation algorithm, the rate control algorithm, etc. Third, complex statistical dependencies are created between the quantized coefficient values in different frames, due to the adoption of motion-compensated prediction. For these reasons, early work addressing video compression focused on the analysis of MPEG-2 encoded sequences, and more specifically on I-frames, to detect the quantization parameter [20] or double compression [21], being a straightforward extension of the methods applicable to still images. Conversely, the early works addressing the specific challenges posed by the analysis of compressed video focused on the estimation of the quantization parameter (QP), which determines the distortion introduced in each frame (and, in some standards, in each coding unit). In [22] and [23], the authors propose a method to estimate the QP in both I- and P-frames for the case of, respectively, MPEG-2 and AVC coded video, by analyzing the histograms computed from DCT coefficients of prediction residuals. The estimation of the GOP structure (i.e., how I-, P- and B-frames alternate in a sequence), or, more simply, the GOP size (i.e., the distance between consecutive I-frames) was addressed in [24] for single-compressed video, based on the strength of spatial blocking artifacts. A method suitable for the case of double compression was proposed in [25], exploiting the footprints left by the skip coding mode. In this manuscript, instead, we propose a more general method that also works with codecs based on different kinds of transforms (i.e., DCT and DWT), without exploiting the macro-block size. A more recent work targeting double-compressed video is [4], where the authors estimate the bitrate adopted in the first compression step. In this case, the first and second codecs are considered to be the same, thus known. Other works address the analysis of motion vectors. In [26], it is shown how to reconstruct motion vectors in AVC-encoded video from decoded pixels, whereas the identification of the motion estimation strategy is proposed in [27]. In the case of multiple video compression, a method to estimate the number of compression steps is described in [28], which is based on the analysis of the distribution of the first digits of DCT coefficients, thus extending the previous work that covered the case of JPEG compression [29].

This paper goes beyond the previous literature by addressing for the first time the problem of estimating the video coding standard adopted in the first coding step in the case of double video compression. The proposed method is based on a recompress-and-observe paradigm, inspired by similar methods presented in the literature, which exploit the idempotency property of quantization in order to, e.g., estimate the quality factor in JPEG compressed images [30], expose forgeries in compressed images [31], and detect JPEG anti-forensics [32]. However, to the best of the authors' knowledge, this paradigm has never been applied to the problem of identifying the video codec.

With respect to our previous conference publications [1], [2], several improvements have been made:

• We present an analytical explanation of the idempotency property of quantizers (Section IV), which explains the rationale behind the proposed algorithm.

• We address for the first time the problem of the estimation of the GOP size, which was assumed to be given in [1], [2]. To this purpose, we also provide a thorough validation of the proposed GOP estimation algorithm (see Section VI).

• The codec identification algorithm has been modified to increase its robustness to the masking introduced by the second coding step. The new algorithm does not require a linear behavior between the Quantization Parameter (QP) (or its logarithm) and the Peak Signal to Noise Ratio (PSNR), which was assumed in [1], [2].

• The dataset used for testing has been widely expanded. We now consider more videos (also at different resolutions) and more codecs (both DCT- and DWT-based). Moreover, we tested the effect of using different implementations of the same codec (for both MPEG-4 and H.264). Additionally, we tested the proposed methods on videos downloaded from YouTube to prove the effectiveness of the new algorithm in a real-world scenario.

• We investigated the possibility of reducing the computational complexity of the presented algorithm, demonstrating the trade-off between accuracy and complexity in our experimental results.

III. PROBLEM FORMULATION

Conventional video coding is implemented according to an architecture that can be described as a chain of basic processing operators, illustrated in Figure 1. The encoder processes a video sequence X frame-by-frame. Each frame is divided into blocks x, which are then coded according to two main coding modes (note that in some cases the frame includes a single block, which is as large as the whole frame). In intra-frame coding, each I-frame is considered as a stand-alone image, and inter-pixel correlation within the same block is exploited by means of transform coding. To increase the coding efficiency, in the more recent standards (e.g., AVC and HEVC) intra prediction can be enabled by using a prediction algorithm P, which computes a predictor from the pixels of neighboring blocks and subtracts it from the current block before applying transform coding. In inter-frame coding, for each block of a P- or B-frame, a predictor is computed exploiting the pixels in one (or more) reference frame, by means of Motion Estimation (ME) and Motion Compensation (MC). Specifically, ME looks for the best predictor in the reference frame, which is identified by means of a motion vector (MV). MC subtracts the predictor from the current block to obtain the prediction residual, which is then processed with transform coding.

Regardless of the coding mode, transform coding plays a central role in any video coding architecture. The input block, or the corresponding prediction residual, is transformed (typically using an orthonormal transform T) and the transform coefficients y are quantized with Q to obtain the quantized coefficients ŷ. The encoded bitstream is generated by entropy coding the quantized transform coefficients. Since entropy coding is perfectly lossless, it does not leave any footprint, and it is therefore omitted from the scheme in Figure 1.
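As a minimal numerical illustration of the transform coding stage described above, the sketch below applies a 2-point orthonormal Haar transform (a stand-in for the larger transforms real codecs use) followed by uniform quantization; the block values and step size are arbitrary choices of ours:

```python
import math

def quantize(x, step):
    """Uniform scalar quantizer applied to each transform coefficient."""
    return step * math.floor(x / step)

def T(block):
    """2-point orthonormal Haar transform (toy stand-in for T in Figure 1)."""
    a, b = block
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def T_inv(coeffs):
    """Inverse transform; orthonormality makes it the transpose of T."""
    y0, y1 = coeffs
    s = 1 / math.sqrt(2)
    return (s * (y0 + y1), s * (y0 - y1))

block = (120.0, 124.0)                      # two neighboring pixel values
y = T(block)                                # transform coefficients y
y_q = tuple(quantize(c, 4) for c in y)      # quantized coefficients
decoded = T_inv(y_q)                        # reconstructed block
```

Because the transform is orthonormal, all the loss comes from the quantizer: the reconstructed pixels differ from the originals by at most one quantization step in this example.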

To avoid drift, the encoder embeds the decoder, which reconstructs video frames in the pixel domain, so that they can be used as references in inter-frame coding and intra prediction. Decoded blocks x̂ are obtained by applying the inverse transform T⁻¹ to ŷ, and adding back the predictor (if needed). Finally, pixel values are rounded to unsigned integers. Rounding can be considered as a second quantization step, applied in the pixel domain. However, since the distortion it introduces is far less pronounced than the one due to the quantizer Q applied in the transform domain, it is omitted from the scheme in Figure 1.

Fig. 1: Simplified block diagram of a conventional video encoder. T is an orthonormal transform, Q is a quantizer, ME and MC perform, respectively, motion estimation and compensation, and P computes the spatial prediction in intra-coded frames.

A decoder is able to reconstruct a video sequence in the pixel domain from the received compressed bitstream. In addition, the decoder can read from the bitstream some of the configuration parameters adopted by the encoder. In some cases these parameters can be obtained directly by parsing the bitstream. This is the case, for example, of the GOP structure, the quantization parameters, the coding modes, the motion vectors, etc. In other cases, instead, the configuration of the encoder can only be determined in an indirect way. For example, it might be possible to infer the adopted motion estimation algorithm from the analysis of the motion vectors [27], or the rate control algorithm from the analysis of the temporal variation of the quantization parameter [23]. However, when the decoded video sequence is re-compressed, most of the information related to the first coding step is seemingly lost, since the analysis of the bitstream reveals details of the second coding step only.

In this paper we consider the set-up illustrated in Figure 2, in which a video sequence X is encoded twice, namely by codec c1, obtaining X1, followed by c2, obtaining X2. As argued in Section I, the first coding step is often performed at the time the sequence X is acquired. The second coding step, instead, might be applied after manipulating the sequence X1, or when uploading X1 to a video sharing web site. In this paper, we aim at determining the characteristics of the first codec, c1, given that we observe only the bitstream at the output of the second encoder. Specifically, we assume that c1 belongs to a group of candidate coding architectures, in which each architecture is the archetype for a specific video coding standard. In addition, we are particularly interested in determining the GOP size, i.e., the number of frames between two consecutive I-frames, adopted by c1.

Fig. 2: Simplified block diagram representing a processing chain with two codecs, c1 and c2. A video sequence X is encoded and decoded to X1, then re-encoded and decoded to X2. Each codec (i.e., c1 and c2) is represented by a simplified block diagram as in Figure 1.

In general, c1 and c2 need not be the same. Indeed: i) they might belong to two different coding architectures; ii) they might follow the same architecture, but represent two different implementations of the same standard; iii) they might be the very same codec, but configured using different encoding parameters, e.g., adopting two different GOP structures, target distortions, etc. In all cases, c2 acts as a sort of noise source, masking the footprints left by c1. The amount of distortion introduced by the second coding step determines to what extent we are able to reveal the traces of c1. In particular, when the distortion is strong enough, e.g., when the second codec operates at low bitrates, the footprints left by c1 might not be identifiable. This will be thoroughly discussed in the experiments presented in Section VI. In the next section, we present an analysis of the footprint left by quantization, which provides the basis for the video coding identification algorithm described in Section V.

IV. ANALYSIS OF QUANTIZATION FOOTPRINTS

The proposed method is based on the analysis of coding footprints left by c1. To this end, we exploit the idempotency property of scalar quantization, and show how this can be somewhat extended to video coding. Scalar quantization is, by construction, an idempotent operator. This means that it is possible to re-iterate quantization multiple times on the same scalar value, and the result will always be the same as the one obtained by applying quantization only once.

More formally, let us consider a scalar value $x \in \mathbb{R}$, which is quantized to $x_{q_1}$:

    x_{q_1} = Q_{\Delta_1}(x) = \Delta_1 \left\lfloor \frac{x}{\Delta_1} \right\rfloor,    (1)

where $\Delta_1$ denotes the quantization step size used, and the subscript $q_i$ denotes a signal quantized $i$ times. If we re-quantize $x_{q_1}$, we obtain

    x_{q_2} = Q_{\Delta_2}(x_{q_1}) = \Delta_2 \left\lfloor \frac{\Delta_1 \lfloor x / \Delta_1 \rfloor}{\Delta_2} \right\rfloor.    (2)

When we use the same quantizer and the same step size, i.e., $\Delta_1 = \Delta_2$, then $x_{q_2} = x_{q_1}$. Hence, we can re-iterate quantization without affecting the signal after the first quantization.

The idempotent property can be conveniently exploited to identify the quantizer used to produce a set of observed scalar values, when one is given a finite set of candidate quantizers, each one of the form in (1), but characterized by a different step size $\Delta \in S$. Specifically, let $\{x^1, \ldots, x^P\}$ denote a set of (unobserved) scalar values and $\{x^1_{q_1}, \ldots, x^P_{q_1}\}$ denote a set of (observed) quantized values according to (1). Then, the quantization footprints can reveal the quantization step size

    \hat{\Delta}_1 = \max\left\{ \operatorname*{arg\,min}_{\Delta \in S} \sum_{j=1}^{P} \left| x^j_{q_1} - Q_{\Delta}(x^j_{q_1}) \right| \right\},    (3)

Fig. 3: Block diagram of double DPCM/PCM compression.

where $\hat{\Delta} = \operatorname*{arg\,min}_{\Delta} f(\Delta)$ returns the argument $\Delta$ for which $f(\Delta)$ attains its minimum. The $\max\{\cdot\}$ operator is needed since its argument might contain more than one quantizer compatible with the observed values. This might be the case, e.g., when $S$ contains integer multiples or sub-multiples of $\Delta_1$. However, as proved in [17], $\hat{\Delta}_1 = \Delta_1$, provided that we observe a large enough number of values.

In case of double compression, it is not possible to observethe output of the first quantizer, {x1

q1 , . . . , xPq1}. However,

when ∆2 < ∆1, it is still possible to reveal the first quantiza-tion step size. Let us consider a slightly more elaborate modelillustrated in Figure 3, which represents a simplified version ofdouble video compression. Let X = {x1, x2, . . . , xP }, xj ∈R, denote a 1-dimensional sequence of samples. DifferentialPulse Code Modulation (DPCM) is an efficient lossy codingtechnique that is used to exploit the inter-sample statisticaldependency. Instead of quantizing the samples directly as inthe case of scalar quantization described in (1) (a.k.a. PulseCode Modulation (PCM)), the prediction residual is quantized:

$$e^j_{q_1} = Q_{\Delta_1}(e^j) = Q_{\Delta_1}\left(x^j - \mathcal{P}(x^j)\right), \qquad (4)$$

where P(x^j) denotes a predictor of the sample x^j. In the simplest case, the previously decoded sample is used as predictor, i.e., P(x^j) = x^{j−1}_{q1}. The sample is then reconstructed by adding back the quantized residual to the predictor:

$$x^j_{q_1} = \mathcal{P}(x^j) + e^j_{q_1}. \qquad (5)$$

When using DPCM, it is customary to periodically insert samples which are PCM coded. Let G1 denote the number of samples between two consecutive PCM-coded samples and X1 = {x^1_{q1}, . . . , x^P_{q1}} the sequence reconstructed after the first coding step. X1 is then re-encoded with DPCM to obtain X2, with quantization step size ∆2 and a period of G2 between PCM-coded samples.

If G1 is known, it is possible to focus on those samples which were originally PCM-coded in the first coding step, i.e., x^j_{q2}, j ∈ J, J = {j | mod(j − 1, G1) = 0}. In this case, although the samples produced by the first coding step are not directly observed, it is still possible to estimate the quantization step size. Indeed, the decoded samples after the second coding step can be written as

$$x^j_{q_2} = \mathcal{P}(x^j_{q_1}) + e^j_{q_2} = \mathcal{P}(x^j_{q_1}) + Q_{\Delta_2}\left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right). \qquad (6)$$

Equation (6) states that x^j_{q2} is obtained by shifting x^j_{q1} by an amount equal to the predictor P(x^j_{q1}), quantizing the result according to Q_{∆2}(·), and shifting back the result by P(x^j_{q1}). For any possible input x^j_{q1} to the second coding step, the output x^j_{q2} is equal to the input, plus an offset that depends on the quantization error introduced on the residual. More formally,

Fig. 4: Histogram of the samples x^j_{q2}, j ∈ J, when ∆1 = 10 and ∆2 = 4.

by adding and subtracting the term x^j_{q1} − P(x^j_{q1}) in equation (6), it simplifies to

$$\begin{aligned}
x^j_{q_2} &= \mathcal{P}(x^j_{q_1}) + Q_{\Delta_2}\left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right) + \left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right) - \left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right) \\
&= x^j_{q_1} + Q_{\Delta_2}\left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right) - \left(x^j_{q_1} - \mathcal{P}(x^j_{q_1})\right) \\
&= x^j_{q_1} + \left(Q_{\Delta_2}(e^j_{q_1}) - e^j_{q_1}\right) \\
&= x^j_{q_1} + \eta^j,
\end{aligned} \qquad (7)$$

where η^j is the quantization error introduced on the residual e^j_{q1} = x^j_{q1} − P(x^j_{q1}).

It is interesting to determine the statistical distribution of x^j_{q2}, which depends on both the distribution of x^j_{q1} and η^j. The form of the probability density function (p.d.f.) of x^j_{q1}, j ∈ J, is determined by the first quantizer:

$$\Pr_{x^j_{q_1}}(x) \propto \sum_k w_k \, \delta(x - k\Delta_1), \qquad (8)$$

where δ(x) = 1 if x = 0, and equal to zero otherwise. The values w_k ∈ R depend on the p.d.f. of the original sequence, and they are not relevant to the present discussion. The residual is represented as a random variable, whose p.d.f. is Pr_{e^j_{q1}}(x). If Pr_{e^j_{q1}}(x) is smooth and its spread (e.g., measured by its standard deviation) is large when compared to ∆2, then the p.d.f. of the quantization error on the residual is uniform in the interval [−∆2/2, ∆2/2]:

$$\Pr_{\eta^j}(x) \propto \operatorname{rect}\left(\frac{x}{\Delta_2}\right), \qquad (9)$$

where rect(x) = 1 if x ∈ [−1/2, +1/2] and zero otherwise. Assuming statistical independence between x^j_{q1} and η^j, the p.d.f. of the output of the second coding step can be written as

$$\Pr_{x^j_{q_2}}(x) \propto \sum_k w_k \operatorname{rect}\left(\frac{x - k\Delta_1}{\Delta_2}\right). \qquad (10)$$

As an example, Figure 4 shows the empirical distribution Pr_{x^j_{q2}}(x) obtained by applying double DPCM coding to a 1-dimensional source and setting ∆1 = 10 and ∆2 = 4.

Similarly to the case of scalar quantization in (3), when ∆2 < ∆1, it is possible to estimate the quantization step size of the first coding step by re-quantizing the observed samples for varying ∆, and observing the behavior of the cost function

$$E(\Delta, \mathcal{J}) = \sum_{j \in \mathcal{J}} \left| x^j_{q_2} - Q_{\Delta}(x^j_{q_2}) \right|. \qquad (11)$$
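The double DPCM/PCM chain of Figure 3 and the cost function (11) can be simulated in a few lines. The sketch below (illustrative NumPy code written under the paper's assumptions, not the authors' implementation) encodes a correlated 1-D source twice (∆1 = 10, G1 = 14, then ∆2 = 4, G2 = 12) and evaluates E(∆, J) on the originally PCM-coded positions; the cost should bottom out at ∆ = ∆1 = 10:

```python
import numpy as np

def quantize(x, delta):
    return delta * np.round(x / delta)

def dpcm_codec(x, delta, gop):
    """DPCM with a PCM-coded sample every `gop` samples; the previous
    decoded sample is the predictor, as in eqs. (4)-(5)."""
    y = np.empty(len(x))
    for j, xj in enumerate(x):
        pred = 0.0 if j % gop == 0 else y[j - 1]  # PCM restart vs. DPCM
        y[j] = pred + quantize(xj - pred, delta)  # eq. (5)
    return y

def E(xq2, delta, idx):
    """Cost function of eq. (11), evaluated over the index set J."""
    return np.sum(np.abs(xq2[idx] - quantize(xq2[idx], delta)))

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0.0, 5.0, 3000))  # correlated 1-D source
x1 = dpcm_codec(x, delta=10, gop=14)       # first coding step
x2 = dpcm_codec(x1, delta=4, gop=12)       # second (masking) step

J = np.arange(0, len(x2), 14)              # samples PCM-coded by step 1
costs = {d: E(x2, d, J) for d in (6, 8, 10, 12)}
print(min(costs, key=costs.get))           # expected: 10
```

Each sample in J is a multiple of ∆1 perturbed by at most ∆2/2, which is why re-quantizing with ∆ = ∆1 yields the smallest residual among the tested steps.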


Fig. 5: Behavior of E(∆, J_{g1}), when ∆1 = 10, for ∆2 = 0, 1, 2, 4, 8. When g1 = G1 (a) strong traces revealing ∆1 = 10 are present, while when g1 ≠ G1 (b) these traces are not so evident (best seen in colours).

Figure 5a shows the cost function in (11), when varying ∆, for the case ∆1 = 10. It is possible to clearly observe the footprints left by the first coding step, as local minima when ∆ = ∆1 and its sub-multiples.

So far, we have assumed that G1 is known. In practice, this is not the case, since only the output of the second coding step is observed. However, it is possible to compute E(∆, J_{g1}) for different hypotheses of the distance between consecutive PCM-coded samples, where J_{g1} = {j | mod(j − 1, g1) = 0 ∧ mod(j − 1, G2) ≠ 0}, g1 = 2, 3, . . ., and select the smallest value of g1 for which E(∆, J_{g1}) exhibits the characteristic shape shown in Figure 5a. Indeed, when g1 ≠ G1, the cost function is of the form shown in Figure 5b.

In the next section we show that similar principles can be followed to identify both the video codec and the GOP size used by the first coding step.

V. VIDEO CODEC AND GOP SIZE IDENTIFICATION

The proposed method for the identification of the video codec and of the GOP size follows a recompress-and-observe paradigm, i.e., re-encoding the video under analysis so as to find a coding configuration that matches the one used in the first coding step. More precisely, let us consider a double-compression chain such as the one illustrated in Figure 2. A video sequence X is encoded by c1 and decoded in the pixel domain to produce X1. Then, X1 is re-encoded with c2 and decoded to X2. In the case of DPCM/PCM coding, we showed in Section IV that the quantization footprints left by the first coding step were clearly identifiable when ∆2 < ∆1. Similarly, in the case of video coding, we assume that c2 is configured to operate in such a way that the distortion introduced in X2 is less than, or equal to, the one introduced by c1, so as to avoid masking completely the traces left by c1. This situation commonly arises in real-world scenarios. For example, this is the case of a video that is edited, and then recompressed using coding parameters that retain the same visual quality as the input. Likewise, when the same codec is used for both the first and the second compression, the target bitrate of the latter step is chosen to be at least as large as the bitrate of the former. This implies the use of a finer quantizer in the second coding step.

The proposed method receives as input the bitstream after the second encoder c2, decodes it to X2 and re-encodes this sequence using a third codec, c3, obtaining X3. When c3 matches the same codec and configuration parameters as c1, we expect that X3 ≃ X2, i.e., the output of the third coding step is similar to its input, due to the idempotent property. In the case of video, the similarity between sequences is computed by means of the Peak Signal to Noise Ratio (PSNR). We enumerate different codecs c3 ∈ C, each with different encoding parameters, and we look for the one that leads to the highest similarity. Concretely, since the GOP size of the first coding step is unknown, we configure c3 to operate using intra-frame coding only (I-frames), and repeat the encoding using different quantization parameters (QP) q ∈ QP. As such, we only need to loop over two parameters (i.e., codec and QP), repeating the encoding |C| × |QP| times and, for each (c3, q) pair, we compute the PSNR between X2 and X3 on a frame-by-frame basis.

Fig. 6: Example of matrices P_{c3}(q, j) obtained for the sequence NEWS originally encoded with MPEG-2 (GOP size: 14) and re-encoded with MPEG-4 (GOP size: 12). (a) c3 = MPEG-2; (b) c3 = MPEG-4; (c) c3 = AVC.

As an example, Figure 6 shows three matrices containing the values of PSNR obtained when analyzing a double-compressed video sequence, which was compressed with MPEG-2 (GOP size 14) and MPEG-4 (GOP size 12) in the first and second coding step, respectively. Each matrix corresponds to a different codec, namely MPEG-2, MPEG-4 and AVC. By inspecting these matrices, we observe the following:

• I-frames in X1 re-encoded as I-frames in X3 result in higher PSNR values with respect to those obtained for P-frames re-encoded as I-frames. As a consequence, a periodic pattern arises, whose period is equal to the GOP size of the first coding step.

• When codec c3 matches codec c1, we can analyze how the PSNR varies when changing the QP for those frames which were originally encoded as I-frames in the first coding step. This is equivalent to looking at the columns of the matrix for which c3 = c1 corresponding to the I-frames. This is better illustrated in Figure 7a, in which it is possible to observe that the PSNR vs. QP curve exhibits a local maximum corresponding to the QP value


Fig. 7: PSNR vs. q curves for different I-frames of the first GOP. (a) c3 (MPEG-2) matches c1; (b) c3 (MPEG-4) does not match c1; (c) c3 (AVC) does not match c1. Note that the q scale is logarithmic in the top and middle figures to take into account the different mapping between the quantization parameter and the quantization step size in MPEG-2, MPEG-4 and AVC.

originally used by c1 to encode the frame. Conversely, when codec c3 does not match codec c1, the PSNR vs. QP function is smooth, as illustrated in Figure 7b and Figure 7c.
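The frame-wise similarity measure that fills the matrices of Figure 6 is the standard PSNR. A minimal helper (illustrative NumPy code assuming 8-bit frames; not the authors' implementation):

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Frame-wise PSNR used to fill the matrix P_{c3}(q, j) of eq. (12)."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# identical frames give infinite PSNR; a noisier re-encoding scores lower
f = np.full((16, 16), 128.0)
print(psnr(f, f + 4) > psnr(f, f + 8))  # -> True
```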

Starting from the above observations, we propose the following identification algorithm in order to detect the codec used by c1 and its GOP size:

1) Initialization: Select the set of candidate codecs C and the set of quantization parameters QP = [QP_min, QP_max]. QP_min is set equal to the minimum value accepted by the codec that gives the highest quality, while QP_max is set to a value that is not greater than the QP value used by c2 (i.e., the maximum QP value we are able to detect).

2) Recompress: For each c3 ∈ C,

a) Encode X2 with c3 using all QP values and intra-frame coding mode only. Let X^q_3, q ∈ QP, denote the output of c3 decoded in the pixel domain.

b) Compute the matrix P_{c3}, whose entries are given by

$$P_{c_3}(q, j) = \operatorname{PSNR}\left(X_2(j), X^q_3(j)\right), \qquad (12)$$

i.e., the PSNR value computed comparing the j-th frame of X2 and of X^q_3.

c) Since the bitstream used to decode X2 with c2 is assumed to be available, the matrix P_{c3} is processed column-wise to remove the traces left by c2, for which the quantization parameter is known. To this end we compute

$$P'_{c_3}(q, j) = \begin{cases} \dfrac{P_{c_3}(q-1, j) + P_{c_3}(q+1, j)}{2}, & q = QP_{c_2} \\ P_{c_3}(q, j), & \text{otherwise.} \end{cases} \qquad (13)$$

In doing so, it is possible to remove potential local maxima obtained when the quantization parameter of c3 matches the one of c2 instead of the one of c1.

d) To enhance the local maxima due to matching parameters between c3 and c1, we compute the matrix

$$P''_{c_3} = P'_{c_3} - \operatorname{medfilt}(P'_{c_3}), \qquad (14)$$

where medfilt(·) is a median filter applied row-by-row to enhance peaks due to I-frames in X1. An example is shown in Figure 8, which illustrates P''_{c3} and P'_{c3} for comparison.

3) Estimate the GOP size: Many methods for estimating the periodicity of a signal have been proposed in the literature. These range from historical frequency estimation algorithms [33], [34] to more modern methods for tempo estimation [35]. However, due to the nature of the analyzed signals, in the following we propose a simple yet effective method for periodicity estimation that enables GOP detection at reduced complexity.

a) For each c3 ∈ C,

i) Compute the absolute value of the Fourier transform of each row of P''_{c3}. Then, average the result along each column to obtain

$$\mathbf{g}_{c_3} = \frac{1}{|\mathcal{QP}|} \sum_{q \in \mathcal{QP}} \left| \mathcal{F}\{P''_{c_3}(q, \cdot)\} \right|, \qquad (15)$$

where F(·) denotes the Fourier transform and g_{c3} is a row vector whose number of elements is equal to the number of frames.

ii) Use a filterbank to evaluate the periodicity of the peaks in g_{c3}, which is an estimate of the GOP size. More specifically, we compute the average energy obtained filtering g_{c3} with different filters tuned to different candidate GOP sizes:

$$\hat{G}_{1, c_3} = \operatorname*{argmax}_{G} \sum_{j=1}^{N} \left| (\mathbf{g}_{c_3} * h_G)(j) \right|^2, \qquad (16)$$

where $\hat{G} = \operatorname*{argmax}_G f(G)$ returns the argument G such that f(G) is the maximum of function f(G), and h_G is a comb-filter whose distance between consecutive peaks is equal to G. That is,

$$h_G(j) = \sum_k \delta(j - kG). \qquad (17)$$

iii) Compute as quality metric of g_{c3} its peakness value, defined as

$$p_{c_3} = \frac{\max(\mathbf{g}_{c_3})}{\operatorname{mean}(\mathbf{g}_{c_3})}, \qquad (18)$$


Fig. 8: Effect of median filtering on the P'_{c3} realization from Figure 6. (a) P'_{c3}; (b) P''_{c3}.

where max(·) computes the maximum value and mean(·) computes the average value.

b) Estimate the GOP size Ĝ1 as the Ĝ_{1,c3} with the highest associated p_{c3}. More formally,

$$\hat{G}_1 = \hat{G}_{1, \hat{c}_3}, \qquad (19)$$

where $\hat{c}_3 = \operatorname*{argmax}_{c_3}(p_{c_3})$. Notice that p_{c3} can be used as a confidence value of the GOP estimation. Indeed, high p_{c3} values are associated to GOPs estimated from very peaky (thus reliable and easier to analyze) g_{c3}. GOP estimations associated to p_{c3} values below a confidence threshold can be discarded (as shown in Section VI).

4) Identify the codec:

a) For each c3 ∈ C, compute the cost function

$$J_{c_3} = \sum_{j \in \mathcal{J}} \sum_{q \in \mathcal{QP}} P''_{c_3}(q, j), \qquad (20)$$

where J = {j | mod(j − 1, Ĝ1) = 0}. Equation (20) computes the sum of median-filtered PSNR values for those frames corresponding to the original I-frames.

b) Identify the codec as follows:

$$\hat{c}_1 = \operatorname*{argmax}_{c_3 \in \mathcal{C}} J_{c_3}. \qquad (21)$$

Notice that the knowledge of the bitstream after c2 is used only to extract the GOP and QP values of X2. However, the knowledge of the bitstream is not such a strict hypothesis, since videos are usually distributed in compressed format and not already decoded (especially when dealing with user-generated content). Nonetheless, if X2 is available only in the pixel domain, it is still possible to apply the algorithm skipping step 2c, with reduced accuracy.
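Steps 2d, 3 and 4 above can be sketched end-to-end. The code below is an illustrative NumPy mock-up, not the authors' implementation: the median-filter window length is an arbitrary assumption, the comb filterbank is applied directly in the frame domain for brevity (rather than to the Fourier magnitude of eq. (15)), and the PSNR matrices are synthetic, with the matching codec showing peaks every 14 frames:

```python
import numpy as np

def medfilt_row(row, k=9):
    """1-D median filter with odd window k and reflected edges."""
    pad = k // 2
    r = np.pad(row, pad, mode="reflect")
    return np.array([np.median(r[i:i + k]) for i in range(len(row))])

def enhance_peaks(P, k=9):
    """Step 2d, eq. (14): P'' = P' - medfilt(P'), row by row."""
    return P - np.apply_along_axis(medfilt_row, 1, P, k)

def estimate_gop(Ppp, candidates=range(2, 25)):
    """Step 3, simplified: score the periodicity of the QP-averaged,
    mean-removed signal with combs of teeth G apart (in the spirit of
    eqs. (15)-(17)); the peakness ratio of eq. (18) is the confidence."""
    g = Ppp.mean(axis=0)            # average over QP values
    g = g - g.mean()                # remove the flat background
    def score(G):
        # best comb alignment: sum of g at positions s, s+G, s+2G, ...
        return max(g[s::G].sum() for s in range(G))
    gop = max(candidates, key=score)
    peakness = g.max() / (np.abs(g).mean() + 1e-12)
    return gop, peakness

def identify_codec(Ppp_by_codec, gop):
    """Step 4, eqs. (20)-(21): sum P'' over the originally intra-coded
    columns and all QPs, and keep the codec with the largest score."""
    return max(Ppp_by_codec, key=lambda c: Ppp_by_codec[c][:, ::gop].sum())

# toy data: the matching codec ("MPEG-2") leaves pronounced PSNR peaks
# every 14 frames; the other candidates only show smooth noise
rng = np.random.default_rng(2)
P_by_codec = {c: 35.0 + rng.normal(0.0, 0.1, (8, 140))
              for c in ("MPEG-2", "MPEG-4", "AVC", "DIRAC")}
P_by_codec["MPEG-2"][:, ::14] += 3.0

Ppp = {c: enhance_peaks(P) for c, P in P_by_codec.items()}
gops = {c: estimate_gop(P) for c, P in Ppp.items()}
best = max(gops, key=lambda c: gops[c][1])     # highest peakness, eq. (19)
print(gops[best][0], identify_codec(Ppp, gops[best][0]))
```

On this toy input the pipeline recovers GOP size 14 and selects MPEG-2 as the first codec; with real video the matrices would come from actual re-encodings with c3.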

VI. EXPERIMENTAL RESULTS

In order to validate the proposed method, in this section we present a set of results obtained using a dataset composed of a wide set of video sequences encoded with different parameters. More specifically, we focus on studying the accuracy of the GOP estimation and the codec detection algorithms for different sequences (seq), coding standards applied during the first coding step (c1), bitrates of c1 (R1), GOP sizes of c1 (G1), coding standards applied during the second coding step (c2), QPs of c2 (QP2), and GOP sizes of c2 (G2). Notice that, to test the effect of all these parameters on the presented algorithm, we need complete knowledge of the coding history of each tested sequence. For this reason we performed this validation step on a controlled set of videos encoded on purpose starting from raw material, rather than on random videos downloaded from the web.

TABLE I: List of parameter names and values used to build the dataset. The number of configurations for each parameter is also provided.

Name | Value | N. conf.
seq  | CIF: Foreman, Mobile, Paris, News; 4CIF: Ice, Harbour | 6
c1   | MPEG-2 (libavcodec¹) → MPEG-2; MPEG-4 Part 2 (libavcodec) → MPEG-4(a); MPEG-4 Part 2 (Microsoft variant²) → MPEG-4(b); H.264/AVC (libavcodec) w/o deblocking → AVC(a); H.264/AVC (libavcodec) with deblocking → AVC(b); H.264/AVC (JM³) w/o deblocking → AVC(c); DIRAC (libschrodinger⁴) → DIRAC | 7
R1   | RL / RM / RH: 30 / 33 / 36 dB (CIF); 34 / 37 / 40 dB (4CIF) | 3
G1   | 5, 7, 9, 11, 13, 14, 15, 17, 19, 21, 23 | 11
c2   | MPEG-2 (libavcodec) → MPEG-2; MPEG-4 Part 2 (libavcodec) → MPEG-4(a); H.264/AVC (libavcodec) → AVC(a) | 3
QP2  | a, b, c, d, e, f: 1, 2, 4, 5, 7, 10 (MPEG-2/4); 10, 20, 23, 26, 29, 32 (AVC) | 6
G2   | 12 | 1

Table I reports all the parameter values used. We started from six well-known uncompressed video sequences of 10 seconds each (i.e., approximately 300 frames), with different spatial and temporal information: four at CIF spatial resolution (352 × 288), namely Foreman, Mobile, Paris, News; two at 4CIF spatial resolution (704 × 576), namely Ice and Harbour. For the first coding step we used MPEG-2, MPEG-4 Part 2 (MPEG-4), MPEG-4 Part 10 H.264/AVC (AVC), and DIRAC. As regards MPEG-4, we tested two different implementations (MPEG-4(a) and MPEG-4(b)), while for AVC we used two different implementations (AVC(a) and AVC(c)), one of which was also tested with the in-loop filter enabled (AVC(b)). For each codec, we selected three different target bitrates by enabling rate control, in order to obtain three sequences at low, medium and high quality, respectively. The mapping between the bitrates and the obtained PSNRs is reported in Table I. As for the second coding step, we re-encoded all sequences with either MPEG-2, MPEG-4 or AVC, using a constant QP (i.e., a constant quantization step). In order to unify the notation, the set of possible QP values for c2 is identified with {a, b, c, d, e, f}, which corresponds to {1, 2, 4, 5, 7, 10} for the MPEG-2/4 codecs and to {10, 20, 23, 26, 29, 32} for AVC (equalizing the value of the quantization steps among the codecs). Concerning the GOPs used for the first and second coding steps, we used 10 different combinations, reported in Table I. Notice that the algorithm depends on the ratio between the used GOPs (G1/G2), rather than on the exact GOP value. This justifies the fact that we fixed G2 while changing only G1. Concerning the third coding step (i.e., the analysis one), we used as c3 MPEG-2, MPEG-4(a), AVC(a) and DIRAC.


TABLE II: a1(QP2, G1/G2) fixing R1 = RM. Bold is used for accuracy values larger than 0.5.

G1/G2:   5/12  7/12  9/12  11/12 13/12 15/12 17/12 19/12 21/12 23/12
QP2 a    1.00  1.00  1.00  1.00  1.00  0.89  0.99  0.90  0.78  0.78
    b    1.00  1.00  0.99  0.99  0.97  0.78  0.90  0.81  0.71  0.75
    c    1.00  0.99  0.94  0.85  0.79  0.64  0.64  0.58  0.57  0.46
    d    0.99  0.96  0.78  0.72  0.64  0.47  0.49  0.36  0.42  0.33
    e    0.92  0.83  0.51  0.46  0.44  0.29  0.28  0.22  0.21  0.21
    f    0.78  0.62  0.25  0.31  0.28  0.21  0.14  0.11  0.12  0.08

GOP Estimation

Let us consider a sequence (seq) double encoded with the parameters c1, R1, G1, c2, QP2 and G2. We define the detection function as

$$d(\text{seq}, c_1, R_1, G_1, c_2, QP_2, G_2) = \begin{cases} 1, & \text{if } \hat{G}_1 = G_1, \\ 0, & \text{otherwise,} \end{cases} \qquad (22)$$

whose value is 1 if the estimated GOP Ĝ1 is correct and 0 otherwise. To analyze the behavior of the GOP estimation algorithm under different conditions and its robustness to various coding parameters, we average the detection function (22) along different dimensions, aggregating results obtained on a subset of the dataset.
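This aggregation can be sketched with a small example (illustrative code on synthetic detection outcomes; the axis ordering is an assumption made here for illustration):

```python
import numpy as np

# detection outcomes d(.) on a synthetic grid of coding configurations,
# axes = (seq, c1, R1, G1, c2, QP2); values in {0, 1}
rng = np.random.default_rng(4)
d = (rng.random((6, 4, 1, 10, 3, 6)) < 0.8).astype(float)

# eq. (23)-style accuracy: average over seq, c1, R1 and c2,
# keeping the dependence on (QP2, G1/G2)
a1 = d.mean(axis=(0, 1, 2, 4)).T   # shape (QP2, G1) = (6, 10)
print(a1.shape)
```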

To highlight the effect of the distortion introduced by the c2 quantization (QP2) and of different GOP ratios (G1/G2) on GOP estimation, we define the accuracy as

$$a_1(QP_2, G_1/G_2) = \operatorname*{average}_{\text{seq}, c_1, R_1, c_2} \left[ d(\text{seq}, c_1, R_1, G_1, c_2, QP_2, G_2) \right], \qquad (23)$$

where average_x[d(x)] computes the average value of d along the x direction, and a1 ∈ [0, 1] (0 for GOP never correctly detected, 1 for GOP always correctly detected). Table II shows the behavior of a1(QP2, G1/G2) fixing the average quality of c1 (R1 = RM), with c1 ∈ {MPEG-2, MPEG-4(a), AVC(a), DIRAC}, c2 ∈ {MPEG-2, MPEG-4(a), AVC(a)} and G1 ∈ {5, 7, 9, 11, 13, 15, 17, 19, 21, 23} (i.e., 6 × 4 × 1 × 10 × 3 × 6 × 1 = 4320 sequences). As expected, when QP2 is low, the quantization noise introduced by c2 does not mask the artifacts introduced by c1, thus allowing an accurate GOP estimation. The same trend can be observed for low values of G1/G2. Indeed, since all the sequences share the same number of frames, a low G1 value determines more peaks in g_{c3} (see (15)), thus making it easier to detect the periodicity of the peaks (i.e., G1). If the quality of c1 is decreased (R1 = RL) or increased (R1 = RH), the artifacts introduced by c1 become stronger or weaker, respectively. This effect determines an average accuracy increase of 5.8% for R1 = RL and an average accuracy decrease of 13.6% for R1 = RH with respect to the case R1 = RM shown in Table II.

In order to study the behavior of the GOP estimation algorithm for each video sequence, we average the detection function along c1, R1, c2, G1 and G2, defining the accuracy as

$$a_2(QP_2, \text{seq}) = \operatorname*{average}_{c_1, R_1, G_1, c_2, G_2} \left[ d(\text{seq}, c_1, R_1, G_1, c_2, QP_2, G_2) \right], \qquad (24)$$

¹ https://libav.org/
² http://ffmpeg.org/~michael/msmpeg4.txt
³ http://iphome.hhi.de/suehring/tml
⁴ http://diracvideo.org/

TABLE III: a2(QP2, seq). We use bold and italics for the best and worst results for each QP2, respectively.

         Foreman  Mobile  Paris  News  Ice   Harbour
QP2 a    0.95     1.00    0.89   0.93  0.96  0.87
    b    0.92     0.97    0.88   0.86  0.96  0.87
    c    0.77     0.88    0.81   0.69  0.96  0.75
    d    0.62     0.75    0.68   0.55  0.78  0.31
    e    0.42     0.63    0.47   0.39  0.53  0.18
    f    0.27     0.54    0.32   0.27  0.31  0.04

TABLE IV: a3(c1, c2) values. Bold and italics are used for values over 0.8 and under 0.6, respectively.

                 c2 = MPEG-2  c2 = MPEG-4(a)  c2 = AVC(a)
c1 = MPEG-2      0.70         0.66            0.88
c1 = MPEG-4(a)   0.72         0.61            0.81
c1 = AVC(a)      0.56         0.51            0.66
c1 = DIRAC       0.54         0.53            0.66

whose value still ranges between 0 and 1. Table III reports the a2(QP2, seq) values obtained on the same 4320-sequence dataset of the previous experiment. These results confirm that, although the accuracy values slightly differ for each sequence, the general trend remains the same, i.e., the accuracy decreases for increasing QP2 values, as expected. It is worth noting that the sequence providing the best results is Mobile. This is due to the presence of complex textures with constant motion, which tend to emphasize the visual artifacts left by c1, thus making GOP footprints stronger and easier to detect. On the other hand, Harbour gives overall the worst results. This is due to the lack of motion of its textured blocks.

Another interesting aspect to investigate is the effect of different codecs at the first and second coding steps (i.e., c1 and c2). To this purpose we define the accuracy as

$$a_3(c_1, c_2) = \operatorname*{average}_{\text{seq}, R_1, G_1, QP_2, G_2} \left[ d(\text{seq}, c_1, R_1, G_1, c_2, QP_2, G_2) \right]. \qquad (25)$$

Table IV reports the a3(c1, c2) values computed on the same dataset of the previous experiments, highlighting the highest and lowest values using bold and italics, respectively. Notice that, on average, the highest accuracy (over 80%) is obtained for c1 ∈ {MPEG-2, MPEG-4(a)} and c2 = AVC(a), while the lowest accuracy (under 60%) is obtained for c1 ∈ {AVC(a), DIRAC} and c2 ∈ {MPEG-2, MPEG-4(a)}. This is an expected behavior, as MPEG-2 and MPEG-4 are older standards than AVC and DIRAC. Hence, MPEG-2 and MPEG-4 leave stronger coding artifacts that, on the one hand, make it easy to detect their presence and, on the other hand, increase their masking power over previous coding steps.

Finally, in order to understand whether it is possible to discriminate between correct and wrong GOP estimations, we studied the relationship between GOP estimation and the


TABLE V: Confusion matrix for c1 identification with different masking codecs c2. Bold is used to denote the elements on the diagonal, i.e., elements that should be equal to one in the best scenario. Rows: true c1; column groups: estimated codec; within each group, the three columns correspond to c2 = MPEG-2, MPEG-4, AVC.

                 | MPEG-2           | MPEG-4           | AVC              | DIRAC
c1 = MPEG-2      | 0.94 0.96 0.96   | 0.05 0.04 0.03   | 0    0    0.01   | 0.01 0    0
c1 = MPEG-4(a)   | 0    0    0.04   | 0.93 0.92 0.76   | 0.02 0.02 0.20   | 0.06 0.06 0
c1 = MPEG-4(b)   | 0.02 0.02 0.06   | 0.87 0.87 0.69   | 0.06 0.05 0.25   | 0.06 0.06 0
c1 = AVC(a)      | 0.01 0    0      | 0.14 0.24 0.06   | 0.79 0.68 0.94   | 0.06 0.08 0
c1 = AVC(b)      | 0    0.01 0      | 0.13 0.20 0.05   | 0.81 0.69 0.94   | 0.06 0.09 0.01
c1 = AVC(c)      | 0.06 0.06 0.15   | 0    0.03 0      | 0.92 0.87 0.81   | 0.02 0.05 0.04
c1 = DIRAC       | 0    0.02 0      | 0.09 0.12 0.01   | 0.13 0.12 0.21   | 0.78 0.74 0.78

Fig. 9: GOP detection ROC curves for different values of G1 (G1 = 5, 7, 9, 11, 13, 15, 17, 19, 21, 23).

peakness value associated to an estimated GOP (18). In other words, we computed the Receiver Operating Characteristic (ROC) curve by thresholding the peakness values associated to estimated GOPs. We count as True Positives (TP) the correctly estimated GOPs whose associated peakness is above the threshold, and as False Positives (FP) the wrongly estimated GOPs whose associated peakness is above the threshold. Figure 9 shows the ROC curves computed on the same dataset used for the previous experiments, analyzing separately sequences with different G1 values. This experiment also confirms that the lower the G1 value, the easier it is to correctly detect the GOP, as already shown in Table II. Moreover, it proves that we can detect, with a given probability, whether the GOP estimate is to be considered valid or not.

Codec Identification

To analyze the performance of the codec identification algorithm, we consider all the combinations of parameters reported in Table I, fixing the GOP size G1 = 14. The dataset is then composed of 2268 sequences (i.e., six sequences, seven c1, three R1, one G1, three c2, six QP2 and one G2). Since we focus on codec identification, we assume G1 to be known.

Table V shows the codec identification confusion matrix as a function of the masking codec c2. The identification method operates at sequence level by aggregating the observations extracted from all detected intra-coded frames, as explained in (21). These results are averaged across all tested sequences. It is interesting to notice that, as also highlighted in the GOP identification procedure, AVC and DIRAC are well masked by MPEG-2 and MPEG-4. Instead, when the masking (c2) and the masked (c1) codecs share the same architecture, identification accuracy increases.

TABLE VI: Codec identification accuracy for different sequences, c2 and QP2. Bold is used for values larger than 0.8.

(a) c2 = MPEG-2
           a     b     c     d     e     f
Foreman    1.00  0.95  0.90  0.90  0.81  0.67
Mobile     1.00  1.00  0.81  0.71  0.57  0.62
News       1.00  1.00  0.90  0.90  0.86  0.76
Paris      1.00  1.00  1.00  0.90  0.86  0.76
Ice        0.90  0.90  0.90  0.90  0.86  0.67
Harbour    0.90  0.86  0.90  0.86  0.81  0.67

(b) c2 = MPEG-4
           a     b     c     d     e     f
Foreman    0.95  0.90  0.90  0.90  0.81  0.52
Mobile     1.00  0.90  0.62  0.57  0.71  0.62
News       1.00  0.95  0.90  0.86  0.76  0.62
Paris      1.00  1.00  0.95  0.90  0.90  0.71
Ice        0.90  0.90  0.86  0.81  0.76  0.52
Harbour    0.90  0.81  0.90  0.86  0.71  0.52

(c) c2 = AVC
           a     b     c     d     e     f
Foreman    1.00  1.00  1.00  0.95  0.90  0.81
Mobile     1.00  1.00  0.95  0.76  0.62  0.57
News       1.00  0.95  0.90  0.90  0.81  0.86
Paris      1.00  1.00  1.00  0.90  0.90  0.81
Ice        0.90  0.86  0.76  0.67  0.62  0.67
Harbour    0.90  0.86  0.86  0.67  0.48  0.43

In order to analyze the masking effect further, Table VI shows the codec identification accuracy obtained for different c2 and QP2 values. In nearly lossless conditions (low QP2) the proposed method successfully identifies the first codec in almost all cases. Notice that the influence of lossy compression on the effectiveness of the proposed identification algorithm is content-dependent. Indeed, for Foreman, News or Paris, accuracy remains large also for high QP2 values, whereas for Harbour the method is prone to fail when QP2 is moderately increased.

A further test that we performed was to study the codec estimation accuracy for different values of R. To this purpose, Table VII shows the accuracy for different c2 and R, averaging results over the other parameters. Unlike in the GOP estimation case, codec estimation accuracy increases when the original sequence is encoded at medium quality (i.e., R = RM). This is due to the fact that, at medium quality, the traces left by c1 are stronger than for R = RH, and the sequence


TABLE VII: Codec identification accuracy for different c2 and R values. Bold is used for the best result for each c2.

              RL    RM    RH
c2 = MPEG-2   0.86  0.90  0.82
c2 = MPEG-4   0.83  0.85  0.78
c2 = AVC      0.83  0.88  0.82

Fig. 10: Detection ROC for each masking codec c2 (MPEG-2, MPEG-4, AVC), for QP2 ∈ {a, . . . , f}. Results are averaged over the other parameters.

suffers less from other noise factors that can be introduced at R = RL.

Finally, we tested the performance of the identification algorithm using a threshold, instead of comparing results between different c3. To this end, we show the ROC curves obtained at different values of QP2 for the second coding step. Let τ denote a threshold value. The proposed method labels a sequence as encoded with c1 whenever J_{c1} > τ. The TP rate is the fraction of sequences originally encoded with c1 for which J_{c1} > τ. Conversely, the FP rate is the fraction of sequences not encoded with c1 for which J_{c1} > τ. ROC curves are traced by varying the value of τ. Figure 10 shows the ROC curves for each masking codec c2, averaging results across all the other parameters. This allows us to study the impact of the masking codec in terms of identification accuracy. These results highlight that it is possible to detect the codec based on a threshold value on J_{c1}, thus enabling the algorithm to work also in an open-group scenario.

In order to study the dependency on the video content, Figure 11 shows individual ROC curves for each sequence, always averaging results over all the other parameters. These charts confirm that codec identification is content-dependent, as already observed when analyzing the results in Table VI and for the GOP.

A test in a partially uncontrolled scenario

Up to now, we have only presented results obtained on synthetic datasets. This is of paramount importance, as it is the only feasible way to study the behavior of our algorithm under many different testing conditions. However, in order to verify that the proposed method can also be applied in a less controlled scenario, we performed an additional experiment using a media sharing platform as masking codec. More specifically, this proof of concept has been carried out using 9 sequences uploaded to YouTube. The sequences were created by encoding Foreman at low, medium, and high quality, with either MPEG-2, MPEG-4, or AVC. YouTube re-encoded all the sequences when uploaded, acting as c2. We then downloaded the sequences, and tested our method on them, using as c3 either MPEG-2, MPEG-4, AVC, or DIRAC.

Fig. 11: Detection ROC for each sequence (Foreman, Mobile, Paris, News, Ice, Harbour), for QP2 ∈ {a, . . . , f}. As shown in Table VI, codec identification is sequence-dependent, although good results can be achieved for low QP values of c2, since the masking effect is less influential.

Notice that in this case we only exploited the knowledge of G2, and not the overall bitstream downloaded from YouTube. This means that we did not remove the peaks due to QP2, skipping step 2c of the algorithm, while still achieving good results. More specifically, we were always able to estimate the correct GOP for sequences at low and medium quality. Conversely, GOP traces were completely lost on sequences at high quality, thus making GOP detection impossible. Concerning codec identification, assuming G1 as known, we always achieved the correct result.

Computational Complexity

As explained in Section V, the proposed algorithm works by re-encoding each sequence under analysis |C| × |QP| times, which may be time consuming. As a matter of fact, the time spent to analyze a 10-second CIF sequence with our simple implementation on a commercially available laptop (i.e., equipped with 8 GB of RAM and a 2.2 GHz Intel Core i7 processor) is approximately 100 seconds.

Even though in many forensic applications accuracy is more valuable than time (e.g., when the number of videos to be analyzed is limited), reducing the computational complexity at the cost of only a slight decrease in accuracy may be attractive in some specific use cases. For this reason, we studied the possibility of reducing the computational complexity of our algorithm by decreasing: i) the size of the search space QP; and ii) the length of the sequence under analysis. More specifically, we tested the effect of sub-sampling the set QP to a smaller set with cardinality |QP|/ωQP. Concerning time, we selected a portion of the sequence whose length is 1/ωT of the total length. The total computational time is then decreased by a factor ωQP × ωT.
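The two reduction knobs above can be sketched as follows; the QP range, frame count, and helper name are hypothetical, chosen only to make the speedup factor ωQP × ωT concrete.

```python
def subsample_search(qp_set, omega_qp, frames, omega_t):
    """Shrink the search space: keep every omega_qp-th QP value and
    only the first 1/omega_t of the frames under analysis."""
    qp_reduced = qp_set[::omega_qp]                   # |QP| / omega_qp hypotheses
    frames_reduced = frames[:len(frames) // omega_t]  # shorter clip
    speedup = omega_qp * omega_t                      # overall time reduction
    return qp_reduced, frames_reduced, speedup

# With an H.264/AVC-style QP range (0..51), omega_qp = 3 and omega_t = 5
# on an assumed 300-frame clip:
qps, clip, s = subsample_search(list(range(52)), 3, list(range(300)), 5)
print(len(qps), len(clip), s)  # -> 18 60 15
```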

Concerning GOP estimation, we restricted the analysis to the cases in which GOP estimation is sufficiently reliable (i.e., values of QP2 and G1/G2 such that a1(QP2, G1/G2) > 0.5 in Table II). Figure 12 shows the average accuracy obtained in


[Figure: GOP estimation accuracy (approximately 0.3–0.9) as a function of ωQP, for ωT = 1, 2, 3.]

Fig. 12: Effect of reducing computational complexity on GOP estimation.

[Figure: codec estimation accuracy (approximately 0.66–0.84) as a function of ωQP, for ωT = 1, 3, 5.]

Fig. 13: Effect of reducing computational complexity on codec estimation.

estimating the GOP in this scenario for different values of ωQP and ωT. Notice that reducing the number of tested QP values does not affect accuracy. Conversely, reducing the video length greatly reduces the algorithm accuracy. This is due to the fact that the shorter the sequence, the smaller the number of GOPs it contains. Nonetheless, it is possible to reduce the computational time by a factor of 10 (i.e., ωQP = 10 and ωT = 1).
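The reason temporal cropping hurts GOP estimation can be made concrete: the periodicity analysis needs several complete GOP periods, and cropping to 1/ωT of the clip reduces them proportionally. The frame rate and GOP size below are assumptions for illustration only.

```python
def observable_gops(total_frames: int, gop_size: int, omega_t: int) -> int:
    """Complete GOP periods left after keeping 1/omega_t of the clip."""
    return (total_frames // omega_t) // gop_size

# An assumed 10 s clip at 30 fps (300 frames) with GOP size 30:
print(observable_gops(300, 30, 1))  # -> 10 periods: periodicity clearly observable
print(observable_gops(300, 30, 3))  # -> 3 periods: much weaker periodic evidence
```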

Considering codec estimation, Figure 13 shows the average accuracy on the whole dataset when changing ωQP and ωT. Notice that in this case it is still possible to obtain a reliable codec estimate even when reducing the temporal resolution (i.e., increasing ωT). As an example, it is possible to reduce the computational complexity by a factor of 15 by using ωQP = 3 and ωT = 5, decreasing the average accuracy by only 4%.

This preliminary analysis shows that it is possible to scale the computational complexity of the algorithm while still achieving a high detection accuracy. As a matter of fact, the analysis time for a sequence can be reduced by more than an order of magnitude, thus making it comparable with the sequence length (at CIF resolution).

VII. DISCUSSION

In this paper we presented an algorithm to estimate the codec and GOP size used in the first coding step applied to a double encoded sequence. The algorithm exploits the idempotency property that video codecs inherit from quantizers, and it is based on a recompress-and-observe scheme, building upon our previous work presented in [1].

A set of tests on an extended video dataset proves the validity of the proposed method, highlighting its working conditions in terms of the coding parameters used during the first and second coding steps. More specifically, we showed that it is possible to correctly identify the GOP size and codec (also considering different implementations) regardless of which masking codec is used, as long as its quality is sufficiently high to preserve the traces left by the first compression. Results using a threshold (i.e., ROC curves) also show that it is possible to discriminate, with a certain probability, which estimates should be considered valid and which ones should be discarded. An additional proof of concept of our algorithm in a real-world scenario (i.e., sequences uploaded to YouTube) highlights the possibility of using it in real applications. Moreover, a preliminary analysis also shows the possibility of reducing the algorithm's computational complexity while only slightly decreasing its accuracy. This possibility will be the topic of our future research.

REFERENCES

[1] P. Bestagini, A. Allam, S. Milani, M. Tagliasacchi, and S. Tubaro, "Video codec identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 2257–2260.

[2] P. Bestagini, S. Milani, M. Tagliasacchi, and S. Tubaro, "Video codec identification extending the idempotency property," in European Workshop on Visual Information Processing (EUVIP), June 2013, pp. 220–225.

[3] S. Milani, M. Fontani, P. Bestagini, M. Barni, A. Piva, M. Tagliasacchi, and S. Tubaro, "An overview on video forensics," APSIPA Transactions on Signal and Information Processing, vol. 1, p. e2, August 2012.

[4] S. Bian, W. Luo, and J. Huang, "Exposing fake bit-rate videos and estimating original bit-rates," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 24, no. 12, pp. 1–1, December 2014.

[5] B. Haskell, A. Puri, and A. Netravali, Digital Video: An Introduction to MPEG-2, ser. Digital Multimedia Standards Series. Springer, 1997.

[6] I. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Wiley, 2003.

[7] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 13, no. 7, pp. 560–576, July 2003.

[8] T. Borer and T. Davies, "Dirac video compression using open standards," BBC R&D White Paper, Tech. Rep., 2005.

[9] A. Piva, "An overview on image forensics," ISRN Signal Processing, vol. 2013, p. 22, November 2013.

[10] Z. Fan and R. de Queiroz, "Maximum likelihood estimation of JPEG quantization table in the identification of bitmap compression history," in IEEE International Conference on Image Processing (ICIP), September 2000, pp. 948–951.

[11] Z. Fan and R. L. de Queiroz, "Identification of bitmap compression history: JPEG detection and quantizer estimation," IEEE Transactions on Image Processing (TIP), vol. 12, no. 2, pp. 230–235, April 2003.

[12] J. Lukas and J. Fridrich, "Estimation of primary quantization matrix in double compressed JPEG images," in Digital Forensic Research Workshop (DFRWS), August 2003.

[13] T. Bianchi and A. Piva, "Detection of nonaligned double JPEG compression based on integer periodicity maps," IEEE Transactions on Information Forensics and Security (TIFS), vol. 7, no. 2, pp. 842–848, April 2012.

[14] ——, "Reverse engineering of double JPEG compression in the presence of image resizing," in IEEE International Workshop on Information Forensics and Security (WIFS), December 2012, pp. 127–132.

[15] P. Ferrara, T. Bianchi, A. D. Rosa, and A. Piva, "Reverse engineering of double compressed images in the presence of contrast enhancement," in IEEE International Workshop on Multimedia Signal Processing (MMSP), September 2013, pp. 141–146.

[16] W. Lin, S. Tjoa, H. Zhao, and K. Liu, "Digital image source coder forensics via intrinsic fingerprints," IEEE Transactions on Information Forensics and Security (TIFS), vol. 4, no. 3, pp. 460–475, August 2009.

[17] M. Tagliasacchi, M. V. Scarzanella, P. L. Dragotti, and S. Tubaro, "Transform coder identification," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013, pp. 5785–5789.

[18] ——, "Transform coder identification with double quantized data," in IEEE International Conference on Image Processing (ICIP), September 2013, pp. 1660–1664.

[19] G. J. Sullivan, J.-R. Ohm, W. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 22, no. 12, pp. 1649–1668, December 2012.

[20] H. Li and S. Forchhammer, "MPEG2 video parameter and no reference PSNR estimation," in Picture Coding Symposium (PCS), May 2009, pp. 1–4.

[21] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting double quantization," in ACM Multimedia and Security Workshop, 2009, pp. 39–48.

[22] Y. Chen, K. Challapali, and M. Balakrishnan, "Extracting coding parameters from pre-coded MPEG-2 video," in IEEE International Conference on Image Processing (ICIP), October 1998, pp. 360–364.

[23] M. Tagliasacchi and S. Tubaro, "Blind estimation of the QP parameter in H.264/AVC decoded video," in International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), April 2010, pp. 1–4.

[24] W. Luo, M. Wu, and J. Huang, "MPEG recompression detection based on block artifacts," in SPIE Conference on Security, Forensics, Steganography, and Watermarking of Multimedia Contents, February 2008, p. 12.

[25] D. Vazquez-Padin, M. Fontani, T. Bianchi, P. Comesana, A. Piva, and M. Barni, "Detection of video double encoding with GOP size estimation," in IEEE International Workshop on Information Forensics and Security (WIFS), December 2012, pp. 151–156.

[26] G. Valenzise, M. Tagliasacchi, and S. Tubaro, "Estimating QP and motion vectors in H.264/AVC video from decoded pixels," in ACM Workshop on Multimedia in Forensics, Security and Intelligence (MiFor), 2010, pp. 89–92.

[27] S. Milani, M. Tagliasacchi, and S. Tubaro, "Identification of the motion estimation strategy using eigenalgorithms," in IEEE International Conference on Image Processing (ICIP), September 2013, pp. 4477–4481.

[28] S. Milani, P. Bestagini, M. Tagliasacchi, and S. Tubaro, "Multiple compression detection for video sequences," in IEEE International Workshop on Multimedia Signal Processing (MMSP), September 2012, pp. 112–117.

[29] S. Milani, M. Tagliasacchi, and S. Tubaro, "Discriminating multiple JPEG compression using first digit features," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, pp. 2253–2256.

[30] W. Luo, Y. Wang, and J. Huang, "Security analysis on spatial ±1 steganography for JPEG decompressed images," IEEE Signal Processing Letters (SPL), vol. 18, no. 1, pp. 39–42, January 2011.

[31] H. Farid, "Exposing digital forgeries from JPEG ghosts," IEEE Transactions on Information Forensics and Security (TIFS), vol. 4, no. 1, pp. 154–160, March 2009.

[32] G. Valenzise, M. Tagliasacchi, and S. Tubaro, "Revealing the traces of JPEG compression anti-forensics," IEEE Transactions on Information Forensics and Security (TIFS), vol. 8, no. 2, pp. 335–349, February 2013.

[33] V. F. Pisarenko, "The retrieval of harmonics from a covariance function," Geophysical Journal of the Royal Astronomical Society, vol. 33, no. 3, pp. 347–366, September 1973.

[34] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation (TAP), vol. 34, no. 3, pp. 276–280, March 1986.

[35] J. Zapata and E. Gomez, "Comparative evaluation and combination of audio tempo estimation approaches," in AES Conference on Semantic Audio, July 2011.

Paolo Bestagini was born in Novara (Italy) on February 22, 1986. He received the M.Sc. in Telecommunications Engineering and the Ph.D. in Information Technology from the Politecnico di Milano in 2010 and 2014, respectively. He is currently a Post-Doc researcher at the Image and Sound Processing Group, Politecnico di Milano. His research activity is focused on multimedia forensics and acoustic signal processing for microphone arrays.


Simone Milani was born in Camposampiero, Italy, in 1978. He received the Laurea degree in telecommunication engineering and the Ph.D. degree in electronics and telecommunication engineering from the University of Padova, Padova, Italy, in 2002 and 2007, respectively. He was a Visiting Ph.D. Student at the University of California at Berkeley, Berkeley, CA, USA, in 2006. He was a Post-Doctoral Researcher with the University of Udine, Udine, Italy; the University of Padova; and the Politecnico di Milano, Milan, Italy, from 2007 to 2013. He has also been with STMicroelectronics, Agrate, Italy. He is currently an Assistant Professor with the Department of Information Engineering, University of Padova. His research interests include digital signal processing, image and video coding, 3-D video processing and compression, joint source-channel coding, robust video transmission, distributed source coding, multiple description coding, and multimedia forensics.

Marco Tagliasacchi is currently Associate Professor at the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy. He received the Laurea degree (2002, cum laude) in Computer Engineering and the Ph.D. in Electrical Engineering and Computer Science (2006), both from the Politecnico di Milano. He was a visiting academic at Imperial College London (2012) and a visiting scholar at the University of California, Berkeley (2004). His research interests include multimedia forensics, multimedia communications (visual sensor networks, coding, quality assessment) and information retrieval. Dr. Tagliasacchi coauthored more than 120 papers in international journals and conferences, including award-winning papers at MMSP 2013, MMSP 2012, ICIP 2011, MMSP 2009 and QoMex 2009. He has been actively involved in several EU-funded research projects. He co-coordinated two ICT-FP7 FET-Open projects (GreenEyes and REWIND). Dr. Tagliasacchi is an elected member of the IEEE Information Forensics and Security Technical Committee for the term 2014-2016, and served as a member of the IEEE MMSP Technical Committee for the term 2009-2012. He is currently Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology (2011 best AE award) and APSIPA Transactions on Signal and Information Processing. Dr. Tagliasacchi was General co-Chair of the IEEE Workshop on Multimedia Signal Processing (MMSP 2013, Pula, Italy) and Technical Program Coordinator of the IEEE International Conference on Multimedia & Expo (ICME 2015, Turin, Italy).

Stefano Tubaro was born in Novara in 1957. He completed his studies in Electronic Engineering at the Politecnico di Milano, Italy, in 1982. He then joined the Dipartimento di Elettronica, Informazione e Bioingegneria of the Politecnico di Milano, first as a researcher of the National Research Council, and then (in November 1991) as an Associate Professor. Since December 2004 he has been Full Professor of Telecommunication at the Politecnico di Milano. His current research interests are in advanced algorithms for video and sound processing.

Stefano Tubaro has authored over 180 scientific publications in international journals and congresses and is co-author of more than 15 patents. In the past few years he has focused his interest on the development of innovative techniques for image and video tampering detection and, in general, for the blind recovery of the "processing history" of multimedia objects. Stefano Tubaro coordinates the research activities of the Image and Sound Processing Group (ISPG) at the Dipartimento di Elettronica, Informazione e Bioingegneria of the Politecnico di Milano. He had the role of Project Coordinator of the European Project ORIGAMI (A new paradigm for high-quality mixing of real and virtual) and of the research project ICT-FET-OPEN REWIND (REVerse engineering of audio-VIsual coNtent Data). This last project was aimed at synergistically combining principles of signal processing, machine learning and information theory to answer relevant questions on the past history of such objects. Stefano Tubaro is a member of the IEEE Multimedia Signal Processing Technical Committee and of the IEEE SPS Image Video and Multidimensional Signal Technical Committee. He was on the organization committee of a number of international conferences: IEEE-MMSP-2004/2013, IEEE-ICIP-2005, IEEE-AVSS-2005/2009, IEEE-ICDSC-2009, IEEE-MMSP-2013, IEEE-ICME-2015, to mention a few. From May 2012 to April 2015 he was an Associate Editor of the IEEE Transactions on Image Processing; currently he is an Associate Editor of the IEEE Transactions on Information Forensics and Security.