
9/14/2006 Nguyen Chan Hung – Hanoi University of Technology

Công nghệ Multimedia

Khái quát Giới thiệu

Chương 1: Nền tảng k ĩ thuật nén Chương 2: Các k ĩ thuật multimedia

Jpeg

Mpeg-1/Mpeg-2 Audio&Video Mpeg-4 Mpeg-7 (Giới thiệu vắn tắt) HDTV (Giới thiệu vắn tắt) H261/H263 (Giới thiệu vắn tắt) Model-Based coding (Giới thiệu vắn tắt)

Chương 3: Mạng multimedia



Multimedia Technology

Overview Introduction

Chapter 1: Background of compression techniques

Chapter 2: Multimedia technologies
  JPEG
  MPEG-1/MPEG-2 Audio & Video
  MPEG-4
  MPEG-7 (brief introduction)
  HDTV (brief introduction)
  H.261/H.263 (brief introduction)
  Model-based coding (MBC) (brief introduction)

Chapter 3: Multimedia Network



Giới thiệu Tầm quan tr ọng của các k ĩ thuật Multimedia: -> Multimedia có ởkhắp nơi

Trong PC: Real player, Quicktime, Media Âm nhạc, hình ảnh miễn phí trên internet (mp2, mp3, mp4, asf, ra, ram, mid,

DIVX, v..v...) Hội thảo tr ực tuyến âm thanh, hình ảnh Dịch vụ quảng cáo trên web, truyền số liệu Giáo dục từ xa. Y học từ xa ........

Trong truyền hình và các thiết bị điện tử dân dụng: DVB-T/DVB-C/DVB-S (Digital Video Broadcastsing-Terrestrial/Cable/Satellite _

Truyền hình số mặt đất/cáp/vệ tinh) -> biểu diễn MPEG-2 chất lượng cao hơnhẳn truyền hình tương tự truyền thống.

Truyền hình tương tác -> Các ứng dụng internet trên truyền hình (Mail,Web, E-commerce_thương mại điện tử) -> không cần đợi PC để khởi động, tắt máy. Các đầu đọc CD/VCD/DVD/Mp3

Đồng thời xuất hiện trên các thiết bị cầm tay ( ĐTDĐ thế hệ 3G, PDAkhông dây)



Introduction

The importance of multimedia technologies: multimedia is everywhere!

On PCs:
  Real Player, QuickTime, Windows Media.
  Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
  Video/audio conferencing.
  Webcast/streaming applications.
  Distance learning (tele-education).
  Tele-medicine.
  Tele-xxx (let's imagine!)

On TVs and other home electronic devices:
  DVB-T/DVB-C/DVB-S (Digital Video Broadcasting: Terrestrial/Cable/Satellite) shows the superior quality of MPEG-2 over traditional analog TV.
  Interactive TV: Internet applications (mail, Web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
  CD/VCD/DVD/MP3 players.

Also appearing in handheld devices (3G mobile phones, wireless PDAs).




Giới thiệu (2)

Mạng Multimedia Internet được thiết kế vào những năm 60 cho các

mạng tốc độ thấp với những ứng dụng văn bản

nhàm chán. -> Độ tr ễ cao, jitter cao. -> Những ứng dụng multimedia yêu cầu c ó sự biến

đổi mạnh mẽ của cơ sở hạ tầng internet. Nhiều cơ cấu tổ chức được nghiên cứu và triển khai

để hỗ tr ợ cho thế hệ multimedia internet tiếp theo.(VD: intServ, DiffServ)

Trong tương lai, tất cả mọi tivi (và PC) sẽ kết nối

internet và bắt sóng miễn phí với hàng triệu tr ạmphát sóng trên toàn thế giới. Hiện tại, mạng multimedia chạy trên ATM (đã cổ),

IPv4, và tương lai là IPv6 -> nên sẽ bảo đảm được

chất lượng dịch vụ QoS (Quality of Service)




Introduction (2)

Multimedia networks
  The Internet was designed in the 1960s for low-speed inter-networks with boring textual applications, hence its high delay and high jitter.
  Multimedia applications require drastic modifications of the Internet infrastructure.
  Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
  In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
  At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service).




Chương 1: N ền tảng k ĩ thuật nén

Tại sao phải nén ? Trong truyền thông: Để thu hẹp dải thông trong các ứng

dụng mạng multimedia như streaming, video theo yêu cầu

VOD (video on demand), internet phone. Các vật chứa k ĩ thuật số (VCD, DVD, băng v..v..) -> giảm

kích cỡ, giảm g i á cả, tăng dung lượng và chất lượng cấtgiữ âm thanh, hình ảnh.

Hệ số nén hay tỉ lệ nén Tỉ lệ giữa dữ liệu nguồn v à dữ liệu nén (VD: 10:1)

2 loại nén: Nén không tổn hao Nén tổn hao




Chapter 1: Background of compression techniques

Why compression?
  For communication: reduce the bandwidth needed by multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet phone.
  For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.

Compression factor or compression ratio
  The ratio between the size of the source data and the size of the compressed data (e.g. 10:1).

Two types of compression:
  Lossless compression
  Lossy compression




2.1. Nội dung thông tin và dư thừa

Nội dung thông tin: Entropy là đại lượ ng đo của nội dung thông tin. Entropy

quy định giớ i hạn dướ i của tốc độbit hay dòng dữ liệu. -> Biểu diễn bở i bits/đơn vị nguồn đầu ra (như bits/pixel)

Tín hiệu càng nhiều thông tin thì entropy càng cao

Nén tổn hao thì làm giảm entropy còn nén không tổn haothì không

Dư thừa thông tin: Là sựkhác nhau giữa tốc độ thông tin và tốc độ bit

Thườ ng thườ ng tốc độ thông tin thấp hơn tốc độ bit r ất nhiều

Nén là để loại bỏ sựdư thừa




Information content and redundancy

Information rate
  Entropy is the measure of information content.
  It is expressed in bits per source output unit (such as bits/pixel).
  The more information in the signal, the higher the entropy.
  Lossy compression reduces the entropy, while lossless compression does not.

Redundancy
  The difference between the bit rate and the information rate.
  Usually the information rate is much less than the bit rate.
  Compression aims to eliminate the redundancy.




2.2. Entropy (supplement 1)

For a discrete source X with a finite alphabet of N symbols (x_0, ..., x_{N-1}) and a probability mass function p(x), the entropy of the source in bits/symbol is given by

H(X) = -\sum_{i=0}^{N-1} p(x_i) \log_2 p(x_i)

and measures the average number of bits/symbol required to describe the source.

Such a discrete source is encountered in image compression, in which the acquired digital image pixels can take on only a finite number of values, as determined by the number of bits used to represent each pixel.

It is easy to show (using the method of Lagrange multipliers) that the uniform distribution achieves maximum entropy, given by H(X) = log2 N.

A uniformly distributed source can be considered to have maximum randomness when compared with sources having other distributions. Combining this with the intuitive English text example mentioned previously, it is apparent that entropy provides a measure of the compressibility of a source. High entropy indicates more randomness; hence the source requires more bits on average to describe a symbol.




Entropy (supplement 2)

Calculating Entropy: An Example

An example illustrates the computation of entropy and the difficulty in determining the entropy of a fixed-length signal. Consider the four-point signal [3/4 1/4 0 0].

There are three distinct values (or symbols) in this signal, with probabilities 1/4, 1/4, and 1/2 for the symbols 3/4, 1/4, and 0, respectively. The entropy of the signal is then computed as

H(X) = -(1/4 \log_2 1/4 + 1/4 \log_2 1/4 + 1/2 \log_2 1/2) = 1.5 bits/symbol

This indicates that a variable-length code requires 1.5 bits/symbol on average to represent this source.

In fact, a variable-length code that achieves this entropy is [10 11 0] for the symbols [3/4 1/4 0]: its average length is 1/4·2 + 1/4·2 + 1/2·1 = 1.5 bits/symbol.
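The entropy formula can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `entropy` is chosen here for illustration. It evaluates the entropy of the four-point signal above and of a uniform source with N = 8 symbols.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits/symbol; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Symbol probabilities of the four-point signal [3/4, 1/4, 0, 0]:
# the values 3/4 and 1/4 occur once each, the value 0 occurs twice.
signal_entropy = entropy([0.25, 0.25, 0.5])

# A uniform source over N = 8 symbols reaches the maximum, log2(N).
uniform_entropy = entropy([1 / 8] * 8)

print(signal_entropy, uniform_entropy)
```

The same function applies to any histogram of pixel values once the counts are divided by the number of pixels.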




2.3. Nén không tổn hao

Dữ liệu giải mã giống hệt dữ liệu nguồn VD: Các file đầu r a của các chương trình tiện ích

như pkzip hay Gzip Hệ sốnén khoảng 2:1 – 5:1 (tùy theo độdư thừa

thông tin)

Không thể bảo đảm 1 tỉ lệ truyền cốđịnh -> vì tốcđộ dữ liệu đầu ra biến đổi -> nảy sinh các vấn đề

cho cơ cấu ghi và truyền thông.




Lossless Compression

The data from the decoder is identical to the source data.
  Example: archives produced by utilities such as pkzip or gzip.

The compression factor is roughly 2:1 to 5:1, depending on the redundancy of the data.

A fixed compression ratio cannot be guaranteed: the output data rate is variable, which causes problems for recording mechanisms and communication channels.
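As an illustration of both properties (exact reconstruction, data-dependent ratio), here is a small sketch using Python's standard `zlib` module, which implements the DEFLATE method also used by gzip. The test data and the helper name `compression_ratio` are assumptions made for this example only.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Source size divided by compressed size for one zlib pass."""
    return len(data) / len(zlib.compress(data))

# Highly redundant input: the same 8 bytes repeated 512 times.
redundant = b"ABABABAB" * 512
restored = zlib.decompress(zlib.compress(redundant))

# Pseudo-random input has almost no redundancy left to remove.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))

ratio_redundant = compression_ratio(redundant)
ratio_noisy = compression_ratio(noisy)
print(restored == redundant, ratio_redundant, ratio_noisy)
```

The decoder output is bit-identical to the source, but the ratio swings widely with the content, which is why a constant output rate cannot be promised.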




2.4. Nén tổn hao:

Dữ liệu giải nén khác dữliệu nguồn nhưng sự khácbiệt không thể phân biệt đượ c rõ ràng bằng tai

hoặc mắt thườ ng. Phù hợ p vớ i âm thanh, hình ảnh nén.

Hệ sốnén cao hơn so vớ i nén không tổn hao (lên tớ i100:1)

Dựa trên những kiến thức về sựnhận thức về thị

giác và thính giác Có thểấn định 1 hệ sốnén cố định




Lossy Compression

The data from the expander is not identical to the source data, but the difference cannot be distinguished by ear or by eye.
  Suitable for audio and video compression.

The compression factor is much higher than that of lossless compression (up to 100:1).

Based on an understanding of psychoacoustic and psychovisual perception.

Can be forced to operate at a fixed compression factor.




2.5. Quá trình nén:

Truyền thông (giảm chi phí kết nối dữ liệu) Dữ liệu -> Bộ nén (mã hoá) -> kênh truyền dẫn -> bộ

giãn (giải mã) -> dữ liệu Cơ cấu ghi (tăng thờ i gian phát lại: tỉ lệ vớ i hệ số

nén) Dữ liệu -> nén (mã hoá) -> thiết bị chứa (băng, đ ĩ a,

Ram ...) -> bộgiãn (giải mã) -> Dữ liệu




Process of Compression

Communication (reduces the cost of the data link)
  Data → Compressor (coder) → Transmission channel → Expander (decoder) → Data'

Recording (extends playing time in proportion to the compression factor)
  Data → Compressor (coder) → Storage device (tape, disk, RAM, etc.) → Expander (decoder) → Data'




2.6. Lấy mẫu và lư ợ ng tử hoá:

Tại sao lấy mẫu? Máy tính không thểxử lí tr ực tiếp tín hiệu tương tự

PCM (Pulse code modulation) - Điều xung mã: Lấy mẫu tín hiệu tương tựở tốc độkhông đổi v à sửdụng một số bit

không đổi (thườ ng là 8 hay 16) để biểu diễn các mẫu.

Tốc độbit = tốc độ lấy mẫu * số bit/mẫu

Lượ ng tửhoá: Ánh xạcác tín hiệu tương tựđã lấy mấu (có độ chính xác vô

hạn) sang các mức r ờ i r ạc (độ chính xác hữu hạn)

Biểu diễn mỗi mức r ờ i r ạc bằng 1 số.




Sampling and quantization

Why sampling?
  Computers cannot process analog signals directly.

PCM (pulse code modulation)
  Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  bit rate = sampling rate * number of bits per sample

Quantization
  Map the sampled analog signal (in general, infinite precision) to discrete levels (finite precision).
  Represent each discrete level with a number.
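A short sketch of the two ideas on this slide follows. The sampling rates, the sine test tone, and the helper names `pcm_bit_rate` and `quantize` are illustrative assumptions, not taken from the slides.

```python
import math

def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """bit rate = sampling rate * bits per sample (times the channel count)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    return min(levels - 1, int((clipped + 1.0) / 2.0 * levels))

# Eight samples of a 1 kHz sine tone sampled at 8 kHz, quantized to 8 bits.
samples = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8)]
codes = [quantize(s, 8) for s in samples]

speech_rate = pcm_bit_rate(8000, 8)               # 8 kHz, 8 bits, mono
stereo_rate = pcm_bit_rate(44100, 16, channels=2)  # 44.1 kHz, 16 bits, stereo
print(codes, speech_rate, stereo_rate)
```

Each extra bit per sample doubles the number of discrete levels and adds one sampling rate's worth of bits per second to the stream.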




2.7. Mã hoá dự đoán:

Dựđoán: Dùng các mẫu tr ướ c đó đểướ c lượ ng mẫu hiện thờ i.

Đối vớ i hầu hết tín hiệu, sự khác nhau của giá tr ị dựđoán vớ i giátr ị thực tế là nhỏ -> ta có thể dùng số bit nhỏ hơn đểmã hoá sựsai khác trong khi vẫn duy trì đượ c cùng 1 độ chính xác.

Gửi đi độ sai khác của mẫu vớ i giá tr ị dựđoán đượ c tạo r a từ các

mẫu tr ướ c. Nhiễu là hoàn toàn không thể dựđoán đượ c

Hầu hết các Codec yêu cầu dữ liệu phải đượ c xử lí tr ướ c, nếu

không Codec sẽ hoạt động kém khi có nhiễu.





Predictive Coding (supplement)

In predictive coding, the data is not coded directly; instead, the coded data consists of a difference signal formed by subtracting a prediction of the data from the data itself.

The prediction for the current sample is usually formed using past data. A predictive encoder and decoder are shown in the figure, with the difference signal given by d. If the internal loop states are initialized to the same values at the beginning of the signal, then y = x. If the predictor is ideal at removing redundancy, then the difference signal contains only the "new" information at each time instant that is unrelated to previous data. This "new" information is sometimes referred to as the innovation, and d is called the innovations process. If predictive coding is used, an appropriate predictor must be determined.




Predictive coding

Prediction
  Use previous sample(s) to estimate the current sample.
  For most signals, the difference between the predicted and actual values is small, so fewer bits are needed to code the difference while maintaining the same accuracy.

Noise is completely unpredictable
  Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
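The idea can be sketched with the simplest possible predictor, "the next sample equals the previous one". This is an assumed predictor chosen for illustration (the slides do not fix one), and the sample values are made up.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample.

    Encoder and decoder both start from the same state (0), so the
    first difference carries the first sample itself.
    """
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by accumulating the received differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples

samples = [100, 102, 105, 107, 108, 108, 107, 105]
differences = dpcm_encode(samples)
print(differences)
```

After the first value, every difference lies in a small range around zero, so it can be represented with far fewer bits than the original 8-bit samples.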





2.8. Mã hoá thống kê: Mã Huffman

Gán mã ngắn cho mẫu có xác suất xuất hiện caovà gán mã dài cho mẫu ít xuất hiện hơn

Sựgán bit dựa trên sự thống kê của dữ liệunguồn.

Thống kê dữ liệu nguồn đượ c thực hiện tr ướ c quátrình gán bit.

Còn gọi là VLC – Variable Length Coding

(Một v í dụvề Huffman code) Mã Morse..




Statistical coding: the Huffman code

Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.

Bit assignment is based on the statistics of the source data.

The statistics of the data should be known prior to the bit assignment.

Also known as variable length coding (VLC).





2.9. Như ợ c điểm của nén:

Dễ gây lỗi dữ liệu Nén loại bỏphần dư thừa tuy nhiên những phần này

lại l à yếu tốcần thiết đểngăn c h o dữ liệu không bị lỗi.

Đòi hỏi yêu cầu che giấu đối vớ icác ứng dụng thờ igian thực Cần thêm mã sửa lỗi, do đó cộng thêm phần dư thừavào dữ liệu nén.

Méo nhân tạo (Artifact): Xuất hiện khi mã hoá loại bỏ 1 phần entropy Hệ sốnén càng cao càng có nhiều méo nhân tạo.





Drawbacks of compression

Sensitive to data errors
  Compression eliminates the redundancy that is essential to making data resistant to errors.

Concealment required for real-time applications
  Error-correcting codes are required, which adds redundancy back to the compressed data.

Artifacts
  Artifacts appear when the coder eliminates part of the entropy.
  The higher the compression factor, the more artifacts.





2.10. Một ví dụ v ề mã hoá: Tập hợp các điểmmàu.

Trong 1 tấm ảnh, giá tr ị điểm ảnh được tập hợp trongvài cực đại.

Mỗi tập hợp đại diện cho 1 vùng màu của 1 đối tượng

trong ảnh (ví dụ: bầu tr ời xanh) Quá trình mã hoá:

Chia giá tr ị điểm ảnh thành 1 số lượng giới hạn củacác tập hợp

dữ liệu. (VD: tập hợp các điểm ảnh của bầu tr ời xanh hay đồngcỏ xanh) Gửi thông tin của tấm ảnh bao gồm màu chính của mỗi tập hợp

và 1 con số nhận dạng cho mỗi tập hợp.

Với mỗi điểm ảnh, truyền đi: Màu trung bình của vùng màu mà nó gần nhất Sự khác nhau của nó so với tập hợp màu trung bình ( -> có thể

được mã hoá để giảm dư thừa khi mà các sự sai khác gần nhưnhau) -> có thể dự đoán





A coding example: Clustering color pixels

In an image, pixel values are clustered around several peaks.
Each cluster represents the color range of one object in the image (e.g. blue sky).

Coding process:

1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).

2. Send the average color of each cluster and an identifying number for each cluster as side information.

3. Transmit, for each pixel:
   The number of the cluster whose average color it is closest to.
   Its difference from that average cluster color. This can be coded to reduce redundancy, since the differences are often similar; it is a form of prediction.





2.11. Mã hoá vi sai khung:

Mã hoá vi sai khung = dự đoán từ khung hìnhtr ước đó.

1 khung hình được chứa trong bộ mã hoá để sosánh với khung hiện tại -> gây ra độ tr ễ 1 khung Với ảnh t ĩ nh:

Chỉ cần gửi dữ liệu của 1 khung đầu tiên Toàn bộ sai số dự đoán sau có giá tr ị 0 Thỉnh thoảng truyền lại khung để cho phép bên nhận (nếu

mới được bật) có được điểm khởi đầu

-> FDC giảm thông tin của ảnh t ĩ nh nhưng lại đểsót lại khá nhiều dữ liệu cho ảnh động (VD: mộtchuyển động của camera)





Frame-Differential Coding

Frame-differential coding = prediction from a previous video frame.

A video frame is stored in the encoder for comparison with the present frame, which causes an encoding latency of one frame time.

For still images:
  Data needs to be sent only for the first instance of a frame.
  All subsequent prediction error values are zero.
  Retransmit the frame occasionally so that receivers that have just been turned on have a starting point.

FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
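The behaviour described above can be reproduced on a toy example. The 4 × 4 frames and the helper names below are assumptions made for illustration; real frames are of course much larger.

```python
def frame_difference(previous, current):
    """Prediction error when the previous frame is used as the prediction."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]

def reconstruct(previous, difference):
    """Decoder side: add the received error to the stored previous frame."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]

def count_nonzero(difference):
    return sum(1 for row in difference for value in row if value != 0)

frame0 = [[10, 10, 10, 10],
          [10, 50, 50, 10],
          [10, 50, 50, 10],
          [10, 10, 10, 10]]
still_frame = [row[:] for row in frame0]   # nothing changed
moved_frame = [[10, 10, 10, 10],           # bright object shifted one pixel right
               [10, 10, 50, 50],
               [10, 10, 50, 50],
               [10, 10, 10, 10]]

still_errors = count_nonzero(frame_difference(frame0, still_frame))
moved_errors = count_nonzero(frame_difference(frame0, moved_frame))
print(still_errors, moved_errors)
```

For the still picture every error is zero; a one-pixel shift already leaves non-zero errors along the edges of the moving object.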





2.12. Dự báo bù chuyển động Dữ liệu trong FDC có thể bị loại bỏ bằngcách so sánh điểm ảnh hiện tại với vị trí

của đối tượng tương ứng trong khunghình tr ước đó (-> chứ không phải vị tríkhông gian tương ứng trong khung tr ước

đó) Bộ mã hoá ước lượng sự chuyển động

trong ảnh để tìm vùng tương ứng trongkhung hình tr ước đó

Bộ mã hoá tìm phần giống của khungtr ước với khung mới sắp truyền đi.

Sau đó n ó gửi 1 Véctơ chuyển động,véctơ này sẽ cho bộ giải mã biết phầnnào của khung tr ước đó sẽ được dùngđể dự đoán khung mới.

Đồng thời n ó cũng gửi sai số dự đoánđể khôi phục khung mới .

Sơ đồ trên -> không có bù chuyển động.Sơ đồ dưới -> có bù chuyển động.





Motion Compensated Prediction

More data in frame-differential coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).

The encoder estimates the motion in the image to find the corresponding area in a previous frame.

The encoder searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.

It then sends (as side information) a motion vector telling the decoder which portion of the previous frame to use to predict the new frame.

It also sends the prediction error so that the exact new frame can be reconstructed.

Figures: top, without motion compensation; bottom, with motion compensation.





Motion compensation (supplement)

Actions:

1. Compute the motion vector.

2. Shift data from picture N using the vector to make predicted picture N+1.

3. Compare the actual picture with the predicted picture.

4. Send the vector and the prediction error.
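Step 1 can be sketched as an exhaustive block-matching search. The slides do not say which matching criterion or search strategy is used, so the sum of absolute differences (SAD), the full search, the 6 × 6 test frames and all function names below are assumptions for illustration only.

```python
def sad(previous, current, top, left, dy, dx, size):
    """Sum of absolute differences between a block of the current frame
    and the block displaced by (dy, dx) in the previous frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - previous[top + y + dy][left + x + dx])
    return total

def find_motion_vector(previous, current, top, left, size, search):
    """Try every displacement within +/- search pixels; keep the lowest SAD."""
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue
            cost = sad(previous, current, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]

def make_frame(object_left):
    """6 x 6 dark frame with a bright 2 x 2 object in rows 2-3."""
    frame = [[10] * 6 for _ in range(6)]
    for y in (2, 3):
        for x in (object_left, object_left + 1):
            frame[y][x] = 50
    return frame

previous_frame = make_frame(1)   # object at columns 1-2
current_frame = make_frame(3)    # object moved two pixels to the right

dy, dx, cost = find_motion_vector(previous_frame, current_frame,
                                  top=2, left=3, size=2, search=2)
unshifted_cost = sad(previous_frame, current_frame, 2, 3, 0, 0, 2)
print((dy, dx), cost, unshifted_cost)
```

With the vector applied, the prediction error for this block drops to zero, whereas plain frame differencing at the same spatial location leaves a large error.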





2.12.1. Thông tin không thể dự báo

Thông tin không thể dự báo từ khung tr ướcđó:

1. Sự thay đổi của phông nền (VD: phong cảnh nềnthay đổi)

2. Thông tin mới của vật thể bị che phủ mới lộ ra

do chuyển động của vật thể ngang qua nền,hoặc rìa của khung phong cảnh (VD: khuôn mặtcủa cầu thủ bị che bởi trái bóng đang bay)





Unpredictable Information

Information that cannot be predicted from the previous frame:

1. Scene change (e.g. the background landscape changes).

2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).





2.12.2. Xử lí thông tin không thể dự

báo trước (bổ sung) Phông thay đổi

ảnh mã hoá trong phải được gửi đầu tiên ->yêu cầu nhiều dữ liệu hơnảnh dự đoán (P picture)

Ảnh mã hóa trong được gửi 2 lần/s -> Thời gian và tần số gửi c ó t hể đượcđiều chỉnh để phù hợp với sự thay đổi phông.

Thông tin bị che khuất: Ảnh mã hoá dự đoán hai chiều Bi-directionally

Trong hệ thống phải có đủ chỗ chứa khung để chờ ảnh phía sau để có đượcthông tin mong muốn. Để giới hạn bộ nhớ của bộ giải mã, bộ mã hóa chứa các ảnh và gửi các ảnh

tham khảo đượcyêucầu tr ước khi gửi ảnh dự đoán hai chiều

Trong kỹ thuật nén MPEG:

Các ảnh được nén trong được gọi là ảnh loại I (I picture) Các ảnh được mã hóa chỉ sử dụng các ảnh tham chiếu ngược gọi là ảnh P

hay ảnh dự đoán (P picture) Các ảnh được mã hóa từ việc nội suy cả các ảnh tham chiếu ngược và tham

chiếu thuận gọi là ảnh B (B picture)





Dealing with unpredictable Information

Scene change
  An intra-coded picture (MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
  I pictures are sent about twice per second. Their timing and sending frequency may be adjusted to accommodate scene changes.

Uncovered information
  Handled by the bi-directionally coded picture type, or B picture.
  There must be enough frame storage in the system to wait for the later picture that has the desired information.
  To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

In MPEG:
  Pictures that are intra-coded only are termed I pictures.
  Pictures that are encoded using only backward references (earlier pictures) are termed P pictures, for "predictive".
  Pictures that are encoded by interpolation from both a backward reference and a forward reference are termed B pictures.

2.13. Mã hoá biến đổi (Transform Coding)





Biến đổi giá tr ị không gian của điểm ảnh thành cácgiá tr ị của các hệ số biến đổi trong miền tần số

Số hệ số tạo r a bằng với số điểm ảnh được biếnđổi

Chỉ một số ít hệ số chứa hầu hết nội dung (năng

lượng) của ảnh các hệ số này có thể được mãhoá tiếp bởi mã hoá entropy không tổn hao

Quá trình biến đổi tập trung năng lượng vào các hệ

số đặc biệt (chủ yếu là các hệ số có tần số thấp)





Transform Coding

Convert spatial image pixel values to transform coefficient values.

The number of coefficients produced is equal to the number of pixels transformed.

A few coefficients contain most of the energy in a picture; the coefficients may be further coded by lossless entropy coding.

The transform process concentrates the energy into particular coefficients (generally the "low-frequency" coefficients).

Mã hoá biến đổi (Transform Coding) (2)





Khái niệm về histogram..





2.13.1. Các loại mã biế n đổi ảnh:

Các loại mã hoá ảnh: Fourier r ời r ạc (DFT) Karhonen-Loeve Walsh-Hadamard Lapped orthogonal Cosine r ời r ạc (DCT) -> dùng trong MPEG 2

Wavelet -> Mới Những sự khác biệt giữa các phương pháp mã hoá

biến đổi: Khả năng tập trung năng lượng vào một số ít hệ số Vùng ảnh hưởng của mỗi hệ số trong ảnh khôi phục Sự xuất hiện và khả năng nhìn thấy các nhiễu mã hóa sinh

ra do sự lượng tử hoá các hệ số biến đôi





Types of picture transform coding

Types of picture coding:
  Discrete Fourier (DFT)
  Karhunen-Loève
  Walsh-Hadamard
  Lapped orthogonal
  Discrete Cosine (DCT): used in MPEG-2
  Wavelets: new

The differences between transform coding methods:
  The degree of concentration of energy in a few coefficients.
  The region of influence of each coefficient in the reconstructed picture.
  The appearance and visibility of coding noise due to coarse quantization of the coefficients.

2.13.2. Mã hoá DCT có tổn hao




Mã hoá không tổn hao không thể đạt đượchệ số nén cao (khoảng 4:1 hoặc í t hơn)

Mã hoá tổn hao = loại bỏ thông tin 1 cáchchọn lọc sao cho khó phân biệt giữa sảnphẩm nguồn v à sản phẩm được tái tạo bằng

thị giác và thính giác hoặc gây ra ít sự méodạng nhất. Mã hoá tổn hao có thể được thực hiện bởi:

Loại bỏ một số hệ số DCT Điều chỉnh độ thô của quá trình lượng tử hóa các

hệ số -> biện pháp tốt hơn.

DCT Lossy Coding





Lossless coding cannot achieve a high compression ratio (4:1 or less).

Lossy coding = selectively discard information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.

Lossy coding can be achieved by:
  Eliminating some DCT coefficients.
  Adjusting the quantizing coarseness of the coefficients (the better approach).

2.14. Hiện tượng mặt nạ





Hiện tượng mặt nạ làm cho một số loại nhiễu mãhóa tr ở nên không nhìn thấy hoặc không nghe thấyđược. Trong audio, 1 âm thuần nhất sẽ che dấu năng lượng ở

cả tần số cao hơn và thấp hơn (với ảnh hưởng yếu hơn)

Trong video, những lề tương phản cao che dấu nhiễu

ngẫu nhiên Nhiễu sinh ra với tốc độ bit thấp và thuộc một

trong các loại tần số, không gian, hoặc thời gian.

Ví dụ về mặt nạ âm thanh: tiếng bom nổ át tiếngchim hót..

Masking





Masking makes certain types of coding noise invisible or inaudible because of psychovisual or psychoacoustic effects.
  In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  In video, high-contrast edges mask random noise.

Noise introduced at low bit rates falls in the frequency, spatial, or temporal regions.

2.15. Lượng tử hoá biế n đổi:





Lượng tử hoá biến đổi l à k ĩ thuật chính trong mã hoá tổn hao làm giảm đáng kể tốc độ bit

Trong một biến đổi, lượng tử hoá thô các hệ số không quan

tr ọng ( ít được chú ý, có năng lượng thấp, khó nhìn thấy hoặc

nghe được)

Có thể áp dụng cho toàn bộ một tín hiệu hay cho các thành phầntần số riêng lẻ của một tín hiệu đã được mã hóa biến đổi.

Lượng tử hoá biến đổi cũng đồng thời điều khiển tốc độ

bit để: Biến một dòng bít thành một kênh tốc độ bit không đổi

Ngăn cản hiện tượng bộ đệm tràn hoặc r ỗng.

Variable quantization




Variable quantization is the main technique of lossy coding; it greatly reduces the bit rate.

It coarsely quantizes the less significant coefficients in a transform (less noticeable, low energy, less visible or audible).

It can be applied to a complete signal or to individual frequency components of a transformed signal.

Variable quantization also controls the instantaneous bit rate in order to:
  Match the average bit rate to a constant channel bit rate.
  Prevent buffer overflow or underflow.

2.16. Mã hoá Run-level




Mã hoá Run-level = mã hoá một dòng zerotheo sau bởi một giá tr ị khác zero

Thay vì gửi tất cả các giá tr ị zero 1 cách riêng biệtthì chỉ gửi chiều d à i của dòng dữ liệu.

Hữu ích cho các dữ liệu có dòng Zero dài

Các dòng này dễ mã hoá bởi mã Huffman

Ví dụ (Ví dụ 1 người chăn bò đếm bòđực và bò cái)

Run-Level coding




"Run-Level" coding = Coding a run-length of zeros followed by a nonzero level.

Instead of sending all the zero values individually, the length of the run is sent.

Useful for any data with long runs of zeros.

Run lengths are easily encoded with a Huffman code.
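A minimal sketch of run-level coding on a one-dimensional list of quantized coefficients follows. The coefficient values and function names are invented for illustration, and the level is kept as a signed value here rather than as a magnitude plus a sign bit.

```python
def run_level_encode(coefficients):
    """Turn a list into (run, level) events: run = number of zeros that
    precede the nonzero level. Returns the events and the count of
    trailing zeros that follow the last nonzero value."""
    events = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    return events, run

def run_level_decode(events, trailing_zeros):
    """Expand the events back into the original list."""
    coefficients = []
    for run, level in events:
        coefficients.extend([0] * run)
        coefficients.append(level)
    coefficients.extend([0] * trailing_zeros)
    return coefficients

coefficients = [78, -1, 4, 0, 0, 3, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0]
events, trailing_zeros = run_level_encode(coefficients)
print(events, trailing_zeros)
```

Sixteen values collapse into five events plus one trailing-zero count; in a real codec the trailing zeros are usually signalled by a single end-of-block code.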

Run-level coding (supplement)

Let an event represent the pair (run, level), where "run" represents the number of zeros and "level" represents the magnitude of the nonzero coefficient. This coding process is sometimes called "run-length coding". Then, a table is built to represent each event by a specific codeword (i.e., a sequence of bits).

Events that occur more often are represented by shorter codewords, and less frequent events are represented by longer codewords.

This entropy coding process is therefore called VLC or Huffman coding.

The table shows part of a sample VLC table. In this table, the last bit "s" of each codeword denotes the sign of the level: "0" for positive and "1" for negative.

It can be seen that more likely events (i.e., short runs and low levels) are represented with short codewords, and vice versa.

At the decoder, all the above steps are reversed one by one. All the steps can be exactly reversed except for the quantization step, which is where loss of information arises. This is known as "lossy" compression.

Sample VLC table




Relationship between the techniques covered

MPEG compression pipeline:
  Motion-compensated prediction (motion estimation)
  Transform coding (Discrete Cosine Transform, DCT)
  Variable quantization (quantization)
  Zig-zag scan
  Run-level coding (RLC)
  Statistical coding, Huffman (variable length coding, VLC)

Relationship between the compression techniques

Compression methods
Lossless compression
Lossy compression
Transform coding
VLC (Huffman)
RLC
Variable quantization
Predictive coding

2.17. Tổng kế t:




Quá trình nén Lấy mẫu v à lượng tử hoá

Mã hoá: Mã hoá tổn hao và không tổn hao

Mã hoá vi sai khung Dự báo bù chuyển động

Lượng tử hoá biến đổi

Mã hoá Run-level Hiện tượng mặt nạ

Key points:




Compression process
Sampling and quantization
Coding: lossless and lossy coding
Frame-differential coding
Motion-compensated prediction
Variable quantization
Run-level coding
Masking

Huffman coding (supplement): worked example




As a simple example of the use of Huffman codes for images, consider an image in which the pixels (or the difference values) can have one of 8 brightness values.

This would require 3 bits per pixel (2^3 = 8) for conventional representation. From a histogram of the image, the frequency of occurrence of each value can be determined; as an example it might show the following results (Table 1), in which the various brightness values have been ranked in order of frequency. Huffman coding provides a straightforward way to assign codes from this frequency table, and the code values for this example are shown.

Note that each code is unique and no sequence of codes can be mistaken for any other value, which is a characteristic of this type of coding.

Table 1. Example of Huffman codes assigned to brightness values

Brightness value   Frequency   Huffman code
4                  0.45        1
5                  0.21        01
3                  0.12        0011
6                  0.09        0010
2                  0.06        0001
7                  0.04        00001
1                  0.02        000000
0                  0.01        000001

Notice that the most commonly found pixel brightness value requires only a single bit, but some of the less common values require 5 or 6 bits, more than the three that a simple representation would need. Multiplying the frequency of occurrence of each value by the length of its code gives an overall average of

0.45·1 + 0.21·2 + 0.12·4 + 0.09·4 + 0.06·4 + 0.04·5 + 0.02·6 + 0.01·6 = 2.33 bits/pixel
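The code lengths in this example can be reproduced with the standard Huffman construction (repeatedly merge the two least frequent entries). The sketch below is illustrative only: the function name is made up, it returns code lengths rather than the bit patterns, and when two frequencies tie the individual lengths may differ from the table although the average stays the same.

```python
import heapq
from itertools import count

def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code."""
    tie_breaker = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tie_breaker), {symbol: 0})
            for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq1, _, lengths1 = heapq.heappop(heap)
        freq2, _, lengths2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        merged = {s: depth + 1 for s, depth in {**lengths1, **lengths2}.items()}
        heapq.heappush(heap, (freq1 + freq2, next(tie_breaker), merged))
    return heap[0][2]

# Brightness value -> frequency, as in the example above.
frequencies = {4: 0.45, 5: 0.21, 3: 0.12, 6: 0.09,
               2: 0.06, 7: 0.04, 1: 0.02, 0: 0.01}
lengths = huffman_code_lengths(frequencies)
average_length = sum(frequencies[s] * lengths[s] for s in frequencies)
print(lengths, round(average_length, 2))
```

The average of about 2.33 bits/pixel is below the 3 bits/pixel of the fixed-length representation, which is the saving the example is meant to show.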

Chapter 1 exercises

Exercise 1 (worked example): Given Table 1 (without the Huffman code column), what is the entropy of the image?

Exercise 2 (review; with the Huffman code table):
  How many bits are needed with ordinary fixed-length binary coding?
  How many bits are needed with Huffman coding? Comment on the efficiency of the Huffman code.
  What do you observe about the Huffman code table (codeword lengths)?

Exercise 3 (worked example and review): Given two figures showing two images, compute the number of bits needed to encode them. (TH số)

Exercise 3:

Compute the minimum number of bits needed to encode the following two 8 × 8 binary images:
  Left image: 63 zeros and one 1.
  Right image: 32 zeros and 32 ones.

[Figure: the two 8 × 8 pixel grids]

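Exercise 3 (solution)

Left image: H(X) = -63/64 log2 63/64 - 1/64 log2 1/64 = 0.116 bit/pixel, so the 64 pixels need at least about 64 × 0.116 ≈ 7.4 bits.

Right image: H(X) = -32/64 log2 32/64 - 32/64 log2 32/64 = 1 bit/pixel, so the 64 pixels need at least 64 bits.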

Chương 2: các k ĩ thuật multimedia




Nội dung JPEG

MPEG-1/MPEG-2 Video MPEG-1 Layer 3 Audio (mp3)

MPEG-4

MPEG-7 (giới thiệu) HDTV (giới thiệu)

H261/H263 (giới thiệu) Mã hoá dựa trên mô hình hóa (model base coding

- MBC) (giới thiệu)

Chapter 2: Multimedia technologies




Roadmap
  JPEG
  MPEG-1/MPEG-2 Video
  MPEG-1 Layer 3 Audio (mp3)
  MPEG-4
  MPEG-7 (brief introduction)
  HDTV (brief introduction)
  H.261/H.263 (brief introduction)
  Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group –nhóm chuyên gia nghiên cứu ảnh)




nhóm chuyên gia nghiên cứ u ảnh) Bộ mã hoá JPEG

Chia ảnh thành các khối 8*8 pixels

Tính toán biến đổi cosine r ời r ạc cho mỗi khối

Bộ lượng tử hóa làm tròn hệ số DCT dựa theo ma tr ận lượng tử tổn

hao nhưng lại c ho tỉ lệ nén lớn Tạo ra 1 chuỗi cáchệ số DCT bằng cách quét ziczac

Dùng 1 mã dài biến đổi (Variable Length Code – VLC) để mã hóa các hệsố DCT

Ghi dòng dữ liệu nén ra file ( *.jpeg hay *.jpg)

Bộ giải mã JPEG File dòng dữ liệu vào IDCT (Inverse DCT – biến đổi DCT ngược)

ảnh

JPEG (Joint Photographic Experts Group)




JPEG encoder
  Partitions the image into blocks of 8 × 8 pixels.
  Calculates the Discrete Cosine Transform (DCT) of each block.
  A quantizer rounds off the DCT coefficients according to the quantization matrix. This step is lossy but allows for large compression ratios.
  Produces a series of DCT coefficients using zig-zag scanning.
  Uses a variable length code (VLC) on these DCT coefficients.
  Writes the compressed data stream to an output file (*.jpg or *.jpeg).

JPEG decoder
  File → input data stream → variable length decoder → IDCT (inverse DCT) → image

JPEG – quét Zig-zag




JPEG – Zig-zag scanning
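The zig-zag order walks the anti-diagonals of the block, alternating direction, so that low-frequency coefficients come first and the high-frequency zeros end up in long runs. The sketch below generates that order; the function names are chosen for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    positions = [(i, j) for i in range(n) for j in range(n)]
    # Positions on one anti-diagonal share i + j. Odd diagonals are walked
    # top-right to bottom-left (row increasing), even ones the other way.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zig-zag ordered list."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Label each position with its raster index (row * 8 + column) to see the order.
index_block = [[row * 8 + col for col in range(8)] for row in range(8)]
scanned = zigzag_scan(index_block)
print(scanned[:10])
```

The scan starts at the DC coefficient in the top-left corner and finishes at the highest-frequency coefficient in the bottom-right corner.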




JPEG - DCT





DCT giống DFT -> Biến đổi tín hiệu hoặc ảnh từ miềnkhông gian sang miền tần số

DCT đòi hỏi ít phép nhân hơn DFT

Ảnh đầu vào A: Ảnh A là ma tr ận điểm ảnh có kích thước N2 (r ộng) * N1

(cao)

A(i,j) là độ chói của điểm ảnh ở hàng i cột j

Ảnh đầu r a B : B(k1,k2) là hệ số DCT ở hàng k1 và cột k2 của ma tr ận

DCT

JPEG - DCT

DCT is similar to the Discrete Fourier Transform (DFT): it transforms a signal or image from the spatial domain to the frequency domain.

DCT requires fewer multiplications than DFT.

Input image A: the input image A is N2 pixels wide by N1 pixels high; A(i,j) is the intensity of the pixel in row i and column j.

Output image B: B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
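The mapping from A(i,j) to B(k1,k2) can be sketched directly from its definition. The code below is an illustration only: it assumes the orthonormal DCT-II scaling, which may differ by a constant factor from the normalization used in the course, and it uses the slow four-loop form rather than a fast algorithm.

```python
import math

def dct2(block):
    """Direct 2-D DCT-II of an N1 x N2 block (orthonormal scaling assumed)."""
    n1, n2 = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            acc = 0.0
            for i in range(n1):
                for j in range(n2):
                    acc += (block[i][j]
                            * math.cos(math.pi * (2 * i + 1) * k1 / (2 * n1))
                            * math.cos(math.pi * (2 * j + 1) * k2 / (2 * n2)))
            out[k1][k2] = scale(k1, n1) * scale(k2, n2) * acc
    return out

# A flat 8x8 block: all the energy lands in the DC coefficient B(0,0).
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

For a flat block every coefficient except B(0,0) is zero up to rounding error, which is the energy-compaction behaviour that quantization later exploits.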

JPEG – Ma trận lượng tử hoá




Ma tr ận lượng tử hóa là ma tr ận 8*8 củacácbước lượng tử – mỗiphần tử ứng với một hệ số DCT

Thường là đối xứng Các bước lượng tử sẽ là:

Nhỏ ở phía trên bên trái (tần số thấp) Lớn ở phía dưới bên phải (tần số cao) Bước lượng tử = 1 là chính xác nhất

Bộ lượng tử chia hệ số DCT cho bước lượng tử tương ứng của nó,sau đó làm tròn tới số nguyên gần nhất Các bước lượng tử lớn sẽ làm cho các hệ số nhỏ giảm xuống bằng 0

Kết quả là: Nhiều hệ số tần số cao biến thành zero -> loại bỏ dễ dàng Các hệ số tần số thấp chỉ chịu sự điều chỉnh nhỏ.

JPEG - Quantization Matrix




The quantization matrix is the 8 by 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.

• Usually symmetric.
• Step sizes will be small in the upper left (low frequencies) and large in the lower right (high frequencies).
• A step size of 1 is the most precise.

The quantizer divides the DCT coefficient by its corresponding quantum, then rounds to the nearest integer. Large quantums drive small coefficients down to zero.

The result:
• Many high frequency coefficients become zero and are therefore easy to remove.
• The low frequency coefficients undergo only minor adjustment.
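The divide-and-round step can be tried on the DCT coefficients from the coding illustration. Two things in this sketch are assumptions, not statements from the slides: the step sizes are the example luminance table from Annex K of the JPEG standard, and exact halves are rounded away from zero. With those two choices the result happens to reproduce the quantized block shown in the illustration.

```python
# DCT coefficients of one 8x8 block, copied from the coding illustration.
DCT = [
    [1255, -15,  43,  58, -12,   1,  -4,  -6],
    [  11, -65,  80, -73, -27,  -1,  -5,   1],
    [ -49,  37, -87,   8,  12,   6,  10,   8],
    [  27, -50,  29,  13,   3,  13,  -6,   5],
    [ -16,  21, -11, -10,  10, -21,   9,  -6],
    [   3, -14,   0,  14, -14,  16,  -8,   4],
    [  -4,  -1,   8, -13,  12,  -9,   5,  -1],
    [  -4,   2,  -2,   6,  -7,   6,  -1,   3],
]

# ASSUMPTION: example luminance quantization table (JPEG standard, Annex K).
Q = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(coeff, step):
    """Divide by the step size, round to nearest (halves away from zero)."""
    magnitude = (2 * abs(coeff) + step) // (2 * step)
    return magnitude if coeff >= 0 else -magnitude

quantized = [[quantize(c, q) for c, q in zip(c_row, q_row)]
             for c_row, q_row in zip(DCT, Q)]

for row in quantized:
    print(row)
```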

Minh hoạ quá trình mã hoá JPEG




1255 -15 43 58 -12 1 -4 -6

11 -65 80 -73 -27 -1 -5 1

-49 37 -87 8 12 6 10 8

27 -50 29 13 3 13 -6 5

-16 21 -11 -10 10 -21 9 -6

3 -14 0 14 -14 16 -8 4

-4 -1 8 -13 12 -9 5 -1

-4 2 -2 6 -7 6 -1 3

78 -1 4 4 -1 0 0 0

1 -5 6 -4 -1 0 0 0

-4 3 -5 0 0 0 0 0

2 -3 1 0 0 0 0 0

-1 1 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

Q

DCT Coefficients Quantization result

K ết quả scan Zigzag : 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB

dễ dàng mã hoá bằng Run-length Huffman

JPEG Coding process illustrated




DCT coefficients:

1255 -15 43 58 -12 1 -4 -6
11 -65 80 -73 -27 -1 -5 1
-49 37 -87 8 12 6 10 8
27 -50 29 13 3 13 -6 5
-16 21 -11 -10 10 -21 9 -6
3 -14 0 14 -14 16 -8 4
-4 -1 8 -13 12 -9 5 -1
-4 2 -2 6 -7 6 -1 3

Quantization result (output of the quantizer Q):

78 -1 4 4 -1 0 0 0
1 -5 6 -4 -1 0 0 0
-4 3 -5 0 0 0 0 0
2 -3 1 0 0 0 0 0
-1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB

Easily coded by Run-length Huffman coding
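The zigzag scan result above can be reproduced with a short sketch. The helper name `zigzag_indices` is illustrative; the quantized block is the one from the illustration.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col: one anti-diagonal per step
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right, odd ones reversed.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order

QUANTIZED = [
    [78, -1,  4,  4, -1, 0, 0, 0],
    [ 1, -5,  6, -4, -1, 0, 0, 0],
    [-4,  3, -5,  0,  0, 0, 0, 0],
    [ 2, -3,  1,  0,  0, 0, 0, 0],
    [-1,  1,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
    [ 0,  0,  0,  0,  0, 0, 0, 0],
]

scan = [QUANTIZED[r][c] for r, c in zigzag_indices()]

# Everything after the last non-zero value is replaced by one EOB marker.
last = max(i for i, v in enumerate(scan) if v != 0)
print(scan[:last + 1], "EOB")
```

Because the scan visits low frequencies first, the non-zero values cluster at the front and the long tail of zeros collapses into the single EOB symbol.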

MPEG (Moving pic expert group – nhómchuyên gia nghiên cứ u ảnh động)




MPEG là trái tim của: Đầu thu TV k ĩ thuật số

Bộ giải mã HDTV Đầu đọc DVD

Hội thảo truyền hình

Internet video. v.v.. Các chuẩn MPEG:

MPEG – 1; MPEG – 2; MPEG - 4; MPEG – 7 MPEG – 3 bị bỏ qua và tr ở thành dạng mở r ộng

của MPEG2

MPEG (Moving Picture Experts Group)




MPEG is the heart of:
• Digital television set-top boxes
• HDTV decoders
• DVD players
• Video conferencing
• Internet video, etc.

MPEG standards:
• MPEG-1, MPEG-2, MPEG-4, MPEG-7
• (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

Các chuẩn MPEG:




MPEG –1 (đã lạc hậu) 1 chuẩn để lưu tr ữ và phục hồi hình ảnh âm thanh trên các vật liệu chứa

media (digital media)

Ứ ng dụng: VCD (video compact disk)

MPEG – 2 (ứng dụng r ộng rãi) 1 chuẩn cho tivi số

ứng dụng: DVD (digital versatile disk), HDTV(high definition TV), DVB(European Digital Video Broadcasting Group), v.v.

MPEG – 4 (mớ iứng dụn g – vẫn còn đang nghiên cứu) 1 chuẩn cho cácứng dụng multimedia vớ i độ nén cao

ứng dụng: Internet, TV cáp, studio ảo, v.v.

MPEG – 7 (vẫn đang nghiên cứu phát triển) Là 1 chuẩn hỗ tr ợ cho tìm kiếm thông tin (gọi là “Giao diện m ô tả nội dung

Multimedia” - MCDI)

Ứ ng dụng: Internet, Hệ thống tìm kiếm Video, thư viện số..

MPEG standards




MPEG-1 (Obsolete)
• A standard for storage and retrieval of moving pictures and audio on storage media
• Application: VCD (video compact disk)

MPEG-2 (Widely implemented)
• A standard for digital television
• Applications: DVD (digital versatile disk), HDTV (high definition TV), DVB (European Digital Video Broadcasting Group), etc.

MPEG-4 (Newly implemented, still being researched)
• A standard for multimedia applications
• Applications: Internet, cable TV, virtual studio, etc.

MPEG-7 (Future work, ongoing research)
• Content representation standard for information search ("Multimedia Content Description Interface")
• Applications: Internet, video search engine, digital library

Các chuẩn MPEG-2 chính thứ c




Chuẩn Quốc Tế ISO/IEC 13818-2 “Phươngpháp mã hóa chung của ảnh động và âm

thanh kết hợp”) ATSC (Uỷ ban các hệ thống truyền hình tiên

tiến) tài liệu A/54 “Hướng dẫn sử dụng chuẩnti vi số ATSC)

MPEG-2 formal standards




• The international standard ISO/IEC 13818-2 "Generic Coding of Moving Pictures and Associated Audio Information"
• ATSC (Advanced Television Systems Committee) document A/54 "Guide to the Use of the ATSC Digital Television Standard"

Cấ u trúc dữ liệu ảnh MPEG:




Dòng dữ liệu ảnh MPEG-2 được xây dựng theo các lớp từ thấp đếncao như sau: PIXEL là đơn vị cơ sở BLOCK là 1 mảng 8x8 pixels MACROBLOCK gồm 4 block luma và 2 block chroma (dùng cho

bù chuyển động, lượng tử hóa) SLICE gồm các macroblock với số lượng có thể thay đổi (để

khắc phục lỗi tryền dẫn) PICTURE gồm các khung (hoặc tr ường) của các slice GROUP OF PICTURE (GOP) gồm các picture với số lượng có

thể thay đổi SEQUENCE chứa các GOP với số lượng có thể thay đổi (dùng

để thiết lập các tham số Video) PACKETIZED ELEMENTARY STREAM – luông cơ sở đóng gói

(tùy chọn)

MPEG video data structure




The MPEG-2 video data stream is constructed in layers from lowest to highest as follows:
• PIXEL is the fundamental unit
• BLOCK is an 8 x 8 array of pixels
• MACROBLOCK consists of 4 luma blocks and 2 chroma blocks
• SLICE consists of a variable number of macroblocks
• PICTURE consists of a frame (or field) of slices
• GROUP of PICTURES (GOP) consists of a variable number of pictures
• SEQUENCE consists of a variable number of GOPs
• PACKETIZED ELEMENTARY STREAM (optional)

MPEG layers




Pixel và block:




Pixel = “ phần tử ảnh” Là một điểm lấy mẫu trong không gian của tấm

ảnh

1 điểm ảnh màu có thể được đặc tr ưng số hoábằng một số lượng bit biểu diễn c h o mỗi giá tr ị

của 3 màu cơ bản Block

1 block = 1 ma tr ận 8x8 pixels

1 block là đơn vị cơ sở cho mã hoá DCT

Pixel & Block




Pixel = "picture element".
• A discrete spatial point sample of an image.
• A color pixel may be represented digitally as a number of bits for each of three primary color values.

Block = 8 x 8 array of pixels.
• A block is the fundamental unit for the DCT coding (discrete cosine transform).

Macroblock




1 macroblock = ma tr ận 16x16 của các điểm ảnh chói (Y) pixels ( =4 blocks = ma tr ận 2x2 block)

Số lượng của chroma pixel (Cr, Cb) thay đổi phụ thuộc vào cấu trúcmàu (chroma pixel) cấu trúc này được biểu thị ở phần tiếp đầucủa chuỗi (sequence) (ví dụ: 4:2:0)

Macroblock là đơn vị cơ sở cho bù chuyển động và sẽ có vectơchuyển động kết hợp với n ó nếu nó được mã hóa bằng mã dự đoán

1 macroblock được phân loại: Mã hóa theo tr ường ( 1 khung quét xen kẽ gồm 2 tr ường bán ảnh)

Mã hóa khung ( phụ thuộc vào cách rút ra 4 block từ mộtmacroblock)

Macroblock




A macroblock = 16 x 16 array of luma (Y) pixels (= 4 blocks = 2 x 2 block array).

The number of chroma pixels (Cr, Cb) will vary depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0, etc).

The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.

A macroblock is classified as field coded or frame coded, depending on how the four blocks are extracted from the macroblock (an interlaced frame consists of 2 fields).

Slice





Các ảnh (picture) được chia ra nhiều slice (dải) 1 slice gồm 1 số bất kì các macroblock liên tiếp

(từ trái sang phải), nhưng thông thường là 1hàng liền nhau của các macroblock.

1 slice không mở r ộng ra quá 1 hàng.

Tiếp đầu của Slice mang thông tin địa chỉ chophép bộ giải mã huffman đồng bộ lại ở các

biên của slice

Slice




Pictures are divided into slices.

A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.

A slice does not extend beyond one row.

The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

Picture

1 ảnh nguồn là 1 ma tr ận chữ nhật liền kề của các pixel




1 ảnh có thể là 1 khung video hoàn chỉnh (“frame picture”) hoặc1 tr ường quét xen kẽ từ 1 ảnh quét xen kẽ (“field picture”)

1 field pic không có 1 dòng tr ống nào giữa các dòng 1 ảnh (còn gọi là đơn vị truy nhập video) bắt đầu với một mã

khởi đầu v à một tiếp đầu. Tiếp đầu gồm: LoạI ảnh (I, P, B) Thông tin tham chiếu thời gian Khoảng tìm kiếm vectơ chuyển động Dữ liệu tuỳ chọn người sử dụng

1 frame picture gồm:

1 khung của nguồn quét liên tục (progressive) hay 2 bán ảnh quét xen kẽ của 1 ảnh nguồn quét xen kẽ

Picture





A source picture is a contiguous rectangular array of pixels.

A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").

A field picture does not have any blank lines between its active lines of pixels.

A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
• picture type (I, B, P)
• temporal reference information
• motion vector search range
• optional user data

A frame picture consists of:
• a frame of a progressive source, or
• a frame (2 spatially interlaced fields) of an interlaced source

I, P, B Pictures

Ảnh mã hoá được chia làm 3 loạI: I, P, B





I picture = Intra coded Pictures (ảnh mã hóa trong) Tất cả các macroblock đều dùng mã hoá không có dự đoán

Ảnh I cần cho phép phía thu có “điểm bắt đầu” cho dự đoán sau khi thay đổikênh và cho phép khôi phục lại sau các lỗi.

P picture = Predicted Pictures ( ảnh dự đoán) Các macroblock có thể được mã hoá với dự đoán tr ước từ các ảnh tham

khảo I và P tr ước đó hoặc các macroblock có thể được mã hoá trong

B picture = Bi-directionally predicted pictures (ảnh dự đoán 2chiều) Các macroblock có thể được mã hoá bằng dự báo tr ước từ các ảnh tham

khảo I và P tr ước đó

Các macroblock có thể được mã hoá bằng dự báo sau từ các ảnh tham khảo

I và P tiếp theo Các macroblock có thể được mã hoá bằng dự đoán nội suy từ các ảnh tham

khảo I và P ở cả quá khứ và tương lai.

Các macroblock có thể được mã hoá trong (ko có dự đoán)

I, P, B Pictures





Encoded pictures are classified into 3 types: I, P, and B.

I Pictures = Intra Coded Pictures
• All macroblocks are coded without prediction.
• Needed to allow the receiver to have a "starting point" for prediction after a channel change and to recover from errors.

P Pictures = Predicted Pictures
• Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.

B Pictures = Bi-directionally predicted pictures
• Macroblocks may be coded with forward prediction from previous I or P references.
• Macroblocks may be coded with backward prediction from the next I or P reference.
• Macroblocks may be coded with interpolated prediction from past and future I or P references.
• Macroblocks may be intra coded (no prediction).

Nhóm ảnh (GOP) Lớp GOP là tuỳ chọn trong MPEG2 GOP bắt đầu với m ã k hởi đầu và header





Header mang:

Thông tin về thời gian mã hóa Thông tin về soạn thảo Video (editing) Dữ liệu tuỳ chọn của người sử dụng

Ảnh mã hoá đầu tiên trong Gop luôn là ảnh I Chiều dàI điển hình là 15 pic với cấu trúc như sau (minh họa ở dưới)

I B B P B B P B B P B B P B B cung cấp ảnh I với tần số đầy đủ để cho phép bộ giải mãgiải mã 1 cách chính xác

I B B P PB B B B P B

Time

Forward motion compensation

Bidirectional motion compensation

Group of pictures (GOP)

The group of pictures layer is optional in MPEG-2.

A GOP begins with a start code and a header. The header carries:
• time code information
• editing information
• optional user data

The first encoded picture in a GOP is always an I picture.

Typical length is 15 pictures with the following structure (in display order):

I B B P B B P B B P B B P B B

This provides an I picture with sufficient frequency to allow a decoder to decode correctly.

[Figure: GOP along the time axis, with forward motion compensation and bidirectional motion compensation marked]
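The typical 15-picture structure can be generated from two numbers. The parameter names below are a common convention rather than something defined in these slides: `n` is the GOP length and `m` is the spacing between anchor (I or P) pictures.

```python
def gop_pattern(n=15, m=3):
    """Picture types of one GOP in display order (n = length, m = anchor spacing)."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]

print(" ".join(gop_pattern()))
```

With n = 15 and m = 3 this gives one I picture, four P pictures and ten B pictures.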

Sequence (chuỗi):

1 sequence bắt đầu với một mã khởi đầu duy nhất dài 32 bit, theo sau là 1 header

Header mang các thông tin:

Kích thước ảnh Tỉ số diện mạo (Aspect ratio) Tốc độ khung và tốc độ bit Các ma tr ận lượng tử hoá tuỳ chọn Kích thướcyê u cầu của bộ đệm giải mã Cấu trúc màu (chroma pixel) Dứ liệu tuỳ chọn người sử dụng

Thông tin chuỗi cần cho việc thay đổi kênh Độ dài chuỗi phụ thuộc vào giá tr ị tr ễ đổi kênh chấp

nhận được

Sequence





A sequence begins with a unique 32 bit start code followed by a header. The header carries:
• picture size
• aspect ratio
• frame rate and bit rate
• optional quantizer matrices
• required decoder buffer size
• chroma pixel structure
• optional user data

The sequence information is needed for channel changing.

The sequence length depends on the acceptable channel change delay.

Packetized Elementary Stream (PES)





Đầu r a của bộ mã hóa MPEG Audio hoặc Video được gọi l à l uồng cơ sở (ES) đó l à một tín hiệu gần thời gian thực và không có giới hạn.

Để cho thuận tiện, nó được cắt thành các khối dữ liệu có kích thước thích hợp

gọi là Packetized Elementary Stream (PES).

Các khối dữ liệu n à y cầncó t iếp đầu mang thông tin và đánh dấu vị trí bắt đầu của

các khối v à p hải có nhãn thời gian bới vì quá trình đống gói làm sai lệch tr ục thời gian.

Video Elementary Stream - video ES (luồng video cơ sở), gồm tất cả dữ liệu

Video cho 1 chuỗi, bao gồm tiếp đầu của chuỗi và các thành phần phụ của 1chuỗi

1 ES chỉ mang 1 loại dữ liệu (hình ảnh hoặc âm thanh) từ một bộ mã hoá hình

ảnh hoặc âm thanh

Các gói PES có độ dài biến đổi, khác với các gói vận chuyển có chiềudà i cố

định, và có thể dài hơn nhiều so với các gói vận chuyển

Packetized Elementary Stream (PES)





The output of a single MPEG audio or video coder is called an Elementary Stream.

An Elementary Stream is an endless near real-time signal.

For convenience, it can be broken into convenient-sized data blocks in a Packetized Elementary Stream (PES).

These data blocks need header information to identify the start of the packets and must include time stamps because the packetizing process disrupts the time axis.

A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.

An ES carries only one type of data (video or audio) from a single video or audio encoder.

PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.

MPEG Packetized Elementary Stream (PES) (BS)

The figure shows that one video PES and a number of audio PES can be combined to form a Program Stream, provided that all of the coders are locked to a common clock.

Time stamps in each PES ensure lip-sync between the video and audio.

Intra Frame Coding - Mã hoá trong ảnh

Mã hóa trong ảnh chỉ liên quan với thông tin trong khung hiện tại (ko

liên quan tới khung nào khác trong chuỗi video)




Sơ đồ khối mã hoá trong khung MPEG (hình dưới) -> giống JPEG

( xem lại cơ cấu mã hóa JPEG)

Các khối cơ bản của mã hoá trong ảnh: Bộ lọc video (tùy chọn)

Bộ biến đổi DCT

Bộ lượng tử hoá các hệ số DCT

Bộ mã hóa chiều dài biến đổi (VLC-variable length coder)

Intra Frame Coding

Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).

MPEG intra-frame coding block diagram (see bottom figure): similar to JPEG (let's review the JPEG coding mechanism!).

Basic blocks of the intra frame coder:
• Video filter
• Discrete cosine transform (DCT)
• DCT coefficient quantizer
• Run-length amplitude/variable length coder (VLC)

Bộ lọc video:





Hệ thống thị giác của con người: Nhạy cảm nhất với các thay đổi của độ chói

ít nhạy cảm nhất với sự thay đổi màu

MPEG sử dụng không gian màu YCbCr để đặc tr ưng cho giá tr ịdữ liệu thay cho RGB:

Y là tín hiệu chói

Cb là tín hiệu sai phân màu xanh

Cr là tín hiệu sai phân màu đỏ Thế nào là “4:4:4”, “4:2:0”, v.v, dạng video ?

4:4:4 là tín hiệu YCbCr video đầy đủ mỗi macroblock gồm 4

Y block, 4 Cb block, 4 Cr block lãng phí dải thông. 4:2:0 được sử dụng nhiều nhất trong MPEG2

Video Filter

The Human Visual System (HVS) is:
• Most sensitive to changes in luminance,
• Less sensitive to variations in chrominance.

MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
• Y is the luminance signal,
• Cb is the blue color difference signal,
• Cr is the red color difference signal.

What is "4:4:4", "4:2:0", etc, video format?
• 4:4:4 is full bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks -> waste of bandwidth!
• 4:2:0 is most commonly used in MPEG-2.

Color Subsampling formats (BS)

[Figure: positions of the Y, Cr and Cb samples in the 4:4:4, 4:2:2, 4:1:1 and 4:2:0 formats]

For the PAL system (720 * 576 lines, 8 bits per sample, 25 frames per second):

4:4:4 Format:
• Bit rate = (720 + 720 + 720) * 576 * 8 * 25 = 249 Mbps

4:2:2 Format:
• Bit rate = (720 + 360 + 360) * 576 * 8 * 25 = 166 Mbps

4:2:0 Format:
• Bit rate = (720 + 360) * 576 * 8 * 25 = 124.4 Mbps

4:1:1 Format:
• Bit rate = (720 + 180 + 180) * 576 * 8 * 25 = 124.4 Mbps
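The bit rates above follow from one formula, sketched here. The tuples hold the average number of Y, Cb and Cr samples per line; for 4:2:0 the 360 chroma samples written on the slide are split as 180 + 180, since each chroma component has 360 samples on only every other line. The function name is illustrative.

```python
def bit_rate_mbps(y, cb, cr, lines=576, bits=8, fps=25):
    """Uncompressed bit rate in Mbps for the given average samples per line."""
    return (y + cb + cr) * lines * bits * fps / 1e6

# Average Y, Cb, Cr samples per line for a 720-pixel-wide PAL picture.
FORMATS = {
    "4:4:4": (720, 720, 720),
    "4:2:2": (720, 360, 360),
    "4:2:0": (720, 180, 180),
    "4:1:1": (720, 180, 180),
}

for name, samples in FORMATS.items():
    print(name, round(bit_rate_mbps(*samples), 1), "Mbps")
```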

Ứ ng dụng của các dạng màu:

Định dạngmàu

Thứ tự thời giantrong macroblock Ứng dụng





4:2:0

(6 block)

YYYYCbCr TV và các thiết bị giải

trí dân dụng

4:2:2

(8 block)

YYYYCbCrCbCr • Thiết bị studio

• Thiết bị soạn thảoVideo chuyên nghiệp

4:4:4(12 block)

YYYYCbCrCbCrCbCrCbCr

Đồ họa máy tính

Applications of chroma formats

chroma_format | Multiplex order (time) within macroblock | Application
4:2:0 (6 blocks) | YYYYCbCr | Mainstream television, consumer entertainment
4:2:2 (8 blocks) | YYYYCbCrCbCr | Studio production environments, professional editing equipment
4:4:4 (12 blocks) | YYYYCbCrCbCrCbCrCbCr | Computer graphics

MPEG profiles và các mứ c:

MPEG2 được chia làm vài profile Các đặc đIểm của profile chính:





Định dạng mầu 4:2:0 Ảnh I, P, B

Không có khả năng thay đổi tỉ lệ Main profile được chia nhỏ thành các mức:

MP@ML (Main profile main level): Được thiết kế với chuẩn CCIR601 cho video số quét xen kẽ 720x576 (PAL) hay 720x483 (NTSC) 30 Hz quét liên tục, 60 Hz quét xen kẽ. Tốc độ bit cao nhất 15Mbit/s

MP@HL (Main profile high level): Giới hạn trên: 1152x1920, 60 Hz quét liên tục 80 Mbits/s

MPEG Profiles & levels

MPEG-2 is classified into several profiles.

Main profile features:
• 4:2:0 chroma sampling format
• I, P, and B pictures
• Non-scalable

Main Profile is subdivided into levels.

MP@ML (Main Profile Main Level):
• Designed with the CCIR601 standard for interlaced digital video.
• 720 x 576 (PAL) or 720 x 483 (NTSC)
• 30 Hz progressive, 60 Hz interlaced
• Maximum bit rate is 15 Mbits/s

MP@HL (Main Profile High Level):
• Upper bounds: 1920 x 1152, 60 Hz progressive
• 80 Mbits/s

Mã hoá/giải mã MPEG:




MPEG encoder/decoder




Dự đoán: Dự đoán sau được thực hiện bằng

cách lưu các ảnh cho đến khi ảnhtham khảo mong muốn sẵn sàng




tham khảo mong muốn sẵn sàng,tr ước khi mã hoá các khung đangđược chứa.

Bộ mã hoá sẽ quyết định để dùng 1trong 3 cách: Dự đoán tr ước từ các ảnh tr ước đó

Dự đoán sau từ các ảnh phía sau

Hay dự đoán nội suyMục đích giảm thiểu sai số dự đoán

Bộ mã hoá phải truyền các ảnh theo 1tr ật tự khác với ảnh nguồn để cho bộ

giải mã có các ảnh tham khảo tr ướckhi giải mã ảnh dự đoán.

Bộ giải mã phải lưu tr ữ 2 khung

Prediction

Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.

The encoder can decide to use:
• Forward prediction from a previous picture,
• Backward prediction from a following picture,
• or Interpolated prediction
to minimize prediction error.

The encoder must transmit pictures in an order different from that of the source pictures so that the decoder has the anchor pictures before decoding predicted pictures. (See next slide)

The decoder must have two frames stored.

Quá trình sắp xế p lại ảnh I P B

Các ảnh được mã hoá và giải mã theo các thứ tự khác với thứ tựhiển thị





Do quá trình dự đoán 2 chiều của ảnh B

Ví dụ chúng ta có 1 GOP dài 12 ảnh Thứ tự nguồn và thứ tự đầuvàobộ mã hoá:

1 2 3 4 5 6 7 8 9 10 11 12 13

I B B P B B P B B P B B I Thứ tự mã hoá và thứ tự trong dòng bit mã hoá:

1 4 2 3 7 5 6 10 8 9 13 11 12

I P B B P B B P B B I B B Thứ tự đầu r a bộ giải mã và thứ tự hiển thị (giống đầu vào)

I P B Picture Reordering

Pictures are coded and decoded in a different order than they are displayed, due to bidirectional prediction for B pictures.

For example, we have a 12 picture long GOP:

Source order and encoder input order:
I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

Encoding order and order in the coded bitstream:
I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)

Decoder output order and display order (same as input):
I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
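The reordering rule behind this example is short: each B picture is held back until the anchor (I or P) picture that follows it in display order has been sent. A minimal sketch, with illustrative function names and pictures written as strings such as "B2":

```python
def coding_order(display):
    """Reorder pictures so every B picture follows both of its anchors."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # wait for the next anchor picture
        else:
            out.append(pic)         # send the I or P picture first...
            out.extend(pending_b)   # ...then the B pictures that precede it
            pending_b = []
    out.extend(pending_b)
    return out

def display_order(coded):
    """Decoder side: restore display order from the picture numbers."""
    return sorted(coded, key=lambda pic: int(pic[1:]))

source = ["I1", "B2", "B3", "P4", "B5", "B6", "P7",
          "B8", "B9", "P10", "B11", "B12", "I13"]
coded = coding_order(source)
print(" ".join(coded))
```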

Công thứ c DCT và IDCT

DCT: Eq1 -> dạng thường





Eq2 -> dạng ma tr ận

IDCT: Eq3 -> dạng thường Eq4 -> dạng ma tr ận

Trong đó: F(u,v) = ma tr ận DCT 2 chiều

N*N u,v,x,y = 0,1,2…N-1

x,y là các tọa độ không gian u,v là tọa độ tần số trong miền

biến đổi C(u) * C(v) = 1/√2 với u,v =0 C(u) * C(v) = 1 trong các

tr ường hợp khác

DCT and IDCT formulas

DCT:
• Eq 1: normal form
• Eq 2: matrix form

IDCT:
• Eq 3: normal form
• Eq 4: matrix form

Where:
• F(u,v) = two-dimensional NxN DCT.
• u,v,x,y = 0,1,2,...N-1
• x,y are spatial coordinates in the sample domain.
• u,v are frequency coordinates in the transform domain.
• C(u), C(v) = 1/sqrt(2) for u, v = 0.
• C(u), C(v) = 1 otherwise.
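The four equations themselves did not survive in this text. For orientation, the commonly used textbook form that matches the symbols defined above is given below; it is an assumption that the slides use exactly this normalization. The symbols f(x,y) for the pixel block and A for the transform matrix are introduced here and do not appear in the original.

```latex
% Eq 1 (DCT, normal form) and Eq 3 (IDCT, normal form)
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Eq 2 (DCT, matrix form) and Eq 4 (IDCT, matrix form)
F = A\, f\, A^{T}, \qquad f = A^{T} F\, A, \qquad
A_{u,x} = \sqrt{\frac{2}{N}}\; C(u)\, \cos\frac{(2x+1)u\pi}{2N}
```

With this scaling A is an orthogonal matrix, which is why the inverse transform only needs the transpose.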

DCT vs DFT:

Khái niệm DCT giống DFT ngoại trừ: DCT tập trung năng lượng vào các hệ số tần số thấp tốt hơn DFT.

DCT là thuần thực, DFT là phức (biên độ, pha) DCT hoạt động trên 1 block của các điểm ảnh tạo ra

các hệ số giống với các hệ số miền tần số được tạo rabởi DFT DCT N điểm có độ phân giải tần số giống như DFT 2N điểm N tần số của DFT 2N điểm tương ứng với N điểm ở nửa trên

của vòng đơn vị trong tần số phức

Với đầu vào lặp theo chu kỳ, biên độ của hệ số DFTkhông đổi (pha của đầu vào ko ảnh hưởng). Với DCTthì ko phải như vậy

DCT versus DFT

The DCT is conceptually similar to the DFT, except:
• DCT concentrates energy into lower order coefficients better than DFT.
• DCT is purely real, the DFT is complex (magnitude and phase).
• A DCT operation on a block of pixels produces coefficients that are similar to the frequency domain coefficients produced by a DFT operation.
  An N-point DCT has the same frequency resolution as a 2N-point DFT.
  The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
• Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (phase of the input does not matter). This is not true for the DCT.

The weighting process (BS)




Ma trận lượng tử hoá:

Chú ý giá tr ị cáchệ số DCT là: Nhỏ ở trên bên trái




Nhỏ ở trên bên trái(tần số thấp)

Lớn ở góc dưới bênphải (tần số cao)

xem lại JPEG Tại sao?

HVS ít nhạy cảm vớicác lỗi ở tần số caohơn các tần số thấp

Tần số càng cao

càng nên được lượngtử hoá thô hơn

Quantization matrix

Note that the quantization matrix values are:
• Small in the upper left (low frequencies),
• Large in the lower right (high frequencies).

Recall the JPEG mechanism!

Why?
• The HVS is less sensitive to errors in high frequency coefficients than it is for lower frequencies.
• Higher frequencies should therefore be more coarsely quantized.

Kế t quả ma trận DCT (ví dụ)

Sau khi lượng tửhoá phù hợp, kết




hoá phù hợp, kếtquả là 1 ma tr ận

có nhiều giá tr ị 0

Result DCT matrix (example)

After adaptive quantization, the result is a matrix containing many zeros.

Quét MPEG:

Trái -> quét ziczac (như JPEG) Phải -> quét thay phiên xen kẽ -> tốt hơn cho khung quét

xen kẽ





MPEG scanning

Left: zigzag scanning (like JPEG).

Right: alternate scanning, which is better for interlaced frames.




Huffman/Run-level coding:

Mã Huffman kết hợp với mã hóa Run-level và thuậtquét ziczac được ứng dụng cho các hệ số DCTử




lượng tử hoá

Run-level = một dãy các số 0 tiếp theo các mứckhác 0

Mã Huffman cũng được á p dụng cho nhiều loại

thông tin phụ khác nhau Mã Huffman là một mã entropy, nó tạo ra được một

cách tối ưu độ dài từ mã trung bình ngắn nhất c h o 1

nguồn tin. Độ dài từ mã trung bình này >= entropy của nguồn

Huffman/Run-Level Coding

Huffman coding in combination with Run-Level coding and zig-zag scanning is applied to quantized DCT coefficients.

"Run-Level" = a run-length of zeros followed by a non-zero level.

Huffman coding is also applied to various types of side information.

A Huffman code is an entropy code which optimally achieves the shortest possible average code word length for a source.

This average code word length is >= the entropy of the source.

Minh hoạ mã Huffman/run-level

Sử dụng ma tr ận đầu ra

DCT ở slide tr ước, sau khiđược quét ziczac -> đầu ra


Zero

Run-Length Amplitude

MPEG

Code Value

N/A 8 (DC Value) 110 1000




sẽ là 1 chuỗi số:4,4,2,2,2,1,1,1,1,0 (12 số0),1,0 (41 số 0)

Các giá tại này được tratrong bảng các mã có

chiều dài biến đổi Các giá tr ị xuất hiện nhiều

nhất được gán các mãngắn

Các giá tr ị xuất hiện ít nhấtđược gán các mã dài

0 4 0000 1100

0 4 0000 11000 2 0100 0

0 2 0100 0

0 2 0100 0

0 1 110

0 1 110

0 1 110

0 1 110

12 1 0010 0010 0

EOB EOB 10

Huffman/Run-Level coding illustrated

Using the DCT output matrix in the previous slide, after being zigzag scanned the output will be a sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros), following the DC value 8.

These values are looked up in a fixed table of variable length codes:
• The most probable occurrence is given a relatively short code.
• The least probable occurrence is given a relatively long code.

Zero run-length | Amplitude | MPEG code value
N/A | 8 (DC value) | 110 1000
0 | 4 | 0000 1100
0 | 4 | 0000 1100
0 | 2 | 0100 0
0 | 2 | 0100 0
0 | 2 | 0100 0
0 | 1 | 110
0 | 1 | 110
0 | 1 | 110
0 | 1 | 110
12 | 1 | 0010 0010 0
EOB | EOB | 10
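The table can be turned into a small run-level coder to count the bits. This is only a sketch: the code dictionaries contain just the entries that appear in the table (with the spaces removed), not the full MPEG variable length code tables, and the function name is illustrative.

```python
# Code words copied from the table; only the entries this block needs.
DC_CODE = {8: "1101000"}
AC_CODE = {
    (0, 4): "00001100",
    (0, 2): "01000",
    (0, 1): "110",
    (12, 1): "001000100",
}
EOB = "10"

def run_level_pairs(ac):
    """Turn AC coefficients into (zero run-length, level) pairs."""
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros produce no pair; EOB stands in for them

# Zigzag-scanned block: DC value first, then the 63 AC coefficients.
scan = [8, 4, 4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41

pairs = run_level_pairs(scan[1:])
bits = DC_CODE[scan[0]] + "".join(AC_CODE[p] for p in pairs) + EOB
print(len(bits), "bits, ratio", round(8 * 8 * 8 / len(bits), 1))
```

The run of 12 zeros is absorbed into the single (12, 1) pair, and the final 41 zeros cost only the EOB code.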

Minh hoạ mã huffman/run-level (2)

12 số 0 đầu được mã hoá hiệu quả chỉ bằng 9bits 41 số 0 sau bị loại bỏ thay bởi 2 bit chỉ thị End Of




41 số 0 sau bị loại bỏ, thay bởi 2 bit chỉ thị End Of

Block (EOB) Các hệ số DCT lượng tử hoá lúc này được thể hiện

bởi 1 chuỗi 61 bit nhị phân (xem bảng)

Chú y r ằng block nguyên bản 8x8 với 8 bit/ pixel đòi

hỏi 512 bit cho hiển thị đầy đủ bộ mã hóa Huffman

đã đạt tốc độ nén xấp xỉ 8,4:1

Huffman/Run-Level coding illustrated (2)

The first run of 12 zeroes has been efficiently coded by only 9 bits.

The last run of 41 zeroes has been entirely eliminated, represented only with a 2-bit End Of Block (EOB) indicator.

The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).

Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approx. 8.4:1.

Quá trình truy ền dữ liệu MPEG: MPEG đóng gói toàn bộ dữ liệu vào các gói có kích thước cố định là 188 byte để

truyền

Dữ liệu âm thanh, hình ảnh được đặt vào trong các gói PES tr ước khi được cắt rathành các gói vận chuyển có độ dài cố định





1 gói PES có thể dài hơn nhiều so với 1 gói vận chuyển do đó cần phân đoạn: Header PES được đặt ngay tiếp theo header gói vận chuyển

Các phần liên tiếp nhau của gói PES sau đó được đặt vào phần tải tr ọng của gói vậnchuyển

Không gian còn lại trong tải tr ọng của gói vận chuyển sẽ được thêm vào các byte chèn0xFF

Mỗi g ó i vận chuyển bắt đầu với 1 byte đồng bộ giá tr ị 0x47 Trong hệ thống truyền dẫn ATSC mặt đất DTV VSB của Mỹ, byte đồng bộ không được xử

lí, nhưng được thay thế bằng một biểu tượng đồng bộ đặc biệt khác phù hợp cho truyềndẫn RF

Header gói vận chuyển chứa 1 PID 13 bit (ID của gói), PID này dùng để xác định 1 luồng

cơ sở âm thanh, hình ảnh hay các phần tử chương trình khác PID 0x0000 được dành riêng cho gói vận chuyển mang bảng liên kết chương trình PAT

PAT tr ỏ tới bảng ánh xạ chương trình PMT bảng này lại tr ỏ tới các phần tử riêng biệtcủa một chương trình

MPEG Data Transport

MPEG packages all data into fixed-size 188-byte packets for transport.

Video or audio payload data is placed in PES packets before it is broken up into fixed length transport packet payloads.

A PES packet may be much longer than a transport packet, so segmentation is required:
• The PES header is placed immediately following a transport header.
• Successive portions of the PES packet are then placed in the payloads of transport packets.
• Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).

Each transport packet starts with a sync byte = 0x47.
• In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.

The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
• PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
• The PAT points to a Program Map Table (PMT), which points to particular elements of a program.
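Reading the sync byte and the 13-bit PID from a transport packet can be sketched as follows. The bit positions follow the MPEG-2 Systems header layout (ISO/IEC 13818-1); the fields other than the sync byte and the PID are not described on the slide and are included only for context. The function name is illustrative.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of one 188-byte transport packet."""
    if len(packet) != 188:
        raise ValueError("a transport packet is exactly 188 bytes")
    if packet[0] != 0x47:
        raise ValueError("missing sync byte 0x47")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        # PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# A hand-made packet on PID 0x0000 (the PAT), payload filled with 0xFF.
pat_packet = bytes([0x47, 0x40, 0x00, 0x10]) + b"\xff" * 184
header = parse_ts_header(pat_packet)
print(header["pid"] == 0x0000)
```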

PAT & PMT (BS)




MPEG – Program Stream (PS) (BS)

Program Streams have variable length packets with headers.

They are used in data transfers to and from optical and hard disks, which are error free and in which files of arbitrary sizes are expected.

VCD/DVD uses Program Streams.

MPEG Transport Stream (vs. Program stream) (BS)

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream.

A Transport Stream differs from a Program Stream in that:
• PES packets are further subdivided into short fixed-size packets.
• Multiple programs encoded with different clocks can be carried. How? The Transport Stream has a program clock reference (PCR) mechanism which allows transmission of multiple clocks. One of these clocks is selected and regenerated at the decoder.

A Single Program Transport Stream (SPTS) is also possible and this may be found between a coder and a multiplexer.

MPEG transport packet:

Adaptation field: 8 bits specify the length of the adaptation field.

The first group of flags consists of eight 1-bit flags: discontinuity indicator, random access indicator, elementary stream priority indicator, PCR flag, OPCR flag, splicing point flag, transport private data flag, and adaptation field extension flag.

PCR_flag
OPCR_flag
splicing_point_flag
transport_private_data_flag
adaptation_field_extension_flag

The optional fields are present if indicated by one of the preceding flags.

The remainder of the adaptation field is filled with stuffing bytes (0xFF).

MPEG Transport packet




Adaptation field: 8 bits specify the length of the adaptation field.

The first group of flags consists of eight 1-bit flags:

discontinuity_indicator
random_access_indicator
elementary_stream_priority_indicator
PCR_flag
OPCR_flag
splicing_point_flag
transport_private_data_flag
adaptation_field_extension_flag

The optional fields are present if indicated by one of the preceding flags.

The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
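The flag layout above can be sketched in Python as follows. This is an illustrative reading of the length byte and the eight 1-bit flags only; `parse_adaptation_field` is a hypothetical helper name, the flags are assumed to be packed most significant bit first in the order listed, and the optional fields that follow the flags are not decoded here.

```python
FLAG_NAMES = [
    "discontinuity_indicator",              # most significant bit of the flag byte
    "random_access_indicator",
    "elementary_stream_priority_indicator",
    "PCR_flag",
    "OPCR_flag",
    "splicing_point_flag",
    "transport_private_data_flag",
    "adaptation_field_extension_flag",      # least significant bit
]


def parse_adaptation_field(field: bytes) -> dict:
    """Decode the length byte and the flag byte of an adaptation field.

    `field` must start at the adaptation field length byte.
    """
    length = field[0]
    flags = {name: False for name in FLAG_NAMES}
    if length > 0:
        flag_byte = field[1]
        for position, name in enumerate(FLAG_NAMES):
            flags[name] = bool(flag_byte & (0x80 >> position))
    return {"length": length, "flags": flags}


if __name__ == "__main__":
    # Hypothetical field: length 7, random access indicator and PCR flag set.
    result = parse_adaptation_field(bytes([7, 0x50]) + bytes(6))
    print([name for name, is_set in result["flags"].items() if is_set])
```

A length of zero is handled separately because in that case the field consists of the length byte alone and carries no flag byte.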

Demultiplexing an MPEG transport stream (MPEG-TS)

Demultiplexing an MPEG transport stream (TS) involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs of the PMTs
3. Reading the PIDs for the elements of the desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders

An MPEG-2 transport stream can carry:
Video streams
Audio streams
Other data

The MPEG-2 transport stream is the packet format for downstream data communication on CATV networks.

Demultiplexing a Transport Stream (TS)

Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders

An MPEG-2 transport stream can carry:
Video stream
Audio stream
Any type of data

MPEG-2 TS is the packet format for CATV downstream data communication.
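The four demultiplexing steps can be sketched as below. This is a simplified illustration, not a real PSI parser: it assumes the PAT and the PMTs have already been decoded into plain dictionaries (step 1), and the names `demultiplex`, `pat`, `pmts` and the PID values in the demo are hypothetical.

```python
PAT_PID = 0x0000  # packets with this PID carry the Program Association Table


def demultiplex(packets, pat, pmts, program_number):
    """Route the payloads of one program to per-stream lists.

    packets: iterable of (pid, payload) pairs, in arrival order
    pat:     dict mapping program number -> PID of that program's PMT
    pmts:    dict mapping PMT PID -> dict of stream kind -> elementary stream PID
    """
    pmt_pid = pat[program_number]          # step 2: PID of the PMT
    elements = pmts[pmt_pid]               # step 3: PIDs of the program elements
    kind_of_pid = {pid: kind for kind, pid in elements.items()}
    routed = {kind: [] for kind in elements}
    for pid, payload in packets:           # step 4: detect and route
        kind = kind_of_pid.get(pid)
        if kind is not None:
            routed[kind].append(payload)
    return routed


if __name__ == "__main__":
    pat = {1: 0x0100, 2: 0x0200}
    pmts = {
        0x0100: {"video": 0x0101, "audio": 0x0102},
        0x0200: {"video": 0x0201, "audio": 0x0202},
    }
    packets = [(0x0101, b"v1"), (0x0201, b"other"), (0x0102, b"a1"), (0x0101, b"v2")]
    print(demultiplex(packets, pat, pmts, program_number=1))
```

Packets belonging to other programs are simply ignored, which is how a receiver tuned to one program treats the rest of the multiplex.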

Timing and buffer control

Point A: Encoder input. Constant rate.
Point B: Encoder output. Variable rate.
Point C: Encoder buffer output. Constant rate.
Point D: Communication channel + decoder buffer. Constant rate.
Point E: Decoder input. Variable rate.
Point F: Decoder output. Constant rate.

Timing & buffer control

Point A: Encoder input. Constant/specified rate.
Point B: Encoder output. Variable rate.
Point C: Encoder buffer output. Constant rate.
Point D: Communication channel + decoder buffer. Constant rate.
Point E: Decoder input. Variable rate.
Point F: Decoder output. Constant/specified rate.

Timing - Synchronization

The decoder is synchronized with the encoder by time stamps.

The encoder contains a master oscillator and counter, called the System Time Clock (STC) (see the block diagram above).

The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.

Multiple programs, each with its own STC, can be multiplexed into a single stream.

A program component may even have no time stamps, but it then cannot be synchronized with the other components.

At the encoder input (Point A), the time of occurrence of an input video picture or audio block is marked by sampling the STC.

The total delay of the encoder and decoder buffers is added to the STC, creating the Presentation Time Stamp (PTS).

The PTS is then inserted in the first of the packets representing that picture or audio block, at Point B.

Timing - Synchronization

The decoder is synchronized with the encoder by time stamps.

The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)

The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.

Multiple programs, each with its own STC, can also be multiplexed into a single stream.

A program component can even have no time stamps, but it then cannot be synchronized with other components.

At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.

The total (constant) delay of the encoder and decoder buffers is added to the STC, creating a Presentation Time Stamp (PTS).

The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
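The PTS rule on this slide (sampled STC plus the constant total buffer delay) can be written as a short Python sketch. The slides do not give units; the 90 kHz tick rate and the 33-bit counter width used here are the MPEG-2 Systems conventions for PTS and are stated as assumptions of this example, as are the function name `make_pts` and the 0.5 s delay in the demo.

```python
PTS_CLOCK_HZ = 90_000      # assumed PTS resolution (MPEG-2 Systems convention)
PTS_MODULO = 1 << 33       # PTS is assumed to be carried as a 33-bit value


def make_pts(stc_sample_ticks: int, total_buffer_delay_s: float) -> int:
    """Return the presentation time stamp for a picture or audio block.

    stc_sample_ticks:     STC value sampled at the encoder input (Point A)
    total_buffer_delay_s: constant encoder + decoder buffer delay in seconds
    """
    delay_ticks = round(total_buffer_delay_s * PTS_CLOCK_HZ)
    return (stc_sample_ticks + delay_ticks) % PTS_MODULO


if __name__ == "__main__":
    # A picture sampled 10 s into the program with a hypothetical 0.5 s total delay.
    print(make_pts(stc_sample_ticks=900_000, total_buffer_delay_s=0.5))
```

Because the delay is constant, every picture is presented the same interval after it entered the encoder, which is what keeps the end-to-end timing fixed.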

Timing - Synchronization (2)

A Decode Time Stamp (DTS) can optionally be combined into the bit stream. It represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.

DTS and PTS are identical except where B pictures are reordered.

The DTS is only used where reordering is needed.

PTS or DTS is inserted at intervals ≤ 700 ms.

In ATSC, PTS or DTS must be inserted at the beginning of each coded picture.

In addition, the output of the encoder buffer (Point C) is time stamped with STC values, called:

System Clock Reference (SCR) in a Program Stream.

Program Clock Reference (PCR) in a Transport Stream.

PCR insertion interval ≤ 100 ms.

SCR insertion interval ≤ 700 ms.

PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.

Timing – Synchronization (2)

A Decode Time Stamp (DTS) can optionally be combined into the bit stream. It represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.

DTS and PTS are identical except in the case of picture reordering for B pictures.

The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.

The interval between inserted PTS (or DTS) values is ≤ 700 ms.

In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).

In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:

System Clock Reference (SCR) in a Program Stream.

Program Clock Reference (PCR) in a Transport Stream.

PCR time stamp interval ≤ 100 ms.

SCR time stamp interval ≤ 700 ms.

PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.
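To illustrate why DTS and PTS differ only when pictures are reordered, the sketch below stamps a short picture sequence under a deliberately simplified model: one picture is decoded per frame period, each anchor picture (I or P) is transmitted before the B pictures that precede it in display order, and a stream with B pictures is presented one frame period later than a stream without them. The function name `stamp_pictures` and the frame period of 3600 ticks (25 pictures per second at an assumed 90 kHz time base) are illustrative assumptions.

```python
def stamp_pictures(display_order, frame_period=3600, first_dts=0):
    """Return pictures in decode order with simplified DTS and PTS values.

    display_order: picture types in display order, e.g. ["I", "B", "B", "P"]
    """
    # Build the decode (transmission) order: anchors go before the
    # B pictures that are displayed ahead of them.
    decode_order, pending_b = [], []
    for index, kind in enumerate(display_order):
        if kind == "B":
            pending_b.append(index)
        else:
            decode_order.append(index)
            decode_order.extend(pending_b)
            pending_b = []
    decode_order.extend(pending_b)

    # Reordering costs one frame period of presentation delay.
    reorder_delay = 1 if "B" in display_order else 0

    stamped = []
    for decode_index, display_index in enumerate(decode_order):
        stamped.append({
            "type": display_order[display_index],
            "dts": first_dts + decode_index * frame_period,
            "pts": first_dts + (display_index + reorder_delay) * frame_period,
        })
    return stamped


if __name__ == "__main__":
    for picture in stamp_pictures(["I", "B", "B", "P"]):
        print(picture)
```

In this model the B pictures are presented as soon as they are decoded, while the I and P pictures are held until their display slot, so only the anchor pictures need a separate DTS.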

Timing - Synchronization (3)

All video and audio streams in the same program must take their time stamps from a common STC so that the video and audio decoders can be synchronized with each other.

The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).

PCR time stamps allow different programs with different STCs to be multiplexed together while still allowing the STC of each program to be recovered.

If there is no buffer overflow or underflow, the delay in the buffers and the channel for both video and audio is constant.

The encoder input and decoder output run at equal and constant rates.

The delay from encoder input to decoder output is fixed.

If exact synchronization is not needed, the decoder clock can be free running; video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow.

Timing – Synchronization (3)

All video and audio streams included in a program must get their time stamps from a common STC so that synchronization of the video and audio decoders with each other may be accomplished.

The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).

PCR time stamps allow synchronization of different multiplexed programs having different STCs while allowing STC recovery for each program.

If there is no buffer underflow or overflow, the delays in the buffers and transmission channel for both video and audio are constant.

The encoder input and decoder output run at equal and constant rates.

There is a fixed end-to-end delay from encoder input to decoder output.

If exact synchronization is not required, the decoder clock can be free running; video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.

HDTV (High definition television)

High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.

HDTV is defined by the ITU-R as:

'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (High definition television)

HDTV first came to public attention in 1981, when NHK, the Japanese broadcaster, first demonstrated it in the United States.

HDTV is defined by the ITU-R as:

A system designed to allow a person with normal vision, viewing from a distance of three times the picture height, to perceive the scene with nearly the quality of the original scene.

HDTV (2)

HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.

It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.

To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.

However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.
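The "about 33% wider" figure follows directly from the two aspect ratios when both pictures have the same height. The short Python check below works this out with exact fractions; the function name `width_increase_percent` is an illustrative choice.

```python
from fractions import Fraction


def width_increase_percent(new_aspect: Fraction, old_aspect: Fraction) -> float:
    """Extra width, in percent, of the new format at equal picture height."""
    return float((new_aspect / old_aspect - 1) * 100)


if __name__ == "__main__":
    hdtv = Fraction(16, 9)
    conventional = Fraction(4, 3)
    # (16/9) / (4/3) = 4/3, so the HDTV picture is one third wider.
    print(hdtv / conventional)
    print(width_increase_percent(hdtv, conventional))
```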

HDTV (2)

HDTV requires a screen that is about 30% wider than a conventional TV screen. The aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems.

This ratio was chosen because psychological tests have shown that it best matches human vision.

It also allows the use of existing cinema film formats, since this is also the aspect ratio used for normal 35 mm film.

To obtain a higher resolution, the pictures used in HDTV must contain over 1000 lines, unlike the current NTSC and PAL systems, which have only 525 or 625 lines.

This gives a higher vertical resolution. The exact value is chosen to be a multiple of one of the resolutions of conventional TV.

However, due to the higher scan rate, the bandwidth required for analogue HDTV is approximately 12 MHz, compared with 6 MHz for conventional TV.

HDTV (3)

The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.

The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.

One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.

Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.

HDTV (3)

Introducing a non-compatible TV transmission format for HDTV would require viewers either to buy a new receiver or to buy a converter to receive the picture on their old set.

The trend in Japan was towards an HDTV format that is compatible with the old TV system and can be received by conventional sets with conventional quality.

However, to get the full benefit of HDTV, a wide screen and a high definition receiver must be purchased.

One of the main reasons that HDTV is not yet common is that a general standard has not yet been agreed.

The 26th CCIR assembly recommended a single worldwide standard for high definition TV.

Nevertheless, Japan, Europe and North America have been investing money and time in developing their own systems based on their own conventional TV standards.

H261- H263

The H.261 algorithm was developed for the purpose of image transmission rather than image storage.

It is designed to produce a constant output of p × 64 kbit/s, where p is an integer in the range 1 to 30.

This allows transmission over a digital network or data link of varying capacity.

It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.

The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.

The DCT operation is performed at a low level on 8 × 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
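The p × 64 kbit/s rule can be tabulated with a few lines of Python. This is only an arithmetic sketch of the rate formula stated above; the function name `h261_bit_rate_kbps` is an illustrative choice.

```python
def h261_bit_rate_kbps(p: int) -> int:
    """Constant H.261 output rate in kbit/s for multiplier p (1 to 30)."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer in the range 1 to 30")
    return p * 64


if __name__ == "__main__":
    for p in (1, 2, 6, 30):
        print(p, h261_bit_rate_kbps(p), "kbit/s")
```

The largest multiplier, p = 30, gives 1920 kbit/s, which is the "about 2 Mbit/s" upper end of the range.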

H261- H263

The H.261 algorithm was developed for image transmission rather than image storage.

It is designed to produce a constant output rate of p × 64 kbit/s, where p is an integer from 1 to 30.

This allows transmission over a digital network or data link of varying capacity.

It also allows transmission over a single 64 kbit/s digital telephone channel for low quality videophone, or at higher bit rates with higher picture quality.

The basic coding algorithm is similar to MPEG: it is a hybrid of motion compensation, DCT and simple DPCM, without the MPEG I, P, B frame structure.

The DCT is performed at a low level on 8 × 8 blocks of prediction errors from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (2)




H261-H263 (3)

H.261 is widely used on 176 × 144 pixel images.

The ability to select a range of output rates for the algorithm allows it to be used in different applications.

Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.

Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.

A further development of H.261 is H.263 for lower fixed transmission rates.

This deploys arithmetic coding in place of the variable length coding (see H261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

H261-H263 (3)

H.261 is widely used with 176 × 144 pixel images.

The ability to select from a wide range of output rates allows it to be used in many different applications.

Low output rates (p = 1 or 2) are only suitable for face-to-face communication. H.261 is therefore used in commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.

Video-conferencing requires a greater output data rate (p > 6) and may run as high as 2 Mbit/s for high quality transmission with larger image sizes.

A further development of H.261 is H.263, for lower transmission rates.

H.263 uses arithmetic coding in place of VLC (see the H261 diagram) and, with some other improvements, reduces the data rate to 20 kbit/s.

Model Based Coding (MBC)

At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.

In order to achieve the necessary degree of compression they often require reduction in spatial resolution or even the elimination of frames from the sequence.

Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.

It relies upon the fact that the image quality is largely subjective.

Providing that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.

Model Based Coding (MBC)

At the very low bit rates of 20 kbit/s or less used in videophone applications, the compression techniques described earlier are pushed to their limits.

To achieve the necessary degree of compression, the resolution has to be reduced or frames even have to be dropped from the image sequence.

Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve a high compression ratio without greatly reducing the image information.

It relies on the fact that image quality depends on subjective factors.

Provided that the appearance of the scene in an observed image is of acceptable quality, it is hard to notice that the observed image is not a precise reproduction of the real image.

Model Based Coding (2)

One MBC method for producing an artificial image of a head sequence utilizes a feature codebook where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.

The most important areas of a face, for conveying an expression, are the eyes and mouth, hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.

When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.

By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.

It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.

However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
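The size of the codebook address can be checked with a small Python sketch. With 10 eye and 10 mouth templates there are 100 combinations; 6 bits distinguish only 64 values, so 7 bits (128 values) are the minimum for a joint address. The function name `address_bits` is an illustrative choice.

```python
def address_bits(entries: int) -> int:
    """Smallest number of bits that can address `entries` distinct codebook entries."""
    if entries < 1:
        raise ValueError("a codebook needs at least one entry")
    # (entries - 1).bit_length() is the exact integer form of ceil(log2(entries)).
    return max(1, (entries - 1).bit_length())


if __name__ == "__main__":
    eye_templates, mouth_templates = 10, 10
    combinations = eye_templates * mouth_templates
    print(combinations, "combinations need", address_bits(combinations), "bits")
```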

Model Based Coding (2)

One MBC method for producing an artificial image of a head uses a codebook containing a range of facial features, sufficient to create an animation, generated from sub-images or ready-made templates which are joined together to form a complete face.

The most important areas of a face for conveying expression are the eyes and mouth, so for the generated image to be convincing the movement of the eyes and mouth must closely approximate the movements of the real person.

When forming a synthetic image, the feature vectors closest to those of the original moving sequence are selected from the codebook and transmitted as addresses coded at a very low bit rate.

By using only 10 eye templates and 10 mouth templates, a total of 100 combinations exists, so only a 7-bit codebook address need be transmitted.

It has been found that there are only 13 mouth shapes for pronouncing vowels and consonants during speech.

However, the number of mouth sub-images is usually increased, to describe intermediate expressions as well and hence avoid abrupt step changes in the image.

Model Based Coding (3)

Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.

A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.

To make realistic models, the polygon net can be shaded to reflect the presence of light sources.

The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30° from the full-face position.

The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.

Large flat areas, such as the forehead, contain fewer triangles.

A second wire-frame is used to model the mouth interior.
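A minimal sketch of the "linked arrays" idea, assuming one array of vertex coordinates and one array of triangles that refer to vertices by index. The tiny three-vertex model, the choice of the y axis as the vertical axis, and the function name `rotate_about_vertical` are illustrative assumptions, not the Welch model itself.

```python
import math

# Vertex array: one (x, y, z) coordinate triple per vertex.
vertices = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Polygon array: each triangle lists the indices of its three vertices.
triangles = [(0, 1, 2)]


def rotate_about_vertical(points, angle_degrees):
    """Rotate every vertex about the vertical (y) axis, as for a head turn."""
    angle = math.radians(angle_degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


if __name__ == "__main__":
    # Turn the model by 30 degrees, the limit quoted for acceptable images.
    turned = rotate_about_vertical(vertices, 30)
    for triangle in triangles:
        print([turned[index] for index in triangle])
```

Only the vertex array changes under rotation; the triangle array stays the same because the connectivity of the net is unaffected.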

Model Based Coding (3)

Another way of representing three-dimensional computer graphics is by a net of interconnecting polygons.

A model is stored as a set of linked arrays that specify the polygon vertices, with the lines connecting the vertices forming the sides of the polygons.

To make realistic models, the polygon net can be shaded to reflect the presence of light sources.

The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30° from the full-face position.

The model in the figure uses smaller triangles in areas associated with high curvature, where significant movement occurs.

Large flat areas, such as the forehead, contain fewer triangles.

A second wire-frame is used to model the mouth interior.

Model based coding (4)

A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.

Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.

This model based feature codebook approach suffers from the drawback of codebook formation.

This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.

However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.

When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
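The "less than 200 bit/s" figure is simply the address size multiplied by the frame rate. A short Python check, with `codebook_bit_rate` as an illustrative helper name:

```python
def codebook_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate in bit/s when one codebook address is sent per frame."""
    return bits_per_frame * frames_per_second


if __name__ == "__main__":
    # 128 codebook entries need 7 bits; one mouth address per frame at 25 frame/s.
    rate = codebook_bit_rate(bits_per_frame=7, frames_per_second=25)
    print(rate, "bit/s")
```

At 7 bits per frame and 25 frames per second this gives 175 bit/s, which is consistent with the stated bound.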

Model based coding (4)

A synthetic image is created by mapping detail (texture) from an initial full-face source image onto the wire-frame. Facial movement can be produced by moving the vertices of the frame.

Head rotation requires simple matrix operations on the coordinate array. Facial expression requires manipulating the vertices that control the features.

This codebook-based modelling approach suffers from the drawback of codebook formation.

This has to be done off-line, which requires the image to be prerecorded and therefore causes delay.

However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where the mouth is coded with 7 bits, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.

When fully developed, MBC systems may reach rates as low as 1 kbit/s, but they can only transmit image sequences that match the stored models, e.g. head and shoulders displays.

Key points:

JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive Quantization / VLC

MPEG layered structure: Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)

MPEG compression mechanism:
Prediction
Motion compensation
Scanning
YCbCr formats (4:4:4, 4:2:0, etc.)
Profiles @ Level
I, P, B pictures & reordering
Encoder/Decoder process & block diagram

MPEG Data transport

MPEG Timing & Buffer control: STC/SCR/DTS, PCR/PTS

Key points

JPEG coding mechanism: DCT, zigzag scanning, adaptive quantization, VLC

MPEG layered structure: Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, GOP, Sequence, PES

MPEG compression mechanism:
Prediction
Motion compensation
Scanning
YCbCr formats (4:4:4, 4:2:0, etc.)
Profiles @ Level
I, P, B pictures and reordering
Encoding/decoding process and block diagram

MPEG data transport

Timing and buffer control: STC/SCR/DTS, PCR/PTS

Technical terms

Macroblocks
HVS = Human Visual System
GOP = Group of Pictures
VLC = Variable Length Coding/Coder
IDCT/DCT = (Inverse) Discrete Cosine Transform
PES = Packetized Elementary Stream
MP@ML = Main Profile @ Main Level
PCR = Program Clock Reference
SCR = System Clock Reference
STC = System Time Clock
PTS = Presentation Time Stamp
DTS = Decode Time Stamp
PAT = Program Association Table
PMT = Program Map Table

Technical terms

Macroblock
HVS = Human Visual System
GOP = Group of Pictures
VLC = Variable Length Coding/Coder
IDCT/DCT = (Inverse) Discrete Cosine Transform
PES = Packetized Elementary Stream
MP@ML = Main Profile @ Main Level
PCR = Program Clock Reference
SCR = System Clock Reference
STC = System Time Clock
PTS = Presentation Time Stamp
DTS = Decode Time Stamp
PAT = Program Association Table
PMT = Program Map Table
