Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable...

23
Reproduction of 22.2 multichannel audio with virtual rendering ITU-R Workshop “Topics on the Future of Audio in Broadcasting” Wed 15th July 2015 Popov Room 16:30 - 20:00 Satoshi OODE Science and Technology Research Laboratories Japan Broadcasting Corporation (NHK)

Transcript of Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable...

Page 1: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Reproduction of 22.2 multichannel audiowith virtual rendering

ITU-R Workshop “Topics on the Future of Audio in Broadcasting”Wed 15th July 2015

Popov Room 16:30 - 20:00

Satoshi OODEScience and Technology Research Laboratories

Japan Broadcasting Corporation (NHK)

Page 2: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Overview 8K SHV broadcasting service◦ is planned to begin in 2016.◦ is composed of stereo, 5.1ch and 22.2ch

with metadata related to dialogue level control.

Requirement of 22.2 multichannel audio◦ Three dimensional spatial impression is achieved by

loudspeakers placed in 30-45 degree intervals.

Reproductions of 22.2 multichannel audio for home use◦ Theatrical environment using 24 loudspeakers or more.◦ Rendering to other channel configurations such as 9.1ch.◦ Loudspeakers integrated with the display using virtual

rendering (Binaural reproduction over loudspeakers).◦ Headphone using virtual rendering (Binaural reproduction).

Page 3: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

22.2 multichannel audiospecified in Rec. BS.2051

The 22.2 multichannel sound system◦ is specified as system H in Rec.

BS.2051.◦ consists of three layers.

Top layer: 9 channels including overhead loudspeaker.

Middle layer: 10 channels. Bottom layer: 3 channels including 2 LFE

channels

Top layer9 channels

Middle layer10 channels

Bottom layer3 channels + 2 LFE

TpFC

TpBCTpBL

TpSiL

TpFL

TpBR

TpSiR

TpFR

TpC

FCFLc

BL

BCBR

SiR

FRc

FR

SiL

FL

BtFC

BtFL BtFR

LFE2LFE1

Page 4: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Roadmap of 8K SHV broadcasting service

(2000) 2004 2012 2016 20202008 2018

London Olympics Tokyo OlympicsRio Olympics

Pilot broadcasting

Rec. BS.2051 Rec. BT.2020 Rec. BT.2077

Rec. BS.1116, BS.1196(Rec. BS.1770)

8K SHV pilot broadcasting service is planned to begin in 2016. Recommendations BT.2020 and BS.2051 were developed and

some related recommendations were revised. ARIB Standard B32 which specifies audio coding was also

updated in Japan.

ARIB STD-B32

2014

R&D Start

Full service

Page 5: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Audio servicein 8K SHV broadcasting

The revised ARIB Standard STD-B32 provides specifications of audio coding for the advanced satellite broadcasting system.

New features are as follows.◦ Transmission of the down-mixing coefficients for each programme.◦ Dialogue level control and dialogue replacement.◦ Audio coding including lossless transmission.

NHK plans to broadcast 8K SHV…◦ using MPEG-4 AAC in satellite broadcasting.◦ using stereo, 5.1ch and 22.2ch audio formats.◦ with metadata related to down-mixing for each programme.◦ with metadata related to dialogue level control function.

The information is reported in Report ITU-R BS.2159-7.

Page 6: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Dialogue level control function Many complaints with regard to the intelligibility of dialogue

although “Dialogue” is the most important contents. The listeners can separately control the level of dialogue and

that of total level.

In traditional channel-based,total level can be changed.

Dialogue level is increased.

Dialogue level is decreased.

BGM+SE

Dialogue BGM+SE

Dialogue

BGM+SEDialogue

BGM+SEDialogue

In 8K SHV broadcasting,relative dialogue level will be able to

be changed.

Page 7: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Dialogue level control in 22.2 multichannel audio

Top layer9 channels

Middle layer10 channels

Bottom layer3 channels + 2 LFE

TpFC

TpBCTpBL

TpSiL

TpFL

TpBR

TpSiR

TpFR

TpC

FCFLc

BL

BCBR

SiR

FRc

FR

SiL

FL

BtFC

BtFL BtFR

LFE2LFE1

Some channels are used only for dialogue depending on individual programmes. Dialogue channels have a flag of “dialogue”

Page 8: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Metadata for dialogue level control specified in ARIB STD-B32

Descriptor Explanationext_dialogue_status Existence of dialogue channels.num_dialogue_chans Number of main dialogue channels.num_additional_lang_chans Number of alternative dialogue channels.dialogue_src_index[i] Index of dialogue channels.dialogue_main_lang_comment_bytes Byte count of characters indicating content of main dialogue.dialogue_main_lang_comment_data Byte data of characters indicating content of main dialogue.dialogue_main_lang_code Language code of main dialogue.dialogue_additional_lang_code[i] Language code of ith alternative dialogue.dialogue_additional_lang_comment_bytes[i]

Byte count of characters indicating content of ith alternative dialogue.

dialogue_additional_lang_comment_data[i]

Byte data of characters indicating content of ith alternative dialogue.

dialogue_gain_index[i] Gain of alternative dialogue channels. (0000: 0 dB, 0001: –1 dB, 0010: –2 dB, ..., 1110: –14 dB, 1111: –∞ dB)

sn_dialogue_plus_index Maximum gain of dialogue channels in receiver. (000: 0 dB, 001: +3 dB, 010: +6 dB, ..., 110: +18 dB, 111: +∞ dB)

sn_dialogue_minus_index Minimum gain of dialogue channels in receiver. (000: 0 dB, 001: –3 dB, 010: –6 dB, ..., 110: –18 dB, 111: –∞ dB)

additional_dialogue_data_sync Data stream element in which alternative dialogue data is packed.additional_dialogue_index Index of alternative dialogue channels corresponding to the “i” of

dialogue_additional_lang_code[i].

Limitation of dialogue level control is important for broadcasterbecause dialogue source is not always clean.

Page 9: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Viewing angleof 8K Super Hi-Vision

System parameters are specified in Rec. ITU-R BT.2020 8K SHV has viewing angle of 100° in azimuth and 60° in elevation

when the listener is positioned at 0.75 Height of the display. 8K SHV requires wider and higher sound fields than 5.1ch or

stereo to match the visual image with the sound image .

Sound Stage

Viewing distance

Left

1.5H

0.75H

Right

Stereo5.1ch

22.2ch

Page 10: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Characteristics of22.2 multichannel audio

Stable localization of frontal sound over the entire area of the large-screen image

Sound image reproduced in all directions around the listener, including elevation

3D spatial impression augmenting the listener’s sense of reality

Wide listening area with excellent sound quality

Compatible with existing multichannel sound systems

Suitable for live recording, mixing, and transmission

5.1

22.2

Page 11: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeaker Intervals-Requirements of 22.2 multichannel audio-

24 loudspeakers 15 degree intervals

8 loudspeakers 45 degree intervals

-3.50

-3.00

-2.50

-2.00

-1.50

-1.00

-0.50

0.00

0.50

Top layer (45°) (40 subjects)

Middle layer (0°)(20 subjects)

Mea

n di

ffer

ence

gra

de sc

ore

(30゚)(45゚) ( 60゚ ) ( 90゚ )( 120゚ )12 8 6a6b 4a4b 3a3b

Loudspeaker arrangement(Loudspeaker intervals)

Spatial impression by 8 loudspeakers is almost the same as that by 24 loudspeakers.

Reference Test item

Page 12: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Reproductions of 22.2 multichannel audio for home use

Theatrical environment using 24 loudspeakers or more.

Rendering to other channel configurations such as 9.1ch.

Down-mixing to stereo or 5.1ch

Loudspeakers integrated with display using virtual rendering.

Headphone with virtual rendering (Binaural).

Page 13: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Theatrical environmentusing 24 loudspeakers or more

22.2 multichannel audio is usually reproduced by 24 loudspeakers Additional loudspeakers are added depending on listening

environment. Subwoofers of base management for full range channels (room size). Full range loudspeakers on the side to keep uniformity (room shape).

Not absolute positions of channels but relative positions are important.

FLIdeal environment

FL FL

Page 14: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Rendering to other channel configurations such as 9.1ch Rendering based on channel positionWhen the number of loudspeaker is large,

rendering is used. Down-mixingWhen 7.1ch, 5.1ch or stereo is used,

down-mixing coefficients are usedto prevent making dialogue uncleardue to spatial masking.

The spatial impression is deteriorated with decreasing number of loudspeakers.

FLFLc FC

SiL

BLBC

FRFRc

SiR

BR

L C R

Ls Rs

FLFC

SiL

BLBC

FR

SiR

BR

TC

L R

Ls Rs

FLFC

FR

L C R

Ls Rs

L R

stereo

5.1ch

9.1ch

22.2ch

Rendering

Down-mix

Page 15: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering

– How to introduce 8K SHV Audio into the home environment High-quality sound requires

24 discrete loudspeakers. Installing 24 loudspeakers is

over equipped for home environment.

– Compact and convenient system Loudspeaker system should be integrated with SHV display.

– 12 loudspeakers systemintegrated with 85 inches SHV FPDwas developed.

loudspeaker

12 loudspeakers integrated with 8K SHV Flat Panel Display

Page 16: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering For front 11 channels 8 channels around the display

are directly reproduced.(marked as red circles)

3 front channels on the display are reproduced using amplitude panning of vertical pair-wise.

Vertical pair-wise panning method

Horizontal pair-wise panning method

loudspeaker

12 loudspeakers integrated with 8K SHV Flat Panel Display

Page 17: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering

11 side, back and overhead channels around the listener are reproduced by binaural reproduction over 11 front loudspeakers.(* to simulate acoustical propagation characteristics from the loudspeaker to each ear.)

Binaural reproduction over loudspeakers has been studied since the 1960s. Some studies suggested horizontally

arrayed loudspeakers can operate binaural reproduction very well.The system realizes immersive

audio with only front loudspeakers.

loudspeaker

12 loudspeakers integrated with 8K SHV Flat Panel Display

Page 18: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering

Binaural Reproduction over loudspeakers

1z 2zTpCTpSiL TpSiR

TpBL TpBRTpBC

BC

SiR

BR

SiL

BL

1TpBLF

2TpBLF

1x 2x

22.2 multichannel sound field* is simulated using HRTFs corresponding to each loudspeaker

*11 side, back and overhead channels

Page 19: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering

Binaural Reproduction over loudspeakers

1x 2x

1y 2y

H

Original sound field

Reproduced sound field

11G12G

22G21G

Acoustic crosstalk

Crosstalk cancelation is achieved by unit matrix

=

2

1

2221

1211

2

1

GGGG

xx

Hyy

Condition number is one of the factors for system stability

Page 20: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Loudspeakers integrated with display using virtual rendering

– Condition number is one of the factors which indicates the stability of the binaural reproduction.

– Increase of the number of loudspeaker realizes more stable binaural reproduction regardless of loudspeaker configuration

0.5 1 1.5 2

x 104

1

2

3

4

5

6

7

8Two loudspeakers

Frequency [Hz]

Con

d(G

)

0.5 1 1.5 2

x 104

1

2

3

4

5

6

7

8Four loudspeakers

Frequency [Hz]

Con

d(G

)

0.5 1 1.5 2

x 104

1

2

3

4

5

6

7

8

Frequency [Hz]C

ond(

G)

Six loudspeakers

0.5 1 1.5 2

x 104

1

2

3

4

5

6

7

8Twelve loudspeakers

Frequency [Hz]

Con

d(G

)

Con

ditio

n nu

mbe

rs

Con

ditio

n nu

mbe

rs

Con

ditio

n nu

mbe

rs

Con

ditio

n nu

mbe

rs

≦12 loudspeaker

control2 loudspeaker

control

Page 21: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Headphone with virtual rendering(Binaural reproduction)

For mobile or personal use, 22.2 multichannel headphone processor was developed.22.2 multichannel sound field is simulated using HRTFs corresponding to each loudspeaker. The system in which audio engineer’s HRTFs are installed has already used in 22.2 multichannel sound production.

1z 2zTpCTpSiL TpSiR

TpBL TpBRTpBC

BC

SiR

BR

SiL

BL

1TpBLF

2TpBLF

1x 2x

FLBtFL

TpFLFLcBtFC FC

TpFC

FR

BtFRTpFR

FRc

22.2 multichannel headphone processor

Page 22: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Conclusion 8K SHV broadcasting service◦ is composed of stereo, 5.1ch and 22.2ch,◦ with metadata related to dialogue level control.

toward personalization, especially for the older person.toward adaptation of the listening environment.

Reproductions of 22.2 multichannel audio for home use◦ Theatrical environment using 24 loudspeakers or more.◦ Rendering to other channel configurations such as 9.1ch.◦ Loudspeakers integrated with the display using virtual

rendering (Binaural reproduction over loudspeakers).◦ Headphone using virtual rendering (Binaural reproduction).

Page 23: Reproduction of 22.2-multichannel audio with virtual rendering · 22.2 multichannel audio Stable localization of frontal sound over the entire area of the large-screen image Sound

Thank you for your attention