Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial...

44
Audio challenges in virtual and augmented reality devices Dr. Ivan Tashev Partner Software Architect Audio and Acoustics Research Group Microsoft Research Labs - Redmond

Transcript of Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial...

Page 1: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Audio challenges in virtual and augmented reality devices

Dr. Ivan Tashev

Partner Software Architect

Audio and Acoustics Research Group

Microsoft Research Labs - Redmond

Page 2: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

In memoriam: Steven L. Grant

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 2

Page 3: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

In this talk

• Devices for virtual and augmented reality

• Binaural recording and playback

• Head Related Functions and their personalization

• Object-based rendering of spatial audio

• Modal-based rendering of spatial audio

• Conclusions

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 3

Page 4: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Colleagues and contributors:

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 4

Hannes Gamper

Microsoft Research

David Johnston

Microsoft Research

Ivan Tashev

Microsoft ResearchMark R. P. Thomas

Dolby Laboratories

Jens Ahrens

Chalmers University,

Sweden

Page 5: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Devices for Augmented and Virtual RealityThey both need good spatial audio

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 5

Page 6: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Augmented vs. Virtual Reality

• Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics

• Virtual reality (VR) is a computer technology that replicates an environment, real or imagined, and simulates a user's physical presence in a way that allows the user to interact with it. It artificially creates sensory experience, which can include sight, touch, hearing, etc.

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 6

Page 7: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Components of the devices

• Optical projection• Display

• Head mounted displays

• Eyeglasses

• Head-up display

• Sound system• Loudspeakers of headphones

• Head and body position and orientation tracking

• Computer

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 7

Sensorama virtual reality headset (1950s)

Credit: Wikipedia

Page 8: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Examples for AR/VR devices

• Oculus Rift: VR glasses• Head-up display, controller

• Needs PC to operate

• Smartphone holders• Simple low-cost VR solution

• Limited applicability

• Microsoft HoloLens: AR device• Head-up display

• Autonomous device

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 8

Page 9: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Usage scenarios

• Gaming

• Entertainment

• Productivity

• Science

• Design and art

• Education

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 9

Page 10: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Audio components of AR/VR devices

• Capture user’s voice• Voice input is critical modality in the UI

• Capture the audio environment• Especially important for AR devices

• Adds value for some usage scenarios

• Spatial audio rendering engine• To be en par with the holographic imaging

• Personalization of the spatial audio engine• We all hear differently

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 10

In this talk

Page 11: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Binaural recording and playbackA brief history

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 11

Page 12: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Binaural recording and reproduction

• Théâtrophone, 1881

• Binaural recordings – mid 50’s

• Problems:• Fixed scene

• HRTFs mismatch

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 12

Neuman KU-100

Page 13: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Headphones for Spatial Audio rendering

• Headphones

• Head orientation tracking• direction, elevation, roll

• Head position is a plus• x, y, z (cameras or Kinect)

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 13

Jabra Intelligent headsetMSR experimental prototypeOSSIC X Dysonics Rondo Motion

Page 14: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Head Related Transfer Functions and their personalizationThey are just directivity patterns

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 14

Page 15: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Spatial hearing

• Direction and distance perception

• Within audible frequency and dynamic range

• Delivered to one or both ears

• Contains auditory localisation cues:• Interaural time and level differences

• Spectral cues

• Reverberation/reflections

• Dynamic and multimodal cues

• (expectation and experience)

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 15

HRTFs

Page 16: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

HRTF personalisation

• Describe acoustic path from sound source to ear entrances Contain all interaural and spectral localisation cuesAre a function of sound direction Can be considered distance-independent for radii > 1m

• Head and torso geometry affects wave propagationAnthropometric features are individual HRTFs are individual Spatial hearing is individual!

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 16

Page 17: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Measuring HRTFs

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 17

HRTF measurement rig Measurement locations

Page 18: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Example measurements

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 18

Head-related transfer function (HRTF)

Page 19: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Example measurements (2)

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 19

HRTF in horizontal plane,

right ear

-0.5

0

0.5 -0.5

0

0.5

-0.5

0

0.5

yx

z

-100 0 10050

100

1000

10000

Direction (deg)

Fre

qu

en

cy (

Hz)

HRTF for 1000 Hz,

right ear

HRTF for azimuth,

90°, both ears

Page 20: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

MSR HRTF database

• ~300 subjects

• HRTFs measured at 400 locations

• High resolution 3D head scans

• Direct anthropometrics measurements• Head width, depth, height, etc.

• Questionnaire • Hat size, shirt size, jeans size, etc.

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 20

3-D head scan

Page 21: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

HRTF personalization: direct estimation

• Given 3-D head scan

• Trace acoustic propagation from source positions to ear entrances

• Good results with high-resolution scan

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 21

Gamper, H.; Thomas, M. R. P. & Tashev, I. J. (2015). “Estimation of multipath propagation delays and interaural time

differences from 3-D head scans.” Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP).

Page 22: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Anthropometrics-based HRTF personalization:Magnitude Synthesis

1. Measure anthropometric features on a large database of people.

2. Represent a new candidate’s anthropometric features as a sparse combination α of people in the database.

3. Combine HRTFs with same weights α to synthesize personalized HRTF.

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 22

P. Bilinski, J. Ahrens, M. R. P. Thomas, I. J. Tashev, J. C. Platt, “HRTF magnitude synthesis via sparse representation of

anthropometric features,” ICASSP, 2014.

I. J. Tashev, “HRTF phase synthesis via sparse representation of anthropometric features,” ITA Workshop, 2014.

Page 23: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

HRTF personalization: parametrization

• Given 3-D head scan

• Fit sphere to scan

• Parameterize ITD models

• Works with noisier (e.g., Kinect 1) scans

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 23

Sphere fitted to 3-D head scan.

Gamper, H.; Thomas, M. R. P. & Tashev, I. J. (2015). “Anthropometric parameterisation of a spherical scatterer ITD model

with arbitrary ear angles.” Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

Page 24: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

HRTF personalization: Eigen-face approach

• Given single (Kinect 2) depth image

• Fit average face to scan

• Use deformations as a features in ML-based estimator

• HRTF personalization in less than 10 seconds!

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 24

Average face fitted to depth image.

Page 25: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Object-based Rendering of Spatial AudioTwo ears – three dimensions?

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 25

Page 26: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Object-based spatial audio rendering

Dynamic cuesInteraural and spectral cues

Reverb and reflections

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 26

Spatial rendering engine

Head models and

Measurements

(HRTFs)

Room models and

measurements

Orientation and

Position tracking

input

audio stream

Page 27: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Composing a Sound Scene - World

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 27

1. Source radiation pattern 𝑓 𝜃, 𝜙, 𝜔

2. Scaling (gain)

3. Rotation 𝛼, 𝛽, 𝛾

4. Translation 𝑥, 𝑦, 𝑧

5. Repeat for all sources

𝑥

𝑦

𝑧

𝜃

𝜙

Page 28: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Composing a Sound Scene -View

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 28

1. Source radiation pattern 𝑓 𝜃, 𝜙, 𝜔

2. Scaling (gain)

3. Rotation 𝛼, 𝛽, 𝛾

4. Translation 𝑥, 𝑦, 𝑧

5. Repeat for all sources

Page 29: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Composing a Sound Scene –View

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 29

View transform into user space

(𝑥0, 𝑦0, 𝑧0, 𝛼0, 𝛽0, 𝛾0)

Page 30: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Composing a Sound Scene – Projection

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 30

View transform into user space

(𝑥0, 𝑦0, 𝑧0, 𝛼0, 𝛽0, 𝛾0)

‘Project’ onto user’s head-related

transfer function (HRTF)

Entire scene becomes two signals

that are fed to the headphones.

Page 31: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Modal-based Rendering of Spatial AudioCan we model the sound field?

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 31

Page 32: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Detour: Fourier Analysis of a 1D Signal

• Consider a signal 𝑥 𝑡 .

• Any time series can be decomposed as a weighted sum of sin() and cos() functions.

• These weights are its Fourier spectrum 𝑋 𝜔 .

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 32

=

+

+

+

+

Page 33: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Plane Wave Density Function

• Plane Wave Density Function, or Signature Function

𝑠 𝜃, 𝜙, 𝑡 ↔ 𝑆 𝜃, 𝜙, 𝜔

• Plane wave density function is very powerful• Continuous representation of a sound field at a single point*

• Can be used to place a user at the same location; an audio panorama in which the user can freely rotate in 𝛼, 𝛽, 𝛾 .

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 33

*Under certain conditions, we can theoretically extrapolate the pressure field to any point in space (though extremely difficult to do in practice).

Time ↔ Frequency

Page 34: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Fourier Transforms with Respect to Space

• Notice Fourier Transform is with respect to time; spatial component (𝜃, 𝜙) is unaffected.

• We can also take Fourier Transforms with respect to space using the spherical harmonics as basis functions.

• Notice continuous space maps to discrete order 𝑚 and degree 𝑛.

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 34

𝑠 𝜃, 𝜙, 𝑡

Ǎ𝑠𝑛𝑚 𝑡

𝑆 𝜃, 𝜙, 𝜔

ም𝑆𝑛𝑚 𝜔

Time ↔ Frequency

Time ↔ Frequency

Space ↔ ModalSpace ↔ Modal

Page 35: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Spherical Harmonics

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 35

+ =

Page 36: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

First-Order Sound field Decomposition

• B-Format• Three fig-8 (pressure gradient) microphones and one

omnidirectional microphone.

• Arranged to be near-coincident.

• Provides first order decomposition of plane wave density function

• Used in Ambisonics

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 36

Courtesy ambisonia.com

Page 37: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Higher Orders: Spherical Microphone Array

• Higher orders: sample the pressure on a sphere 𝑝 𝜃, 𝜙, 𝑡

• Plane waves are scattered by the rigid surface• Spherical wave reflects outward

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 37

𝑥

𝑦

𝑧

𝜃

𝜙

Page 38: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Devices for sound field decomposition

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 38

16-ch. 115 mm spherical mic.

array

64-ch. 200mm spherical mic.

array

16-ch. 115 mm cylindrical mic.

array

Page 39: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Modal-based spatial audio rendering

• Decomposition of the sound field• Spherical harmonics: representation of the sound field as a sum of basis functions of

degree n and order m

• Cylindrical harmonics – the 2D version in horizontal plane only

• Properties• Linear – allows summing of independently processed signals

• The sound field can be rotated – allows compensation for head movement

• Convolution is multiplication – allows fast application of the HRTFs

• Same steps:• Reflect the head rotation

• Apply HRTFs (decomposed in spherical harmonics)

• Add reverberation

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 39

Page 40: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

ConclusionsWhat exactly were the challenges?

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 40

Page 41: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Spatial Audio

• Technique to make the listener to perceive the sound coming from any desired position. Also known as 3D audio.

• Challenges:• Low latency head orientation tracking

• HRTF personalization

• Realistic reverberation and distance cues

• Applications go way beyond augmented and virtual reality devices • Listening to stereo music

• Watching movies with surround sound

• Gaming

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 41

Page 42: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Available today or coming soon:

• HoloLens Development Kit (March 2016):• Spatial sound engine in Direct X

• Spatial sound in Unity plug-in (Microsoft HRTF extension)

• Full spatial audio support in Windows 10• Object-based spatial audio engine is part of Windows SDK

• Virtual Surround Sound support in Windows 10• Wrapper around the spatial audio engine

• Proven in real applications architecture and algorithms

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 42

Page 43: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Shameless plug

• Immediately precedes ICASSP

• More at http://www.hscma2017.org/

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 43

Page 44: Audio for Augmented and Virtual Reality Devices challenges in virtual and... · Modal-based spatial audio rendering •Decomposition of the sound field •Spherical harmonics: representation

Finally …

Questions?

Sep 15, 2016 IWAENC Audio challenges in virtual and augmented reality devices 44