3D Audio: pt3
Transcript of 3D Audio: pt3
Sound synthesis and spatial processing Politecnico di Milano – Polo Regionale di Como
SUMMARY
• Wavefield Synthesis: strict foundation on the basic laws of acoustic wave propagation
• Ambisonics: represents the sound field by an expansion into three-dimensional basis functions
• Microphone Acquisition
WAVEFIELD SYNTHESIS
Wavefield synthesis
• Basic idea: reconstruct the soundfield within a volume by recreating the wavefront on a closed surface that contains it (through spatial sampling)
Wavefield synthesis
• Huygens' principle: “every point of a wavefront can be considered as the source of an elementary wave that interacts with the surrounding environment”
• The wavefield produced by a primary source Ψ can be reconstructed using a distribution of secondary sources, for example obtained by placing the speakers P1, P2, …, PM
Wavefield synthesis
Wavefield synthesis
• Let:
  • Q(z, ω) be the Fourier transform of the emitting source
  • z be the vector of polar coordinates (r, θ, φ), also referred to as the listener position
  • P(z, ω) be the Fourier transform of the acoustic pressure at any point z
  • V be the volume that encloses the listening area
  • S be the surface that encloses the volume V
Wavefield synthesis • The Huygens principle is mathematically described by the
Kirchhoff-Helmholtz integral
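The integral itself appears on the slide only as an image; a standard form (our reconstruction, in the symbols defined below, valid for z inside V and up to the sign convention chosen for the surface normal) is:

```latex
P(z,\omega) = \oint_S \left[
  G(z \mid z',\omega)\,\frac{\partial P(z',\omega)}{\partial n}
  \;-\; P(z',\omega)\,\frac{\partial G(z \mid z',\omega)}{\partial n}
\right] \mathrm{d}S'
```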
where:
• G is the Green's function, which describes the contribution of a point source at position z′ to the soundfield at position z
• Point-source Green's function: G(z|z′, ω) = e^{−jk|z−z′|} / (4π |z−z′|)
• P(z′, ω) is the pressure along the enclosing surface S
• ∂P(z′, ω)/∂n is the pressure gradient along the surface normal
Wavefield synthesis
The wavefield in a volume V is completely described by the pressure P(z’, ω) along the enclosing surface S and the pressure gradient ∇P(z’, ω) along the surface normal n.
According to the Huygens principle, we can interpret the above integral as a set of secondary sources along the surface S
Wavefield synthesis
• Choose G as the Green's function of a point source (monopole); ∂G/∂n is then the Green's function of a dipole
• Approximate the distribution of monopoles and dipoles on the surface S by suitable types of loudspeakers
• Excite the loudspeakers with suitable driving signals derived from P(z′, ω) and ∇P(z′, ω)
Acquisition system
• The holophonic acquisition system is derived from the Huygens principle and the Kirchhoff integral, and consists of the spatial sampling of
  • 1) the pressure P(z′, ω)
  • 2) the pressure gradient ∇P(z′, ω)
• Pairs of microphones: omnidirectional for pressure, directional (figure-of-eight) for the gradient
• Pairs of speakers: monopolar (closed speakers), dipolar (open speakers)
• As shown in the K-H integral, the omnidirectional microphone feeds the dipole while the directional one feeds the monopolar (closed) speaker
Acquisition system
Wavefield reconstruction
• Microphones are now replaced by speakers; the driving signals are the same that were acquired by the microphones
• Speakers must be used in pairs, with polar characteristics similar to those of the microphones used for the acquisition
• Perfect reconstruction is obtained when we have a continuous distribution on S
• The distance between microphones and between speakers determines the minimum wavelength that can be recorded (Shannon)
Wavefield reconstruction
Wavefield reconstruction
• Elimination of dipoles:
  • admit a certain sound pressure also for z′ outside V
  • use a modified Green's function G1 that verifies the condition ∂G1/∂n = 0 on S
  • only monopoles are now needed
• Cons:
  • speaker selection: windowing artifacts
  • a soundfield is also generated outside V
Wavefield reconstruction
• Reproduction of a monochromatic plane wave …
Wavefield reconstruction
• From 3D to 2D: reconstruct the wavefield on a plane (at the height of the listener's head)
• Spatial sampling (sampling the curve):
  • Trade-off: number of speakers vs. maximum sampled frequency
  • We limit the bandwidth to frequencies below 1.5 kHz (above 1.5 kHz the damage is quite contained)
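To make the trade-off concrete: the Shannon condition ties the speaker (or microphone) spacing d to the highest alias-free frequency by f_max = c/(2d). A minimal sketch (the function names are our own, not from the slides):

```python
# Spatial sampling of the secondary-source distribution (Shannon condition):
# the spacing d between adjacent transducers limits the highest frequency
# that can be reproduced without spatial aliasing: f_max = c / (2 * d).

C = 343.0  # speed of sound in air (m/s), assumed at room temperature

def max_frequency(spacing_m):
    """Highest alias-free frequency (Hz) for a given transducer spacing."""
    return C / (2.0 * spacing_m)

def required_spacing(f_max_hz):
    """Largest spacing (m) that keeps the field alias-free up to f_max_hz."""
    return C / (2.0 * f_max_hz)

print(f"spacing for 1.5 kHz: {required_spacing(1500.0) * 100:.1f} cm")  # ~11.4 cm
print(f"f_max for 10 cm spacing: {max_frequency(0.10):.0f} Hz")
```

The 1.5 kHz limit quoted above thus corresponds to a spacing of roughly 11 cm between adjacent speakers.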
Wavefield reconstruction
• Spatial sampling artifacts:
Example: 8-channel circular rig
• Wavefield synthesis requires an anechoic rendering environment
• When the environment is not anechoic we can adopt a scheme with filters that are designed to cancel (or minimize) the impact of the environment
Multitrack recording
Non-anechoic environments
• G(z) can be determined through a calibration procedure
• Adapt a filter so as to minimize the error between the signal acquired in the rendering room, y[n] (using microphones similar to those used during the recording), and the reference signal d[n]
• The calibration signal is a pseudo-random noise
• Filter lengths are proportional to the reverberation time
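The calibration loop described above can be illustrated with an LMS adaptive filter; this is a simplified stand-in for whatever adaptation rule the actual system uses, with a toy 3-tap "room" in place of measured responses:

```python
import random

# Sketch of the calibration step: adapt an FIR filter g so that its output on
# the pseudo-random excitation matches the reference signal d[n]. The "room"
# is a toy 3-tap impulse response; in the real system y[n] and d[n] come from
# microphones, and the filter is much longer (proportional to the
# reverberation time, as noted above).

random.seed(0)
room = [1.0, 0.4, 0.2]                               # unknown room response (toy)
x = [random.gauss(0.0, 1.0) for _ in range(20000)]   # pseudo-random calibration noise

def fir(x, h, n):
    """y[n] = sum_k h[k] * x[n - k], treating negative indices as zero."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)

L = 8            # adaptive filter length
g = [0.0] * L    # filter coefficients, adapted in place
mu = 0.01        # LMS step size

for n in range(len(x)):
    d = fir(x, room, n)          # reference signal d[n]
    y = fir(x, g, n)             # adaptive filter output y[n]
    e = d - y                    # error to be minimized
    for k in range(L):           # LMS coefficient update
        if n - k >= 0:
            g[k] += mu * e * x[n - k]

print([round(c, 2) for c in g[:3]])   # leading taps approach the room response
```

After convergence the filter has identified the room response, which is the information the cancellation filters need.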
Wavefield synthesis: applications
• Example of application: recreating the soundfield of a different environment
Wavefield synthesis: applications
• improving the acoustics of an auditorium by acquiring the sound and re-synthesizing it in the same environment
Wavefield synthesis
• Using WFS together with beamforming it is possible to silence some areas
AMBISONICS
Ambisonics - overview
• Originally developed in the ’70s by Gerzon, Barton and Fellgett as a novel multi-channel recording/rendering technique with the aim of enhancing immersiveness
• With Ambisonics the sound is encoded together with its directional components in a vector, as a set of spherical harmonics
• Similarly to holophonic systems, we can apply a transform matrix (decoder) to such a vector in order to produce the signals that feed a speaker array surrounding the environment
Ambisonics - overview
[Diagram: Soundfield microphone → W, X, Y (Z) coding → recording → decoding and conversion to N channels (2-ch. LR, 2-ch. headphone, 3/2, …); listener at the sweet spot]
Pros & cons
• Main advantages
  • Simple recording/coding method: we need multiple microphones (at least 4 for 3D, 3 for 2D) placed at the same point at the center of the acoustic scene
  • Independent coding/decoding methods: we can decode the sound to produce a stereo signal, or a signal compatible with a standard 3/2 or with an N-channel system (N = 4, 6, 8, …)
• Main disadvantages
  • Sources can only produce planar wavefronts (fundamental assumption of Ambisonics)
    • Not restrictive, as any wavefield can be modeled as a combination of planar wavefronts
  • The rendered wavefield must be a combination of planar wavefronts (i.e. speakers must be sufficiently far from the listener)
    • Not a negligible problem for small environments such as cars
Ambisonics
• Ambisonics systems are based on transducers (mics) that produce a sampling of directional components of
  • order zero (pressure microphones)
  • order one (pressure-gradient microphones)
  • ...
• Higher-order components correspond to a sampling of the acoustic field with many more (at least 9) super-directional microphones
Ambisonics
• Directional characteristics of the wavefield are reconstructed by summing the spherical components of the field for orders m = 0, 1, 2, each acquired with a microphone having similar directivity characteristics
  • order 0: from an omnidirectional microphone
  • order 1: from a microphone with a figure-of-eight response
Coding formats
• A-format – used for coincident directional microphones (symmetrically oriented – Soundfield)
• B-format – developed for studio equipment (most widespread)
  • Used also when microphones have directional characteristics similar to the spherical harmonics
  • Signals are coded into W, X, Y, Z
• C-format – for transmission
• UHJ-format – for multi-channel coding (compatible with mono and stereo formats)
• D-format – for decoding and rendering
Coding formats
• When using a microphone for each harmonic component, the signals W, X, Y, Z (B-format) can be encoded into any other format
• Z is omitted in 2D systems
Theory
• Hypotheses
  • Sources are considered point sources
  • Sources emit planar wavefronts
• Notation
  • We adopt cylindrical coordinates
  • Let us consider (for simplicity) the 2D case (z = 0)
  • Let k be the wave vector, of magnitude k = 2π/λ (the wave number) and same orientation as ψ: k = [kx, ky] = [k cos ψ, k sin ψ]
  • Let r = (r, φ) be the space vector
• The soundfield produced by a planar wave is P(r, φ) = Pψ e^{j k·r} = Pψ e^{jkr cos(ψ−φ)}, where Pψ is the pressure
Theory
• The pressure of the reference field at a specific point is

$$
P_R(r,\varphi) = P_\psi\, e^{jkr\cos(\psi-\varphi)}
$$

• Goal: reproduce the planar wave relative to the listening reference point by replacing the real source with a rendering system made of N speakers symmetrically arranged around the listening point
[Figure: plane wave from a source along direction ψ reaching the listening point at r = (r, φ)]
Theory – proof (equations shown on the slides)
Rendering
• Each speaker generates a wavefield
• At the listening point we have the superposition of all speaker contributions
• To reproduce the original field we need this superposition to match the reference field
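Written out in the notation of the previous slides (our reconstruction of the equations that appear as images): each speaker at angle φ_n, assumed far enough to radiate a plane wave, contributes P_n e^{jkr cos(φ_n−φ)} at the listening point, so

```latex
P_A(r,\varphi) \;=\; \sum_{n=1}^{N} P_n\, e^{jkr\cos(\varphi_n-\varphi)}
\qquad\text{and we require}\qquad
P_A(r,\varphi) \;=\; P_R(r,\varphi) \;=\; P_\psi\, e^{jkr\cos(\psi-\varphi)}
```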
Rendering
• The planar wave is characterized by a constant pressure and a phase component that can be expanded using a Fourier-Bessel series
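The expansion in question is the Jacobi-Anger identity (stated here explicitly, since the slide shows it only as an image), with J_m the Bessel function of the first kind of order m:

```latex
e^{jkr\cos(\psi-\varphi)}
\;=\; J_0(kr) \;+\; 2\sum_{m=1}^{\infty} j^{m} J_m(kr)\,\cos\!\big[m(\psi-\varphi)\big]
```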
Synthesis phase
• Truncate the series, as we have N speakers only
• Matching conditions
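Equating the truncated expansions of the synthesized and reference fields term by term (our reconstruction of the conditions shown on the slide) gives 2M+1 conditions on the N speaker signals, solvable when N ≥ 2M+1:

```latex
\sum_{n=1}^{N} P_n \cos(m\varphi_n) = P_\psi \cos(m\psi),
\qquad
\sum_{n=1}^{N} P_n \sin(m\varphi_n) = P_\psi \sin(m\psi),
\qquad m = 0,\dots,M
```

(m = 0 gives one condition, each m ≥ 1 gives two, hence 2M+1 in total.)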
Facts
• As N tends to infinity
  • the order-N approximation tends to an exact matching, so the synthesized wavefront is exactly the same as the reference one
  • the sweet spot expands
Vector-form expression for analysis
• We decoupled the direction information of the plane wavefront (ψ) from the information on the soundfield (φ)
• The information on the spatial distribution of the plane wavefront is all in c
• PR(r) can be expressed in the vector form PR(r) = Pψ cᵀh
Vector-form expression for analysis – proof

$$
\begin{aligned}
c^{T}h &= J_0(kr) + \sqrt{2}\cos\psi \cdot j\sqrt{2}\,J_1(kr)\cos\varphi + \sqrt{2}\sin\psi \cdot j\sqrt{2}\,J_1(kr)\sin\varphi + \dots \\
&\quad + \sqrt{2}\cos m\psi \cdot j^{m}\sqrt{2}\,J_m(kr)\cos m\varphi + \sqrt{2}\sin m\psi \cdot j^{m}\sqrt{2}\,J_m(kr)\sin m\varphi + \dots \\
&= J_0(kr) + 2\sum_{m=1}^{\infty} j^{m} J_m(kr)\left(\cos m\psi\cos m\varphi + \sin m\psi\sin m\varphi\right) \\
&= J_0(kr) + 2\sum_{m=1}^{\infty} j^{m} J_m(kr)\cos\!\left[m(\varphi-\psi)\right]
\end{aligned}
$$
![Page 42: 3D Audio: pt3](https://reader031.fdocuments.net/reader031/viewer/2022021416/5868d4b71a28ab27408c1d42/html5/thumbnails/42.jpg)
Vector-form expression for analysis
• The first components of vector c are multiplied by Pψ
• These are shown for m = 0 (zero order) and m = 1 (first order)
Vector-form expression for analysis
• In the acquisition phase (mics) we need to identify the coefficients cm of vector c
  • In principle we need m = 0, …, +∞; in practice, we need to truncate the expansion
• In order to measure cm we would need mics whose directivity matches cos(mψ) and sin(mψ); except for m = 0 (omnidirectional mic) and m = 1 (bi-directional mics), for m ≥ 2 there are no matching mics
• In practice, we stop at order 1 or 2
• For higher orders we need to determine the coefficients cm mathematically
Vector-form expression for synthesis
• In the synthesis phase:
Vector-form expression for synthesis
• Let Pn(r, φn) be the soundfield produced by the n-th speaker
• We have

$$
C = [\,c_1\ c_2\ \dots\ c_N\,], \qquad A^{T} = [\,P_1\ P_2\ \dots\ P_N\,]
$$

$$
P_A(r) = \sum_{n=1}^{N} P_n(r,\varphi_n) = \sum_{n=1}^{N} P_n\, c_n^{T}h = A^{T}C^{T}h
$$
Coding
• The recorded sound contains the whole information required for the coding
• So the Ambisonics signal contains the whole information (magnitude and direction) of the sources
• E.g.: the 2D Ambisonics signal for a source Pψ oriented along ψ is given by b = Pψ c
Notes
• In Ambisonics recordings the order M is extremely limited
  • The typical choice is M = 1, but sometimes M = 2 or M = 3 are used for experimental reasons
• Truncation implies a limited listening region: as M increases, the sweet spot grows
• Truncation implies limited localization accuracy
• The notation (W, X, Y, U, V) is typical of the 2D case; the harmonics along z are, in fact, zero
  • m = 1 implies Z = 0
  • m = 2 implies R = S = T = 0
Decoding
• We start from the matching conditions

$$
P_A(r) = A^{T}C^{T}h, \qquad P_R(r) = P_\psi\, c^{T}h
$$

and we want

$$
P_A(r) = P_R(r) \quad\Rightarrow\quad A^{T}C^{T}h = P_\psi\, c^{T}h
$$

• We can define bᵀ = Pψ cᵀ, so that the matching condition becomes bᵀ = AᵀCᵀ
Decoding
• We thus have bᵀ = AᵀCᵀ, where Aᵀ = [P1 P2 … PN]
• C depends on the speaker positions; the only unknown is A, the vector of the magnitudes that drive the speakers
• As C is not square, we need to resort to pseudo-inversion (least-squares solution)
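The pseudo-inversion step can be sketched with NumPy; the uniform circular layout, the order M = 1, and the [1, √2 cos, √2 sin] component convention are illustrative assumptions, not taken from the slides:

```python
import numpy as np

# Ambisonics decoding by pseudo-inversion (2D, order M = 1, illustrative).
# Column n of C holds the circular-harmonic coefficients c_n of speaker n.

N = 8                                          # number of speakers
phi = 2 * np.pi * np.arange(N) / N             # uniform circular layout

C = np.vstack([np.ones(N),
               np.sqrt(2) * np.cos(phi),
               np.sqrt(2) * np.sin(phi)])      # shape (2M+1, N)

psi = np.deg2rad(30.0)                         # plane-wave direction to render
P_psi = 1.0
b = P_psi * np.array([1.0,
                      np.sqrt(2) * np.cos(psi),
                      np.sqrt(2) * np.sin(psi)])   # b = P_psi * c

# b^T = A^T C^T  <=>  C A = b; C is not square, so use the pseudo-inverse
A = np.linalg.pinv(C) @ b                      # least-squares speaker gains

print(np.allclose(C @ A, b))                   # the gains re-encode exactly to b
```

For this uniform layout the pseudo-inverse reduces to the closed-form gains (1/N)(1 + 2 cos(ψ − φ_n)).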
Speaker placement
• Speaker signals are determined by the Ambisonics weights according to their spatial location
• When speakers are placed on a circle (2D case) we have a closed-form expression
• Likewise, a closed-form expression can be found for speakers placed on a spherical surface (3D case)
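For completeness, a commonly quoted closed form for N identical speakers uniformly placed on a circle at angles φ_n (basic mode-matching decoder, order M, plane wave from direction ψ; our addition, not read off the slide):

```latex
P_n \;=\; \frac{P_\psi}{N}\left[\,1 + 2\sum_{m=1}^{M}\cos\!\big(m(\psi-\varphi_n)\big)\right]
```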
Remarks
• The coding (recording) format (W, X, Y (Z), …) is independent of the number and configuration of the rendering speakers
• A large number of equidistant, uniformly distributed speakers guarantees a larger sweet spot and a more stable localization of sound
• Many speakers guarantee better realism
• Asymmetric speaker placement requires a different weighting of the components
• Speaker characteristics must be as similar as possible
Remarks
• For a good 2D rendering we need at least 4 speakers
• For correct localization we need 5 or more speakers
• For 3D Ambisonics (complete sphere) we need at least 6-8 speakers
• There are currently systems based on over 128 speakers
Comparative remarks
• Both holophonic systems (wavefield synthesis) and Ambisonics attempt a complete characterization of the soundfield, but they profoundly differ in their approach
• We compare a holophonic system with S mics/speakers (N = S) against an Ambisonics system of order M (2M + 1 components)
• Holophony is based on spatial sampling of the wavefield
  • Spatial aliasing
  • For small S, due to aliasing, the error is uniformly distributed over the volume
  • Increasing S, the error decreases quickly
• Ambisonics truncates the spherical-harmonic series
  • An approximation that narrows the sweet spot
  • For low M, the error is minimal at the center of the sweet spot and increases as we move away from it
  • Increasing M, the sweet spot grows and the error decreases
Comparative remarks
• For small S, Ambisonics works better than the holophonic approach
• For high orders they are almost equivalent
• The difference is in the microphone acquisition:
  • In Ambisonics, for high orders we need specific directional mics (which may not even exist)
  • In holophony we can simply increase the number of omnidirectional and figure-of-eight mics
References
• P. Fellgett, “Ambisonics. Part one: general system description”, Studio Sound, pp. 20-22 & 40, August 1975.
• M.A. Gerzon, “Periphony: With-Height Sound Reproduction”, JAES, Vol. 21, No. 1, Jan-Feb 1973.
• M.A. Gerzon, “The Design of Precisely Coincident Microphone Arrays for Stereo and Surround Sound”, AES preprint L-20, Convention 50, March 1975.
• M.A. Gerzon, “Ambisonics. Part two: Studio Techniques”, Studio Sound, pp. 24-30, October 1975.
• M.A. Gerzon, “Design of Ambisonic Decoders for Multispeaker Surround Sound”, 58th Audio Engineering Society Convention, November 1977.
• D.G. Malham, A. Myatt, “3-D Sound Spatialization using Ambisonic Techniques”, Computer Music Journal, Vol. 19, No. 4, pp. 58-70, 1995.
• J.S. Bamford, “An Analysis of Ambisonic Sound Systems of First and Second Order”, Master's thesis, University of Waterloo, Ontario, Canada, 1995.
• Jérôme Daniel, “Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia”, PhD thesis, CCETT / France Telecom, 1999.
Microphone acquisition
Microphones
• The microphone is a transducer that produces an output voltage v(t) proportional to some scalar or vector acoustic quantity g(t)
• Microphones can be classified according to the law of proportionality between the acoustic and the electric variable:
  • Microphones sensitive to the (scalar) acoustic pressure
    • Non-directional (omnidirectional) microphones
  • Microphones sensitive to the (vector) pressure gradient
    • Directional microphones, whose characteristics depend on the order m of the gradient
Directional characteristics
• Radiation diagrams
  a) Pressure microphone (omnidirectional)
  b) Low-order pressure-gradient microphone (directional)
  c) High-order pressure-gradient microphone (very directional)
• NB: radiation diagrams depend on the frequency; in general we have different diagrams at different frequencies
Combining radiation diagrams
• It is possible to construct alternative radiation patterns by
  • combining and connecting several microphone capsules (possibly with different characteristics) within the same device
  • adopting particular physical geometries around the capsule(s)
• E.g. a cardioid pattern can be obtained by combining an omnidirectional capsule with a figure-of-eight capsule in the same enclosure
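As a numerical check of the cardioid example (equal-weight sum of an omnidirectional and a figure-of-eight response; the 0.5/0.5 weighting is the textbook choice, assumed here):

```python
import math

# A cardioid polar pattern as the sum of an omnidirectional capsule (constant
# response) and a figure-of-eight capsule (cos(theta) response), equal weights:
#   cardioid(theta) = 0.5 + 0.5 * cos(theta)

def omni(theta):
    return 0.5

def figure_eight(theta):
    return 0.5 * math.cos(theta)

def cardioid(theta):
    return omni(theta) + figure_eight(theta)

print(cardioid(0.0))                      # 1.0: full sensitivity on-axis
print(round(cardioid(math.pi), 6))        # 0.0: null at the rear
print(round(cardioid(math.pi / 2), 2))    # 0.5: half sensitivity at 90 degrees
```

The rear null is exactly where the figure-of-eight lobe cancels the omnidirectional contribution.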
Acquisition techniques
• Three different strategies
  • Directional microphones in close proximity (coincident)
    • intensity depends on orientation only
    • referred to as intensity difference, ΔI
  • Omnidirectional microphones placed at a distance of tens of centimeters from one another
    • referred to as time difference, ΔT
    • as the microphones are omnidirectional, the intensity difference is negligible
  • Mixed: directional microphones (cardioid or figure-of-eight) placed at a certain distance from one another
    • we have both a ΔT and a ΔI
Intensity stereophony
• XY techniques (intensity stereophony)
  • two directional microphones at the same place, typically pointing at an angle of 90° or more to each other
  • A stereo effect is achieved through differences in sound pressure level between the two microphones
  • Due to the lack of time-of-arrival differences / phase ambiguities, the sonic character of XY recordings is generally less spacious and has less depth compared to recordings employing an AB setup
  • When the microphones are bidirectional and placed facing ±45° with respect to the sound source, the XY setup is called a Blumlein pair
    • The sonic image produced by this configuration is considered by many authorities to create a more realistic, almost holographic soundstage
Intensity stereophony
(a) ΔI pair (XY config.) with cardioid mics
(b) ΔI pair (XY config.) with figure-of-eight mics (Blumlein pair)
(c) ΔI pair with one cardioid and one figure-of-eight (Mid-Side configuration)
Time-of-Arrival Stereophony
• A-B techniques (time-of-arrival stereophony)
  • two parallel omnidirectional microphones some distance apart, capturing time-of-arrival stereo information as well as some level (amplitude) difference information, especially if employed in close proximity to the sound source(s)
  • At a spacing of about 50 cm, the time delay for a signal arriving from the side, reaching first one and then the other microphone, is approximately 1.5 ms (1 to 2 ms)
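A back-of-the-envelope check of the quoted delay (the helper name is our own):

```python
# Time-of-arrival difference for an A-B pair: a sound arriving from the side
# travels the full microphone spacing d before reaching the far microphone,
# so the maximum delay is d / c.

C = 343.0  # speed of sound in air, m/s

def max_itd_ms(spacing_m):
    """Maximum time-of-arrival difference (ms) for a spaced omni pair."""
    return 1000.0 * spacing_m / C

print(round(max_itd_ms(0.50), 2))   # ~1.46 ms, matching the 1-2 ms quoted above
```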
Acquisition techniques
(e) ΔT pair (A-B) with omnidirectional mics
Near-coincident technique: mixed stereophony
• The ORTF stereo technique of the Office de Radiodiffusion Télévision Française (now Radio France) calls for a pair of cardioid microphones placed 17 cm apart at a total angle of 110° between the microphones which, according to Eberhard Sengpiel, results in a stereophonic pickup angle of 96°
• Notice that the 17 cm spacing has nothing to do with the human ear distance: the recorded signals are generally intended for playback over stereo loudspeakers, not headphones
Acquisition techniques
(f) ΔT-ΔI pair with cardioids (ORTF – Office de Radiodiffusion Télévision Française)
Examples of ORTF rigs
Acquisition techniques
• For multi-channel applications
  • ΔT and mixed techniques use one mic for each reproduction channel (holophonic)
  • ΔI (coincident) techniques can use a smaller number of mics than the number of speakers: the signals fed into the speakers are obtained as linear combinations of the signals captured by the mics, through a coding matrix

$$
c_i = \sum_{j=1}^{M} g_{ij}\, m_j
$$

where the g_ij are constants, c_i is the signal to feed to the i-th speaker, m_j is the signal captured by the j-th mic, and M is the number of mics
Surround Atmo-Mikrofon
• The “Surround-Atmo-Mikrofon” is composed of 4 cardioid mics
• Widely used to record concerts
Multi-channel Mid-Side
• Mid-Side (MS)
• It uses two cardioid mics (Mf and Mb) and one figure-of-eight mic (S)
• It is possible to build a 5-channel system (with C center – L left – R right – Ls left surround – Rs right surround)
Multi-channel Mid-Side
• The channels are obtained as sums and differences of the mic signals (Mf ± S for the front pair, Mb ± S for the surround pair), scaled by constant gains
Microphone arrays • Discrete arrays
![Page 74: 3D Audio: pt3](https://reader031.fdocuments.net/reader031/viewer/2022021416/5868d4b71a28ab27408c1d42/html5/thumbnails/74.jpg)
Discrete microphone arrays
• Sets of discrete microphones with various geometries and appropriate directivity patterns
![Page 75: 3D Audio: pt3](https://reader031.fdocuments.net/reader031/viewer/2022021416/5868d4b71a28ab27408c1d42/html5/thumbnails/75.jpg)
3D arrays • Soundfield microphone
![Page 76: 3D Audio: pt3](https://reader031.fdocuments.net/reader031/viewer/2022021416/5868d4b71a28ab27408c1d42/html5/thumbnails/76.jpg)
High-order arrays
• Trinnov’s 3D array with 24 microphones arranged in a pseudo-random fashion
3rd-order directivity responses
High Order Ambisonics (HOA)
• France Telecom also developed spherical arrays with 32 microphones (4th order)
Ambisonics and WFS acquisition and rendering
Ambisonics array
• The directional 3D diagram of first-order Ambisonics is made of an omnidirectional figure and three figure-of-eights, one per axis
  • One pressure microphone (omnidirectional)
  • Three pressure-gradient microphones
• The signals coming from the mics are those of the B-format
  • Omnidirectional mic: the W signal
  • Directional mics: the X, Y and Z signals
Soundfield microphone
• The Ambisonics acquisition system is a coincident array; in order to avoid mic cluttering, we can use the so-called Soundfield microphone, of type ΔI
• In the first-order approximation, the microphone is made of 4 cardioid directional cartridges placed on a tetrahedron (one mic per direction)
• The Soundfield mic picks up 4 signals: RF (right-front), RB (right-back), LF (left-front) and LB (left-back)
• This set of signals is called the A-format
Soundfield microphone
• The B-format can be derived from the A-format
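The conversion matrix appears on the slide only as an image. In one common convention, where the LF and RB capsules point upward and RF and LB point downward (overall scaling factors vary between implementations), it reads:

```latex
\begin{aligned}
W &= LF + LB + RF + RB\\
X &= LF - LB + RF - RB\\
Y &= LF + LB - RF - RB\\
Z &= LF - LB - RF + RB
\end{aligned}
```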
Limits and other formats
• Because of the discrete distribution of the microphones, there is an upper bandwidth limit on the 4 components of the B-format
• From [W X Y Z] it is possible to derive other formats
• With the so-called UHJ encoding matrix it is possible to obtain various broadcast formats from the B-format:
[Diagram: W, X, Y, Z → UHJ encoder → 4-channel (L, R, T, Q); 3-channel (L, R, T, with T band-limited to 5 kHz); 2-channel super stereo; 2-channel stereo; 2-channel summed]
Second-order Ambisonics
• First-order Ambisonics with coincident microphones determines a panning law that is quite far from the ideal
• In order to better approximate ideal directivity patterns we can adopt a second-order approximation
• The second-order coincident array is made of 12 cardioid or hyper-cardioid microphones arranged on a dodecahedron
Problems and limits
• Coincident microphone arrays are used for stereo or surround sound recording, but some factors affect the correct reconstruction of the spatial acoustic image
  • The distance between microphones is 3-10 cm instead of less than 5 mm
  • Microphone directivity diagrams are not ideal (especially at high frequencies)
Altering directivity
• In the first-order case we can construct a transfer-function matrix G to perform beamforming, so that the new signals X′, Y′, Z′, W′ will exhibit the desired polar diagram
Altering directivity
• Correction of the tetrahedron response:
  • G1(z) for the omnidirectional signal W
  • G2(z) for the figure-of-eight signals X, Y, Z