Audio Networking: Technical challenges and possibilities for distributed digital sound production at Otago

CSIS Seminar series, July 2010

Chris Edwards

Department of Information Science, University of Otago


Motivation and Background

Music Department’s new $1M SSL mixing console

New Zealand Music Industry Centre (NZMiC)

KAREN high-capacity network connectivity

Interesting technical and creative possibilities:

Remote (live) mixing

Remote recording (live or layered multi-track)

Distributed real-time performance (live and recorded)

Internet broadcast/multicast/streaming

Asynchronous production tasks, e.g. (re)mixing, mastering, film score composition, with very short turnaround


The SSL Console

Solid State Logic model C200 HD

Digital control surface, array of common per-channel controls

Signal level metering

Transport control, timecode display

Full automation (programmable motion of faders and other controls)

DAW control (mouse, keyboard, display)

64 dedicated control strips, pageable for even more

(and it can play “Pong”!)


Not just a pretty face: behind the console

Dual C-SB Stageboxes

48 high-quality mic inputs each

Gain and pre-amp behaviour remotely controllable

~2 km reach over single-mode optical fibre

Portable; plans for Marama Hall, Stadium, Town Hall

Centuri core

Signal routing and control surface I/O

Storage

DSP modules

Outboard hardware effects processors

DAW (a Power Macintosh with MADI card)

KAREN connectivity


Albany Street Installation

[Diagram: two C-SB stage units (48 mic input channels each) connect over single-mode optical fibre (2 km reach) to the SSL Centuri core, which ties together the SSL console, SSL DSP units, outboard (O/B) effects units, the DAW (Mac) via MADI (Multichannel Audio Digital Interface), and the network.]


KAREN

Kiwi Advanced Research and Education Network

Operated by REANNZ (Research and Education Advanced Network New Zealand Ltd), a Crown-owned company

10 Gb/s generally available between participating institutions

16 national points-of-presence (PoPs)

International links to Australia, North America

and, via these, to Asia and Europe


KAREN NZ Network Map

Source: http://www.karen.net.nz/topology/


Digital Audio Basics

Discretised, quantised representation of continuous analog signal

Signal is represented as a stream of numbers

Driven by hardware clock (typically a crystal oscillator)

One sample recorded/played every clock cycle

Fs is sampling frequency

Typ. transmitted digitally using PCM (pulse-code modulation)

[Figure: amplitude vs. time plot of a continuous waveform and its discrete samples; a toy numeric version is sketched below]
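To make sampling and quantisation concrete, here is a toy C sketch (mine, not from the talk; all values illustrative): one cycle of a 1 kHz sine sampled at Fs = 48 kHz and quantised to 24-bit signed integers, i.e. the stream of numbers just described.

/* Toy illustration: sample one cycle of a 1 kHz sine at Fs = 48 kHz
   and quantise each sample to a 24-bit signed integer (PCM). */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define PI 3.14159265358979323846

int main(void)
{
    const double Fs = 48000.0;                 /* sampling frequency       */
    const double f  = 1000.0;                  /* signal frequency         */
    const int32_t full_scale = (1 << 23) - 1;  /* 24-bit signed maximum    */

    for (int n = 0; n < 48; n++) {             /* one cycle = Fs/f samples */
        double x = sin(2.0 * PI * f * n / Fs);         /* analog value     */
        int32_t q = (int32_t)lrint(x * full_scale);    /* quantised sample */
        printf("n=%2d  %+.6f  ->  %+9d\n", n, x, q);
    }
    return 0;
}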


Digital Audio Basics (2)

Typical audio sampling frequencies are 10s–100s of kHz

Human hearing tops out around 16–20 kHz

Nyquist limit (highest reproducible frequency) is Fs/2

Fs ∈ {44.1 kHz (CD-DA), 48 kHz, 96 kHz, 192 kHz}

Higher Fs means more resampling options, better resampling quality

Higher Fs also makes life easier for the low-pass filters

Oversampling is also commonly used

24-bit signed integer precision common in studio work

~144 dB theoretical dynamic range (≈6 dB per bit); real converters reach ~120 dB, limited by the analog noise floor


Different ways of being wrong: quantisation error, jitter et al.

Quantisation means approximation, and approximation means error

Amplitude (bits per sample)

Time (sampling frequency)

More bits and/or faster clock: better approximation

Often a worthwhile trade-off (no generation loss, DSP, transmission)

Hardware clocks are not perfectly stable

Non-uniformity classified as follows:

• Jitter (short-term variations: cycle-to-cycle)

• Wander (medium-term)

• Drift (long-term)

Measurement: eye diagrams, Allan variance (estimator sketched below)

A stable clock is not necessarily an accurate clock!
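Since Allan variance came up as a clock-quality measure, here is a minimal C sketch (mine; the readings are made-up numbers) of the textbook two-sample estimator over fractional-frequency readings y[k], each averaged over the same interval tau:

/* Textbook (non-overlapping) Allan variance:
   sigma_y^2(tau) = <(y[k+1] - y[k])^2> / 2 */
#include <stdio.h>

double allan_variance(const double *y, int n)
{
    double sum = 0.0;
    for (int k = 0; k < n - 1; k++) {
        double d = y[k + 1] - y[k];
        sum += d * d;
    }
    return sum / (2.0 * (n - 1));
}

int main(void)
{
    /* Hypothetical fractional-frequency readings (dimensionless). */
    double y[] = { 1.2e-9, 0.9e-9, 1.1e-9, 1.4e-9, 1.0e-9, 0.8e-9 };
    int n = (int)(sizeof y / sizeof y[0]);
    printf("Allan variance: %g\n", allan_variance(y, n));
    return 0;
}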


Transmitting Digital Audio (within the studio)

Example protocols:

AES3 (AES/EBU)

MADI

ADAT Optical Interface

S/PDIF

Local digital audio transmission is generally synchronous.

Must avoid clock drift to avoid buffer over-/under-runs


A word about word clock

Synchronous operation means having a common reference clock

Word clock is a dedicated digital signal operating at Fs

Word clock ≠ timecode; it’s a frequency reference

Master clock signal fanned out to slave devices via dedicated co-axial cable

Clock can also be sent as part of audio data

In-band bit clock (“self-clocking” signal)

Used in AES3, ADAT, S/PDIF

Typically uses bi-phase mark coding or similar (sketched below)

Both approaches assume complete control of the physical medium.

Digital audio is surprisingly demanding on clock quality.
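A minimal sketch of the bi-phase mark idea (my illustration; the bit pattern is arbitrary): every bit cell begins with a transition, and a '1' adds a second transition mid-cell, so a receiver can recover the clock from the transitions themselves.

/* Bi-phase mark coding: transition at every cell boundary;
   a '1' gets an extra mid-cell transition ("self-clocking"). */
#include <stdio.h>

int main(void)
{
    const int bits[] = { 1, 0, 1, 1, 0 };
    int level = 0;                      /* current line level       */

    for (int i = 0; i < 5; i++) {
        level ^= 1;                     /* cell-boundary transition */
        int first_half = level;
        if (bits[i])
            level ^= 1;                 /* extra transition for '1' */
        printf("bit %d -> %d%d\n", bits[i], first_half, level);
    }
    return 0;
}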


Transmitting Digital Audio (beyond the studio)

“Pro” networked-audio protocols generally operate at OSI Layer 2

Data Link layer, i.e. not routeable, local area only

Ethernet is popular, unsurprisingly

Examples:

• AVB (IEEE 1722)

• AES47 (IEC 62365, AES3 over ATM)

• AES51 (AES3 over ATM over Ethernet)

• CobraNet

• EtherSound

Some OSI Layer 3/4 protocols do exist:

NetJACK (Open Source)

Livewire

[Diagram: protocol stack. 4. Transport: TCP, UDP; 3. Network: IP; 2. Data Link: Ethernet; 1. Physical: UTP]


(Some) challenges for distributed digital audio production

1. Audio hardware clock synchronisation

2. Audio data delivery (network service quality)

Network capacity (“bandwidth”)

Latency (packet delivery time, i.e. delay)

Trade-offs between these

Quality of Service (QoS) assurance (per-packet priority)

Network out[r]ages

3. Timecode and transport control

4. Interoperability in general

Protocols

Framing

Data representation and encoding


Challenge 1: Hardware Clock Drift

Unsynchronised audio hardware clocks will drift

Drifting too far will lead to buffer over-/under-runs

Unacceptable audio glitches (drop-outs, pops/clicks)

Word clock operates at the physical level

Running co-ax to Auckland, London, Seattle not feasible!


Hardware clock synchronisation: some possible solutions

Discipline the audio clock using a common external source

Internet Network Time Protocol (NTP) (Mills, 1980s–)

Timecode embedded in application-level packets

GPS PPS timekeeping signal

Dynamically resample audio at each node

Solution should be low-jitter:

More than a few hundred picoseconds may be unacceptable

Jitter may manifest as white noise or more complex distortions

At best, jitter undermines SNR


Reasons for leaning towards GPS

Resampling degrades quality; avoid it if possible.

Pro audio hardware generally has word-clock input anyway.

A hardware solution would be convenient.

NTP doesn’t claim especially high accuracy

Approx. ±10 ms for general use on the Internet

Personal computer hardware clocks are not especially accurate or stable

NTP is primarily concerned with absolute timekeeping.

We care more about consistent frequency.

NTP assumes symmetric network paths (perhaps not a problem when used only as a frequency reference?)

NTP’s clock slewing behaviour might be disruptive if applied to audio AD/DA converters?

NTP experts recommend using GPS anyway! (Shalunov, 2005)

GPS is globally available and uses dedicated radio signalling.

The GPS satellite network is kept closely in sync, making it ideal for a single-master-clock approach.

Pros and cons to be investigated further…!


GPS: The Global Positioning System


“Wherever you go, there you are.”—anon.


The Global Positioning System (GPS)

Basically a distributed high-precision time-keeping and message broadcasting system

24 satellites (plus spares!) in medium Earth orbit (20,000 km altitude)

6 orbital planes with 4 satellites each

4 must be “visible” to receiver to get precise position.

True position of each satellite is known/predictable (the ephemeris).

Satellites broadcast time-stamped messages.

Delay in receiving timestamped message determines distance from satellite.

Intersection of distances pinpoints location in space

GPS is also used to help other satellites know where (and when) they are.


How GPS location works

GPS uses distance (from time) rather than direction:

Receiver uses delay in receiving each message to calculate distance to the satellite that sent it.

Requires very precise timekeeping, as messages travel at/near light speed.

Relativistic effects must be accounted for!

1D position (i.e. on a line) requires two distance measurements.

2D (on a plane) requires three distance measurements (circles).

3D (in space) requires four distance measurements (spheres).

Earth’s sphere could be used to provide the fourth distance (provided you are on the surface).

Would still require four readings for altitude.

(essential if flying or in space)

Using four measurements improves accuracy as well.



GPS in one dimension

[Diagram: Satellite 1 and Satellite 2 on a line, with distances r1 and r2 meeting at “You Are Here”]

• Satellite positions are known
• Messages are time-stamped, so the time of sending is known
• The delay in receiving a message can be measured
• Distance is proportional to delay
• The intersection of distances determines the actual position (see the sketch below)
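Here is a toy C version of that one-dimensional picture (my sketch; positions and delays are made-up numbers): each delay becomes a distance, and the second satellite's distance picks between the two candidate positions.

/* 1-D GPS toy: two transmitters at known positions; delay -> distance;
   intersection of the two distances gives the position. */
#include <math.h>
#include <stdio.h>

#define C 299792458.0                        /* signal speed, m/s */

int main(void)
{
    double x1 = 0.0, x2 = 100000.0;          /* transmitter positions (m) */
    double t1 = 2.0014e-4, t2 = 1.3343e-4;   /* measured delays (s)       */
    double r1 = C * t1, r2 = C * t2;         /* distances from delays     */

    /* Satellite 1 allows two candidates; satellite 2 disambiguates. */
    double a = x1 + r1, b = x1 - r1;
    double x = (fabs(fabs(a - x2) - r2) < fabs(fabs(b - x2) - r2)) ? a : b;
    printf("You are here: x = %.0f m (r1 = %.0f m, r2 = %.0f m)\n", x, r1, r2);
    return 0;
}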


GPS for time-keeping

A PPS (pulse-per-second, i.e. 1 Hz) signal is available externally on many GPS receivers.

Can be used for precise timekeeping, even in remote areas.

Once location is determined and locked in, even higher timing accuracy is possible.

Can derive higher frequencies (for word clock) using frequency synthesis.


Proposed Scheme

Use globally available GPS PPS signal to discipline local audio hardware clocks

Uniform frequency (not absolute time) is the critical thing.

Avoid clock drift across sites, to avoid buffering errors.

Already been done! Shera (1998):

Ham radio application, originally

Voltage-controlled crystal oscillator (VCXO)

PLL-based regulation (phase-locked control feedback loop, de Bellescize (1932))

Temperature-sensitive (even with thermostatic “oven”)

27 MHz master clock is common in multimedia systems

Because of NTSC television timings, AFAICT

Video sync input required for SSL Centuri (implications?)


Shera (1998): block diagram


u-blox LEA-6T

GPS receiver module for precision timing applications

Position-lock for greater timekeeping accuracy

Programmable output clock pulse, 1/60 Hz to 10 MHz

High sensitivity; useable indoors

15 ns accuracy achievable

Ideally would simply connect LEA-6T clock output to audio word clock input


Innovative Integration PCIe-Timing card

PCIe expansion card

GPS receiver for clock discipline

Multiple programmable digital clocks

1560 kHz .. 1 GHz output

0.2 ps jitter specification


How a PLL works (analogy: two cars on a race track; 1 lap = 1 clock cycle)

“Master” reference car and following “slave” car

Lead or lag is phase difference

Measure once per lap or continuously

Constant phase difference means same frequency

If gaining, slow down slightly

If lagging, speed up slightly

Frequency is the derivative of phase!
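A toy software PLL in C following the race-track analogy (my sketch; the gains and starting frequency are arbitrary assumptions, not a tested design): measure the accumulated lead/lag once per "lap" and steer the local frequency with a proportional-integral correction.

/* Toy PLL: discipline a slightly-fast local oscillator to a 1 Hz
   reference (e.g. GPS PPS) by nulling the phase difference. */
#include <stdio.h>

int main(void)
{
    double ref_freq   = 1.0;        /* reference "car": 1 lap/s       */
    double local_freq = 1.0003;     /* our oscillator, slightly fast  */
    double phase_err  = 0.0;        /* accumulated lead (+) / lag (-) */
    double integral   = 0.0;
    const double kp = 0.1, ki = 0.01;  /* P and I gains (arbitrary)   */
    const double dt = 1.0;             /* measure once per lap        */

    for (int i = 0; i < 50; i++) {
        phase_err += (local_freq - ref_freq) * dt; /* grows by freq error */
        integral  += phase_err * dt;
        /* Leading? slow down. Lagging? speed up. */
        local_freq -= kp * phase_err + ki * integral;
        printf("lap %2d  phase err %+.6f  local freq %.6f\n",
               i, phase_err, local_freq);
    }
    return 0;
}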


PLL Demo in Pure Data (if time)


The Software Side (in case you don’t know JACK) ☻

JACK = JACK Audio Connection Kit (Paul Davis, ~2000?)

Audio server program providing low latency and sample-accurate sync

Like an Open Source combination of ReWire (inter-process audio), ASIO (low-latency audio I/O) and VST (software plug-ins)

Provides audio routing among software clients and hardware

Clients may be ordinary processes or in-process plug-ins

Originated on Linux, now also runs on Mac OS X, Windows, BSDs

Also can provide network transport over IP (NetJACK)!

Probably an ideal platform for research software development


JACK details (1)

Runs at real-time priority where possible

No additional latency due to JACK itself

mmap()s the system audio buffers

Provides a high-level audio API

Client software requires no audio hardware access code

Various audio back-ends: ALSA, FFADO, Core Audio, PortAudio, etc.

Enables rapid development and portability of audio apps

Client connects to server, registers audio input/output port(s)

Registered clients have their process() callback invoked on demand by the JACK server (minimal client sketched below)

Synchronous execution of all clients

Supports MIDI data streams too; may support video etc. in future
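For flavour, a minimal JACK pass-through client against the real C API (jack_client_open, jack_port_register, jack_set_process_callback); error handling is trimmed to keep the sketch short.

/* Minimal JACK client: one input port copied to one output port.
   The process() callback runs in the server's real-time thread. */
#include <jack/jack.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static jack_port_t *in_port, *out_port;

static int process(jack_nframes_t nframes, void *arg)
{
    jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port,  nframes);
    jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
    memcpy(out, in, nframes * sizeof(jack_default_audio_sample_t));
    return 0;                      /* non-zero would drop us from the graph */
}

int main(void)
{
    jack_client_t *client = jack_client_open("passthru", JackNullOption, NULL);
    if (!client) { fprintf(stderr, "is the JACK server running?\n"); return 1; }

    jack_set_process_callback(client, process, NULL);
    in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE,
                                  JackPortIsInput,  0);
    out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                  JackPortIsOutput, 0);

    jack_activate(client);         /* process() now gets called per period */
    sleep(60);                     /* stay alive for a minute              */
    jack_client_close(client);
    return 0;
}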


JACK details (2)

All audio data represented uniformly as 32-bit IEEE floating-point, normalised to -1.0..+1.0

Provides global transport control and timecode

No multiplexing/interleaving (e.g. stereo, 5.1, etc.) at the JACK level

One port: one channel

Use whatever channel configuration you need

Buffer over-/under-runs (“xruns”) detected and reported by JACK server

Server can disconnect misbehaving clients


JACK details (3)

Audio processing driven by audio hardware

Hardware buffer typically divided in two (double-buffering):

• Software reads from one buffer, writes to the other

One interrupt period to receive input

Two interrupt periods to process and deliver (input and output)

Example timing: 256 frames/period × 2 periods/buffer @ 96 kHz:

(1 frame is all samples across channels taken at one sampling interval)

375 Hz interrupt rate

~5 ms “through” latency

Comparable to sound delay from monitor speakers
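The period arithmetic on this slide, spelled out (a trivial check; nothing assumed beyond the slide's numbers):

/* 256 frames/period, 2 periods/buffer, Fs = 96 kHz. */
#include <stdio.h>

int main(void)
{
    const double fs = 96000.0;
    const int frames_per_period = 256, periods = 2;

    double irq_rate   = fs / frames_per_period;                    /* 375 Hz  */
    double latency_ms = 1000.0 * periods * frames_per_period / fs; /* ~5.3 ms */
    printf("interrupt rate %.0f Hz, through latency %.2f ms\n",
           irq_rate, latency_ms);
    return 0;
}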


JACK buffer management

[Diagram: hardware buffer of frames/period × nperiods; the audio hardware fills one 512-frame period while software reads from and writes to the other, then they swap.]


JACK: Implementation Challenges

(Hard) real-time processing requirements

Also, want non-root users to be able to run JACK and clients

May have only hundreds of microseconds to run all client process() callbacks

Overhead of context switches (e.g. CPU cache invalidation) is significant!

Linux signals proved too slow to be used for JACK IPC.

Current design uses FIFOs.

Client callbacks must of course be RT-safe.

Recording/streaming software must do I/O!


NetJACK

Networking extension to JACK

Technically just another audio back-end

Allows multiple JACK instances to communicate via UDP/IP

Remote (slave) JACK instances run inside the master JACK loop

BUT!: slave instances are generally deaf and mute

No audio clock available; driven by reception of network packets instead

Processing only; no audio I/O (DSP farming)

However:

• Sample-rate conversion exists in code-base for local audio I/O

• CELT lossy codec with packet-loss concealment also available

Might be suitable for use/adaptation for distributed studio work

Large buffer period sizes to handle latency (4096 frames at 96 kHz ≈ 43 ms; enough within NZ?)


NetJACK: Possible modifications

Allow normal audio I/O on NetJACK slave instances

No resampling, so no loss of quality

Could be feasible if hardware clock synch scheme works

Would it require/experience some extra buffering?

• Jitter buffer

• I/O still triggered by audio hardware

Facility to measure and record network latencies

(Local) JACK already accounts for latency throughout the call graph

JACK transport pre-roll can compensate for playback latency


Challenge 2: Data Delivery Quality

Long-range Internet transport is highly variable:

Non-uniform delivery time of packets

Variable bandwidth available

Congestion, traffic-shaping, etc.

Live audio data must be delivered as fast as possible

Buffering generally increases throughput, robustness and jitter-immunity at the expense of latency


Network performance on KAREN

KAREN should provide a good starting point for feasibility studies

Bandwidth aplenty:

Up to 10 Gb/s generally available

• ~10,000 × typical home DSL

Typically under 5% utilisation

Audio: ~600 Mb/s raw for 96 channels of 32-bit audio at 192 kHz (96 × 32 bits × 192 kHz ≈ 590 Mb/s)

Whole-session transfer in < 10 s (in theory)

• 4 minutes × 24 tracks of 24-bit audio @ 96 kHz ≈ 1.7 GB

• Endpoint disk I/O is probably the bottleneck in practice

Interestingly, no QoS facilities


Latency is the big problem

Audio signals must be kept within ~15 ms to seem musically simultaneous

Acoustic and electromagnetic signal propagation is not instantaneous

~3 ms/m for sound waves in air (~330 m/s)

Light (fibre-optic) and electrical signal propagation is typically around 0.7c

~5 ms/1000 km

20–30 ms RTT (round-trip time) observed between Otago and Auckland via KAREN (so 10–15 ms each way)

Worst-case latency is really the important case
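A back-of-envelope check of those propagation figures (the speeds are the slide's approximations):

/* Sound in air vs. signals in fibre at ~0.7c. */
#include <stdio.h>

int main(void)
{
    const double v_sound = 330.0;              /* m/s        */
    const double v_fibre = 0.7 * 299792458.0;  /* ~0.7c, m/s */

    printf("sound over 1 m:     %.1f ms\n", 1000.0 * 1.0 / v_sound);   /* ~3 ms */
    printf("fibre over 1000 km: %.1f ms\n", 1000.0 * 1.0e6 / v_fibre); /* ~5 ms */
    return 0;
}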


The Latency Problem

[Diagram: drums in Dunedin and guitar in Auckland, with a 15 ms network delay in each direction]

1. Drum part provides reference timing

2. Guitar plays in sync with heard drum sound

3. Guitar part sounds late by 30 ms


“I canna change the Laws of Physics”


Observed latencies to international locations via KAREN (source: https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS)

Sydney: ~40 ms RTT

Perth: ~80 ms RTT

Seattle: ~160 ms RTT

North America generally: 200..300 ms RTT

Asia: 300..500 ms RTT

Europe: 300..400 ms RTT

Note: these are averages (“show me the histograms!”)


Equivalent Approx. Distances (cf. propagation of sound waves in air)

Dunedin to Auckland: 5 m

Dunedin to Sydney: 10 m

Dunedin to Seattle: 30 m

Dunedin to Europe: 60 m

d = v × t


International latencies, in musical terms

At 120 BPM (crotchet = 120):

2 beats/s

Asia round-trip ≈ a quarter note (500 ms)

North America round-trip ≈ an eighth note (250 ms)

Australia round-trip ≈ a sixteenth note (125 ms)

Network latency will be a problem for certain applications.


Acoustic demonstrations of delay

Phasing (comb filtering) 0.02..15 ms

Stereophonic (Haas effect) shifts ~10-50 ms

Distinct echoes ~50+ ms


Synchronisation vs. delay

Synchronisation and delay are two different problems.

For some applications, delay is largely irrelevant

e.g. mixing a band from 20 m away can still be done

Synchronisation, however, is generally critical

esp. if the same audio is split across multiple paths and recombined

• comb filtering, changes in comb filtering


What Might Be Feasible?

Mixing can be considered part of a live performance, but its latency requirement is less stringent

Remote recording is one-directional; high latency is quite acceptable. Internet streaming ditto.

Pre-scored performance is easier than fully live

E.g. Sibelius score, sequenced backing, metronome/click-track

Pre-roll to compensate for latency

Layered multi-track recording generally doable

Latency requirements can be relaxed considerably under certain conditions:

In particular, if nodes don’t need to hear all other nodes

• Acyclic audio processing graph

Sync is more important than absolute delay in many situations

Better read up on some graph theory...!


For further investigation

Determine required audio hardware clock quality (jitter, drift, etc.)

Trial the GPS hardware clock sync idea

Is variable satellite visibility a problem?

Test feasibility of NTP for hardware clock sync

Determine latency requirements for potential applications

Develop/co-opt network analysis framework for distributed studio

Delve into the JACK code (ZOMG! Real-time C code!)

Investigate network tuning parameters

Investigate use of the Internet for longer-haul transport


Moving to the Internet

KAREN provides many benefits over a normal consumer Internet connection

Long-haul Internet would mean significantly lower connection quality (bandwidth, latency, packet jitter, reliability)

Potential hassles:

QoS and traffic-shaping

Firewalls and NAT

CELT for lower data rate and concealment of packet loss?

Only if necessary (it is lossy)


References and further reading

Stereophile magazine article on digital audio clock jitter

http://www.stereophile.com/reference/193jitter/

Sound on Sound article on digital studio clocks
http://www.soundonsound.com/sos/jun10/articles/masterclocks.htm

Brooks Shera's GPS-Controlled Frequency Standard
http://www.rt66.com/~shera/index_fs.htm

Phase-Locked Loop (PLL) overview
http://en.wikipedia.org/wiki/Phase-locked_loop

NTP overview
http://www.eecis.udel.edu/~mills/exec.html

Shalunov, 2005: NTP Cookbook
http://www.internet2.edu/workshops/npw/binder-docs/ntp-cookbook.pdf

NTP RFC document
http://www.eecis.udel.edu/~mills/database/rfc/rfc1059.txt

NetJACK2 architectural overview
http://trac.jackaudio.org/wiki/WalkThrough/User/NetJack2

KAREN timing statistics
https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS

Allan clock variance measurement
http://en.wikipedia.org/wiki/Allan_variance

To find this document, go to: http://eprints.otago.ac.nz/


Questions?

Suggestions:

What about Skype? How does it manage?

What about MIDI?

How do musicians deal with latency normally?