Audio Networking: Technical challenges and possibilities for distributed digital sound production at Otago

CSIS Seminar series, July 2010

Chris Edwards

Department of Information Science, University of Otago


Motivation and Background

Music Department’s new $1M SSL mixing console

New Zealand Music Industry Centre (NZMiC)

KAREN high-capacity network connectivity

Interesting technical and creative possibilities:

Remote (live) mixing

Remote recording (live or layered multi-track)

Distributed real-time performance (live and recorded)

Internet broadcast/multicast/streaming

Asynchronous production tasks, e.g. (re)mixing, mastering, film score composition, with very short turnaround


The SSL Console

Solid State Logic model C200 HD

Digital control surface, array of common per-channel controls

Signal level metering

Transport control, timecode display

Full automation (programmable motion of faders and other controls)

DAW control (mouse, keyboard, display)

64 dedicated control strips, pageable for even more

(and it can play “Pong”!)


Not just a pretty face: behind the console

Dual C-SB Stageboxes

48 high-quality mic inputs each

Gain and pre-amp behaviour remotely controllable

~2 km reach over single-mode optical fibre

Portable; plans for Marama Hall, Stadium, Town Hall

Centuri core

Signal routing and control surface I/O

Storage

DSP modules

Outboard hardware effects processors

DAW (a Power Macintosh with MADI card)

KAREN connectivity


Albany Street Installation

[Diagram: two C-SB stage units (48 mic input channels each) connect over single-mode optical fibre (2 km reach) to the SSL Centuri core, which ties together the SSL console, SSL DSP units, outboard (O/B) effects units, the DAW (Mac) via MADI (Multichannel Audio Digital Interface), and the network.]


KAREN

Kiwi Advanced Research and Education Network

Operated by REANNZ (Research and Education Advanced Network New Zealand Ltd), a Crown-owned company

10 Gb/s generally available between participating institutions

16 national points-of-presence (PoPs)

International links to Australia, North America

and, via these, to Asia and Europe


KAREN NZ Network Map

Source: http://www.karen.net.nz/topology/


Digital Audio Basics

Discretised, quantised representation of continuous analog signal

Signal is represented as a stream of numbers

Driven by hardware clock (typically a crystal oscillator)

One sample recorded/played every clock cycle

Fs is sampling frequency

Typ. transmitted digitally using PCM (pulse-code modulation)

[Figure: amplitude vs. time plot of a continuous waveform and its discrete samples; a toy numeric version is sketched below]
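To make sampling and quantisation concrete, here is a toy C sketch (mine, not from the talk; all values illustrative): one cycle of a 1 kHz sine sampled at Fs = 48 kHz and quantised to 24-bit signed integers, i.e. the stream of numbers just described.

/* Toy illustration: sample one cycle of a 1 kHz sine at Fs = 48 kHz
   and quantise each sample to a 24-bit signed integer (PCM). */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define PI 3.14159265358979323846

int main(void)
{
    const double Fs = 48000.0;                 /* sampling frequency       */
    const double f  = 1000.0;                  /* signal frequency         */
    const int32_t full_scale = (1 << 23) - 1;  /* 24-bit signed maximum    */

    for (int n = 0; n < 48; n++) {             /* one cycle = Fs/f samples */
        double x = sin(2.0 * PI * f * n / Fs);         /* analog value     */
        int32_t q = (int32_t)lrint(x * full_scale);    /* quantised sample */
        printf("n=%2d  %+.6f  ->  %+9d\n", n, x, q);
    }
    return 0;
}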


Digital Audio Basics (2)

Typical audio sampling frequencies are 10s–100s of kHz

Human hearing tops out around 16–20 kHz

Nyquist limit (highest reproducible frequency) is Fs/2

Fs ∈ {44.1 kHz (CD-DA), 48 kHz, 96 kHz, 192 kHz}

Higher Fs means more resampling options, better resampling quality

Higher Fs also makes life easier for the low-pass filters

Oversampling is also commonly used

24-bit signed integer precision common in studio work

~144 dB theoretical dynamic range (≈6 dB per bit); real converters reach ~120 dB, limited by the analog noise floor


Different ways of being wrong: quantisation error, jitter et al.

Quantisation means approximation, and approximation means error

Amplitude (bits per sample)

Time (sampling frequency)

More bits and/or faster clock: better approximation

Often a worthwhile trade-off (no generation loss, DSP, transmission)

Hardware clocks are not perfectly stable

Non-uniformity classified as follows:

• Jitter (short-term variations: cycle-to-cycle)

• Wander (medium-term)

• Drift (long-term)

Measurement: eye diagrams, Allan variance (estimator sketched below)

A stable clock is not necessarily an accurate clock!
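Since Allan variance came up as a clock-quality measure, here is a minimal C sketch (mine; the readings are made-up numbers) of the textbook two-sample estimator over fractional-frequency readings y[k], each averaged over the same interval tau:

/* Textbook (non-overlapping) Allan variance:
   sigma_y^2(tau) = <(y[k+1] - y[k])^2> / 2 */
#include <stdio.h>

double allan_variance(const double *y, int n)
{
    double sum = 0.0;
    for (int k = 0; k < n - 1; k++) {
        double d = y[k + 1] - y[k];
        sum += d * d;
    }
    return sum / (2.0 * (n - 1));
}

int main(void)
{
    /* Hypothetical fractional-frequency readings (dimensionless). */
    double y[] = { 1.2e-9, 0.9e-9, 1.1e-9, 1.4e-9, 1.0e-9, 0.8e-9 };
    int n = (int)(sizeof y / sizeof y[0]);
    printf("Allan variance: %g\n", allan_variance(y, n));
    return 0;
}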


Transmitting Digital Audio (within the studio)

Example protocols:

AES3 (AES/EBU)

MADI

ADAT Optical Interface

S/PDIF

Local digital audio transmission is generally synchronous.

Must avoid clock drift to avoid buffer over-/under-runs


A word about word clock

Synchronous operation means having a common reference clock

Word clock is a dedicated digital signal operating at Fs

Word clock ≠ timecode; it’s a frequency reference

Master clock signal fanned out to slave devices via dedicated co-axial cable

Clock can also be sent as part of audio data

In-band bit clock (“self-clocking” signal)

Used in AES3, ADAT, S/PDIF

Typically uses bi-phase mark coding or similar (sketched below)

Both approaches assume complete control of the physical medium.

Digital audio is surprisingly demanding on clock quality.
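A minimal sketch of the bi-phase mark idea (my illustration; the bit pattern is arbitrary): every bit cell begins with a transition, and a '1' adds a second transition mid-cell, so a receiver can recover the clock from the transitions themselves.

/* Bi-phase mark coding: transition at every cell boundary;
   a '1' gets an extra mid-cell transition ("self-clocking"). */
#include <stdio.h>

int main(void)
{
    const int bits[] = { 1, 0, 1, 1, 0 };
    int level = 0;                      /* current line level       */

    for (int i = 0; i < 5; i++) {
        level ^= 1;                     /* cell-boundary transition */
        int first_half = level;
        if (bits[i])
            level ^= 1;                 /* extra transition for '1' */
        printf("bit %d -> %d%d\n", bits[i], first_half, level);
    }
    return 0;
}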


Transmitting Digital Audio (beyond the studio)

“Pro” networked-audio protocols generally operate at OSI Layer 2

Data Link layer, i.e. not routeable, local area only

Ethernet is popular, unsurprisingly

Examples:

• AVB (IEEE 1722)

• AES47 (IEC 62365, AES3 over ATM)

• AES51 (AES3 over ATM over Ethernet)

• CobraNet

• EtherSound

Some OSI Layer 3/4 protocols do exist:

NetJACK (Open Source)

Livewire

[Diagram: protocol stack. 4. Transport: TCP, UDP; 3. Network: IP; 2. Data Link: Ethernet; 1. Physical: UTP]


(Some) challenges for distributed digital audio production

1. Audio hardware clock synchronisation

2. Audio data delivery (network service quality)

Network capacity (“bandwidth”)

Latency (packet delivery time, i.e. delay)

Trade-offs between these

Quality of Service (QoS) assurance (per-packet priority)

Network out[r]ages

3. Timecode and transport control

4. Interoperability in general

Protocols

Framing

Data representation and encoding


Challenge 1: Hardware Clock Drift

Unsynchronised audio hardware clocks will drift

Drifting too far will lead to buffer over-/under-runs

Unacceptable audio glitches (drop-outs, pops/clicks)

Word clock operates at the physical level

Running co-ax to Auckland, London, Seattle not feasible!


Hardware clock synchronisation: some possible solutions

Discipline the audio clock using a common external source

Internet Network Time Protocol (NTP) (Mills, 1980s–)

Timecode embedded in application-level packets

GPS PPS timekeeping signal

Dynamically resample audio at each node

Solution should be low-jitter:

More than a few hundred picoseconds may be unacceptable

Jitter may manifest as white noise or more complex distortions

At best, jitter undermines SNR


Reasons for leaning towards GPS

Resampling degrades quality; avoid it if possible.

Pro audio hardware generally has word-clock input anyway.

A hardware solution would be convenient.

NTP doesn’t claim especially high accuracy

Approx. ±10 ms for general use on the Internet

Personal computer hardware clocks are not especially accurate or stable

NTP is primarily concerned with absolute timekeeping.

We care more about consistent frequency.

NTP assumes symmetric network paths (perhaps not a problem when used only as a frequency reference?)

NTP’s clock slewing behaviour might be disruptive if applied to audio AD/DA converters?

NTP experts recommend using GPS anyway! (Shalunov, 2005)

GPS is globally available and uses dedicated radio signalling.

The GPS satellite network is kept closely in sync, making it ideal for a single-master-clock approach.

Pros and cons to be investigated further…!


GPS: The Global Positioning System


“Wherever you go, there you are.”—anon.


The Global Positioning System (GPS)

Basically a distributed high-precision time-keeping and message broadcasting system

24 satellites (plus spares!) in medium Earth orbit (20,000 km altitude)

6 orbital planes with 4 satellites each

4 must be “visible” to receiver to get precise position.

True position of each satellite is known/predictable (the ephemeris).

Satellites broadcast time-stamped messages.

Delay in receiving timestamped message determines distance from satellite.

Intersection of distances pinpoints location in space

GPS is also used to help other satellites know where (and when) they are.


How GPS location works

GPS uses distance (from time) rather than direction:

Receiver uses delay in receiving each message to calculate distance to the satellite that sent it.

Requires very precise timekeeping, as messages travel at/near light speed.

Relativistic effects must be accounted for!

1D position (i.e. on a line) requires two distance measurements.

2D (on a plane) requires three distance measurements (circles).

3D (in space) requires four distance measurements (spheres).

Earth’s sphere could be used to provide the fourth distance (provided you are on the surface).

Would still require four readings for altitude.

(essential if flying or in space)

Using four measurements improves accuracy as well.



GPS in one dimension

[Diagram: Satellite 1 and Satellite 2 on a line, with distances r1 and r2 meeting at “You Are Here”]

• Satellite positions are known
• Messages are time-stamped, so the time of sending is known
• The delay in receiving a message can be measured
• Distance is proportional to delay
• The intersection of distances determines the actual position (see the sketch below)
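Here is a toy C version of that one-dimensional picture (my sketch; positions and delays are made-up numbers): each delay becomes a distance, and the second satellite's distance picks between the two candidate positions.

/* 1-D GPS toy: two transmitters at known positions; delay -> distance;
   intersection of the two distances gives the position. */
#include <math.h>
#include <stdio.h>

#define C 299792458.0                        /* signal speed, m/s */

int main(void)
{
    double x1 = 0.0, x2 = 100000.0;          /* transmitter positions (m) */
    double t1 = 2.0014e-4, t2 = 1.3343e-4;   /* measured delays (s)       */
    double r1 = C * t1, r2 = C * t2;         /* distances from delays     */

    /* Satellite 1 allows two candidates; satellite 2 disambiguates. */
    double a = x1 + r1, b = x1 - r1;
    double x = (fabs(fabs(a - x2) - r2) < fabs(fabs(b - x2) - r2)) ? a : b;
    printf("You are here: x = %.0f m (r1 = %.0f m, r2 = %.0f m)\n", x, r1, r2);
    return 0;
}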


GPS for time-keeping

A PPS (pulse-per-second, i.e. 1 Hz) signal is available externally on many GPS receivers.

Can be used for precise timekeeping, even in remote areas.

Once location is determined and locked in, even higher timing accuracy is possible.

Can derive higher frequencies (for word clock) using frequency synthesis.


Proposed Scheme

Use globally available GPS PPS signal to discipline local audio hardware clocks

Uniform frequency (not absolute time) is the critical thing.

Avoid clock drift across sites, to avoid buffering errors.

Already been done! Shera (1998):

Ham radio application, originally

Voltage-controlled crystal oscillator (VCXO)

PLL-based regulation (phase-locked control feedback loop, de Bellescize (1932))

Temperature-sensitive (even with thermostatic “oven”)

27 MHz master clock is common in multimedia systems

Because of NTSC television timings, AFAICT

Video sync input required for SSL Centuri (implications?)


Shera (1998): block diagram


u-blox LEA-6T

GPS receiver module for precision timing applications

Position-lock for greater timekeeping accuracy

Programmable output clock pulse, 1/60 Hz to 10 MHz

High sensitivity; useable indoors

15 ns accuracy achievable

Ideally would simply connect LEA-6T clock output to audio word clock input


Innovative Integration PCIe-Timing card

PCIe expansion card

GPS receiver for clock discipline

Multiple programmable digital clocks

1560 kHz .. 1 GHz output

0.2 ps jitter specification


How a PLL works (analogy: two cars on a race track; 1 lap = 1 clock cycle)

“Master” reference car and following “slave” car

Lead or lag is phase difference

Measure once per lap or continuously

Constant phase difference means same frequency

If gaining, slow down slightly

If lagging, speed up slightly

Frequency is the derivative of phase!
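A toy software PLL in C following the race-track analogy (my sketch; the gains and starting frequency are arbitrary assumptions, not a tested design): measure the accumulated lead/lag once per "lap" and steer the local frequency with a proportional-integral correction.

/* Toy PLL: discipline a slightly-fast local oscillator to a 1 Hz
   reference (e.g. GPS PPS) by nulling the phase difference. */
#include <stdio.h>

int main(void)
{
    double ref_freq   = 1.0;        /* reference "car": 1 lap/s       */
    double local_freq = 1.0003;     /* our oscillator, slightly fast  */
    double phase_err  = 0.0;        /* accumulated lead (+) / lag (-) */
    double integral   = 0.0;
    const double kp = 0.1, ki = 0.01;  /* P and I gains (arbitrary)   */
    const double dt = 1.0;             /* measure once per lap        */

    for (int i = 0; i < 50; i++) {
        phase_err += (local_freq - ref_freq) * dt; /* grows by freq error */
        integral  += phase_err * dt;
        /* Leading? slow down. Lagging? speed up. */
        local_freq -= kp * phase_err + ki * integral;
        printf("lap %2d  phase err %+.6f  local freq %.6f\n",
               i, phase_err, local_freq);
    }
    return 0;
}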


PLL Demo in Pure Data (if time)


The Software Side (in case you don’t know JACK) ☻

JACK = JACK Audio Connection Kit (Paul Davis, ~2000?)

Audio server program providing low latency and sample-accurate sync

Like an Open Source combination of ReWire (inter-process audio), ASIO (low-latency audio I/O) and VST (software plug-ins)

Provides audio routing among software clients and hardware

Clients may be ordinary processes or in-process plug-ins

Originated on Linux, now also runs on Mac OS X, Windows, BSDs

Also can provide network transport over IP (NetJACK)!

Probably an ideal platform for research software development


JACK details (1)

Runs at real-time priority where possible

No additional latency due to JACK itself

mmap()s the system audio buffers

Provides a high-level audio API

Client software requires no audio hardware access code

Various audio back-ends: ALSA, FFADO, Core Audio, PortAudio, etc.

Enables rapid development and portability of audio apps

Client connects to server, registers audio input/output port(s)

Registered clients have their process() callback invoked on demand by the JACK server (minimal client sketched below)

Synchronous execution of all clients

Supports MIDI data streams too; may support video etc. in future
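For flavour, a minimal JACK pass-through client against the real C API (jack_client_open, jack_port_register, jack_set_process_callback); error handling is trimmed to keep the sketch short.

/* Minimal JACK client: one input port copied to one output port.
   The process() callback runs in the server's real-time thread. */
#include <jack/jack.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static jack_port_t *in_port, *out_port;

static int process(jack_nframes_t nframes, void *arg)
{
    jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port,  nframes);
    jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
    memcpy(out, in, nframes * sizeof(jack_default_audio_sample_t));
    return 0;                      /* non-zero would drop us from the graph */
}

int main(void)
{
    jack_client_t *client = jack_client_open("passthru", JackNullOption, NULL);
    if (!client) { fprintf(stderr, "is the JACK server running?\n"); return 1; }

    jack_set_process_callback(client, process, NULL);
    in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE,
                                  JackPortIsInput,  0);
    out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                  JackPortIsOutput, 0);

    jack_activate(client);         /* process() now gets called per period */
    sleep(60);                     /* stay alive for a minute              */
    jack_client_close(client);
    return 0;
}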


JACK details (2)

All audio data represented uniformly as 32-bit IEEE floating-point, normalised to -1.0..+1.0

Provides global transport control and timecode

No multiplexing/interleaving (e.g. stereo, 5.1, etc.) at the JACK level

One port: one channel

Use whatever channel configuration you need

Buffer over-/under-runs (“xruns”) detected and reported by JACK server

Server can disconnect misbehaving clients


JACK details (3)

Audio processing driven by audio hardware

Hardware buffer typically divided in two (double-buffering):

• Software reads from one buffer, writes to the other

One interrupt period to receive input

Two interrupt periods to process and deliver (input and output)

Example timing: 256 frames/period × 2 periods/buffer @ 96 kHz:

(1 frame is all samples across channels taken at one sampling interval)

375 Hz interrupt rate

~5 ms “through” latency

Comparable to sound delay from monitor speakers
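The period arithmetic on this slide, spelled out (a trivial check; nothing assumed beyond the slide's numbers):

/* 256 frames/period, 2 periods/buffer, Fs = 96 kHz. */
#include <stdio.h>

int main(void)
{
    const double fs = 96000.0;
    const int frames_per_period = 256, periods = 2;

    double irq_rate   = fs / frames_per_period;                    /* 375 Hz  */
    double latency_ms = 1000.0 * periods * frames_per_period / fs; /* ~5.3 ms */
    printf("interrupt rate %.0f Hz, through latency %.2f ms\n",
           irq_rate, latency_ms);
    return 0;
}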


JACK buffer management

[Diagram: hardware buffer of frames/period × nperiods; the audio hardware fills one 512-frame period while software reads from and writes to the other, then they swap.]


JACK: Implementation Challenges

(Hard) real-time processing requirements

Also, want non-root users to be able to run JACK and clients

May have only hundreds of microseconds to run all client process() callbacks

Overhead of context switches (e.g. CPU cache invalidation) is significant!

Linux signals proved too slow to be used for JACK IPC.

Current design uses FIFOs.

Client callbacks must of course be RT-safe.

Recording/streaming software must do I/O!


NetJACK

Networking extension to JACK

Technically just another audio back-end

Allows multiple JACK instances to communicate via UDP/IP

Remote (slave) JACK instances run inside the master JACK loop

BUT!: slave instances are generally deaf and mute

No audio clock available; driven by reception of network packets instead

Processing only; no audio I/O (DSP farming)

However:

• Sample-rate conversion exists in code-base for local audio I/O

• CELT lossy codec with packet-loss concealment also available

Might be suitable for use/adaptation for distributed studio work

Large buffer period sizes to handle latency (4096 frames at 96 kHz ≈ 43 ms; enough within NZ?)


NetJACK: Possible modifications

Allow normal audio I/O on NetJACK slave instances

No resampling, so no loss of quality

Could be feasible if hardware clock synch scheme works

Would it require/experience some extra buffering?

• Jitter buffer

• I/O still triggered by audio hardware

Facility to measure and record network latencies

(Local) JACK already accounts for latency throughout the call graph

JACK transport pre-roll can compensate for playback latency


Challenge 2: Data Delivery Quality

Long-range Internet transport is highly variable:

Non-uniform delivery time of packets

Variable bandwidth available

Congestion, traffic-shaping, etc.

Live audio data must be delivered as fast as possible

Buffering generally increases throughput, robustness and jitter-immunity at the expense of latency


Network performance on KAREN

KAREN should provide a good starting point for feasibility studies

Bandwidth aplenty:

Up to 10 Gb/s generally available

• ~10,000 × typical home DSL

Typically under 5% utilisation

Audio: ~600 Mb/s raw for 96 channels of 32-bit audio at 192 kHz (96 × 32 bits × 192 kHz ≈ 590 Mb/s)

Whole-session transfer in < 10 s (in theory)

• 4 minutes × 24 tracks of 24-bit audio @ 96 kHz ≈ 1.7 GB

• Endpoint disk I/O is probably the bottleneck in practice

Interestingly, no QoS facilities


Latency is the big problem

Audio signals must be kept within ~15 ms to seem musically simultaneous

Acoustic and electromagnetic signal propagation is not instantaneous

~3 ms/m for sound waves in air (~330 m/s)

Light (fibre-optic) and electrical signal propagation is typically around 0.7c

~5 ms/1000 km

20–30 ms RTT (round-trip time) observed between Otago and Auckland via KAREN (so 10–15 ms each way)

Worst-case latency is really the important case
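A back-of-envelope check of those propagation figures (the speeds are the slide's approximations):

/* Sound in air vs. signals in fibre at ~0.7c. */
#include <stdio.h>

int main(void)
{
    const double v_sound = 330.0;              /* m/s        */
    const double v_fibre = 0.7 * 299792458.0;  /* ~0.7c, m/s */

    printf("sound over 1 m:     %.1f ms\n", 1000.0 * 1.0 / v_sound);   /* ~3 ms */
    printf("fibre over 1000 km: %.1f ms\n", 1000.0 * 1.0e6 / v_fibre); /* ~5 ms */
    return 0;
}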


The Latency Problem

[Diagram: drums in Dunedin and guitar in Auckland, with a 15 ms network delay in each direction]

1. Drum part provides reference timing

2. Guitar plays in sync with heard drum sound

3. Guitar part sounds late by 30 ms


“I canna change the Laws of Physics”


Observed latencies to international locations via KAREN (source: https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS)

Sydney: ~40 ms RTT

Perth: ~80 ms RTT

Seattle: ~160 ms RTT

North America generally: 200..300 ms RTT

Asia: 300..500 ms RTT

Europe: 300..400 ms RTT

Note: these are averages (“show me the histograms!”)


Equivalent Approx. Distances (cf. propagation of sound waves in air)

Dunedin to Auckland: 5 m

Dunedin to Sydney: 10 m

Dunedin to Seattle: 30 m

Dunedin to Europe: 60 m

d = v × t


International latencies, in musical terms

At 120 BPM (crotchet = 120):

2 beats/s

Asia round-trip ≈ a quarter note (500 ms)

North America round-trip ≈ an eighth note (250 ms)

Australia round-trip ≈ a sixteenth note (125 ms)

Network latency will be a problem for certain applications.


Acoustic demonstrations of delay

Phasing (comb filtering) 0.02..15 ms

Stereophonic (Haas effect) shifts ~10-50 ms

Distinct echoes ~50+ ms


Synchronisation vs. delay

Synchronisation and delay are two different problems.

For some applications, delay is largely irrelevant

e.g. mixing a band from 20 m away can still be done

Synchronisation, however, is generally critical

esp. if the same audio is split across multiple paths and recombined

• comb filtering, changes in comb filtering


What Might Be Feasible?

Mixing can be considered part of a live performance, but its latency requirement is less stringent

Remote recording is one-directional; high latency is quite acceptable. Internet streaming ditto.

Pre-scored performance is easier than fully live

E.g. Sibelius score, sequenced backing, metronome/click-track

Pre-roll to compensate for latency

Layered multi-track recording generally doable

Latency requirements can be relaxed considerably under certain conditions:

In particular, if nodes don’t need to hear all other nodes

• Acyclic audio processing graph

Sync is more important than absolute delay in many situations

Better read up on some graph theory...!


For further investigation

Determine required audio hardware clock quality (jitter, drift, etc.)

Trial the GPS hardware clock sync idea

Is variable satellite visibility a problem?

Test feasibility of NTP for hardware clock sync

Determine latency requirements for potential applications

Develop/co-opt network analysis framework for distributed studio

Delve into the JACK code (ZOMG! Real-time C code!)

Investigate network tuning parameters

Investigate use of the Internet for longer-haul transport


Moving to the Internet

KAREN provides many benefits over a normal consumer Internet connection

Long-haul Internet would mean significantly lower connection quality (bandwidth, latency, packet jitter, reliability)

Potential hassles:

QoS and traffic-shaping

Firewalls and NAT

CELT for lower data rate and concealment of packet loss?

Only if necessary (it is lossy)


References and further reading

Stereophile magazine article on digital audio clock jitter

http://www.stereophile.com/reference/193jitter/

Sound on Sound article on digital studio clocks
http://www.soundonsound.com/sos/jun10/articles/masterclocks.htm

Brooks Shera's GPS-Controlled Frequency Standard
http://www.rt66.com/~shera/index_fs.htm

Phase-Locked Loop (PLL) overview
http://en.wikipedia.org/wiki/Phase-locked_loop

NTP overview
http://www.eecis.udel.edu/~mills/exec.html

Shalunov, 2005: NTP Cookbook
http://www.internet2.edu/workshops/npw/binder-docs/ntp-cookbook.pdf

NTP RFC document
http://www.eecis.udel.edu/~mills/database/rfc/rfc1059.txt

NetJACK2 architectural overview
http://trac.jackaudio.org/wiki/WalkThrough/User/NetJack2

KAREN timing statistics
https://kmeasure.karen.ac.nz/cgi-bin/smokeping.cgi?target=INTERNATIONAL_LOCATIONS

Allan clock variance measurement
http://en.wikipedia.org/wiki/Allan_variance

To find this document, go to: http://eprints.otago.ac.nz/


Questions?

Suggestions:

What about Skype? How does it manage?

What about MIDI?

How do musicians deal with latency normally?