Computer Science 121 Scientific Computing Winter 2012 Chapter 13 Sounds and Signals.



Background: Sounds and Signals

● Recall the transducer view of a computer: it converts an input signal into numbers.

● Signal: a quantity that changes over time, e.g.:
  – Body temperature
  – Air pressure (sound)
  – Electrical potential on skin (electrocardiogram)
  – Seismological disturbances

● We will study audio signals (sounds), but the same issues apply across a broad range of signal types.


13.1 Basics of Computer Sound

>> [x, fs, bits] = wavread('FH.wav');
>> size(x)
ans = 41777 1
>> [max(x) min(x)]
ans = 0.9922 -1.0000
>> fs
fs = 11025
>> bits
bits = 8
>> sound(x, fs)


• x contains the sound waveform (signal): essentially, voltage levels representing the air pressure transduced by the microphone.

• fs is the sampling frequency: how many times per second (in hertz, Hz) did we measure the voltage?

• bits is the number of bits used to represent each sample.
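A quick check on the values printed in the session above (copied here by hand rather than read from FH.wav): the recording's duration is the number of samples divided by fs, and 8 bits give 2^8 = 256 possible sample values. The observed maximum of 0.9922 is consistent with 8-bit samples being scaled to the half-open range [-1, 1), whose largest value is 127/128; that scaling convention is an assumption about wavread, not something the slides state.

```matlab
% Values copied from the wavread session above (not recomputed from FH.wav)
nSamples = 41777;              % first element of size(x)
fs = 11025;                    % sampling frequency, Hz
bits = 8;                      % bits per sample

durationSec = nSamples / fs    % length of the recording in seconds (about 3.79)
levels = 2^bits                % number of distinct sample values (256)
largest = 127 / 128            % top of an assumed [-1, 1) scaling of 8-bit data (about 0.9922)
```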


Questions

• Why does the sound waveform range from -1 to +1?

– These values are essentially arbitrary. One nice feature of a range symmetric about zero (±1) is that zero means silence.

• What role does the sampling frequency play in the quality of the sound?
  – The more samples per second, the closer the sound is to a “perfect” recording.

• What happens if we double (or halve) the sampling frequency at playback, and why?


• What is it about the waveform that determines the sound we're hearing (which vowel), and the speaker's voice?
  – Most of this information is encoded in the frequencies that make up the waveform (roughly, the spacing between successive peaks), not in the actual waveform values themselves.
  – We can still do some useful processing on the “raw” waveform, however, e.g., count syllables:


Syllable Counting by Smoothing and Peak-Picking


function res = syllables(x, fs)
% SYLLABLES(X, FS) counts syllables in speech waveform X by peak-picking
% on smoothed rectified signal. FS is sampling rate.

% how much higher a peak must be than its neighbors
DIFF = .001;

% size of moving-average "window" around each point, empirically determined
winsize = fix(fs / 20);

% rectify signal
x = abs(x);

% create smoothed signal from rectified:
% y(k) averages |x| over the 2*winsize samples starting at (k-1)*winsize+1
y = zeros(1, fix(length(x)/winsize));

for i = winsize:winsize:length(x)-winsize
    y(fix(i/winsize)) = mean(x(i-winsize+1:i+winsize));
end

plot(y)
hold on

% pick peaks in smoothed signal: points more than DIFF above both neighbors
peaks = find((y(2:end-1)-y(1:end-2))>DIFF & (y(2:end-1)-y(3:end))>DIFF) + 1;

plot(peaks, y(peaks), 'ro')

res = length(peaks);
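The sketch below runs the same smoothing and peak-picking steps as syllables(), without the plotting, on a hypothetical synthetic signal: three 0.2 s "syllables", each a 200 Hz tone with a smooth rise-and-fall envelope, separated by 0.2 s of silence. The test signal and its parameters are assumptions chosen for illustration; real speech is far less tidy.

```matlab
% Hypothetical test signal: three bursts of a 200 Hz tone, each with a
% half-sine envelope, separated by silence (every segment is 0.2 s long)
fs = 8000;
n = 0:round(0.2*fs) - 1;                         % 1600 samples per segment
burst = sin(pi*n/numel(n)) .* sin(2*pi*200*n/fs);
gap = zeros(size(burst));
x = [gap burst gap burst gap burst gap];

% Same steps and parameters as syllables(), minus the plotting
DIFF = .001;
winsize = fix(fs / 20);                          % 400 samples = 50 ms
x = abs(x);                                      % rectify
y = zeros(1, fix(length(x)/winsize));
for i = winsize:winsize:length(x)-winsize
    y(fix(i/winsize)) = mean(x(i-winsize+1:i+winsize));
end
peaks = find((y(2:end-1)-y(1:end-2))>DIFF & (y(2:end-1)-y(3:end))>DIFF) + 1;
numSyllables = length(peaks)                     % 3
```

Each burst spans four 50 ms blocks, so the smoothed signal y rises to a single maximum in the middle of each burst and the peak-picker reports three peaks (at y indices 6, 14 and 22). A burst with a perfectly flat envelope would instead produce a plateau in y, which the strict "> DIFF" test never reports as a peak.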


13.2 Perception and Generation of Sound

• Sound is the perception of small, rapid vibrations in air pressure on the ear.

• Simplest model of sound is a function P(t) expressing pressure P at time t:

P(t) = A sin(2πft + φ)

where A = amplitude (roughly, loudness)
      f = frequency (cycles per second)
      φ = phase (roughly, starting point)

• This is the equation for a pure musical tone (just one pitch)


– The inverse of frequency is the period (the distance between successive peaks).
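For example, for the 500 Hz tone and 10 kHz sampling frequency used in the experiments that follow, the period is 1/500 s = 2 ms, and each cycle is described by 20 samples:

```matlab
f = 500;                     % tone frequency, Hz
T = 1 / f                    % period: 0.002 s = 2 ms
FS = 10000;                  % sampling frequency, Hz
samplesPerCycle = FS * T     % 20 samples per cycle
```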


– E.g., whistling a musical scale.


13.2 Perception and Generation of Sound (ignore textbook)

• Most real sounds are complicated mixtures of many frequencies (no pure tones in nature).

• Still, we can learn some basic concepts by experimenting with pure tones:

>> FS = 10000; % sampling frequency

>> f = 500; % sound frequency

>> A = 1.0; % amplitude

>> t = linspace(0,1,FS); % 1 sec at 10 kHz

>> Pt = A * sin(2*pi*f*t); % ignore phase
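One detail of the code above: linspace(0,1,FS) includes both endpoints, so its FS samples are spaced 1/(FS-1) apart rather than exactly 1/FS. The difference is negligible for these plots; the variant below, which is a suggestion and not what the slides use, gives spacing of exactly 1/FS by stopping one sample short of t = 1:

```matlab
FS = 10000;
t1 = linspace(0, 1, FS);     % FS points from 0 to 1 inclusive
t2 = (0:FS-1) / FS;          % FS points starting at 0, exactly 1/FS apart
dt1 = t1(2) - t1(1)          % 1/(FS-1), slightly more than 1/FS
dt2 = t2(2) - t2(1)          % exactly 1/FS = 1e-4
```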


>> Pt = A * sin(2*pi*f*t);

>> plot(t, Pt)

>> xlim([0 .01]) % plot from 0 to .01 sec


>> Pt = A * sin(2*pi*3*f*t); % k = 3

>> plot(t, Pt),xlim([0 .01])

Multiplying the frequency by k gives us k times as many cycles in the same amount of time.
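The same scaling explains the earlier question about doubling (or halving) the sampling frequency at playback. sound(x, 2*fs) plays sample n at time n/(2*fs) instead of n/fs, so the clip lasts half as long and every frequency in it is doubled (one octave higher); halving fs does the opposite. The sketch below checks that the samples of a 500 Hz tone at FS are the same numbers as the samples of a 1000 Hz tone at 2*FS; the variable names are illustrative.

```matlab
FS = 10000; f = 500;
n = 0:FS-1;                             % sample indices for one second at FS
slow = sin(2*pi*f*n/FS);                % 500 Hz tone sampled at FS
fast = sin(2*pi*(2*f)*n/(2*FS));        % 1000 Hz tone sampled at 2*FS
maxDiff = max(abs(slow - fast))         % essentially zero: same samples
newDuration = numel(n) / (2*FS)         % 0.5 s when played back at 2*FS
% sound(slow, 2*FS) would therefore be heard as a 1000 Hz tone lasting 0.5 s
```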


>> Pt = 0.5 * A * sin(2*pi*3*f*t); % half the loudness

>> plot(t, Pt), xlim([0 .01])

>> ylim([-1 1]) % keep Y axis scaling

Multiplying the amplitude by a number between 0 and 1 reduces the loudness (volume) of the sound.
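One simple way to put a number on this, not used in the slides, is the root-mean-square (RMS) value of the waveform. For a pure tone of amplitude A it is A/sqrt(2), so halving the amplitude halves the RMS value. A minimal sketch, using a one-second time vector with spacing exactly 1/FS so that the tone contains a whole number of cycles:

```matlab
FS = 10000; f = 500; A = 1.0;
t = (0:FS-1) / FS;                      % one second, spacing exactly 1/FS
loud  = A * sin(2*pi*3*f*t);            % the 1500 Hz tone from above
quiet = 0.5 * A * sin(2*pi*3*f*t);      % same tone at half the amplitude
rmsLoud  = sqrt(mean(loud.^2))          % about 0.7071 = A/sqrt(2)
rmsQuiet = sqrt(mean(quiet.^2))         % about 0.3536, half of rmsLoud
```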


13.3 Synthesizing Complex Sounds (ignore textbook)

• Any sound can (in principle) be expressed as the sum of a set of pure tones of various frequencies, amplitudes, and phases.

• People are (arguably) insensitive to phase distinctions, so we will ignore phase here.

• Consider a sound containing a 500 Hz component and a 1200 Hz component at half the amplitude:


>> FS = 10000;
>> t = linspace(0, 1, FS);
>> f = 500;
>> A = 1.0;
>> Pt = A * sin(2*pi*f*t);
>> f2 = 1200;
>> A2 = 0.5;
>> Pt2 = A2 * sin(2*pi*f2*t);
>> Pt3 = Pt + Pt2;
>> plot(t, Pt3), xlim([0 .01])


• More generally, we have the formula

$$P(t) = \sum_{i=1}^{n} A_i \sin(2\pi f_i t + \phi_i)$$

• With all $\phi_i$ typically set to zero.
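The sum can be computed directly from vectors of frequencies, amplitudes and phases. The sketch below uses hypothetical variable names (freqs, amps, phases) and, with the two components from the previous example, reproduces the hand-built two-tone signal:

```matlab
FS = 10000;
t = linspace(0, 1, FS);          % same time vector as the slides
freqs  = [500 1200];             % f_i in Hz
amps   = [1.0 0.5];              % A_i
phases = [0 0];                  % phi_i, set to zero as in the text

P = zeros(size(t));
for i = 1:numel(freqs)
    % add the i-th pure tone A_i sin(2 pi f_i t + phi_i)
    P = P + amps(i) * sin(2*pi*freqs(i)*t + phases(i));
end

% compare with the two-tone signal built term by term
Pt3 = 1.0*sin(2*pi*500*t) + 0.5*sin(2*pi*1200*t);
maxDiff = max(abs(P - Pt3))      % essentially zero
```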


13.4 Transducing and Recording Sound

• Convert sound pressure to voltage, then digitize the voltage into N discrete values in the interval [x_min, x_max] by sampling at frequency Fs.

• This is done by an analog-to-digital (A/D) converter.

• Another device, a preamplifier, must amplify the sound signal to match the input range expected by the A/D converter.

• N is typically a power of 2, so we can use bits to express sampling precision (minimum 8 for decent quality). This is called quantization.

• For Matlab, x_min = -1.0 and x_max = +1.0.

• Various things can go wrong if we don't choose these values wisely....
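A minimal sketch of quantization, assuming a simple uniform scheme in which each sample is assigned to one of N equal-width levels on [x_min, x_max] and reconstructed at the middle of its level (real A/D converters may round differently). A deliberately coarse 3 bits makes the error easy to see:

```matlab
bits = 3;                              % deliberately coarse
N = 2^bits;                            % number of discrete values
xmin = -1.0; xmax = 1.0;               % Matlab's sample range
stepSize = (xmax - xmin) / N;          % width of one quantization level

FS = 10000;
t = (0:FS-1) / FS;
x = 0.9 * sin(2*pi*500*t);             % "analog" signal within the range

level = floor((x - xmin) / stepSize);  % integer code 0 .. N-1 for each sample
level = min(max(level, 0), N-1);       % x == xmax would otherwise give code N
xq = xmin + (level + 0.5) * stepSize;  % reconstruct at the middle of each level

maxErr = max(abs(x - xq))              % at most stepSize/2 = 0.125 for 3 bits
% plot(t, x, t, xq), xlim([0 .01]) shows the staircase
```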


Figure 13.5. A segment of the sound “OH” transduced to voltage.

Top: The preamplifier has been set appropriately so that the analog voltage signal takes up a large fraction of the A/D voltage range. The digitized signal closely resembles the analog signal even though the A/D conversion is set to 8 bits.

Bottom: The preamplifier has been set too low. Consequently, there are effectively only about 3 bits of resolution in the digitized signal; most of the range is unused.


[Figure 13.5 panels, top: “Appropriate preamplification”; bottom: “Preamplification too low”. Each panel plots the analog signal (Voltage) and the digital signal (A/D units) against Time (ms).]
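The "about 3 bits" in the caption can be reproduced with the same kind of uniform 8-bit quantizer. The sketch quantizes two hypothetical sine waves, one filling most of [-1, 1] and one with amplitude 0.03 (both amplitudes are illustrative, not taken from the figure), and counts how many of the 256 levels lie between the lowest and highest codes each one produces:

```matlab
bits = 8;
N = 2^bits;                              % 256 levels on [-1, 1]
stepSize = 2 / N;
FS = 10000;
t = (0:FS-1) / FS;

good = 0.9  * sin(2*pi*500*t);           % fills most of the A/D range
low  = 0.03 * sin(2*pi*500*t);           % preamplifier set far too low

% integer code (0 .. N-1) of each sample, as in a uniform quantizer
codesGood = min(floor((good + 1) / stepSize), N - 1);
codesLow  = min(floor((low  + 1) / stepSize), N - 1);

% number of levels between the lowest and highest code actually produced
spanGood = max(codesGood) - min(codesGood) + 1   % 232 of 256 levels
spanLow  = max(codesLow)  - min(codesLow)  + 1   % 8 of 256 levels
bitsUsedLow = log2(spanLow)                      % about 3 bits of resolution
```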


Figure 13.6. Clipping of a signal (right) when the preamplifier has been set too high, so that the signal is outside of the −5 to 5 V range of the A/D converter.
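Clipping can be imitated in normalized units, assuming the A/D converter simply saturates at the ends of its range ([-1, 1] here instead of the figure's −5 to 5 V):

```matlab
FS = 10000;
t = (0:FS-1) / FS;
x = 1.5 * sin(2*pi*500*t);          % preamplifier too high: peaks at +/-1.5
xc = min(max(x, -1), 1);            % saturate at the converter's limits
clippedFraction = mean(abs(x) > 1)  % share of samples flattened at +/-1
% plot(t, x, t, xc), xlim([0 .01]) shows the flat tops and bottoms
```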


13.5 Aliasing and the Sampling Frequency

• Someone has an alias when they use more than one name (representation)

• In the world of signals, it means that the same set of samples can represent more than one analog signal, because the sampling frequency is inadequate.

• Familiar visual aliasing from the movies (where 24 frames per second is too slow):
  – Wagon wheels / propellers appearing to turn backwards
  – Scan lines appearing on a computer screen

• Inadequate Fs can result in aliasing for sounds too....


Figure 13.8. Aliasing. A set of samples marked as circles. The three sine waves plotted are of different frequencies, but all pass through the same samples. The aliased frequencies are F + m/∆T, where m is any integer and ∆T is the sampling interval. The sine waves shown are m = 0, m = 1, and m = 2.

[Figure 13.8 plot: Amplitude versus Time (T) for the m = 0, m = 1 and m = 2 sine waves, with the shared samples marked.]
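The caption's formula can be checked numerically: sampled every ∆T seconds, sine waves of frequency F, F + 1/∆T and F + 2/∆T produce the same sample values. The particular F and sampling rate below are arbitrary choices for illustration:

```matlab
FS = 1000;                       % sampling frequency, Hz
dT = 1 / FS;                     % sampling interval
F = 100;                         % original frequency, Hz
n = 0:49;
t = n * dT;                      % sampling instants

s0 = sin(2*pi*(F + 0/dT)*t);     % m = 0: 100 Hz
s1 = sin(2*pi*(F + 1/dT)*t);     % m = 1: 1100 Hz
s2 = sin(2*pi*(F + 2/dT)*t);     % m = 2: 2100 Hz

maxDiff1 = max(abs(s0 - s1))     % essentially zero
maxDiff2 = max(abs(s0 - s2))     % essentially zero
```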


• Nyquist's Theorem tells us that Fs should be more than twice the maximum frequency Fmax we wish to reproduce (Fs > 2 Fmax).

• Intuitively, we need at least two values to represent a single cycle: one for the peak and one for the valley.
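Applied to the FH.wav recording (fs = 11025 Hz), the theorem says nothing above fs/2 = 5512.5 Hz can be represented. The sketch below shows what happens to a hypothetical 7000 Hz tone sampled at that rate: its samples are exactly the negatives of those of a 4025 Hz (fs - f) tone, so it would be reproduced at the lower, aliased frequency.

```matlab
fs = 11025;                      % sampling rate of FH.wav
fNyquist = fs / 2                % 5512.5 Hz: highest representable frequency
f = 7000;                        % a tone above fNyquist
n = 0:99;
t = n / fs;
high = sin(2*pi*f*t);            % 7000 Hz tone, sampled at fs
low = sin(2*pi*(fs - f)*t);      % 4025 Hz tone, sampled at fs
maxDiff = max(abs(high + low))   % essentially zero: high equals -low at every sample
```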


Aliasing in the Time Domain