Pitch Recognition with Wavelets 1.130 Final Presentation by Stephen Geiger.


Pitch Recognition with Wavelets

1.130 Final Presentation

by Stephen Geiger

What is pitch recognition?

Well, what is pitch? . . .

How HIGH or LOW a sound is

Which note?

Perceived Frequency

Relationship Between Pitch and Frequency

Pitch ↔ Fundamental Frequency

For Example:

For Middle C:

Frequency = 262 Hz

MATLAB CODE:
fs = 22050;                  % Sampling frequency.
f = 262;                     % Fundamental frequency of middle C.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);  % Make some noise!

For an A Scale:

A  = 220*2^(0/12)  = 220 Hz
A# = 220*2^(1/12)  = 233 Hz
B  = 220*2^(2/12)  = 247 Hz
C  = 220*2^(3/12)  = 262 Hz
C# = 220*2^(4/12)  = 277 Hz
D  = 220*2^(5/12)  = 294 Hz
D# = 220*2^(6/12)  = 311 Hz
E  = 220*2^(7/12)  = 330 Hz
F  = 220*2^(8/12)  = 349 Hz
F# = 220*2^(9/12)  = 370 Hz
G  = 220*2^(10/12) = 392 Hz
G# = 220*2^(11/12) = 415 Hz
A  = 220*2^(12/12) = 440 Hz
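The table above follows one rule: each semitone multiplies the frequency by 2^(1/12). A minimal sketch that regenerates it is shown here; the variable names (`f_ref`, `names`) are illustrative and not from the original slides.

```matlab
% Equal-tempered note frequencies for one octave starting at A = 220 Hz.
f_ref = 220;                       % Reference frequency of the low A, in Hz.
n = 0:12;                          % Semitone steps above the reference.
names = {'A','A#','B','C','C#','D','D#','E','F','F#','G','G#','A'};
f = f_ref * 2.^(n/12);             % Each semitone multiplies frequency by 2^(1/12).
for k = 1:numel(n)
    fprintf('%-2s = %.0f Hz\n', names{k}, f(k));
end
```

Twelve semitones give a factor of exactly 2, which is why the scale ends an octave up at 440 Hz.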

An Octave Up:

For C5:

Frequency = 524 Hz

MATLAB CODE:
fs = 22050;                  % Sampling frequency.
f = 524;                     % Fundamental frequency of C5.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);  % Make some noise!

A Sum with 2 Frequencies:

MATLAB CODE:
fs = 22050;                  % Sampling frequency.
f1 = 262;                    % Fundamental frequency of middle C.
f2 = 524;                    % Fundamental frequency of C5.
t = 0:1/fs:1;                % Time range of 0 to 1 seconds.
sound((cos(2*pi*f1*t) + ...
       0.25*cos(2*pi*f2*t))/2, fs);

Frequency = 262 Hz

and

Frequency = 524 Hz

[Figure: Frequencies in a Piano, Middle C. Horizontal axis: Frequency, Hz]

[Figure: FFT of an Oboe, Middle C. Horizontal axis: Frequency, Hz]
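No instrument recordings accompany this transcript, so the sketch below runs an FFT on the synthetic two-frequency sum from the earlier slide instead. It is only an illustration of how such a spectrum can be computed. The one-second window is chosen so that FFT bins fall 1 Hz apart, and names such as `peaks` are made up for this example.

```matlab
fs = 22050;                        % Sampling frequency, as in the slides.
f1 = 262;  f2 = 524;               % Middle C and C5.
t = 0:1/fs:1-1/fs;                 % Exactly one second, so FFT bins are 1 Hz apart.
x = (cos(2*pi*f1*t) + 0.25*cos(2*pi*f2*t))/2;
N = numel(x);
X = abs(fft(x))/(N/2);             % Magnitude spectrum scaled to signal amplitude.
freq = (0:N-1)*fs/N;               % Frequency of each FFT bin, in Hz.
half = 1:floor(N/2);               % Keep only the positive frequencies.
[~, order] = sort(X(half), 'descend');
peaks = freq(order(1:2));          % The two strongest bins.
fprintf('Strongest components: %g Hz and %g Hz\n', peaks(1), peaks(2));
% plot(freq(half), X(half)); xlabel('Frequency, Hz');   % Uncomment to view.
```

A real piano or oboe note would show a whole series of such peaks at multiples of the fundamental, with relative heights that depend on the instrument.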

Mono vs. Poly

Monophonic

one note at a time

(e.g. trumpet)

Polyphonic

multiple notes at a time

(e.g. piano, orchestra)

Creates a problem for pitch recognition.

(especially octaves!)

Some Existing Methods

Time Domain: Pitch period estimation
  With wavelets.
  With auto-correlation function.

Freq. Domain: Find fundamental

Auditory Scene Analysis
  Blackboard Systems
  Neural Networks
  Perceptual Models
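As an illustration of the time-domain idea, here is a minimal sketch of pitch period estimation with the auto-correlation function on a synthetic monophonic tone. The 50 Hz to 1000 Hz search range and all variable names are assumptions made for this example, not part of the original presentation.

```matlab
fs = 22050;                        % Sampling frequency, in Hz.
f0 = 262;                          % True fundamental of the test tone (middle C).
t = 0:1/fs:1;                      % One second of samples.
x = cos(2*pi*f0*t);                % Monophonic test signal.

% Autocorrelation via the FFT (zero-padded to avoid circular wrap-around).
nfft = 2^nextpow2(2*numel(x));
r = real(ifft(abs(fft(x, nfft)).^2));   % r(k+1) holds the value at lag k.

% Search only lags that correspond to plausible pitches (50 Hz to 1000 Hz).
minLag = floor(fs/1000);
maxLag = ceil(fs/50);
[~, idx] = max(r(minLag+1:maxLag+1));
lag = minLag + idx - 1;            % Pitch period in samples.
f_est = fs/lag;                    % Estimated fundamental, in Hz.
fprintf('Estimated pitch: %.1f Hz\n', f_est);
```

Because the lag is a whole number of samples, the estimate is quantized. Finer resolution would need interpolation around the peak.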

What applications are there?

Transcription of Music

Modeling of Musical Instruments

Speech Analysis

Besides, it's an interesting problem.

My Work . . .

A Novel Wavelet Approach

Based on an observation made by Jeremy Todd, that:

For a piano playing these notes, a CWT could be used to identify a ‘G’ with certain scale/wavelet combinations.

Even with some polyphony!

Finding a G in a C Scale

[Figure: Original Signal, and CWT @ Specific “Scale”]

The Continuous Wavelet Transform

Definition of a CWT:

$$C_{a,b} = \int f(t)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)dt$$

Where:
a = scaling factor
b = shift factor
f(t) = function we start with
ψ(t) = Mother wavelet
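The CWT definition above can be evaluated numerically at a single scale by replacing the integral with a sum over samples. The sketch below is one way to do that under stated assumptions: the mother wavelet is taken as a second derivative of a Gaussian (up to sign and normalization, which may differ from a toolbox wavelet), the scale `a` is measured in samples, and the value 20 is arbitrary.

```matlab
fs = 22050;                        % Sampling frequency, in Hz.
t = 0:1/fs:0.25;                   % A quarter second of samples.
f_sig = cos(2*pi*262*t);           % f(t): a test tone at middle C.

% Mother wavelet: second derivative of a Gaussian, up to sign and scaling.
% (Assumed form; a toolbox 'gaus2' wavelet may be normalized differently.)
psi = @(u) (1 - u.^2) .* exp(-u.^2/2);

a = 20;                            % Scale, measured in samples (illustrative).
u = (-8*a:8*a)/a;                  % Wavelet support, truncated at +/- 8.
w = psi(u)/sqrt(a);                % (1/sqrt(a)) * psi((t - b)/a), sampled.

% Because psi is symmetric, the sum over t for every shift b is a convolution.
C = conv(f_sig, w, 'same');        % C(a, b) for b at each sample position.
fprintf('Max |C| at scale %d: %.3f\n', a, max(abs(C)));
```

Sweeping `a` over a range of values and stacking the resulting rows gives the full transform. A single row is what the "CWT @ Specific Scale" plots show.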

What is Scale?

LOW SCALE: Compressed Wavelet, Lots of Detail, High Frequency

(You are here) (And here)

HIGH SCALE: Stretched Wavelet, Coarse Features, Low Frequency

Gaussian 2nd Order Wavelet

Initial Work

Took an empirical approach.

Ran a number of CWT’s at varying scale, and looked at the results.

Picked out a CWT scale for each note in the C scale.

Finding Notes in a C Scale

[Figure: CWT panels at scales 594, 530, 472, 446, 394, 722, 642, and 606, shown with the original signal]

Finding Notes w/ Polyphony

[Figure: CWT panels at scales 594, 530, 472, 446, 394, 722, 642, and 606, shown with the original signal]

More Complex Polyphony

[Figure: original signal, shown with CWT panels at scales 594, 530, 472, 446, 394, 722, 642, and 606]

Testing with different timbre

[Figure: CWT panels at scales 594, 530, 472, 446, 394, 722, 642, and 606, shown with the original signal]

Why does this work?

The scale parameter in the CWT affects frequency response.

However, our “scales” that work don’t seem to follow a clear pattern.
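To illustrate the first point, the sketch below takes the FFT of a sampled wavelet at a few scales and locates the peak of its magnitude response. It assumes the wavelet form (1 - u^2) exp(-u^2/2), whose response peaks at sqrt(2)*fs/(2*pi*a) Hz when the scale a is in samples. These scale values are illustrative and are not on the same footing as the scales listed on the slides.

```matlab
fs = 22050;                                % Sampling frequency, in Hz.
psi = @(u) (1 - u.^2) .* exp(-u.^2/2);     % Assumed 2nd-order Gaussian wavelet.
scales = [10 20 40];                       % Illustrative scales, in samples.
nfft = 2^16;                               % Zero-padded FFT for a fine frequency grid.
freq = (0:nfft/2-1)*fs/nfft;               % Positive-frequency bin centers, in Hz.
f_peak = zeros(size(scales));
for k = 1:numel(scales)
    a = scales(k);
    w = psi((-8*a:8*a)/a)/sqrt(a);         % Wavelet stretched to scale a.
    W = abs(fft(w, nfft));                 % Magnitude of its frequency response.
    [~, idx] = max(W(1:nfft/2));
    f_peak(k) = freq(idx);                 % Frequency where the response peaks.
    fprintf('scale %3d -> peak response near %.1f Hz\n', a, f_peak(k));
end
f_theory = sqrt(2)*fs./(2*pi*scales);      % Analytic peak for this wavelet form.
```

Each scale acts as a band-pass filter whose center frequency is inversely proportional to the scale, so doubling the scale halves the peak frequency.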

Training Algorithm

Again, took an empirical approach.

Ran CWT’s at varying scales, on sample files containing one note.

Picked out scales where: maximum of the CWT for one note >> other notes (and collected results).
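The selection rule above can be sketched as follows. This is not the author's training code: it uses synthetic sinusoids in place of the sample files, an assumed wavelet form, a coarse hypothetical grid of scales, and a simple score (a note's own peak response divided by the largest response among the other notes).

```matlab
fs = 22050;                                  % Sampling frequency, in Hz.
notes = [262 294 330 349 392 440 494 524];   % C scale, rounded frequencies in Hz.
t = 0:1/fs:0.25;                             % One short "sample file" per note.
psi = @(u) (1 - u.^2) .* exp(-u.^2/2);       % Assumed 2nd-order Gaussian wavelet.
scales = 5:5:60;                             % Hypothetical candidate scales, in samples.

% peakResp(i, k): maximum |CWT| of note i at candidate scale k.
peakResp = zeros(numel(notes), numel(scales));
for k = 1:numel(scales)
    a = scales(k);
    w = psi((-8*a:8*a)/a)/sqrt(a);
    for i = 1:numel(notes)
        x = cos(2*pi*notes(i)*t);            % Stand-in for a one-note recording.
        C = conv(x, w, 'valid');             % 'valid' drops the edge transients.
        peakResp(i, k) = max(abs(C));
    end
end

% For each note, pick the scale where it stands out most from the others.
bestScale = zeros(size(notes));
bestScore = zeros(size(notes));
for i = 1:numel(notes)
    rest = setdiff(1:numel(notes), i);
    others = max(peakResp(rest, :), [], 1);  % Strongest competing note per scale.
    [bestScore(i), idx] = max(peakResp(i, :) ./ others);
    bestScale(i) = scales(idx);
    fprintf('%d Hz: scale %d, score %.2f\n', notes(i), bestScale(i), bestScore(i));
end
```

With pure sinusoids only the lowest note separates strongly; the notes in the middle of the range score barely above 1. Notes with harmonics, such as a piano's, present a different picture, and this sketch does not attempt to model them.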

Results of Training Algorithm . . .

Longer C Scale – Trained on 3 Octaves of Notes

A Fragment by Chopin*

*From Right Hand of Prelude in C, Op. 28 No. 1

Training on a ‘Real’ Guitar

Only able to find 5 of 8 pitches for C Scale training case. (With limited attempt).

Results on a test file were not completely accurate.

Expected to be a more difficult case than a piano.

Could merit a more thorough try.

Entire 88 Keys on a Piano

Work in progress.

It takes a long time to run many CWT’s on 88 different sound files.

Initial results able to identify notes 70-88.

Frequency Response Revisited

Frequency Response of a 2nd Order Gaussian Wavelet

[Chart: Resulting Scales for 22 Piano Notes. Horizontal axis: NOTE NUMBER (0 to 22). Vertical axis: SCALE (0 to 2500)]

[Chart: Resulting Scales for 8 Sinusoidal Notes. Horizontal axis: NOTE NUMBER (0 to 8). Vertical axis: SCALE (0 to 14000)]

Conclusions

The novel wavelet approach isn’t perfect.

Requiring “training” is a handicap.

Most likely not suited to sources with varying timbre (e.g. guitar, voice).

Some interesting results.

The mechanism of detection could be further investigated and better understood.