Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine...
![Page 1: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/1.jpg)
Machine Learning for Signal Processing
Representing Signals: Images and Sounds
Class 4. 10 Sep 2015
Instructor: Bhiksha Raj
11-755/18-797 1
![Page 2: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/2.jpg)
Representing Data
• The first and most important step in processing signals is representing them appropriately
![Page 3: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/3.jpg)
Representing an Elephant
• It was six men of Indostan, To learning much inclined, Who went to see the elephant, (Though all of them were blind), That each by observation Might satisfy his mind.
• The first approached the elephant, And happening to fall Against his broad and sturdy side, At once began to bawl: "God bless me! But the elephant Is very like a wall!"
• The second, feeling of the tusk, Cried: "Ho! What have we here, So very round and smooth and sharp? To me 'tis very clear, This wonder of an elephant Is very like a spear!"
• The third approached the animal, And happening to take The squirming trunk within his hands, Thus boldly up and spake: "I see," quoth he, "the elephant Is very like a snake!"
• The fourth reached out an eager hand, And felt about the knee. "What most this wondrous beast is like Is mighty plain," quoth he; "'Tis clear enough the elephant Is very like a tree."
• The fifth, who chanced to touch the ear, Said: "E'en the blindest man Can tell what this resembles most: Deny the fact who can, This marvel of an elephant Is very like a fan."
• The sixth no sooner had begun About the beast to grope, Than seizing on the swinging tail That fell within his scope, "I see," quoth he, "the elephant Is very like a rope."
• And so these men of Indostan Disputed loud and long, Each in his own opinion Exceeding stiff and strong. Though each was partly right, All were in the wrong.
![Page 4: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/4.jpg)
Representation
• Describe these images
– Such that a listener can visualize what you are describing
• More images
![Page 5: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/5.jpg)
Still more images
How do you describe them?
![Page 6: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/6.jpg)
Representation
• Pixel-based descriptions are uninformative
• Content-based descriptions are infeasible in the general case
![Page 7: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/7.jpg)
Sounds
• Sounds are just sequences of numbers
• When plotted, they just look like blobs
– Which leads to “natural sounds are blobs”
• Or more precisely, “sounds are sequences of numbers that, when plotted, look like blobs”
– Which won't get us anywhere
![Page 8: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/8.jpg)
Representation
• Representation is description
• But in compact form
• Must describe the salient characteristics of the data
– E.g. a pixel-wise description of the two images here will be completely different
• Must allow identification, comparison, storage, reconstruction, ...
[Figure: two images of the letter "A"]
![Page 9: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/9.jpg)
Representing images
• The most common element in the image: background
– Or rather large regions of relatively featureless shading
– Uniform sequences of numbers
![Page 10: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/10.jpg)
Representing images using a “plain” image
• Most of the figure is a more-or-less uniform shade
– Dumb approximation: an image is a block of uniform shade
• Will be mostly right!
• How to compute the “best” description? Projection
– Represent the images as vectors and compute the projection of the image on the “basis”

$$\text{Image} = \begin{bmatrix} \text{pixel}_1 \\ \text{pixel}_2 \\ \vdots \\ \text{pixel}_N \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

$$\text{Image} \approx BW$$

$$W = \text{pinv}(B)\,\text{Image}$$

$$\text{PROJECTION} = BW = B\,(B^T B)^{-1} B^T \cdot \text{Image}$$
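As a minimal sketch of the projection above, assuming NumPy and a made-up 4x4 grayscale image (the pixel values are purely illustrative), projecting onto the single all-ones basis gives a weight equal to the mean pixel value:

```python
import numpy as np

# Hypothetical 4x4 grayscale image, flattened into a column vector of pixels.
image = np.array([[0.9, 0.8, 0.9, 1.0],
                  [0.8, 0.1, 0.2, 0.9],
                  [0.9, 0.2, 0.1, 0.8],
                  [1.0, 0.9, 0.8, 0.9]]).reshape(-1, 1)

# The "plain" basis: a single column of ones, one entry per pixel.
B = np.ones((image.size, 1))

# W = pinv(B) Image, PROJECTION = B W
W = np.linalg.pinv(B) @ image
projection = B @ W

# With an all-ones basis, the least-squares weight is the mean pixel value.
print(W[0, 0], image.mean())
```

The projection is therefore a flat image whose shade is the average of the original.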
![Page 11: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/11.jpg)
Adding more bases
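• Let's improve the approximation
• Images have some fast varying regions
– Dramatic changes
– Add a second picture that has very fast changes
• A checkerboard where every other pixel is black and the rest are white

$$B = [B_1 \; B_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \end{bmatrix}, \qquad W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

$$\text{Image} \approx w_1 B_1 + w_2 B_2 = BW$$

$$W = \text{pinv}(B)\,\text{Image}$$

$$\text{PROJECTION} = BW = B\,(B^T B)^{-1} B^T \cdot \text{Image}$$

[Figure: the bases B1 (uniform shade) and B2 (checkerboard)]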
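The two-basis projection can be sketched on a one-dimensional row of pixels. This assumes NumPy; the pixel values are made up, and the checkerboard is taken to alternate between +1 and -1:

```python
import numpy as np

n_pixels = 8
B1 = np.ones(n_pixels)                                    # uniform shade
B2 = np.where(np.arange(n_pixels) % 2 == 0, 1.0, -1.0)    # alternating +1 / -1
B = np.column_stack([B1, B2])

# Made-up row of pixels that alternates between bright and dark.
image = np.array([3.0, 1.0, 3.2, 0.8, 2.9, 1.1, 3.1, 0.9])

W = np.linalg.pinv(B) @ image      # weights w1, w2
projection = B @ W                 # best approximation using both bases

# Squared error with the plain basis alone versus with both bases.
error_one_basis = np.sum((image - B1 * image.mean()) ** 2)
error_two_bases = np.sum((image - projection) ** 2)
print(W, error_one_basis, error_two_bases)
```

Adding the second basis can only reduce the squared error, and here it reduces it substantially because the signal really does alternate.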
![Page 12: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/12.jpg)
Adding still more bases
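• Regions that change with different speeds

$$\text{Image} \approx w_1 B_1 + w_2 B_2 + w_3 B_3 + \dots$$

$$W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \end{bmatrix}, \qquad B = [B_1 \; B_2 \; B_3 \; \dots]$$

$$\text{Image} \approx BW$$

$$W = \text{pinv}(B)\,\text{Image}$$

[Figure: the bases B1 to B6]
Getting closer at 625 bases!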
![Page 13: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/13.jpg)
Representation using checkerboards
• A “standard” representation
– Checker boards are the same regardless of the picture you’re trying to describe
• As opposed to using “nose shape” to describe faces and “leaf colour” to describe trees.
• Any image can be specified as (for example) 0.8*checkerboard(0) + 0.2*checkerboard(1) + 0.3*checkerboard(2) ..
• The definition is sufficient to reconstruct the image to some degree
– Not perfectly though
![Page 14: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/14.jpg)
What about sounds?
• Square wave equivalents of checker boards
![Page 15: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/15.jpg)
Projecting sounds
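$$\text{Signal} \approx w_1 B_1 + w_2 B_2 + w_3 B_3$$

$$W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \qquad B = [B_1 \; B_2 \; B_3]$$

$$\text{Signal} \approx BW$$

$$W = \text{pinv}(B)\,\text{Signal}$$

$$\text{PROJECTION} = BW = B\,(\text{pinv}(B) \cdot \text{Signal})$$

[Figure: the square-wave bases B1, B2, B3, weighted by w1, w2, w3, approximate the signal]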
![Page 16: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/16.jpg)
General Philosophy of Representation
• Identify a set of standard structures
– E.g. checkerboards
– We will call these “bases”
• Express the data as a weighted combination of these bases
– X = w1 B1 + w2 B2 + w3 B3 + …
• Choose weights w1, w2, w3, ... for the best representation of X
– I.e. the error between X and $\sum_i w_i B_i$ is minimized
– The error is generally chosen to be $\|X - \sum_i w_i B_i\|^2$
• The weights w1, w2, w3, ... fully specify the data
– Since the bases are known beforehand
– Knowing the weights is sufficient to reconstruct the data
![Page 17: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/17.jpg)
Bases requirements
• Non-redundancy
– Each basis must represent information not already represented by other bases
– I.e. bases must be orthogonal
• <Bi, Bj> = 0 for i != j
– Mathematical benefit: can compute wi = <Bi,X> / ||Bi||^2 (simply <Bi,X> when the bases have unit norm)
• Compactness
– Must be able to represent most of X with fewest bases
– Completeness: For D-dimensional data, need no more than D bases
![Page 18: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/18.jpg)
Bases based representation
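• Place all bases in basis matrix B

$$\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$$

$$X = BW, \qquad W = \text{Pinv}(B)\,X$$

• For orthogonal bases

$$w_i = \frac{\langle B_i, X \rangle}{\|B_i\|^2}$$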
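A small numerical check of the two routes to the weights, assuming NumPy and an illustrative set of three mutually orthogonal (but not unit-length) bases: the pseudo-inverse and the per-basis inner-product formula give the same answer.

```python
import numpy as np

# Columns are three mutually orthogonal bases (illustrative values).
B = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0]])

X = np.array([4.0, 2.0, 1.0, 3.0])   # made-up data vector

# General route: W = Pinv(B) X
w_pinv = np.linalg.pinv(B) @ X

# Orthogonal-bases shortcut: w_i = <B_i, X> / ||B_i||^2, one basis at a time
w_inner = (B.T @ X) / np.sum(B ** 2, axis=0)

print(w_pinv, w_inner)
```

The shortcut needs no matrix inversion, which is the practical payoff of choosing orthogonal bases.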
![Page 19: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/19.jpg)
Bases based representation
• Challenge: Choice of appropriate bases
![Page 20: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/20.jpg)
Why checkerboards are great bases
• We cannot explain one checkerboard in terms of another
– The two are orthogonal to one another!
• This means we can determine the contributions of individual bases separately
– Joint decomposition with multiple bases gives the same result as separate decomposition with each
– This never holds true if one basis can explain another

$$B = [B_1 \; B_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \\ 1 & -1 \\ \vdots & \vdots \end{bmatrix}, \qquad W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

$$\text{Image} \approx w_1 B_1 + w_2 B_2$$

$$w_i = \frac{\langle B_i, \text{Image} \rangle}{\|B_i\|^2}$$

[Figure: the bases B1 and B2]
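The difference between joint and separate decomposition can be illustrated with a short sketch. It assumes NumPy; the helper names `joint_weights` and `separate_weights` and all data values are hypothetical, chosen only for this example:

```python
import numpy as np

def joint_weights(B, x):
    """Least-squares weights using all bases together."""
    return np.linalg.pinv(B) @ x

def separate_weights(B, x):
    """Weight of each basis computed on its own: <B_i, x> / ||B_i||^2."""
    return np.array([b @ x / (b @ b) for b in B.T])

x = np.array([3.0, 1.0, 3.2, 0.8])   # made-up signal

# Orthogonal pair: uniform shade and an alternating +1 / -1 checkerboard.
ortho = np.column_stack([np.ones(4), [1.0, -1.0, 1.0, -1.0]])

# Non-orthogonal pair: the second basis overlaps with the first.
redundant = np.column_stack([np.ones(4), [1.0, 0.0, 1.0, 0.0]])

print(joint_weights(ortho, x), separate_weights(ortho, x))          # identical
print(joint_weights(redundant, x), separate_weights(redundant, x))  # differ
```

For the orthogonal pair both routes agree; for the overlapping pair the separate estimates double-count what the bases share.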
![Page 21: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/21.jpg)
Checker boards are not good bases
• Sharp edges
– Can never be used to explain rounded curves
![Page 22: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/22.jpg)
Sinusoids ARE good bases
• They are orthogonal
• They can represent rounded shapes nicely
– Unfortunately, they cannot represent sharp corners
![Page 23: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/23.jpg)
What are the frequencies of the sinusoids
• Follow the same format as the checkerboard:
– DC
– The entire length of the signal is one period
– The entire length of the signal is two periods
– And so on
• The k-th sinusoid:
– F(n) = sin(2πkn/N)
• N is the length of the signal
• k is the number of periods in N samples
![Page 24: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/24.jpg)
How many frequencies in all?
• A max of L/2 periods are possible
• If we try to go to (L/2 + X) periods, it ends up being identical to having (L/2 – X) periods
– With sign inversion
• Example for L = 20
– Red curve = sine with 9 cycles (in a 20 point sequence)
• Y(n) = sin(2π·9n/20)
– Green curve = sine with 11 cycles in 20 points
• Y(n) = -sin(2π·11n/20)
– The blue lines show the actual samples obtained
• These are the only numbers stored on the computer
• This set is the same for both sinusoids
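The L = 20 example can be checked numerically. A minimal sketch, assuming NumPy, comparing the sampled 9-cycle sine with the sign-inverted 11-cycle sine:

```python
import numpy as np

L = 20
n = np.arange(L)   # the 20 sample positions

nine_cycles = np.sin(2 * np.pi * 9 * n / L)
eleven_cycles_inverted = -np.sin(2 * np.pi * 11 * n / L)

# The stored samples coincide, so the two sinusoids are indistinguishable
# once sampled at these 20 points.
print(np.max(np.abs(nine_cycles - eleven_cycles_inverted)))
```

The printed maximum difference is at the level of floating-point rounding.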
![Page 25: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/25.jpg)
How to compose the signal from sinusoids
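• The sines form the vectors of the projection matrix
– Pinv() will do the trick as usual

$$\text{Signal} \approx w_1 B_1 + w_2 B_2 + w_3 B_3$$

$$W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \qquad B = [B_1 \; B_2 \; B_3]$$

$$\text{Signal} \approx BW$$

$$W = \text{pinv}(B)\,\text{Signal}$$

$$\text{PROJECTION} = BW = B\,(B^T B)^{-1} B^T \cdot \text{Signal}$$

[Figure: the sinusoidal bases B1, B2, B3, weighted by w1, w2, w3, approximate the signal]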
![Page 26: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/26.jpg)
How to compose the signal from sinusoids
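• The sines form the vectors of the projection matrix
– Pinv() will do the trick as usual

$$\text{Signal} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}, \qquad W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \qquad B = [B_1 \; B_2 \; B_3]$$

$$\text{Signal} \approx w_1 B_1 + w_2 B_2 + w_3 B_3 = BW$$

$$W = \text{pinv}(B)\,\text{Signal}$$

$$\begin{bmatrix} \sin(2\pi \cdot 0 \cdot 0/L) & \sin(2\pi \cdot 1 \cdot 0/L) & \cdots & \sin(2\pi \cdot (L/2) \cdot 0/L) \\ \sin(2\pi \cdot 0 \cdot 1/L) & \sin(2\pi \cdot 1 \cdot 1/L) & \cdots & \sin(2\pi \cdot (L/2) \cdot 1/L) \\ \vdots & \vdots & & \vdots \\ \sin(2\pi \cdot 0 \cdot (L-1)/L) & \sin(2\pi \cdot 1 \cdot (L-1)/L) & \cdots & \sin(2\pi \cdot (L/2) \cdot (L-1)/L) \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{L/2} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$

L/2 columns only. The entry in row n of the column for frequency k is $\sin(2\pi kn/L)$; the k = 0 column is identically zero.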
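A sketch of this sine-basis decomposition, assuming NumPy. The signal length and test signal are made up, and the k = 0 and k = L/2 columns are left out here because their samples are identically zero, which would only add numerically empty columns:

```python
import numpy as np

L = 16                                     # signal length (illustrative)
n = np.arange(L).reshape(-1, 1)            # one row per sample
k = np.arange(1, L // 2).reshape(1, -1)    # k = 1 .. L/2 - 1 periods

# Column for frequency k holds sin(2*pi*k*n/L).
B = np.sin(2 * np.pi * k * n / L)

# Made-up signal: 0.5 x the 2-period sine plus 2 x the 5-period sine.
signal = 0.5 * B[:, 1] + 2.0 * B[:, 4]

W = np.linalg.pinv(B) @ signal             # least-squares weights
reconstruction = B @ W
print(np.round(W, 6))
```

The recovered weights are non-zero only at k = 2 and k = 5, matching how the signal was built.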
![Page 27: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/27.jpg)
Interpretation..
• Each sinusoid’s amplitude is adjusted until it gives us the least squared error
– The amplitude is the weight of the sinusoid
• This can be done independently for each sinusoid
![Page 28: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/28.jpg)
![Page 29: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/29.jpg)
![Page 30: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/30.jpg)
![Page 31: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/31.jpg)
Sines by themselves are not enough
• Every sine starts at zero
– Can never represent a signal that is non-zero in the first sample!
• Every cosine starts at 1
– If the first sample is zero, the signal cannot be represented!
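To see the limitation concretely, the sketch below (assuming NumPy, with an illustrative length of 16) projects a cosine, whose first sample is 1, onto a basis made only of sines. The k = 0 and k = L/2 sines are omitted because their samples are all zero.

```python
import numpy as np

L = 16
n = np.arange(L).reshape(-1, 1)
k = np.arange(1, L // 2).reshape(1, -1)
B = np.sin(2 * np.pi * k * n / L)          # sine-only basis

# A 3-period cosine: non-zero (equal to 1) at the first sample.
signal = np.cos(2 * np.pi * 3 * np.arange(L) / L)

W = np.linalg.pinv(B) @ signal
projection = B @ W

# Every sine is 0 at n = 0, so the projection is 0 there whatever the weights.
print(signal[0], projection[0])
```

In this case every weight comes out as zero: the sines explain none of the cosine.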
![Page 32: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/32.jpg)
The need for phase
• Allow the sinusoids to move!
• How much do the sines shift?
$$\text{signal} = w_1 \sin(2\pi k_1 n/N + \phi_1) + w_2 \sin(2\pi k_2 n/N + \phi_2) + \dots$$

Sines are shifted: do not start with value = 0
![Page 33: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/33.jpg)
Determining phase
• Least squares fitting: move the sinusoid left / right, and at each shift, try all amplitudes
– Find the combination of amplitude and phase that results in the lowest squared error
• We can still do this separately for each sinusoid
– The sinusoids are still orthogonal to one another
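The search described above can be sketched as a brute-force loop over candidate phases. This assumes NumPy; the signal, the phase grid, and the variable names are illustrative. Instead of literally trying all amplitudes at each shift, the sketch uses the closed-form least-squares amplitude for that shift, which finds the same minimum.

```python
import numpy as np

L = 32
n = np.arange(L)
k = 3                                   # periods in L samples (illustrative)
true_amp, true_phase = 1.7, 0.6         # made-up values to be recovered
signal = true_amp * np.sin(2 * np.pi * k * n / L + true_phase)

best_err, best_amp, best_phase = np.inf, None, None
# Phases in [0, pi) suffice: a shift by pi is the same as flipping the sign
# of the amplitude.
for phase in np.linspace(0.0, np.pi, 1000, endpoint=False):
    b = np.sin(2 * np.pi * k * n / L + phase)      # shifted sinusoid
    amp = (b @ signal) / (b @ b)                   # best amplitude at this shift
    err = np.sum((signal - amp * b) ** 2)
    if err < best_err:
        best_err, best_amp, best_phase = err, amp, phase

print(best_amp, best_phase)
```

The accuracy of the recovered phase is limited by the spacing of the grid, which is one reason a direct solution is preferable to this search.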
![Page 34: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/34.jpg)
![Page 35: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/35.jpg)
![Page 36: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/36.jpg)
![Page 37: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/37.jpg)
The problem with phase
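• This can no longer be expressed as a simple linear algebraic equation
– The “basis matrix” depends on the unknown phase
• I.e. there’s a component of the basis itself that must be estimated!
• Linear algebraic notation can only be used if the bases are fully known
– We can only (pseudo) invert a known matrix

$$\begin{bmatrix} \sin(2\pi \cdot 0 \cdot 0/L + \phi_0) & \sin(2\pi \cdot 1 \cdot 0/L + \phi_1) & \cdots & \sin(2\pi \cdot (L/2) \cdot 0/L + \phi_{L/2}) \\ \sin(2\pi \cdot 0 \cdot 1/L + \phi_0) & \sin(2\pi \cdot 1 \cdot 1/L + \phi_1) & \cdots & \sin(2\pi \cdot (L/2) \cdot 1/L + \phi_{L/2}) \\ \vdots & \vdots & & \vdots \\ \sin(2\pi \cdot 0 \cdot (L-1)/L + \phi_0) & \sin(2\pi \cdot 1 \cdot (L-1)/L + \phi_1) & \cdots & \sin(2\pi \cdot (L/2) \cdot (L-1)/L + \phi_{L/2}) \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{L/2} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$

L/2 columns only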
![Page 38: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/38.jpg)
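Complex Exponential to the rescue
• The cosine is the real part of a complex exponential
– The sine is the imaginary part
• A phase term for the sinusoid becomes a multiplicative term for the complex exponential!!

$$b[n] = \sin(\text{freq} \cdot n)$$

$$b[n] = \exp(j \cdot \text{freq} \cdot n) = \cos(\text{freq} \cdot n) + j \sin(\text{freq} \cdot n), \qquad j = \sqrt{-1}$$

$$\exp(j(\text{freq} \cdot n + \phi)) = \exp(j \cdot \text{freq} \cdot n)\,\exp(j\phi) = \cos(\text{freq} \cdot n + \phi) + j \sin(\text{freq} \cdot n + \phi)$$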
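A short sketch of why this helps, assuming NumPy and made-up amplitude, phase, and frequency values. Because the phase factors out as a constant multiplier, fitting an un-shifted complex exponential to a phase-shifted sinusoid yields a complex weight whose magnitude carries the amplitude and whose angle carries the phase, with no search needed.

```python
import numpy as np

L = 16
n = np.arange(L)
freq = 2 * np.pi * 3 / L        # 3 periods in L samples (illustrative)
amp, phi = 1.5, 0.8             # made-up amplitude and phase

# The phase term is a multiplicative constant on the complex exponential.
shifted = np.exp(1j * (freq * n + phi))
factored = np.exp(1j * phi) * np.exp(1j * freq * n)

# Fit a phase-shifted real sinusoid with the un-shifted complex exponential.
x = amp * np.cos(freq * n + phi)
b = np.exp(1j * freq * n)
w = np.vdot(b, x) / np.vdot(b, b)    # <b, x> / ||b||^2 (vdot conjugates b)

est_amp = 2 * np.abs(w)              # the conjugate basis carries the other half
est_phase = np.angle(w)
print(est_amp, est_phase)
```

The factor of 2 appears because a real sinusoid splits its amplitude equally between a complex exponential and its conjugate.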
![Page 39: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/39.jpg)
Explaining with Complex Exponentials
[Figure: three complex exponentials scaled by the weights A, B and C]
![Page 40: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/40.jpg)
Complex exponentials are well behaved
• Like sinusoids, a complex exponential of one frequency can never explain one of another
– They are orthogonal
• They represent smooth transitions
• Bonus: They are complex
– Can even model complex data!
• They can also model real data
– exp(j x) + exp(-j x) is real
• cos(x) + j sin(x) + cos(x) – j sin(x) = 2cos(x)
![Page 41: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/41.jpg)
Complex Exponential bases
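• Explain the data using L complex exponential bases

$$\text{Signal} = \begin{bmatrix} b_0 & b_1 & \cdots & b_{L/2} & \cdots & b_{L-1} \end{bmatrix} \begin{bmatrix} w_0 \\ \vdots \\ w_{L/2-1} \\ w_{L/2} \\ w_{L/2+1} \\ \vdots \\ w_{L-1} \end{bmatrix}$$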
![Page 42: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/42.jpg)
Complex exponentials are well behaved
• Conjugate symmetry
– $\exp\left(j 2\pi \frac{(L/2 - x)\,n}{L}\right) + \exp\left(j 2\pi \frac{(L/2 + x)\,n}{L}\right)$ is real
• The complex exponentials with frequencies equally spaced from L/2 are complex conjugates
![Page 43: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/43.jpg)
Complex exponentials are well behaved
• $\exp\left(j 2\pi \frac{(L/2 - x)\,n}{L}\right) + \exp\left(j 2\pi \frac{(L/2 + x)\,n}{L}\right)$ is real
– The complex exponentials with frequencies equally spaced from L/2 are complex conjugates
– “Frequency = k” means k periods in L samples
• $a \exp\left(j 2\pi \frac{(L/2 - x)\,n}{L}\right) + \text{conjugate}(a) \exp\left(j 2\pi \frac{(L/2 + x)\,n}{L}\right)$ is also real
– If the two exponentials are multiplied by numbers that are conjugates of one another the result is real
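Both statements can be verified numerically. A minimal sketch, assuming NumPy, with an illustrative length L = 16, offset x = 3, and an arbitrary complex multiplier `a`:

```python
import numpy as np

L = 16
n = np.arange(L)
x = 3                                   # offset from L/2 (illustrative)

lower = np.exp(2j * np.pi * (L / 2 - x) * n / L)
upper = np.exp(2j * np.pi * (L / 2 + x) * n / L)

# Frequencies equally spaced from L/2 give complex-conjugate exponentials.
are_conjugates = np.allclose(upper, np.conj(lower))

# Their plain sum is real, and so is the sum with conjugate multipliers.
a = 0.7 - 1.2j                          # arbitrary complex weight
plain_sum = lower + upper
weighted_sum = a * lower + np.conj(a) * upper
print(are_conjugates, np.max(np.abs(plain_sum.imag)), np.max(np.abs(weighted_sum.imag)))
```

The imaginary parts are zero up to floating-point rounding.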
![Page 44: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/44.jpg)
Complex Exponential bases
• For real signals:
– The weights given to the (L/2 + k)th basis and the (L/2 – k)th basis should be complex conjugates, to make the result real
– Fortunately, a least squares fit will give us complex conjugate weights to both bases automatically

$$\text{Signal} = \begin{bmatrix} b_0 & b_1 & \cdots & b_{L/2} & \cdots & b_{L-1} \end{bmatrix} \begin{bmatrix} w_0 \\ \vdots \\ w_{L/2-1} \\ w_{L/2} \\ w_{L/2+1} \\ \vdots \\ w_{L-1} \end{bmatrix}$$

Complex conjugates: $w_{L/2+k} = \text{conjugate}(w_{L/2-k})$
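The claim that a least-squares fit produces conjugate weights on its own can be checked with a small sketch. It assumes NumPy, uses a seeded random real signal as stand-in data, and leaves the bases unnormalized (a choice made for this example only):

```python
import numpy as np

L = 8
n = np.arange(L).reshape(-1, 1)
k = np.arange(L).reshape(1, -1)
B = np.exp(2j * np.pi * k * n / L)      # columns b_0 .. b_{L-1}

rng = np.random.default_rng(0)
s = rng.standard_normal(L)              # a real-valued signal

w = np.linalg.pinv(B) @ s               # least-squares weights

# Compare w[L/2 + k] with conjugate(w[L/2 - k]) for k = 1 .. L/2 - 1.
upper = w[L // 2 + 1:]
lower = w[L // 2 - 1:0:-1]
print(np.allclose(upper, np.conj(lower)))
```

No constraint was imposed during the fit; the symmetry follows from the signal being real.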
![Page 45: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/45.jpg)
Complex Exponential Bases: Algebraic Formulation
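• Note that $S_{L/2+x} = \text{conjugate}(S_{L/2-x})$ for real s

$$\begin{bmatrix} \exp(j2\pi \cdot 0 \cdot 0/L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot 0/L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot 0/L) \\ \exp(j2\pi \cdot 0 \cdot 1/L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot 1/L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot 1/L) \\ \vdots & & \vdots & & \vdots \\ \exp(j2\pi \cdot 0 \cdot (L-1)/L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot (L-1)/L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot (L-1)/L) \end{bmatrix} \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$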
![Page 46: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/46.jpg)
Shorthand Notation
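• Note that $S_{L/2+x} = \text{conjugate}(S_{L/2-x})$

$$W_L^{k,n} = \frac{1}{\sqrt{L}} \exp(j 2\pi kn/L) = \frac{1}{\sqrt{L}} \left( \cos(2\pi kn/L) + j \sin(2\pi kn/L) \right)$$

$$\begin{bmatrix} W_L^{0,0} & \cdots & W_L^{L/2,0} & \cdots & W_L^{L-1,0} \\ W_L^{0,1} & \cdots & W_L^{L/2,1} & \cdots & W_L^{L-1,1} \\ \vdots & & \vdots & & \vdots \\ W_L^{0,L-1} & \cdots & W_L^{L/2,L-1} & \cdots & W_L^{L-1,L-1} \end{bmatrix} \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$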
![Page 47: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/47.jpg)
A quick detour
• Real Orthonormal matrix:
– $X X^T = X^T X = I$
• But only if all entries are real
– The inverse of X is its own transpose
• Definition: Hermitian
– $X^H$ = Complex conjugate of $X^T$
• Complex Orthonormal matrix
– $X X^H = X^H X = I$
– The inverse of a complex orthonormal matrix is its own Hermitian
![Page 48: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/48.jpg)
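$W^{-1} = W^H$

$$W = \begin{bmatrix} W_L^{0,0} & \cdots & W_L^{L/2,0} & \cdots & W_L^{L-1,0} \\ W_L^{0,1} & \cdots & W_L^{L/2,1} & \cdots & W_L^{L-1,1} \\ \vdots & & \vdots & & \vdots \\ W_L^{0,L-1} & \cdots & W_L^{L/2,L-1} & \cdots & W_L^{L-1,L-1} \end{bmatrix}$$

$$W_L^{k,n} = \frac{1}{\sqrt{L}} \exp(j 2\pi kn/L), \qquad W_L^{-k,n} = \frac{1}{\sqrt{L}} \exp(-j 2\pi kn/L)$$

$$W^H = \begin{bmatrix} W_L^{0,0} & \cdots & W_L^{0,L/2} & \cdots & W_L^{0,L-1} \\ W_L^{-1,0} & \cdots & W_L^{-1,L/2} & \cdots & W_L^{-1,L-1} \\ \vdots & & \vdots & & \vdots \\ W_L^{-(L-1),0} & \cdots & W_L^{-(L-1),L/2} & \cdots & W_L^{-(L-1),L-1} \end{bmatrix}$$

The complex exponential basis is orthogonal
Its inverse is its own Hermitian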
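The property can be confirmed numerically. A minimal sketch, assuming NumPy; the helper name `fourier_matrix` is hypothetical, and the matrix uses the $1/\sqrt{L}$ scaling of the shorthand notation:

```python
import numpy as np

def fourier_matrix(L):
    """L x L matrix whose entry in row n, column k is exp(j*2*pi*k*n/L) / sqrt(L)."""
    n = np.arange(L).reshape(-1, 1)
    k = np.arange(L).reshape(1, -1)
    return np.exp(2j * np.pi * k * n / L) / np.sqrt(L)

W = fourier_matrix(8)
WH = W.conj().T                     # Hermitian: complex conjugate of the transpose

# Columns are orthonormal, so the Hermitian acts as the inverse.
print(np.allclose(WH @ W, np.eye(8)), np.allclose(np.linalg.inv(W), WH))
```

Without the $1/\sqrt{L}$ factor the columns would still be orthogonal, but the product would be L times the identity.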
![Page 49: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/49.jpg)
Doing it in matrix form
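$$\begin{bmatrix} W_L^{0,0} & \cdots & W_L^{L/2,0} & \cdots & W_L^{L-1,0} \\ W_L^{0,1} & \cdots & W_L^{L/2,1} & \cdots & W_L^{L-1,1} \\ \vdots & & \vdots & & \vdots \\ W_L^{0,L-1} & \cdots & W_L^{L/2,L-1} & \cdots & W_L^{L-1,L-1} \end{bmatrix} \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$

$$\begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} W_L^{0,0} & \cdots & W_L^{0,L/2} & \cdots & W_L^{0,L-1} \\ W_L^{-1,0} & \cdots & W_L^{-1,L/2} & \cdots & W_L^{-1,L-1} \\ \vdots & & \vdots & & \vdots \\ W_L^{-(L-1),0} & \cdots & W_L^{-(L-1),L/2} & \cdots & W_L^{-(L-1),L-1} \end{bmatrix} \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$

– Because $W^{-1} = W^H$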
![Page 50: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/50.jpg)
The Discrete Fourier Transform
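• The matrix to the right is called the “Fourier Matrix”
• The weights (S0, S1, etc.) are called the Fourier transform

$$\begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} W_L^{0,0} & \cdots & W_L^{0,L/2} & \cdots & W_L^{0,L-1} \\ W_L^{-1,0} & \cdots & W_L^{-1,L/2} & \cdots & W_L^{-1,L-1} \\ \vdots & & \vdots & & \vdots \\ W_L^{-(L-1),0} & \cdots & W_L^{-(L-1),L/2} & \cdots & W_L^{-(L-1),L-1} \end{bmatrix} \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$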
![Page 51: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/51.jpg)
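The Inverse Discrete Fourier Transform
• The matrix to the left is the inverse Fourier matrix
• Multiplying the Fourier transform by this matrix gives us the signal right back from its Fourier transform

$$\begin{bmatrix} W_L^{0,0} & \cdots & W_L^{L/2,0} & \cdots & W_L^{L-1,0} \\ W_L^{0,1} & \cdots & W_L^{L/2,1} & \cdots & W_L^{L-1,1} \\ \vdots & & \vdots & & \vdots \\ W_L^{0,L-1} & \cdots & W_L^{L/2,L-1} & \cdots & W_L^{L-1,L-1} \end{bmatrix} \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}$$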
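The forward and inverse transforms can be sketched as two matrix multiplications. This assumes NumPy, a hypothetical helper `fourier_matrix`, and made-up signal values. With the $1/\sqrt{L}$ scaling used here, the result should match NumPy's FFT when called with `norm="ortho"`:

```python
import numpy as np

def fourier_matrix(L):
    """Entry in row n, column k is exp(j*2*pi*k*n/L) / sqrt(L)."""
    n = np.arange(L).reshape(-1, 1)
    k = np.arange(L).reshape(1, -1)
    return np.exp(2j * np.pi * k * n / L) / np.sqrt(L)

s = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0])   # made-up signal
W = fourier_matrix(len(s))

S = W.conj().T @ s      # forward transform: S = W^H s
s_back = W @ S          # inverse transform: s = W S

print(np.allclose(S, np.fft.fft(s, norm="ortho")), np.allclose(s_back, s))
```

NumPy's default `fft` omits the $1/\sqrt{L}$ factor in the forward direction, so the `norm="ortho"` argument is needed for the comparison to hold.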
![Page 52: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/52.jpg)
The Fourier Matrix
• Left panel: The real part of the Fourier matrix
– For a 32-point signal
• Right panel: The imaginary part of the Fourier matrix
![Page 53: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/53.jpg)
The FAST Fourier Transform
• The outcome of the transformation with the Fourier matrix is the DISCRETE FOURIER TRANSFORM (DFT)
• The FAST Fourier transform is an algorithm that takes advantage of the symmetry of the matrix to perform the matrix multiplication really fast
• The FFT computes the DFT
– Is much faster if the length of the signal can be expressed as 2^N
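One way to see that the FFT computes the same numbers as the matrix product is to compare the two directly. The sketch below uses the standard radix-2 decimation-in-time recursion (Cooley-Tukey), which is one common FFT algorithm and not necessarily the one intended on the slide. It omits any 1/sqrt(L) scaling, and the function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct DFT: equivalent to an L x L matrix product, about L*L multiplications."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two (2^N)."""
    L = len(x)
    if L == 1:
        return list(x)
    if L % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * L
    for k in range(L // 2):
        twiddle = cmath.exp(-2j * math.pi * k / L) * odd[k]
        out[k] = even[k] + twiddle
        out[k + L // 2] = even[k] - twiddle
    return out

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
max_diff = max(abs(a - b) for a, b in zip(dft(x), fft(x)))
```

Each level of the recursion halves the problem, so the work grows roughly as L·log2(L) instead of L².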
![Page 54: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/54.jpg)
Images
• The complex exponential is two dimensional
– Has a separate X frequency and Y frequency
• Would be true even for checker boards!
– The 2-D complex exponential must be unravelled to form one component of the Fourier matrix
• For a KxL image, we’d have K*L bases in the matrix
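The unravelling step can be sketched as follows. This is an illustration only: the row-by-row unravelling order, the lack of scaling, and the names `basis_2d`, `unravel` and `inner` are assumptions made for the example.

```python
import cmath
import math

def basis_2d(K, L, kx, ky):
    """K x L complex exponential with ky cycles down the rows and kx cycles across the columns."""
    return [[cmath.exp(2j * math.pi * (ky * r / K + kx * c / L)) for c in range(L)]
            for r in range(K)]

def unravel(image):
    """Flatten an image row by row into one long vector."""
    return [v for row in image for v in row]

def inner(u, v):
    """Complex inner product of two vectors."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

K, L = 3, 4
# One unravelled basis per (X frequency, Y frequency) pair: K*L bases in all.
bases = [unravel(basis_2d(K, L, kx, ky)) for ky in range(K) for kx in range(L)]

num_bases = len(bases)
cross = inner(bases[1], bases[5])   # two different frequency pairs
self_ = inner(bases[1], bases[1])   # a basis with itself
```

Different frequency pairs give orthogonal vectors (`cross` is zero up to rounding), so the K*L unravelled bases can serve as the columns of a Fourier matrix for the image.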
![Page 55: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/55.jpg)
Typical Image Bases
• Only real components of bases shown
![Page 56: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/56.jpg)
DFT: Properties
• The DFT coefficients are complex
– Have both a magnitude and a phase
• Simple linear algebra tells us that
– DFT(A + B) = DFT(A) + DFT(B)
– The DFT of the sum of two signals is the sum of their DFTs
• A horribly common approximation in sound processing
– Magnitude(DFT(A+B)) = Magnitude(DFT(A)) + Magnitude(DFT(B))
– Utterly wrong
– Absurdly useful
$$S_k = |S_k| \exp(j \angle S_k)$$
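A small numerical check makes the contrast concrete. The two signals below are made up for the example and chosen so that they share a frequency component in opposite phase; the DFT is unscaled.

```python
import cmath
import math

def dft(x):
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

# Hypothetical test signals: b contains the negative of a, plus an extra term.
a = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
b = [-1.0, 0.5, 1.0, 0.5, -1.0, 0.5, 1.0, 0.5]

A, B = dft(a), dft(b)
AB = dft([p + q for p, q in zip(a, b)])

# Linearity: DFT(A + B) - (DFT(A) + DFT(B)) vanishes up to rounding.
linearity_error = max(abs(s - (p + q)) for s, p, q in zip(AB, A, B))

# Magnitudes do not add: components in opposite phase cancel in the sum.
magnitude_error = max(abs(abs(s) - (abs(p) + abs(q))) for s, p, q in zip(AB, A, B))
```

The magnitude approximation is worst when the two signals have energy at the same frequency with opposing phase, as here; it is closer when their spectra overlap little.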
![Page 57: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/57.jpg)
Symmetric signals
• If a signal is (conjugate) symmetric around L/2, the Fourier coefficients are real!
– A(L/2-k) * exp(-j *f*(L/2-k)) + A(L/2+k) * exp(-j*f*(L/2+k)) is always real if
A(L/2-k) = conjugate(A(L/2+k))
– We can pair up samples around the center all the way; the final summation term is always real
• Overall symmetry properties
– If the signal is real, the FT is (conjugate) symmetric
– If the signal is (conjugate) symmetric, the FT is real
– If the signal is real and symmetric, the FT is real and symmetric
[Figure: Contributions from points equidistant from L/2 combine to cancel out imaginary terms]
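The symmetry properties can be checked numerically. In this sketch the signals are invented for illustration, indices are taken modulo L (so "symmetric around L/2" means x[n] = x[L - n]), and the DFT is unscaled.

```python
import cmath
import math

def dft(x):
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

L = 8
symmetric = [3.0, 1.0, 2.0, 5.0, 4.0, 5.0, 2.0, 1.0]    # real, x[n] == x[L - n]
real_only = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.0, 5.0]   # real, no symmetry

S = dft(symmetric)
Y = dft(real_only)

# Real and symmetric signal: the imaginary parts cancel.
max_imag_symmetric = max(abs(c.imag) for c in S)

# Real signal: Y[k] equals the conjugate of Y[L - k].
conj_symmetry_error = max(abs(Y[k] - Y[(L - k) % L].conjugate()) for k in range(L))

# Without symmetry in the signal, the coefficients are genuinely complex.
max_imag_real_only = max(abs(c.imag) for c in Y)
```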
![Page 58: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/58.jpg)
The Discrete Cosine Transform
• Compose a symmetric signal or image
– Images would be symmetric in two dimensions
• Compute the Fourier transform
– Since the FT is symmetric, sufficient to store only half the coefficients (quarter for an image)
• Or as many coefficients as were originally in the signal / image
![Page 59: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/59.jpg)
DCT
• Not necessary to compute a 2xL sized FFT
– Enough to compute an L-sized cosine transform
– Taking advantage of the symmetry of the problem
• This is the Discrete Cosine Transform
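$$
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_{L-1} \end{bmatrix}
=
\begin{bmatrix}
\cos(2\pi(0.5)\cdot 0/2L) & \cos(2\pi(1+0.5)\cdot 0/2L) & \cdots & \cos(2\pi(L-0.5)\cdot 0/2L) \\
\cos(2\pi(0.5)\cdot 1/2L) & \cos(2\pi(1+0.5)\cdot 1/2L) & \cdots & \cos(2\pi(L-0.5)\cdot 1/2L) \\
\vdots & \vdots & & \vdots \\
\cos(2\pi(0.5)(L-1)/2L) & \cos(2\pi(1+0.5)(L-1)/2L) & \cdots & \cos(2\pi(L-0.5)(L-1)/2L)
\end{bmatrix}
\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}
$$
(The matrix has L columns.)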
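The link between the L-sized cosine transform and the 2L-sized Fourier transform of a symmetric signal can be verified directly. The sketch assumes one particular way of composing the symmetric signal (appending the mirror image of s) and an unscaled DFT; under those assumptions the two agree up to a factor of 2 and a half-sample phase term.

```python
import cmath
import math

def dct_matrix(L):
    """Entry (k, n) is cos(2*pi*(n + 0.5)*k / (2L)), as in the matrix above."""
    return [[math.cos(2 * math.pi * (n + 0.5) * k / (2 * L)) for n in range(L)]
            for k in range(L)]

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

s = [4.0, 3.0, 5.0, 1.0, 0.0, 2.0]
L = len(s)

# L-sized cosine transform.
w = [sum(c * v for c, v in zip(row, s)) for row in dct_matrix(L)]

# 2L-sized DFT of the mirrored (symmetric) signal.
mirrored = s + s[::-1]
Y = dft(mirrored)

# Undo the factor of 2 and the half-sample phase shift of the mirroring.
w_from_dft = [0.5 * cmath.exp(-1j * math.pi * k / (2 * L)) * Y[k] for k in range(L)]

max_diff = max(abs(a - b) for a, b in zip(w, w_from_dft))
```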
![Page 60: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/60.jpg)
Images and DCT
• Most common coding is the DCT
• JPEG: Each 8x8 block of the picture is converted using a DCT
• The DCT coefficients are quantized and stored
– Degree of quantization = degree of compression
• Also used to represent textures etc for pattern recognition and other forms of analysis
[Figure labels: DCT; multiply by DCT matrix]
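A rough sketch of block-wise DCT coding follows. It is a simplification, not the JPEG procedure itself: it uses an orthonormal DCT scaling and a single uniform quantization step, whereas JPEG uses a table of per-coefficient steps. The test block and all function names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT matrix (the scaling is an assumed convention)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

C = dct_matrix()
Ct = transpose(C)

def dct2(block):
    """2-D DCT: transform the columns, then the rows."""
    return matmul(matmul(C, block), Ct)

def idct2(coeffs):
    return matmul(matmul(Ct, coeffs), C)

def quantize(coeffs, step):
    """Uniform quantization: a larger step discards more detail."""
    return [[round(c / step) for c in row] for row in coeffs]

# Hypothetical smooth 8x8 block of pixel values (a diagonal ramp).
block = [[float(8 * r + 4 * c) for c in range(N)] for r in range(N)]
coeffs = dct2(block)

def nonzero_after(step):
    return sum(1 for row in quantize(coeffs, step) for q in row if q != 0)

fine, coarse = nonzero_after(1.0), nonzero_after(50.0)

# Without quantization the block is recovered up to rounding error.
restored = idct2(coeffs)
max_err = max(abs(a - b) for ra, rb in zip(restored, block) for a, b in zip(ra, rb))
```

Coarser quantization leaves fewer non-zero coefficients to store, which is the sense in which the degree of quantization sets the degree of compression.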
![Page 61: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/61.jpg)
Representing Sound and Images
• “Deterministic” representations of audio time series and image data..
![Page 62: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/62.jpg)
Aside: some tricks to computing Fourier transforms
• Direct computation of the Fourier transform can result in poor representations
• Boundary effects can cause error
– Solution: Windowing
• The size of the signal can introduce inefficiency
– Solution: Zero padding
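Both tricks can be illustrated on a short segment. The sketch assumes a Hamming window (one common choice; the slide does not name a window) and pads to the next power of two so that a radix-2 FFT would apply. The `leakage` measure is an ad hoc quantity defined only for this example.

```python
import cmath
import math

def dft(x):
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L))
            for k in range(L)]

def hamming(L):
    """Hamming window: tapers the segment towards its boundaries."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def leakage(x):
    """Fraction of spectral energy lying more than 3 bins away from the peak."""
    energy = [abs(c) ** 2 for c in dft(x)[: len(x) // 2]]
    peak = energy.index(max(energy))
    near = sum(energy[max(0, peak - 3): peak + 4])
    return 1.0 - near / sum(energy)

L = 50
# 5.5 cycles fit in the segment, so its two ends do not join up smoothly.
segment = [math.sin(2 * math.pi * 5.5 * n / L) for n in range(L)]
windowed = [w * s for w, s in zip(hamming(L), segment)]

leak_raw = leakage(segment)
leak_windowed = leakage(windowed)

# Zero padding: extend the windowed segment to a power-of-two length.
padded = windowed + [0.0] * (next_power_of_two(L) - L)
```

Windowing reduces the energy smeared across distant frequencies by the abrupt segment boundaries; zero padding changes only the length, not the content, of the segment.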
![Page 63: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/63.jpg)
Sound: A thought experiment
• Analysis: Analyze the sound using a bank of tuning forks
• Transduce the vibrations and store / transmit them
• Synthesis: Activate tuning forks with the transduced signal
• What do we get?
[Figure labels: FT, +, Inverse FT]
![Page 64: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/64.jpg)
The Fourier Transform and Perception: Sound
• The Fourier transform represents the signal analogously to a bank of tuning forks
• Our ear has a bank of tuning forks
• The output of the Fourier transform is perceptually very meaningful
![Page 65: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/65.jpg)
The Fourier Transform and Perception: Sound
• Processing Sound:
• Analyze the sound using a bank of tuning forks
• Sample the transduced output of the tuning forks at periodic intervals
![Page 66: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/66.jpg)
Sound parameterization
• The signal is processed in segments of 25-64 ms
– Because the properties of audio signals change quickly
– They are “stationary” only very briefly
![Page 67: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/67.jpg)
Sound parameterization
• The signal is processed in segments of 25-64 ms
– Because the properties of audio signals change quickly
– They are “stationary” only very briefly
• Adjacent segments overlap by 15-48 ms
![Page 73: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/73.jpg)
Sound parameterization
Each segment is typically 25-64 milliseconds wide
Audio signals typically do not change significantly within this short time interval
Segments shift every 10-16 milliseconds
![Page 74: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/74.jpg)
Sound parameterization
Each segment is windowed and a DFT is computed from it
[Figure: a windowed segment and its complex spectrum plotted against frequency (Hz)]
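The segmentation and windowing steps can be sketched as below. The 16 kHz sampling rate, the 25 ms segment width, the 10 ms shift, the Hamming window and the 440 Hz test tone are all example choices within the ranges quoted above; the DFT of each windowed segment is left out to keep the sketch short.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping segments; a trailing partial segment is dropped."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]

def hamming(L):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

sample_rate = 16000                      # assumed sampling rate (samples per second)
frame_len = sample_rate * 25 // 1000     # 25 ms segment: 400 samples
hop = sample_rate * 10 // 1000           # 10 ms shift: 160 samples
overlap_ms = (frame_len - hop) * 1000 // sample_rate   # 15 ms of overlap

# One second of a hypothetical 440 Hz tone.
x = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

frames = frame_signal(x, frame_len, hop)
window = hamming(frame_len)
windowed_frames = [[w * v for w, v in zip(window, frame)] for frame in frames]
```

Each row of `windowed_frames` is then ready for a DFT; with these settings one second of audio yields 98 segments.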
![Page 76: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/76.jpg)
Computing a Spectrogram
Compute Fourier Spectra of segments of audio and stack them side-by-side
[Figure, built up over slides 101-117: the Fourier spectrum of each successive segment is added beside the previous ones; every spectrum carries the axis label "frequency"]
Computing the Spectrogram
Compute Fourier Spectra of segments of audio and stack them side-by-side
The Fourier spectrum of each window can be inverted to get back the signal.
Hence the spectrogram can be inverted to obtain a time-domain signal
In this example each segment was 25 ms long and adjacent segments overlapped by 15 ms
![Page 94: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/94.jpg)
The result of parameterization
• Each column here represents the FT of a single segment of signal 64ms wide.
– Adjacent segments overlap by 48 ms.
• DFT details
– 1024 points (16000 samples a second).
– 2048 point DFT – 1024 points of zero padding.
– Only 1025 points of each DFT are shown
• The rest are “reflections”
• The value shown is actually the magnitude of the complex spectral values
– Most of our analysis / operations are performed on the magnitude
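The numbers above fit together as follows: 64 ms at 16000 samples a second is 1024 samples, a 48 ms overlap leaves a shift of 256 samples, and a 2048-point DFT of a real segment has 2048/2 + 1 = 1025 distinct values. The sketch below puts this into a small magnitude spectrogram. The Hamming window, the radix-2 FFT and the 1000 Hz test tone are assumptions made for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    L = len(x)
    if L == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * L
    for k in range(L // 2):
        t = cmath.exp(-2j * math.pi * k / L) * odd[k]
        out[k], out[k + L // 2] = even[k] + t, even[k] - t
    return out

def spectrogram(x, frame_len, hop, n_fft):
    """One column of magnitudes per segment; only the n_fft/2 + 1 unique bins are kept."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    columns = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [w * v for w, v in zip(window, x[start:start + frame_len])]
        frame += [0.0] * (n_fft - frame_len)          # zero padding
        columns.append([abs(c) for c in fft(frame)[: n_fft // 2 + 1]])
    return columns

sample_rate = 16000
frame_len = sample_rate * 64 // 1000              # 64 ms: 1024 samples
hop = frame_len - sample_rate * 48 // 1000        # 48 ms overlap: shift of 256 samples
n_fft = 2048

# A quarter of a second of a hypothetical 1000 Hz tone.
x = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(4000)]
spec = spectrogram(x, frame_len, hop, n_fft)

peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
peak_hz = peak_bin * sample_rate / n_fft
```

The remaining 1023 DFT values are the conjugate "reflections" of bins 1 to 1023, so dropping them loses nothing for a real signal.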
![Page 95: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/95.jpg)
Representing Images
• DCT of small segments
– 8x8
– Each image becomes a matrix of DCT vectors
• DCT of the image
[Figure: the DCT of each 8x8 segment forms one column, giving Npixels / 64 columns]
![Page 96: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/96.jpg)
Downsampling-based representations
• Downsampling an example
– Trying to reduce size by factor of 4 each time
• Select every alternate sample row-wise and column-wise
– What exactly did we capture?
• Clue: Results are horrible.
![Page 97: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/97.jpg)
Downsampling-based representations
• Nasty aliasing effects!
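A tiny example shows how badly plain downsampling can misrepresent fine detail. The 8x8 striped test image is hypothetical and chosen as a worst case.

```python
def downsample(image, factor=2):
    """Keep every factor-th row and column, with no smoothing beforehand."""
    return [row[::factor] for row in image[::factor]]

# Hypothetical test image: vertical stripes that alternate every pixel.
stripes = [[float(c % 2) for c in range(8)] for r in range(8)]

small = downsample(stripes)          # 8x8 -> 4x4, a factor of 4 fewer pixels

original_mean = sum(map(sum, stripes)) / 64
small_mean = sum(map(sum, small)) / 16
```

Only the dark columns survive, so the striped image turns into a uniformly dark one: the finest detail has been aliased into a completely different pattern instead of being averaged out.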
![Page 98: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/98.jpg)
The Gaussian Kernel
• A two-dimensional image of a Gaussian
• Characterized by
– Center (mean)
– Standard deviation σ (assumed same in both directions)
• I.e. spherical Gaussian
• The image can be represented by a vector
$$\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_N \end{bmatrix}$$
![Page 99: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/99.jpg)
The Gaussian Kernel matrix
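• Each column is one Gaussian
– Representing a Gaussian centered at one of the pixels in the image
• As many columns as pixels
– Also as many rows as pixels
$$
G = \begin{bmatrix}
g_{11} & g_{21} & \cdots & g_{N1} \\
g_{12} & g_{22} & \cdots & g_{N2} \\
\vdots & \vdots & \ddots & \vdots \\
g_{1N} & g_{2N} & \cdots & g_{NN}
\end{bmatrix}
$$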
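A Gaussian kernel matrix for a very small image can be built as in the sketch below. The row-by-row unravelling of pixels and the normalisation of each column to unit sum are assumptions made for the example, as are the function names.

```python
import math

def gaussian_kernel_matrix(height, width, sigma):
    """N x N matrix (N = height * width); column c is a spherical Gaussian
    centred on pixel c, unravelled row by row."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    N = len(coords)
    G = [[0.0] * N for _ in range(N)]
    for col, (cr, cc) in enumerate(coords):
        weights = [math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2 * sigma ** 2))
                   for r, c in coords]
        total = sum(weights)
        for row in range(N):
            G[row][col] = weights[row] / total   # assumed: each column sums to 1
    return G

def apply(G, X):
    """Matrix-vector product G X: the smoothed image, still unravelled."""
    return [sum(g * p for g, p in zip(row, X)) for row in G]

height, width = 3, 4
G = gaussian_kernel_matrix(height, width, sigma=1.0)

# An image with a single bright pixel is smoothed into that pixel's Gaussian.
X = [0.0] * (height * width)
X[5] = 1.0
smoothed = apply(G, X)
```

For a realistic image this matrix is enormous (pixels by pixels), so in practice the same smoothing is usually carried out as a convolution with a small kernel.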
![Page 100: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/100.jpg)
Downsampling-based representations
• Transform with Gaussian kernel matrix
• Then downsample
$$G X, \qquad X = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}$$
![Page 101: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/101.jpg)
Downsampling-based representations
[Figure labels: G X, G1 X1]
![Page 102: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/102.jpg)
The Gaussian Pyramid
• Successive smoothing and scaling
• The entire collection of images is the Gaussian pyramid
![Page 103: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/103.jpg)
Laplacians
[Figure labels: G X, X - GX, G1 X1, X1 - G1X1]
![Page 104: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/104.jpg)
Laplacian Pyramid
![Page 105: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/105.jpg)
Remember..
• The Gaussian is an anti-aliasing filter
• The Gaussian pyramid is the low-pass filtered version of the image
• The Laplacian pyramid is the high-pass filtered version of the image
[Figure: Analog signal → Antialiasing Filter → Sampling → Digital signal]
![Page 106: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/106.jpg)
The Gaussian/Laplacian Decomposition
• Each low-pass filtered image is downsampled
• The process is recursively performed
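The recursion can be sketched on a one-dimensional signal, treated here as a single-row image to keep the code short. The [1, 2, 1]/4 kernel stands in for the Gaussian, and the edge handling and names are choices made for this example only.

```python
def smooth(x):
    """Low-pass filter with the kernel [1, 2, 1] / 4, a small Gaussian-like
    kernel assumed for this sketch; edge samples are replicated."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4 for i in range(len(x))]

def decompose(x, levels):
    """Return the low-pass (Gaussian) and high-pass (Laplacian) signal at each level."""
    gaussian, laplacian = [], []
    current = list(x)
    for _ in range(levels):
        low = smooth(current)                                     # G X
        gaussian.append(low)
        laplacian.append([a - b for a, b in zip(current, low)])   # X - G X
        current = low[::2]                                        # downsample, then recurse
    return gaussian, laplacian, current

signal = [float((n * 7) % 5) for n in range(16)]
gaussian, laplacian, residual = decompose(signal, 3)

level_sizes = [len(g) for g in gaussian]

# Low-pass plus high-pass at a level gives back that level's input.
rebuilt_top = [l + g for l, g in zip(laplacian[0], gaussian[0])]
```

Each level halves the length (a factor of 4 in pixels for a two-dimensional image), and the Laplacian at each level holds exactly the detail that the smoothing removed.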
![Page 107: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/107.jpg)
The discrete wavelet transform
• Very similar in structure
• But the bases at each scale are orthogonal to bases at other scales
– As opposed to a Gaussian kernel matrix
![Page 108: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/108.jpg)
Haar Wavelets
• We have already encountered Haar wavelets
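As a reminder of how the Haar transform works, the sketch below applies it to a short one-dimensional signal. The orthonormal 1/sqrt(2) scaling and the ordering of the output coefficients are assumptions of this example.

```python
import math

def haar_step(x):
    """One level of the Haar transform: scaled pairwise sums and differences.
    len(x) must be even."""
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    r = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / r, (a - d) / r])
    return x

def haar_transform(x):
    """Full multi-level transform; len(x) must be a power of two."""
    coeffs = []
    current = list(x)
    while len(current) > 1:
        current, detail = haar_step(current)   # recurse on the low-pass half
        coeffs = detail + coeffs
    return current + coeffs

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(x)
rebuilt = haar_inverse_step(approx, detail)

# Orthonormal bases preserve energy across the transform.
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in haar_transform(x))
```

The structure mirrors the Gaussian/Laplacian decomposition (a low-pass half that is recursed on and a high-pass half that is kept), but because the bases are orthogonal the transform has exactly as many coefficients as the signal has samples.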
![Page 109: Machine Learning for Signal Processingmlsp.cs.cmu.edu/courses/fall2015/slides/Class4... · Machine Learning for Signal Processing Representing Signals: Images and Sounds Class 4.](https://reader034.fdocuments.net/reader034/viewer/2022052100/6039b7f594fc1c12d754a828/html5/thumbnails/109.jpg)
Other characterizations
• Content-based characterizations
– E.g. Hough transform
• Captures linear arrangements of pixels
– Radon transform
– SIFT features
– Etc.
• Will revisit in homework..