Time-frequency Masking - Interactive Audio Lab ·...

Post on 01-Oct-2020


Time-frequency Masking

EECS 352: Machine Perception of Music & Audio

Zafar Rafii, Winter 2014

STFT

• The Short-Time Fourier Transform (STFT) is a succession of local Fourier Transforms (FT)

[Figure: a time signal is cut into windows; the FT of window i gives frame i of the STFT, shown as real spectrogram + j * imaginary spectrogram (axes: time, frequency)]

• If we use a window of N samples, the FT has N values, at frequency indices 0 to N-1; e.g., N = 8 in the figure

[Figure: time signal (window i), its FT, and frame i of the STFT shown as real spectrum + j * imaginary spectrum, each with N = 8 frequency values at indices 0 to 7]
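As a minimal sketch of the "N samples in, N values out" property, the snippet below computes a naive discrete Fourier transform of one window using only the Python standard library. The helper name `dft` and the sample values are assumptions made for illustration; a real implementation would normally call an FFT routine instead.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical window of N = 8 real-valued samples.
window = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
spectrum = dft(window)

# One complex value (real part + j * imaginary part) per frequency index 0..N-1.
for k, value in enumerate(spectrum):
    print(k, round(value.real, 3), round(value.imag, 3))
```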

• Frequency index 0 is the DC component; it is always real (it is the sum of the time values!)

• Frequency indices from 1 to floor(N/2) are the “unique” complex values (a + j*b)

• Frequency indices from floor(N/2)+1 to N-1 are the “mirrored” complex conjugates (a - j*b)

• If N is even, there is a pivot component at frequency index N/2; it is always real!
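The four properties above (real DC term equal to the sum of the samples, mirrored complex conjugates, real pivot for even N) can be checked numerically. This is a sketch under the assumption that the time signal is real-valued; `dft` and `check_symmetry` are hypothetical helper names, and the sample values are made up.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def check_symmetry(x, tol=1e-9):
    """Return four booleans: DC is real, DC equals the sum of the samples,
    bin k is the complex conjugate of bin N-k, and the pivot is real."""
    n = len(x)
    X = dft(x)
    dc_is_real = abs(X[0].imag) < tol
    dc_is_sum = abs(X[0] - sum(x)) < tol
    mirrored = all(abs(X[k] - X[n - k].conjugate()) < tol for k in range(1, n))
    # A pivot bin only exists when N is even; odd N passes trivially.
    pivot_ok = n % 2 != 0 or abs(X[n // 2].imag) < tol
    return dc_is_real, dc_is_sum, mirrored, pivot_ok


# Hypothetical window of N = 8 real-valued samples.
window = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
print(check_symmetry(window))
```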

• Summary of the frequency indices and values in the STFT (in colors!)

N frequency values = frequency 0 to N-1
– Frequency 0 = DC component (always real)
– Frequency 1 to floor(N/2) = “unique” complex values
– Frequency N/2 = “pivot” component (always real, if N is even)
– Frequency floor(N/2)+1 to N-1 = “mirrored” complex conjugates

[Figure: real spectrogram + j * imaginary spectrogram (axes: time, frequency), color-coded by frequency index]

Spectrogram

• The (magnitude) spectrogram is the magnitude (absolute value) of the STFT

[Figure: abs(real spectrogram + j * imaginary spectrogram) = magnitude spectrogram, frame by frame (axes: time, frequency)]

• For a complex number a + j*b, the absolute value is $|a + jb| = \sqrt{a^2 + b^2}$

[Figure: abs(real spectrum + j * imaginary spectrum) = magnitude spectrum for frame i, with N = 8 frequency values at indices 0 to 7]
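A small sketch of the absolute value formula applied to one frame. The frame values below are hypothetical (chosen to be conjugate-symmetric, as the spectrum of a real signal would be), and `magnitude` is an illustrative helper that does the same job as Python's built-in `abs` on a complex number.

```python
import math


def magnitude(z):
    """Absolute value of a complex number a + j*b, i.e. sqrt(a^2 + b^2)."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)


# Hypothetical frame of N = 8 complex frequency values (indices 0 to 7).
frame = [3 + 0j, 1 - 1j, 0j, -1 + 1j, 2 + 0j, -1 - 1j, 0j, 1 + 1j]

# Magnitude spectrum of the frame: real and never negative.
mags = [magnitude(z) for z in frame]
print([round(m, 1) for m in mags])
```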

• All N frequency values (frequency indices from 0 to N-1) are real and non-negative (abs!)

• Frequency indices from 0 to floor(N/2) are the unique frequency values (with DC and pivot)

• Frequency indices from floor(N/2)+1 to N-1 are the mirrored frequency values

• Since they are redundant, we can discard the frequency values from floor(N/2)+1 to N-1

• The spectrogram therefore has floor(N/2)+1 unique frequency values (with DC and pivot)

[Figure: abs(real spectrogram + j * imaginary spectrogram) = magnitude spectrogram restricted to the unique frequencies (axes: time, frequency)]
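Keeping only the floor(N/2)+1 unique magnitude values of a frame could look like the sketch below. It assumes a real-valued window; `dft` and `magnitude_frame` are hypothetical helper names and the samples are made up.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_frame(window):
    """Magnitude spectrum of one window, truncated to bins 0..floor(N/2)."""
    n = len(window)
    full = [abs(v) for v in dft(window)]
    # Bins floor(N/2)+1 .. N-1 mirror bins 1 .. ceil(N/2)-1, so drop them.
    return full[: n // 2 + 1]


# Hypothetical window of N = 8 real-valued samples.
window = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
print(len(magnitude_frame(window)))
```

Stacking such truncated frames side by side, one per window, gives the magnitude spectrogram with floor(N/2)+1 rows.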

• Why the magnitude spectrogram?
– Easy to visualize (compare with the STFT)
– Magnitude information more important
– Human ear less sensitive to phase

[Figure: time signal (axis: time) and its magnitude spectrogram (axes: time, frequency; color scale from 0 to +)]

• When you display a spectrogram in Matlab…
– imagesc: data is scaled to use the full colormap
– 10*log10(V): magnitude spectrogram in dB (10*log10 is the conversion for power values; for magnitude values the amplitude form is 20*log10(V))
– set(gca,'YDir','normal'): y-axis from bottom to top

• The signal cannot be reconstructed from the spectrogram (phase information is missing!)

[Figure: the STFT of the time signal gives the real and imaginary spectrograms and, from them, the magnitude spectrogram; going back from the magnitude spectrogram to the time signal is marked "iSTFT ???"]
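One way to see why the magnitude alone is not enough: two different signals can share exactly the same magnitude spectrum. The sketch below shows this for a single frame, using a circular shift (which changes only the phase of each frequency value). The helper names and sample values are assumptions for illustration.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(x):
    """Absolute value of every frequency value of x."""
    return [abs(v) for v in dft(x)]


# Hypothetical window and a circularly shifted copy (a different signal).
x = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
shifted = x[3:] + x[:3]

# The magnitudes agree bin by bin even though the signals differ,
# so the magnitude spectrum cannot tell the two signals apart.
same_magnitude = all(
    abs(a - b) < 1e-9
    for a, b in zip(magnitude_spectrum(x), magnitude_spectrum(shifted))
)
print(x != shifted, same_magnitude)
```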

Time-frequency Masking

• Suppose we have a mixture of two sources: a music signal and a voice signal

[Figure: music, voice, and mixture signals (axis: time) with their spectrograms (axes: time, frequency; color scale from 0 to +)]

• We assume that the sources are sparse = most of the time-frequency bins have null energy

[Figure: music and voice spectrograms, each annotated "Mostly low energy bins", next to the mixture spectrogram]

• We assume that the sources are disjoint = their time-frequency bins do not overlap

[Figure: music, voice, and mixture spectrograms; the mixture is annotated "Not a lot of overlapping"]

• Assuming sparseness and disjointness, we can discriminate the bins between mixed sources

[Figure: mixture spectrogram in which source 1 = bright and source 2 = dark]

• Bins that are likely to belong to one source are assigned to 1, the rest to 0 = binary masking!

[Figure: binary mask (axes: time, frequency; values 0 or 1) next to the music, voice, and mixture spectrograms; music = source of interest, voice = interfering source]

• By multiplying the mixture spectrogram by the binary mask, we can “preview” the estimate

[Figure: binary mask .x mixture spectrogram = masked spectrogram (axes: time, frequency)]
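Binary masking and the preview multiplication can be sketched as follows. This is an illustration under strong assumptions: both source spectrograms are known (which is not the case in real separation), a bin is assigned to the source of interest when that source has the larger magnitude, and the mixture magnitude is taken as the sum of the source magnitudes. All names and values are hypothetical.

```python
def binary_mask(source, interference):
    """1 where the source of interest dominates the interfering source, else 0."""
    return [
        [1 if s > i else 0 for s, i in zip(s_row, i_row)]
        for s_row, i_row in zip(source, interference)
    ]


def apply_mask(mask, spectrogram):
    """Element-wise multiplication (the '.x' of the slides)."""
    return [
        [m * v for m, v in zip(m_row, v_row)]
        for m_row, v_row in zip(mask, spectrogram)
    ]


# Toy magnitude spectrograms: rows = frequency bins, columns = time frames.
music = [[0.9, 0.0, 0.1], [0.0, 0.8, 0.0], [0.7, 0.1, 0.0]]
voice = [[0.0, 0.6, 0.0], [0.5, 0.0, 0.4], [0.1, 0.9, 0.0]]
# Assumption: magnitudes add in the mixture (only approximately true in general).
mixture = [[m + v for m, v in zip(m_row, v_row)] for m_row, v_row in zip(music, voice)]

mask = binary_mask(music, voice)
preview = apply_mask(mask, mixture)
print(mask)
```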

• However, we cannot derive the estimate itself because we cannot invert a spectrogram!

[Figure: binary mask .x mixture spectrogram = masked spectrogram; music estimate (axis: time)]

• We mirror the redundant frequencies from the unique frequencies (without DC and pivot)

[Figure: binary mask over the unique frequencies extended by mirroring to a binary mask over all frequencies (axes: time, frequency; values 0 or 1)]
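The mirroring step can be sketched for one frame (one column of the mask). The function name `mirror_mask` is hypothetical; it assumes the input holds the floor(N/2)+1 unique bins and rebuilds all N bins, leaving out DC and, for even N, the pivot when mirroring.

```python
def mirror_mask(half_mask, n):
    """Extend a mask over bins 0..floor(n/2) to a mask over all n bins."""
    if len(half_mask) != n // 2 + 1:
        raise ValueError("expected floor(n/2)+1 unique bins")
    # Last bin that has a mirrored partner: excludes the pivot when n is even.
    last = (n - 1) // 2
    # Mirrored bins are bins last, last-1, ..., 1 (DC at index 0 is not mirrored).
    return half_mask + half_mask[last:0:-1]


half = [1, 0, 1, 1, 0]  # hypothetical mask for N = 8: DC, bins 1-3, pivot
full = mirror_mask(half, 8)
print(full)
```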

• We then apply this full binary mask to the STFT using an element-wise multiplication

[Figure: full binary mask .x (real spectrogram + j * imaginary spectrogram) (axes: time, frequency)]

• The estimated signal can now be reconstructed via inverse STFT

[Figure: masked real and masked imaginary spectrograms, iSTFT, music estimate (axis: time)]
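For a single frame, applying a full mask to the complex frequency values and inverting could look like the sketch below. It is not a complete inverse STFT (which would also recombine overlapping frames); the helper names, the samples, and the mask are assumptions. Because the mask is mirrored, the masked frame stays conjugate-symmetric and the inverse transform is real up to rounding error.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def mask_frame(window, full_mask):
    """Mask the frequency values of one window and return the time-domain result."""
    masked = [m * v for m, v in zip(full_mask, dft(window))]
    return idft(masked)


# Hypothetical window (N = 8) and mirrored mask (mask[k] == mask[N-k]).
window = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5]
full_mask = [1, 0, 1, 1, 0, 1, 1, 0]

estimate = mask_frame(window, full_mask)
max_imag = max(abs(v.imag) for v in estimate)  # leftover imaginary part
print(max_imag < 1e-9)
```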

• Sources are not really sparse or disjoint in time-frequency in the mixture

• Bins that are likely to belong to one source are close to 1, the rest close to 0 = soft masking!

[Figure: soft mask (axes: time, frequency; values between 0 and 1) next to the music, voice, and mixture spectrograms; music = source of interest, voice = interfering source]
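One possible soft mask, shown purely as an illustration, is the ratio of the source magnitude to the sum of both source magnitudes. The slides do not specify how the mask is obtained, so the ratio form, the `eps` guard, and the toy values are all assumptions; as with the binary case, it presumes the individual sources are known.

```python
def soft_mask(source, interference, eps=1e-12):
    """Ratio mask: near 1 where the source dominates, near 0 where it does not."""
    return [
        # eps avoids a division by zero in bins where both sources are silent.
        [s / (s + i + eps) for s, i in zip(s_row, i_row)]
        for s_row, i_row in zip(source, interference)
    ]


# Toy magnitude spectrograms: rows = frequency bins, columns = time frames.
music = [[0.9, 0.2], [0.1, 0.6]]
voice = [[0.1, 0.6], [0.7, 0.2]]

mask = soft_mask(music, voice)
print([[round(m, 2) for m in row] for row in mask])
```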

• Let’s listen to the results!

[Figure: music signal, mix, mixture signal, demix, music estimate (axis: time)]

Question

• How can we efficiently model a binary/soft time-frequency mask for source separation?...

• To be continued…

[Figure: mixture spectrogram, ???, soft mask (axes: time, frequency)]