Audio Features & Machine Learning
Transcript of Audio Features & Machine Learning
API2010
E.M. Bakker
Features for Speech Recognition and Audio Indexing

Parametric Representations
– Short Time Energy
– Zero Crossing Rates
– Level Crossing Rates
– Short Time Spectral Envelope

Spectral Analysis
– Filter Design
– Filter Bank Spectral Analysis Model
– Linear Predictive Coding (LPC)
Methods

Vector Quantization
– Finite code book of spectral shapes
– The code book codes for 'typical' spectral shapes
– A method for all spectral representations (e.g. Filter Banks, LPC, ZCR, etc.)

Ensemble Interval Histogram (EIH) Model
– Auditory-based spectral analysis model
– More robust to noise and reverberation
– Expected to be an inherently better representation of the relevant spectral information because it models the mechanics of the human cochlea
Pattern Recognition

[Block diagram: Speech, Audio, … → Parameter Measurements → Pattern Comparison (test/query pattern against Reference Patterns) → Decision Rules → Recognized Speech, Audio, …]
Pattern Recognition

[Block diagram: Speech, Audio, … → Feature Detector 1 … Feature Detector n → Feature Combiner and Decision Logic → Hypothesis Tester (using Reference Vocabulary Features) → Recognized Speech, Audio, …]
Spectral Analysis Models

Pattern Recognition Approach
1. Parameter Measurement => Pattern
2. Pattern Comparison
3. Decision Making

Parameter Measurements
– Bank of Filters Model
– Linear Predictive Coding Model
Band Pass Filter

[Diagram: Audio Signal s(n) → Bandpass Filter F(·) → Result Audio Signal F(s(n))]

Note that the bandpass filter can be defined as:
• a convolution with a filter response function in the time domain, or
• a multiplication with a filter response function in the frequency domain.
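The frequency-domain view can be sketched directly (assuming NumPy is available): multiply the signal's spectrum by an ideal band mask, which corresponds to a convolution with a sinc-like response in the time domain.

```python
import numpy as np

def bandpass(signal, fs, f_lo, f_hi):
    """Bandpass-filter by zeroing FFT bins outside [f_lo, f_hi] Hz.

    This is a multiplication with an ideal filter response in the
    frequency domain (equivalently a convolution in the time domain).
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.irfft(spectrum * mask, n=len(signal))

# Toy input: a 50 Hz + 300 Hz mixture; keep only the 300 Hz component.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 300 * t)
y = bandpass(x, fs, 200.0, 400.0)
```

After filtering, `y` is essentially the 300 Hz sine alone; the 50 Hz component lies outside the pass band and is removed.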
Bank of Filters Analysis Model

Speech Signal: s(n), n = 0, 1, …
– Digital, with Fs the sampling frequency of s(n)

Bank of q Band Pass Filters: BPF1, …, BPFq
– Spanning a frequency range of, e.g., 100–3000 Hz or 100 Hz–16 kHz
– BPFi(s(n)) = xn(e^jωi), where ωi = 2πfi/Fs is the normalized frequency fi, for i = 1, …, q
– xn(e^jωi) is the short-time spectral representation of s(n) at time n, as seen through BPFi with centre frequency ωi, for i = 1, …, q

Note: each BPF independently processes s to produce the spectral representation x.
Bank of Filters Front End Processor

Typical Speech Wave Forms
MFCCs

[Pipeline: Speech, Audio, … → Preemphasis → Windowing → Fast Fourier Transform → Mel-Scale Filter Bank → Log(·) → Discrete Cosine Transform → MFCCs (first 12 most significant coefficients)]

MFCCs are calculated using the formula:

C_i = Σ_{k=1}^{N} X_k · cos( i · (k − 0.5) · π / N ),  i = 1, …, P

where
• C_i is the i-th cepstral coefficient
• P is the order (12 in our case)
• K is the number of discrete Fourier transform magnitude coefficients
• X_k is the k-th log-energy output from the mel-scale filter bank
• N is the number of filters
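The final DCT step of this formula can be sketched directly, assuming the log mel-filter-bank energies X_1..X_N have already been computed by the earlier pipeline stages:

```python
import numpy as np

def cepstral_coefficients(log_mel_energies, order=12):
    """DCT step of the MFCC pipeline:
    C_i = sum_{k=1..N} X_k * cos(i * (k - 0.5) * pi / N), i = 1..order.

    `log_mel_energies` are the log-energy outputs X_k of the
    mel-scale filter bank; returns the first `order` coefficients.
    """
    X = np.asarray(log_mel_energies, dtype=float)
    N = len(X)
    k = np.arange(1, N + 1)                 # filter index k = 1..N
    return np.array([np.sum(X * np.cos(i * (k - 0.5) * np.pi / N))
                     for i in range(1, order + 1)])

# Toy example with a hypothetical 26-filter bank.
mfcc = cepstral_coefficients(np.log(np.arange(1, 27)), order=12)
```

A useful sanity check on the DCT: a flat filter-bank output yields all-zero coefficients, since the cosine terms sum to zero for each i ≥ 1.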
Linear Predictive Coding Model

Filter Response Functions

Some Examples of Ideal Band Filters

Perceptually Based Critical Band Scale
Short Time Fourier Transform

X_n(e^jω) = Σ_m s(m) · w(n − m) · e^(−jωm)

where
• s(m) is the signal
• w(n − m) is a fixed low-pass window
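As a sketch (assuming NumPy), the short-time spectra can be computed by sliding a Hamming window over the signal and taking FFT magnitudes; a 500-sample window corresponds to the "long" 50 msec window of the following slides at a 10 kHz sampling rate, and 50 samples to the "short" 5 msec one.

```python
import numpy as np

def stft(signal, win_len=500, hop=250):
    """Short-time Fourier transform: Hamming-windowed frames -> |FFT|.

    Returns an array of shape (num_frames, win_len // 2 + 1) with the
    magnitude spectrum of each frame.
    """
    window = np.hamming(win_len)
    frames = [signal[i:i + win_len] * window
              for i in range(0, len(signal) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

# Toy "voiced" signal: a pure tone at 0.05 cycles/sample.
x = np.sin(2 * np.pi * 0.05 * np.arange(2000))
S = stft(x)
```

With a 500-sample window, the tone at 0.05 cycles/sample lands in FFT bin 25 (= 0.05 × 500), which is where each frame's spectrum peaks.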
Short Time Fourier Transform
Long Hamming Window: 500 samples (= 50 msec)
Voiced Speech

Short Time Fourier Transform
Short Hamming Window: 50 samples (= 5 msec)
Voiced Speech

Short Time Fourier Transform
Long Hamming Window: 500 samples (= 50 msec)
Unvoiced Speech

Short Time Fourier Transform
Short Hamming Window: 50 samples (= 5 msec)
Unvoiced Speech

Short Time Fourier Transform
Linear Filter Interpretation
Linear Predictive Coding (LPC) Model

Speech Signal: s(n), n = 0, 1, …
– Digital, with Fs the sampling frequency of s(n)

Spectral analysis on blocks of speech with an all-pole modeling constraint, LPC analysis of order p:
– s(n) is blocked into frames [n, m]
– Again consider xn(e^jω), the short-time spectral representation of s(n) at time n (where ω = 2πf/Fs is the normalized frequency f)
– Now the spectral representation xn(e^jω) is constrained to be of the form σ/A(e^jω), where A(e^jω) is a pth-order polynomial with z-transform:
  A(z) = 1 + a1·z^−1 + a2·z^−2 + … + ap·z^−p
– The output of the LPC parametric conversion on block [n, m] is the vector [a1, …, ap]
– It specifies parametrically the spectrum of an all-pole model that best matches the signal spectrum over the period of time in which the frame of speech samples was accumulated (a pth-order polynomial approximation of the signal)
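The coefficient vector can be estimated with the autocorrelation method by solving the Yule-Walker equations; a direct linear solve is shown below for clarity (Levinson-Durbin is the usual fast solver). Note the sign convention: the predictor coefficients returned here are the negatives of the a_k in A(z) = 1 + a1·z^−1 + … + ap·z^−p.

```python
import numpy as np

def lpc(frame, p):
    """Autocorrelation-method LPC of order p.

    Solves the Yule-Walker equations R a = r for predictor coefficients
    such that s(n) is approximated by sum_k a_k * s(n - k).
    """
    frame = np.asarray(frame, dtype=float)
    # autocorrelation lags r[0..p]
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(p + 1)])
    # Toeplitz system R[i][j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

# Toy frame: an AR(2) process s(n) = 1.0 s(n-1) - 0.5 s(n-2) + noise,
# so the estimated predictor should be close to [1.0, -0.5].
rng = np.random.default_rng(0)
s = np.zeros(5000)
for n in range(2, len(s)):
    s[n] = 1.0 * s[n - 1] - 0.5 * s[n - 2] + rng.standard_normal()
coeffs = lpc(s, p=2)
```

Recovering the known AR coefficients from the synthetic frame is the standard sanity check for an LPC implementation.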
Vector Quantization

Data is represented as feature vectors.
A VQ training set is used to determine a set of code words that constitute a code book.
Code words are centroids under a similarity or distance measure d.
The code words together with d divide the space into Voronoi regions.
A query vector falls into a Voronoi region and will be represented by the respective code word.
Vector Quantization

Distance measures d(x, y):
– Euclidean distance
– Taxicab distance
– Hamming distance
– etc.
Vector Quantization
Clustering the Training Vectors

Initialize: choose M arbitrary vectors of the L vectors of the training set. This is the initial code book.
Nearest-neighbor search: for each training vector, find the code word in the current code book that is closest, and assign that vector to the corresponding cell.
Centroid update: update the code word in each cell using the centroid of the training vectors that are assigned to that cell.
Iteration: repeat steps 2–3 until the average distance falls below a preset threshold.
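The steps above map directly onto a k-means (Lloyd/LBG-style) iteration. A sketch with the Euclidean distance measure; the fixed iteration count and deterministic "arbitrary" initialization are simplifications of the threshold-based stopping rule:

```python
import numpy as np

def train_codebook(vectors, M, iters=20):
    """VQ code book training by Lloyd iteration (Euclidean d).

    1. Initialize with M training vectors (here: evenly spaced picks).
    2. Nearest-neighbor search: assign each vector to its closest code word.
    3. Centroid update: replace each code word by the centroid of its cell.
    """
    idx = np.linspace(0, len(vectors) - 1, M).astype(int)
    codebook = vectors[idx].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        cells = np.argmin(d, axis=1)
        for m in range(M):
            if np.any(cells == m):
                codebook[m] = vectors[cells == m].mean(axis=0)
    return codebook

def quantize(v, codebook):
    """Vector classification: index m* = argmin_i d(v, y_i)."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))

# Toy data: two well-separated clusters around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
cb = train_codebook(data, M=2)
```

For this data the two trained code words end up near the two cluster centres, so queries from different clusters quantize to different indices.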
Vector Classification

For an M-vector code book CB with codes
CB = {y_i | 1 ≤ i ≤ M},
the index m* of the best code book entry for a given vector v is:

m* = argmin_{1 ≤ i ≤ M} d(v, y_i)
VQ for Classification

A code book CB_k = {y_ki | 1 ≤ i ≤ M} can be used to define a class C_k.

Example audio classification:
– Classes 'crowd', 'car', 'silence', 'scream', 'explosion', etc.
– Determine the class by using VQ code books CB_k for each of the classes.
– VQ is very often used as a baseline method for classification problems.
Sound, DNA: Sequences!

DNA: a helix-shaped molecule whose constituents are two parallel strands of nucleotides.
DNA is usually represented by sequences of these four nucleotides.
This assumes only one strand is considered; the second strand is always derivable from the first by pairing A's with T's and C's with G's, and vice versa.

Nucleotides (bases)
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)
Biological Information: From Genes to Proteins

[Diagram: Gene (DNA) –Transcription→ RNA –Translation→ Protein → Protein folding; studied by genomics, molecular biology, structural biology, and biophysics]

From Amino Acids to Protein Functions

DNA / amino acid sequence → 3D structure → protein functions

DNA (gene) →→→ pre-RNA →→→ RNA →→→ Protein
        (RNA-polymerase)  (Spliceosome)  (Ribosome)

CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG…

TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATML
Motivation for Markov Models

There are many cases in which we would like to represent the statistical regularities of some class of sequences:
– genes
– proteins in a given family
– sequences of audio features

Markov models are well suited to this type of task.
A Markov Chain Model

Transition probabilities
– Pr(x_i = a | x_{i−1} = g) = 0.16
– Pr(x_i = c | x_{i−1} = g) = 0.34
– Pr(x_i = g | x_{i−1} = g) = 0.38
– Pr(x_i = t | x_{i−1} = g) = 0.12

Σ_x Pr(x_i = x | x_{i−1} = g) = 1
Definition of Markov Chain Model

A Markov chain [1] model is defined by
– a set of states
  • some states emit symbols
  • other states (e.g., the begin state) are silent
– a set of transitions with associated probabilities
  • the transitions emanating from a given state define a distribution over the possible next states

[1] A. A. Markov, "Extension of the Law of Large Numbers to Quantities Depending on Each Other", Izvestiya of the Physico-Mathematical Society at Kazan University, 2nd series, vol. 15 (1906), pp. 135–156.
Markov Chain Models: Properties

Given some sequence x of length L, we can ask how probable the sequence is given our model.
For any probabilistic model of sequences, we can write this probability as

Pr(x) = Pr(x_L, x_{L−1}, …, x_1)
      = Pr(x_L | x_{L−1}, …, x_1) · Pr(x_{L−1} | x_{L−2}, …, x_1) … Pr(x_1)

The key property of a (1st-order) Markov chain is that the probability of each x_i depends only on the value of x_{i−1}:

Pr(x) = Pr(x_L | x_{L−1}) · Pr(x_{L−1} | x_{L−2}) … Pr(x_2 | x_1) · Pr(x_1)
      = Pr(x_1) · Π_{i=2}^{L} Pr(x_i | x_{i−1})
The Probability of a Sequence for a Markov Chain Model

Pr(cggt) = Pr(c) · Pr(g|c) · Pr(g|g) · Pr(t|g)
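A sketch of this computation: the transitions out of state g use the probabilities from the earlier slide, while the remaining rows and the initial distribution are made-up placeholders for illustration.

```python
# First-order Markov chain over the DNA alphabet. Only the 'g' row is
# taken from the slides; the other rows and `init` are hypothetical.
init = {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25}
trans = {
    'a': {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25},
    'c': {'a': 0.20, 'c': 0.30, 'g': 0.30, 't': 0.20},
    'g': {'a': 0.16, 'c': 0.34, 'g': 0.38, 't': 0.12},
    't': {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25},
}

def sequence_probability(x):
    """Pr(x) = Pr(x_1) * prod_{i>1} Pr(x_i | x_{i-1})."""
    p = init[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= trans[prev][cur]
    return p

p = sequence_probability('cggt')   # Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
```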
Example Application

CpG islands
– CG di-nucleotides are rarer in eukaryotic genomes than expected given the marginal probabilities of C and G
– but the regions upstream of genes are richer in CG di-nucleotides than elsewhere: CpG islands
– useful evidence for finding genes

Application: predict CpG islands with Markov chains
– one Markov chain to represent CpG islands
– another Markov chain to represent the rest of the genome
Markov Chains for Discrimination

Suppose we want to distinguish CpG islands from other sequence regions. Given sequences from CpG islands, and sequences from other regions, we can construct
– a model to represent CpG islands
– a null model to represent the other regions

We can then score a test sequence by:

score(x) = log( Pr(x | CpG model) / Pr(x | null model) )
Markov Chains for Discrimination

Why can we use

score(x) = log( Pr(x | CpG model) / Pr(x | null model) ) ?

According to Bayes' rule:

Pr(CpG | x) = Pr(x | CpG) · Pr(CpG) / Pr(x)
Pr(null | x) = Pr(x | null) · Pr(null) / Pr(x)

If we are not taking into account the prior probabilities (Pr(CpG) and Pr(null)) of the two classes, then from Bayes' rule it is clear that we just need to compare Pr(x|CpG) and Pr(x|null), as is done in our scoring function score().
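As a sketch, the score is a log-likelihood ratio under two first-order Markov chains. The transition tables below are made-up placeholders (restricted to the c/g rows for brevity); in practice they would be estimated from labelled CpG-island and background training sequences.

```python
import math

# Hypothetical transition tables for the two models.
cpg_trans  = {'c': {'g': 0.27, 'c': 0.37, 'a': 0.18, 't': 0.18},
              'g': {'g': 0.38, 'c': 0.34, 'a': 0.16, 't': 0.12}}
null_trans = {'c': {'g': 0.08, 'c': 0.30, 'a': 0.32, 't': 0.30},
              'g': {'g': 0.30, 'c': 0.25, 'a': 0.25, 't': 0.20}}

def log_likelihood(x, trans):
    """log Pr(x_2 .. x_L | x_1) under a first-order Markov chain."""
    return sum(math.log(trans[p][c]) for p, c in zip(x, x[1:]))

def score(x):
    """score(x) = log( Pr(x | CpG model) / Pr(x | null model) )."""
    return log_likelihood(x, cpg_trans) - log_likelihood(x, null_trans)

s = score('cgcg')   # positive: 'cgcg' looks more like a CpG island
```

A positive score favours the CpG model, a negative score the null model; the priors Pr(CpG) and Pr(null) are deliberately left out, as discussed above.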
Higher Order Markov Chains

The Markov property specifies that the probability of a state depends only on the probability of the previous state.
But we can build more "memory" into our states by using a higher-order Markov model.
In an n-th order Markov model, the probability of the current state depends on the previous n states:

Pr(x_i | x_{i−1}, x_{i−2}, …, x_1) = Pr(x_i | x_{i−1}, …, x_{i−n})
Selecting the Order of a Markov Chain Model

But the number of parameters we need to estimate grows exponentially with the order:
– for modeling DNA we need O(4^(n+1)) parameters for an n-th order model

The higher the order, the less reliable we can expect our parameter estimates to be:
– estimating the parameters of a 2nd-order Markov chain from the complete genome of E. coli (5.44 × 10^6 bases), we'd see each word ~85,000 times on average (divide by 4^3)
– estimating the parameters of a 9th-order chain, we'd see each word ~5 times on average (divide by 4^10 ≈ 10^6)
Higher Order Markov Chains

An n-th order Markov chain over some alphabet A is equivalent to a first-order Markov chain over the alphabet A^n of n-tuples.

Example: a 2nd-order Markov model for DNA can be treated as a 1st-order Markov model over the alphabet
AA, AC, AG, AT,
CA, CC, CG, CT,
GA, GC, GG, GT,
TA, TC, TG, TT
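The equivalence can be made concrete: lifting a DNA sequence to overlapping 2-tuples turns a 2nd-order chain over {A, C, G, T} into a 1st-order chain over the 16 pair states. A small sketch:

```python
from itertools import product

# The 16-letter alphabet of 2-tuples for the equivalent 1st-order chain.
pair_alphabet = [a + b for a, b in product('ACGT', repeat=2)]

def to_pair_states(seq):
    """'GCTA' -> ['GC', 'CT', 'TA']: overlapping 2-tuples, so that a
    transition 'GC' -> 'CT' encodes the 2nd-order term Pr(T | G, C)."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]

states = to_pair_states('GCTACA')
```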
A Fifth Order Markov Chain

Pr(gctaca) = Pr(gctac) · Pr(a | gctac)
Hidden Markov Model: A Simple HMM

Given the observed sequence AGGCT, which state emits each item?

[Figure: two candidate HMMs, Model 1 and Model 2]
Tutorial on HMM

L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
HMM for Hidden Coin Tossing

[Figure: an HMM whose hidden states are coins, with transitions between them; only the outcomes are observed:]

… H H T T H T H H T T H
Hidden State

We'll distinguish between the observed parts of a problem and the hidden parts.
In the Markov models we've considered previously, it is clear which state accounts for each part of the observed sequence.
In the model above, there are multiple states that could account for each part of the observed sequence
– this is the hidden part of the problem
Learning and Prediction Tasks
(in general, i.e., applies to both MM and HMM)

Learning
– Given: a model, a set of training sequences
– Do: find model parameters that explain the training sequences with relatively high probability (the goal is to find a model that generalizes well to sequences we haven't seen before)

Classification
– Given: a set of models representing different sequence classes, and given a test sequence
– Do: determine which model/class best explains the sequence

Segmentation
– Given: a model representing different sequence classes, and given a test sequence
– Do: segment the sequence into subsequences, predicting the class of each subsequence
Algorithms for Learning & Prediction

Learning
– correct path known for each training sequence -> simple maximum likelihood or Bayesian estimation
– correct path not known -> Forward-Backward algorithm + ML or Bayesian estimation

Classification
– simple Markov model -> calculate probability of the sequence along the single path for each model
– hidden Markov model -> Forward algorithm to calculate the probability of the sequence along all paths for each model

Segmentation
– hidden Markov model -> Viterbi algorithm to find the most probable path for the sequence
The Parameters of an HMM

Transition probabilities
– probability of a transition from state k to state l:
  a_kl = Pr(π_i = l | π_{i−1} = k)

Emission probabilities
– probability of emitting character b in state k:
  e_k(b) = Pr(x_i = b | π_i = k)

Note: HMMs can also be formulated using an emission probability associated with a transition from state k to state l.
An HMM Example

[Figure: an HMM with per-state emission probabilities (each state's emission probabilities sum to 1, Σ p_i = 1) and transition probabilities (each state's outgoing transition probabilities sum to 1, Σ p_i = 1)]
Three Important Questions
(See also L.R. Rabiner (1989))

How likely is a given sequence?
– the Forward algorithm

What is the most probable "path" for generating a given sequence?
– the Viterbi algorithm

How can we learn the HMM parameters given a set of sequences?
– the Forward-Backward (Baum-Welch) algorithm
How Likely is a Given Sequence?

The probability that a given path π is taken and the sequence is generated:

Pr(x_1 … x_L, π) = a_{0 π_1} · Π_{i=1}^{L} e_{π_i}(x_i) · a_{π_i π_{i+1}}

Example:
Pr(AAC, π) = a_01 · e_1(A) · a_11 · e_1(A) · a_13 · e_3(C) · a_35
           = 0.5 · 0.4 · 0.2 · 0.4 · 0.8 · 0.3 · 0.6
How Likely is a Given Sequence?

The probability over all paths is

Pr(x) = Σ_π Pr(x, π)

but the number of paths can be exponential in the length of the sequence…
The Forward algorithm enables us to compute this efficiently.
The Forward Algorithm

Define f_k(i) to be the probability of being in state k having observed the first i characters of sequence x of length L.
We want to compute f_N(L), the probability of being in the end state having observed all of sequence x.
It can be defined recursively, and computed using dynamic programming.
The Forward Algorithm

f_k(i) equals the probability of being in state k having observed the first i characters of sequence x.

Initialization
– f_0(0) = 1 for the start state; f_k(0) = 0 for the other states

Recursion
– for emitting states (i = 1, …, L):
  f_l(i) = e_l(x_i) · Σ_k f_k(i−1) · a_kl
– for silent states:
  f_l(i) = Σ_k f_k(i) · a_kl

Termination
Pr(x) = Pr(x_1 … x_L) = f_N(L) = Σ_k f_k(L) · a_kN
Forward Algorithm Example

Given the sequence x = TAGA
Forward Algorithm Example

Initialization
– f_0(0) = 1, f_1(0) = 0, …, f_5(0) = 0

Computing other values
– f_1(1) = e_1(T) · (f_0(0)·a_01 + f_1(0)·a_11) = 0.3 · (1·0.5 + 0·0.2) = 0.15
– f_2(1) = 0.4 · (1·0.5 + 0·0.8)
– f_1(2) = e_1(A) · (f_0(1)·a_01 + f_1(1)·a_11) = 0.4 · (0·0.5 + 0.15·0.2)
– …
– Pr(TAGA) = f_5(4) = f_3(4)·a_35 + f_4(4)·a_45
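A minimal sketch of the Forward recursion for a small, fully specified HMM. The transition and emission numbers below are made-up placeholders, not the 5-state model of the example figure:

```python
# Forward algorithm for a toy two-state HMM with a silent begin state.
states = ['s1', 's2']
trans = {('begin', 's1'): 0.5, ('begin', 's2'): 0.5,
         ('s1', 's1'): 0.2, ('s1', 's2'): 0.8,
         ('s2', 's1'): 0.6, ('s2', 's2'): 0.4}
emit = {'s1': {'A': 0.4, 'C': 0.1, 'G': 0.2, 'T': 0.3},
        's2': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1}}

def forward(x):
    """Pr(x) summed over all paths, by dynamic programming."""
    # initialization: one step out of the begin state
    f = {k: trans[('begin', k)] * emit[k][x[0]] for k in states}
    # recursion: f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl
    for c in x[1:]:
        f = {l: emit[l][c] * sum(f[k] * trans[(k, l)] for k in states)
             for l in states}
    # termination: sum over the final states (no explicit end state here)
    return sum(f.values())

p = forward('TAGA')
```

A useful invariant: summed over all sequences of a fixed length, the forward probabilities add up to 1, since each transition and emission row is a proper distribution.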
Three Important Questions

How likely is a given sequence?
What is the most probable "path" for generating a given sequence?
How can we learn the HMM parameters given a set of sequences?
Finding the Most Probable Path: The Viterbi Algorithm

Define v_k(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k.
We want to compute v_N(L), the probability of the most probable path accounting for all of the sequence and ending in the end state.
It can be defined recursively.
Again we can use dynamic programming to compute v_N(L) and find the most probable path efficiently.
Finding the Most Probable Path: The Viterbi Algorithm

Define v_k(i) to be the probability of the most probable path π accounting for the first i characters of x and ending in state k.

The Viterbi algorithm:
1. Initialization (i = 0):
   v_0(0) = 1, v_k(0) = 0 for k > 0
2. Recursion (i = 1, …, L):
   v_l(i) = e_l(x_i) · max_k ( v_k(i−1) · a_kl )
   ptr_i(l) = argmax_k ( v_k(i−1) · a_kl )
3. Termination:
   P(x, π*) = max_k ( v_k(L) · a_k0 )
   π*_L = argmax_k ( v_k(L) · a_k0 )
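A sketch of Viterbi decoding with back-pointers for a toy two-state HMM; the parameters are made-up placeholders chosen so that each state prefers to stay put and to emit a different half of the alphabet:

```python
# Viterbi decoding for a toy two-state HMM (hypothetical parameters).
states = ['s1', 's2']
start = {'s1': 0.5, 's2': 0.5}
trans = {'s1': {'s1': 0.9, 's2': 0.1},
         's2': {'s1': 0.1, 's2': 0.9}}
emit = {'s1': {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4},
        's2': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1}}

def viterbi(x):
    """Most probable state path: v_l(i) = e_l(x_i) max_k v_k(i-1) a_kl,
    with back-pointers ptr_i(l) to recover the path by traceback."""
    v = {k: start[k] * emit[k][x[0]] for k in states}
    ptrs = []
    for c in x[1:]:
        ptr, nv = {}, {}
        for l in states:
            best = max(states, key=lambda k: v[k] * trans[k][l])
            ptr[l] = best
            nv[l] = emit[l][c] * v[best] * trans[best][l]
        ptrs.append(ptr)
        v = nv
    # termination + traceback
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(ptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[last]

path, p = viterbi('ATTAGGCG')
```

For this input the decoded path stays in s1 for the A/T prefix and switches once to s2 for the G/C suffix, which is the behaviour a segmentation application relies on.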
Three Important Questions

How likely is a given sequence?
What is the most probable "path" for generating a given sequence?
How can we learn the HMM parameters given a set of sequences?
Learning Without Hidden State

Learning is simple if we know the correct path for each sequence in our training set: estimate the parameters by counting the number of times each parameter is used across the training set.

Learning With Hidden State

If we don't know the correct path for each sequence in our training set, consider all possible paths for the sequence.
Estimate the parameters through a procedure that counts the expected number of times each parameter is used across the training set.
Learning Parameters: The Baum-Welch Algorithm

Also known as the Forward-Backward algorithm.
An Expectation Maximization (EM) algorithm
– EM is a family of algorithms for learning probabilistic models in problems that involve hidden states

In this context, the hidden state is the path that best explains each training sequence.
Learning Parameters: The Baum-Welch Algorithm

Algorithm sketch:
– initialize the parameters of the model
– iterate until convergence:
  • calculate the expected number of times each transition or emission is used
  • adjust the parameters to maximize the likelihood of these expected values
Computational Complexity of HMM Algorithms

Given an HMM with S states and a sequence of length L, the complexity of the Forward, Backward and Viterbi algorithms is O(S²·L)
– this assumes that the states are densely interconnected

Given M sequences of length L, the complexity of Baum-Welch on each iteration is O(M·S²·L)
Markov Models Summary

We considered models that vary in terms of order and hidden state.
Three DP-based algorithms for HMMs: Forward, Backward and Viterbi.
We discussed three key tasks: learning, classification and segmentation.
The algorithms used for each task depend on whether there is hidden state (correct path known) in the problem or not.
Summary

Markov chains and hidden Markov models are probabilistic models in which the probability of a state depends only on that of the previous state.
– Given a sequence of symbols x, the Forward algorithm finds the probability of obtaining x in the model
– The Viterbi algorithm finds the most probable path (corresponding to x) through the model
– The Baum-Welch algorithm learns or adjusts the model parameters (transition and emission probabilities) to best explain a set of training sequences