HTK Presentation

HMM Toolkit (HTK) Presentation by Daniel Whiteley AME department


Page 1: HTK Presentation

HMM Toolkit (HTK)

Presentation by Daniel Whiteley

AME department

Page 2: HTK Presentation

What is HTK?

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research although it has been used for numerous other applications including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.

Page 3: HTK Presentation

What is HTK?

HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems.

Page 4: HTK Presentation

Basic HTK command format

● The commands in HTK follow a basic command line format:

HCommand [options] files

● Options are indicated by a dash followed by the option letter. Universal options are capital letters.

● In HTK, file extensions are not necessary; HTK uses file headers to determine each file's format.

Page 5: HTK Presentation

Configuration files

● You can also configure HTK modules using config files. A config file is applied to a single command with the -C option, or globally with the command setenv HCONFIG myconfig, where myconfig is your own config file.

● All possible configuration variables can be found in chapter 18 of the HTK manual. However, for most of our purposes, we only need to create a config file with these lines:

SOURCEKIND = USER %The user defined file format (not sound)

TARGETKIND = ANON_D %Keep the file the same format.

Page 6: HTK Presentation

Using HTK

● Parts of HMM modeling
– Data Preparation
– Model Training
– Pattern Recognition
– Model Analysis

Page 7: HTK Presentation

Data Preparation

● One small problem:

– HTK was tailored for speech recognition. Therefore, most of the data preparation tools are for audio.

– Due to this, we need to jury-rig our data into the HTK parameterized data file format.

● HTK parameter files consist of a sequence of samples preceded by a header. The samples are simply data vectors, whose components are 2-byte integers or 4-byte floating point numbers.

● For us, these vectors will be a sequence of joint angles received from a motion capture session.

Page 8: HTK Presentation

HTK file format

● The file begins with a 12-byte header containing the following information:
– nSamples (4-byte int): Number of samples
– samplePeriod (4-byte int): Sample period (calculated by multiplying the number by 100 ns)
– sampleSize (2-byte int): Number of bytes per vector
– parameterKind (2-byte int): Defines the type of data

● For our purposes, this parameter will be either USER (base kind code 9 in the HTK manual), which is the user defined parameter kind, or DISCRETE (base kind code 10), which is the discrete case.
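As a rough illustration of the header layout described above, the following Python sketch packs a sequence of float vectors into the bytes of an HTK-style parameter file. The function names, the example joint-angle values, and the 100 Hz sample rate are hypothetical. Big-endian byte order is assumed, and the parameter kind is left as a caller-supplied argument; its default of 9 is the USER base code as listed in the HTK manual, which is an assumption of this sketch.

```python
import struct

USER = 9  # assumed base parameter-kind code for USER (from the HTK manual)


def pack_htk(vectors, sample_period_100ns, parm_kind=USER):
    """Return the bytes of an HTK-style parameter file holding float vectors."""
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    sample_size = 4 * dim  # bytes per vector of 4-byte floats
    # 12-byte header: nSamples and samplePeriod as 4-byte ints,
    # sampleSize and parameterKind as 2-byte ints (big-endian assumed).
    header = struct.pack(">iihh", len(vectors), sample_period_100ns,
                         sample_size, parm_kind)
    body = b"".join(struct.pack(">%df" % dim, *v) for v in vectors)
    return header + body


def unpack_header(data):
    """Read back (nSamples, samplePeriod, sampleSize, parameterKind)."""
    return struct.unpack(">iihh", data[:12])


# Two frames of four joint angles; 10 ms per frame = 100000 units of 100 ns.
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
blob = pack_htk(frames, 100000)
```

Writing `blob` to disk with `open(path, "wb")` would give a file whose header can be checked with `unpack_header` before handing it to any HTK tool.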

Page 9: HTK Presentation

HMM model creation

● In order to model the motion capture sequence, we need to create a prototype of the HMM. In this prototype, the values of B and π are arbitrary. The same is true for the transition matrix A, except that any transition probability you set to zero will remain zero.

● Models are created using a scripting language similar to HTML.

● Models in HTK also have a beginning state and an ending state, both of which are non-emitting. These states are not defined in the script.

Page 10: HTK Presentation

HMM Model Example

~h "prototype"
<BeginHMM>
<VectorSize> 4 <USER>
<NumStates> 5
<State> 2 <NumMixes> 3
<Mixture> 1 0.3
<Mean> 4
0.0 0.0 0.0 0.0
<Variance> 4
1.0 1.0 1.0 1.0
<Mixture> 2 0.4 ...
<State> 3 ...
...
<TransP> 5
0.0 0.4 0.3 0.3 0.0
0.0 0.2 0.5 0.3 0.0
0.0 0.2 0.2 0.4 0.2
0.0 0.1 0.2 0.3 0.4
0.0 0.0 0.0 0.0 0.0
<EndHMM>

– ~h "prototype": Name of the file
– <VectorSize> 4: Sample size
– <NumStates> 5: Number of states
– <NumMixes> 3: Number of Gaussian distributions
– <Mixture> 1 0.3: The distribution's ID and weight
– <Mean>: Mean observation vector
– <Variance>: Covariance matrix diagonal
– <TransP>: Transition matrix A. All the transition probabilities for the ending state (the last row) are always zero.
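A quick sanity check on a hand-written transition matrix can catch typing mistakes before training. The sketch below is not part of HTK; `check_transp` is a hypothetical helper. It assumes only what the slide states: the matrix is square, the ending state's row is all zeros, and every other row is a probability distribution.

```python
def check_transp(matrix, tol=1e-6):
    """Return a list of problems found in a square transition matrix."""
    n = len(matrix)
    problems = []
    for i, row in enumerate(matrix, start=1):
        if len(row) != n:
            problems.append("row %d has %d entries, expected %d" % (i, len(row), n))
            continue
        # The ending (last) state has no outgoing transitions;
        # every other row should sum to one.
        expected = 0.0 if i == n else 1.0
        total = sum(row)
        if abs(total - expected) > tol:
            problems.append("row %d sums to %.4f, expected %.1f" % (i, total, expected))
    return problems


# The transition matrix from the prototype example.
transp = [
    [0.0, 0.4, 0.3, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.2, 0.4, 0.2],
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
problems = check_transp(transp)
```

An empty `problems` list means the matrix passed both checks.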

Page 11: HTK Presentation

Vector Quantization

● In order to reduce computation, we can make the HMM discrete.

● In order to use a discrete HMM, we must first quantize the data into a set of standard vectors.

● Warning: quantizing the data inevitably introduces error.

● Before quantizing the data, we must first have a standard set of vectors, or a "vector codebook". This is made with HQuant.

Page 12: HTK Presentation

HQuant

● HQuant takes the training data and uses a K-means algorithm to evenly partition the data and find the centroids of these partitions to create our quantization vectors (QVs).

● A sample command:

HQuant -C config -n 1 64 -S train.scp vqcook

    -C config       Use the configuration variables found in config
    -n 1 64         Number of QVs (64) for a certain data stream (stream 1)
    -S train.scp    You can use a script to list all of your training files
    vqcook          Our codebook will be written to this file

● To reduce quantization time, a codebook using a binary tree search algorithm can be made using the -t option.
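To make the idea of K-means quantization concrete, here is a minimal Python sketch. It is not HQuant's implementation: the initialization (first `size` vectors), the fixed iteration count, and the function names are all assumptions made for illustration, and the symbol indices it returns are 0-based.

```python
import math


def nearest(vector, codebook):
    """Index of the codebook entry closest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def build_codebook(data, size, iterations=20):
    """Plain K-means: assign vectors to centroids, then recompute centroids."""
    codebook = [list(v) for v in data[:size]]  # naive start: first `size` vectors
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in data:
            clusters[nearest(v, codebook)].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Toy data: two well-separated groups of 2-D vectors.
data = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
codebook = build_codebook(data, 2)
symbols = [nearest(v, codebook) for v in data]
```

Replacing each vector by the index of its nearest centroid, as in `symbols`, is the quantization step; the distance between a vector and its centroid is the error that quantization introduces.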

Page 13: HTK Presentation

Converting to Discrete

● The conversion of data files is done using the HCopy command. In order to quantize our data, we do this:

HCopy -C quantize rawdata qvdata

where rawdata is our original data, qvdata is our quantized data, and quantize is a config file containing these settings:

SOURCEKIND = USER %We start with our original data

TARGETKIND = DISCRETE %Convert it into discrete data

SAVEASVQ = T %We throw away the continuous data

VQTABLE = vqcook %We use our previously made codebook to quantize the data

Page 14: HTK Presentation

Discrete HMM

● Discrete HMMs are very similar to their continuous counterparts, save for a few changes.

● Discrete probabilities are stored in logarithmic form as values d(v), where:

P(v) = exp(-d(v)/2371.8)

~o <Discrete> <StreamInfo> 1 1
~h "dhmm"
<BeginHMM>
<NumStates> 5
<State> 2 <NumMixes> 10
<DProb> 5461*10
....
<EndHMM>

– <NumMixes> 10: Number of discrete symbols
– 5461*10: Duplicate function (the value 5461 is repeated 10 times)
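The relation P(v) = exp(-d(v)/2371.8) can be inverted to compute the stored value for a given probability. The Python sketch below does both directions; the function names are hypothetical, and rounding to the nearest integer is an assumption about how the stored values are produced. With 10 equally likely symbols, each probability is 0.1, which gives the 5461 seen in the example.

```python
import math

DLOGSCALE = 2371.8  # scale factor from the formula P(v) = exp(-d(v)/2371.8)


def prob_to_dprob(p):
    """Convert a probability (0 < p <= 1) to its scaled log form d(v)."""
    return round(-DLOGSCALE * math.log(p))


def dprob_to_prob(d):
    """Convert a stored d(v) value back to a probability."""
    return math.exp(-d / DLOGSCALE)


uniform_over_10 = prob_to_dprob(1.0 / 10)  # each of 10 symbols equally likely
recovered = dprob_to_prob(uniform_over_10)
```

Because the stored value is an integer, `recovered` matches 0.1 only approximately.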

Page 15: HTK Presentation

Model Training (token HMM)

● The initialization of our prototype can be done using HInit:

HInit [options] hmm data1 data2 data3 ...

● HInit is used mainly for left-right HMMs. A more ergodic HMM can instead be initialized with a flat start. This is done by setting all means and variances to their global counterparts using HCompV:

HCompV -m -S trainlist hmm

where hmm is the HMM being trained.

Page 16: HTK Presentation

Retraining

● The model is then retrained using the Baum-Welch algorithm found in HRest:

HRest -w 1.0 -v 0.0001 -S trainlist hmm

● The -w and -v options set floors for the mixture weights and the variances, respectively. The value given to -w is a multiplier of 10^-5, so -w 1.0 sets a mixture weight floor of 10^-5.

● This can be iterated as many times as wanted to achieve desired results.

Page 17: HTK Presentation

Dictionary Creation

● In order to create a recognition program or script, we must first create a dictionary.

● A dictionary in HTK gives each word and its pronunciation. For our purposes, each pronunciation will just consist of the token HMMs that we trained.

RUNNING run
WALKING walk
JUMPING [SKIPPING] jump

– First column: Word
– In brackets: Displayed output (if not specified, the word is displayed)
– Remaining columns: Tokens used to form the word

Page 18: HTK Presentation

Label Files

● Label files contain a transcription of what is going on in the data sequence.

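000000 100000 walk
100001 200000 run
200001 300000 jump

– First column: Start of the time frame
– Second column: End of the time frame
– Third column: Token found in that time frame

HTK label files express these times in units of 100 ns.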
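Label files of this shape are easy to read back for inspection. The following Python sketch is not an HTK utility; `parse_labels` is a hypothetical helper that assumes whitespace-separated fields and treats a line holding only a token as a label without times.

```python
def parse_labels(text):
    """Parse label lines into (start, end, token) tuples.

    Lines that hold only a token get None for both times.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) >= 3:
            segments.append((int(fields[0]), int(fields[1]), fields[2]))
        else:
            segments.append((None, None, fields[-1]))
    return segments


example = """000000 100000 walk
100001 200000 run
200001 300000 jump"""
segments = parse_labels(example)
```

Each tuple in `segments` holds the start time, the end time, and the token for one labeled stretch of the data.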

Page 19: HTK Presentation

Master Label Files (MLFs)

● During training and recognition, we may have many test files and their accompanying label files. The label files can be condensed into one file called a master label file, or MLF.

#!MLF!#
"*/a.lab"
000000 100000 walk
100001 200000 run
200001 300000 jump
.
"*/b.lab"
run
.
"*/jump*.lab"
jump
.

– "*/a.lab": Same as an original label file
– "*/b.lab": If the entire file is one token, it can be labeled with just the token
– "*/jump*.lab": The wildcard operator can be used to label multiple files at once
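The MLF layout (a header line, then quoted file patterns each followed by label lines and a closing period) can be read with a few lines of Python. This sketch is only an illustration under that assumed layout; `parse_mlf` is a hypothetical name, and no wildcard matching is attempted.

```python
def parse_mlf(text):
    """Map each file pattern in an MLF to its list of label lines (as field lists)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("missing #!MLF!# header")
    entries = {}
    current = None
    for ln in lines[1:]:
        if current is None:
            current = ln.strip('"')  # a quoted pattern opens a new entry
            entries[current] = []
        elif ln == ".":
            current = None  # a lone period closes the entry
        else:
            entries[current].append(ln.split())
    return entries


mlf_text = """#!MLF!#
"*/a.lab"
000000 100000 walk
100001 200000 run
200001 300000 jump
.
"*/b.lab"
run
.
"*/jump*.lab"
jump
.
"""
entries = parse_mlf(mlf_text)
```

The keys of `entries` are the file patterns exactly as written in the MLF, so a caller would still need to match them against real file names.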

Page 20: HTK Presentation

Pattern Recognition

● The recognition of a motion sequence is done by using HVite.

● To receive a transcription of the recognition data in MLF format, we use:

HVite -a -i results -o SWT -H hmmlist \
-I transcripts.mlf -S testfiles

    -a                    Create word network from given transcriptions
    -i results            Output transcription file in MLF format
    -o SWT                Throws away unnecessary data in the label files
    -H hmmlist            Text file containing a list of the HMMs used
    -I transcripts.mlf    MLF file that has the test files' transcriptions
    -S testfiles          Motion capture data to be recognized

Page 21: HTK Presentation

Model Analysis

● The analysis of the recognition results is done by HResults.

HResults -I transcripts.mlf -H hmmlist results

    -I transcripts.mlf    MLF containing the reference labels
    -H hmmlist            List of HMMs used
    results               MLF containing result labels

● Note: The reference labels and the results labels must have different file extensions.