Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is...

25
Oct 25, 2019 Page 1 of 25 Rev 1.00 AN0029 Application Note for NuMicro® M480 Series Document Information Abstract This document describes how to program TensorFlow for deep learning technology to develop speech recognition, and to help users to implement the speech recognition on the M480 microcontroller. Apply to NuMicro ® M480 microcontroller family with DSP. The information described in this document is the exclusive intellectual property of Nuvoton Technology Corporation and shall not be reproduced without permission from Nuvoton. Nuvoton is providing this document only for reference purposes of NuMicro microcontroller based system design. Nuvoton assumes no responsibility for errors or omissions. All data and specifications are subject to change without notice. For additional information or questions, please contact: Nuvoton Technology Corporation. www.nuvoton.com Speech Recognition Based on Machine Learning

Transcript of Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is...

Page 1: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 1 of 25 Rev 1.00

AN0029

Application Note for NuMicro® M480 Series

Document Information

Abstract This document describes how to program TensorFlow for deep learning technology to develop speech recognition, and to help users to implement the speech recognition on the M480 microcontroller.

Apply to NuMicro® M480 microcontroller family with DSP.

The information described in this document is the exclusive intellectual property of Nuvoton Technology Corporation and shall not be reproduced without permission from Nuvoton.

Nuvoton is providing this document only for reference purposes of NuMicro microcontroller based system design. Nuvoton assumes no responsibility for errors or omissions.

All data and specifications are subject to change without notice.

For additional information or questions, please contact: Nuvoton Technology Corporation.

www.nuvoton.com

Speech Recognition Based on Machine Learning

Page 2: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 2 of 25 Rev 1.00

AN0029

Table of Contents

1 OVERVIEW ................................................................................................................................ 3

1.1 Introduction to Deep Learning .................................................................................................. 3

1.1.1 Introduction to Deep Neural Networks (DNN) ................................................................................ 4

1.2 Introduction to TensorFlow........................................................................................................ 5

1.2.1 TensorFlow Architecture ................................................................................................................... 5 1.2.2 TensorFlow Setup .............................................................................................................................. 6

2 INTRODUCTION TO KEYWORD SPOTTING (KWS) ................................................. 8

2.1 Record Speech Files ................................................................................................................... 9

2.2 Establish Python Environment .................................................................................................. 9

2.3 Generate Neural Networks Parameters for MCU .................................................................. 12

2.4 M480 Series Speech Recognition Sample Code for Keil .................................................... 13

3 EXAMPLE CODE FOR KEYWORD SPOTTING ......................................................... 16

3.1 Deep Learning Source Code Variable Setting ...................................................................... 16

3.2 Quantized Parameters Source Code Variable Setting ........................................................ 18

3.3 Using the Parameters on the M480 Series ............................................................................ 20

3.3.1 KWS main ......................................................................................................................................... 20 3.3.2 KWS Function (Using MFCC for Feature Extraction and Using DNN Model) ........................ 21 3.3.3 NN Function ...................................................................................................................................... 22

4 CONCLUSION ........................................................................................................................ 23

Page 3: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 3 of 25 Rev 1.00

AN0029

1 Overview

Voice control electronic devices are already a popular trend. The advantage is that electronic devices can be hands-free controlled in an inconvenient button environment. When developing speech recognition algorithms, researchers have discovered the algorithm of the simulated organisms’ neural system is more accurate than the traditional speech processing algorithm. One of the reasons is the neural networks called “deep learning”, which can be trained and learned from experience. But the deep learning workload is characterized by the large amount of computation, large amounts of memory, which is one of the biggest challenges for mobile devices and embedded devices.

Speech recognition means that all words and sentences can be recognized, but many of the devices people use every day do not have enough memories and processing power to present good accuracy in speech recognition. Therefore, TensorFlow can be used as the algorithm development environment of speech recognition for deep learning. Then, speech recognition can be implemented on the NuMaker-PFM-M487 platform to realize an offline and instant speech recognition system by Keyword Spotting.

The microcontroller (MCU) implements the neural networks based on ARM® CMSIS-NN library. The library contains two parts: NNFunction and NNSupportFunctions. NNFunction contains functions that implement common neural network layer types, such as Convolution, Pooling and Activation, these functions are used by the application code to implement neural networks; NNSupportFunctions are used to do different data types conversion and startup functions. CMSIS-NN includes optimized functions to accelerate the critical NN layer and helps to reduce memory usage. The kernel API is also simple, so it can be easily recomposed into any machine learning frameworks. Users just call these functions instead of combining multiplex instructions.

1.1 Introduction to Deep Learning

Machine learning is a branch of Artificial Intelligence (AI). The operation process is to use an algorithm to train a large amount of data. After the training is completed, a model will be generated. When there is new data in the future, user can predict the new data using the training model. Machine learning applications are quite extensive, such as recommendation engines, targeted advertising, demand forecasting, spam filtering, medical diagnostics, natural languages processing, search engines, fraud detection, securities analysis, visual identification, speech recognition, handwriting recognition, and more.

Deep learning is a branch of machine learning. It is the fastest growing field of artificial intelligence. There are several deep learning frameworks, such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), etc.. Actually the answer that the user should use what kind of architectures is not certain. It depends on the size, nature, acceptable calculating time, the urgency of learning, and what you want to do with this data. Deep learning is particularly effective in applications such as visual recognition, speech recognition, natural language processing, and biomedical applications.

Page 4: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 4 of 25 Rev 1.00

AN0029

Deep learning is to let the machine simulate the working mode of the human brain, and thus has the same learning ability as human beings. However, the human neural network is too complicated to be simulated. The neurons are divided into multiple levels to simulate the neural network. The neural network usually has an input layer, an output layer and a number of hidden layers that can be trained. More than 2 hidden layers can be called Deep Learning.

If the users do not have a basic understanding of deep learning, the course of deep learning can be used to get the most out of this document.

1.1.1 Introduction to Deep Neural Networks (DNN)

Deep Neural Networks (DNN) is the most basic discriminant model for deep learning. It can be trained by the backpropagation algorithm. The weights can be updated iteratively by the gradient descent method:

( ) ( )

is the Learning Rate, C is the Loss Function and the choice of the function is related to the Active Function. For this document as example, in order to solve the problem of Multi-Class Classification supervised learning, choose Rectified Linear Unit (ReLU) as the activation function, and use Cross Entropy as the loss function. The definition of cross entropy is as follows:

∑ ( )

represents the target probability of output unit , represents the probability output unit

after applying the activation function; and the Softmax Function of the output layer is defined as follows:

( )

∑ ( )

represents the probability of category . and are inputs to units and , respectively.

The network architecture of the deep neural networks is shown in Figure 1-1.

Page 5: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 5 of 25 Rev 1.00

AN0029

Figure 1-1 The Architecture of Deep Neural Networks (DNN)

1.2 Introduction to TensorFlow

TensorFlow is an open source code library provided by Google. Google has many products that use TensorFlow technology to develop deep learning and machine learning functions, such as Gmail filtering spam, Google voice recognition, Google image recognition, Google Translate, etc. The introduction to deep learning has described the core of deep learning and simulate neural networks with tensor (matrix) operations. Accordingly, the main design of TensorFlow is to maximize the performance of matrix operations and to develop on different platforms.

1.2.1 TensorFlow Architecture

Figure 1-2 TensorFlow Architecture Diagram

Page 6: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 6 of 25 Rev 1.00

AN0029

The following describes the details of the above architecture diagram, starting from the bottom:

Processor: TensorFlow can be executed on CPU, GPU, TPU.

CPU:Each computer has a central processing unit (CPU) that can execute

TensorFlow. It’s enough to use CPU for this speech recognition system.

GPU:Graphics processor with thousands of small and high-efficiency cores that

harness the power of parallel computing.

TPU:The Tensor Processing Unit is a proprietary chip developed by Google's

Artificial Intelligence and has better execution capabilities than the GPU.

Platform

TensorFlow is a cross-platform capability that can be implemented on current

mainstream platforms. This document uses the Windows 10 operating system.

TensorFlow Distributed Execution Engine

In deep learning, the most time-consuming is the training of the model, especially the

large-scale deep learning model, which must be trained with a large amount of data.

TensorFlow has the ability of distributed computing, which can perform model training on

hundreds of machines at the same time, greatly shorten the time of model training.

Low-level APIs

TensorFlow is available in a variety of programming languages and Python has a concise,

easy-to-learn, high-productivity, object-oriented, and functional dynamic language that is

widely used. The deep learning code of this document is also developed by Python.

High-order APIs

TensorFlow is a relatively low-level deep learning APIs. When designing a model, users

must design the underlying operations such as tensor product and convolution. Therefore,

this document is matched with the high-order API – Keras, which makes developers use

more concise and readable codes to construct a variety of complex deep learning models.

1.2.2 TensorFlow Setup

Follow the procedure below to install TensorFlow:

1. TensorFlow installation official website. (https://www.tensorflow.org/install/install_windows)

Page 7: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 7 of 25 Rev 1.00

AN0029

2. Download the Anaconda installation file from the Anaconda archive download page.

(https://repo.continuum.io/archive) The environment of this file is Windows. Users can

choose Anaconda3-4.1.0-Windows-x86_64.exe.

After installation, user can start to install the TensorFlow package as follows.

1. Click Anaconda Prompt

2. Install TensorFlow according to the instructions of the official website.

3. After the installation, the user can execute the test program in Anaconda Prompt. Enter

the Python shell first.

python

4. Follow the command step by step.

>>> import tensorflow as tf

>>> hello = tf.constant('Hello, TensorFlow!')

>>> sess = tf.Session()

>>> print(sess.run(hello))

5. Verify the installation. If the installation of TensorFlow is successful, the system will

respond to the following content. Then the user can start the program.

Hello, TensorFlow!

Page 8: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 8 of 25 Rev 1.00

AN0029

2 Introduction to Keyword Spotting (KWS)

A complete deep learning speech recognition system requires two platforms. As shown in Figure 2-1, one is PC platform. User can program the deep learning code and train the model by Tensorflow and Python. Due to the supervised learning for the training mode, it is necessary to give the system a large amount of training data and labels. Then the user can extract the features of speech data and train the model by deep neural networks (DNN). Until the system reaches the optimization, the user evaluates the accuracy by modifying the training model repeatedly. The other platform is Nuvoton NuMaker-PFM-M487 development board. The speech recognition system can be implemented based on the training parameters from PC platform.

Figure 2-1 The Flow for Speech Recognition System

Nuvoton provides a complete speech recognition system code, including the python code of training on PC and the C code of recognition part on NuMaker PFM-M487. The process of using TensorFlow and NuMaker PFM-M487 for speech recognition can be simplified into four steps as follows.

1. Record a speech file or use the speech file provided by Google.

2. Use Python to write a deep learning algorithm for user needs.

3. After the training is completed, generate a neural network model suitable for microcontroller reading.

4. Use the CMSIS-NN library to build the neural network architecture of the microcontroller and complete the speech recognition system.

This document will follow these four steps to explain how the user can easily implement a set of speech recognition system.

Page 9: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 9 of 25 Rev 1.00

AN0029

2.1 Record Speech Files

First, user needs to build a database of speech recognition systems. TensorFlow provides 30 KeyWords: down, go, left, no, off, on, right, stop, up, yes, bed, bird, cat, dog, happy, house, Marvin, Sheila, tree, wow, zero, one, two, tree, four, five, six, seven, eight, nine. The sample folder provided does not contain a speech database. The user should download it from TensorFlow and decompress the speech file into the following path: \EC-ML-KWS-for-MCU-M487- V1.0\tmp\speech_dataset, as shown in Figure 2-2.

The database of speech: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

Figure 2-2 The Path of Speech Database

To add other key words, the user can use the sample code of BSP of NuMicro® M480 series to record. The keywords that the system uses are 10 English numbers. The information of the recording file is as follows:

File type: .wav

Sampling rate: 16000 Hz

Recording time: 1 sec

Key Word: one, two, three, four, five, six, seven, eight, nine, zero

2.2 Establish Python Environment

The python source code used in this document is based on the “ML-KWS-for-MCU” code provided by ARM-software. Figure 2-3 shows the code file of deep learning training, “train.py” is the main file. After the execution, start training of model, and the result of the training is shown in Figure 2-4.

Page 10: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 10 of 25 Rev 1.00

AN0029

Figure 2-3 The Python File for Training

Figure 2-4 The Result of Training

The deep learning model used is Deep Neural Networks (DNN). The detailed architecture is shown in Figure 2-5. It includes 1 input layer and 3 hidden layers. The number of neurons in each layer is set to 144, and finally connected to 1 output layer.

Page 11: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 11 of 25 Rev 1.00

AN0029

Figure 2-5 The Architecture of DNN

The Python code from ARM-software provides 11 deep learning models. The user can replace different models for training according to their needs as Table 2-1.

Model Description

single_fc A model with a single hidden fully-connected layer

conv A standard convolutional model

low_latency_conv A convolutional model with low compute requirements

low_latency_svdf An SVDF model with low compute requirements

dnn A model with multiple hidden fully-connected layers

cnn A model with 2 convolutional layers

lstm A model with a basic lstm layer (without output projection and peep-hole connections)

lstm A model with a basic lstm layer (with output projection and peep-hole connections)

gru A model with multi-layers GRUs

crnn A model with convolutional recurrent networks with GRUs

ds_cnn A model with depthwise separable convolutional neural network

Table 2-1 Deep Learning Models

Page 12: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 12 of 25 Rev 1.00

AN0029

2.3 Generate Neural Networks Parameters for MCU

The parameters of the deep learning model like weight and bias are usually trained using floating point numbers, but in the process of microcontroller prediction, the floating point can be quantized into integers, which will not affect the prediction (Accuracy). Therefore, the parameters of the trained neural network model are quantized to an integer, and the parameters are placed in the code of the NuMicro® M480 series microcontroller for recognition.

After the previous section of the neural network training is completed, the trained model will be stored in the “checkpoint” path: \tmp\speech_commands_train\best\checkpoint. The file is shown in Figure 2-6. There are different stages of the training model. The user can select the last training model to perform the quantification, as shown in Figure 2-7.

Figure 2-6 The Checkpointof Training Model

Figure 2-7 The Path of Training Model Checkpoint

The user can run the file of the quantitative model, as shown in Figure 2-8, where “quant_test.py” is the main file. After execution, it can generate the network parameters for the microcontroller: weight and bias, named dnn_weights.h.

Page 13: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 13 of 25 Rev 1.00

AN0029

Figure 2-8 The Python File for Quatilization

The results of the quantization are shown in Figure 2-9. The weights and bias in the dnn_weights.h file are shown in Figure 2-10.

Figure 2-9 The Results of Quantization

Figure 2-10 The Content of “dnn_weights.h”

2.4 M480 Series Speech Recognition Sample Code for Keil

The CMSIS-NN library is used as the NN function that the library is fully functional. User can easily use it for development directly, and the algorithms of the operations in each function have been optimized to reduce the running time effectively.

Nuvoton provides an example code for the speech recognition including the NuMicro® M480 series CMSIS-NN library. The main file path is \KWS\SampleCode \KWS\Keil is shown in Figure 2-11.

Page 14: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 14 of 25 Rev 1.00

AN0029

Figure 2-11 The Path of NuMicro® M487 main File for KWS

The user can replace the contents of dnn_weights.h with the trained weights and bias, and the speech recognition system can be completed. The file path is \ KWS\SampleCode\KWS \Source\DNN as shown in Figure 2-12.

Figure 2-12 The Path of “dnn_weights.h”

Page 15: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 15 of 25 Rev 1.00

AN0029

Finally, after compiling and burning the code to the MCU, the microphone can be connected to the MCU and the user can see the recognition results on the terminal shown in Figure 2-13.

Figure 2-13 The Effect of Recognition

Page 16: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 16 of 25 Rev 1.00

AN0029

3 Example Code for Keyword Spotting

This chapter describes how to program a complete speech recognition system, and divide the database into training sets (Train): validation sets (Validation): testing set (Test) = 8:1:1. The accuracy of the testing set is the total recognition accuracy of the system.

The whole process needs to execute three main files, as shown in Figure 3-1. It is divided into two platforms. One is to train model and generates weights by Python. The user can use the Python original code structure and train the model by setting the environment variables, and finally implement it on the NuMaker PFM-M487 development board.

Figure 3-1 Main Files Flowchart for Each Stage of Speech Recognition

3.1 Deep Learning Source Code Variable Setting

The program architecture has been written well, so the user can just set the environment of variables, paths, neural network types, neuron numbers, etc. in [ if __name__ == '__main__' ]. Users can apply it directly to all the file, such as the red box in the sample code.

The environment variables that need to be set in the “train.py”, “freeze.py”, and “test.py” files when training the neural network model are as follows:

if __name__ == '__main__':

parser = argparse.ArgumentParser()

/* Where to download the speech training data. */

parser.add_argument(

'--data_dir',

type=str,

default='tmp/speech_dataset/',

help="""\Where to download the speech training data to.""")

/* Range to randomly shift the training audio by in time. */

parser.add_argument(

'--time_shift_ms',

type=float,

default=40.0,

help="""\Range to randomly shift the training audio by in time.""")

/* Sample rate of the wavs. */

parser.add_argument(

'--sample_rate',

type=int,

default=16000,

train.py quant_test.py main.cpp

Page 17: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 17 of 25 Rev 1.00

AN0029

help='Expected sample rate of the wavs',)

/* how many training steps to run. */

parser.add_argument(

'--how_many_training_steps',

type=str,

default='15000,3000',

help='How many training loops to run',)

/* How often to evaluate the training results. */

parser.add_argument(

'--eval_step_interval',

type=int,

default=400,

help='How often to evaluate the training results.')

/* Learning rate to use. */

parser.add_argument(

'--learning_rate',

type=str,

default='0.001,0.0001',

help='How large a learning rate to use when training.')

/* Where to save summary logs for TensorBoard. */

parser.add_argument(

'--summaries_dir',

type=str,

default='tmp/retrain_logs',

help='Where to save summary logs for TensorBoard.')

/* Words to use. */

parser.add_argument(

'--wanted_words',

type=str,

default='one,two,three,four,five,six,seven,eight,nine,zero',

help='Words to use (others will be added to an unknown label)',)

/* Directory to write event logs and checkpoint. */

parser.add_argument(

'--train_dir',

type=str,

default='tmp/speech_commands_train',

help='Directory to write event logs and checkpoint.')

/* What model architecture to use. */

parser.add_argument(

'--model_architecture',

Page 18: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 18 of 25 Rev 1.00

AN0029

type=str,

default='dnn',

help='What model architecture to use')

/* Model dimensions. */

parser.add_argument(

'--model_size_info',

type=int,

nargs="+",

default=[144,144,144],

help='Model dimensions - different for various models')

3.2 Quantized Parameters Source Code Variable Setting

The environment variables that need to be set in the “quant_test.py” file are as follows:

if __name__ == '__main__':

parser = argparse.ArgumentParser()

/* Where to download the speech training data. */

parser.add_argument(

'--data_dir',

type=str,

default='tmp/speech_dataset/',

help="""\Where to download the speech training data to.""")

/* Sample rate of the wavs. */

parser.add_argument(

'--sample_rate',

type=int,

default=16000,

help='Expected sample rate of the wavs',)

/* Words to use. */

parser.add_argument(

'--wanted_words',

type=str,

default='one,two,three,four,five,six,seven,eight,nine,zero',

help='Words to use (others will be added to an unknown label)',)

/* Checkpoint to load the weights. */

parser.add_argument(

'--checkpoint',

type=str,

default='C:\\tmp/speech_commands_train\\best\\dnn_9224.ckpt-17200',

help='Checkpoint to load the weights from.')

Page 19: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 19 of 25 Rev 1.00

AN0029

/* What model architecture to use. */

parser.add_argument(

'--model_architecture',

type=str,

default='dnn',

help='What model architecture to use')

/* Model dimensions. */

parser.add_argument(

'--model_size_info',

type=int,

nargs="+",

default=[144,144,144],

help='Model dimensions - different for various models')

/* Activations max. */

parser.add_argument(

'--act_max',

type=float,

nargs="+",

default=[32,0,0,0,0],

help='activations max')

Page 20: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 20 of 25 Rev 1.00

AN0029

3.3 Using the Parameters on the M480 Series

Figure 3-2 shows the program flow of the NuMicro® M480 series microcontroller. The block in the figure is the main function in the program. The user can easily understand the architecture of the program.

Figure 3-2 The Program Flow of the M480 Series Microcontroller

3.3.1 KWS main

/* Output Class */

char output_class[12][10] = { "Silence", "Unknown", "One", "Two", "Three", "Four", "Five" ,"Six", "Seven", "Eight", "Nine", "Zero"};

void run_kws()

{

//Averaging window for smoothing out the output predictions

int averaging_window_len = 3; //i.e. average over 3 inferences or 240ms

kws->extract_features(2); //extract mfcc features

kws->classify(); //classify using dnn

kws->average_predictions(averaging_window_len);

max_ind = kws->get_top_detection(kws->averaged_output);

printf("Detected %s\r\n",output_class[max_ind]);

}

main()

run_kws()

extract_feature()

classify()

average_predictions()

get_top_detection()

run_nn()

main.cpp dnn.cpp kws.cpp

Detect output

Page 21: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 21 of 25 Rev 1.00

AN0029

3.3.2 KWS Function (Using MFCC for Feature Extraction and Using DNN Model)

/* This overloaded function is used in streaming audio case */

void KWS::extract_features(uint16_t num_frames)

{

//move old features left

memmove(mfcc_buffer,mfcc_buffer+(num_frames*NUM_MFCC_COEFFS),(NUM_FRAMES-num_frames)*NUM_MFCC_COEFFS);

//compute features only for the newly recorded audio

int32_t mfcc_buffer_head = (NUM_FRAMES-num_frames)*NUM_MFCC_COEFFS;

for (uint16_t f = 0; f < num_frames; f++) {

mfcc->mfcc_compute(audio_buffer+(f*FRAME_SHIFT),2,&mfcc_buffer[mfcc_buffer_head]);

mfcc_buffer_head += NUM_MFCC_COEFFS;

}

}

void KWS::classify()

{

nn->run_nn(mfcc_buffer, output);

// Softmax

arm_softmax_q7(output,OUT_DIM,output);

//do any post processing here

}

void KWS::average_predictions(int window_len)

{

//shift right old predictions

for(int i=window_len-1;i>0;i--) {

for(int j=0;j<OUT_DIM;j++)

predictions[i][j]=predictions[i-1][j];

}

//add new predictions

for(int j=0;j<OUT_DIM;j++)

predictions[0][j]=output[j];

//compute averages

int sum;

for(int j=0;j<OUT_DIM;j++) {

sum=0;

for(int i=0;i<window_len;i++)

Page 22: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 22 of 25 Rev 1.00

AN0029

sum += predictions[i][j];

averaged_output[j] = (q7_t)(sum/window_len);

}

}

int KWS::get_top_detection(q7_t* prediction)

{

int max_ind=0;

int max_val=-128;

for(int i=0;i<OUT_DIM;i++) {

if(max_val<prediction[i]) {

max_val = prediction[i];

max_ind = i;

}

}

return max_ind;

}

3.3.3 NN Function

void DNN::run_nn(q7_t* in_data, q7_t* out_data)

{

// Run all layers

// IP1

arm_fully_connected_q7(in_data, ip1_wt, IN_DIM, IP1_OUT_DIM, 1, 7, ip1_bias, ip1_out, vec_buffer);

// RELU1

arm_relu_q7(ip1_out, IP1_OUT_DIM);

// IP2

arm_fully_connected_q7(ip1_out, ip2_wt, IP1_OUT_DIM, IP2_OUT_DIM, 2, 8, ip2_bias, ip2_out, vec_buffer);

// RELU2

arm_relu_q7(ip2_out, IP2_OUT_DIM);

// IP3

arm_fully_connected_q7(ip2_out, ip3_wt, IP2_OUT_DIM, IP3_OUT_DIM, 2, 9, ip3_bias, ip3_out, vec_buffer);

// RELU3

arm_relu_q7(ip3_out, IP3_OUT_DIM);

// IP4

arm_fully_connected_q7(ip3_out, ip4_wt, IP3_OUT_DIM, OUT_DIM, 0, 6, ip4_bias, out_data, vec_buffer);

}

Page 23: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 23 of 25 Rev 1.00

AN0029

4 Conclusion

This application note provides a complete example code and uses a few simple operating steps for users to train the deep neural network model by TensorFlow, and then implements a complete set of real-time speech recognition on the NuMaker PFM-M487 platform. The applied TensorFlow is an open source software library for machine learning that supports various algorithms for deep learning. The CMSIS-NN library accelerates by optimizing key functions required in the neural network. For example, using fixed-point calculation (8/16 bits) instead of floating-point calculation, by looking up the table to avoid activation function calculation, etc. The user can call the optimized function conveniently to develop the program, and improve the speed of the algorithm to achieve the application requirements. It is expected that this application note will make it easy for users to get started in the application and develop it conveniently.

Page 24: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 24 of 25 Rev 1.00

AN0029

Revision History

Date Revision Description

2019.10.25 1.00 1. Initially issued.

Page 25: Speech Recognition Based on Machine Learning€¦ · 1.2 Introduction to TensorFlow TensorFlow is an open source code library provided by Google. Google has many products that use

Oct 25, 2019 Page 25 of 25 Rev 1.00

AN0029

Important Notice

Nuvoton Products are neither intended nor warranted for usage in systems or equipment, any malfunction or failure of which may cause loss of human life, bodily injury or severe property damage. Such applications are deemed, “Insecure Usage”.

Insecure usage includes, but is not limited to: equipment for surgical implementation, atomic energy control instruments, airplane or spaceship instruments, the control or operation of dynamic, brake or safety systems designed for vehicular use, traffic signal instruments, all types of safety devices, and other applications intended to support or sustain life.

All Insecure Usage shall be made at customer’s risk, and in the event that third parties lay claims to Nuvoton as a result of customer’s Insecure Usage, customer shall indemnify the damages and liabilities thus incurred by Nuvoton.