These rev 2 - PUC-Rio...Bibliography 87 [Cybe89] CYBENKO, G. Approximations by superpositions of...



DBD
PUC-Rio - Certificação Digital Nº 0721417/CA

A Paper in peer-reviewed journals

A.1 Mecánica Computacional

The following paper was published in November 2011 in volume XXX of the journal Mecánica Computacional under the title "Estimation of a parameter of a non-linear stochastic model for the vocal folds through a modified MCMC algorithm". It presents the results of investigations carried out as a natural continuation of the work presented in this thesis.


ESTIMATION OF A PARAMETER OF A NON-LINEAR STOCHASTIC MODEL FOR THE VOCAL FOLDS THROUGH A MODIFIED MCMC ALGORITHM

Julien Mauprivez (a), Edson Cataldo (b) and Rubens Sampaio (a)

(a) Mechanical Engineering Department – PUC-Rio, Rua Marquês de São Vicente, 225, Gávea, RJ, CEP: 22453-900, Brazil, e-mail: [email protected], http://www.mec.puc-rio.br/

(b) Applied Mathematics Department – Universidade Federal Fluminense, Graduate Program in Telecommunications Engineering, Rua Mário Santos Braga, S/N, Centro, Niterói, CEP: 24020-140, RJ, Brazil, http://www.uff.br/gma/

Keywords: Inverse model, vocal folds model, stochastic mechanics.

Abstract. Low-order non-linear mechanical models of the vocal folds during phonation have been shown to be useful in studies of normal and disordered voice. Despite their relative simplicity, they are able to simulate the main features of the vocal fold dynamics. A good example is the so-called two-mass Lous model, which uses few input parameters and has shown excellent results in understanding phonation phenomena. However, to model a real voice, a set of model parameters must be inferred. Recently, some authors pointed out the advantage of using probabilistic approaches to characterize vocal fold dynamics. In this paper, a numerical stochastic model for voice production is used to simulate several vowel utterances. Then, the probability density function of the vocal fold tension is considered unknown and is estimated from the vowel utterances using a Markov chain Monte Carlo method. Results show a good match between the estimated and actual probability densities.

Mecánica Computacional Vol. XXX, pp. 3331-3338 (full article)
Oscar Möller, Javier W. Signorelli, Mario A. Storti (Eds.)

Rosario, Argentina, 1-4 November 2011

Copyright © 2011 Asociación Argentina de Mecánica Computacional http://www.amcaonline.org.ar


1 INTRODUCTION

The physical process responsible for voice production involves various phenomena such as turbulence, vibration of biological structures, and aero-acoustical couplings, which can be modeled and simulated in detail, but at a high computational cost (Cook et al. (2009)). A good compromise between computational cost and accuracy is obtained using low-order models, such as the model of Lous et al. (1998), referred to as the Lous model in the present paper. A discussion about the accuracy of the model can be found in Ruty et al. (2007).

In Sec. 2, the Lous model for vowel utterance production is described together with its stochastic reformulation, inspired by a recent paper by Cataldo et al. (2009). As no experimental data are available for random realizations of vowel utterances, the numerical model described in Sec. 2 is used to build a consistent set of data. This numerical model consists of generating independent realizations of the vocal fold tension and simulating the corresponding set of voice utterances, which is a stochastic process.

Section 3 is devoted to the description of a modified Metropolis-Hastings Markov Chain Monte Carlo (M-H MCMC) algorithm. The algorithm described there aims to infer the supposedly unknown probability density function (p.d.f.) of the tension factor, based on the observable p.d.f. of the voice fundamental frequency.

Finally, an application is discussed together with qualitative results in Sec. 4.

2 DETERMINISTIC AND ASSOCIATED STOCHASTIC MODEL OF THE PHONATORY SYSTEM FOR VOWEL PRODUCTION

2.1 Deterministic model

The two-mass model used here is the one created by Lous et al. (1998). It was built on the model of Ishizaka and Flanagan (1972), with an improved description of the airflow.

The complete model is composed of two coupled subsystems: one subsystem modeling the vocal folds, which is called the source, and one subsystem modeling the vocal tract, which is called the filter.

The source subsystem is composed of two mass-spring-damper oscillators coupled by a linear spring, as represented in Fig. 1.

Figure 1: Schematic representation of Lous model.

The geometry of the space between the two plates representing the vocal folds is described by three quantities: the glottal height at point $x$ and instant $t$, $h(x, t)$; the glottal depth $l_g$ in the $z$-direction; and the glottal length, given by the distance $x_3 - x_0$. The glottal flow is denoted by $\phi_g(t)$, and the dynamics of the vocal folds are given by Eq. 1:

$$
\begin{cases}
m \dfrac{d^2 y_1(t)}{dt^2} + r(y_1(t)) \dfrac{d y_1(t)}{dt} + s(y_1(t))\, y_1(t) + k_c \left( y_1(t) - y_2(t) \right) = f_1\left( p_{sub}, p_{sup}(t), h_{g1}(t), h_{g2}(t) \right) \\[2ex]
m \dfrac{d^2 y_2(t)}{dt^2} + r(y_2(t)) \dfrac{d y_2(t)}{dt} + s(y_2(t))\, y_2(t) + k_c \left( y_2(t) - y_1(t) \right) = f_2\left( p_{sub}, p_{sup}(t), h_{g1}(t), h_{g2}(t) \right)
\end{cases}
\qquad (1)
$$

where $h_{g1}(t)$ and $h_{g2}(t)$ are the glottal heights, $y_1(t)$ and $y_2(t)$ are the positions of masses 1 and 2 relative to their rest positions, and $r(y_{1,2}(t))$ and $s(y_{1,2}(t))$ are, respectively, the damping and stiffness functions, which are described below. $k_c$ is the stiffness constant of the linear spring that couples the two mass-spring-damper systems. $f_1$ and $f_2$ are the forces applied to the vocal folds due to the pressure field in the glottis and the acoustic pressure at the vocal tract inlet, as defined later. The sub-glottal pressure is given by the constant $p_{sub}$ and the supra-glottal pressure by the function $p_{sup}(t)$. The displacement of each vocal fold is considered to be perpendicular to the direction of the airflow.

The elasticity and damping functions, $s(y_{1,2}(t))$ and $r(y_{1,2}(t))$, are piecewise linear functions of the position of the vocal folds, and they take the collision of the vocal folds into account.

The elasticity function is given by Eq. 2:

$$
s(y_i(t)) =
\begin{cases}
k\, y_i(t), & h_{gi}(t) > h_{lim}, \\
(k + 3k)\, y_i(t), & h_{gi}(t) \le h_{lim},
\end{cases}
\qquad i = 1, 2, \qquad (2)
$$

where $k$ is a constant. As proposed in Pelorson et al. (1994), contact between the vocal folds occurs before the eventual full closure of the glottis: contact is considered for $h_{g1,2}(t) \le h_{lim}$, where $h_{lim}$ is a positive constant.

The damping function is given by Eq. 3:

$$
r(y_i(t)) =
\begin{cases}
2 \xi \sqrt{m k}, & h_{gi}(t) > h_{lim}, \\
2 (\xi + 1) \sqrt{m k}, & h_{gi}(t) \le h_{lim},
\end{cases}
\qquad i = 1, 2, \qquad (3)
$$

where $\xi$ is the damping factor of the oscillators and is constant.
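The switching between the open and colliding regimes in Eqs. 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name and the numerical values are hypothetical, and the sketch assumes that the factors $k$ and $k + 3k$ of Eq. 2 act as stiffness coefficients on the displacement.

```python
import math


def collision_coefficients(h_g, h_lim, k, m, xi):
    """Return (stiffness coefficient, damping coefficient) for one mass.

    Assumption (not stated in this form in the paper): the factors k and
    k + 3k of Eq. 2 are used as stiffness coefficients, and the damping
    follows Eq. 3. Contact is declared when h_g <= h_lim.
    """
    in_contact = h_g <= h_lim
    stiffness = k + 3.0 * k if in_contact else k
    damping_factor = xi + 1.0 if in_contact else xi
    damping = 2.0 * damping_factor * math.sqrt(m * k)
    return stiffness, damping


# Hypothetical values, chosen only to exercise both branches.
k, m, xi, h_lim = 40.0, 1.0e-4, 0.1, 1.0e-5
open_state = collision_coefficients(h_g=1.0e-3, h_lim=h_lim, k=k, m=m, xi=xi)
contact_state = collision_coefficients(h_g=0.0, h_lim=h_lim, k=k, m=m, xi=xi)
print("open:", open_state)
print("contact:", contact_state)
```

In contact, the stiffness coefficient is four times its open value and the damping factor grows from $\xi$ to $\xi + 1$, which is how the model penalizes collision.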

The fundamental frequency of the vocal folds is very sensitive to the tension factor $q$ (Ishizaka and Flanagan (1972), Cataldo et al. (2009)), which scales the stiffness $k$ and the mass $m$ of the mass-spring-damper systems modeling the vocal folds. The values of mass and stiffness to be used are defined by $m = \hat{m}/q$ and $k = q\hat{k}$. An increase in $q$ therefore decreases the mass $m$ participating in the vibration and increases the stiffness $k$. This tension factor was introduced in Ishizaka and Flanagan (1972) to simulate the decrease of the active mass during vocal fold vibration as the stiffness increases.
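To see why the fundamental frequency is so sensitive to $q$, consider the undamped natural frequency of a single, uncoupled mass-spring oscillator, $\sqrt{k/m}/(2\pi)$. With $m = \hat{m}/q$ and $k = q\hat{k}$ this frequency is exactly proportional to $q$. The sketch below checks this numerically; it is a simplification (the full model also includes coupling, damping, collision and airflow), and the reference values of $\hat{m}$ and $\hat{k}$ are hypothetical.

```python
import math


def scaled_parameters(q, m_hat, k_hat):
    """Apply the tension factor: m = m_hat / q and k = q * k_hat."""
    return m_hat / q, q * k_hat


def natural_frequency(m, k):
    """Undamped natural frequency (Hz) of one isolated mass-spring oscillator."""
    return math.sqrt(k / m) / (2.0 * math.pi)


# Hypothetical reference values, for illustration only.
m_hat, k_hat = 1.0e-4, 40.0
f_ref = natural_frequency(m_hat, k_hat)

for q in (0.8, 1.0, 1.2):
    m, k = scaled_parameters(q, m_hat, k_hat)
    ratio = natural_frequency(m, k) / f_ref
    print(f"q = {q:.1f}: frequency ratio = {ratio:.3f}")
```

The printed ratio equals $q$ up to rounding, since $\sqrt{q\hat{k}\,q/\hat{m}} = q\sqrt{\hat{k}/\hat{m}}$.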

The airflow through the glottis is assumed to be quasi-steady, incompressible, and unidimensional (along the $x$ axis; Pelorson et al. (1994)). As suggested in Lous et al. (1998), the viscosity and fluid inertia are approximated by adding an inertive term and a Poiseuille term to the Bernoulli equation. The pressure distribution along the glottis, denoted by $p(x, t)$, can then be described by the modified Bernoulli energy equation, given by Eq. 4:

$$
p(x, t) =
\begin{cases}
p_{sub} - \dfrac{\rho}{2} \left( \dfrac{\phi_g(t)}{l_g \left( h(x,t) - h_{sub} \right)} \right)^2 - 12 \mu\, l_g^2\, \phi_g(t) \displaystyle\int_{x_0}^{x} \dfrac{dx}{l_g^3\, h^3(x,t)} - \rho\, \dfrac{d\phi_g(t)}{dt} \displaystyle\int_{x_0}^{x} \dfrac{dx}{l_g\, h(x,t)}, & x < x_s, \\[3ex]
p_{sup}(t), & x > x_s,
\end{cases}
\qquad (4)
$$

where $\phi_g(t)$ is the volume flow through the glottis, $x_s$ is the position at which a free jet detaches from the vocal folds, $h_{sub}$ is the height at $x_0$, $\rho$ is the density of air and $\mu$ is the dynamic viscosity of air. The position of the free jet detachment is defined by Eq. 5:

$$
h_s(t) = \min\left( \alpha\, h_{g1}(t),\; h_{g2}(t) \right). \qquad (5)
$$

The value of $\alpha$ is set to 1.1 (Lous et al. (1998)) when the viscous and inertive terms are considered.

The force applied to the oscillators at $y_1(t)$ and $y_2(t)$ due to the pressure field, considering only the component normal to the plates, is given by Eq. 6:

$$
f_i(t) = \int_{x_{i-1}}^{x_i} \left( \frac{x - x_{i-1}}{x_i - x_{i-1}} \right) p(x, t)\, dx + \int_{x_i}^{x_{i+1}} \left( \frac{x_{i+1} - x}{x_{i+1} - x_i} \right) p(x, t)\, dx, \qquad (6)
$$

where i = 1, 2.
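Eq. 6 weights the pressure with a triangular ("hat") function centered on $x_i$, so each mass collects the load of the two plates attached to it. A minimal numerical sketch of this integral is given below, using a plain trapezoidal rule. The function names, the uniform test pressure and the plate coordinates are hypothetical; in the actual model the pressure would come from Eq. 4.

```python
def trapezoid(func, a, b, n=1000):
    """Composite trapezoidal rule for func on [a, b] with n sub-intervals."""
    step = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for j in range(1, n):
        total += func(a + j * step)
    return total * step


def plate_force(pressure, x_prev, x_i, x_next):
    """Evaluate Eq. 6 for one mass located at x_i.

    `pressure` is any callable x -> p(x) at a fixed instant; the result has
    the units of Eq. 6 as written (pressure times length).
    """
    def rising(x):
        return (x - x_prev) / (x_i - x_prev) * pressure(x)

    def falling(x):
        return (x_next - x) / (x_next - x_i) * pressure(x)

    return trapezoid(rising, x_prev, x_i) + trapezoid(falling, x_i, x_next)


# Hypothetical check case: uniform pressure over two plates of equal length.
p0 = 500.0                                  # Pa
x_prev, x_i, x_next = 0.0, 1.0e-3, 2.0e-3   # m
force = plate_force(lambda x: p0, x_prev, x_i, x_next)
print(force)
```

For a uniform pressure the two weights each integrate to half a plate length, so the result should equal `p0` times one plate length, which gives a quick sanity check of the quadrature.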

The filter subsystem, coupled to the source subsystem, is described as an acoustic tube with variable cross section, whose input is the supraglottal pressure $p_{sup}(t)$ and whose output is $p_r(t)$. For the sake of simplicity, the vocal tract is represented by a concatenation of cylindrical tubes, and a radiation load equivalent to that of a disc in an infinite plane is imposed at the end of the last acoustic tube (Fant (1960)).

For the whole system (source + filter), the input is the subglottal pressure $p_{sub}$, which is constant, and the output is the function $p_r(t)$, the pressure at the lips. Details about the Lous model can be found in Lous et al. (1998).

2.2 Stochastic model

The present section briefly presents the methodology used to generate the testing data sets. Measurements of the mechanical parameters of the vocal folds for various independent vowel utterances, whether in vitro or in vivo, were not available to us. For practical reasons, all of the data presented in this work are therefore simulated numerically.

Recently, Cataldo et al. (2009) discussed the uncertainties of some parameters in Ishizaka and Flanagan's model and constructed probability density functions for them using the Maximum Entropy Principle. Herein, the same idea is applied to the Lous model, for the parameter $q$.

The random variable $Q$ is associated with the parameter $q$, and the corresponding stochastic model is constructed by substituting $Q$ for $q$ in the deterministic Lous model.


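The p.d.f. $p_Q(q)$ of the random variable $Q$ has to satisfy the constraints given in Cataldo et al. (2009). Applying the Maximum Entropy Principle yields the gamma probability density function given by Eq. 7:

$$
p_Q(q) = \mathbb{1}_{]0,+\infty[}(q)\, \frac{1}{\underline{Q}} \left( \frac{1}{\delta_Q^2} \right)^{1/\delta_Q^2} \frac{1}{\Gamma\left( 1/\delta_Q^2 \right)} \left( \frac{q}{\underline{Q}} \right)^{1/\delta_Q^2 - 1} \exp\left( -\frac{q}{\delta_Q^2\, \underline{Q}} \right), \qquad (7)
$$

where $\underline{Q}$ is the mean value of $Q$, $\sigma_Q$ is its standard deviation, and the positive parameter $\delta_Q = \sigma_Q / \underline{Q}$ is the relative deviation of $Q$, such that $\delta_Q < 1/\sqrt{2}$.

The Gamma function $\Gamma$ is defined by $\Gamma(\alpha) = \displaystyle\int_0^{+\infty} t^{\alpha - 1} e^{-t}\, dt$.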

To generate realizations of the output radiated pressure, the Monte Carlo method is used. First, independent realizations $Q(\theta_i)$ of the random variable $Q$ are generated from the probability density function defined by Eq. 7. Then, for each realization $Q(\theta_i)$, a realization of the output acoustic pressure, $p_r(t, \theta_i)$, is calculated using the Lous model equations.
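The sampling step can be sketched with the Python standard library. Eq. 7 is a gamma density with shape $1/\delta_Q^2$ and scale $\underline{Q}\,\delta_Q^2$, which maps directly onto `random.gammavariate(alpha, beta)`. The mean (0.94), relative deviation (2.5%) and number of realizations (500) below are the values reported in Table 1; the function name and the seed are hypothetical, and the forward simulation with the Lous model is not reproduced here.

```python
import random
import statistics


def sample_tension_factor(mean_q, delta_q, n, seed=0):
    """Draw n independent realizations of Q from the gamma p.d.f. of Eq. 7.

    shape = 1 / delta_q**2 and scale = mean_q * delta_q**2, so that the
    mean is mean_q and the relative deviation is delta_q.
    """
    rng = random.Random(seed)
    shape = 1.0 / delta_q ** 2
    scale = mean_q * delta_q ** 2
    return [rng.gammavariate(shape, scale) for _ in range(n)]


samples = sample_tension_factor(mean_q=0.94, delta_q=0.025, n=500)
sample_mean = statistics.mean(samples)
sample_rel_dev = statistics.stdev(samples) / sample_mean
print(f"mean = {sample_mean:.4f}, relative deviation = {sample_rel_dev:.4f}")
```

Each sampled value would then be fed to the forward model to obtain one realization of the radiated pressure.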

3 INFERENCE OF THE TENSION FACTOR P.D.F.

The stochastic process $p_r(t)$ is represented in this work as a non-linear mapping of a random variable $Q$ by the Lous model. Due to this non-linear mapping, the analytical resolution of the inverse problem, i.e. the inference of the p.d.f. of $Q$ given $p_r(t)$, is intractable. Nevertheless, an estimate of this p.d.f. can be obtained using numerical approximations.

To infer the p.d.f. of $Q$ from $p_r(t)$, the voice signal is first parametrized. For each realization $p_r(t, \theta)$ of $p_r(t)$, the corresponding realization of the fundamental frequency $F_0(\theta)$ is calculated. The equations shown in sections 2.1 and 2.2 are then used to map from $Q$ to $F_0$ through a nonlinear function $g(\cdot)$,

$$
F_0 = g(Q). \qquad (8)
$$

As the analytical inverse mapping $Q = g^{-1}(F_0)$ is not available, a Metropolis-Hastings Markov Chain Monte Carlo (M-H MCMC) algorithm (Chib and Greenberg (1995)) is implemented to infer $Q$ from $F_0$.
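The text does not specify how $F_0$ is extracted from each simulated pressure signal. One common option, shown here purely as an assumed stand-in, is to pick the lag that maximizes the autocorrelation of the signal within a plausible pitch range. The function name, the search range and the synthetic test tone are all hypothetical.

```python
import math


def estimate_f0(signal, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Only lags corresponding to frequencies between f_min and f_max are
    searched; the signal must be longer than sample_rate / f_min samples.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic check: a pure 125 Hz tone sampled at 8 kHz for 0.1 s.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 125.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
print(f0)
```

Whatever estimator is used, it plays the role of the last stage of the mapping $g(\cdot)$: tension factor in, one scalar $F_0$ out.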

It is supposed that $p_{F_0}(f_0^t)$, the p.d.f. of the target fundamental frequency, is known. Given this experimental distribution for $F_0$ and the mapping $F_0 = g(Q)$ through the Lous model, the Metropolis-Hastings algorithm, for a fixed number of iterations $N$, can be implemented as follows:

1. Choose $Q(\theta_0)$, and find by simulation $F_0(\theta_0) = g(Q(\theta_0))$,

2. set $i = 1$,

3. at step $i$, generate a candidate $Q(\theta_c)$ from the internal Markov kernel $\rho(\cdot \,|\, Q(\theta_{i-1}))$, and find by simulation $F_0(\theta_c) = g(Q(\theta_c))$,

4. compute $\alpha = \dfrac{p(g(Q(\theta_c)))\, \rho(Q(\theta_{i-1}) \,|\, Q(\theta_c))}{p(g(Q(\theta_{i-1})))\, \rho(Q(\theta_c) \,|\, Q(\theta_{i-1}))} = \dfrac{p(F_0(\theta_c))\, \rho(Q(\theta_{i-1}) \,|\, Q(\theta_c))}{p(F_0(\theta_{i-1}))\, \rho(Q(\theta_c) \,|\, Q(\theta_{i-1}))}$,

5. set $Q(\theta_i) = Q(\theta_c)$ with probability $\min(1, \alpha)$; otherwise set $Q(\theta_i) = Q(\theta_{i-1})$,

6. set $i = i + 1$,

7. if $i \le N$, return to step 3.

The internal Markov kernel $\rho(\cdot \,|\, Q(\theta_{i-1}))$ causes the support of the posterior distribution to be progressively explored. It should be chosen so that the candidate $Q(\theta_c)$ can effectively explore the whole support of the posterior distribution. Due to steps 4 and 5, values of $Q(\theta_i)$ that map to more likely $F_0(\theta_i)$ are chosen with higher probability than those mapping to less likely values.

By choosing a symmetric probability function for the transition kernel, the acceptance ratio $\alpha$ simplifies to Eq. 9:

$$
\alpha = \frac{p(g(Q(\theta_c)))}{p(g(Q(\theta_{i-1})))} = \frac{p(F_0(\theta_c))}{p(F_0(\theta_{i-1}))}. \qquad (9)
$$

In this work, the transition kernel is a uniform p.d.f. with mean $Q(\theta_{i-1})$ and a support of width $2\Delta$:

$$
\rho(Q(\theta_c) \,|\, Q(\theta_{i-1})) = U\left( Q(\theta_{i-1}) - \Delta,\; Q(\theta_{i-1}) + \Delta \right). \qquad (10)
$$
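The steps above, with the symmetric uniform kernel of Eq. 10 and the simplified ratio of Eq. 9, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the Lous model is replaced by a hypothetical linear map `toy_forward_model`, and the target p.d.f. of $F_0$ is replaced by a hypothetical Gaussian. The starting point 0.84, the half-width 0.15 and the 4000 iterations mirror the values used in Sec. 4.

```python
import math
import random


def metropolis_hastings(g, target_pdf, q0, half_width, n_iter, seed=0):
    """Sample Q so that F0 = g(Q) is distributed according to target_pdf."""
    rng = random.Random(seed)
    chain = [q0]
    p_current = target_pdf(g(q0))
    accepted = 0
    for _ in range(n_iter):
        # Step 3: candidate from U(Q - Delta, Q + Delta), as in Eq. 10.
        candidate = rng.uniform(chain[-1] - half_width, chain[-1] + half_width)
        p_candidate = target_pdf(g(candidate))
        # Step 4 with a symmetric kernel: the ratio reduces to Eq. 9.
        alpha = p_candidate / p_current if p_current > 0.0 else 1.0
        # Step 5: accept the candidate with probability min(1, alpha).
        if rng.random() < min(1.0, alpha):
            chain.append(candidate)
            p_current = p_candidate
            accepted += 1
        else:
            chain.append(chain[-1])
    return chain, accepted / n_iter


def toy_forward_model(q):
    """Hypothetical stand-in for the Lous model: F0 proportional to q."""
    return 120.0 * q


def toy_target_pdf(f0, mean=112.8, std=2.82):
    """Hypothetical Gaussian stand-in for the target p.d.f. of F0."""
    z = (f0 - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


chain, acceptance_rate = metropolis_hastings(
    toy_forward_model, toy_target_pdf, q0=0.84, half_width=0.15, n_iter=4000)
burned = chain[500:]
posterior_mean = sum(burned) / len(burned)
print(f"acceptance rate = {acceptance_rate:.2f}, mean of Q = {posterior_mean:.3f}")
```

Returning the acceptance rate alongside the chain makes it easy to see how the kernel half-width trades exploration speed against rejected candidates.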

4 APPLICATION AND DISCUSSION

A target set Q is generated from the p.d.f given by Eq.7 and the statistics presented in Tab.1.

Table 1: Summary of the statistics of the target set.

Parameter    Mean    Relative deviation    Number of realizations

Q            0.94    2.5%                  500

From these realizations, and using the forward Lous model, the associated target p.d.f. $p_{F_0}(f_0^t)$ for $F_0$ is simulated.

During the simulation, the set $p_{F_0}(f_0^t)$ is related to the observable data. The p.d.f. $p_Q(q^t)$, used to generate the target p.d.f. $p_{F_0}(f_0^t)$, is only used for comparison with the estimated distribution, i.e. to validate the algorithm.

The choice of the transition kernel is of great importance in order to work with a reasonable computational cost (Gilks et al. (1996)). Its probability density should be chosen so that the candidate $Q(\theta_c)$ can effectively explore the whole support of the posterior distribution. Another important parameter is the width $2\Delta$ of its support. If it is too large, the support of the posterior distribution is explored quickly, at the cost of many rejected candidates. Conversely, with a small $\Delta$ most of the candidates $Q(\theta_c)$ are accepted, but few of them sample the regions of low probability.


In the following application, the starting point of the chain is $Q(\theta_0) = 0.84$. This value was set by running the deterministic model for different tension factors $Q(\theta_0)$ until the simulated fundamental frequency $F_0(\theta_0)$ had a reasonable probability under $p_{F_0}(f_0^t)$. The support of the kernel density is $2\Delta = 0.30$.

Figure 2: Posterior (bars) and target (line) pQ(q) for: n = 100 (left), and n = 500 (right)

Figure 2 shows how the sampling is concentrated in the high-probability region during the first 500 iterations. Very few realizations are sampled from the tails of the distribution.

Figure 3: Posterior (bars) and target (line) histograms for pQ(q) for n = 4000

After 4000 steps, the right tail remains almost unexplored, suggesting that $\Delta$ is too small ($\Delta = 0.15$). Nevertheless, as shown in Fig. 3, a good match is obtained between the target and simulated distributions.

5 CONCLUSION

The inverse mapping of a stochastic non-linear model has been implemented using a Metropolis-Hastings Markov Chain Monte Carlo algorithm. Very satisfying results were obtained for the estimation of the vocal fold tension probability density function when compared to the actual one.

Nevertheless, the probability of the values lying in the tails of the p.d.f. is not well inferred, suggesting that the arbitrarily chosen number of iterations or the properties of the Markov kernel are not optimal.

As a next step, to be less dependent on arbitrarily chosen variables, a more recent algorithm, such as a Sequential Monte Carlo algorithm, is being implemented.

ACKNOWLEDGEMENTS

The authors acknowledge FAPERJ (Fundação de Amparo à Pesquisa no Rio de Janeiro), CAPES (CAPES/COFECUB project N. 672/10) and CNPq (Brazilian Agency: Conselho Nacional de Desenvolvimento Científico e Tecnológico) for the financial support they gave to this research.

REFERENCES

Cataldo E., Soize C., Sampaio R., and Desceliers C. Probabilistic modeling of a nonlinear dynamical system used for producing voice. Computational Mechanics, 43(2):265–275, 2009.

Chib S. and Greenberg E. Understanding the Metropolis-Hastings algorithm. American Statistician, pages 327–335, 1995.

Cook D., Nauman E., and Mongeau L. Ranking vocal fold model parameters by their influence on modal frequencies. The Journal of the Acoustical Society of America, 126(4):2002, 2009.

Fant G. Acoustic Theory of Speech Production with Calculations Based on X-ray Studies of Russian Articulations. Mouton, The Hague, 1960.

Gilks W., Richardson S., and Spiegelhalter D. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1996.

Ishizaka K. and Flanagan J. Synthesis of Voiced Sounds from a Two-Mass Model of the Vocal Cords. Bell System Tech. J. SI, pages 1233–1268, 1972.

Lous N., Hofmans G., Veldhuis R., and Hirschberg A. A Symmetrical Two-Mass Vocal-Fold Model Coupled to Vocal Tract and Trachea, with Application to Prosthesis Design. Acta Acustica united with Acustica, 84(6):1135–1150, 1998.

Pelorson X., Hirschberg A., van Hassel R., Wijnands A., and Auregan Y. Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model. The Journal of the Acoustical Society of America, 96:3416, 1994.

Ruty N., Pelorson X., Van Hirtum A., Lopez-Arteaga I., and Hirschberg A. An in vitro setup to test the relevance and the accuracy of low-order vocal folds models. The Journal of the Acoustical Society of America, 121:479, 2007.


A.2Journal of Inverse Problems in Science and Engineering

The following paper, published in the journal of Inverse Problems in Science

and Engineering, was issued in november 2011 as ”Artificial neural networks

applied to the estimation of random variables associated to a two-mass model

for the vocal folds.”. It presents a summary of the main results presented in

this thesis.


Inverse Problems in Science and Engineering, 2011, 1–17, iFirst

Artificial neural networks applied to the estimation of random variables associated to a two-mass model for the vocal folds

Julien Mauprivez (a)*, Edson Cataldo (b) and Rubens Sampaio (a)

(a) Mechanical Engineering Department – PUC-Rio, Rua Marques de Sao Vicente, 225, Gavea, RJ, CEP: 22453-900, Brazil; (b) Applied Mathematics Department – Universidade Federal Fluminense, Graduate Program in Telecommunications Engineering, Rua Mario Santos Braga, S/N, Centro, Niteroi, CEP: 24020-140, RJ, Brazil

(Received 15 February 2011; final version received 2 July 2011)

The aim of this article is to use artificial neural networks (ANNs) to solve a stochastic inverse problem related to a model for voice production. Three parameters of the model are considered uncertain, and random variables are associated to these parameters. For each random variable, a probability density function is constructed using the Maximum Entropy Principle. Substituting the associated random variables for the three uncertain parameters, the new model constructed is stochastic and its output is a stochastic process consisting of realizations of voice signals. The proposed inverse problem consists in mapping the voice signals back to the three random variables, with ANNs used to construct the solution of the inverse problem. Features are extracted from the output voice signals and taken as inputs of the designed ANN, whose outputs are random variables. The probability density functions of these random outputs are estimated and compared with the original ones. Two kinds of problems are discussed. At first, the same probability distribution is used to generate the voice signals used to train the ANN and those used to solve the corresponding inverse stochastic problem. In this case, the actual probability density functions are very well fitted by the simulated ones. Then, different probability density functions are used to generate the voice signals used to train the ANN and those used to solve the corresponding inverse problem. Unexpectedly, the quality of the estimation is almost unchanged, except for one of the random variables.

Keywords: vocal folds; inverse modelling; stochastic mechanics

AMS Subject Classifications: 62M45; 65L09; 74L15

1. Introduction

The physical process responsible for voice production involves various phenomena such as turbulence, vibration of biological structures and aero-acoustical couplings, which can be modelled and simulated in detail, but at a high computational cost [1]. A good compromise between computational cost and accuracy is obtained using low-order models, such as the model of Lous et al. [2], described in Section 2 and referred to as Lous model in this article. A discussion on the accuracy of the model can be found in [3].

*Corresponding author. Email: [email protected]

ISSN 1741–5977 print/ISSN 1741–5985 online

© 2011 Taylor & Francis
DOI: 10.1080/17415977.2011.603086
http://www.informaworld.com


In a recent article [4], a parametric probabilistic approach was used to take into account uncertainties in a two-mass model for producing voice. A non-linear mapping to the probability density function corresponding to the voice fundamental frequency (called F0) was achieved, and an attempt to fit experimental data was made by means of an optimization algorithm. The same approach is used in Section 3 to generate a consistent set of random parameter vectors for the vocal-folds model. Lous model is then used to realize a direct non-linear mapping from the random vectors to vowel samples. In this article, artificial neural networks (ANNs) are trained to substitute for the inverse non-linear mapping. The aim is then to construct the map from random vectors of voice features, described in Section 4, to random vectors of control parameters of the direct model.

An application, including construction of the data sets by the direct model and inversion using an ANN, is presented in Section 5, along with some quantitative simulations.

2. Deterministic model of the phonatory system for vowel production

Although the physics of the phonatory system is rather complex, two-mass models have been used by several researchers, because they give a good representation of the physical phenomena involved at a reasonable computational cost.

The two-mass model used here is the one created by Lous et al. [2]. It was constructed after Ishizaka and Flanagan's model [5], considering an improved description of the airflow.

The complete model is composed of two coupled subsystems: one subsystem modelling the vocal folds, which is called source, and one subsystem modelling the vocal tract, which is called filter.

2.1. Source subsystem

The source subsystem is composed of two mass-spring-damper oscillators coupled by a linear spring, as represented in Figure 1.

Figure 1. Schematic representation of Lous model.


The geometry of the space between two plates representing the vocal folds is described by three quantities: the glottal height at point $x$ at the instant $t$, $h(x,t)$; the glottal depth $l_g$ in the $z$-direction; and the glottal length, given by the distance $x_3 - x_0$.

The non-linear dynamics of the vocal folds are given by Equation (1):

$$
\begin{cases}
m\,\dfrac{d^2 y_1(t)}{dt^2} + r(y_1(t))\,\dfrac{dy_1(t)}{dt} + s(y_1(t))\,y_1(t) + k_c\,\bigl(y_1(t) - y_2(t)\bigr) = f_1\bigl(\psi_{\mathrm{sub}}, \psi_{\mathrm{sup}}(t), h_{g_1}(t), h_{g_2}(t)\bigr) \\[2ex]
m\,\dfrac{d^2 y_2(t)}{dt^2} + r(y_2(t))\,\dfrac{dy_2(t)}{dt} + s(y_2(t))\,y_2(t) + k_c\,\bigl(y_2(t) - y_1(t)\bigr) = f_2\bigl(\psi_{\mathrm{sub}}, \psi_{\mathrm{sup}}(t), h_{g_1}(t), h_{g_2}(t)\bigr)
\end{cases}
\qquad (1)
$$

where $h_{g_1}(t)$ and $h_{g_2}(t)$ are the glottal heights, $y_1(t)$ and $y_2(t)$ are the corresponding positions of masses 1 and 2 relative to their rest positions, and $r(y_{1,2}(t))$ and $s(y_{1,2}(t))$ are, respectively, the damping and stiffness functions, which will be described later. $k_c$ is the stiffness constant of the linear spring which couples the two mass-spring-damper systems. $f_1$ and $f_2$ are the forces applied to the vocal folds due to the pressure field in the glottis and the acoustic pressure at the vocal tract inlet, as defined later. The sub-glottal pressure is given by the constant $\psi_{\mathrm{sub}}$ and the supra-glottal pressure is given by the function $\psi_{\mathrm{sup}}(t)$. The displacement of each vocal fold is considered to be perpendicular to the direction of the airflow.

The elasticity and damping functions, $s(y_{1,2}(t))$ and $r(y_{1,2}(t))$, are piecewise linear functions of the position of the vocal folds and they take into account the vocal-folds collision.

The elasticity function is given by Equation (2):

$$
s(y_i(t)) =
\begin{cases}
k\,y_i(t), & h_{g_i}(t) > h_{\lim}, \\
(k + 3k)\,y_i(t), & h_{g_i}(t) \le h_{\lim},
\end{cases}
\qquad i = 1, 2, \qquad (2)
$$

where $k$ is a constant. As proposed in [6], the contact between the vocal folds occurs before the eventual full glottis closure: contact is considered for $h_{g_{1,2}}(t) \le h_{\lim}$, where $h_{\lim}$ is a positive constant.

The damping function is given by Equation (3):

$$
r(y_i(t)) =
\begin{cases}
2\,\zeta\,\sqrt{mk}, & h_{g_i}(t) > h_{\lim}, \\
2\,(\zeta + 1)\,\sqrt{mk}, & h_{g_i}(t) \le h_{\lim},
\end{cases}
\qquad i = 1, 2, \qquad (3)
$$

where $\zeta$ is the damping factor of the oscillators and is constant.

The airflow through the glottis is assumed to be quasi-steady, incompressible and unidimensional (along the $x$ axis) [6]. As suggested in [2], the viscosity and fluid inertia are approximated by adding an inertive term and a Poiseuille term to the Bernoulli equation. The pressure distribution along the glottis, denoted by $\psi(x,t)$, can be described by the modified Bernoulli energy equation and is given by Equation (4):

$$
\psi(x,t) =
\begin{cases}
\psi_{\mathrm{sub}} - \dfrac{\rho}{2}\left(\dfrac{\phi_g(t)}{l_g\,\bigl(h(x,t) - h_{\mathrm{sub}}\bigr)}\right)^{2} - 12\,\mu\,l_g^{2}\,\phi_g(t)\displaystyle\int_{x_0}^{x}\dfrac{1}{l_g^{3}\,h^{3}(x,t)}\,dx - \rho\,\dfrac{d\phi_g(t)}{dt}\displaystyle\int_{x_0}^{x}\dfrac{1}{l_g\,h(x,t)}\,dx, & x < x_s, \\[2ex]
\psi_{\mathrm{sup}}(t), & x \ge x_s,
\end{cases}
\qquad (4)
$$


where $\phi_g(t)$ is the volume flow inside the glottis, $x_s$ the position of detachment of a free jet from the vocal folds, $h_{\mathrm{sub}}$ the height at $x_0$, $\rho$ the density of air and $\mu$ the dynamic viscosity of air. The position of the free jet detachment is defined as

$$
h_s(t) = \min\bigl(\alpha\,h_{g_1}(t),\, h_{g_2}(t)\bigr). \qquad (5)
$$

The value of $\alpha$ is set to 1.1 [2], when viscous and inertive terms are considered.

The force applied on the oscillators at $y_1(t)$ and $y_2(t)$, due to the pressure field, considering only the component normal to the plates, is given as

$$
f_i(t) = \int_{x_{i-1}}^{x_i} \left(\frac{x - x_{i-1}}{x_i - x_{i-1}}\right)\psi(x,t)\,dx + \int_{x_i}^{x_{i+1}} \left(\frac{x_{i+1} - x}{x_{i+1} - x_i}\right)\psi(x,t)\,dx, \qquad (6)
$$

where $i = 1, 2$.
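The piecewise definitions of Equations (2) and (3) can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values are hypothetical, and the stiffness is returned as a coefficient that multiplies the displacement, so that the elastic term reads k y_i in the open phase and (k + 3k) y_i during contact.

```python
import math

def stiffness_coefficient(h_g, k, h_lim):
    """Coefficient multiplying y_i in Equation (2): k when open, k + 3k in contact."""
    return k if h_g > h_lim else k + 3.0 * k

def elastic_term(y, h_g, k, h_lim):
    """Value of s(y_i) as written in Equation (2)."""
    return stiffness_coefficient(h_g, k, h_lim) * y

def damping_coefficient(h_g, m, k, zeta, h_lim):
    """Damping function r of Equation (3); it increases during contact."""
    base = math.sqrt(m * k)
    return 2.0 * zeta * base if h_g > h_lim else 2.0 * (zeta + 1.0) * base

# Hypothetical values, chosen only to exercise both branches.
m, k, zeta, h_lim = 1.0e-4, 40.0, 0.1, 1.0e-5
r_open = damping_coefficient(1.0e-3, m, k, zeta, h_lim)
r_contact = damping_coefficient(0.0, m, k, zeta, h_lim)
```

With these values the contact branch multiplies the stiffness by four and the damping by (zeta + 1) / zeta, which is how the model represents the collision of the vocal folds.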

2.2. Filter subsystem

The filter subsystem, coupled to the source subsystem, is described as an acoustic tube with variable cross section. For the sake of simplicity, the vocal tract will be represented by a concatenation of cylindrical tubes, and at the end of the last acoustic tube a radiation load equivalent to that of a disc in an infinite plane is imposed [7].

For the whole system (source + filter), the input is the sub-glottal pressure $\psi_{\mathrm{sub}}$, which is constant, and the output is the function $\psi_r(t)$, the pressure at the lips.

Figure 2 shows a schematic representation of the complete system, considering a concatenation of 16 tubes to represent the vocal tract.

Figure 2. Complete representation of Lous model, including the vocal tract. Scales are adapted for intelligibility. In the actual model, the glottal length is about 3 mm and the vocal tract length is about 17 cm.

The sound propagation through the vocal tract is modelled by a plane wave propagation. The acoustic pressure along each cylindrical tube of the modelled vocal tract, $\psi(x,t)$, considering no acoustic source in the vocal tract, is the solution of


Equation (7), considering the boundary conditions described above and suitable initial conditions.

$$
\frac{\partial^2 \psi(x,t)}{\partial x^2} - \frac{1}{c^2}\,\frac{\partial^2 \psi(x,t)}{\partial t^2} = 0. \qquad (7)
$$

The general solution of Equation (7) can be written as

$$
\psi(x,t) = \psi^{+}(x,t) + \psi^{-}(x,t), \qquad (8)
$$

where $\psi^{+}(x,t)$ is the incident pressure component, travelling in the positive $x$ direction, and $\psi^{-}(x,t)$ is the reflected component, travelling in the negative $x$ direction.

Equation (8) is used to determine the transmitted and reflected pressure at each tube interface, and at each time step, considering the continuity of pressure and particle velocity.
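As an illustration of how Equation (8) is used at a tube interface, the sketch below computes the reflected and transmitted pressure for a single wave arriving at a junction between two cylindrical tubes. It relies on the standard plane-wave result obtained from continuity of pressure and of volume flow across the junction; this convention, the function names and the areas used are assumptions made for the example and are not taken from the article.

```python
def junction_coefficients(area_in, area_out):
    """Pressure reflection and transmission coefficients at a tube junction.

    area_in is the cross section of the tube carrying the incident wave,
    area_out the cross section of the next tube.
    """
    reflection = (area_in - area_out) / (area_in + area_out)
    transmission = 1.0 + reflection  # continuity of pressure
    return reflection, transmission

def scatter(p_incident, area_in, area_out):
    """Return (reflected, transmitted) pressure for one incident wave."""
    reflection, transmission = junction_coefficients(area_in, area_out)
    return reflection * p_incident, transmission * p_incident

# Hypothetical areas (cm^2): a constriction reflects part of the wave.
p_reflected, p_transmitted = scatter(1.0, 2.0, 1.0)
```

When both tubes have the same area the wave is fully transmitted, and the reflection grows as the area ratio departs from one. A complete simulation would also handle the wave arriving from the other side of each junction at every time step.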

At the outlet of the vocal tract, part of the acoustic pressure is radiated. The radiation load used for a pipe flanged by an infinite plane is applied as output impedance of the vocal tract.

At the vocal tract inlet, a total reflection of the acoustic wave is considered. The coupling between the flow in the glottis and the vocal tract, due to the continuity of flow at the glottis–vocal tract interface, is given as

$$
\psi(x_3^{-}, t) = \frac{\rho\,c}{a_{s_1}}\,\phi_g(t) + 2\,\psi^{-}(x_3^{+}, t), \qquad (9)
$$

where $\psi(x_3^{-}, t)$ is the pressure at the glottis–vocal tract interface on the left side, $\psi^{-}(x_3^{+}, t)$ the reflected component of the pressure on the right side of the interface and $a_{s_1}$ the area of the first of the tubes used to approximate the vocal tract geometry.

3. Stochastic model

The fundamental frequency of the vocal folds is mainly sensitive to the variation of three parameters [4,5,8], which are, in Lous model, the neutral glottal height ($h_{g0}$), the sub-glottal pressure ($\psi_{\mathrm{sub}}$) and, above all, the tension factor ($q$), defined in [5] and related to the stiffness and mass associated to the mass–spring–damper systems modelling the vocal folds (respectively $k$ and $m$). The values of mass and stiffness to be used are defined by $m = \hat{m}/q$ and $k = q\,\hat{k}$.

Recently, Cataldo et al. [4] discussed the uncertainties of these parameters in Ishizaka and Flanagan's model, and probability density functions were constructed for them using the maximum entropy principle. Herein, the same ideas will be used, but applied to the Lous model.

The random variables $H_{g0}$, $\Psi_{\mathrm{sub}}$ and $Q$ are then associated with the parameters $h_{g0}$, $\psi_{\mathrm{sub}}$ and $q$, respectively, and the corresponding stochastic model is constructed by replacing, in the deterministic Lous model, the three parameters by the corresponding three random variables.

To construct the probability density functions (p.d.f.'s) associated to the random variables $H_{g0}$, $\Psi_{\mathrm{sub}}$ and $Q$, the maximum entropy principle is used, as described below.


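The p.d.f. $p_{H_{g0}}(h_{g0})$ of the random variable $H_{g0}$ has to verify the following constraints:

$$
\int_{-\infty}^{+\infty} p_{H_{g0}}(h_{g0})\,dh_{g0} = 1, \qquad (10)
$$

$$
\int_{-\infty}^{+\infty} h_{g0}\,p_{H_{g0}}(h_{g0})\,dh_{g0} = \underline{H}_{g0}, \qquad (11)
$$

$$
\int_{-\infty}^{+\infty} h_{g0}^{2}\,p_{H_{g0}}(h_{g0})\,dh_{g0} = c_1, \qquad (12)
$$

in which $\underline{H}_{g0}$ is the mean value of $H_{g0}$ and $c_1$ is an unknown positive finite constant.

The use of the maximum entropy principle yields

$$
p_{H_{g0}}(h_{g0}) = \mathbb{1}_{]0,+\infty[}(h_{g0})\,e^{-\lambda_0 - \lambda_1 h_{g0} - \lambda_2 h_{g0}^{2}}, \qquad (13)
$$

where $\lambda_0$, $\lambda_1$ and $\lambda_2$ are the solution of the three equations defined by Equations (10)–(12) and $\mathbb{1}_{]0,+\infty[}(h_{g0})$ is a function equal to one if $h_{g0}$ belongs to $]0,+\infty[$ and to zero otherwise.

A new parameterization for $c_1$ is used: $c_1 = \underline{H}_{g0}^{2}\,(1 + \delta_{H_{g0}}^{2})$, where the relative deviation $\delta_{H_{g0}}$ is defined by $\delta_{H_{g0}} = \sigma_{H_{g0}} / \underline{H}_{g0}$ and $\sigma_{H_{g0}}$ is the standard deviation.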

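The p.d.f. $p_{\Psi_{\mathrm{sub}}}(\psi_{\mathrm{sub}})$ of the random variable $\Psi_{\mathrm{sub}}$ has to verify the following constraints:

$$
\int_{-\infty}^{+\infty} p_{\Psi_{\mathrm{sub}}}(\psi_{\mathrm{sub}})\,d\psi_{\mathrm{sub}} = 1, \qquad (14)
$$

$$
\int_{-\infty}^{+\infty} \psi_{\mathrm{sub}}\,p_{\Psi_{\mathrm{sub}}}(\psi_{\mathrm{sub}})\,d\psi_{\mathrm{sub}} = \underline{\Psi}_{\mathrm{sub}}, \qquad (15)
$$

$$
\int_{-\infty}^{+\infty} \ln(\psi_{\mathrm{sub}})\,p_{\Psi_{\mathrm{sub}}}(\psi_{\mathrm{sub}})\,d\psi_{\mathrm{sub}} = c_2, \qquad (16)
$$

in which $\underline{\Psi}_{\mathrm{sub}}$ is the mean value of $\Psi_{\mathrm{sub}}$ and $c_2$ is an unknown positive constant.

The Maximum Entropy Principle yields the following probability density function for $\Psi_{\mathrm{sub}}$:

$$
p_{\Psi_{\mathrm{sub}}}(\psi_{\mathrm{sub}}) = \mathbb{1}_{]0,+\infty[}(\psi_{\mathrm{sub}})\,\frac{1}{\underline{\Psi}_{\mathrm{sub}}}\left(\frac{1}{\delta_{\Psi_{\mathrm{sub}}}^{2}}\right)^{1/\delta_{\Psi_{\mathrm{sub}}}^{2}} \times \frac{1}{\Gamma\!\left(1/\delta_{\Psi_{\mathrm{sub}}}^{2}\right)}\left(\frac{\psi_{\mathrm{sub}}}{\underline{\Psi}_{\mathrm{sub}}}\right)^{1/\delta_{\Psi_{\mathrm{sub}}}^{2} - 1}\exp\!\left(-\frac{\psi_{\mathrm{sub}}}{\delta_{\Psi_{\mathrm{sub}}}^{2}\,\underline{\Psi}_{\mathrm{sub}}}\right), \qquad (17)
$$

in which $\delta_{\Psi_{\mathrm{sub}}} = \sigma_{\Psi_{\mathrm{sub}}} / \underline{\Psi}_{\mathrm{sub}}$ is the relative deviation of the random variable $\Psi_{\mathrm{sub}}$, such that $0 \le \delta_{\Psi_{\mathrm{sub}}} < 1/\sqrt{2}$, and where $\sigma_{\Psi_{\mathrm{sub}}}$ is the standard deviation of $\Psi_{\mathrm{sub}}$. The Gamma function $\Gamma$ is defined by

$$
\Gamma(\alpha) = \int_{0}^{+\infty} t^{\alpha - 1}\,e^{-t}\,dt.
$$

From Equation (17), it can be verified that $\Psi_{\mathrm{sub}}$ is a second-order random variable.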


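The p.d.f. $p_Q(q)$ of the random variable $Q$, whose support is $]0,+\infty[$, has to verify the following constraints:

$$
\int_{-\infty}^{+\infty} p_Q(q)\,dq = 1, \qquad (18)
$$

$$
\int_{-\infty}^{+\infty} q\,p_Q(q)\,dq = \underline{Q}, \qquad (19)
$$

$$
\int_{-\infty}^{+\infty} \ln(q)\,p_Q(q)\,dq = c_3, \qquad (20)
$$

in which $\underline{Q}$ is the mean value of $Q$ and $c_3$ is an unknown positive constant.

Applying the Maximum Entropy Principle yields the following p.d.f.:

$$
p_Q(q) = \mathbb{1}_{]0,+\infty[}(q)\,\frac{1}{\underline{Q}}\left(\frac{1}{\delta_Q^{2}}\right)^{1/\delta_Q^{2}} \times \frac{1}{\Gamma\!\left(1/\delta_Q^{2}\right)}\left(\frac{q}{\underline{Q}}\right)^{1/\delta_Q^{2} - 1}\exp\!\left(-\frac{q}{\delta_Q^{2}\,\underline{Q}}\right), \qquad (21)
$$

where the positive parameter $\delta_Q = \sigma_Q / \underline{Q}$ is the relative deviation of the random variable $Q$, such that $\delta_Q < 1/\sqrt{2}$, and where $\sigma_Q$ is the standard deviation of $Q$. From Equation (21), it can be proved that $Q$ is a second-order random variable and that $E\{1/Q^{2}\} < +\infty$.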
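Equations (17) and (21) are Gamma densities with shape 1/δ² and scale δ² times the mean, so realizations can be drawn with a standard Gamma generator. The sketch below uses Python's `random.gammavariate`; the helper name, the seed and the sample size are assumptions made for the illustration, while the mean and relative deviation of Q are those of Table 1.

```python
import random
import statistics

def sample_gamma(mean, rel_dev, n, rng):
    """Draw n realizations of a Gamma variable with given mean and relative deviation."""
    shape = 1.0 / rel_dev ** 2       # 1 / delta^2
    scale = mean * rel_dev ** 2      # delta^2 * mean, so that shape * scale = mean
    return [rng.gammavariate(shape, scale) for _ in range(n)]

rng = random.Random(0)               # fixed seed, for reproducibility only
q_samples = sample_gamma(0.96, 0.025, 20000, rng)

q_mean = statistics.mean(q_samples)
q_rel_dev = statistics.stdev(q_samples) / q_mean
```

The sample mean and relative deviation computed at the end should be close to the prescribed 0.96 and 0.025, which gives a quick check of the parameterization.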

3.1. Generation of the output radiated pressure realizations

To generate the realizations of the output radiated pressure, the Monte Carlo method is used. First, independent realizations $H_{g0,j}$, $\Psi_{\mathrm{sub},j}$ and $Q_j$ of the random variables $H_{g0}$, $\Psi_{\mathrm{sub}}$ and $Q$ are generated using the p.d.f.'s defined by Equations (13), (17) and (21). For each realization of the random vector $\Theta = (H_{g0}, \Psi_{\mathrm{sub}}, Q)$, a realization of the output acoustic pressure, $\Psi_r(t, \Theta)$, is calculated using the Lous model equations. The system used is illustrated in Figure 3.

Figure 3. Illustration of the realized mapping.

The output pressure $\Psi_r(t, \Theta)$ is a stochastic process and, for each realization of $\Theta$, a fundamental frequency, denoted by $F_0(\Theta)$, can be associated to $\Psi_r(t, \Theta)$.

The convergence function $n \mapsto \mathrm{conv}(n)$ associated to the random variable $F_0$ is given by Equation (22):

$$
\mathrm{conv}(n) = \frac{1}{n}\sum_{j=1}^{n} F_0(\Theta_j)^{2}, \qquad (22)
$$

where $F_0(\Theta_1), \ldots, F_0(\Theta_n)$ are independent realizations of $F_0$.

It was observed that the convergence was reached for $n \approx 2000$ realizations, as shown in Figure 4.
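The convergence function of Equation (22) is a running mean of the squared fundamental frequencies, and it can be evaluated for every n in a single pass. The sketch below assumes that the realizations of F0 are already available as a list; the function name and the three values used are illustrative only.

```python
from itertools import accumulate

def conv_curve(f0_values):
    """Return [conv(1), ..., conv(n)] with conv(n) = (1/n) * sum of F0_j^2, j <= n."""
    running_sums = accumulate(v * v for v in f0_values)
    return [s / n for n, s in enumerate(running_sums, start=1)]

# Illustrative values only; in practice each F0 comes from one model realization.
curve = conv_curve([1.0, 2.0, 3.0])
```

Plotting such a curve against n is how a plateau like the one described for Figure 4 can be identified.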


The sample mean $\hat{m}_{F_0}$ and relative deviation $\hat{\delta}_{F_0}$ associated to the random variable $F_0$ were estimated using the following equations:

$$
\hat{m}_{F_0} = \frac{1}{n}\sum_{j=1}^{n} F_0(\Theta_j), \qquad (23)
$$

$$
\hat{\delta}_{F_0} = \frac{1}{\hat{m}_{F_0}}\sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\bigl(F_0(\Theta_j) - \hat{m}_{F_0}\bigr)^{2}}. \qquad (24)
$$
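Equations (23) and (24) translate directly into code. The following sketch is a plain transcription, with a hypothetical function name and three made-up frequency values chosen so that the result is easy to check by hand.

```python
import math

def sample_mean_and_rel_dev(values):
    """Sample mean (Equation 23) and relative deviation (Equation 24)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased, n - 1
    return mean, math.sqrt(variance) / mean

# Made-up values in Hz: mean 125, standard deviation 5, relative deviation 0.04.
m_f0, delta_f0 = sample_mean_and_rel_dev([120.0, 125.0, 130.0])
```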

As an example, the values presented in Table 1 are used to generate random realizations of the random variables $Q$, $\Psi_{\mathrm{sub}}$ and $H_{g0}$. These realizations are processed by Lous model and realizations of $\Psi_r(t)$ are then obtained.

The corresponding histogram for the fundamental frequency $F_0$ is shown in Figure 5.

After these results, one may try, given a probability density function for the fundamental frequency, to obtain a corresponding set of random variables which can generate it using the Lous model, i.e. to solve the corresponding inverse problem.

Figure 4. Mean square convergence for 6000 realizations simulated with Lous model.

Table 1. Mean and relative deviation for the random variables associated to the stochastic model.

Parameter                          Mean      Relative deviation (%)
$Q$                                0.96      2.5
$\Psi_{\mathrm{sub}}$ (Pa)         700       1
$H_{g0}$ (m)                       2.5e−4    3


An approach used in [4] was applied to Ishizaka and Flanagan's model by solving an optimization problem, considering the distance between the given probability density function and the simulated one as a cost function.

Herein, the model used is the Lous model and the objective is to solve the corresponding inverse problem, but using ANNs.

4. Solving the inverse problem using ANNs

4.1. General considerations

The aim in this section is to describe the voice signal features to be used by an ANN to map to three random variables associated to three control parameters of the Lous model. In [4], only the fundamental frequency ($F_0$) was used as the voice signal feature. However, it is possible to generate a voice signal with a given fundamental frequency by using different values of the set $[h_{g0}, q, \psi_{\mathrm{sub}}]$, so the inverse problem is ill-posed. The ANN used in this work is trained by minimizing the sum-of-squares error, which leads to very poor results for such multi-modal problems [9], prohibiting its use to realize the mapping from $F_0$ to $Q$, $\Psi_{\mathrm{sub}}$ and $H_{g0}$.

Hence, an attempt is made to extract other voice features from the voice signals, in addition to the fundamental frequency.

4.2. Other voice features used

The stochastic process $\Psi_r(t)$ is represented in this work as a non-linear mapping of a random vector $[H_{g0}, Q, \Psi_{\mathrm{sub}}]$, obtained with Lous model (Figure 3). To invert this mapping using ANNs, it is required to extract a vector of features from each realization of $\Psi_r(t)$, a vector from which the designed ANN can give an estimation of the random vector $[H_{g0}, Q, \Psi_{\mathrm{sub}}]$. The corresponding estimator will be denoted by $[\hat{H}_{g0}, \hat{Q}, \hat{\Psi}_{\mathrm{sub}}]$.

Figure 5. PDF for the fundamental frequency of the voice. $\hat{m}_{F_0} = 125.17$ Hz, $\hat{\delta}_{F_0} = 0.024$.


Recently, Mauprivez et al. [10] used features extracted from the glottal signal. From each realization $\Psi_r(t, \Theta)$ of the output radiated pressure, the glottal flow $\phi_g(t, \Theta)$ can be obtained by a process of iterative adaptive inverse filtering [11]. The TKK Aparat algorithm, described in [12], was used to parameterize the glottal signal. Figure 6 shows an example of a typical glottal flow signal, its derivative and some relevant key instants.

Some works, such as [13,14], showed that these key instants are correlated with the variation of the mechanical parameters of Lous model. Attempts were made to construct estimators $[\hat{H}_{g0}, \hat{Q}, \hat{\Psi}_{\mathrm{sub}}]$ from these key instants, and only the estimator of the vocal folds tension ($Q$) achieved a reasonable accuracy.

Herein, another attempt is made by extracting the spectral amplitudes of the first 50 harmonics present in each realization of the output radiated pressure, as illustrated in Figure 7. The fundamental frequency is also extracted.

The inputs of the constructed neural network will then be:

• $F_0$ – the fundamental frequency.

• $\gamma_1, \ldots, \gamma_{50}$ – the normalized spectral amplitudes of the first 50 harmonics of each output radiated pressure realization.
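The 51-component input vector listed above can be sketched as follows. The article does not specify how the harmonics are measured or normalized, so this example makes several assumptions: F0 is taken as already known, each harmonic amplitude is obtained by projecting the signal onto a sine and a cosine at that frequency, and the amplitudes are divided by the largest one. The function names and the synthetic test signal are hypothetical.

```python
import math

def harmonic_amplitudes(signal, fs, f0, n_harmonics=50):
    """Amplitude of each harmonic k * f0, from a single-frequency DFT projection."""
    n = len(signal)
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs  # phase increment per sample
        re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
        im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
        amplitudes.append(2.0 * math.hypot(re, im) / n)
    return amplitudes

def feature_vector(signal, fs, f0, n_harmonics=50):
    """Return [F0, normalized amplitude 1, ..., normalized amplitude n_harmonics]."""
    amplitudes = harmonic_amplitudes(signal, fs, f0, n_harmonics)
    peak = max(amplitudes)  # assumed normalization: divide by the largest harmonic
    return [f0] + [a / peak for a in amplitudes]

# Synthetic signal: two harmonics with amplitudes 1 and 0.5, ten full periods.
fs, f0 = 16000.0, 125.0
signal = [math.sin(2.0 * math.pi * f0 * i / fs)
          + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * i / fs)
          for i in range(1280)]
features = feature_vector(signal, fs, f0)
```

The projection is exact here because the window contains an integer number of periods; with real recordings a windowed FFT and a peak search around each harmonic would be a more robust choice.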

Figure 6. Typical glottal pulse (a) and its derivative (b).

Figure 7. Illustration of the spectral amplitudes obtained from a voice signal realization of $\Psi_r(t)$.


In the next section, the methodology used to simulate the inverse mapping $[F_0, \gamma_1, \ldots, \gamma_{50}] \to [\hat{H}_{g0}, \hat{Q}, \hat{\Psi}_{\mathrm{sub}}]$ is described.

5. Resolution of the inverse problem

5.1. Methodology applied to solve the inverse problem

The methodology applied can be divided into three parts:

(i) First, the Monte Carlo realizations are performed such that, for each realization $[H_{g0,j}, Q_j, \Psi_{\mathrm{sub},j}]$, a realization of the output radiated pressure $\Psi_r(t, \Theta)$ is obtained using the Lous model, and the fundamental frequency $F_0(\Theta)$ and the first 50 harmonics $\gamma_1(\Theta), \ldots, \gamma_{50}(\Theta)$ associated to that realization are then extracted.

In other words, a mapping $\Theta_{trs} = [H_{g0}, Q, \Psi_{\mathrm{sub}}]_{trs} \to \Phi_{trs} = [F_0, \gamma_1, \ldots, \gamma_{50}]_{trs}$ is realized, where the subscript trs stands for training set.

Here, three sets, $\Theta_{trs1}$, $\Theta_{trs2}$ and $\Theta_{tes}$, will be generated, giving rise to the three feature sets $\Phi_{trs1}$, $\Phi_{trs2}$ and $\Phi_{tes}$. The first two will be used to train an ANN in different situations and the last one will be used to test the ANNs. The properties of these three sets are summarized in Table 2.

(ii) In the second part, an ANN is designed (details about it will be given later) and a supervised training is performed. A function to estimate the mappings from the random vectors $\Phi_{trs1}$ and $\Phi_{trs2}$ to the random vectors $\Theta_{trs1}$ and $\Theta_{trs2}$ is created.

(iii) Finally, the ANN is tested with sets $\Phi_{tes}$ and $\Theta_{tes}$. Set $\Phi_{tes}$ is presented as an input to the ANN and, as an output, a random vector $\hat{\Theta}_{tes}$ is obtained. This random vector is an estimate of $\Theta_{tes}$. Hence, the quality of the inversion is assessed by comparing the random vectors $\Theta_{tes}$ and $\hat{\Theta}_{tes}$.

5.2. Considerations about the ANN designed

5.2.1. Evaluation of the ANN performance when used to estimate the histogram of a p.d.f.

After the training phase is performed, the sets $[H_{g0}, Q, \Psi_{\mathrm{sub}}]_{trs1}$ and $[H_{g0}, Q, \Psi_{\mathrm{sub}}]_{tes}$ are used to construct histograms of the probability densities for each random variable.

Table 2. Summary of the data sets used in the simulations.

Parameter                      Mean      Relative deviation (%)   Number of realizations   Purpose    Set name
$Q$                            0.94      2.5                      8000                     Training   trs1
$\Psi_{\mathrm{sub}}$ (Pa)     705       1
$H_{g0}$ (m)                   2.4e−4    3

$Q$                            0.94      2.5                      2000                     Testing    tes
$\Psi_{\mathrm{sub}}$ (Pa)     705       1
$H_{g0}$ (m)                   2.4e−4    3

$Q$                            0.94      4.5                      4000                     Training   trs2
$\Psi_{\mathrm{sub}}$ (Pa)     705       3
$H_{g0}$ (m)                   2.4e−4    6


The histograms (actual and estimated) are constructed with equal bin width and normalized by dividing the height of each bin by the number of realizations. The difference between the actual and estimated histograms is calculated as follows. Let $S$ be $Q$, $\Psi_{\mathrm{sub}}$ or $H_{g0}$ and let $E_S$ be the following quantity:

$$
E_S = \frac{\sum_{i=1}^{n_b} \left| p_{S_i} - \hat{p}_{S_i} \right|}{2}, \qquad (25)
$$

where $p_{S_i}$ is the probability of random variable $S$ for bin $i$, $\hat{p}_{S_i}$ is its estimated value and $n_b$ is the number of bins. As the histograms are normalized, $E_S$ can be interpreted as an index which indicates the similarity between the histograms. Consequently, $E_S = 0$ indicates a total agreement between the estimated and the so-called actual histograms and $E_S = 1$ indicates that the histograms have no common bin.
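The index of Equation (25) can be computed as sketched below. The article fixes neither the number of bins nor the range of the histograms, so this example assumes 20 bins by default and a common range spanning both samples; the function names and the small data sets are illustrative.

```python
def normalized_histogram(values, lo, hi, n_bins):
    """Equal-width histogram whose bin heights sum to one."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # hi falls in the last bin
        counts[index] += 1
    return [c / len(values) for c in counts]

def histogram_distance(actual, estimated, n_bins=20):
    """E_S of Equation (25): 0 for identical histograms, 1 for disjoint ones."""
    lo = min(min(actual), min(estimated))
    hi = max(max(actual), max(estimated))
    if hi == lo:
        return 0.0  # every value is identical, so the histograms coincide
    p = normalized_histogram(actual, lo, hi, n_bins)
    p_hat = normalized_histogram(estimated, lo, hi, n_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, p_hat))

same = histogram_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
disjoint = histogram_distance([0.0, 0.1, 0.2], [0.8, 0.9, 1.0], n_bins=10)
```

The two calls at the end reproduce the limiting cases described in the text: identical samples give 0 and samples with no common bin give 1.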

The efficiency of the estimator obtained after training the ANN depends on the random initial values of the neural weights, and on the distribution of the realizations among the training, testing and validation sets. Thus, each independent training, related to the set trs1, results in a different estimator. To ensure that the process used to build the estimator is consistent over several independent trainings, the training is repeated 100 times.

After each training, a value $E_S$ is obtained and, after 100 trainings, the mean performance $m_{E_S}$ and its relative deviation $\delta_{E_S}$ are calculated. This number of 100 repetitions proved sufficient for the mean square of the training performance $m_{E_S}$ to converge to a stable value. The features of the ANN are chosen such that $m_{E_{H_{g0}}}$, $m_{E_Q}$ and $m_{E_{\Psi_{\mathrm{sub}}}}$ are minimized.

5.2.2. Pre-processing of the data

The values of the input and output parameters processed by the ANN vary over a wide range: $\Psi_{\mathrm{sub}}$ is of the order of several hundred Pa, while $H_{g0}$ is of the order of $10^{-3}$ m.

As such discrepancies are not desirable, scaling factors are applied to each of the input and output parameters. Those scaling factors are calculated from the set trs1 so that any realization of the training set lies between −1 and 1. The calculated scaling factors are then used to normalize any input vector presented to the network, and to denormalize any output vector.
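A minimal sketch of such a scaling is given below, assuming a linear min–max map of each component onto [−1, 1], with the minimum and maximum taken from the training set. The function names and the three training rows are hypothetical, and every component is assumed to vary within the training set.

```python
def fit_scaling(rows):
    """Per-component (min, max) computed from the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]

def normalize(row, factors):
    """Map each component linearly from [min, max] to [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v, (lo, hi) in zip(row, factors)]

def denormalize(row, factors):
    """Inverse of normalize: map [-1, 1] back to physical units."""
    return [lo + (v + 1.0) * (hi - lo) / 2.0 for v, (lo, hi) in zip(row, factors)]

# Hypothetical training rows: [Q, Psi_sub (Pa), H_g0 (m)].
training = [[0.90, 690.0, 2.2e-4], [0.98, 720.0, 2.6e-4], [0.94, 705.0, 2.4e-4]]
factors = fit_scaling(training)
scaled = [normalize(row, factors) for row in training]
restored = denormalize(scaled[2], factors)
```

Vectors outside the range seen in training, such as test realizations drawn with a larger dispersion, are mapped slightly outside [−1, 1] by the same factors.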

5.2.3. Training methodology

Set trs1 is randomly divided into three parts: 70% is used for training, 15% for testing and 15% for validation. The Levenberg–Marquardt back-propagation algorithm is used to update the weights and biases of the neurons, as it proved to be the fastest while also resulting in good estimates. The update is performed after all the realizations of the training set have been processed by the ANN.
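The random 70/15/15 partition can be sketched as follows. The function name, the seed and the use of index lists are assumptions made for the example; the set size of 8000 is that of trs1 in Table 2.

```python
import random

def split_indices(n, rng, fractions=(0.70, 0.15, 0.15)):
    """Randomly partition range(n) into training, testing and validation indices."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # remainder goes to validation
    return train, test, validation

rng = random.Random(0)  # fixed seed, for reproducibility only
train_idx, test_idx, val_idx = split_indices(8000, rng)
```

Drawing a new partition for each of the repeated trainings is one source of the variability between estimators discussed in Section 5.2.1.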

For the ANN to have good generalization properties, early stopping is used as a regularization method. The performance indicator used during training is the mean squared error (MSE). The training is stopped after five consecutive epochs in which the performance on the validation set degrades. Bayesian regularization has also been implemented; it did not bring a notable increase in performance, while it increased the training time.
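The stopping rule can be illustrated with the sketch below. It interprets the criterion as "the validation MSE has not improved on its best value for five consecutive epochs", which is an assumption about the exact rule used by the toolbox; the function name and the error sequence are illustrative.

```python
def early_stopping_epoch(val_errors, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_error = float("inf")
    best_epoch = -1
    failures = 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch, failures = error, epoch, 0
        else:
            failures += 1
            if failures >= patience:
                return epoch, best_epoch
    return len(val_errors) - 1, best_epoch

# Illustrative validation MSE: it improves until epoch 3, then degrades.
stop, best = early_stopping_epoch([1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.0])
```

In this example the training stops at epoch 8 and the weights of epoch 3, where the validation error was lowest, would be kept.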


The network to be trained has one hidden layer, as it was proven that this is sufficient to approximate any continuous function to arbitrary accuracy [15]. The transfer function used for the neurons of the hidden layer is a tan-sigmoid function, and a linear function is used for the output layer.

The simulations are performed using the Matlab® Neural Networks Toolbox. The number of neurons in the hidden layer is chosen such that $m_{E_{H_{g0}}}$, $m_{E_Q}$ and $m_{E_{\Psi_{\mathrm{sub}}}}$ are minimized. The results of this process are shown in Figure 8.

The number of neurons used in the hidden layer for the ANN is set to 55, which permits a good trade-off between the network training time and the accuracy of the estimators.

The characteristics of the resulting ANN are summarized in Table 3.
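For readers without the toolbox, the forward pass of such a network can be sketched in plain Python. The sketch only evaluates a 51–55–3 network with a hyperbolic-tangent hidden layer and a linear output layer; the random initial weights, their range and the function names are assumptions, and no training is performed.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def forward(x, hidden, output):
    """Hidden layer with tanh (tan-sigmoid) activation, linear output layer."""
    w1, b1 = hidden
    w2, b2 = output
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]

rng = random.Random(0)              # fixed seed, for reproducibility only
hidden = init_layer(51, 55, rng)    # 51 inputs: F0 and 50 harmonic amplitudes
output = init_layer(55, 3, rng)     # 3 outputs: one per estimated parameter
estimate = forward([0.0] * 51, hidden, output)
n_parameters = 51 * 55 + 55 + 55 * 3 + 3
```

With this topology the network has 3028 adjustable weights and biases, which gives an idea of the size of the Levenberg–Marquardt problem solved at each training.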

5.3. Results

Two cases will be considered. In the first case, the ANN will be trained with set trs1 and, in the second case, the ANN will be trained with set trs2. In both cases, the ANN will be tested with the set tes. One should note that the sets trs1 and tes were generated considering the same statistics, whereas trs2 and tes use different statistics. In each case, the ANN will be trained 100 times, resulting in 100 values for $E_{H_{g0}}$, $E_Q$ and $E_{\Psi_{\mathrm{sub}}}$.

Figure 8. Difference of the histograms as a function of the number of neurons in the hidden layer. $E_{H_{g0}}$: dotted line, $E_Q$: dashed line, $E_{\Psi_{\mathrm{sub}}}$: solid line.

Table 3. Optimized topology and average training time of the ANN.

ANN topology                               Back-propagation algorithm   Training time (s)
[51 × 55 (tan-sigmoid) × 3 (linear)]       Levenberg–Marquardt          500


The results obtained for the first case (minimum value, mean value and deviation relative to the mean value) are summarized in Table 4.

The histograms that correspond to the lowest values of E_Hg0, E_Q and E_!sub are presented in Figures 9–11.

The results shown in Table 4 indicate that the estimators are consistent, as the relative deviations are about 30% of the mean values (27% to 37%). The construction of the estimators for the inverse mapping using an ANN is therefore suitable in this case.
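The three statistics reported for each distance can be computed from the values obtained over the repeated trainings, as in the minimal sketch below. It assumes that the relative deviation is the standard deviation divided by the mean, expressed in percent; the function name `summarize` and the five sample values are hypothetical and are not data from the study, which uses 100 trainings per case.

```python
import statistics


def summarize(distances):
    """Return (minimum, mean, relative deviation in %) of a list of distances."""
    mean = statistics.mean(distances)
    # Assumption: relative deviation = population standard deviation / mean.
    rel_dev = 100.0 * statistics.pstdev(distances) / mean
    return min(distances), mean, rel_dev


# Hypothetical distances (%) from five trainings, for illustration only.
e_values = [2.0, 4.0, 4.0, 4.0, 6.0]
e_min, e_mean, e_rel = summarize(e_values)
print(f"min={e_min:.1f}%  mean={e_mean:.1f}%  relative deviation={e_rel:.1f}%")
```

A small relative deviation means that the quality of the estimator depends little on the random initialization of each training run.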

The second case is then performed, where the training and experimental sets (respectively, trs2 and tes) have different statistical properties.

The comparisons between the histograms of the random variables and those of their estimators are presented in Figures 12–14.

Figure 9. Histograms of the realizations of Hg0 (a, set tes) and of its estimator Ĥg0 (b). Both histograms were constructed considering the minimum value of E_Hg0. Axes: Hg0 (m, scale ×10^−4), horizontal; p_Hg0(Hg0), vertical.

Figure 10. Histograms of the realizations of Q (a, set tes) and of its estimator Q̂ (b). Axes: Q, horizontal; p_Q(Q), vertical.

Table 4. Statistics for the distance between the histograms after simulations.

          Minimum value (%)   Mean value (%)   Relative deviation (%)
E_Hg0     1.5                 4.1              27
E_Q       1.5                 3.7              37
E_!sub    2.1                 4.7              35


Figure 11. Histograms of the realizations of !sub (a) and of its estimator (b). Both histograms were constructed considering the minimum value of E_!sub. Horizontal axis: !sub (Pa).

Figure 12. Histograms of the realizations of Hg0 (a, set tes) and of its estimator Ĥg0 (b). Axes: Hg0 (m, scale ×10^−4), horizontal; p_Hg0(Hg0), vertical.

Figure 13. Histograms of the realizations of Q (a, set tes) and of its estimator Q̂ (b). Axes: Q, horizontal; p_Q(Q), vertical.

Figure 14. Histograms of the realizations of !sub (a) and of its estimator (b). Horizontal axis: !sub (Pa).


The values of the distance between the histograms (E_Hg0, E_Q and E_!sub) are presented in Table 5. The estimator of !sub shows the largest change in mean value, from 4.7% (Table 4) to 16.2%, which is still reasonable. The estimators Ĥg0 and Q̂ show almost the same error rate as when the ANN is trained with the set trs1.
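The comparison between the two cases can be made explicit with a short calculation on the mean values reported in Tables 4 and 5. The sketch below only restates those table entries and computes the ratio between them; the dictionary keys are labels chosen here for illustration, with `sub` standing for the variable carrying the subscript sub.

```python
# Mean distances (%) copied from Table 4 (training set trs1) and Table 5 (training set trs2).
mean_trs1 = {"Hg0": 4.1, "Q": 3.7, "sub": 4.7}
mean_trs2 = {"Hg0": 5.7, "Q": 4.2, "sub": 16.2}

# Ratio of the mean distance in the second case to that in the first case.
ratios = {name: mean_trs2[name] / mean_trs1[name] for name in mean_trs1}
for name, ratio in ratios.items():
    print(f"{name}: {mean_trs1[name]} % -> {mean_trs2[name]} % (x{ratio:.2f})")

# Variable whose estimator degrades most when the training statistics differ.
worst = max(ratios, key=ratios.get)
print("largest increase:", worst)
```

The ratio stays below 1.5 for the first two variables and exceeds 3 for the third, which is the degradation described in the text.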

Table 5. Optimised network performance for the estimators of Hg0, Q and !sub.

          Minimum value (%)   Mean value (%)   Relative deviation (%)
E_Hg0     3.8                 5.7              20
E_Q       1.9                 4.2              32
E_!sub    9.5                 16.2             30

6. Conclusion

A methodology to estimate the probability density functions of three random variables associated with control parameters of a non-linear model for producing voice was developed. The methodology consists in solving an inverse stochastic problem using an ANN in place of the model itself.

At first, the mean and the relative deviation values considered for the training and the test sets were the same, and the results showed good accuracy when the histograms of the random variables were compared with those of the estimators.

A second case was then performed, where the relative deviation values of the random variables differed between the training and the test sets, in order to assess the generalization capability of the ANN. For one of the random variables, !sub, the quality of the estimator decayed, but the error rate remains acceptable.

Although the system used is non-linear and stochastic, this article showed that it is possible to identify some of its parameters using an ANN. The next step is to use more parameters extracted from the voice signals in order to improve the quality of the parameter estimation.

Acknowledgements

The authors acknowledge FAPERJ (Fundação de Amparo à Pesquisa do Rio de Janeiro), CAPES (CAPES/COFECUB project 672/10) and CNPq (Brazilian Agency: Conselho Nacional de Desenvolvimento Científico e Tecnológico) for the financial support they gave to this research.

References

[1] D. Cook, E. Nauman, and L. Mongeau, Ranking vocal fold model parameters by their influence on modal frequencies, J. Acoust. Soc. Am. 126(4) (2009), p. 2002.

[2] N. Lous, G. Hofmans, R. Veldhuis, and A. Hirschberg, A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design, Acta Acust. United Acust. 84(6) (1998), pp. 1135–1150.


[3] N. Ruty, X. Pelorson, A. Van Hirtum, I. Lopez-Arteaga, and A. Hirschberg, An in vitro setup to test the relevance and the accuracy of low-order vocal folds models, J. Acoust. Soc. Am. 121 (2007), p. 479.

[4] E. Cataldo, C. Soize, R. Sampaio, and C. Desceliers, Probabilistic modeling of a nonlinear dynamical system used for producing voice, Comput. Mech. 43(2) (2008), pp. 265–275.

[5] K. Ishizaka and J. Flanagan, Synthesis of voiced sounds from a two-mass model of the vocal cords, Bell Syst. Tech. J. SI 51 (1972), pp. 1233–1268.

[6] X. Pelorson, A. Hirschberg, R. van Hassel, A. Wijnands, and Y. Aurégan, Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model, J. Acoust. Soc. Am. 96 (1994), p. 3416.

[7] G. Fant, Acoustic Theory of Speech Production with Calculations Based on X-ray Studies of Russian Articulations, Mouton, The Hague, 1960.

[8] I.R. Titze, Principles of Voice Production, Prentice-Hall, Englewood Cliffs, NJ, 1994.

[9] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 2005.

[10] J. Mauprivez, R. Sampaio, and E. Cataldo, Parameters fitting for a two-mass model for the vocal folds using neural networks, 30th Iberian-Latin-American Congress on Computational Methods in Engineering, Armação de Búzios, Brazil, 2009.

[11] P. Alku, Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering, Speech Commun. 11(2–3) (1992), pp. 109–118.

[12] M. Airas, TKK Aparat: An environment for voice inverse filtering and parameterization, Logopedics Phoniatrics Vocol. 33(1) (2008), pp. 49–64.

[13] N. Henrich, G. Sundin, D. Ambroise, C. d’Alessandro, M. Castellengo, and B. Doval, Just noticeable differences of open quotient and asymmetry coefficient in singing voice, J. Voice 17(4) (2003), pp. 481–494.

[14] D. Sciamarella and C. d’Alessandro, On the acoustic sensitivity of a symmetrical two-mass model of the vocal folds to the variation of control parameters, Acta Acust. United Acust. 90(4) (2004), pp. 746–761.

[15] K. Hornik, M. Stinchcombe, and H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Networks 3(5) (1990), pp. 551–560.
