Jacek Wołkowicz - Dalhousie University


INSTITUTE OF COMPUTER SCIENCE
Grade: .................................
Specialization: Information Systems Engineering
Date studies began: 1 October 2002
Curriculum Vitae
My name is Jacek Wołkowicz. I was born on 8 November 1983 in Warsaw. I began my education at Primary School No. 106 named after Ryszard Suski in Warsaw. I then passed the entrance examination to the Stanisław Staszic XIV High School (XIV LO) in Warsaw. There I spent four years in a mathematics and physics class with an extended computer science curriculum. In high school I took part in the International Young Physicists' Tournaments, where I received second prize (Helsinki 2001) and first prize (Odessa 2002). I achieved a very good result in the entrance examination to the Warsaw University of Technology (PW) and began studying computer science at the Faculty of Electronics and Information Technology. In the 2005/2006 academic year I took part in an international exchange between PW and Dalhousie University in Canada. Music has always accompanied my education. Alongside regular schooling, I attended the Witold Lutosławski First-Level Music School and the Józef Elsner Second-Level Music School in Warsaw, both in the piano class. During my studies I also attended extracurricular classes on acoustics.
Diploma examination
Result: ................................................................................................................................
Abstract
The methods of Natural Language Processing can be successfully applied to symbolic musical content, since music can be treated as a natural language rather than an artificial one. If some natural language processing methods work on music, then well-known methods such as clustering, plagiarism detection and information retrieval can also be applied to musical content. A method is introduced for converting complex musical structure into features that correspond to words in text, and a mutual correspondence between the two representations is shown. Authorship recognition for text has already been performed successfully using statistical analysis of n-grams. One can therefore expect the same method to work for composer attribution. The aim of the work is to create such a tool. The method achieves high effectiveness, reaching almost 90% accuracy on the collected data when its parameters are suitably chosen.
Key words: Statistical Computing; Content Analysis and Indexing – Linguistic processing; Sound and Music Computing - Methodologies and Techniques; Natural Language Processing
Summary
Natural Language Processing techniques can be successfully applied to musical data, provided that music is treated as a natural language. Showing that these techniques can be applied to music leads to solutions to the problems of clustering, plagiarism detection and information retrieval for musical data. Effective solutions to the twin problem for text have already been proposed, based on statistical analysis of n-grams. One can therefore expect this method to work for composer recognition as well. A method is proposed for converting complex musical notation into features that correspond to words in text. A one-to-one correspondence between the two forms of knowledge representation is shown. The proposed method achieves high effectiveness.
Keywords: music processing, natural language processing, artificial intelligence, music information retrieval.
EXTENDED SUMMARY:
Music processing is becoming an important issue. Ever larger amounts of musical information are being collected, both in private collections and in libraries that anyone can access on the WWW. For this reason, tools for efficiently searching and automatically identifying musical pieces are increasingly needed. This work observes that musical information shares many features with text. It points out that known methods of natural language processing and information extraction can be applied to music analysis. It also introduces basic facts from psychoacoustics and presents different approaches to storing digital musical information: direct digital recording of the acoustic signal, perceptual coding, the MIDI protocol, and computer music notation formats.
The second chapter introduces the MIDI standard, because MIDI files were the material used for the detailed implementation and evaluation of the tested algorithms and regularities. A MIDI file is a digital record of a MIDI protocol session, that is, a set of MIDI messages. It consists of chunks containing events, which are either MIDI messages with timing information or additional control information. For extracting musical information, the relevant parts are the MThd chunk (representing the file) and the MTrk chunks (representing tracks), together with the Note-On, Note-Off and Tempo events. From these, all information about the timing and pitch of the played notes can be derived. Based on the structure of MIDI files, a parser was implemented that extracts the note sequences of an entire piece, and these sequences serve as the basis for further processing. The concept of unigrams as the smallest units of musical information in this representation is presented. The notion of n-grams is also introduced as the basic feature corresponding to words in text, together with a shorthand notation for their simplest forms. This idea is compared with existing proposals, mainly from the field of MIR (music information retrieval).
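The MThd header, the variable-length delta-times and the Tempo event mentioned above can be illustrated with a minimal Python sketch. The thesis's own parser was written in Perl and is not reproduced here. The sketch assumes the Standard MIDI File layout:
- a chunk starts with a 4-byte type and a 4-byte big-endian length;
- in MThd, these are followed by the format, the number of tracks and the time division, each a 16-bit big-endian field;
- delta-times are variable-length quantities with 7 data bits per byte;
- the Tempo meta event (FF 51 03) carries microseconds per quarter note.

All function names are hypothetical.

```python
import struct

def read_vlq(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a MIDI variable-length quantity starting at data[pos].

    Each byte contributes its low 7 bits; a set high bit means another byte
    follows. Returns (value, position just after the quantity).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            return value, pos

def parse_mthd(data: bytes) -> dict:
    """Read the header chunk (MThd) at the start of a Standard MIDI File."""
    chunk_type, length = struct.unpack(">4sI", data[:8])
    if chunk_type != b"MThd":
        raise ValueError("not a Standard MIDI File")
    fmt, ntracks, division = struct.unpack(">HHH", data[8:14])
    return {"length": length, "format": fmt, "tracks": ntracks, "division": division}

def tempo_bpm(meta_data: bytes) -> float:
    """Convert the 3-byte payload of a Tempo meta event to quarter notes per minute."""
    microseconds_per_quarter = int.from_bytes(meta_data, "big")
    return 60_000_000 / microseconds_per_quarter

# A format-1 header with 2 tracks and a division of 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(parse_mthd(header))                  # {'length': 6, 'format': 1, 'tracks': 2, 'division': 480}
print(read_vlq(bytes([0x81, 0x48]), 0))    # (200, 2)
print(tempo_bpm(bytes([0x07, 0xA1, 0x20])))  # 120.0
```

Adding the delta-times decoded this way gives each Note-On and Note-Off event an absolute tick position. The Tempo events convert those ticks into real time.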
Various examples of musical corpus analysis are presented. The experiments used piano pieces by five classical composers, collected from many different public websites. It is shown that music in the adopted representation obeys Zipf's law. The concept of information entropy is used to locate keywords in musical documents. A singular value decomposition of the matrix of collected documents was also carried out, but in this case the results did not prove helpful for the recognition problem.
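Two of the corpus measurements above can be made concrete with a short sketch.

Under Zipf's law, the frequency of the r-th most frequent feature is roughly proportional to 1/r. A plot of log frequency against log rank is therefore close to a line with slope about -1. `zipf_slope` estimates that slope by least squares.

`spread_entropy` uses one common entropy formulation for keyword detection: the entropy of a term's occurrences across documents. Terms concentrated in few documents have low entropy. This formulation is an assumption; the thesis's exact entropy procedure is not given here. The toy inputs are invented.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) versus log(rank); about -1 under Zipf's law."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def spread_entropy(term, documents):
    """Entropy (in bits) of a term's occurrences across documents.

    0 means the term occurs in a single document; higher values mean it is
    spread more evenly, so low-entropy terms are keyword candidates.
    """
    counts = [doc.count(term) for doc in documents]
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Toy data with exact 1/rank frequencies: 60, 30, 20, 15, 12, 10.
tokens = [f"g{r}" for r in range(1, 7) for _ in range(60 // r)]
print(round(zipf_slope(tokens), 6))  # -1.0
docs = [["a", "b", "a"], ["b", "c"]]
print(spread_entropy("a", docs), spread_entropy("b", docs))  # 0.0 1.0
```

On a real corpus, the tokens would be the n-gram strings of all pieces. The slope would then only approximate -1.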
Next, the concept of a composer recognition algorithm is presented. The method builds a profile of each composer and then compares these profiles with the profile of the document being classified. The final result is an overall assessment of three kinds of similarity: rhythm, melody, and the way rhythm is combined with melody. No fixed parameter values were assumed. Instead, the following were left as variables:
- the n-gram length;
- the aging factor used in training the classifier;
- normalization;
- the profile size limit;
- the profile comparison method.
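The profile-based approach can be sketched along the lines of the common n-gram (CNG) method of Keselj, Peng, Cercone and Thomas, on which the system is based.

In this method, a profile keeps the most frequent n-grams with their normalized frequencies, optionally truncated to a size limit. Two profiles are compared by summing ((f1 - f2) / ((f1 + f2) / 2))^2 over every n-gram that appears in either profile.

Several details in the sketch are assumptions, and the function names are hypothetical:
- **Aging:** the accumulated profile is scaled by the aging factor before each new training piece is added. This is a hypothetical reading of the parameter, not the thesis's exact rule.
- **Separate scores:** the sketch does not compute separate rhythm and melody scores.
- **Toy data:** the n-gram lists are invented.

```python
from collections import Counter

def build_profile(ngrams, limit=None):
    """Normalized frequencies of the `limit` most frequent n-grams (all if None)."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(limit)}

def train_composer(pieces, aging=1.0, limit=None):
    """Accumulate one composer profile from several pieces (lists of n-grams).

    Hypothetical aging: before each new piece is added, the frequencies
    accumulated so far are multiplied by `aging` (1.0 means no aging).
    """
    acc = Counter()
    for piece in pieces:
        for g in acc:
            acc[g] *= aging
        acc.update(build_profile(piece))  # Counter.update adds the values
    total = sum(acc.values())
    return {g: v / total for g, v in acc.most_common(limit) if v > 0}

def dissimilarity(p1, p2):
    """CNG dissimilarity: sum of squared relative differences over the union of n-grams."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total

def classify(doc_ngrams, composer_profiles, limit=None):
    """Return the composer whose profile is least dissimilar to the document's."""
    doc = build_profile(doc_ngrams, limit)
    return min(composer_profiles,
               key=lambda name: dissimilarity(doc, composer_profiles[name]))

profiles = {
    "A": train_composer([["x", "x", "y"], ["x", "y", "y"]]),
    "B": train_composer([["z", "z", "w"]]),
}
print(classify(["x", "y", "x"], profiles))  # A
```

An n-gram missing from one profile contributes the maximum term of 4. Profiles with little overlap are therefore penalized strongly.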
The system was implemented in Perl using the Tk graphical library. A file format for storing the created composer profiles was proposed. The system's structure clearly separates four parts:
- the application logic layer (the Engine subsystem);
- the presentation layer (the UI subsystem);
- event handling (the UI::cmd.pm module);
- a library of auxiliary tools independent of the application's main task (the Utils subsystem).
Even when the program's decision is uncertain, deeper analysis can find the reason in the body of work of the composer in question. A detailed analysis of the algorithm shows that, with appropriately chosen parameters, the system's accuracy on the collected data reaches almost 90%. The optimal n-gram length was found to be 6 or 7.
Acknowledgements
I would like to thank my supervisor, prof. dr hab. inż. Zbigniew Kulka, for his insight and helpful comments on acoustic matters.
I would like to thank my Canadian advisor, Dr. Vlado Keselj, for his help in developing the idea of how to approach composer recognition.
This work would not have taken its present shape without the incalculable help of the best of friends: my beloved sister Ola Kontkiewicz, and my best mate Kuba Gawryjołek.
Contents
1.1 Aim of the work .................................................................................... 14
1.2 Music as a natural language – basic information about NLP................ 15
1.3 Psychoacoustic foundations .................................................................. 17
1.4.1 Waveform...................................................................................... 18
1.4.3 MIDI.............................................................................................. 21
2 MUSIC REPRESENTATION IN ALGORITHMS .................................................. 25
2.1 MIDI on computers ............................................................................... 25
2.2 MIDI parsing ......................................................................................... 26
2.4.2 Existing approaches....................................................................... 36
3.1 Building a musical data corpus ............................................................. 38
3.2 N-gram features..................................................................................... 39
3.4 Entropy analysis .................................................................................... 42
4.1 Related work ......................................................................................... 50
4.2.2 Building profiles............................................................................ 53
5.1 Functionality.......................................................................................... 59
5.2 Project.................................................................................................... 60
5.3 Implementation...................................................................................... 62
5.3.6 Testing plug-in .............................................................................. 69
6.1 Results interpretation............................................................................. 70
6.2.5 Aging factor................................................................................... 75
I. Western music system........................................................................... 80
List of Figures
Figure 1.1 Spectral Analysis of c-moll prelude BWV 846, J.S. Bach................. 20
Figure 1.2 Spectral Analysis of Etude c-moll, op. 25 no 12, F. Chopin.............. 20
Figure 1.3 NLP and Music processing domains.................................................. 24
Figure 2.1 Sample MThd chunk header .............................................................. 27
Figure 2.2 Sample MTrk chunk header ............................................................... 27
Figure 2.3 Sample Tempo event.......................................................................... 28
Figure 2.5 Solving the problem of parallelism.................................................... 30
Figure 2.6 Unigrams extraction ........................................................................... 32
Figure 2.7 Gliding window.................................................................................. 33
Figure 2.9 Two sample melodies......................................................................... 34
Figure 3.2 Zipf’s law for text............................................................................... 41
Figure 3.4 Entropy dist ........................................................................................ 45
Figure 3.5 Eigenvalues for the corpus ................................................................. 47
Figure 3.6 SVD dimensions: 1, 2, and 3.............................................................. 48
Figure 3.7 SVD dimensions: 1, 4, and 6.............................................................. 48
Figure 3.8 SVD dimensions: 4, 7, and 8.............................................................. 49
Figure 4.1 Building profiles ................................................................................ 53
Figure 4.2 Measure components for different n-grams values............................ 55
Figure 4.3 Aging example ................................................................................... 57
Figure 5.1 System scheme................................................................................... 60
Figure 5.4 Adding composer window ................................................................. 65
Figure 5.5 Adding composer window ................................................................. 66
Figure 5.6 Application main window.................................................................. 67
Figure 5.7 Application: recognizing window...................................................... 68
Figure A.1 Clefs ................................................................................................... 81
Figure A.3 Staff layout ......................................................................................... 83
Figure A.4 Plethora of music notation potpourri.................................................. 83
List of Tables
Table 2.1 Variable Length Quantities ................................................................ 28
Table 2.2 Pitch and rhythm quantization ........................................................... 37
Table 3.1 Composer Corpus............................................................................... 38
Table 4.1 Training and testing set split .............................................................. 52
Table 6.1 Evaluation of the Frederic Chopin prelude Op. 28 No. 22 ................ 71
Table 6.2 Evaluation of the Ludwig van Beethoven Sonata Op. 49 No. 2 ........ 71
Table 6.3 Evaluation of the Franz Liszt Concert Etude No. 3 ‘Un sospiro’ ...... 72
Table 6.4 Unknown Composers assignments .................................................... 72
Table 6.5 Algorithm results of aging 0.96 ......................................................... 73
Table 6.6 Algorithm results of aging 0.96 with profiles normalization............. 74
Table 6.7 Maximal accuracies for different aging factors ................................. 75
Table 6.8 Results for representative data ........................................................... 76
Table A.1 Pitches ................................................................................................ 81
1.1 Aim of the work
People store large amounts of music on their computers nowadays. They listen to it almost all the time in the background, and some even treat their computers as mini sound studios that give them great aural relaxation. These facts make the processing of musical data more and more important. Musical data are still treated as unstructured binary data, left on the same shelf as images, movies and programs. Textual data, by contrast, are easy to process, search and index. They are supported by a huge range of computer-aided techniques provided by NLP (Natural Language Processing), IR (Information Retrieval) and TDM (Text Data Mining), such as classification, analysis, generation, summarization and searching.
The aim of the work is to show that music can be treated as a natural language. To this end, an automatic composer recognition system has been developed. The system is based on the solution to the same problem for text proposed by Keselj, Peng, Cercone and Thomas [29]. The program was implemented in Perl, a language designed for text processing. To evaluate the accuracy of the algorithm, a corpus of MIDI files containing piano pieces by various classical composers has been built. Since an NLP algorithm was to be applied, the work shows how to obtain musical equivalents of characters and words [14], so that the comparison algorithm can be applied unchanged. The system works, and it also shows how much remains to be done in this area. The open work ranges from sound processing (for managing personal music libraries) through MIR (music IR) to advanced semantic analysis of music (musical NLP). Music recognition tools can be very important today, because there are many web music repositories and, until now, all of them have had to be indexed manually. Automatic, content-based tools make it possible to build more sophisticated systems, for instance, one similar to Google for text [32].
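To illustrate how musical "characters" and "words" might be obtained, here is a minimal sketch for a single monophonic line given as (MIDI pitch, duration) pairs. The encoding is an assumption made for illustration: each unigram is the pitch interval plus the rounded base-2 logarithm of the duration ratio of consecutive notes. This keeps tokens unchanged under transposition and tempo change. The thesis's own quantization and its handling of parallel voices are not reproduced here, and the function names are hypothetical. N-grams are then taken with a gliding window.

```python
import math

def unigrams(notes):
    """Map consecutive (midi_pitch, duration) pairs to unigram tokens.

    Assumed encoding: (pitch interval in semitones, rounded log2 of the
    duration ratio), so transposing a melody or changing its tempo
    leaves the tokens unchanged.
    """
    return [
        (p2 - p1, round(math.log2(d2 / d1)))
        for (p1, d1), (p2, d2) in zip(notes, notes[1:])
    ]

def ngrams(tokens, n):
    """Gliding window: every run of n consecutive unigrams becomes one n-gram."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# C4 E4 G4 E4 with quarter, quarter, half, quarter durations (in beats).
melody = [(60, 1.0), (64, 1.0), (67, 2.0), (64, 1.0)]
words = unigrams(melody)
print(words)             # [(4, 0), (3, 1), (-3, -1)]
print(ngrams(words, 2))  # [((4, 0), (3, 1)), ((3, 1), (-3, -1))]
```

Because the tokens are relative, the same melody transposed up a tone or played twice as slowly yields identical n-grams. This is what lets a text comparison algorithm treat them as words.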
One might think that music is in the same situation as the other fine arts, such as painting, dance or sculpture, and ask why those arts should not be treated as natural languages too. In my opinion, this is not possible. Music and writing differ greatly from the other fine arts mentioned above. Music, like writing, uses a symbolic notation that makes it easy to exchange works and preserve them for future generations, which none of the other fine arts do.
1.2 Music as a natural language – basic information about NLP
In order to treat music as a natural language, one has to show that music processing deals with the same classes of problems as NLP does. Text processing is usually divided into levels, listed in Table 1.1. NLP, like music processing, tries to pass through all levels, from recording (a voice, speech) to understanding (the meaning of a discourse). Of course, no tool does everything at once, i.e., extracts meaning and knowledge directly from a raw waveform. In practice, NLP tools concentrate on a single level and try to lift the problem to the level above.
Table 1.1
Level         Text processing               Music processing
phonetics     Recorded voice                Recording
syntax        Word order                    N-grams, note order
semantics     Word meaning, POS             Harmonic functions
pragmatics    The meaning of a sentence     Phrase structure
discourse     Context of a text             Interpretation of a piece
Music, like natural language, can be recorded and presented primarily as a waveform. At the "phonetics" level, one investigates the structure of a sound and tries to separate and distinguish notes and instruments. This task, combined with note recognition, is a well-known problem for contemporary sound engineers, even if they do not know that they are working on NLP tasks. At this level, the corresponding NLP task (speech recognition) is a major one, and many different approaches to it have proved successful. Music, however, is much more complex, and sound recognition for musical pieces is still in its infancy. A simple explanation of this fact, with an example, is given later in this chapter.
The second very important similarity is that music, like text, has a symbolic representation. The first writing system, cuneiform, was invented by the Sumerians in ancient Mesopotamia about 3200 BC [51]. The origins of music writing go back to the 8th century in the Carolingian Empire, when the neumatic system, the first notation dedicated solely to music, was invented. The first inscription that may be treated as basic music notation dates back to 2000 BC [58]. These two developments are far apart in time, but music and writing are the only human activities that have a symbolic representation. This fact encourages thinking of musical content analysis as the next step, so-called MLP (Music Language Processing). Like text, music can then be analyzed at the syntactic and semantic levels.

A music score also consists of characters, which are called notes. Similarly to NLP's morphology and syntax, music has a hidden, grammar-like structure of rules, which in this case is called harmony. Harmony determines how to put words (notes) together and how to build well-formed phrases from them. It also governs the musical meaning of a piece, which lies in the order of chords. At the level of notes we may speak of the syntax of music. At the level of chords and harmonic functions we may speak of the semantics of a given chord or the pragmatics of a phrase. This closely resembles one of the main problems of NLP today, namely grammar analysis. A method of detecting probabilistic harmony was introduced by Bod [6]. He bases his investigation on the Essen Folksong Collection, a collection of folk themes that plays the role of the Penn Treebank for music.
The second main approach in NLP, statistical NLP, can also be applied to musical data. MIR (Music Information Retrieval), now a very active research domain, is an example of this approach. A further problem with music is that it has no word boundaries and its phrasing is driven by harmony. To retrieve the musical meaning successfully, one therefore has to work out both the structure of a piece and its harmonic representation. However, there are methods for partitioning pieces into smaller themes [55], [57]. A similar problem occurs in some natural languages that do not use whitespace between words, such as Thai, a language of almost 65 million people.
The highest levels of NLP (pragmatics and discourse) also have counterparts in music. Pieces can be positive (major) or negative (minor). They may express human ideas, desires or aspirations (romantic music), or depict real situations and actions (program music). Paul Dukas's The Sorcerer's Apprentice is a very good example of program music. It was animated in Walt Disney's 1940 film Fantasia, in which Mickey Mouse plays the apprentice.
1.3 Psychoacoustic foundations
When treating music as a natural language, especially at the low levels of analysis (phonetics, phonology), one has to present some basic facts from the psychoacoustics of hearing and human aural perception.
Sounds are pressure disturbances that propagate from a source through a medium (air) as a longitudinal wave. They are picked up by the eardrum and transmitted through the middle ear into the cochlea. There, the mechanical energy of the sound is converted into neural signals, which are carried to the brain to create a sound scene (sound picture).
The way we perceive sound results from the physical properties of air and of the human aural system. The shape and dimensions of the ear make us most sensitive to frequencies from 1 kHz to 4 kHz. The length of the cochlea in the inner ear limits human perception to sounds in the range of 20 Hz to 20 kHz. People used these facts implicitly and unwittingly when building and inventing musical instruments and when developing contemporary musical systems. Nowadays, they are also used more consciously in sound processing and music storage. Other facts from psychoacoustics will be discussed throughout this thesis; all of them, however, follow from these foundations of human sound perception.
1.4 Music storage approaches
Music can be represented digitally in various ways. However, there are two main types of storage approaches:
1. Raw (waveform): the sound recorded by microphones, representing nothing but the motion of the loudspeaker's (or microphone's) membrane. The data are a kind of snapshot of a real recording. It generally does not matter whether the data are compressed (mp3, ogg and other well-known formats) or stored uncompressed (PCM, WAV or AIFF format).
2. Symbolic representation: score notation formats (mus, sib, abc, xml) and the MIDI protocol, which store information about musical events rather than about the actual sound.
People have grown used to raw representations because they like to hear "real" artists' music, not a symbolic version that sounds different on every machine. Another reason is that not everyone can read music the way they read text. Musical education and studying scores are far less common pastimes than they used to be. Offline listening became easy with the rapid spread of vinyl long-play records, followed by compact cassettes (audio tapes) and finally, in 1982, audio compact discs (CDs). These technologies made music available to everyone, but they also meant that fewer people needed to be actively involved in making music. Progress in compression methods and the rapid development of personal computers and the Internet now allow music to be shared over the web. People are surrounded by music while knowing nothing about it or its content. These formats are briefly described in the following sections.
1.4.1 Waveform
Waveform is an audio format in which music is stored as a digital audio signal. An analog sound signal is a variation of acoustic pressure, usually represented by a continuous-time voltage signal at the output of a microphone. This signal is low-pass filtered, sampled, quantized and binary coded. The result, a digital PCM (Pulse Code Modulation) signal, is then stored in a file. Many combinations of sampling rate and quantization depth are possible, but only one became more popular than the others. The human ear is sensitive to sounds up to 20 kHz. According to the Shannon-Kotelnikov (Nyquist-Shannon) sampling theorem, a signal whose highest frequency component is 20 kHz must be sampled at a rate greater than 40 kHz. Then no information is lost and the analog signal can be reconstructed. To leave a safety margin, the standard sampling rate was set at 44.1 kHz.…
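The sampling and quantization steps described above can be sketched in Python. The sketch assumes linear PCM and the usual audio-CD parameters of 16-bit samples and two channels; the text above states only the 44.1 kHz rate. The function names are hypothetical. The Nyquist check only rejects tones at or above half the sampling rate; it does not model the analog low-pass filter.

```python
import math

def pcm_encode(freq_hz, sample_rate=44_100, bits=16, duration_s=0.001):
    """Sample a full-scale sine tone and quantize it to signed integers (linear PCM).

    The Nyquist check stands in for the low-pass filter: it only rejects
    tones that could not be reconstructed at this sampling rate.
    """
    if sample_rate <= 2 * freq_hz:
        raise ValueError("sampling rate must exceed twice the highest frequency")
    max_code = 2 ** (bits - 1) - 1          # 32767 for 16-bit samples
    n_samples = int(sample_rate * duration_s)
    return [
        round(max_code * math.sin(2 * math.pi * freq_hz * k / sample_rate))
        for k in range(n_samples)
    ]

def pcm_bitrate(sample_rate=44_100, bits=16, channels=2):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate * bits * channels

samples = pcm_encode(1000)      # 1 ms of a 1 kHz tone
print(len(samples), samples[0]) # 44 0
print(pcm_bitrate())            # 1411200
```

A 20 kHz tone passes the check at 44.1 kHz, since 40 kHz is below the sampling rate, while a 25 kHz tone is rejected. The bitrate of roughly 1.4 Mbit/s shows why compressed formats such as mp3 became popular for sharing music.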