Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.
Purpose of the Project
Recorded melody -> Software -> Song name
Presentation Overview
• Demonstration
• Internals
• Results
• Conclusions
Program Demonstration
Inside the Program
Vocal Input -> Pitch Detection / Volume Detection -> Segmentation -> Database Search -> List of Best Matches
Definition of Input
The input is sung by a human, who does not need to have any knowledge of music.
The program was optimized for singing using the syllables “da-da-da” or “ti-ti-ti”. All testing was performed on this type of input.
Input
Pitch Detection
Segmentation
Search
Pitch Detection
The super-resolution pitch detection algorithm achieves accurate detection values without increasing CPU time, by performing linear interpolation on a low sampling rate recording.
Detection is performed in a pitch-synchronous fashion (one pitch value for each cycle).
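The slide does not give the details of the super-resolution algorithm, so the following is only a minimal sketch of the two ideas it names: sub-sample accuracy through linear interpolation, and one pitch value per cycle. It swaps in a much simpler technique (interpolated upward zero crossings) that is reasonable only for clean, nearly sinusoidal input. The function names and the MIDI-style semitone scale (440 Hz mapped to 69) are assumptions, not taken from the project.

```python
import math


def cycle_pitches(samples, sample_rate):
    """Return one frequency estimate in Hz per cycle of the waveform.

    Simplified stand-in, not the super-resolution algorithm itself.
    """
    crossings = []  # times (in seconds) of upward zero crossings
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples places the
            # crossing at sub-sample resolution.
            fraction = -prev / (cur - prev)
            crossings.append((i - 1 + fraction) / sample_rate)
    # Each pair of consecutive crossings spans exactly one cycle.
    return [1.0 / (t1 - t0) for t0, t1 in zip(crossings, crossings[1:])]


def hz_to_semitones(frequency_hz):
    """Convert Hz to semitones (assumed MIDI-style scale, 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)
```

With a low sampling rate such as 8000 Hz, a cycle of a 220 Hz tone spans only about 36 samples, so without interpolation the period could only be measured to the nearest whole sample.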
Pitch/Volume Detection
[Figure: pitch (frequency, in semitones) and volume plotted against time (sec)]
Segmentation (1/3)
Sequence of Pitches and Volumes -> Volume-Based Segmentation -> Pitch-Based Segmentation -> Decision
Decision, Voice: Note Identification -> Sequence of Notes
Decision, Noise: Ignore
Segmentation (2/3)
Volume Segmentation: Possible notes are identified as a region in which the volume is higher than a trigger value.
Thus, it’s important to separate each note by a short quiet period, e.g. by pronouncing “ta-ta-ta” rather than “la-la-la”.
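Volume segmentation as described above can be sketched as a simple threshold scan. The function name, the list-of-numbers input, and the half-open `(start, end)` index pairs are assumptions made for illustration.

```python
def volume_segments(volumes, trigger):
    """Return (start, end) index pairs where volume stays above trigger.

    `end` is exclusive. Each region is a candidate note.
    """
    segments = []
    start = None  # index where the current loud region began, if any
    for i, volume in enumerate(volumes):
        if volume > trigger and start is None:
            start = i
        elif volume <= trigger and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:  # recording ended while still above the trigger
        segments.append((start, len(volumes)))
    return segments
```

If the volume never drops below the trigger between notes, as with “la-la-la”, the whole phrase comes back as a single region, which is why the quiet gap between notes matters.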
Segmentation (3/3)
Pitch Segmentation: Within each segment, find the longest region in which the pitch is relatively constant.
Noise Removal: If this region is very short, then the segment is assumed to be noise, and it is ignored.
Conversion to Notes: The frequency of the note is identified by an iterative averaging technique.
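The three steps above could look roughly like the sketch below. The slides do not define "relatively constant", "very short", or the iterative averaging technique, so the choices here are assumptions: a region is stable if all its pitches lie within `tolerance` semitones of each other, a region shorter than `min_length` pitch values is noise, and averaging repeatedly discards values far from the mean until nothing more is discarded.

```python
def longest_stable_run(pitches, tolerance=1.0):
    """Return (start, end) of the longest run whose spread is <= tolerance."""
    best = (0, 0)
    for start in range(len(pitches)):
        low = high = pitches[start]
        end = start
        while end < len(pitches):
            low = min(low, pitches[end])
            high = max(high, pitches[end])
            if high - low > tolerance:
                break
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


def iterative_average(values, tolerance=0.5, max_rounds=10):
    """Average, drop values far from the mean, and repeat until stable."""
    kept = list(values)
    for _ in range(max_rounds):
        mean = sum(kept) / len(kept)
        trimmed = [v for v in kept if abs(v - mean) <= tolerance]
        if not trimmed or len(trimmed) == len(kept):
            break
        kept = trimmed
    return sum(kept) / len(kept)


def segment_to_note(pitches, min_length=5):
    """Return the note pitch for a segment, or None if it looks like noise."""
    start, end = longest_stable_run(pitches)
    if end - start < min_length:
        return None  # stable region too short: treat the segment as noise
    return iterative_average(pitches[start:end])
```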
Segmentation Example
Database Search
Sequence of Notes
Convert to relative frequencies and durations
Find edit distance for each database entry
Sort by increasing edit cost
List of Best Matches
Edit Distance (1/3)
Purpose: Correction of errors in singing and in previous identification steps.
Mechanism: The edit distance is the minimum cost required to transform one string into another. The following changes can be applied at given costs:
• Change one character into another
• Insert one character
• Delete one character
Edit Distance (2/3)
Example: How to make an elephant become elegant:
elephant
eleghant (Replace)
elegant (Delete)
Total edit distance is the cost of replacing ‘p’ with ‘g’, plus the cost of deleting ‘h’.
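The edit distance can be computed with the standard dynamic-programming recurrence. The sketch below is a generic version with one fixed cost per operation type; the actual costs used by the program are not given in the slides, so the unit defaults are an assumption. `rank_matches` is a hypothetical helper showing how database entries could be sorted by increasing edit cost.

```python
def edit_distance(source, target, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] is the cost of turning source[:i] into target[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            change = 0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + change,  # keep or change a character
                dist[i - 1][j] + del_cost,    # delete from source
                dist[i][j - 1] + ins_cost,    # insert into source
            )
    return dist[-1][-1]


def rank_matches(query, database):
    """Return database names sorted by increasing edit cost to the query."""
    return sorted(database, key=lambda name: edit_distance(query, database[name]))


print(edit_distance("elephant", "elegant"))  # one replace plus one delete
```

The function works on any indexable sequence, so the same code can compare strings of characters or lists of note symbols.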
Edit Distance (3/3)
Algorithms differ by the content of the strings being compared. Three algorithms were checked:
• Parsons code: Only the direction of pitch change is compared (up, down, or repeat).
• Frequency similarity: The direction and size of pitch change (e.g., up 3 semitones).
• Frequency/Duration similarity: Both pitch change and relative duration of notes (e.g., up 3 semitones, and a longer note).
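To make the three string contents concrete, here is a sketch of how a note sequence might be encoded for each algorithm. It assumes pitches are already rounded to whole semitones and durations are in seconds; the U/D/R letters follow the usual Parsons code convention, while the "longer"/"shorter"/"same" duration classes and all function names are assumptions, since the slides do not say how relative duration is represented.

```python
def parsons_code(pitches):
    """Direction of each pitch change: U (up), D (down), R (repeat)."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def interval_code(pitches):
    """Direction and size of each pitch change, in semitones."""
    return [cur - prev for prev, cur in zip(pitches, pitches[1:])]


def interval_duration_code(notes):
    """Pitch change plus relative duration for (pitch, duration) notes."""
    symbols = []
    for (p0, d0), (p1, d1) in zip(notes, notes[1:]):
        relative = "longer" if d1 > d0 else "shorter" if d1 < d0 else "same"
        symbols.append((p1 - p0, relative))
    return symbols
```

All three encodings depend only on differences between neighbouring notes, so singing the same tune in a different key produces the same strings.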
Results
Simulation
Simulations of the search engine were performed in order to have a larger ensemble, from which a detection probability was calculated.
Random noise was added to the first few notes of a tune. The tune was then applied to the search engine.
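A simulation of this kind might be organised as in the sketch below. The noise model (a uniform random shift of whole semitones per note), the function names, and the `search` callable, which stands in for the search engine and returns the name of the best match, are all assumptions; the slides do not describe them.

```python
import random


def noisy_query(tune, n_notes, max_error, rng):
    """Take the first n_notes pitches and shift each by a random amount."""
    return [pitch + rng.randint(-max_error, max_error) for pitch in tune[:n_notes]]


def detection_probability(search, database, n_notes, max_error, trials, seed=0):
    """Fraction of noisy queries for which `search` names the right tune."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    names = sorted(database)
    hits = 0
    for _ in range(trials):
        name = rng.choice(names)
        query = noisy_query(database[name], n_notes, max_error, rng)
        if search(query, database) == name:
            hits += 1
    return hits / trials
```

Repeating this for query lengths of 3 to 10 notes would give one probability per length for each search algorithm.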
Comparison of Search Algorithms
[Figure: probability of correct identification (%) versus number of notes in query (3 to 10), for the Parsons, Frequency, and Frequency/Duration algorithms]
Empirical Test
Subjects listened to a sample query. Then, they chose a song from the database, and were told to sing it in a similar manner.
Number of test subjects: 14
Number of recorded songs: 64
Number of songs in database: 197
Empirical Results
Algorithm    Identified as Top Match    Identified as Top Five
Freq/Dur     80%                        86%
Frequency    77%                        88%
Parsons      52%                        73%
Human: 45%-65%
Conclusions
Combined frequency/duration search is the most robust search algorithm tested, and outperforms the Parsons code search by a wide margin.
The program performs better than an average human under the tested conditions.
Summary
A successful melody search engine has been created.
Real-time software implementation is possible.
The new frequency/duration search algorithm was found more effective than the existing Parsons code search.
The End