Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Purpose of the Project

Recorded melody -> Software -> Song name

Presentation Overview

• Demonstration
• Internals
• Results
• Conclusions

Program Demonstration

Inside the Program

Vocal Input -> Pitch Detection and Volume Detection -> Segmentation -> Database Search -> List of Best Matches

Definition of Input

The input is sung by a human, who does not need to have any knowledge of music.

The program was optimized for singing using the syllables “da-da-da” or “ti-ti-ti”. All testing was performed on this type of input.

Input

Pitch Detection

Segmentation

Search

Pitch Detection

The super-resolution pitch detection algorithm achieves accurate detection values without increasing CPU time, by performing linear interpolation on a low-sampling-rate recording.

Detection is performed in a pitch-synchronous fashion (one pitch value for each cycle).
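
A simplified sketch of how linear interpolation can give sub-sample pitch resolution, with one value per cycle, is shown below. It is not the super-resolution algorithm itself: as a stand-in it marks each cycle by its upward zero crossing, which only works for clean, nearly sinusoidal input. The function name `pitch_per_cycle` and the 8 kHz test tone are assumptions made for illustration.

```python
import math

def pitch_per_cycle(samples, sample_rate):
    """Return one pitch estimate (Hz) per cycle of a clean periodic signal."""
    crossings = []  # fractional sample positions of upward zero crossings
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples around the crossing
            frac = -prev / (cur - prev)
            crossings.append((n - 1) + frac)
    # Each pair of consecutive crossings spans exactly one cycle
    return [sample_rate / (b - a) for a, b in zip(crossings, crossings[1:])]

# Illustrative input: a 220 Hz tone sampled at 8 kHz (about 36.4 samples per cycle)
RATE = 8000
tone = [math.sin(2 * math.pi * 220 * n / RATE + 0.3) for n in range(800)]
pitches = pitch_per_cycle(tone, RATE)
```

Without interpolation, a cycle length could only be measured in whole samples (36 or 37 here), which would put the estimate several hertz off; interpolating the crossing position recovers a fractional cycle length at no extra sampling cost.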

Pitch/Volume Detection

[Figure: detected pitch (frequency in semitones, axis ticks from 40 to 100) and volume, plotted against time (5 to 10 seconds).]
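
The pitch axis is expressed in semitones. The slides do not state which reference is used; the sketch below assumes the common MIDI convention, in which A4 (440 Hz) is note 69 and one semitone corresponds to a frequency ratio of 2^(1/12). The function name and default parameters are illustrative.

```python
import math

def hz_to_semitones(freq_hz, ref_hz=440.0, ref_note=69.0):
    """Convert a frequency in Hz to a semitone number (MIDI convention assumed)."""
    return ref_note + 12.0 * math.log2(freq_hz / ref_hz)

a4 = hz_to_semitones(440.0)   # 69.0
a5 = hz_to_semitones(880.0)   # 81.0, one octave (12 semitones) higher
```

A logarithmic scale of this kind makes equal musical intervals appear as equal distances, regardless of the key in which the melody is sung.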

Segmentation (1/3)

Sequence of Pitches and Volumes -> Volume-Based Segmentation -> Pitch-Based Segmentation -> Decision
• Voice: Note Identification -> Sequence of Notes
• Noise: Ignore


Volume Segmentation: Each possible note is identified as a region in which the volume is higher than a trigger value.

Thus, it’s important to separate each note by a short quiet period, e.g. by pronouncing “ta-ta-ta” rather than “la-la-la”.
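
Volume-based segmentation with a trigger value can be sketched as follows. The function name, the sample volume values, and the trigger of 3 are hypothetical; the slides do not give the actual threshold or volume units.

```python
def volume_segments(volumes, trigger):
    """Return (start, end) index pairs, end exclusive, where volume > trigger."""
    segments = []
    start = None
    for i, v in enumerate(volumes):
        if v > trigger and start is None:
            start = i                      # volume rose above the trigger
        elif v <= trigger and start is not None:
            segments.append((start, i))    # volume fell back: close the segment
            start = None
    if start is not None:
        segments.append((start, len(volumes)))
    return segments

# "ta-ta": two loud regions separated by a quiet gap
separated = volume_segments([0, 1, 6, 7, 6, 1, 0, 5, 8, 7, 1, 0], trigger=3)
# separated == [(2, 5), (7, 10)]

# "la-la": the volume never drops, so the two notes merge into one segment
merged = volume_segments([5, 6, 5, 6, 5], trigger=3)
# merged == [(0, 5)]
```

The second call shows why the quiet period matters: without it, a volume trigger alone cannot tell where one note ends and the next begins.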

Pitch Segmentation: Within each segment, find the longest region in which the pitch is relatively constant.

Noise Removal: If this region is very short, then the segment is assumed to be noise, and it is ignored.

Conversion to Notes: The frequency of the note is identified by an iterative averaging technique.
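
The three steps above might be combined as in the sketch below. The slides do not specify what "relatively constant", "very short", or the iterative averaging technique mean numerically, so the tolerance of 1 semitone, the minimum length of 4 pitch values, and the outlier-trimming form of averaging are all assumptions, as are the function names.

```python
def longest_steady_region(pitches, tolerance=1.0):
    """Longest (start, end) run, end exclusive, whose pitch spread stays within tolerance."""
    best = (0, 0)
    for start in range(len(pitches)):
        lo = hi = pitches[start]
        end = start
        while end < len(pitches):
            lo, hi = min(lo, pitches[end]), max(hi, pitches[end])
            if hi - lo > tolerance:
                break
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best

def iterative_average(values, keep_within=0.5, max_rounds=10):
    """Average, drop values far from the mean, and repeat until nothing is dropped."""
    kept = list(values)
    for _ in range(max_rounds):
        mean = sum(kept) / len(kept)
        trimmed = [v for v in kept if abs(v - mean) <= keep_within]
        if not trimmed or len(trimmed) == len(kept):
            break
        kept = trimmed
    return sum(kept) / len(kept)

def segment_to_note(pitches, min_length=4):
    """Return the note pitch for one volume segment, or None if it looks like noise."""
    start, end = longest_steady_region(pitches)
    if end - start < min_length:
        return None  # steady region too short: treat the segment as noise
    return iterative_average(pitches[start:end])

# Pitch values in semitones: an unstable onset, a steady note near 60, a stray value
note = segment_to_note([55.0, 58.0, 60.1, 60.0, 59.9, 60.2, 60.0, 63.0])
noise = segment_to_note([50.0, 57.0, 64.0])  # no steady region: None
```

Here `note` comes out close to 60.04 semitones, because only the five steady values contribute to the average.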

Segmentation Example

Database Search

Sequence of Notes

Convert to relative frequencies and durations

Find edit distance for each database entry

Sort by increasing edit cost

List of Best Matches
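
A minimal sketch of this search pipeline follows. It assumes each note is a (pitch in semitones, duration in seconds) pair, that relative frequency means the rounded semitone difference between consecutive notes, and that relative duration means the ratio of consecutive durations. The `mismatches` cost is only a placeholder for the edit distance, and the song names are hypothetical.

```python
def to_relative(notes):
    """Convert (pitch, duration) notes to (pitch change, duration ratio) steps."""
    return [
        (round(p2 - p1), d2 / d1)
        for (p1, d1), (p2, d2) in zip(notes, notes[1:])
    ]

def best_matches(query, database, cost, top=5):
    """Return up to `top` song names, sorted by increasing cost."""
    q = to_relative(query)
    scored = [(cost(q, to_relative(notes)), name) for name, notes in database.items()]
    return [name for _, name in sorted(scored)[:top]]

def mismatches(a, b):
    """Placeholder cost: differing positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

database = {
    "Song A": [(60, 0.5), (62, 0.5), (64, 1.0)],
    "Song B": [(60, 0.5), (59, 0.5), (57, 1.0)],
}
# Song A sung five semitones higher and twice as fast
query = [(65, 0.25), (67, 0.25), (69, 0.5)]
ranking = best_matches(query, database, mismatches)
# ranking == ["Song A", "Song B"]
```

Working with relative values means the singer does not have to match the key or tempo stored in the database: the query above converts to exactly the same sequence as Song A.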

Edit Distance (1/3)

Purpose: Correction of errors in singing and in previous identification steps.

Mechanism: The edit distance is the minimum cost required to transform one string into another. The following changes can be applied at given costs:
• Change one character into another
• Insert one character
• Delete one character

Example: How to make an elephant become elegant:

elephant
eleghant (Replace)
elegant (Delete)

Total edit distance is the cost of replacing ‘p’ with ‘g’, plus the cost of deleting ‘h’.
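
The edit distance can be computed with the standard dynamic-programming recurrence, sketched below. The slides do not list the costs used by the program, so the default cost of 1 per operation is an assumption, and the function name is illustrative.

```python
def edit_distance(source, target, replace_cost=1.0, insert_cost=1.0, delete_cost=1.0):
    """Minimum total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning the first i items of source into the first j of target
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            change = 0.0 if source[i - 1] == target[j - 1] else replace_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + change,   # keep or replace
                dist[i - 1][j] + delete_cost,  # delete from source
                dist[i][j - 1] + insert_cost,  # insert into source
            )
    return dist[-1][-1]

cost = edit_distance("elephant", "elegant")  # 2.0: one replacement plus one deletion
```

The function accepts any sequences, not only strings, so the same code applies to sequences of pitch changes.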


Algorithms differ by the content of the strings being compared. Three algorithms were checked:
• Parsons code: Only the direction of pitch change is compared (up, down, or repeat).
• Frequency similarity: The direction and size of pitch change (e.g., up 3 semitones).
• Frequency/Duration similarity: Both pitch change and relative duration of notes (e.g., up 3 semitones, and a longer note).
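
The three string contents can be illustrated on a short melody, as sketched below. Notes are assumed to be (pitch in semitones, duration in seconds) pairs; the letters U/D/R for the Parsons code follow common usage, and reducing relative duration to "longer", "shorter", or "same" is an assumption, since the slides do not say how durations are quantized.

```python
def parsons_code(notes):
    """Direction of each pitch change: U (up), D (down), or R (repeat)."""
    code = []
    for (p1, _), (p2, _) in zip(notes, notes[1:]):
        code.append("U" if p2 > p1 else "D" if p2 < p1 else "R")
    return "".join(code)

def frequency_code(notes):
    """Direction and size of each pitch change, in semitones."""
    return [p2 - p1 for (p1, _), (p2, _) in zip(notes, notes[1:])]

def frequency_duration_code(notes):
    """Pitch change plus whether the next note is longer, shorter, or the same."""
    code = []
    for (p1, d1), (p2, d2) in zip(notes, notes[1:]):
        length = "longer" if d2 > d1 else "shorter" if d2 < d1 else "same"
        code.append((p2 - p1, length))
    return code

melody = [(60, 0.5), (63, 1.0), (63, 0.5), (58, 0.5)]
parsons = parsons_code(melody)               # "URD"
frequency = frequency_code(melody)           # [3, 0, -5]
freq_dur = frequency_duration_code(melody)   # [(3, "longer"), (0, "shorter"), (-5, "same")]
```

Each encoding keeps strictly more information than the previous one, so fewer database entries share the same string for a query of a given length.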

Results

Simulation

Simulations of the search engine were performed in order to have a larger ensemble, from which a detection probability was calculated.

Random noise was added to the first few notes of a tune. The tune was then submitted to the search engine.
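
A simulation of this kind might look like the sketch below. Everything specific in it is an assumption: the database is made of random tunes, noise shifts a note by one semitone with a given probability, the query is compared with the equally long opening of each tune using a unit-cost edit distance on pitch changes, and a trial counts as correct only when the original tune is the unique best match.

```python
import random

def intervals(tune):
    """Pitch changes between consecutive notes, in semitones."""
    return [b - a for a, b in zip(tune, tune[1:])]

def edit_distance(a, b):
    """Unit-cost edit distance, keeping only one row of the table at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]

def make_database(rng, songs, length, query_len):
    """Random tunes whose first query_len notes form pairwise distinct melodies."""
    database, seen = [], set()
    while len(database) < songs:
        tune = [60]
        for _ in range(length - 1):
            tune.append(tune[-1] + rng.randint(-5, 5))
        key = tuple(intervals(tune[:query_len]))
        if key not in seen:
            seen.add(key)
            database.append(tune)
    return database

def detection_probability(database, query_len, noise_prob, trials, rng):
    """Fraction of noisy queries for which the original tune is the unique best match."""
    hits = 0
    for _ in range(trials):
        target = rng.randrange(len(database))
        query = [
            p + rng.choice([-1, 1]) if rng.random() < noise_prob else p
            for p in database[target][:query_len]
        ]
        q = intervals(query)
        costs = [edit_distance(q, intervals(t[:query_len])) for t in database]
        if costs[target] == min(costs) and costs.count(min(costs)) == 1:
            hits += 1
    return hits / trials

rng = random.Random(0)
db = make_database(rng, songs=50, length=12, query_len=6)
clean = detection_probability(db, 6, noise_prob=0.0, trials=100, rng=rng)  # 1.0 by construction
noisy = detection_probability(db, 6, noise_prob=0.3, trials=100, rng=rng)
```

Repeating the noisy run for several query lengths gives a curve of detection probability against the number of notes in the query.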

Comparison of Search Algorithms

[Figure: probability of correct identification (%, 0 to 100) versus number of notes in query (3 to 10), for the Parsons, Frequency, and Frequency/Duration algorithms.]

Empirical Test

Subjects listened to a sample query. Then, they chose a song from the database, and were told to sing it in a similar manner.

• Number of test subjects: 14
• Number of recorded songs: 64
• Number of songs in database: 197

Empirical Results

Algorithm    Identified as Top Match    Identified as Top Five
Freq/Dur     80%                        86%
Frequency    77%                        88%
Parsons      52%                        73%

H: 45%-65%

Conclusions

Combined frequency/duration search is the most robust search algorithm tested, and outperforms the Parsons code search by a wide margin.

The program performs better than an average human under the tested conditions.

Summary

A successful melody search engine has been created.

Real-time software implementation is possible.

The new frequency/duration search algorithm was found to be more effective than the existing Parsons code search.

The End
