Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Transcript
Page 1: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Music Database Query by Audio Input

Zvika Ben-Haim

Advisor: Gal Ashour

Page 2: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Purpose of the Project

Recorded melody → Software → Song name

Page 3: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Presentation Overview

• Demonstration
• Internals
• Results
• Conclusions

Page 4: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Program Demonstration

Page 5: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Inside the Program

Vocal Input → Pitch Detection and Volume Detection → Segmentation → Database Search → List of Best Matches

Page 6: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Definition of Input

The input is sung by a human, who does not need to have any knowledge of music.

The program was optimized for singing using the syllables “da-da-da” or “ti-ti-ti”. All testing was performed on this type of input.

Input

Pitch Detection

Segmentation

Search

Page 7: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Pitch Detection

The super-resolution pitch detection algorithm achieves accurate detection values without increasing CPU time, by performing linear interpolation on a low-sampling-rate recording.

Detection is performed in a pitch-synchronous fashion (one pitch value for each cycle).
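
As a rough illustration of how linear interpolation can recover sub-sample period estimates from a low-sampling-rate recording, the sketch below produces one pitch value per cycle from interpolated upward zero crossings. It is a simplified stand-in and not the super-resolution algorithm used in the project; the function name `cycle_pitches`, the zero-crossing approach, and the 8 kHz test tone are all assumptions made for the example.

```python
import math


def cycle_pitches(samples, sample_rate):
    """Return one frequency estimate (Hz) per cycle of a clean periodic signal.

    Simplified stand-in for illustration only: it locates upward zero
    crossings and refines each one by linear interpolation between the two
    neighbouring samples, so the period is resolved to a fraction of a sample.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Fractional position of the zero between sample i-1 and sample i.
            frac = -prev / (cur - prev)
            crossings.append((i - 1) + frac)
    # Each pair of consecutive crossings spans exactly one cycle.
    return [sample_rate / (b - a) for a, b in zip(crossings, crossings[1:])]


# Hypothetical test tone: a 220 Hz sine sampled at only 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 220 * n / rate + 0.3) for n in range(800)]
pitches = cycle_pitches(tone, rate)
```

At 8 kHz a 220 Hz cycle lasts about 36.4 samples, so whole-sample periods of 36 or 37 samples would give roughly 222 Hz or 216 Hz. The interpolated estimates in `pitches` stay much closer to 220 Hz without raising the sampling rate.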

Page 8: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Pitch/Volume Detection

[Figure: pitch (frequency in semitones) and volume plotted against time (sec).]

Page 9: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Segmentation (1/3)

Sequence of Pitches and Volumes → Volume-Based Segmentation → Pitch-Based Segmentation → Decision (Voice or Noise)
Voice → Note Identification → Sequence of Notes
Noise → Ignore

Page 10: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Segmentation (2/3)

Volume Segmentation: Possible notes are identified as regions in which the volume is higher than a trigger value.

Thus, it’s important to separate each note by a short quiet period, e.g. by pronouncing “ta-ta-ta” rather than “la-la-la”.
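
The trigger rule can be sketched in a few lines. This is a minimal illustration, assuming the volume is available as one value per frame; the function name `volume_segments`, the sample envelopes, and the trigger value of 0.5 are hypothetical.

```python
def volume_segments(volumes, trigger):
    """Return (start, end) frame index pairs where volume stays above trigger.

    `end` is exclusive. The trigger value is a free parameter here.
    """
    segments = []
    start = None
    for i, v in enumerate(volumes):
        if v > trigger and start is None:
            start = i                      # a candidate note begins
        elif v <= trigger and start is not None:
            segments.append((start, i))    # a quiet frame closes the note
            start = None
    if start is not None:
        segments.append((start, len(volumes)))
    return segments


# Hypothetical volume envelope: two notes separated by a quiet gap.
separated = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.9, 0.8]
notes = volume_segments(separated, trigger=0.5)

# The same two notes sung without a quiet gap between them.
legato = [0.0, 0.1, 0.8, 0.9, 0.7, 0.6, 0.6, 0.6, 0.9, 0.8]
merged = volume_segments(legato, trigger=0.5)
```

With the quiet gap, `notes` holds two segments. Without it, `merged` holds a single segment covering both notes, which is why a short quiet period between notes matters.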

Page 11: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Segmentation (3/3)

Pitch Segmentation: Within each segment, find the longest region in which the pitch is relatively constant.

Noise Removal: If this region is very short, then the segment is assumed to be noise, and it is ignored.

Conversion to Notes: The frequency of the note is identified by an iterative averaging technique.
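
The three steps above might look like the following sketch. The slides do not define "relatively constant", "very short", or the averaging procedure, so everything specific here is an assumption: a region counts as constant when its pitch spread stays within `tolerance` semitones, a region shorter than `min_length` frames is noise, and the averaging repeatedly discards values far from the current mean.

```python
def longest_stable_region(pitches, tolerance=1.0):
    """Longest run (start, end) whose max-min pitch spread is <= tolerance."""
    best = (0, 0)
    for start in range(len(pitches)):
        lo = hi = pitches[start]
        end = start
        while end < len(pitches):
            lo, hi = min(lo, pitches[end]), max(hi, pitches[end])
            if hi - lo > tolerance:
                break
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


def iterative_average(values, max_dev=0.5, max_rounds=10):
    """Average a non-empty list, drop values far from the mean, and repeat."""
    kept = list(values)
    for _ in range(max_rounds):
        mean = sum(kept) / len(kept)
        trimmed = [v for v in kept if abs(v - mean) <= max_dev]
        if not trimmed or len(trimmed) == len(kept):
            break                # nothing more to discard
        kept = trimmed
    return sum(kept) / len(kept)


def segment_to_note(pitches, min_length=5, tolerance=1.0):
    """Return the note pitch for one volume segment, or None if it is noise."""
    start, end = longest_stable_region(pitches, tolerance)
    if end - start < min_length:
        return None              # stable region too short: treat as noise
    return iterative_average(pitches[start:end])


# Hypothetical per-cycle pitch values (in semitones) for two volume segments.
sung = [55.0, 58.0, 60.1, 59.9, 60.0, 60.2, 60.0, 59.8, 63.0]
click = [50.0, 57.0, 43.0, 61.0]
note = segment_to_note(sung)      # close to 60 semitones
noise = segment_to_note(click)    # None
```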

Page 12: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Segmentation Example

Page 13: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Database Search

Sequence of Notes

Convert to relative frequencies and durations

Find edit distance for each database entry

Sort by increasing edit cost

List of Best Matches
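
One reason to convert to relative frequencies and durations is that the result no longer depends on the key or tempo the user happens to sing in. The sketch below illustrates this under assumed conventions: notes are (pitch in semitones, duration in seconds) pairs, pitch changes are rounded to whole semitones, and durations are compared as ratios. The function name `to_relative` and the sample notes are hypothetical.

```python
def to_relative(notes):
    """Convert absolute notes to (pitch change, duration ratio) steps.

    `notes` is a list of (pitch_in_semitones, duration_in_seconds) pairs.
    """
    steps = []
    for (p0, d0), (p1, d1) in zip(notes, notes[1:]):
        interval = round(p1 - p0)    # signed change in semitones
        ratio = d1 / d0              # > 1 means the next note is longer
        steps.append((interval, ratio))
    return steps


# Hypothetical query, and the same tune sung 5 semitones higher at double speed.
query = [(60.0, 0.50), (62.0, 0.50), (64.0, 1.00), (60.0, 0.25)]
shifted = [(p + 5.0, d / 2.0) for p, d in query]

same_steps = to_relative(query) == to_relative(shifted)
```

Both renditions reduce to the same step sequence, so they would be compared against the database identically.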

Page 14: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Edit Distance (1/3)

Purpose: Correction of errors in singing and in previous identification steps.

Mechanism: The edit distance is the minimum cost required to transform one string into another. The following changes can be applied at given costs:
• Change one character into another
• Insert one character
• Delete one character

Page 15: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Edit Distance (2/3)

Example: How to make an elephant become elegant:

elephant
→ eleghant (Replace)
→ elegant (Delete)

Total edit distance is the cost of replacing ‘p’ with ‘g’, plus the cost of deleting ‘h’.
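
The elephant example can be checked with the standard dynamic-programming formulation of edit distance, sketched below. The costs actually used in the project are not given in the slides, so the unit default costs and the parameter names are assumptions.

```python
def edit_distance(source, target, replace_cost=1.0, insert_cost=1.0,
                  delete_cost=1.0):
    """Minimum total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning source[:i] into target[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + (0.0 if same else replace_cost),
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
            )
    return dist[-1][-1]


# With unit costs: one replacement plus one deletion.
cost = edit_distance("elephant", "elegant")  # 2.0
```

The function only compares items for equality, so it works on any sequences, such as lists of pitch changes, and not just on character strings.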

Page 16: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Edit Distance (3/3)

Algorithms differ by the content of the strings being compared. Three algorithms were checked:
• Parsons code: Only the direction of pitch change is compared (up, down, or repeat).
• Frequency similarity: The direction and size of pitch change (e.g., up 3 semitones).
• Frequency/Duration similarity: Both pitch change and relative duration of notes (e.g., up 3 semitones, and a longer note).
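
To make the difference between the three string contents concrete, the sketch below encodes the same short query in each form. The function names, the U/D/R letters, and the longer/shorter/same duration labels are illustrative assumptions; the slides do not say how the project encodes relative duration.

```python
def parsons_code(pitches):
    """Direction of each pitch change: 'U' (up), 'D' (down), 'R' (repeat)."""
    code = []
    for prev, cur in zip(pitches, pitches[1:]):
        code.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(code)


def frequency_code(pitches):
    """Direction and size of each pitch change, in semitones."""
    return [cur - prev for prev, cur in zip(pitches, pitches[1:])]


def frequency_duration_code(pitches, durations):
    """Pitch change plus whether the next note is longer, shorter, or equal."""
    code = []
    for i in range(1, len(pitches)):
        if durations[i] > durations[i - 1]:
            length = "longer"
        elif durations[i] < durations[i - 1]:
            length = "shorter"
        else:
            length = "same"
        code.append((pitches[i] - pitches[i - 1], length))
    return code


# Hypothetical five-note query (pitches in semitones, durations in beats).
pitches = [60, 60, 67, 67, 64]
durations = [1, 1, 1, 2, 1]

as_parsons = parsons_code(pitches)                      # "RURD"
as_frequency = frequency_code(pitches)                  # [0, 7, 0, -3]
as_freq_dur = frequency_duration_code(pitches, durations)

# A different melody with the same contour shares the Parsons code.
other = parsons_code([60, 60, 62, 62, 61])              # also "RURD"
```

Because melodies with different interval sizes can share a contour, the Parsons string carries less information per note than the other two encodings.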

Page 17: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Results

Page 18: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Simulation

Simulations of the search engine were performed in order to have a larger ensemble, from which a detection probability was calculated.

Random noise was added to the first few notes of a tune. The tune was then applied to the search engine.
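
A simulation of this kind could be organized as in the sketch below. It is a toy model under stated assumptions, not the project's simulator: tunes are lists of pitch changes in semitones, noise shifts a step by one semitone with probability `noise_prob`, the search compares the query with the opening notes of each tune using unit-cost edit distance, and the seven-tune database is synthetic.

```python
import random


def edit_distance(a, b):
    """Unit-cost edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def best_match(query, database):
    """Index of the tune whose opening notes are closest to the query."""
    costs = [edit_distance(query, tune[:len(query)]) for tune in database]
    return costs.index(min(costs))


def detection_probability(database, n_notes, noise_prob, trials, rng):
    """Fraction of noisy queries whose top match is the original tune."""
    hits = 0
    for _ in range(trials):
        target = rng.randrange(len(database))
        query = [
            step + rng.choice((-1, 1)) if rng.random() < noise_prob else step
            for step in database[target][:n_notes]
        ]
        hits += best_match(query, database) == target
    return hits / trials


# Synthetic database of seven tunes, each starting with a different step.
database = [[(i * (k + 1)) % 7 - 3 for k in range(12)] for i in range(1, 8)]

clean = detection_probability(database, 5, 0.0, 50, random.Random(1))
noisy = detection_probability(database, 5, 0.5, 200, random.Random(1))
```

Repeating the estimate for several query lengths and string encodings would give curves of detection probability against the number of notes in the query.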

Page 19: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Comparison of Search Algorithms

[Figure: probability of correct identification (%) versus number of notes in query (3 to 10), for the Parsons, Frequency, and Frequency/Duration algorithms.]

Page 20: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Empirical Test

Subjects listened to a sample query. Then, they chose a song from the database, and were told to sing it in a similar manner.

Number of test subjects: 14
Number of recorded songs: 64
Number of songs in database: 197

Page 21: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Empirical Results

Algorithm    Identified as Top Match    Identified as Top Five
Freq/Dur     80%                        86%
Frequency    77%                        88%
Parsons      52%                        73%
Human        45%-65%

Page 22: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Conclusions

Combined frequency/duration search is the most robust search algorithm tested, and outperforms the Parsons code search by a wide margin.

The program performs better than an average human under the tested conditions.

Page 23: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Summary

A successful melody search engine has been created.

Real-time software implementation is possible.

The new frequency/duration search algorithm was found to be more effective than the existing Parsons code search.

Page 24: Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

The End