Video-based Vibrato Detection and Analysisfor Polyphonic ...€¦ · Pitch Vibrato Detection &...
Transcript of Video-based Vibrato Detection and Analysisfor Polyphonic ...€¦ · Pitch Vibrato Detection &...
Video-based Vibrato Detection and Analysis for Polyphonic String Music
Bochen Li, Karthik Dinesh, Gaurav Sharma, Zhiyao Duan
Audio Information Research LabUniversity of Rochester
The 18th International Society for Music Information Retrieval
Oct 23-27, 2017Suzhou, China
118TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Introduction: Vibrato in Music
218TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Important artistic effect• Pitch modulation of a note in a periodic fashion• Characterized by Rate & Extent Spectrogram
Audio
Vibrato
Non-vibrato
Applications of Vibrato Analysis
• Musicological studies• Sound synthesis• Voice extraction
Introduction: Problem Statement
318TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Note-level vibrato/non-vibrato classification
Vibrato Detection
Vibrato Analysis
• Vibrato rate: speed of pitch variation (1/T Hz)
• Vibrato extent: amount of pitch variation (A cents)
T
A
Pitch
Time
Time
Pitch
Vibrato Detection & Analysis for polyphonic music played by string instruments
Introduction: Prior Audio-based Methods
418TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Score-informed [Abeßer et al. 2015] (Baseline)
• Template-based [Driedger et al. 2016]
• Harmonic partial [Hsu et al. 2010]Major drawbacks
• One source from mixture• Fails in high polyphony
Proposed Method Overview and Key Contribution
518TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Pitch
PitchSpec
Hand Hand Displacement
Ground-truth
Audio-based, Poly
Video-based
0.2 0.4 0.6 0.8 1.0 1.2 sec
0 0.2 0.4 0.6 0.8 1.0 1.2 sec
0 0.2 0.4 0.6 0.8 1.0 1.2 sec
Proposed Method Overview
6
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
Video-based Method
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
7
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
Proposed Method
Score Alignment
818TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Proposed Method
Score Alignment• Chroma feature
• Dynamic Time Warping
9
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
Proposed Method
Track-player Association
1018TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Proposed Method
Track-player Association• Bow motion <--> Score onset
• Previous work [Li et al. 2017]
11
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
Proposed Method
Track-player Association
Proposed Method
12
Motion Feature Extraction• Hand tracking
- KLT tracker with 30 feature points
- Bounding box: 70 x 70 pixels
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Proposed Method
Motion Feature Extraction• Fine-grained motion capture
- Optical flow estimation à pixel-level motion velocities
- Frame-wise average:
- Subtract moving mean:
Original Frame Color-encoded Optical Flow v(t)18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
14
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
Proposed Method
Track-player Association
Proposed Method
15
Vibrato DetectionMethod 1: Supervised framework
• Support Vector Machine (SVM)
• 8-D featureZero-crossing rate (4-D)Frequency (2-D)Auto-correlation peaks (2-D)
• Leave-one-out training strategy
Classifier
Vibrato / Non-vibrato
8-D
t
Note segment
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Proposed Method
16
Vibrato DetectionMethod 2: Unsupervised framework
• Principal Component Analysis (PCA)
• 1-D Motion Velocity Curve:
• Integration à Motion Displacement Curve:
X (t)
0.2 0.4 0.6 0.8 1.0 1.2 Time
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
17
Score Alignment
Motion Feature
Extraction
Track Association
Vibrato Detection
VibratoAnalysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent
Rate
Proposed Method
Vibrato Analysis
Proposed Method
18
Vibrato Analysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Motion rate = Vibrato rate
• Quadratic interpolation
• Peak distance on auto-correlation of motion curve X(t)
Rate
Ground-truth pitch contour
Motion displacement
Curve X(t)
0 0.2 0.4 0.6 0.8 1.0 1.2 sec
0 0.2 0.4 0.6 0.8 1.0 1.2 sec
Proposed Method
19
Vibrato Analysis
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
Extent • Motion extent ≠ Vibrato extent
• Pixel à Musical cents
• Scale motion curve X(t) to fit pitch contour
Estimated vib extent
Pitch contour
Motion extent
Estimated pitch contour Motion displacement Curve X(t)
Ground-truth pitch contour
Demo of DatasetDataset: URMP Dataset
2018TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Individually recorded in sound booth
• Annotated frame-level / note-level pitch
Demo of DatasetDataset: URMP Dataset
2118TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Assembled together with concert stage background
Experiments: Vibrato Detection Results
22
Overall Evaluation
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Video-based method à 92% F-measure• Improvement over audio-based method• SVM > PCA
Proposed
Experiments: Vibrato Detection Results
23
Impact of Polyphony Number
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Audio-based method: Poly ↗ Performance ↘• Proposed video-based method: Robust
2 3 4 5Poly No.
Baseline Proposed
Experiments: Vibrato Detection Results
24
Variation Based on Type of Instrument
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Audio-based method: Pitch range ↘ Performance ↘• Proposed Video-based method: Robust
Violin Viola Cello BassInstr.
Baseline Proposed
Experiments: Vibrato Analysis Results
25
Vibrato Rate / Extent
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• 2290 vibrato notes• Rate error: 0.38 Hz• Extent error: 3.47 cents
Conclusions
2618TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Proposed video-based vibrato detection/analysis offers significant improvement over conventional audio-only analysis
• Compared to audio-based methods, proposed video-based method is
• Robust for polyphonic sources
• Robust for different types of instruments• Proposed method provides good estimates for vibrato rate
and extent
• A powerful tool for analyzing string ensembles
Thank you!
Experiments: DatasetURMP Dataset• 19 string ensembles (57 tracks)
• 5 duets, 4 trios, 7 quartets, 3 quintets
• Audio: 48k Hz
• Video: 1080P, 29.97 fps
2818TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
URMP Dataset
Demo of DatasetDataset: URMP Dataset• 14 instruments, 44 piece arrangements
2918TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
ExperimentsResults
30
Potential Application on Musicologies
18TH INTERNATIONAL SOCIETY FOR MUSIC INFORMATION RETRIEVAL, OCT 23-27, 2017, SUZHOU, CHINA
• Test on TPs from Vid-PCA method: 2290 vibrato notes• Average error: 0.38 Hz / 3.47 cents• Double bass à lower rate / extent [1]
Vibrato characteristics for different instruments
[1] James Paul Mick. An analysis of double bass vibrato: Rates, widths, and pitches as influenced by pitch height, fingers used, andtempo. PhD thesis, The Florida State University, 2012.