Post on 16-Jan-2016
Similarity Matrix Processing for Music Structure Analysis
Yu Shiu, Hong Jeng, C.-C. Jay Kuo
ACM Multimedia 2006
System Framework
Pitch Class Profile (PCP)
• The PCP vector is a 12-dimensional vector giving the relative intensities of the 12 pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}
• Normalized to a unit vector
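A minimal sketch of how such a vector could be computed for one analysis frame. The slides do not give the exact mapping, so the spectral folding below (squared FFT magnitudes assigned to the nearest semitone relative to C4) and the parameters `f_ref`, `f_min`, and `f_max` are assumptions for illustration only.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pcp_vector(frame, sample_rate, f_ref=261.63, f_min=55.0, f_max=2000.0):
    """Fold the spectrum of one frame into a 12-bin, unit-norm PCP vector.

    f_ref is the frequency of C4; f_min and f_max bound the analysed band.
    All three are illustrative choices, not values from the slides.
    """
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    mask = (freqs >= f_min) & (freqs <= f_max)
    # Nearest semitone relative to C, folded into one octave (0 = C, ..., 11 = B).
    classes = np.round(12 * np.log2(freqs[mask] / f_ref)).astype(int) % 12

    pcp = np.zeros(12)
    np.add.at(pcp, classes, spectrum[mask] ** 2)  # accumulate energy per pitch class

    norm = np.linalg.norm(pcp)
    return pcp / norm if norm > 0 else pcp  # normalise to a unit vector
```

With a 300 ms frame of a pure 440 Hz tone, the largest component falls on pitch class A.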
Measure-based Similarity Matrix
• Previous similarity matrices
– Pre-defined window size
– Results in a large similarity matrix, which makes further processing more expensive
• In this paper
– Use the measure as the element of the similarity matrix
• PCP vector generation
– Choose a window size equal to the duration of one half beat
– Detect the onset signal
  • Compute the change in spectral content between adjacent sliding windows that are 20 ms long with 50% overlap
– The autocorrelation function (ACF) of the onset signal is calculated to determine the beat period
– Example:
  • 100 BPM → beat period of 600 ms, so a half beat is 300 ms
  • Longer than the window sizes commonly used in previous work
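The two steps above (onset detection from spectral change, then beat period from the ACF) can be sketched as follows. The slides do not say how the "change of spectral content" is measured, so half-wave rectified spectral flux is used here as an assumption. The tempo search range (`min_bpm`, `max_bpm`) is also a hypothetical parameter.

```python
import numpy as np

def onset_signal(audio, sample_rate, win_ms=20.0, overlap=0.5):
    """Onset signal from spectral change between adjacent 20 ms windows (50% overlap)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = max(1, int(round(win * (1.0 - overlap))))
    window = np.hanning(win)
    n_frames = 1 + (len(audio) - win) // hop
    mags = np.array([
        np.abs(np.fft.rfft(audio[i * hop:i * hop + win] * window))
        for i in range(n_frames)
    ])
    # Assumed change measure: sum of positive magnitude increases (spectral flux).
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    return flux, sample_rate / hop  # onset signal and its frame rate in Hz

def beat_period_seconds(onset, frame_rate, min_bpm=60.0, max_bpm=200.0):
    """Pick the ACF peak of the onset signal inside an assumed tempo range."""
    x = onset - onset.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min = int(np.floor(frame_rate * 60.0 / max_bpm))
    lag_max = int(np.ceil(frame_rate * 60.0 / min_bpm))
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return lag / frame_rate
```

For a click track at 100 BPM the estimate is about 0.6 s, so the half-beat PCP window would be about 0.3 s.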
• Group N successive PCP vectors into one measure
• Since PCP vectors are non-negative unit vectors, 0 ≤ s_ij ≤ 1
• Dynamic time warping (DTW) can be used to enhance the s_ij value
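The formula for s_ij does not survive in this text, so the sketch below assumes one plausible definition: the mean inner product between corresponding PCP vectors of measures i and j. The function name and the choice of `n_per_measure` are illustrative (for example, 8 half-beat vectors per measure in 4/4 time).

```python
import numpy as np

def measure_similarity_matrix(pcp, n_per_measure):
    """Build a measure-based similarity matrix from unit-norm PCP vectors.

    pcp: array of shape (T, 12), one non-negative unit vector per half-beat window.
    Returns an (M, M) matrix with M = T // n_per_measure.
    """
    n_measures = pcp.shape[0] // n_per_measure
    # Drop trailing vectors that do not fill a whole measure.
    grouped = pcp[: n_measures * n_per_measure].reshape(n_measures, n_per_measure, 12)
    # Assumed definition: s[i, j] = mean over k of <grouped[i, k], grouped[j, k]>.
    return np.einsum("ikd,jkd->ij", grouped, grouped) / n_per_measure
```

Under this definition each entry is an average of inner products of non-negative unit vectors, which is why it stays within [0, 1], and every measure has similarity 1 with itself.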
Dynamic Time Warping
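A sketch of how DTW could raise the similarity of two measures whose chord changes are slightly misaligned. This is textbook DTW with local cost 1 − inner product, reporting the mean similarity along the best warping path. The exact variant used in the paper is not stated in the slides, so treat the step pattern and normalisation as assumptions.

```python
import numpy as np

def dtw_similarity(a, b):
    """Mean similarity along the optimal DTW path between two measures.

    a, b: arrays of shape (N, 12) holding unit-norm PCP vectors.
    """
    cost = 1.0 - a @ b.T  # local cost; inner products lie in [0, 1]
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)     # accumulated cost
    steps = np.zeros((n + 1, m + 1), dtype=int)  # path length so far
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed predecessors: diagonal, vertical, horizontal.
            best_cost, best_steps = min(
                (acc[i - 1, j - 1], steps[i - 1, j - 1]),
                (acc[i - 1, j], steps[i - 1, j]),
                (acc[i, j - 1], steps[i, j - 1]),
            )
            acc[i, j] = cost[i - 1, j - 1] + best_cost
            steps[i, j] = best_steps + 1
    return 1.0 - acc[n, m] / steps[n, m]
```

For example, if one measure changes chord a half beat earlier than the other, the frame-by-frame mean inner product drops, while the warped similarity can remain at 1.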
• After the simplification, a 3-minute song at 100 BPM (300 beats, or 75 measures at four beats per measure) forms a 75 × 75 similarity matrix
• The measure-based similarity matrix (MSM) reveals chord similarity more than melody similarity
• Johnny Cash’s Hurt repeatedly uses the chord succession {Am, Am, C, D} in the 1st and 3rd sections and {G, A, F, C} in the 2nd and 4th sections.
• The Beatles’ Yesterday has no short-period chord succession. Its musical form is P = {I V V C V C V O}
Two MSM Examples
Detection of Local Similarity
• Using a 2D moving window
• Move the 2D window along the diagonal of the MSM
Detection of Long Range Similarity
• The Viterbi algorithm is used to find segments with consecutive large similarity values along the 45-degree direction
• The output of the second module, which provides chord succession similarity, can be exploited to enhance long-range similarity detection.
• Interpret the x-axis as “time” and the y-axis as “state”
• Use “scores” instead of “probabilities”
• The score of a path is defined as the product of the similarity values of all states on the path and the scores of all state transitions
• PT0 > PT1 guarantees a preference for paths along the 45-degree direction
– The larger the ratio PT0/PT1, the more strongly the path is favored to proceed along the 45-degree direction
– In our experiment, the ratio PT0/PT1 is set to 1.5
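The score recursion described above can be sketched as a Viterbi-style dynamic program. This is an illustration under stated assumptions, not the authors' implementation: PT0 is taken to be the score of a transition that advances one state per time step (the 45-degree direction), PT1 the score of any other transition, all state-to-state transitions are allowed, and products are computed in the log domain to avoid underflow. Only the ratio PT0/PT1 = 1.5 comes from the slides.

```python
import numpy as np

def viterbi_scores(msm, pt0=1.5, pt1=1.0, eps=1e-12):
    """Log-domain path scores over a similarity matrix.

    msm[j, t]: similarity value of state (row) j at time (column) t.
    Returns (q, back): q[j, t] is the best log-score of a path ending in
    state j at time t, back[j, t] the predecessor state on that path.
    """
    n_states, n_times = msm.shape
    log_s = np.log(np.maximum(msm, eps))

    # trans[i, j]: log transition score from state i to state j.
    trans = np.full((n_states, n_states), np.log(pt1))
    idx = np.arange(n_states - 1)
    trans[idx, idx + 1] = np.log(pt0)  # 45-degree step

    q = np.zeros((n_states, n_times))
    back = np.zeros((n_states, n_times), dtype=int)
    q[:, 0] = log_s[:, 0]
    for t in range(1, n_times):
        cand = q[:, t - 1][:, None] + trans  # cand[i, j]
        back[:, t] = np.argmax(cand, axis=0)
        q[:, t] = cand[back[:, t], np.arange(n_states)] + log_s[:, t]
    return q, back

def backtrack(q, back):
    """Recover the best path, starting from the highest final score."""
    n_times = q.shape[1]
    path = np.zeros(n_times, dtype=int)
    path[-1] = int(np.argmax(q[:, -1]))
    for t in range(n_times - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path
```

On a toy matrix with a single bright off-diagonal stripe, the recovered path follows the stripe. Raising `pt0` relative to `pt1` makes the path more reluctant to leave the 45-degree direction.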
• Pruning with chord succession information
– Sections with repetitive chord successions of a certain period should be similar to sections with the same period
– A period value p is tagged to each measure
Post-processing
• Begin with the state j that gives the highest Q(L, j) at time L, and back-track from there.
• Segments shorter than φ measures are removed
– In our implementation, φ = 8
• Segments whose mean similarity value is less than a threshold τ are removed
– τ = mean + standard deviation (over all s_ij)
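The two removal rules can be sketched as a simple filter. The segment representation (a list of `(row, col)` cells in the similarity matrix) and the function name are hypothetical. The use of the population standard deviation is also an assumption, since the slides do not say which estimator is used.

```python
import numpy as np

def filter_segments(segments, msm, phi=8):
    """Keep segments that are long enough and similar enough.

    segments: list of segments, each a list of (row, col) cells of msm.
    phi: minimum segment length in measures (8 in the slides).
    """
    tau = msm.mean() + msm.std()  # threshold from all similarity values
    kept = []
    for seg in segments:
        if len(seg) < phi:
            continue  # too short
        rows, cols = zip(*seg)
        if msm[list(rows), list(cols)].mean() < tau:
            continue  # mean similarity below threshold
        kept.append(seg)
    return kept
```

Because τ is computed from the whole matrix, the threshold adapts to each song: a song with uniformly high similarity values needs stronger repetitions to pass.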
• A segment should be divided
– if its two corresponding sections in the song overlap with each other
– if there is a significant difference between the similarity values before and after a certain point in the segment
• If sections conflict, the one with the higher similarity value has priority in keeping its boundaries
• For songs in verse-chorus form, similarity values are clustered into two classes
– Sections with high similarity values are taken to be the chorus
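The slides do not name the clustering method. As one possible reading, the sketch below runs a one-dimensional 2-means on per-segment similarity values and marks the higher cluster as chorus candidates. The function name and the choice of k-means are assumptions.

```python
import numpy as np

def split_high_low(values, n_iter=100):
    """Two-class split of 1-D similarity values (simple 2-means).

    Returns a boolean array: True for members of the high-similarity class.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()  # initial centroids
    for _ in range(n_iter):
        is_high = np.abs(values - hi) < np.abs(values - lo)
        if is_high.all() or not is_high.any():
            break  # degenerate case: only one class present
        new_lo, new_hi = values[~is_high].mean(), values[is_high].mean()
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return is_high
```

For instance, with segment similarities `[0.9, 0.85, 0.5, 0.45, 0.88]`, the first, second, and last segments land in the high class.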
Experiment
• A collection of 120 pop, country, and rock songs released after the 1960s
• 100 of them are in verse-chorus form and 20 are in AAA or other forms
• Mono audio sampled at 22,050 Hz, with 16 bits per sample
Experimental Results
• The pattern extraction of a song is considered correct if all patterns in the song are extracted, without distinguishing between verse and chorus
• The correct detection rate is 112/120 = 93.33%.