Learning Musicianship for Automatic Accompaniment
Carnegie Mellon
Gus (Guangyu) Xia and Roger Dannenberg, School of Computer Science, Carnegie Mellon University
Introduction: Musical background
[Diagram: interaction, expression, rehearsal]
Introduction: Technical background
[Diagram: interaction (score following and automatic accompaniment) combined with musicianship (expressive performance) yields expressive interactive performance]
Introduction: Problem definition
• How do we interpret the music based on the expression of human musicians?
• How do we distill models from rehearsals?
• What are the limits of validity of the learned models?
• How many rehearsals are needed?

For interactive music performance, how can we build artificial performers that automatically improve their ability to sense and coordinate with human musicians' expression through rehearsal experience?

We start from piano duets, focusing on expressive timing and expressive dynamics.
Outline
• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work
Current Data Collection
• Musicians:
§ 10 music master's students play duet pieces in 5 pairs.
• Music pieces:
§ 3 pieces are selected: Danny Boy, Serenade (by Schubert), and Ashokan Farewell.
§ Each pair performs every piece 7 times.
• Recording settings:
§ Recorded on electronic pianos with MIDI output.
Outline
• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work
Method Overview
• From "local" to "general"
§ Local: low-dimensional feature space; applies only to certain notes
§ General: high-dimensional feature space; applies to the whole piece of music
• Baseline:
[Figure: two time maps. Score following maps real (performance) time onto reference (score) time; automatic accompaniment maps the next note's reference time onto a predicted performance time.]
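The baseline mapping can be sketched as follows. This is a minimal illustration, not the authors' implementation: score following yields (score time, performance time) pairs for matched notes, and the accompanist fits a local tempo line over a recent window to extrapolate the next note's performance time. The function name and window size are illustrative assumptions.

```python
import numpy as np

def predict_next_onset(score_times, perf_times, next_score_time, window=4):
    """Predict the next note's performance time by linear extrapolation
    of a local tempo line fitted over the most recent matched notes."""
    s = np.asarray(score_times[-window:], dtype=float)
    p = np.asarray(perf_times[-window:], dtype=float)
    slope, intercept = np.polyfit(s, p, 1)  # perf ≈ slope*score + intercept
    return slope * next_score_time + intercept

# Steady tempo: 0.5 s of real time per beat of score time.
print(round(predict_next_onset([0, 1, 2, 3], [0.0, 0.5, 1.0, 1.5], 4), 6))  # → 2.0
```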
Method (1): Note-specific approach
• Idea:
§ Expressive timings of the notes are linearly correlated.
§ Predict the expressive timing of the 2nd piano from the expressive timing of the 1st piano.
• Model:
  u = [u_1, u_2, ..., u_n]^T  (expressive timing of the 1st piano)
  y = [y_1, y_2, ..., y_n]^T  (expressive timing of the 2nd piano)
  y_i = a_i u_i + b_i u_{prev(i)} + ε_i,  i = 1, ..., n
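A sketch of the note-specific fit: each note of the second piano gets its own small regression from the first piano's timing of the same and the previous note, estimated across rehearsals. The pairing with the previous note is an illustrative reading of the model, and all names are assumptions.

```python
import numpy as np

def fit_note_specific(U, Y):
    """U, Y: (rehearsals, notes) expressive-timing deviations of piano 1
    and piano 2. Returns per-note coefficients (a_i, b_i) for
    y_i ≈ a_i * u_i + b_i * u_{prev(i)}."""
    n_rehearsals, n_notes = U.shape
    coefs = np.zeros((n_notes, 2))
    for i in range(n_notes):
        # Regressors: first piano's timing of note i and the previous note.
        X = np.column_stack([U[:, i], U[:, max(i - 1, 0)]])
        coefs[i], *_ = np.linalg.lstsq(X, Y[:, i], rcond=None)
    return coefs
```

With 7 rehearsals per pair (as in the data collection), each note's two coefficients are fit from 7 examples, which is why this model only works for notes seen in training.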
Result: Note-specific approach
[Figure: time residual (sec) vs. score time (sec) for BL, Note-8, and Note-34]
Mean Absolute Error (sec): BL: 0.098, Note-8: 0.087, Note-34: 0.060
Method (2): Rhythm-specific approach
• Idea:
§ Notes with the same score-rhythm context share parameters.
§ Introduce an extra dummy variable to encode the score-rhythm context of each note.
• Model:
  u = [u_1, u_2, ..., u_n]^T
  y = [y_1, y_2, ..., y_n]^T
  y_i = a_{rhy(i)} u_i + b_{rhy(i)} u_{prev(i)} + ε_i,  i = 1, ..., n
  where rhy(i) indexes the score-rhythm context class of note i.
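The parameter sharing can be sketched as pooling notes by rhythm class, under the same illustrative reading as the note-specific model: all notes whose score-rhythm context falls in the same class share one coefficient pair.

```python
import numpy as np

def fit_rhythm_specific(U, Y, rhythm_class):
    """U, Y: (rehearsals, notes); rhythm_class[i] labels note i's
    score-rhythm context. Fits y_i ≈ a_r*u_i + b_r*u_{prev(i)}, one
    (a_r, b_r) pair per class r, pooling all notes in that class."""
    params = {}
    for r in set(rhythm_class):
        # Pool every note in class r (skip note 0, which has no predecessor).
        idx = [i for i, c in enumerate(rhythm_class) if c == r and i > 0]
        X = np.column_stack([U[:, idx].ravel(),
                             U[:, [i - 1 for i in idx]].ravel()])
        params[r], *_ = np.linalg.lstsq(X, Y[:, idx].ravel(), rcond=None)
    return params
```

Because many notes contribute to each class, far fewer rehearsals are needed per parameter than in the note-specific model.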
Result: Rhythm-specific approach
Mean Absolute Error (sec): BL: 0.098, Rhythm-4: 0.084, Rhythm-8: 0.067
[Figure: time residual (sec) vs. score time (sec) for BL, Rhythm-4, and Rhythm-8]
Method (3): General feature approach
• Idea:
§ Make the model more general.
§ Predict the expressive timing by considering more than the score-rhythm context.
• Model:
  y_i = w^T x_i
  x_i = [x_{i1}, x_{i2}, ..., x_{im}]^T  (feature vector of note i)
  w = [w_1, w_2, ..., w_m]^T  (weight vector)
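In this formulation, fitting is ordinary least squares over a feature matrix. A minimal sketch, where the choice of features (e.g. the first piano's timing, rhythm indicators, beat position) is an assumption not specified here:

```python
import numpy as np

def fit_general(X, y):
    """X: (notes, features) per-note feature vectors; y: (notes,)
    second-piano expressive timing. Ordinary least squares for w."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_general(X, w):
    """Predicted timing y_i = w^T x_i for every note."""
    return X @ w
```

Because one weight vector covers all notes, the model generalizes across a whole piece rather than applying only to notes seen in training.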
Regularization: Group Lasso
• Idea:
§ Reduce the burden of training.
§ Discover the dominant features that predict the expressive timings.
• Solve:
  min_{w ∈ R^m}  ||y - Xw||_2^2 + λ Σ_g ||w_g||_2
  where the sum runs over predefined feature groups g.
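The objective above can be solved by proximal gradient descent with block soft-thresholding; the slides do not specify the solver, so this is one standard way to do it, with illustrative step count and λ.

```python
import numpy as np

def group_lasso(X, y, groups, lam=0.1, iters=2000):
    """Proximal gradient for min ||y - Xw||_2^2 + lam * sum_g ||w_g||_2.
    `groups`: list of index lists, one per feature group."""
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # 1/L for the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w - step * 2 * X.T @ (X @ w - y)      # gradient step on the loss
        for g in groups:                           # block soft-thresholding:
            norm = np.linalg.norm(w[g])            # shrinks a whole group,
            if norm > 0:                           # zeroing it when its norm
                w[g] *= max(0.0, 1 - step * lam / norm)  # falls below step*lam
    return w
```

The group penalty zeroes out entire feature groups at once, which is what lets it "discover the dominant features" from only a few rehearsals.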
Result: General feature approach
Mean Absolute Error (sec), with only 4 training pieces: BL: 0.098, LR: 0.072, Glasso: 0.059
[Figure: time residual (sec) vs. score time (sec) for BL, LR, and Glasso]
Method (4): LDS approach
• Idea:
§ Add another regularization across adjacent notes.
§ Lower-dimensional hidden mental states control the expressive timings.
• Model:
[Figure: graphical model of the LDS with hidden states z_{t-1}, z_t, z_{t+1}, inputs u_{t-1}, u_t, u_{t+1}, and observations y_{t-1}, y_t, y_{t+1}]
  z_{t+1} = A z_t + B u_t + w_t,  w_t ~ N(0, Q)
  y_t = C z_t + D u_t + v_t,  v_t ~ N(0, R)
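Once the LDS parameters are known, one-step-ahead prediction of the second piano's timing is a Kalman filter. A sketch under the model above, with toy matrices rather than values learned from rehearsal data:

```python
import numpy as np

def kalman_one_step(A, B, C, D, Q, R, us, ys):
    """One-step-ahead predictions of y_t given u_t and y_{<t}, for
    z_{t+1} = A z_t + B u_t + w_t,  y_t = C z_t + D u_t + v_t."""
    k = A.shape[0]
    z, P = np.zeros(k), np.eye(k)                # prior on z_0
    preds = []
    for u, y in zip(us, ys):
        preds.append(C @ z + D @ u)              # predict y_t before seeing it
        S = C @ P @ C.T + R                      # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)           # Kalman gain
        z = z + K @ (y - (C @ z + D @ u))        # measurement update
        P = (np.eye(k) - K @ C) @ P
        z, P = A @ z + B @ u, A @ P @ A.T + Q    # time update to t+1
    return np.array(preds)
```

The hidden state z carries information between adjacent notes, which is the "horizontal regularization" the next slide evaluates.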
Result: LDS (horizontal regularization)
Mean Absolute Error (sec), with only 4 training pieces: BL: 0.085, LR: 0.072, LDS: 0.067
[Figure: time residual (sec) vs. score time (sec) for BL, LR, and LDS]
A Global View
[Figure: time residual (sec) by method and training-set size (BL; Note-, Rhythm-, and Glasso- variants with 4, 8, and 34 training examples) for Serenade, Danny Boy, and Ashokan Farewell]
Outline
• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work
Some Initial Audio Demos
• Baseline
• Note-specific approach: 34 training examples
• General feature approach (group lasso): 4 training examples
Future Work
• Cross-piece models
• Performer-specific models
• Online learning and decoding
• Integration with music robots
Conclusion
• An artificial performer for interactive performance
• Learns musicianship from rehearsal experience
• A combination of expressive performance and automatic accompaniment
• Much better prediction based on just 4 rehearsals
Q&A
Spectral Learning (1): Oblique projections

[Figure: graphical model of the LDS with hidden states z, inputs u, and observations y]

  E(y_t^+) = [Γ_z  Γ_-  Γ_+] [z_t; u_t^-; u_t^+]

• We don't know the future.
• Partially explain future observations based on the history:
  O_t ≝ [Γ_z  Γ_-  0] [z_t; u_t^-; u_t^+]
Spectral Learning (2): State estimation

[Figure: graphical model of the LDS with hidden states z, inputs u, and observations y]

• State estimation by SVD:
  O_t = Γ z_t,  where Γ ≝ [C; CA; ...; CA^{k-1}] is the extended observability matrix
  O = U Σ V^T = (U Σ^{1/2})(Σ^{1/2} V^T)
• Moreover, enforce a bottleneck by discarding near-zero singular values and the corresponding columns of U and V.
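The factorization step above can be sketched directly with numpy's SVD, keeping only the top-k singular values as the bottleneck; the matrix O here is synthetic, not real projection data.

```python
import numpy as np

def estimate_states(O, k):
    """Factor O ≈ (U Σ^{1/2}) (Σ^{1/2} V^T) at rank k, discarding
    near-zero singular values. Returns (Gamma, Z): the extended
    observability matrix and the estimated hidden-state sequence."""
    U, s, Vt = np.linalg.svd(O, full_matrices=False)
    root = np.sqrt(s[:k])
    Gamma = U[:, :k] * root       # U Σ^{1/2}
    Z = root[:, None] * Vt[:k]    # Σ^{1/2} V^T
    return Gamma, Z
```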
Spectral Learning (3): Parameter estimation

[Figure: graphical model of the LDS with hidden states z, inputs u, and observations y]

• Based on the estimated hidden states, the parameters can be estimated from the following equations:
  z_{t+1} = A z_t + B u_t + w_t,  w_t ~ N(0, Q)
  y_t = C z_t + D u_t + v_t,  v_t ~ N(0, R)
  [z_{t+1}; y_t] = [A  B; C  D] [z_t; u_t] + [w_t; v_t]
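The stacked regression can be sketched as one least-squares solve; the test data below are simulated from known matrices purely to check the recovery, and all names are illustrative.

```python
import numpy as np

def estimate_parameters(Z, Uin, Y):
    """Z: (k, T) estimated states; Uin: (p, T) inputs; Y: (q, T) outputs.
    Solves [z_{t+1}; y_t] = [A B; C D] [z_t; u_t] by least squares."""
    lhs = np.vstack([Z[:, 1:], Y[:, :-1]])      # targets   [z_{t+1}; y_t]
    rhs = np.vstack([Z[:, :-1], Uin[:, :-1]])   # regressors [z_t; u_t]
    M, *_ = np.linalg.lstsq(rhs.T, lhs.T, rcond=None)
    M = M.T                                     # M = [A B; C D]
    k = Z.shape[0]
    return M[:k, :k], M[:k, k:], M[k:, :k], M[k:, k:]  # A, B, C, D
```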