Harmonic Analysis of Music Using

Combinatory Categorial Grammar

Mark Granroth-Wilding

Doctor of Philosophy

Institute for Language, Cognition and Computation

School of Informatics

University of Edinburgh

Abstract

Various patterns of the organization of Western tonal music exhibit hierarchical struc-

ture, among them the harmonic progressions underlying melodies and the metre under-

lying rhythmic patterns. Recognizing these structures is an important part of uncon-

scious human cognitive processing of music. Since the prosody and syntax of natural

languages are commonly analysed with similar hierarchical structures, it is reasonable

to expect that the techniques used to identify these structures automatically in natural

language might also be applied to the automatic interpretation of music.

In natural language processing (NLP), analysing the syntactic structure of a sen-

tence is prerequisite to semantic interpretation. The analysis is made difficult by the

high degree of ambiguity in even moderately long sentences. In music, a similar sort of

structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as

key identification and score transcription. These and other tasks depend on harmonic

and rhythmic analyses. There is a long history of applying linguistic analysis tech-

niques to musical analysis. In recent years, statistical modelling, in particular in the

form of probabilistic models, has become ubiquitous in NLP for large-scale practical

analysis of language. The focus of the present work is the application of statistical

parsing to automatic harmonic analysis of music.

This thesis demonstrates that statistical parsing techniques, adapted from NLP with

little modification, can be successfully applied to recovering the harmonic structure

underlying music. It shows first how a type of formal grammar based on one used

for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be

used to analyse the hierarchical structure of chord sequences. I introduce a formal

language similar to first-order predicate logic to express the hierarchical tonal

harmonic relationships between chords. The syntactic grammar formalism then serves as

a mechanism to map an unstructured chord sequence onto its structured analysis.

In NLP, the high degree of ambiguity of the analysis means that a parser must

consider a huge number of possible structures. Chart parsing provides an efficient

mechanism to explore them. Statistical models allow the parser to use information

about structures seen before in a training corpus to eliminate improbable interpreta-

tions early on in the process and to rank the final analyses by plausibility. To apply the

same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord

sequences annotated by hand with harmonic analyses is constructed. Two statistical

parsing techniques are adapted to the present task and evaluated on their success at

recovering the annotated structures. The experiments show that parsing using a sta-

tistical model of syntactic derivations is more successful than a Markovian baseline

model at recovering harmonic structure. In addition, the practical technique of statis-

tical supertagging serves to speed up parsing without any loss in accuracy.

This approach to recovering harmonic structure can be extended to the analysis of

performance data symbolically represented as notes. Experiments using some simple

proof-of-concept extensions of the above parsing models demonstrate one probabilistic

approach to this. The results reported provide a baseline for future work on the task of

harmonic analysis of performances.

Acknowledgements

The supervision of Mark Steedman has been immensely valuable to me, not only in

bringing the present thesis to fruition, but also for all that I have learned from it regard-

ing academic research, both in the field and more generally. It has, furthermore, been a

great pleasure to work with Mark over the past few years and I shall be sad to lose the

benefit of his vast experience and his deep understanding of computational linguistics

and cognitive science in general. I am also grateful to Sharon Goldwater for valuable

input at various stages of the project and for comments on this thesis.

I have gained a huge amount from meetings and frequent in-depth discussions with

the other members of the research group. Many thanks to Emily Thomforde, Tom

Kwiatkowski, Kira Mourao, Prachya Boonkwan, Michael Auli, Luke Zettlemoyer,

Christos Christodoulopoulos, Aciel Eshky, Greg Coppola, Mike Lewis, Alexandra

Birch, Tejaswini Deoskar, Nathaniel Smith and Bharat Ambati for discussions and ad-

vice. Special thanks must go to Christos, firstly, for all the extensive daily exchanges

over coffee, which have had a substantial impact on my research; and, secondly, for

reading this thesis and providing valuable and detailed comments at a time when he

really should have been concentrating on his own. The parts of this thesis relating to

chord dependencies and dependency evaluation owe much to ideas and insights coming

out of conversations with Joakim Nivre.

My research has benefitted greatly from presenting the work at the following con-

ferences and workshops: Neuromusic IV, SICSA 2011, MML 2012, ICMC 2012. I am

grateful to all concerned for the useful feedback on my work that I received on these

occasions and for the interesting connections that they permitted me to make between

my own work and others’.

I owe a great deal to many of my friends and family members. Two family mem-

bers in particular deserve a special mention. I am immensely grateful to Leonie Wild-

ing for truly remarkable proofreading of this thesis: clear, accurate and amazingly

thorough, not to mention speedy. Hanna Granroth-Wilding has provided invaluable

support through the duration of my PhD and, in particular, during the most stressful

times in the process of writing this thesis. As well as giving me the benefit of the

viewpoint of a quite different scientific discipline, she has helped me to keep things in

perspective and work steadily on, particularly during the final stages of writing.

I gratefully acknowledge the funding I have received to carry out this work from

the Engineering and Social Research Council and FP7 grant 249520 (GRAMPLUS).

Declaration

I declare that this thesis was composed by myself, that the work contained herein is

my own except where explicitly stated otherwise in the text, and that this work has not

been submitted for any other degree or professional qualification except as specified.

(Mark Granroth-Wilding)

Table of Contents

1 Introduction 1

2 Structure in Language and Music 5

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Literature Review: Formal Grammars in the Analysis of Music . . . . 6

2.2.1 Formalizing Music Theory . . . . . . . . . . . . . . . . . . . 7

2.2.2 Syntax and Semantics . . . . . . . . . . . . . . . . . . . . . 8

2.2.3 Musical Grammars . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.4 Probabilistic Music and NLP . . . . . . . . . . . . . . . . . . 20

2.3 Syntax in Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.4 Combinatory Categorial Grammar . . . . . . . . . . . . . . . . . . . 24

2.5 Functional Structure in Harmony . . . . . . . . . . . . . . . . . . . . 27

2.5.1 The Cadence and Harmonic Function . . . . . . . . . . . . . 28

2.5.2 Coordination . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.5.3 Jazz Harmony . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.6 Tonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.6.1 Consonance and Harmony . . . . . . . . . . . . . . . . . . . 31

2.6.2 Just Intonation . . . . . . . . . . . . . . . . . . . . . . . . . 33

2.6.3 Equal Temperament . . . . . . . . . . . . . . . . . . . . . . 35

2.6.4 Harmonic Interpretation . . . . . . . . . . . . . . . . . . . . 36

2.6.5 Example Interpretations . . . . . . . . . . . . . . . . . . . . 39

2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 A Grammar for Tonal Harmony 45

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2 Tonal Harmonic Interpretation Semantics . . . . . . . . . . . . . . . 46

3.2.1 The Lambda Calculus as Notation . . . . . . . . . . . . . . . 47

3.2.2 Interpretation of Tonics . . . . . . . . . . . . . . . . . . . . . 48

3.2.3 Interpretation of Cadences . . . . . . . . . . . . . . . . . . . 50

3.2.4 Coordination of Cadences . . . . . . . . . . . . . . . . . . . 51

3.2.5 Multiple Cadences: Development . . . . . . . . . . . . . . . 53

3.2.6 Colouration: Empty Semantics . . . . . . . . . . . . . . . . . 54

3.2.7 Extracting the Tonal Space Path . . . . . . . . . . . . . . . . 55

3.2.8 Representation as Dependency Graphs . . . . . . . . . . . . . 58

3.3 CCG for Harmonic Syntax . . . . . . . . . . . . . . . . . . . . . . . 59

3.4 A Grammar for Jazz . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.4.1 The Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.4.2 The Lexicon . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.5 Key Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4 Building an Annotated Corpus 73

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.2 Chord Sequence Data . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.2.1 Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4.2.2 Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . . 76

4.3 Annotating a Unique Gold-Standard Interpretation . . . . . . . . . . 77

4.4 Annotation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4.4.1 Analysis Decisions . . . . . . . . . . . . . . . . . . . . . . . 85

4.5 Omissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.5.1 Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.6 Lexical Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5 Statistical Parsing of Chord Sequences 93

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.2 Supertagging Experiments . . . . . . . . . . . . . . . . . . . . . . . 95

5.2.1 Supertagging . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.2.2 N-Gram Supertagging Models . . . . . . . . . . . . . . . . . 97

5.2.3 Using the C&C Supertagger . . . . . . . . . . . . . . . . . . 102

5.2.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.3 Statistical Parsing Experiments . . . . . . . . . . . . . . . . . . . . . 107

5.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.3.2 CKY Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.3.3 Hockenmaier’s Parsing Model for CCG . . . . . . . . . . . . 109

5.3.4 PCCG for Parsing with the Jazz Grammar . . . . . . . . . . . 112

5.3.5 Tonal Space Path HMM Baseline . . . . . . . . . . . . . . . 116

5.3.6 Aggressive Backoff . . . . . . . . . . . . . . . . . . . . . . . 119

5.3.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.3.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5.4 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . 126

6 Parsing Note-Level Performance Data 129

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

6.2 Data Representation: MIDI . . . . . . . . . . . . . . . . . . . . . . . 131

6.3 Adding MIDI to the Jazz Corpus . . . . . . . . . . . . . . . . . . . . 131

6.4 Pipeline Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.4.1 Chord Recognizer as a Frontend to the Supertagger . . . . . . 132

6.4.2 Supertagging a Chord Lattice . . . . . . . . . . . . . . . . . 135

6.4.3 HMM Baseline . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6.4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7 Conclusion 155

Bibliography 159

Table of Examples

2.1 Linguistic CCG function application . . . . . . . . . . . . . . . . . . . 25

2.2 Linguistic CCG derivation with semantics . . . . . . . . . . . . . . . . 26

2.3 Linguistic composition rule . . . . . . . . . . . . . . . . . . . . . . . . 26

2.4 Linguistic coordination rule . . . . . . . . . . . . . . . . . . . . . . . . 27

2.5 Tonal space analysis of the Basin Street Blues . . . . . . . . . . . . . . 39

2.6 Tonal space analysis of the opening of BWV 553 . . . . . . . . . . . . 41

2.7 Tonal space analysis of Autumn Leaves . . . . . . . . . . . . . . . . . . 42

3.1 Function application with the lambda calculus . . . . . . . . . . . . . . 47

3.2 Function application for the semantics of a passive verb . . . . . . . . . 48

3.3 Logical forms for the interpretation of tonics . . . . . . . . . . . . . . . 50

3.4 Derivation of a harmonic interpretation from its constituents . . . . . . 51

3.5 Derivation of a harmonic interpretation using composition . . . . . . . 51

3.6 Conjunction of cadence logical forms by coordination . . . . . . . . . . 52

3.7 Coordinated cadence logical form applied to its resolution . . . . . . . . 52

3.8 Coordinating more than two cadences . . . . . . . . . . . . . . . . . . 53

3.9 Resolution of a dominant onto a coordination . . . . . . . . . . . . . . 53

3.10 Concatenation of two resolved cadences by the development rule . . . . 54

3.11 Concatenation of a tonic with a resolved cadence . . . . . . . . . . . . 54

3.12 Colouration chord with empty semantics . . . . . . . . . . . . . . . . . 54

3.13 Derivation of a cadence logical form constrained by syntactic types . . . 61

3.14 Syntactic derivation of a cadence by composition . . . . . . . . . . . . 62

3.15 Syntactic derivation of a coordinated cadence . . . . . . . . . . . . . . 62

3.16 Syntactic types for a cadence from Can’t Help Lovin’ Dat Man . . . . . 63

3.17 Repeated tonic chords, including a substitution, interpreted by a category

produced by the tonic repetition rule . . . . . . . . . . . . . . . . . . . 65

3.18 Extended cadence derivation with semantics . . . . . . . . . . . . . . . 69

3.19 Derivation of cadence from Can’t Help Lovin’ Dat Man . . . . . . . . . 69

3.20 Rough sketch of a syntactic derivation permitting interpretation of hier-

archical modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4.1 The order of composition and coordination in a derivation does not affect

the result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2 Categories assigned to a cadence from Alfie . . . . . . . . . . . . . . . 82

4.3 Categories assigned to a cadence from Alfie, with schema names . . . . 85

5.1 Chord sequence for a cadence from Can’t Help Lovin’ Dat Man (repeated) 93

5.2 Derivation of an implausible interpretation of Can’t Help Lovin’ Dat

Man cadence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

Table of Acronyms

AA annotation accuracy 89

AST adaptive supertagging (Clark & Curran, 2004a) 97, 103, 125, 127, 135, 137,

CCG Combinatory Categorial Grammar (Steedman, 2000) iii, 2, 5, 9, 15, 20, 24–26,

44–46, 59, 63, 65, 71, 77, 90, 95, 96, 102, 107–109, 113, 114, 155–157

CKY Cocke-Kasami-Younger (Harrison, 1978) 107, 108, 111

DR dependency recovery 88, 89, 123–125, 140, 141, 146, 147, 157

EM expectation-maximization 101, 134

GTTM A Generative Theory of Tonal Music (Lerdahl & Jackendoff, 1983; Jackend-

off, 1991; Lerdahl, 2001) 11–18

HMM hidden Markov model (Rabiner & Juang, 1986) 20, 98–101, 104, 114–118,

127, 133–135, 139, 148–150

MIDI Musical Instrument Digital Interface (MIDI Manufacturers Association, 1996)

131–134, 136–142, 146–153, 157

MIR music information retrieval 21, 130

NLP natural language processing iii, 1–3, 5, 7, 20, 21, 23, 58, 73, 74, 94, 96–98,

100–102, 106, 120, 123, 130, 140, 156, 158

ODR optimized dependency recovery 141, 142, 146–148

PCCG probabilistic CCG (Hockenmaier, 2001) 109, 110, 112, 125–127

TSED tonal space edit distance 88, 89, 123–125, 140, 148, 157

CHAPTER 1 Introduction

Hierarchical structure can be identified in the organization of Western tonal music, for

example in the rhythmic patterns of the melodies and the harmonic progressions that

underlie them. The prosody and syntax of natural languages are commonly analysed

as being organized according to similar hierarchical structures, represented as tree dia-

grams that divide a passage of speech or text recursively into its constituents, down to

the level of individual words. It is, therefore, reasonable to expect that the techniques

used to identify and process these structures automatically in natural language might

profitably be applied to the automatic interpretation of music.

In natural language processing (NLP), analysing the syntactic structure of a sen-

tence is usually a prerequisite to semantic interpretation. The analysis is a non-trivial

task, as a result of the high degree of ambiguity in even moderately long sentences. In

music, a similar sort of structural analysis, over sequences exhibiting a similar degree

of ambiguity, is fundamental to tasks such as key identification and score transcrip-

tion. These and other tasks depend on both a harmonic (tonal) analysis and a rhythmic

(metrical) analysis.

There is a long history of the application of linguistic analysis techniques to musical

analysis (among others, Meyer, 1956; Cooke, 1959; Bernstein, 1976; Smoliar, 1976;

Roads & Wieneke, 1979; Baroni et al., 1983), with varying degrees of formality. Some

of this work has explored various applications of formal grammars to the analysis

of hierarchical structure (Winograd, 1968; Keiler, 1981; Lerdahl & Jackendoff, 1983;

Steedman, 1996; Rohrmeier & Cross, 2009). Meanwhile, in the field of NLP, statistical

modelling, in particular in the form of probabilistic models, has become ubiquitous for

large-scale practical analysis of language (for example Collins, 1997; Hockenmaier

& Steedman, 2002; Clark & Curran, 2004b; Auli & Lopez, 2011). The focus of the

present thesis is on the application of formal grammars and related statistical models

of language to the task of automatically analysing music. I argue that the structures

that underly the harmonic progressions of Western tonal music have a syntax similar

to that of natural languages and that the unconscious processing of these structures by

listeners can be modelled using a formalism similar to those used to model linguistic

syntactic processing. I then address the question that naturally follows: to what extent can the

statistical parsing techniques used to perform automatic linguistic analysis in NLP

be applied to automatic harmonic analysis of music?

This thesis demonstrates that statistical parsing techniques, adapted from NLP with

little modification, can be successfully applied to recovering harmonic structure. It

shows first how a type of formal grammar similar to one used for linguistic syntactic

processing, Combinatory Categorial Grammar (CCG, Steedman, 2000), can be used to

analyse the hierarchical structure of chord sequences. Harmonic structure can be anal-

ysed in terms of relationships between chords expressed in the tonal space of Longuet-

Higgins (1962a,b). Several of the authors cited above have proposed grammars to for-

malize the structure of tonal relationships between chords and the analyses produced

by the grammar presented here bear considerable similarity to these. The proposed

grammar differs from previous work in two main respects. Firstly, it makes the distinc-

tion advocated by Steedman (2000) between semantics – the formal structural analysis

of interest – and syntax – the rules that govern the process of deriving the structure

from an unstructured input. Though the distinction has in general not been made

explicitly, previous work has focused primarily on the former: the structures that underlie

harmony as unconsciously understood by a listener. The explicit separation of these

components of the analysis made by CCG permits an account of the process of syntac-

tic derivation in which the structure of a derivation need not strictly follow the structure

that is derived. As a result, a parser is able to perform a more incremental (that is, left-

to-right) analysis and the grammar may use a less constrained notion of constituency.

Secondly, taking advantage of this latter feature of the formalism, the grammar treats

unfinished cadences (or ‘half-cadences’) in a new way. Whilst maintaining an analy-

sis of extended cadences as structures with a right-branching embedding, it permits a

left-branching embedding of the constituents built during a derivation, allowing unfin-

ished cadences to be treated as constituents. The interpretation of multiple unfinished

cadences as sharing an eventual resolution is structurally analogous to a type of coordi-

nation common in natural language. Together with a handling of modulation similar to

some of the previous work, the result is a grammar capable of analysing a wide range

of musical structures within a particular genre and easily adaptable to other genres.

The present work introduces a formal language similar to first-order predicate logic to

express tonal relationships. The syntactic grammar formalism then serves as a mech-

anism to map unstructured chord sequences onto their semantics represented in this

language. A grammar using the formalism is presented for analysing chord sequences

of jazz standards, which tend to feature particularly complex patterns of structural

embedding in their harmony. Both the formal language of harmonic analysis and the

grammar formalism are further developments of the previous work of Steedman (1996)

and the author (Wilding, 2008). They are described in full in chapter 3.
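The difference between a right-branching and a left-branching derivation of the same cadence can be illustrated with a minimal sketch of two CCG combinators. The category labels and the tiny lexicon below are hypothetical simplifications introduced only for illustration (an atomic label records the root on which a passage starts, and X/Y is a passage starting on X that expects a continuation starting on Y); they are not the categories of the grammar described in chapter 3.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """Forward-slash category result/arg: expects `arg` immediately to its right."""
    result: "Cat"
    arg: "Cat"

    def __str__(self) -> str:
        return f"({self.result}/{self.arg})"


Cat = Union[str, Slash]


def apply_forward(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Slash) and left.arg == right:
        return left.result
    return None


def compose_forward(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward composition: X/Y  Y/Z  =>  X/Z."""
    if isinstance(left, Slash) and isinstance(right, Slash) and left.arg == right.result:
        return Slash(left.result, right.arg)
    return None


# Hypothetical toy lexicon for the cadence Dm7 G7 C (illustration only).
LEXICON = {
    "Dm7": Slash("D", "G"),  # starts on D, expects a passage starting on G
    "G7": Slash("G", "C"),   # starts on G, expects a passage starting on C
    "C": "C",                # a tonic passage starting on C
}

if __name__ == "__main__":
    # Right-branching: resolve G7 onto C first, then Dm7 onto the result.
    inner = apply_forward(LEXICON["G7"], LEXICON["C"])
    print(apply_forward(LEXICON["Dm7"], inner))
    # Left-branching: compose the unfinished cadence Dm7 G7 into a single
    # constituent before its resolution has been seen, then apply it to C.
    unfinished = compose_forward(LEXICON["Dm7"], LEXICON["G7"])
    print(unfinished, apply_forward(unfinished, LEXICON["C"]))
```

Both derivations yield the same category. Composition is what allows the unfinished cadence to be built as a constituent in a left-to-right analysis.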

In NLP, the high degree of ambiguity of syntactic analysis means that a parser must

consider a huge number of possible structures. Chart parsing provides an efficient

mechanism by which to explore them. The addition of statistical models allows the

parser to use information about the frequency of structures seen before in a training

corpus to eliminate improbable interpretations early on in the process and to rank the

final analyses by plausibility. To apply the same techniques to harmonic analysis,

a corpus of chord sequences annotated with good analyses is required. Chapter 4

documents the construction of such a corpus of analyses using the grammar and some

of the difficulties encountered in the process.
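The way a chart lets a parser share work between competing analyses, and the way scores rank them, can be sketched with a small Viterbi-style CKY parser. This is only a sketch under stated assumptions: the labels Dom and Ton, the binary rules and all the scores are hypothetical toy values, not the grammar or the statistical models evaluated in chapter 5.

```python
from collections import defaultdict

# Hypothetical toy lexicon: chord symbol -> [(label, score)].
LEXICON = {
    "Dm7": [("Dom", 1.0)],
    "G7": [("Dom", 1.0)],
    "C": [("Ton", 1.0)],
}

# Hypothetical toy binary rules: (left label, right label) -> [(parent, score)].
BINARY = {
    ("Dom", "Ton"): [("Ton", 0.6)],  # a dominant passage resolving onto a tonic
    ("Dom", "Dom"): [("Dom", 0.1)],  # two dominants grouped before resolving
}


def cky_best(chords):
    """Fill a chart with the best-scoring analysis of every span chords[i:j]."""
    n = len(chords)
    # chart[(i, j)][label] = (score, backpointer)
    chart = defaultdict(dict)
    for i, chord in enumerate(chords):
        for label, score in LEXICON.get(chord, []):
            chart[(i, i + 1)][label] = (score, chord)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for left, (score_l, _) in chart[(i, k)].items():
                    for right, (score_r, _) in chart[(k, j)].items():
                        for parent, score_rule in BINARY.get((left, right), []):
                            score = score_l * score_r * score_rule
                            best_so_far = chart[(i, j)].get(parent, (0.0, None))[0]
                            if score > best_so_far:
                                chart[(i, j)][parent] = (score, (k, left, right))
    return chart


def build_tree(chart, i, j, label):
    """Follow backpointers to recover the best tree for `label` over chords[i:j]."""
    _, back = chart[(i, j)][label]
    if isinstance(back, str):  # lexical entry: the backpointer is the chord itself
        return (label, back)
    k, left, right = back
    return (label, build_tree(chart, i, k, left), build_tree(chart, k, j, right))


if __name__ == "__main__":
    chords = ["Dm7", "G7", "C"]
    chart = cky_best(chords)
    print(chart[(0, 3)]["Ton"][0])
    print(build_tree(chart, 0, 3, "Ton"))
```

Each span keeps only the best-scoring analysis for each label, so the less plausible grouping of the two dominants is discarded as soon as a better analysis of the same span is found, rather than being carried through to the end.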

The present work follows in a long tradition of linguistic-style grammatical analy-

ses of music. It is, however, the first to apply statistical models of grammatical structure

to wide-coverage automatic harmonic analysis. Chapter 5 describes the adaptation of

two statistical parsing techniques and their evaluation: a probabilistic model of gram-

matical derivations and a supertagger, allowing fast approximate parsing. The corpus

is small, which puts limitations on the statistical models that can be trained using it.

The experiments in chapter 5 with supertaggers make it clear that the type of history-

based sequence models tested can only use a small amount of contextual information.

Parsing experiments in chapter 5 show that parsing using the model of derivations

is successful at recovering harmonic analyses, improving substantially over a baseline

Markovian model. Using the derivation model together with the supertagger, the parser

achieves a similar improvement over the baseline with much shorter processing time.

Chord sequences provide a useful abstraction of musical input to demonstrate the

effectiveness of the parsing techniques and it is an assumption of this analysis tech-

nique that segmentation of the input into passages of similar underlying harmony must

feature in any harmonic analysis of musical data at the level of performed or written

notes. Transcribed chord symbols, of the sort used as input to a parser in the experi-

ments of chapter 5, approximate a segmentation into harmonic units that is required for

a harmonic analysis of the notes of a performance or a score. It should, therefore, be

possible in principle at least to extend the same analysis techniques to the interpreta-

tion of data representing an actual musical performance. Such an extension would have

useful applications in many practical tasks, for example in the field of music informa-

tion retrieval, as well as being of interest to constructing a more convincing account

of human cognition of musical structure. Chapter 6 introduces some extensions of

the parsing techniques to handle musical performance data, symbolically represented

as a stream of notes. The models presented are simple and theoretically rather unsat-

isfactory, but serve to demonstrate one manner in which the models previously used

for chord sequence analysis can be extended to this harder task. The results reported

provide a baseline for future work on the task of harmonic analysis of performance data.

CHAPTER 2 Structure in Language and Music

2.1 Introduction

Many tasks in NLP, such as query understanding and sentiment analysis, are dependent

on analysis of the predicate-argument structure of sentences, performed by parsing.

In this thesis, I argue that the task of natural language parsing is analogous to the

musical task of analysis of the tension-resolution structures found in tonal harmony.

This analogy has previously been exploited for harmonic and other types of musical

analysis. The analogy leads naturally to the question of whether the well-developed

techniques applied to the language parsing task in NLP can equally be applied to a

similar parsing task defined for harmony.

This chapter presents an overview of the important background to the present goal

of adapting parsing techniques to harmonic analysis. Section 2.2 surveys previous

work in this field and discusses the present work’s relation to other grammatical ap-

proaches to musical analysis. Sections 2.3 and 2.4 introduce the concept of syntactic

parsing and the grammatical formalism, CCG, which I later take as the basis for a

grammar of harmony. Section 2.5 describes informally the sorts of structures found

in tonal harmony that this work is concerned with analysing. Section 2.6 outlines a

theory of tonal music which provides a formal model for harmonic analysis of those

structures.

2.2 Literature Review: Formal Grammars in the Analysis of Music

The nature of human response to music, its relationship to the musical signal and the

mechanisms by which a signal is interpreted, internally represented and remembered

by a listener have long been a subject of wide-ranging investigation (Euler, 1739;

Helmholtz, 1862; Meyer, 1956; Cooke, 1959; Desain & Honing, 1992). Music as a

communicative system resembles natural languages in that it requires the unconscious

inference of structures ambiguous in the signal in order to be understood by a listener

(Keiler, 1981). Relating these cognitive structures to the meaning conveyed by the

music is a critical part of understanding the nature of musical communication. Besides

studying the sorts of meaning that music is capable of conveying, it is prerequisite to

such an endeavour to explain the cognitive structures that support communication of

musical meaning – the structures underlying perception of, for example, melody, har-

mony and rhythm – in much the same terms as the corresponding question for language

(Longuet-Higgins, 1978).

A listener hearing a sentence in English must be aware of certain linguistic struc-

tures underlying it in order to derive the speaker’s meaning. Inference of the logical

relationships between the entities, actions and events denoted by the words is an es-

sential part of the semantic interpretation of the sentence. Identifying these relations

requires connections to be made between arbitrarily distant words in the sentence.

Similar close relationships exist between musical elements linearly (that is, chronolog-

ically) distant in the signal processed by a human listener. In both music and language,

the structural organization underlying the signal plays an essential role in interpreting

and memorizing it and, in both cases, this is performed unconsciously by the listener.

This observation is fundamental to much work that has drawn on the links between

music and language (Meyer, 1956; Longuet-Higgins, 1962a,b; Smoliar, 1976; Lerdahl

& Jackendoff, 1983). Cooke (1959, p. 33) takes it as the basis for an attempt to explain

the nature of musical emotional expression. He describes the technical construction of

music as the ‘magnificent craftsmanship whereby composers express their emotions’,

claiming that it is ‘unintelligible to the layman, except emotionally’. In other words, he

recognizes that structural organization must play a part even for an untrained listener

in the emotional effects of music, albeit unconsciously.

In this section, I examine some formal approaches to characterizing the struc-

tural processing of music, and in particular harmony, performed by a listener. I relate

them to a particular approach to a related task in NLP – the analysis of the semantic

predicate-argument relations in sentences by syntactic parsing using formal grammars.

2.2.1 Formalizing Music Theory

There is a long history in Western music theory of informal description of musi-

cal structure, most commonly as an aid to composers (for example, Cooke, 1959;

Schenker, 1906). Many authors have advocated the formalization of intuitions regard-

ing musical organization, some drawing on formal tools from linguistics (see below)

and others on other means of formal description, including imperative programming

languages (Smoliar, 1976; Forte, 1967; Longuet-Higgins & Steedman, 1971; Baroni

et al., 1983; Temperley & Sleator, 1999). Baroni et al. (1983) define the fundamen-

tal role of scholarly discussion of music as providing a theoretical framework which

may serve ‘as a formal model for the phenomena it describes and thus be capable of

“explaining” the complexity of music as an instrument of communication and culture.’

Temperley (2007) advocates approaching the theory of music cognition by proposing

computational models on the basis that, whilst the ability of a model to support auto-

matic computation does not prove its suitability as a model of human cognition, it does

satisfy an important requirement of a plausible model.

In his series of six Norton Lectures, Bernstein (1976) proposed the application of

Chomsky’s (1965) formal grammars to the analysis of music. Although the specifics of

Bernstein’s proposal have been widely rejected for a range of reasons, his idea served

as the inspiration for several lines of research exploring the formal and cognitive corre-

spondences between music and language. In a response to Bernstein’s proposal, Keiler

(1978), whilst supporting the exploration of application of formal grammars to musi-

cal analysis, urges caution in drawing correspondences. The approach he proposes is

to look for connections between linguistic and musical analysis only where they are

dictated by characteristics that arise independently in the two domains, emphasizing

the dangers of beginning with assumptions regarding the specific correspondences we

expect to find. Lerdahl & Jackendoff (1983) make a similar point, observing that their

own analysis turns out not to resemble linguistic theories very closely. Katz & Peset-

sky’s (2011) recent approach, on the other hand, is quite different. Although they take

Lerdahl & Jackendoff’s (1983) analysis as their starting point, they attempt to show that

it can after all be re-expressed in terms very similar to linguistic theories. Contrary to

Keiler’s argument, they begin with the hypothesis that music and language are prod-

ucts of a single cognitive system and that, therefore, theories of musical and linguistic

structure should be maximally aligned. Bernstein’s was one of several early propos-

als for the formal analysis of musical cognition using grammars (Winograd, 1968;

Lindblom & Sundberg, 1969) and subsequently a variety of approaches have been ex-

plored (Longuet-Higgins, 1978; Keiler, 1981; Lerdahl & Jackendoff, 1983; Steedman,

1984; Johnson-Laird, 1991; Steedman, 1996; Pachet, 2000; Chemillier, 2004; de Haas

et al., 2009; Rohrmeier & Cross, 2009), to which much of the remainder of this review

will be devoted. Longuet-Higgins & Lisle (1989) sketch an application of Chomskian

grammars to music in which a language corresponds to a musical idiom, an utterance

to a composition in that idiom and logical meaning to affective meaning. The correspondence

is the same as that used by Lerdahl & Jackendoff (1983) (though Longuet-

Higgins & Lisle claim their generative theory of music to be closer to the Chomskian

paradigm).

2.2.2 Syntax and Semantics

Steedman (2002) makes a connection between certain fundamental operations involved

in syntactic processing and operations in the reasoning that underlies planning a series

of actions in order to achieve a goal. The importance of this connection is that it sup-

ports a theory of human linguistic processing which is deeply connected to the more

general human capacity to represent and reason about actions and their consequences,

a connection suggested by neurological literature on language processing and child

language acquisition. If the unconscious structural organization of a linguistic signal

can be explained in terms of a set of operations having their origin in motor planning, it

is likely that a similar explanation can be provided for the organization of structurally

similar musical relationships, such as those formalized by Keiler (1981) or Johnson-

Laird (1991). Furthermore, the possibility that operations from the same evolved ca-

pacity for planning can offer an explanation of cognition of both language and music

provides new grounds for Katz & Pesetsky’s (2011) programme searching for a theory

of musical competence that resembles linguistic theory as closely as possible in the

hope that the result may shed light on the cognitive capacity common to the two¹.

¹ Honing (2011a,b) even suggests, on the basis of experiments with young babies, that certain types of cognitive processing of music might be more primitive than language processing, although his experiments concern perception of pitch and metre at a level which has little impact on an explanation of our capacity to process complex musical structures.

Steedman (2000) argues for a theory of syntax that differs from that underlying many earlier treatments of linguistic grammar. Rather than being seen as a level of representation of linguistic structure, it is treated as a characterization of the process by which a semantic interpretation is derived compositionally from the language’s spoken form. The semantic interpretation includes a logical form, a representation of the predicate-argument relationships expressed in the surface form. The syntactic component of a grammar serves to enforce language-specific constraints on the ordering and combination of constituents in mapping the linguistic signal to its interpretation. Under this approach, the meaning representation is quite separate to the syntactic constructs available to derive it from sentences. The result is that an account of syntactic processing need not strictly reflect the structure of the meaning representation in the intermediate structures it builds. For example, the fact that the logical form of a particular sentence takes the form of a right-branching structure need not prevent the left-branching syntactic derivation that better explains a hearer’s ability to perform an incremental analysis. Steedman presents the grammar formalism of CCG, which expresses a transparent connection between a compositional semantics and the syntactic constituents by which it is induced from the surface form and which permits the required non-standard derivational structures.

A feature by no means unique to this breed of linguistic theory, but made more explicit by the transparent pairing of syntax and semantics, is that the purpose of specifying a grammar for a natural language is not to define a set of sentences permissible in a language, nor to provide a test for the permissibility of a particular sentence, but to explain the relationship between the elements of a sentence and the structure of the sentence’s meaning. Lerdahl & Jackendoff (1983) note that a misunderstanding of the role of a formal grammar in this way has led some to a mistaken understanding, or indeed a complete rejection, of the applicability of formal grammars to a theory of music. This separation also provides an answer to Narmour’s rejection of a theory of music based on Chomsky’s (1965) transformational grammars. Narmour (1977, pp. 116–119) argues against a transformational grammar approach to music (in particular, against a grammatical formalization of Schenkerian theory) on the grounds that it is impossible to separate the structure of the interpretation from the structure that derives it. He argues about Schenkerism in general that it fails to separate analysis from methodology. Steedman’s theory of grammar, however, does just that in its explicit separation of logical semantics from the rules of syntax constraining its derivational process.

Longuet-Higgins (1978) points out that, whilst many authors had previously at-

tempted to define the meaning expressed by music (Hindemith, 1942; Tovey, 1949;

Meyer, 1956; Cooke, 1959) and its relation to musical structure, none had addressed

the issue of providing a theory of the cognitive structures that underlie our ability to

interpret, remember and recognize music. That is, they do not explain the structures

that we perceive when we hear a piece of music and how they relate to the sound sig-

nal. This question corresponds to the question in the linguistic domain of what logical

structures are conveyed by a speech signal (for example the logical forms discussed

above) that allow us to interpret it with respect to some notion of the real world and

how the speech signal is processed to retrieve them. In formalizing musical structure

and the syntax that relates it to a musical surface form, it is not particularly impor-

tant that the meaning expressed by music is of a very different kind to that typically

expressed by language. London (2011) rejects a linguistic-style treatment of musical

syntax for several reasons, among them its lack of referential-semantic content. This

criticism is put in a different light by Steedman’s (2000) view of syntax. A logical

form such as eats′(keats′,beets′) for the sentence Keats eats beets cannot serve as an

interpretation of the sentence’s meaning with respect to a model of the world without

some connection between the predicate eats′ and the familiar concept of consuming

food, and so on. Such a connection is assumed to be available in the interpretation

of the sentence as far as the logical form is concerned in the form of a model theory.

Regardless of the meaning associated with eats′, it is essential to understanding the

denotation of the sentence that the listener recognize the particular predicate-argument

structure that the logical form expresses. Likewise, regardless of the affective or in-

dexical meaning that might follow (in the style of Meyer, 1956 or Cooke, 1959), in

order to build the prerequisite cognitive structure to interpret the chord sequence D7

G7 C in the key of C major it is essential to recognize the dominant relation of the G7

with respect to the C and the secondary dominant relation of the D7 to the G7. Despite

the absence of a full account of musical meaning, we do have a fairly good knowledge

of many of the structures essential to perception of music: phrases, metre, polyphony,

harmony, etc. This review focuses on work whose goal is to characterize the cognitive

structures and processes that support the perception of musical meaning. I, therefore,

will not talk here about work on the types of meaning conveyed by music or how they

relate to these structures. (For a thorough review, see Monelle, 1992.)
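The dominant relations in the D7 G7 C example above can be made concrete with a minimal sketch. It assumes a deliberately crude test that is not part of any of the theories reviewed here: one chord is treated as the dominant of another whenever its root lies a perfect fifth (seven semitones) above the other's root, ignoring chord quality. The function names are hypothetical and chosen only for illustration.

```python
# Pitch classes of the natural note names; '#' and 'b' shift by one semitone.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def root_pitch_class(chord):
    """Return the pitch class (0-11) of a chord symbol's root, e.g. 'D7' -> 2."""
    pc = NOTE_TO_PC[chord[0]]
    for ch in chord[1:]:
        if ch == "#":
            pc += 1
        elif ch == "b":
            pc -= 1
        else:
            break  # the rest of the symbol describes chord quality
    return pc % 12


def is_dominant_of(chord, target):
    """True if the root of `chord` lies a perfect fifth above that of `target`.

    Assumption for illustration: only roots are compared, not chord types.
    """
    return (root_pitch_class(chord) - root_pitch_class(target)) % 12 == 7


def dominant_relations(chords):
    """List the (dominant, resolution) pairs found between adjacent chords."""
    return [(a, b) for a, b in zip(chords, chords[1:]) if is_dominant_of(a, b)]


print(dominant_relations(["D7", "G7", "C"]))
```

For the sequence D7 G7 C this yields the pairs (D7, G7) and (G7, C), corresponding to the secondary dominant and dominant relations respectively. A flat list of pairs is of course weaker than the recursive structures discussed in the following sections; it serves only to show what kind of relation must be recognized.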

2.2.3 Musical Grammars

In this section, I discuss in more detail several accounts of structure in tonal music that

directly relate to the present thesis. In particular, I consider applications of linguistic-

style grammars in the light of the above view of the role of syntax. In these terms, most

of the works discussed are primarily concerned with a theory of the structures that con-

stitute the semantics of music (in the same sense in which a logical form expresses the

semantics of a sentence) and less with a theory of the computational process required

to derive the structures from a musical surface.

2.2.3.1 Lerdahl and Jackendoff

A Generative Theory of Tonal Music (GTTM, Lerdahl & Jackendoff, 1983; Jackend-

off, 1991; Lerdahl, 2001) has been one of the most influential works on the application

of theories inspired by linguistics to music. GTTM sets out a thorough account of cer-

tain types of structure underlying music. Their analysis begins with two independent

structures: grouping structure, representing the hierarchical segmentation of the music

into phrases, and metrical structure, representing the organization of rhythm to align

with a number of levels of regular patterns formalized as a metrical grid. Both of these

precede and contribute to two further structures: time-span reduction, denoting the rel-

ative structural importance of notes, and prolongation reduction, a structure of tension

and resolution in melody. The structures are derived by a collection of semi-formal

preference rules governing the order in which notes are connected to the structures and

the type of relationship expressed by the connection. Lerdahl & Jackendoff (1983)

state that the theory is concerned not with the cognitive processes of a listener, but

rather with ‘the final state of his understanding’. Jackendoff (1991) describes GTTM

as ‘an account of the abstract structures available to the listener’ and ‘of the principles

available to the listener to assign abstract structure to pieces of music’. Thus GTTM

makes an important contribution to the formalization of the cognitive structures that

underlie music, but does not attempt to represent any aspect of the process by which

they are unconsciously produced. Jackendoff (1991) further outlines some principles

for the construction of an interpretative mechanism, a parser, that explain how the lis-

tener can build the structures in real time. These include an approach to ambiguity

now common in statistical parsing of natural language and applied to musical parsing

in the present thesis, though Jackendoff does not propose the use of statistics to model

the relative plausibility of ambiguous interpretations.

Clarke (1986) claims that Lerdahl & Jackendoff do not take due consideration of

Chomsky’s distinction between competence – an idealized system of linguistic knowl-

edge shared by native speakers – and performance – the practical issues that come

into play in the actual use of a language for communication. From their aim of mod-

elling the final state of understanding of an idealized experienced listener, Lerdahl &

Jackendoff appear, like Chomsky, to position their work firmly in the domain of com-

petence. Clarke (1986) questions this classification, pointing out several psychological

aspects of the theory discussed in GTTM, suggesting that they are after all interested

in the implications for a theory of performance. Nevertheless, a theory of competence

grammar cannot be divorced from issues of performance, since the two must have de-

veloped together as part of a single system (Steedman, 2000, p. 261). This holds for

musical grammar just as for linguistic grammar, since music serves its primary com-

municative purpose through its capacity to be interpreted unconsciously by a listener

in real time. A competence grammar must be able to support a plausible theory of a

performance mechanism, providing, for example, any required notions of constituency

and incrementality. A theory of what is computed, as opposed to one of how the

computation is carried out (to use the terminology of Johnson-Laird, 1991), should,

therefore, not eschew consideration of the computational properties of the processing

that it entails. Rosner (1984) points out the particular danger of dismissing as irrele-

vant this aspect of a theory while making such bold claims as Lerdahl & Jackendoff

do regarding the innateness of musical cognition and consequent universality of some

aspects of their theory.

Jackendoff (1991) sets out four necessary components of a theory of music percep-

tion, accounting for (1) abstract structures available to a listener; (2) the principles by

which a listener may assign these structures to music; (3) how the principles are applied

in real time and (4) the facilities in the mind for applying the principles. He claims that

Lerdahl & Jackendoff (1983) addressed (1) and (2) and sketches an approach to pro-

viding (3) for GTTM. The outlined computational model, a parallel multiple-analysis

parser, is essentially the approach to parsing widely accepted as the basis for statistical

parsers in the natural language parsing community, though Jackendoff does not link the

notion of plausibility to anything as concrete as a statistical model. Simply augmenting

the rules of GTTM with a system of priorities, as originally suggested by Lerdahl &

Jackendoff (1983) and implemented by Hamanaka et al. (2006), might appear to take

a step towards answering the criticism of, among others, Longuet-Higgins (1983) that

GTTM fails to constitute a formal analysis. However, Clarke (1986) notes that Ler-

dahl & Jackendoff are mistaken in viewing the indeterminacy of GTTM as a reflection

of the inconsistency of human analysis: instead, the model should provide definite

analyses, with specific points of divergence between alternative, but nonetheless well-

defined, analyses. The theory should explain in precise terms how a human listener

can construct an unambiguous analysis of some passages of music. Where ambiguity

arises in human interpretation, it should be accounted for as a choice between multiple

alternative interpretations, each deemed acceptable by the theory. Jackendoff (1991)

takes quite a different view of ambiguity to Hamanaka et al. (2006), closer to Clarke’s

(1986). His proposed parser would in principle be capable of outputting multiple fully

formed analyses.

The similarities between GTTM and Schenkerian analysis (Schenker, 1906) are

seen by the authors as a happy coincidence, formalization of Schenker’s analysis not

having been a goal in GTTM’s construction. Despite this, GTTM shares with Schenkerian

analysis two central and questionable characteristics. Firstly, hierarchical structure is

treated in terms of reductions from one musical surface to another more sparse, but in

essence similar, form. Secondly, a single analysis procedure is applied from the lowest

levels of abstraction – the relationships between individual notes – to the highest – the

overall structure of sections or keys. The use of reductions as the basis for musical

structure is stated in the strong form that the authors adopt, the strong reduction hypothesis.

The essence of this is the idea that the events that make up the musical surface can

be organized into a hierarchical structure of relative perceptual importance. However,

it goes further in assuming that a musical surface can be analysed by a reduction of ad-

jacent constituents, each headed by a single pitch event (note or chord of simultaneous

notes). This appears a reasonable principle, for example, when reducing a suspension

and resolution to just the resolution and thereafter treating the passage as if it were the

resolved chord alone. However, in other circumstances it appears less reasonable. In

an extreme case, for example, the same principle leads to the assumption that a whole

section of a piece is subconsciously identified by a listener with its most salient pitch

event in determining its prominence with respect to surrounding units. It is similarly

questionable in a model of cognition that high-level structural forms which unfold over

long periods of time, such as sonata form, should be a part of the same analysis mech-

anism that interprets relationships at a local level. A quite separate formalism may be

appropriate, such as one based on the models of periodic patterns suggested by Simon

& Sumner (1968).

Marsden (2010) has addressed the possibility of a computational system that di-

rectly models the process of Schenkerian analysis. Unsurprisingly, his system suffered

from high computational complexity and needed to use aggressive pruning techniques

to eliminate unpromising analyses. However, there is a more fundamental problem

in the present context, even if a computational procedure can be defined to produce

satisfactory Schenkerian analyses. Any such approach suffers from the serious draw-

back that, in contrast to GTTM, it remains a mystery what Schenker’s structures aim

to represent (Temperley, 2011) and it is not clear that there is any reason to suppose

they formalize a cognitive structure built by a listener.

The component of GTTM that relates to the type of harmonic structure examined in

the present thesis is prolongation reduction. The rules for construction of prolongation

reduction trees lead to some structures closely related to those formalized by Keiler

(1981), Steedman (1996) and Rohrmeier (2011). The most fundamental difference between

the analyses of these authors and GTTM is that their structures are strictly confined to

expressing harmonic relationships between chords and tonal regions, whilst the struc-

tures of prolongation reduction incorporate in a single set of rules an analysis ranging

from relationships between individual notes at the lowest level to the highest level of

form dominating the entire piece.

Lerdahl (2001) relates the hierarchical structures of prolongation reduction to a hi-

erarchical model of tension. He notes that this notion of tension is only one of a variety

of musical phenomena that are described as tension and that his model captures only

a notion of tension related to harmonic stability. The degree of tension is related to

the level of embedding of harmonic structure and is presumed to be directly percepti-

ble and quantifiable by a listener (provided it can be distinguished from other sources

of tension). Many other authors who propose hierarchical formalizations of harmonic

structure, including those discussed below and the present thesis, do not make a di-

rect link between the depth of embedding of harmonic relations and perceived tension.

Furthermore, a connection between cognitively constructed harmonic structure and im-

mediately perceived tension is incompatible with an account of the mental process of

construction of the structures that permits multiple ambiguous structures to be main-

tained simultaneously and disambiguated by later musical events (as proposed, for

example, by Jackendoff, 1991), since this entails that a listener must be able to modify

their immediate perception of potentially quite distant past events.

2.2.3.2 Katz and Pesetsky

Recently, Katz & Pesetsky (2011) have reassessed GTTM in an attempt to align the the-

ory more closely with linguistic theory. They present a persuasive argument for a spe-

cific correspondence between language and music which they call the identity thesis, in

which the two are distinguished by their building blocks but have in common the struc-

tural operations that must be applied to them to interpret linguistic or musical surfaces.

This proposal is connected to the relationship proposed by Steedman (2002) between

the basic operations of linguistic processing and a general human capacity for planning.

Katz & Pesetsky claim that subsequent developments in linguistics suggest new

ways of looking at GTTM in which many former obstacles to unification of the theo-

ries no longer appear as problematic. They furthermore highlight gaps in the theory,

proposing modifications to remedy them, the result of which is a closer correspon-

dence to linguistic theories. The present thesis proposes a view of the correspondence

between language and music similar to the identity thesis. The major theoretical differ-

ences arise from the quite different tradition of linguistic syntactic theory from which

CCG comes and a view of the harmonic component of musical structure more closely

related to the proposals of Keiler (1981) and Rohrmeier (2011) discussed below than

to GTTM.

2.2.3.3 Temperley

Temperley (2001) introduces a new model of musical structure which, like GTTM, is

based on the concept of preference rules. Its aim is to model the cognitive processes

of a listener and for this reason Temperley rejects the many aspects of music theory

which have as their goal aiding listeners by suggesting new ways of experiencing mu-

sic. He argues that Schenkerian theory is primarily designed for this purpose and not to

model cognitive processes and, therefore, does not attempt to follow Schenkerian anal-

ysis. Whilst closely related to some aspects of GTTM, the model discards GTTM’s

third and fourth components, time-span reduction and prolongation reduction, which

he claims are the less psychologically well established and more controversial. Unlike

GTTM, the rules are precisely defined and computationally implementable, and were

implemented in the Melisma Analyzer program (Temperley & Sleator, 1999). Tem-

perley acknowledges the importance of incrementality and ambiguity in the process

of interpretation and uses an approach similar to Jackendoff’s (1991) suggestion to

incorporate them into the model.

Later, Temperley (2007) argues for probabilistic models of musical structure and

cognition as a more satisfactory means of modelling ambiguity than preference rules.

He reviews a variety of work that has previously proposed a probabilistic treatment

of music and suggests Bayesian models of many aspects of musical structure. I shall

return to the subject of probabilistic models of music below. Temperley (2009) extends

his previous models to relate harmony, metre and stream structure to one another in a

single generative model accounting for rhythm and pitch. The model is also imple-

mented in a new version of the Melisma Analyzer.

Temperley approaches harmonic analysis in both of these works as the problem

of identifying a sequence of chord roots underlying the musical surface, noting that

given a solution to the problem of key identification this is equivalent to a Roman nu-

meral analysis. This approach does not address the issue of structured relationships

between chords and the models are unable to capture long-distance dependencies be-

tween chords. Temperley (2001) mentions that hierarchical structure in harmony and

tonality is an important issue that he does not consider. However, to a large degree,

models may be developed separately for the identification of chord roots from the

musical surface of notes and the identification of the relationships between chords. As

we shall see below, models have been proposed for analysis of the latter that assume

that another component of the system is responsible for segmenting the notes of the

surface into chords and identifying roots. That is not to say that the two components

may not interact and influence one another, and the architecture of the model provides

an interesting proposal for a means of capturing interactions between metrical and

other musical structures, such as harmony. It constitutes a general proposal for uni-

fying models of different aspects of musical structure in a general model of musical

cognition.
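The equivalence noted above between a chord-root sequence and a Roman numeral analysis, once the key is known, can be illustrated with a short sketch. It is not Temperley's implementation. It assumes a single fixed key, roots given as note names, and one conventional spelling for each chromatic degree (for example bVII rather than #VI); these labels and the function names are assumptions made for illustration.

```python
# Pitch classes of the natural note names; '#' and 'b' shift by one semitone.
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Assumed labels for each semitone distance above the tonic.
DEGREE_NAMES = {
    0: "I", 1: "bII", 2: "II", 3: "bIII", 4: "III", 5: "IV",
    6: "#IV", 7: "V", 8: "bVI", 9: "VI", 10: "bVII", 11: "VII",
}


def pitch_class(name):
    """Return the pitch class (0-11) of a note name such as 'C', 'F#' or 'Bb'."""
    pc = NOTE_TO_PC[name[0]]
    for ch in name[1:]:
        pc += {"#": 1, "b": -1}[ch]
    return pc % 12


def roman_numerals(roots, key):
    """Relabel each chord root by its scale degree relative to the tonic `key`."""
    tonic = pitch_class(key)
    return [DEGREE_NAMES[(pitch_class(root) - tonic) % 12] for root in roots]


print(roman_numerals(["D", "G", "C"], key="C"))
print(roman_numerals(["D", "G", "C"], key="G"))
```

The same three roots are labelled II, V, I in C major but V, I, IV in G major, which is why root identification alone is not a harmonic interpretation until the key is fixed. The relabelling remains a flat sequence and says nothing about the relationships between the chords.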

2.2.3.4 Keiler

Keiler (1978, 1981) offers a different application of linguistic-style analysis to music

from GTTM. He argues for the development of a theory of musical analysis in which

the steps by which an analysis is derived are explicit and objective. Like Lerdahl &

Jackendoff (1983) and Longuet-Higgins (1978), he emphasizes the importance of a

theory of the internal organization of the musical or linguistic object. He argues that

the definition of discovery procedures which yield analyses of this organization in an

explicit form is important because examination of the implications of the discovery

procedures allows us to criticize the proposed structures themselves.

The work of Rameau (1722) is the basis for the most common forms of harmonic

analysis today. His work introduced the analysis of tonal harmony as having an un-

derlying series of chords and described principles governing the progression between

chords. A modern form of harmonic analysis based on this theory has formed the basis

for the most common class of approaches to formal and computational harmonic anal-

ysis. This includes those whose end product is an unstructured sequence of segments

labelled with chord names (for example Ulrich, 1977; Pardo & Birmingham, 2002;

Sheh & Ellis, 2003; Bello & Pickens, 2005; Ni et al., 2011), and those concerned with

Roman numeral analysis, relating each chord to an underlying key (for example Raphael &

Stoddard, 2004; Temperley, 2007). Riemann (1893) introduced an approach to for-

malizing the principles of chord progression based on the concept of chord function

(having its origin in the theory of Euler, 1739). The analysis of harmony presented

by Keiler is in the Euler-Riemannian tradition of functional harmony. The goal is

the identification of relationships between chords, potentially spanning long passages.

The structures, analysed as trees, include a notion of recursion and are closely related

to those described by Winograd (1968), Steedman (1984, 1996), Rohrmeier (2011)

and the present thesis. In Steedman’s (2000) terms, such a tree describes the harmonic

semantics underlying the musical surface and need not necessarily correspond to the

derivational structure that produced it from the surface.

Lerdahl & Jackendoff (1983, notes, p. 338) make several criticisms of Keiler’s

approach, some potentially relevant to related approaches (including Steedman, 1996;

Rohrmeier, 2011; and this thesis). The first is that the theory appears to be restricted to

a particular idiom of tonal music. Of course, a theory concerned with tonal harmonic

structure is somewhat limited in its applicability by definition: non-harmonic musics

are not subject to such analysis. A reasonable starting point for a theory which, like

Keiler’s, sets out to account for the cognition of harmonic structure in a concise set of

rules is to begin by formalizing properties observable in some idiom, but believed to

generalize beyond it. Lerdahl & Jackendoff argue themselves that such a theory should

aim to capture the intuitions of a listener experienced in a particular musical idiom.

Keiler’s analyses may be seen, like those of GTTM, as examples of the application of

a grammar for a particular idiom. Under something like the correspondence between

grammars of language and music suggested by Longuet-Higgins & Lisle (1989), other

idioms can be expected to require at least some changes to the grammar.

Most of the remaining criticisms are based on the inability of the grammar exem-

plified in Keiler’s constituent analyses to handle specific constructions. Far from being

fatal criticisms of the approach, they are observations on the limitations of the specific

grammar – hardly surprising given the preliminary nature of the discussion and the

limited scope possible in a book chapter. In the present thesis, I shall adopt the view of

a grammar of tonal harmony which attempts to capture harmonic analyses of a specific

musical genre, but which does so by proposing rules of which some apply generally to

a broad spectrum of Western harmonic genres.

2.2.3.5 Rohrmeier

Among recent work on computational analysis of harmony, that of Rohrmeier (2011)

has particular bearing on the approach of the present thesis. Like the harmonic analyses

of Keiler (1978), Steedman (1984, 1996) and Wilding (2008), Rohrmeier’s grammar

assigns a structure to chord sequences. However, unlike the previous work, it provides

grammatical rules demonstrated to be capable of interpreting a wide range of musical

inputs. Like all of these approaches, but in contrast to GTTM, Rohrmeier rejects the

idea of extending the same grammatical analysis to higher levels of structure (such as

the overall form of a piece), stating that the relationship between harmonic structure

and higher-level structure should be viewed as the subject of a separate study.

The harmonic theory embodied in Rohrmeier’s grammar is based on the Rieman-

nian tradition of functional chord analysis (Riemann, 1893). The grammar incorpo-

rates several levels of analysis. At the lowest level, a chord expressed in the musical

surface is interpreted as a Roman numeral scale degree within the current key. At

this level, the analysis resembles the Roman numeral analysis of Raphael & Stoddard

(2004). The next level, the functional level, captures a recursive structure of domi-

nant, subdominant and tonic regions as a phrase-structure grammar. At this level, a

region, made up itself of embedded regions, may be interpreted as fulfilling a particu-

lar function within a larger region. At the highest level, the phrase level, local regions

individually analysed as recursive functional structures are combined in sequence into

a single tree that serves as an interpretation of the whole piece.

Although notational differences between the formalisms obscure the connection,

Rohrmeier notes that his grammar is closely related to Steedman’s (1996). The har-

monic structures that the grammar is capable of analysing are very similar to those han-

dled by Wilding (2008) and the present thesis, differences arising largely as a result of

the musical genres that have served as the primary object of study during development

of the grammars.

De Haas et al. (2009) implemented a parser that uses the grammar in a system for

automatic harmonic interpretation. Certain additional constraints needed to be placed

on the grammar’s rules, including limiting modulation, to reduce the ambiguity of in-

terpretation and make parsing practicable. As in natural language parsing, a full parse

using Rohrmeier’s grammar often results in a large number of ambiguous interpreta-

tions. In de Haas et al.’s implementation, a single interpretation was chosen by a score

computed from hand-set weights on each rule. Experience in computational linguistics

has demonstrated that realistic grammars for practical wide-coverage parsing of natural

languages exhibit a high degree of ambiguity. A grammar of harmonic structure which

permits the interpretation of free modulation between keys can be expected to deliver

a very large number of possible interpretations of any reasonable length of chord se-

quence. This thesis examines some ways of using techniques adapted from the parsing

problem in natural language to address the ambiguity of harmonic interpretation in a

parser using a grammar similar to Rohrmeier’s.

2.2.3.6 Longuet-Higgins

Longuet-Higgins (1962a,b) presents a view of the formalization of the perceived mu-

sical object, on the surface quite different to those described above. Tonal harmonic

analysis is related to a formal representation of music theoretic relations between notes

as a discrete three-dimensional tonal space. The tonal space, which has its origins in the

work of Euler (1739), Helmholtz (1862) and Ellis (1874), formalizes theories of West-

ern tonality. Longuet-Higgins & Steedman (1971) formalize as a computer program

the intuitions of a listener that lead to the unconscious interpretation of a sequence of

notes of a melody played on a keyboard in terms of the distinctions made in the tonal

space. Longuet-Higgins (1978) acknowledges that more rules than those implemented

by the program must exist to guide the listener through the tonal space, for example

when a piece modulates to a new key.

The most important contribution of this work was to demonstrate that the tonal space

provides a theoretical model on which to base a computational theory of the perception

of tonal music. As discussed above, several authors have described hierarchical recursive

structures of dependencies underlying the organisation of tonal harmony (Keiler,

1978; Steedman, 1984, 1996; Pachet, 2000; Rohrmeier, 2011). Whilst tonal analysis

in the space of Longuet-Higgins (1962a) may seem at first quite unrelated to the sort

of analysis suggested by these authors, an interpretation of the tonal relations between

the roots of the chords is implied by, for example, Keiler’s tree structures, since they

represent the tonal relations that connect non-adjacent chords. The fragmentary CCG

grammar of Steedman (1996) shows how both of these relationships may be thought

of as components of the harmonic semantics of the music, in Steedman’s (2000) terms.

Wilding (2008) extends Steedman’s grammar to cover a wider range of tonal jazz chord

sequences. A harmonic grammar constructed in this way provides a formal connection

between harmonic structure and music theoretic tonal interpretation of chord roots and,

consequently, the notes of the musical surface.

Pachet (2000) proposes a system for hierarchical analysis of chords as derived

from familiar structures (for example AABA) and patterns (for example cadences and

turnarounds) resembling a generative grammar. He suggests that harmonic semantics

viewed in terms of the tonal space may provide a valuable alternative to a purely syn-

tactic treatment of harmonic structure.

2.2.4 Probabilistic Music and NLP

A common approach to probabilistic modelling of music has been to use Markov mod-

els, in which the probability of each event is dependent only on the preceding event.

Early models suffered from accounting for patterns without a notion of their depen-

dence on an underlying structure (Temperley, 2007). Hidden Markov models (HMM,

Rabiner & Juang, 1986) assume some level of musical structure underlying the events

of a musical surface, which itself is characterized by Markov models. Piston (1949,

p. 17) provides an early example of this approach in the form of a ‘table of usual root

progressions’, listing the most common transitions between chords in a Roman nu-

meral analysis, with some transitions that occur ‘sometimes’ or ‘less often’. HMMs

have been widely applied to musical analysis. Raphael & Stoddard (2004), for exam-

ple, use an HMM to perform analysis of keys and Roman numeral chords underlying

a musical performance. Illescas et al. (2007) present a graph-based computational

model of harmonic analysis based on the Riemannian notion of chord function and

also exploiting regularities in the transitions between consecutive chords. Such mod-

els may be sufficient for many practical purposes and are attractive for having a diverse

and well-studied array of algorithms for training and efficient analysis. However, they

fall short as a model of musical cognition, since they are unable to capture any long-

distance relationships in the underlying structures (Johnson-Laird, 1991). An HMM

is only able to capture dependencies between adjacent structural elements and, where

non-adjacent relationships exist, they are treated as noise.
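The limitation can be made concrete with a minimal sketch of a first-order Markov model over chord symbols. The toy corpus of Roman numeral sequences and the function names train_bigram_model and sequence_probability are invented for this illustration and are not taken from any of the systems cited above.

```python
from collections import Counter, defaultdict


def train_bigram_model(sequences):
    """Estimate P(next chord | previous chord) by relative frequency."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for prev, following in counts.items()
    }


def sequence_probability(model, seq):
    """Product of the transition probabilities in seq; unseen transitions score 0."""
    prob = 1.0
    for prev, nxt in zip(seq, seq[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob


# Invented toy corpus of Roman numeral sequences (hypothetical, not thesis data).
corpus = [
    ["I", "IV", "V", "I"],
    ["I", "II", "V", "I"],
    ["I", "VI", "II", "V", "I"],
]
model = train_bigram_model(corpus)
print(sequence_probability(model, ["I", "II", "V", "I"]))
```

Each factor in the product conditions only on the immediately preceding chord, so nothing in a model of this form can relate a chord to a resolution that arrives several chords later.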

Lindblom & Sundberg (1969) reject stochastic models on the basis of the inabil-

ity of Markovian models to capture higher-level structures, claiming certain parallels

between the structures underlying language and music. Many of the authors discussed

above have proposed models of musical structure that include structural relationships

between elements not adjacent in the musical surface. Over the intervening decades,

research in computational linguistics has shown Lindblom & Sundberg’s rejection of

all stochastic models to be mistaken, demonstrating that probabilistic models can be

used effectively alongside grammars that capture long-distance dependencies. Recent

research in NLP has seen great progress in the use of statistical models, trained on

large corpora of linguistic data, to overcome the problems of linguistic ambiguity

(Collins, 1999; Hockenmaier, 2003; Charniak, 1997; Nivre, 2010). Statistical parsing

is motivated both by the practical need to make feasible wide-coverage parsing despite

the many ambiguous analyses theoretically possible and by the search for ways of

modelling the heuristic, incremental processing of language that humans perform (At-

tneave, 1959; Hale, 2001, 2011; Levy, 2008). Probabilistic models provide a means

of modelling the relative plausibility of competing syntactic interpretations, including

rejecting the majority of a large number of implausible interpretations, on the basis

of previously observed data. Statistical parsing techniques have not yet been applied

to the task of automatic derivation of the structures underlying music (except by Bod,

2002a,b,c, for a different kind of musical structure to the harmonic structure considered

here). They provide a way in which the benefits of a probabilistic approach to music,

as advocated by Temperley (2007), can be combined with a model based on a theory

of the structure of harmony. The present thesis begins to explore this possibility.

Apart from the benefits, discussed above, of constructing a theoretical account of

cognition in explicit formal terms amenable to evaluation on the basis of its computational

properties, such an account opens up the possibility of simulating cognitive

processing by automating the formally stated rules and procedures. The field of NLP

explores this possibility for computational models of language processing. The imple-

mentation of a computational account of natural language meaning can be put to use,

for example, in querying databases to answer questions, retrieving documents from a

corpus relevant to some scenario, or translating a text into a different language. Similar

tasks depend on being able to use a model of musical cognition such as those discussed

above to derive a musical interpretation automatically from an input.

Tasks in music information retrieval (MIR) have received particular attention in

recent years, such as querying a database of songs to find a given melodic fragment

and measuring the similarity between two given songs. Two performances which ap-

pear quite different in a comparison between the properties of their sound waves or

between the timings of the performed notes may be easily recognized by a human lis-

tener as the same song. Harmony is one aspect of music on which a human judgement

of similarity may be based. To the extent that the hierarchical harmonic structures

proposed by some of the work discussed above are important to human cognition of

harmony, we can expect to be able to construct better measures of musical similarity

by incorporating a representation of these structures. By automating the derivation of

harmonic structure and other aspects of perceived musical structure, better systems can

be built to make human-like judgements about music. De Haas et al. (2009), for ex-

ample, apply Rohrmeier’s grammar to evaluation of harmonic similarity between two

pieces of music. However, few systems to date have made use of automatically derived

representations of harmonic structure of this sort.

Improvements can be expected from automatic harmonic analysis in other tasks,

also on the basis that a better model of human cognition of music permits a system to

emulate human judgements more accurately. Various forms of automatic generation

may benefit from a good model of harmonic analysis. One example is the genera-

tion of variations on a melody, in which the harmonic progression is typically altered

minimally and in a manner constrained by harmonic structure. Another example is

the generation of accompaniments to a melody, which involves the generation of pos-

sible coherent harmonic progressions to fit a melody, ideally constrained by typical

harmonic characteristics of a particular musical style.

Harmonic analysis also plays a twofold role in the task of transcription of musical

performances in score notation. Firstly, in deciding between multiple possible tran-

scriptions of the same performance – for example, choosing whether to treat a short

note as occupying a short metrical unit, as a grace note or as a mistake to be ignored –

a system may employ a musical equivalent of a language model. This is a model of the

plausibility of any given transcription – for example, how likely it is that a score would

have included a short metrical note at the point where it occurred in the performance.

A harmonic analysis, including a notion of long-distance tonal relations, may help to

construct a better language model for tonal music. Secondly, score notation distinguishes

between enharmonically equivalent notes – for example, A♯ and B♭ – which

are ambiguous in performed music. Enharmonic equivalents are distinguished in the

tonal space of Longuet-Higgins (1962a,b) and may be disambiguated by a harmonic

analysis, as shown by Steedman (1996).

In the present thesis, I apply grammatical analysis techniques to provide an account

of the syntactic mapping from music to its harmonic interpretation. The syntax is

closely related to that of Steedman (1996) and that of Rohrmeier (2011). De Haas

et al. (2009) implemented a parser for Rohrmeier’s grammar, but needed to reduce

the ambiguity of the grammar in order to make parsing of a wide range of harmonic

progressions possible in practice. I use statistical parsing techniques adapted from

NLP to make automatic interpretation feasible and give preliminary consideration to

the question of extending these techniques beyond the analysis of chord sequences,

in the style of Keiler (1978), Steedman (1996) and Rohrmeier (2011), to harmonic

analysis of performance data.

2.3 Syntax in Language

Human languages express structured meaning in a linear form, including relations be-

tween entities and events, often represented in distant parts of the surface form – a

sequence of words or sounds. The meaning of an utterance can be understood, at least

in part, by composition of the meaning of words or phrases of the utterance. It is usu-

ally possible to decompose this aspect of the meaning of a sentence recursively into

constituent phrases down to the level of individual words. In order to understand an

utterance, it is not enough for a listener to know the meaning of each of its words; they

must also know the rules that specify how these are composed and, in general, be able

to select from various possible sets of rules that each constitute a valid explanation of

the relationships between the constituents. The structures that may serve to perform

this composition constitute the syntax of a language.

Steedman (2000) describes the syntax of language as a characterization of the in-

terface between the surface form of a sentence (its words as spoken or written) and its

structured meaning. This approach to syntax requires that syntactic rules operate on interpretable

grammatical constituents and be associated with some operation that combines the semantics

of the constituents to produce their combined meaning.

Formalizing the syntax of a natural language has been approached by attempting

to characterize in some concise representation all and only the sentences considered

permissible by native speakers of the language (Chomsky, 1965). Although such at-

tempts do not typically make explicit the semantic relations between constituents, the

syntactic structures are invariably motivated (albeit intuitively) on semantic grounds.

Consider, for example, the sentence Keats, who likes treats, eats beets. No phrase-

structure grammar of English would fail to treat the relative clause who likes treats

as a constituent. It seems intuitively obvious not only that it should be thought of as

such, but that it must be combined with Keats first, before it can be combined with the

rest of the sentence. This is not simply because this structure permits a more parsi-

monious description of the grammatically legal sentences of a language, but because

semantically the phrase serves as a modifier to the subject Keats. As we shall see

below, some grammar formalisms make explicit the reasons for their constraints on

constituent structure by representing the predicate-argument structure of the semantics

associated with each constituent. It follows that the syntactic part of such a grammar

exists precisely in order to map that structure onto the linearly ordered surface form in

which it is expressed in the language.

2.4 Combinatory Categorial Grammar

Combinatory Categorial Grammar (CCG, Steedman, 2000) is a grammar formalism

used for parsing natural language sentences to produce logical representations of their

semantics. It is a lexicalized formalism: a grammar consists of a large set of rich structures,

or categories, associated with lexical items (usually words) and a small set of

rules. The rules specify how the categories assigned to the words of a sentence may

be combined into a single interpretation. It is notable for the fact that it provides a

transparent interface between syntax and semantics along the lines indicated above.

Syntactic categories are used in a sentence’s grammatical derivation, each composed

of a syntactic type, constraining the rules, and a well-formed semantic interpretation.

Each combinatorial rule corresponds to an operation on the semantics of its inputs. The

formalism described in chapter 3 for processing the structures of musical harmony is

based closely on CCG as it is applied to language. This section gives an introduction

to the aspects of CCG relevant to its adaptation to harmonic analysis.

The syntactic structure of a sentence is derived using CCG by assigning a category

to each word in the input with a syntactic type determining what sorts of structure the

word may appear in. The category is chosen from a large lexicon. Pairs of adjacent

categories are combined according to the combinatorial rules, constrained by the syn-

tactic types of the categories, to produce a category spanning the portion of the input

spanned by both input categories2. The result is subject to further combinations with

adjacent categories, until a single category is produced, spanning the whole input. This

process is referred to as parsing.

2 In addition to binary rules, combining two adjacent categories into one, CCG also uses unary rules, operating on a single category. These will not be described here, since no unary rules are used in the musical grammar of the next chapter.

A small set of atomic syntactic types is used, such as S (sentence) and NP (noun

phrase). Complex types are built from atomic types using the / and \ operators. A

complex type X/Y denotes a function category that can combine with an argument

category Y to its right to produce a category X. Likewise X\Y indicates that a Y is

expected to the left and that the result will be a category X.

The set of rules used to combine categories may vary depending on the syntactic

structures to be captured by the grammar. The most basic combinators, used in any

variant of CCG, are the two function application rules, defined as follows. The symbols

> and < are used to identify the rules in derivations, such as example 2.1.

a. X/Y  Y  ⇒  X   (forward, >)

b. Y  X\Y  ⇒  X   (backward, <)

Example 2.1 shows the use of the function application rule in a simple syntactic

derivation. (Note that the term function is not related to the concept of harmonic

function introduced in the next section.) This syntactic derivation allows us to produce

an interpretation for the full sentence, working downwards from the words of the

sentence.

(2.1)  Keats    eats        beets
       NP       (S\NP)/NP   NP
                ----------------- >
                S\NP
       -------------------------- <
       S

Each syntactic category in the lexicon is also associated with a representation of its

logical semantics – a logical form – and the grammatical rules define how the logical

forms of their arguments are combined. The logical forms use the lambda calculus

(explained in section 3.2.1) to express how they will be combined with other logical

forms under the combinatorial operations associated with the rules. A function accepting

a single argument is expressed as λx.E, where E is an expression including x. The

application of a function f to an argument x is written f(x). The function application

rules in their full form are:

a. X/Y : f   Y : x  ⇒  X : f(x)   (>)

b. Y : x   X\Y : f  ⇒  X : f(x)   (<)

Example 2.2 shows the derivation in example 2.1 again, now with a logical form

associated with each syntactic type3.

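(2.2)  Keats         eats                          beets
       NP : keats′   (S\NP)/NP : λx,y.eats′(y,x)   NP : beets′
                     ----------------------------------------- >
                     S\NP : λy.eats′(y,beets′)
       ------------------------------------------------------- <
       S : eats′(keats′,beets′)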
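As an illustration, the two function application rules and their semantics can be sketched in a few lines of Python. The classes Atom and Func, the representation of a category as a (type, semantics) pair and the use of Python closures in place of lambda terms are assumptions made for this sketch only; they are not the implementation described later in the thesis.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Func:
    result: "Type"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Type"

    def __str__(self):
        return f"({self.result}{self.slash}{self.arg})"


Type = Union[Atom, Func]


def forward_apply(left, right):
    """X/Y : f   Y : x  =>  X : f(x)   (>)"""
    (ltype, f), (rtype, x) = left, right
    if isinstance(ltype, Func) and ltype.slash == "/" and ltype.arg == rtype:
        return (ltype.result, f(x))
    return None  # the rule does not apply to this pair


def backward_apply(left, right):
    """Y : x   X\\Y : f  =>  X : f(x)   (<)"""
    (ltype, x), (rtype, f) = left, right
    if isinstance(rtype, Func) and rtype.slash == "\\" and rtype.arg == ltype:
        return (rtype.result, f(x))
    return None


# Toy lexicon for example 2.2; logical forms are built as strings.
S, NP = Atom("S"), Atom("NP")
keats = (NP, "keats'")
beets = (NP, "beets'")
# (S\NP)/NP : λx,y.eats'(y,x), written in curried form
eats = (Func(Func(S, "\\", NP), "/", NP), lambda x: lambda y: f"eats'({y},{x})")

vp = forward_apply(eats, beets)       # S\NP : λy.eats'(y,beets')
sentence = backward_apply(keats, vp)  # S : eats'(keats',beets')
print(sentence[0], ":", sentence[1])
```

The syntactic type alone decides whether a rule may apply, while the logical form is built by ordinary function application, which is the transparency between syntax and semantics described above.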

Another pair of grammatical rules, forward and backward function composition, permit

function categories to be combined before their argument is available. The result is a

function category that may then be applied (using function application) to the argument

when it is eventually encountered. The rules do not permit any interpretations of a

full sentence that would not have been permitted by the function application rules

alone, but allow the same outcome to be produced by a different order of combinations.

This is important for, among other things, incremental analysis of a sentence. Their

semantics uses the composition operator, defined using the lambda calculus as: f ∘ g ≡ λx. f(g(x)). The function composition rules are:

a. X/Y : f   Y/Z : g  ⇒  X/Z : f ∘ g   (forward, >B)

b. X\Y : f   Z\X : g  ⇒  Z\Y : g ∘ f   (backward, <B)

Example 2.3 demonstrates the use of the function composition rule.

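(2.3)  Keats         will                          eat                  beets
       NP : keats′   (S\NP)/VP : λx,y.will′(y,x)   VP/NP : λz.eat′(z)   NP : beets′
                     ------------------------------------------------ >B
                     (S\NP)/NP : λx,y.will′(y,eat′(x))
                     -------------------------------------------------------------- >
                     S\NP : λy.will′(y,eat′(beets′))
       ---------------------------------------------------------------------------- <
       S : will′(keats′,eat′(beets′))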

Note that, although in this particular derivation the analysis of the tensed verb phrase

is left branching, the logical form that it builds is right branching and identical to that

in the alternative function application-only derivation, as its semantics requires.
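The equivalence of the two derivations can be checked with a small Python sketch. Types are represented here as nested tuples (result, slash, argument) and logical forms as strings built by closures; these representations and all function names are assumptions of the sketch, not part of CCG itself.

```python
def compose(f, g):
    """The composition operator: (f . g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def forward_compose(left, right):
    """X/Y : f   Y/Z : g  =>  X/Z : f . g   (>B)"""
    (ltype, f), (rtype, g) = left, right
    if (isinstance(ltype, tuple) and isinstance(rtype, tuple)
            and ltype[1] == "/" and rtype[1] == "/" and ltype[2] == rtype[0]):
        return ((ltype[0], "/", rtype[2]), compose(f, g))
    return None


def forward_apply(left, right):
    """X/Y : f   Y : x  =>  X : f(x)   (>)"""
    (ltype, f), (rtype, x) = left, right
    if isinstance(ltype, tuple) and ltype[1] == "/" and ltype[2] == rtype:
        return (ltype[0], f(x))
    return None


def backward_apply(left, right):
    """Y : x   X\\Y : f  =>  X : f(x)   (<)"""
    (ltype, x), (rtype, f) = left, right
    if isinstance(rtype, tuple) and rtype[1] == "\\" and rtype[2] == ltype:
        return (rtype[0], f(x))
    return None


# Toy lexicon for example 2.3 (curried logical forms).
S_NP = ("S", "\\", "NP")
keats = ("NP", "keats'")
beets = ("NP", "beets'")
will = ((S_NP, "/", "VP"), lambda x: lambda y: f"will'({y},{x})")
eat = (("VP", "/", "NP"), lambda z: f"eat'({z})")

# Left-branching derivation using composition, as in example 2.3.
will_eat = forward_compose(will, eat)
with_composition = backward_apply(keats, forward_apply(will_eat, beets))

# Right-branching derivation using function application only.
eat_beets = forward_apply(eat, beets)
application_only = backward_apply(keats, forward_apply(will, eat_beets))

print(with_composition[1])
print(application_only[1])
```

Both derivations end in a category of type S with the same logical form, although the order in which the constituents are combined differs.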

Several other combinatory rules allow CCG grammars to capture linguistic phe-

nomena such as coordination and relativization, but details of these are not important

to the development of a harmonic grammar. The full range of reasons for including

function composition in the set of rules of a linguistic grammar need not detain us here,

but one is to do with the fact that constructions like coordination involving long-range

semantic dependencies treat incomplete fragments like will eat as constituents that can

be combined with others in derivations. Example 2.4 shows the use of the coordination

rule (not defined here) to combine bought and will eat into a single constituent that can

combine with beets (the semantics is omitted).

(2.4)  Keats   bought      and   will        eat     beets
       NP      (S\NP)/NP   CNJ   (S\NP)/VP   VP/NP   NP
                                 ----------------- >B
                                 (S\NP)/NP
               ----------------------------------- &
               (S\NP)/NP
               ------------------------------------------- >
               S\NP
       --------------------------------------------------- <
       S

Musical analysis involves similar long-range dependencies in the construction I refer

to as musical coordination and these can be treated using the same approach.

3 We use an apostrophe to distinguish the language-independent meaning of a word, used in a logical form, from its written surface form. Thus, beets′ refers to the objects denoted, depending on the language, by the words beets, betteraves, betor, τεῦτλα, etc.

2.5 Functional Structure in Harmony

Since Rameau (1722), harmonic structure has typically been discussed in terms of

chords – groups of notes that are part of the underlying harmony of a passage of tonal

music. Chords are notated as a root pitch (A, B♭, etc.) and a chord type symbol (empty

– major, ‘m’ – minor, etc.). One of the most common forms of harmonic analysis

today is Roman numeral analysis. Passages of music are assigned a key and divided

into chords, each analysed with a symbol that denotes its relation to the key. The

symbols are chosen from a vocabulary of seven chords rooted on the seven notes of the

scale and made up only of the notes of the scale. The key assigned by the analysis may

change during a piece.

The related approach of functional analysis originated with Riemann (1893). Chords

are ascribed a harmonic function, which describes the role they play in relation to their

harmonic context. In any particular key, multiple chords may fulfil the same function

and each resulting functional substitute is described using a distinct symbol. Several

authors have taken functional analysis as the basis for a formal account of the syn-

tax of tonal harmony (Keiler, 1981; Lerdahl & Jackendoff, 1983; Steedman, 1984;

Rohrmeier, 2011). Like the syntax of natural language, these accounts analyse har-

monic syntax using tree structures and have claimed that this syntax, like that of lan-

guage, features formally unbounded embedding of structural elements. I give here an

introduction to the aspects of harmonic structure that will be modelled later using for-

mal grammars. The functional view of harmonic structure adopted here has its roots in

the Euler-Riemannian tradition (Riemann, 1893) and closely follows the approaches

to harmonic analysis of Keiler, Steedman and Rohrmeier.

The functional harmonic analysis presented here is concerned only with the re-

lationships between chords which raise harmonic expectation and those which fulfil

that expectation. Huron (2006) discusses this and other forms of expectation in mu-

sic. Tonal expectation, that of relevance here, is described as a cognitive, rather than

perceptual, phenomenon. The structures analysed here are thus relevant to a descrip-

tion of musical perception, but the relationship between the cognitive structures that

underlie human interpretation of harmony and perceptual responses is not simple. To

make predictions regarding listener responses to music on the basis of the theory of

cognition of harmony adopted here would additionally require a theory of incremen-

tal processing, taking account of working memory limitations. In particular, no direct

connection can be drawn between cognitive structures describing the final state of a

listener’s understanding and immediate perceptual responses during listening, as dis-

cussed in section 2.2.3.1. Naturally, harmonic expectation is only one aspect of music

contributing to a listener’s perception. Such predictions would, therefore, also rest on

a theory of the contributions of other aspects of music to the listener’s responses and

how these interact with the processing of harmonic structure. These issues are outside

the scope of the present thesis.

2.5.1 The Cadence and Harmonic Function

The key component of harmonic structure is the cadence, built from tension-resolution

patterns between chords. Large structures can be analysed as extended cadences, made

up of successive tension-resolution patterns chained together. The sort of harmonic

analysis that is described below captures the same relationships described by, among

others, Keiler (1981) and Rohrmeier (2011). The term cadence will be used throughout

this thesis to refer to all connected structures of tension-resolution patterns and never

in the more common sense of a particularly strong resolution that by convention marks

the end of a musical line or section.

[Tree diagram over the chords A7 D7 G7 C]

Figure 2.1: An extended authentic cadence in the key of C, a typical example of (tail)

recursion in music. The A7 acts as a dominant resolving to the D7, which in turn resolves

by the same relation to G7, which then resolves to the tonic C.

Cadences come in two varieties. The authentic (or perfect) cadence consists of a

tension chord rooted a perfect fifth (or seven semitones, in equal temperament) above

its resolution. This type of tension chord is defined in relation to its resolution as

having a dominant function. The plagal cadence consists of a tension chord rooted

a perfect fourth (five semitones) above its resolution. This type of tension chord is

referred to in relation to its resolution as having a subdominant function. In both cases,

the resolution chord is classified as a tonic chord in relation to the tension chord. The

classification of a particular occurrence of a chord identifies its harmonic function on

that occasion of use. The function of any particular chord is ambiguous and the same

chord may take on a different function on different occasions of use in the same piece.

An extended cadence occurs when a tension chord resolves by the appropriate in-

terval to a chord that is itself cadential, creating a further tension, which subsequently

resolves. Such a definition is recursive and extended cadences can accordingly be in-

definitely extended. An example is shown in the form of a tree in figure 2.1. Note

that, in terms of harmonic function as defined above, the G7 functions as a dominant

in relation to its resolution C and also as a tonic in relation to the dominant D7 chord4.
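The interval definitions of dominant and subdominant function, and the recursive definition of the extended cadence, can be sketched as follows. The sketch works on chord roots only and uses equal-tempered pitch classes, which conflate enharmonic equivalents; that simplification, the flat-only spelling table and the function names are assumptions of this illustration.

```python
# Pitch classes in semitones above C (equal temperament). Enharmonic
# equivalents such as A# and Bb are conflated: a simplification of this sketch.
PITCH_CLASS = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
               "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}


def function_of(tension_root, resolution_root):
    """Classify a tension chord by how far its root lies above its resolution's root."""
    interval = (PITCH_CLASS[tension_root] - PITCH_CLASS[resolution_root]) % 12
    if interval == 7:
        return "dominant"      # authentic cadence: a perfect fifth above
    if interval == 5:
        return "subdominant"   # plagal cadence: a perfect fourth above
    return None                # no cadential relation by these definitions


def is_extended_authentic_cadence(roots):
    """True if every chord stands in a dominant relation to the chord that follows it."""
    return len(roots) >= 2 and all(
        function_of(a, b) == "dominant" for a, b in zip(roots, roots[1:])
    )


print(function_of("G", "C"))                                # dominant
print(function_of("F", "C"))                                # subdominant
print(is_extended_authentic_cadence(["A", "D", "G", "C"]))  # True
```

In the chain A D G C, the root G is classified as a dominant with respect to C and is at the same time the resolution of D, which mirrors the dual role of the G7 noted above.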

Chords that function as dominants are often partially, though never unambiguously,

distinguished by the addition of notes to the chord. In particular, the dominant sev-

enth, realized by the addition of the note a tone below the chord’s root and notated in

chord types with a superscript 7 (as in G7 above), enhances the cadential function of a

dominant chord and heightens the expectation of the corresponding tonic5. However,

this note may be omitted from a dominant chord, or may even be used in chords not

functioning as dominants. There is no similar signal of subdominant function in the

realization of the chords.

4 The D7 would often be referred to as a secondary dominant. I will use the term dominant to include such secondary harmonic function as well as further levels of recursion, or extended dominant where it is necessary to distinguish the recursive phenomenon from a chord with primitive dominant function.

5 The reasons for this are discussed by Steedman (1996) in terms of the tonal theory described below.

[Tree diagram with leaves D7 G7 t and A7 D7 G7 t]

Figure 2.2: An extended authentic cadence featuring coordination, marked here by &.

The shared resolution is marked by a trace symbol t.

In certain contexts, one chord can be used in place of another with a different root

and fulfil the same harmonic function, a phenomenon referred to as chord substitution.

In some cases, this can be explained by reference to notes shared between the two

chords, whilst in others it is less clear why a substitution attested by conventional

music practice works as it does. An example of a commonly used substitution is the

tritone substitution, by which a dominant seventh chord appears in place of another

dominant seventh chord rooted three tones above it.
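The tritone substitution lends itself to a small worked example. The sketch below assumes equal-tempered pitch classes with a single flat-based spelling per note (so that C♭ is written B), and the function names are invented for the illustration.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def tritone_substitute(root):
    """Root lying a tritone (three whole tones, six semitones) from the given root."""
    return NOTE_NAMES[(NOTE_NAMES.index(root) + 6) % 12]


def dominant_seventh(root):
    """Notes of a dominant seventh chord: root, major third, fifth, minor seventh."""
    start = NOTE_NAMES.index(root)
    return {NOTE_NAMES[(start + step) % 12] for step in (0, 4, 7, 10)}


substitute = tritone_substitute("G")
shared = dominant_seventh("G") & dominant_seventh(substitute)
print(substitute + "7", sorted(shared))
```

For G7 the substitute is the dominant seventh chord on D♭, and the two chords have the notes B and F in common, an instance of a substitution that can be explained by reference to shared notes.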

2.5.2 Coordination

A tension chord might not reach its resolution immediately. An unresolved authen-

tic (or plagal) cadence, such as D7 G7, creating an expectation of tonic C, may be

interrupted by a further cadence, say A7 D7 G7, creating the same expectation, where-

upon both expectations/tensions will be resolved by the same tonic C. This cadence

structure is shown as a tree in figure 2.2. I refer to this operation as coordination by

virtue of its similarity to a type of coordination (right-node raising coordination) in

natural language sentences like Keats bought and will eat beets (see the derivation in

example 2.4). A coordinated cadence may itself be embedded in another coordinated

cadence, just as in examples like Keats ((may or may not) have bought) but (certainly

eats) beets. An example of a cadence with several levels of embedded coordination

can be seen in the form of a dependency graph in figure 3.5 in the next chapter.

2.5.3 Jazz Harmony

The typical size and complexity of cadence structures varies with musical period, genre

and composer. The type of harmonic structure discussed above developed with and

was permitted by the tonal system described in detail below and complex structures

came about only with the wide acceptance of the equally tempered scale (see sec-

tion 2.6.3). Much previous work on computational music analysis has focused on

genres in which complex extended cadences are rare, such as folk music (Bod, 2002a;

Temperley, 2007) or common practice Western art music (Raphael & Stoddard, 2004).

Tonal jazz standards are of particular interest for the form of analysis described here

because they tend to feature large extended cadences, often with complex embedding.

They also include known harmonic variations created using a well-established system

of harmonic substitutions, embellishments and simplifications.

Jazz standards are rarely transcribed as full scores, but are notated as a melody with

an accompanying chord sequence, such as those analysed in the previous section. Iden-

tification of the times at which chords change in a piece of music and some character-

ization of the chords is usually the first step in harmonic analysis, even when chord la-

bels are not available on a score. The transcribed chord sequences of jazz standards al-

low us to begin defining a grammar of the syntax of tonal harmony by sidestepping the

issue of how this initial analysis is performed. It is an assumption of this approach that

the chord sequences provide a useful approximation to an intermediate analysis that

would necessarily feature in the harmonic analysis of musical performances. In chap-

ter 6 we shall see how a chord-based analysis may provide a starting point for an analy-

sis that incorporates this first phase of analysis in order to operate on performance data.

2.6 Tonality

2.6.1 Consonance and Harmony

In analysing the roles of pitch in music, it is important to distinguish between con-

sonance and harmony. Consonance is the sweetness or harshness of the sound that

results from playing two or more notes together. Harmony refers to the expectation

(or tension) created by particular combinations of notes and the satisfaction of that

expectation (or resolution of the tension)6. Both of these relations over pitches are

determined by small whole-number ratios and are easily confused, but arise in quite

different ways.

6 The word tension in this context refers to a specific type of musical tension associated with harmonic expectation (Huron, 2006). Contrasting consonance and dissonance also create a tension of a different sort. A further notion of tension is invoked by Lerdahl (2001), which relates more closely to the perceived response by a listener and should not be confused with the specific harmonic tension in question here.

The modern understanding of consonance originates with Helmholtz (1862), who

explained the phenomenon in terms of the coincidence and proximity of the secondary

overtones and difference tones that arise when simultaneously sounded notes excite

real non-linear physical resonators, including the human ear itself. These tones include

all integer multiples (and, in some cases, dividends) of the fundamental frequency.

Whilst the coincidence of lower ratios in general has a greater effect than higher on

consonance, there is no limit on the ratios that account for consonance. Helmholtz

(1862) describes first the physical principles that explain and characterize consonance.

He then formalizes the Western harmonic tonal system in terms of some related prin-

ciples, which we shall see below. Helmholtz is at pains to stress the importance of

distinguishing the naturally arising phenomenon of consonance from the relations that

exist between the notes of the tonal system (his italics):

Hence it follows,—and the proposition is not even now sufficiently present to the minds of our musical theoreticians and historians—that the system of Scales, Modes, and Harmonic Tissues does not rest solely upon inalterable natural laws, but is also, at least partly, the result of esthetical principles, which have already changed, and will still further change, with the progressive development of humanity.

(Helmholtz, 1862, chapter XIII)

Harmonic analysis depends on a theory of tonality, which is concerned with the re-

lations between notes in the tonal system. The issue of consonance is largely irrelevant

to this, except in so far as it bears upon a historical description of the system’s devel-

opment. These two aspects of music both contribute to the same musical surface form

but are commonly treated as theoretically distinct. Other aspects of music that

contribute to the same surface form, such as voice-leading, are likewise distinguished

on theoretical grounds. A listener’s perception of and responses to music are often a

result of a combination of different aspects of music supposed to require the cognition

of separate musical structures. The experiments of Krumhansl (1990) concern the hu-

man responses that result from this combination and thus incorporate issues of both

harmony and consonance. Johnson-Laird et al. (2012) attempt to account for a notion

of perceived dissonance by explicit combination of two distinct models. Unlike these

authors, I shall pass over the issue of consonance in the following description of tonal

theory. The present thesis is concerned only with the analysis of the structure of har-

mony, without any attempt to combine this, in the manner of Johnson-Laird et al., with

a model of other aspects of musical cognition to make predictions about a listener’s

perception.


Figure 2.3: Part of the space of note names (Longuet-Higgins, 1962a,b). The circles

mark two points separated by the syntonic comma, the squares two points separated

by the diesis.

2.6.2 Just Intonation

Like the theory of consonance, the tonal harmonic system derives from combinations

of small integer pitch ratios. However, the harmonic relation is based solely on the first

three prime ratios in the harmonic series: ratios of 2, 3 and 5 (the octave, perfect fifth

and major third). The tuning based on these exact intervals is known as just intonation.

Any interval between two notes in just intonation corresponds to a frequency ratio

defined as the product $2^x \cdot 3^y \cdot 5^z$, where $x, y, z$ are positive or negative integers. It

has been observed since Euler (1739) that the harmonic relation can, therefore, be vi-

sualized as an infinitely extending, discrete, three-dimensional space with these three

prime factors as generators. Since notes separated by octaves are essentially equivalent

for tonal purposes, it is convenient to project the space onto the 3,5 plane. The presen-

tation of this theory that is adopted here, which will later form the basis for harmonic

analysis, was originally developed by Ellis (1874) and formally developed in the form

it is seen here by Longuet-Higgins (1962a,b).

A fragment of the space can be seen in figure 2.3. Conventional musical notation,

by which the points in the space are named, does not distinguish notes separated by

the interval corresponding to the vector (4,−1), the so-called syntonic comma (for

example between the two Cs marked by circles in figure 2.3). Notes separated by the

much larger diesis, (0,3), have distinct names (for example G♯ and A♭, marked by

squares) and are referred to as enharmonic tones. The points in figure 2.4 represent


Figure 2.4: Part of the space of tonal relations (Longuet-Higgins, 1962a,b).


Figure 2.5: The tonal relations of the notes of the major (a) and minor (b) triads and the

major seventh tetrad (c).

the tonal relation of each point to the point labelled I by means of Roman numerals

denoting the degrees of the scale. Here the added +s and −s rectify the notational

ambiguity of the syntonic comma. They distinguish, for example, the minor seventh

interval (♭VII) from the dominant seventh (♭VII−).
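
The arithmetic behind these intervals can be checked with a short sketch. It is an illustration only: the function name `just_ratio`, the (fifths, thirds) coordinate convention and the folding of every ratio into a single octave are assumptions made here to mirror the projection onto the 3,5 plane. The coordinates used for the two sevenths are likewise an assumption, chosen as one placement under which they differ by the syntonic comma vector (4,−1).

```python
from fractions import Fraction


def just_ratio(fifths: int, thirds: int) -> Fraction:
    """Ratio of the point (fifths, thirds), folded into one octave [1, 2)."""
    ratio = Fraction(3) ** fifths * Fraction(5) ** thirds
    # Octaves are ignored, so multiply or divide by 2 until 1 <= ratio < 2.
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


perfect_fifth = just_ratio(1, 0)     # 3/2
major_third = just_ratio(0, 1)       # 5/4
syntonic_comma = just_ratio(4, -1)   # 81/80
# Three major thirds fall short of an octave; the downward vector gives
# the small residual interval directly.
diesis = just_ratio(0, -3)           # 128/125

# Assumed positions of the two sevenths (illustrative, see lead-in).
minor_seventh = just_ratio(2, -1)    # 9/5
dominant_seventh = just_ratio(-2, 0) # 16/9

print(syntonic_comma, diesis, minor_seventh / dominant_seventh)
```

Under these assumptions the two sevenths differ by exactly the ratio 81/80, which is why conventional notation gives them the same name.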

Longuet-Higgins & Steedman (1971) observed that the musical scales of the West-

ern tonal system used since the advent of harmony are convex sets of positions and

defined a Manhattan distance metric over this space. According to this metric, it can

be observed that the major and minor triads (such as CEG and CE♭G) are two of the

closest possible clusters of three notes in the space and the triad with added major

seventh is the single tightest cluster of four notes (all shown in figure 2.5). The triads

and the major seventh tetrad are stable chords, raising no strong expectations, of the

kind that typically end a piece. Chords like the diminished chord and the dominant

seventh are more spread out, as shown in figure 2.6. This difference is vital to the

understanding of the creation and resolution of harmonic expectation.
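
The contrast between tight and spread-out chords can be made concrete with a small sketch. The measure used here, the sum of pairwise Manhattan distances, is an assumption chosen for illustration and is not necessarily the formulation of Longuet-Higgins & Steedman (1971). The chord positions are given relative to the chord root in assumed (fifths, thirds) coordinates, with the dominant seventh note placed two fifths below the root.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two points in the (fifths, thirds) plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def spread(points):
    """Sum of pairwise Manhattan distances: an illustrative tightness measure."""
    return sum(manhattan(p, q) for p, q in combinations(points, 2))


# Positions relative to the root at (0, 0); hypothetical coordinates.
MAJOR_TRIAD = [(0, 0), (0, 1), (1, 0)]              # root, major third, fifth
MINOR_TRIAD = [(0, 0), (1, -1), (1, 0)]             # root, minor third, fifth
MAJOR_SEVENTH = [(0, 0), (0, 1), (1, 0), (1, 1)]    # triad plus major seventh
DOMINANT_SEVENTH = [(0, 0), (0, 1), (1, 0), (-2, 0)]  # triad plus dominant seventh

for name, chord in [
    ("major triad", MAJOR_TRIAD),
    ("minor triad", MINOR_TRIAD),
    ("major seventh", MAJOR_SEVENTH),
    ("dominant seventh", DOMINANT_SEVENTH),
]:
    print(name, spread(chord))
```

With this measure the two triads are equally compact, and the dominant seventh tetrad is noticeably more spread out than the major seventh tetrad.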

The space of justly intoned intervals does not include ratios involving higher prime

factors than the three that define this space, 2, 3 and 5. Whilst these ratios are important


Figure 2.6: The tonal relations of the notes of (a) a dominant seventh chord and (b) a

diminished seventh chord to the tonic of the key (dashed). Unlike the stable triads and

major seventh chords, the notes of these chords are spread out in the space around

their expected resolution (the tonic).

to the explanation of consonance, they do not play a role in the description of the tonal

harmonic system.

2.6.3 Equal Temperament

Over several centuries, it was gradually realized that the harmonic tonality of just in-

tonation could be approximated, first by slightly mistuning the fifths, equating the

positions distinguished by +s and −s in figure 2.4 (the syntonic comma), and then by

a greater distortion of the major thirds, equating enharmonic tones (C with B♯, etc.).

One way this can be done is by spacing the 12 tones of the chromatic octave evenly,

so that all the semitones are (mis)tuned to the same factor of $\sqrt[12]{2}$.

Since the eighteenth century most instruments have been tuned according to this

system of equal temperament. It has the advantage that all keys and modes can be

played on the same instrument without retuning. It was this that permitted the de-

velopment of greater harmonic freedom in the Romantic era and the sort of complex

harmonic structures that we have seen above. In terms of the tonal space, the result is

a distortion of the pitches so that the infinite space is projected onto a finite

space of just 12 points, looping in both dimensions to form a torus.
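
The projection onto 12 points can be sketched as follows. The function name `pitch_class` and the (fifths, thirds) coordinates are assumptions for illustration. The sketch relies only on the facts that an equally tempered fifth spans 7 semitones and a major third spans 4.

```python
def pitch_class(fifths: int, thirds: int) -> int:
    """Equal-temperament pitch class (0-11) of a tonal space point,
    measured in semitones above the origin."""
    return (7 * fifths + 4 * thirds) % 12


# Every semitone is tuned to the same frequency factor.
SEMITONE = 2 ** (1 / 12)

# The syntonic comma and the diesis both vanish under the projection.
print(pitch_class(4, -1), pitch_class(0, 3))

# A block of 4 fifths by 3 thirds already covers all 12 pitch classes,
# which is why the torus can be drawn as a 4-by-3 grid.
block = {pitch_class(x, y) for x in range(4) for y in range(3)}
print(sorted(block))
```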

Each tonal relation between two equally tempered tones is (theoretically infinitely)

tonally ambiguous as to which vector in the full justly intoned space of figure 2.4 it

denotes. Equal temperament thus obscures the tonal relations underlying the tuning

system, making their interpretation in terms of the justly intoned intervals ambiguous.

The advantage of equal temperament is that it allows the hearer to resolve this tonal

ambiguity, effectively inverting the projection onto the torus and recovering the in-


Figure 2.7: “Shave and a haircut, six bits”.

terpretation of the intervals in the full harmonic space. This is possible because the

harmonic intervals that are sufficiently close in justly intoned frequency to be equated

on the equally tempered torus are sufficiently distant in the full space for the musical

context to disambiguate them. For example, beginning from a G, an equally tempered

C, which could in isolation be interpreted as any of the points labelled as C or one of its

enharmonic equivalents (five possibilities are visible in figure 2.3), must be interpreted

as the point immediately to the left of the G, because that is the only interpretation that

is anywhere close to G.
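
The disambiguation in this example can be sketched by listing the candidate interpretations of an equally tempered pitch and ranking them by distance from the context. The function `interpretations`, the search window and the coordinates (C at the origin, G one step to its right) are assumptions made for illustration.

```python
def pitch_class(fifths: int, thirds: int) -> int:
    """Equal-temperament pitch class of a point, in semitones above the origin."""
    return (7 * fifths + 4 * thirds) % 12


def interpretations(reference, target_pc, radius=4):
    """All points within a square window around `reference` that project
    onto `target_pc`, as (Manhattan distance, point) pairs, nearest first."""
    rx, ry = reference
    found = []
    for x in range(rx - radius, rx + radius + 1):
        for y in range(ry - radius, ry + radius + 1):
            if pitch_class(x, y) == target_pc:
                found.append((abs(x - rx) + abs(y - ry), (x, y)))
    return sorted(found)


# Context: a G at (1, 0). Candidate readings of an equally tempered C (0).
candidates = interpretations((1, 0), 0)
for distance, point in candidates[:4]:
    print(distance, point)
```

In this window the nearest C lies one step to the left of the G, and every other candidate is at least four steps away, so the context leaves little room for doubt.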

It has often been claimed (Jeans, 1937; Bernstein, 1976; Tymoczko, 2006) that

the dominant seventh (the leftmost note in the cluster in figure 2.6a) played in equal

temperament approximates a tone based on the seventh harmonic (that is, a ratio of 7 :

4). Such a claim confuses the tonal harmonic relation, situated in the three-dimensional

system above, with consonance. In the tonal system, an equally tempered B♭ may

be interpreted as related to C by either a dominant seventh (♭VII− in figure 2.4) or

a minor seventh (♭VII). The similarity of the equally tempered interval ($2^{10/12}$) to the

seventh harmonic ($7/4$) provides no grounds for the addition of a fourth dimension to

the tonal space, since no combination of the interval with the other generators, or even

with itself, is proposed. This argument applies to the Western tonal harmonic system

only. Varieties of music other than those that use the harmonically motivated tonality

described here might take 7 : 4 as a primitive ratio, although it is doubtful that such a

music could support equal temperament or even a very extensive form of harmony.
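
For reference, the sizes of the intervals involved can be compared numerically. This sketch is an addition for illustration and is not part of the argument above, which rests on tonal rather than acoustic grounds. The ratios 16/9 and 9/5 are assumed here as the justly intoned values of the dominant seventh and minor seventh respectively.

```python
import math
from fractions import Fraction


def cents(ratio) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


INTERVALS = {
    "equally tempered seventh (10 semitones)": 2 ** (10 / 12),
    "seventh harmonic (7:4)": Fraction(7, 4),
    "dominant seventh (assumed 16:9)": Fraction(16, 9),
    "minor seventh (assumed 9:5)": Fraction(9, 5),
}

for name, ratio in INTERVALS.items():
    print(f"{name}: {cents(ratio):.1f} cents")
```

On these figures the equally tempered interval lies between the two justly intoned sevenths and is closer to either of them than to the seventh harmonic.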

2.6.4 Harmonic Interpretation

We are now able to define a more precise notion of harmonic interpretation, expressed

in terms of the theory of harmonic tonality outlined above. A harmonic interpretation,

as performed unconsciously by a listener familiar with the tonal system, can be seen

as a projection of the notes and the underlying chord roots of a piece of music from

the 4×3 space of equal temperament onto the infinite space of tonal relations. It is by

recognizing the tonal relations underlying the equally tempered music that a listener

can identify the key in which the music is played.


Figure 2.8: Part of the tonal space labelled according to its projection under equal

temperament onto 12 distinct pitches (Longuet-Higgins & Steedman, 1971).

Let us consider the simple musical example shown in figure 2.7, a cliche of a

melodic conclusion (first used in this context by Longuet-Higgins, 1979). The passage

divides naturally into two parts, corresponding to the two bars as it is notated here. The

first moves in the underlying harmonic progression from the tonic C to the dominant

G, whilst the second returns to the tonic. The first movement, in the second half bar,

to the dominant creates an expectation of a cadential return to the tonic. The second

bar satisfies this expectation, albeit a little later than would seem natural, only reaching

the tonic in the second half of the bar. It is easy to verify this analysis by playing the

melody with accompanying chords: a triad of C major and a triad of G major (with

added dominant seventh). Although the score includes only a melody, a crucial part of

understanding its meaning even when it is performed alone is an unconscious analysis

by the listener of the implicit harmony.

Consider the tonal space shown in figure 2.8, labelled as by Longuet-Higgins &

Steedman (1971) using numbers to refer to the notes of an equally tempered keyboard

(C=0, C♯=D♭=1, etc.). We can express the above harmonic analysis of figure 2.7 in

terms of the movement of the underlying chord root in the space. The passage begins

on C, let us say the central point labelled 0 (circled). Halfway through the first bar it

moves to the G (7) one step to the right. Halfway through the second bar, it steps back

to the left to land on C (0) again. Equal temperament obscures the distinction between

the tone labelled 7 marked in figure 2.8 and all other 7s. The disambiguation in this

case, however, is trivial. The G is interpreted as a dominant in the key established by

the C (outlined) and is, therefore, at the adjacent position. Likewise, the C to which

the dominant resolves must be that immediately to the left of it.

Figure 2.9: A tonal space path for the extended cadence: C A7 D7 G7 C.

The full tonal space may be used as a model underlying a formal harmonic analy-

sis. The harmonic interpretation of a piece is the path through the tonal space traced

by the roots of the chords. If we establish that there is a dominant-tonic relationship

between two chords, we know that the underlying interval between the roots is a per-

fect fifth, a single step to the left in the space, as in the example above. Likewise, a

subdominant-tonic relationship dictates a perfect fourth, a rightward step. Where no

tension-resolution relationship exists, as between a tonic and the first chord of a ca-

dence that follows it, a movement to the most closely tonally related instance of the

chord root may be assumed (according to the Manhattan distance metric of Longuet-

Higgins & Steedman, 1971). The functional relationships between chords discussed

above thus correspond to spatial relationships between chord roots in the tonal space.

A functional harmonic analysis determines a path traced through the tonal space by the

chord roots underlying the music and the tonal relations between tonal regions through

which the path passes.

Figure 2.9 shows an example of a tonal space path for an extended cadence. The

perfect fifth relationship between the dominants and their resolutions is reflected in the

path. The first step on the path is not a tension-resolution relationship, so proceeds

to the closest instance of the A by the Manhattan distance metric. An analysis of the

structure of the harmony, that is the recursive structure of tension-resolution relation-

ships between pairs of chords thus dictates a particular path through the space for the

chord roots of the progression. This constitutes a projection of the equal-temperament

chord roots back onto their justly intoned pitches, since the points equated by equal

temperament, those similarly labelled in figure 2.8, are distinguished in the path.
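
The construction of such a path can be sketched in a few lines. This is an illustration under stated assumptions and not the analysis procedure developed in later chapters. The function names, the relation labels `"dominant"` and `"subdominant"` and the (fifths, thirds) coordinates are hypothetical, and the functional relations are supplied by hand rather than inferred.

```python
def pitch_class(point, origin_pc=0):
    """Equal-temperament pitch class of a point, given the origin's pitch class."""
    fifths, thirds = point
    return (origin_pc + 7 * fifths + 4 * thirds) % 12


def nearest(reference, target_pc, origin_pc=0, radius=6):
    """Closest point to `reference` (Manhattan distance) projecting onto `target_pc`.
    Ties are broken arbitrarily by coordinate order."""
    rx, ry = reference
    candidates = [
        (x, y)
        for x in range(rx - radius, rx + radius + 1)
        for y in range(ry - radius, ry + radius + 1)
        if pitch_class((x, y), origin_pc) == target_pc
    ]
    return min(candidates, key=lambda p: (abs(p[0] - rx) + abs(p[1] - ry), p))


def root_path(roots, relations):
    """Trace chord roots through the space.

    roots: pitch classes of the chord roots (C=0).
    relations[i]: relation of chord i to chord i+1: "dominant" (left step),
    "subdominant" (right step) or None (move to the nearest instance).
    """
    origin_pc = roots[0]
    path = [(0, 0)]
    for target_pc, relation in zip(roots[1:], relations):
        x, y = path[-1]
        if relation == "dominant":
            point = (x - 1, y)
        elif relation == "subdominant":
            point = (x + 1, y)
        else:
            point = nearest((x, y), target_pc, origin_pc)
        if pitch_class(point, origin_pc) != target_pc:
            raise ValueError(f"relation {relation!r} does not lead to pitch class {target_pc}")
        path.append(point)
    return path


# C A7 D7 G7 C: an unrelated jump to A, then three dominant resolutions.
cadence = root_path([0, 9, 2, 7, 0], [None, "dominant", "dominant", "dominant"])
print(cadence)
```

Under these assumptions the final C lies at (−4, 1), a different point from the opening C at the origin, although equal temperament gives both the same pitch.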

The cadence Dm7 G7 C is often analysed as a substitute for F G7 C, in which the

F chord has the function of subdominant (or predominant). Alternatively, it could be

analysed as a substitute for the D7 G7 C cadence above. These two interpretations


correspond to distinct analyses of the tonal relations between chord roots, as can be

seen in the corresponding tonal space path. The cadence is ambiguous: it supports

both of these interpretations. In jazz, such a cadence may be, and often is, extended

with recursive dominant-function chords. In these extended cadences, 7 and m7 chords

may be freely substituted without constituting a change to the functional structure of

the harmony. Where a recursive dominant relation is extended in this way, as in the

cadence E7 A7 Dm7 G7 C, only an analysis of the Dm7 as being followed by a left

step in the tonal space presents a satisfactory explanation of the presence of the out-of-

key chords E7 and A7 (by the recursive dominant relation). This type of recursion is

very common in jazz. Consequently, the examples in this thesis will employ the latter

interpretation, accepting that the former may be equally acceptable in some cases.

2.6.5 Example Interpretations

I present here several examples of harmonic interpretations expressed in terms of the

movements of chord roots through the tonal space. The next chapter introduces a

language to describe just such interpretations as these in a form that can be built up

compositionally from interpretations of individual chords.

Basin Street Blues Louis Armstrong

This song, in the key of B♭, has a long introduction, mostly on a B♭ chord. As is

usual in jazz standards, this is followed by a head, which may be repeated indefinitely

and improvised over. The repeated chords in the head are shown in example 2.5, with

their tonal space analysis.

(2.5) B♭ D7 G7 A♭6 G7 C9 F7 B♭

Note first the use of the tritone substitution. The fourth chord, A♭6, is analysed as a

substitution in place of a D7. The tonal space analysis does not include an A♭: this

chord corresponds instead to the rightward movement from G back to the D.

What makes this chord sequence interesting is that the final resolution of the cadence

to a tonic on B♭ is shown in the analysis to be the point a row above and four

steps to the left of where we began. This is because the cadence contains enough recursive

dominants, which must be analysed as leftward steps, that its starting point is

closer to another instance of B♭ than the one it resolves to. The transition from the

initial tonic to the start of the cadence (D7) is taken to be by the closest possible tonal

relation – a major third.

This sort of trickery is permitted by equal temperament. If the piece were per-

formed on a justly intoned instrument with fixed tuning, a distortion of the perfect fifth

would be heard between the G7 and C9. If it were performed on a justly intoned instrument

capable of adjusting its tuning during a performance⁷, the result would be a

gradual, but noticeable, drop in pitch with each repetition.
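
The size of this drop can be estimated with a short sketch. The path coordinates below are a hand transcription of the analysis described above, in assumed (fifths, thirds) coordinates with the opening B♭ at the origin. The function `drift_cents` is hypothetical and simply measures how far a point lies from the origin once octaves are discarded.

```python
import math

B_FLAT = 10  # pitch class of the opening tonic (C=0)

# Roots of B♭ D7 G7 A♭6 G7 C9 F7 B♭, with the A♭6 read as a substitute for D7.
PATH = [(0, 0), (0, 1), (-1, 1), (0, 1), (-1, 1), (-2, 1), (-3, 1), (-4, 1)]


def pitch_class(point):
    """Equal-temperament pitch class of a point relative to the B♭ origin."""
    fifths, thirds = point
    return (B_FLAT + 7 * fifths + 4 * thirds) % 12


def drift_cents(fifths, thirds):
    """Justly intoned offset from the origin in cents, folded into [-600, 600)."""
    raw = 1200 * (fifths * math.log2(3 / 2) + thirds * math.log2(5 / 4))
    return (raw + 600) % 1200 - 600


per_repeat = drift_cents(*PATH[-1])
print(f"{per_repeat:.1f} cents per repetition")
print(f"{10 * per_repeat:.0f} cents after ten repetitions")
```

On this reading each pass through the head lowers the tonic by one syntonic comma, roughly a fifth of a semitone.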

Prelude No. 1 (from Eight Short Preludes and Fugues) J. S. Bach

I have previously presented an analysis (Wilding, 2008) of the first prelude from

J. S. Bach’s Eight Short Preludes and Fugues (BWV 553–560) using the categories

of an earlier development of the musical syntactic formalism described in the next

chapter. The opening section is reproduced below.

[Score excerpt: opening section of Prelude No. 1, not reproduced here.]

7 Thanks to modern technology, this is in fact possible nowadays, even without the services of Ellis’ obliging London harmonium manufacturer (Helmholtz, 1862).

Unsurprisingly, given Bach’s enthusiasm for well temperaments (tuning systems

which, like equal temperament, approximate just intonation in a way that permits free

key changes), this short passage moves through an extraordinary sequence of con-

nected tonalities that would have been impossible without contemporary tuning in-

novations. The tonal space analysis in example 2.6 is labelled with the bar and beat

numbers of the above score (in the form ‘bar.beat’). In many cases, the function of a chord

is made clear by the voicing, through the presence of either a dominant seventh note (such

as the C in the first inversion D major chord of bar 3), implying a dominant function,

or a major seventh note (such as the F♯ of the G major chord that follows it), implying

a tonic function.

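[Example 2.6: tonal space path diagram, not reproduced here. Its points are labelled with the positions 1.1, 2.3, 3.1, 3.3, 4.1, 4.3, 5.1, 5.3, 6.1, 6.3, 7.1, 7.3, 8.1, 8.3, . . . , 9.1, . . . , 1.1.]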

The piece opens with one and a half bars of a C major chord, establishing the piece’s

main key. The F major chord at 2.3 is interpreted as a subdominant, since the major

seventh of the previous C chord deters us from treating it as a dominant in relation

to this F. Consequently, the D major (3.1) must be interpreted as that closest to the

opening tonality of C (rather than that above and left of the F). The dominant D resolves

to a new tonic G at 3.3. A dominant E at 4.1 resolves briefly to a new tonic again,

this time A minor. Bar 5 contains another dominant D to tonic G resolution and bar

6 returns us momentarily to the main key of C. Whilst the interpretation so far has

moved by pairs of dominant-tonic resolutions, notice that the tonic A is followed by

a dominant a perfect fifth below, D, and the tonic G by an unprepared tonic C, also

a perfect fifth below. Bach appears to be exploiting a similar sort of trick to that we

have seen in jazz harmony, whereby a resolution of one dominant itself has a dominant

relation to the chord that follows.

Bars 6.3–9 do the same, taking us through a series of five perfect fifths. The re-

mainder of the passage simply moves between the subdominant, tonic and dominant

chords of G, the final pedal sequence returning us to the original tonality of C. Over

the course of the passage, Bach has taken us through a series of tonalities, using only

the dominant-tonic relation, that, justly intoned, would land us two syntonic commas

away from the first tonic.

Autumn Leaves Johnny Mercer

Part of the chord sequence for Autumn Leaves is analysed in the two paths of ex-

ample 2.7. (The skipped passage contains a repeat of the first section, followed by the

same two cadences in reverse order.) The main key of the piece is E minor and its

chord sequence consists mainly of IIm7 V7 I progressions in this key or the relative

major key of G major. As a result, the analysis is mostly uninteresting, but it does raise

one interesting issue regarding ambiguity.

(2.7) Am7 D7 GM7 CM7 F♯ø7 B7 Em . . . F♯ø7 B7♭9 Em7 E♭7 Dm7 D♭7 CM7 B7♭9 Em

[Tonal space path diagram not reproduced here. Surviving chord function labels: DDT/D?DDDT]

In addition to the points traced by the harmonic roots, the analysis above contains a

letter denoting the chord function (tonic, dominant or subdominant) of each chord. The

analysis of each cadence is simply two leftward steps onto its resolution (either E or

G). The second part of the analysis, of the part of the sequence after the ellipsis, begins


Figure 2.10: Two ways of interpreting a passage from Autumn Leaves: (a) as two

separate cadences; (b) as a single cadence, in which the Em7 has a dominant function.

with another cadence onto Em, which is followed by a long extended cadence onto C.

The cadence includes two tritone substitutions (E♭7 and D♭7), resulting in a sequence of

surface chord roots moving down in semitones. Two ways in which this passage could

be interpreted are as follows: as a cadence resolving to an E minor tonic followed by

another resolving to C major; or as a single cadence in which the Em7 chord functions

as an extended dominant resolving to E♭7 (which is a tritone substitution for A7). The

only difference between the two interpretations in the tonal space analysis is in the

annotation of the E root as a tonic in the former case or a dominant in the latter. The

fact that many recordings of the song use a chord sequence that does not permit the

second interpretation of this passage, together with the chord types on the first three

chords typical of a minor cadence ending, suggests that we must interpret the Em7 as

a tonic. Nevertheless, when this particular sequence is used, it seems to be possible to

hear the Em7 chord also as a dominant participating in the next cadence. A third, and

perhaps more plausible, way of viewing the connection between the Em7 and E♭7 in

this latter analysis is to treat the Em7 as having two functions: a tonic at the right edge

of the first subtree of figure 2.10a and a dominant at the left edge of the second. Such

double functions are represented in the analyses of Rohrmeier (2011).
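
The effect of the two tritone substitutions on the chord roots can be sketched as follows. The note table, the function `tritone_substitute` and the flags marking which chords are substitutes are assumptions for illustration, and the sketch follows the single-cadence reading in which the Em7 acts as a dominant.

```python
NOTE_TO_PC = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
              "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}
PC_TO_NOTE = {pc: name for name, pc in NOTE_TO_PC.items()}


def tritone_substitute(pc: int) -> int:
    """Root a tritone (six semitones) away from the given pitch class."""
    return (pc + 6) % 12


# Surface roots of Em7 Eb7 Dm7 Db7 CM7, with the assumed substitutes flagged.
surface = ["E", "Eb", "D", "Db", "C"]
is_substitute = [False, True, False, True, False]

underlying = [
    tritone_substitute(NOTE_TO_PC[root]) if flagged else NOTE_TO_PC[root]
    for root, flagged in zip(surface, is_substitute)
]
print([PC_TO_NOTE[pc] for pc in underlying])

# Each underlying root lies a perfect fifth (7 semitones) above the next,
# so every step is a leftward, dominant-type movement.
steps = [(a - b) % 12 for a, b in zip(underlying, underlying[1:])]
print(steps)
```

The surface roots descend by semitones, while the underlying roots E A D G C descend by fifths throughout.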

This is an example of an ambiguity in interpretation. The readings can be distinguished

in the tonal space path analysis when harmonic functions are included.

This allows us to propose several analyses, to express each of them formally and to make

explicit the difference between them in terms of the music theory that underlies

the analysis. Such ambiguities, featuring multiple plausible alternative readings, are

common in music and it is essential to a theory of harmonic structure that it be capable

of explaining precisely the differences between the alternative readings. A similar type

of ambiguity in natural language gives rise to multiple readings of sentences like two

sisters reunited after 18 years at checkout counter.

2.7 Conclusion

This chapter has put the present thesis in the context of research in music cognition

and computational linguistics and laid out the theoretical background for the approach

taken in later chapters. Constructing models of the structure of tonal harmony is im-

portant both for modelling human cognition of music and for practical tasks in auto-

matic music processing. Formal grammars modelling syntactic processing of natural

language to produce semantic representations will form the basis for the approach to

formal harmonic analysis taken in the next chapter. In particular, the grammar formal-

ism of CCG will be adapted as appropriate to tonal analysis and the basic elements of

the formalism have been described here as they are applied to linguistic analysis.

The result of harmonic analysis is an identification of the tonal relationships be-

tween the chords of the musical material. This chapter has introduced the sorts of

structure formed by these relationships and related them to the interpretation of the

music in terms of the three-dimensional tonal space of Longuet-Higgins (1979).

Formalizations of the two views of harmonic analysis introduced here – the decomposition

of the hierarchical structure of cadences and the corresponding path of chord

roots through the tonal space – will later be treated as the output of the process of

automatic harmonic analysis and used as the basis for quantitative evaluation metrics.

CHAPTER 3

A Grammar for Tonal Harmony

3.1 Introduction

The process of harmonic analysis involves the inference of tonal relations between the

chord roots underlying passages of a piece of music. As we have seen in figure 2.2,

a particular analysis may specify the tonal relation between two chords that are not

adjacent in the musical surface. Given some formal language to express these rela-

tions, harmonic analysis becomes the process of mapping the musical surface onto an

expression in that language, or a set of possible expressions representing alternative

analyses. As in linguistics, a generalized mechanism for mapping the surface form to

its analysis can be characterized using a grammar. The grammar expresses the neces-

sary constraints on the types of constituents that can be combined during the analysis

process and the form of the resulting analysis.

Steedman (1996) proposed a small generative grammar, using an adaptation of

CCG to harmonic syntax, covering a narrow musical domain – chord sequences of

variations on the twelve-bar blues – and a formal language sufficient to express har-

monic interpretations of these chord sequences. The present author expanded both

the harmonic analysis language and the syntactic formalism to permit interpretation

of a wider range of chord sequences than just twelve-bar blues (Wilding, 2008). This

chapter presents some small further developments of the harmonic analysis language


(section 3.2), including a new presentation as dependency graphs. There follows a

substantial improvement to the syntactic formalism (section 3.3). The resulting har-

monic adaptation of CCG, apart from using a more intelligible notation, permits a

concise handling of a wider array of harmonic structures, most notably covering the

long-distance relations created by coordination. Section 3.4 defines a grammar, con-

sisting of a lexicon of harmonic interpretations of chords and a set of rules governing

how the interpretations can be combined. The grammar can be used to automatically

interpret chord sequences, including recognizing a variety of chord substitutions used

by jazz composers and performers.

Whilst the grammar presented here is designed to interpret a particular musical

genre, much of it could equally be applied to the analysis of other tonal harmonic

genres. Adapting the grammar to another genre would primarily involve modifications

to the lexicon to add chord substitutions common in that genre and to remove some

specific to jazz that are not used in that genre.

3.2 Tonal Harmonic Interpretation Semantics

Chapter 2 introduced the three-dimensional tonal space of Longuet-Higgins (1962a,b)

as a model for formal tonal analysis and a two-dimensional projection of the space for

harmonic analysis of the chord roots underlying a piece of music. In this section, I

define a language of harmonic analysis resembling the predicate logic used to repre-

sent natural language semantics in section 2.4. It expresses a harmonic analysis as a

specification of the movements and points in the two-dimensional tonal space of the

path traced by the chord roots. An expression in this language represents a harmonic

analysis of a passage of music. The language is based closely on that described by

Steedman (1996) and Wilding (2008).

A harmonic analysis can be constructed by treating the analysis as analogous to

the semantics of a sentence in natural language. A syntax of tonal harmony formalizes

the relationship between the musical surface form – the chord sequence or passage

of notes – and the harmonic structure¹. I will henceforth refer to expressions in the

language defined below as the semantics of the music, or its logical form.

1 Although the trees in section 2.5 resemble the phrase structure trees often used in linguistics to represent graphically the syntactic structure of a sentence, they in fact encode the harmonic relations expressed more formally in this section and should, therefore, be considered diagrams of the harmonic semantics, in the terms of Steedman (2000).


Logical forms in this language are compositional: the logical form for a piece of

music can be composed from the logical forms of subsequences of the piece, using

a suitable syntax and using the lambda calculus to express gaps in the analysis that

will be filled by other partial logical forms. Thus individual chords can be assigned an

interpretation as a logical form, along with a syntactic type to constrain the ways this

can combine with the interpretation of other chords (see section 3.3). An interpretation

(or multiple interpretations) of the full piece can be produced by combining them, just

as the meaning of Keats will eat beets was built up from the meaning of its individual

words in example 2.3.

3.2.1 The Lambda Calculus as Notation

The lambda calculus provides a notation for expressing variable binding in functional

expressions.

Consider the semantics of example 2.2, reproduced in example 3.1:

(3.1) Keats     eats                  beets
      keats′    λx,y.eats′(y,x)       beets′
                ----------------------------- >
                λy.eats′(y,beets′)
      --------------------------------------- <
      eats′(keats′,beets′)

The predicate eats′ represents the action of eating, taking the agent (the eater) as its

first argument and the patient (the eaten) as its second. The semantics of eats is this

predicate, with gaps left for the arguments, marked by the variables x and y. Recall

that the > and < combinatorial rules (forward and backward function application) are

associated with the operation of function application in the semantics.

The lambda calculus allows us to bind a variable x in an expression, creating a

function which, when applied to an argument A, will return the same expression, but

with A substituted for every occurrence of x. The operation of binding a variable is

notated using λ and is referred to as lambda abstraction:

f = λx.E, where E may include x as a free variable

Where more than one variable is bound at the same time by immediate nesting of

lambda abstractions, only one λ is written, with comma-separated variable names:

f = λx,y.E

≡ λx.λy.E

In example 3.1, the lambda calculus allows us to build the semantics of eats as a

function which takes two arguments and produces a new logical form consisting of

the predicate eats′ applied to the function’s arguments (in reverse). The corresponding

syntactic category (S\NP)/NP ensures that the first argument to the function is the

patient of the action, in the object position (right of the verb in English), and the second

argument is the agent, in the subject position (left in English).

Example 3.2 shows how similar logical forms could be used in a passive construc-

tion to produce an identical interpretation of the whole sentence. The only difference

between the logical form for are eaten by here and eats above is the order of the vari-

ables as arguments to the predicate, now eats′(x,y) as opposed to eats′(y,x): the first

argument is now treated as the agent and the second as the patient.

(3.2)   Beets      are eaten by        Keats
        beets′     λx,y.eats′(x,y)     keats′
                   ----------------------------- >
                   λy.eats′(keats′,y)
        ---------------------------------------- <
        eats′(keats′,beets′)

Using the lambda calculus to express variable binding in functions and a formal lan-

guage for the sort of harmonic interpretations described in section 2.6.4 will allow us

to associate partially formed harmonic interpretations with constituents of a harmonic

progression and compose them into an interpretation of the full progression.
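The linguistic derivations of examples 3.1 and 3.2 can be mimicked with ordinary curried functions. The following Python sketch is purely illustrative: encoding the predicate eats′ as a tuple, and the names eats_pred, eats and are_eaten_by, are assumptions made here and form no part of the formalism.

```python
# Logical constants: plain strings stand in for keats′ and beets′.
keats, beets = "keats'", "beets'"

def eats_pred(agent, patient):
    """The predicate eats′(agent, patient), kept symbolic as a tuple."""
    return ("eats'", agent, patient)

# λx,y.eats′(y,x): the first argument is the patient, the second the agent.
eats = lambda x: lambda y: eats_pred(y, x)

# λx,y.eats′(x,y): the first argument is the agent, the second the patient.
are_eaten_by = lambda x: lambda y: eats_pred(x, y)

# Example 3.1: apply forward to the object, then backward to the subject.
active = eats(beets)(keats)

# Example 3.2: apply forward to the agent, then backward to the subject.
passive = are_eaten_by(keats)(beets)

print(active)
print(active == passive)
```

Both derivations should yield the same symbolic form, with keats′ as the agent and beets′ as the patient, which is the point made about the passive construction.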

3.2.2 Interpretation of Tonics

The semantics of a tonic chord is represented in a logical form as a point in the tonal

space which is constrained to be one that is mapped by equal temperament to the

chord’s root as written. This still permits a theoretically infinite set of points that could

be used to interpret the chord’s root. For an isolated tonic logical form, there is no

reason to constrain which of these points is chosen: its relation to other points in a full

logical form will be fully determined by constraints imposed by the rest of the logical

form. The logical form is a coordinate in a 4×3 enharmonic space identifying this

infinite set. The enharmonic space can be seen as the projection of the full tonal space

onto equal temperament, or equivalently as an infinite set of 4×3 subspaces in the tonal

space, as shown in figure 3.1. The notation 〈x,y〉 denotes an enharmonic coordinate, as

distinct from the coordinate or vector (x,y) in the full space. The coordinate, therefore,

lies between 〈0,0〉 and 〈3,2〉. Effectively, this means that once such a coordinate 〈x,y〉

has been chosen for a chord, issues of chord substitution have been resolved, but the

projection from equal temperament onto the full space of tonal relations would be

meaningless for a single isolated chord root. In the context of a full harmonic analysis

represented by a fully composed logical form, the coordinate will fully determine the

relations between the chord root and those around it.

Figure 3.1: Enharmonic blocks at the centre of the space. A position within one of these 4×3 blocks is equated by equal temperament with the same position in every other block.

All logical forms are lists, notated with square brackets, since a harmonic inter-

pretation is a sequence of interpretations of tonic passages and cadences. The logical

form of a single tonic chord is a single-element list, containing such a coordinate. A

tonic may be concatenated with other units, or may serve as the resolution of a ca-

dence. Example 3.3 shows some interpretations of chords as tonics. The final example

is an interpretation of the two chords in sequence both as tonics, implying that there

has been a sudden change of key². Note that the points on their own represent points

within the 4×3 enharmonic space, but that the 〈0,2〉 that follows a 〈0,0〉 must refer to

the point underneath the 〈0,0〉, for reasons explained fully below.

² Implausible as this may look, it is, in fact, the transition from the A-section to the bridge of Come Fly With Me.

(3.3)   CM7         ⇒  [〈0,0〉]
        A♭6         ⇒  [〈0,2〉]
        C  A♭M7     ⇒  [〈0,0〉,〈0,2〉]
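The projection from the full tonal space onto the enharmonic space can be sketched in a few lines of Python. The sketch assumes the usual layout of the tonal space, with a step to the right a perfect fifth and a step up a major third, and C at the origin; under that assumption four steps right are equated with one step up, and three steps up with an octave. The function names pitch_class and enharmonic are hypothetical.

```python
def pitch_class(x, y):
    """Equal-temperament pitch class (0 = C) of the full-space point (x, y).

    Assumes one step right is a perfect fifth (7 semitones) and one step
    up is a major third (4 semitones), with C at the origin.
    """
    return (7 * x + 4 * y) % 12

def enharmonic(x, y):
    """Project the full-space point (x, y) onto its 4x3 block coordinate."""
    block, ex = divmod(x, 4)        # four fifths right equal one third up
    return (ex, (y + block) % 3)    # three major thirds make an octave

print(enharmonic(0, 0))    # the central C
print(enharmonic(-4, 1))   # a different C
print(enharmonic(0, -1))   # the A-flat directly beneath the central C
```

Under these assumptions the point beneath the central C projects to 〈0,2〉, the coordinate assigned to the A♭ chords in example 3.3, and every point keeps its pitch class when projected.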

3.2.3 Interpretation of Cadences

The semantics of an authentic or plagal cadence resolution is a predicate represent-

ing a movement in the tonal space. An extended cadence, of the sort described in

section 2.5.1, is interpreted as the recursive application of each movement to its reso-

lution.

Authentic (dominant) cadences are leftward steps and use the leftonto predicate.

Just as, in the linguistic logical forms, the predicate eats′ served as a notation for the

concept of eating, here the leftonto predicate stands for a direct movement in the tonal

space. Plagal (subdominant) cadences are rightward steps and use the rightonto predi-

cate. For example, a single dominant chord G7 resolving to a tonic C would receive the

logical form leftonto(〈0,0〉), whilst D7 G7 C would receive leftonto(leftonto(〈0,0〉)).

A special behaviour is defined in the case of the application of a unary predicate,

like leftonto, to a list. The following reduction causes the predicate to be applied to the

first item in the list, so that the result is a list:

pred([X_0, X_1, ...]) ⇒ [pred(X_0), X_1, ...]

Example 3.4 shows an interpretation of an extended cadence with two dominants

– the familiar IIm7 V7 I – derived from the logical forms of the individual chords. The

resulting tonal space path movements are shown to the right.

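(3.4)   Dm7                G7                 C
        λx.leftonto(x)     λx.leftonto(x)     [〈0,0〉]
                           ----------------------------- >
                           [leftonto(〈0,0〉)]
        ------------------------------------------------ >
        [leftonto(leftonto(〈0,0〉))]

[Diagram of the tonal space path.]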

We will see later how the syntactic types associated with logical forms constrain the

rules that may combine them in derivations. For now, derivations are shown without

the syntactic types of the constituents.

The ◦ operator denotes function composition, defined as it was for linguistic se-

mantics:

f ◦ g ≡ λx. f(g(x))

Here f and g may stand for a predicate like leftonto. As in the logical semantics of

language, functions can be combined using composition. The resulting function f ◦ g

can then be applied to the argument x, producing the same logical form that would have

been produced by first applying g to x, then f to the result: that is, (f ◦ g)(x) ≡ f(g(x)).

In the harmonic semantics, this allows us to derive a functional interpretation of a

whole cadence before seeing its resolution, as in example 3.5. Note that this

derivation of the extended cadence is left-branching, but that its logical form is

identical to that given by the alternative derivation in the previous example (just as

in example 2.3 in chapter 2) and retains a right-branching embedding.

(3.5)   Dm7                G7                 C
        λx.leftonto(x)     λx.leftonto(x)     [〈0,0〉]
        --------------------------------- >B
        λx.leftonto(leftonto(x))
        ------------------------------------------------ >
        [leftonto(leftonto(〈0,0〉))]
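That the two derivations of examples 3.4 and 3.5 arrive at the same logical form can be checked mechanically. The Python sketch below is an illustration only: it assumes a hypothetical encoding in which a predicate application is a tuple (name, argument), a coordinate is a pair of integers and a logical form is a Python list.

```python
def apply_pred(name, arg):
    """Apply a unary predicate; on a list it applies to the first item only."""
    if isinstance(arg, list):
        return [(name, arg[0])] + arg[1:]
    return (name, arg)

def leftonto(x):
    return apply_pred("leftonto", x)

def compose(f, g):
    """Function composition: (f ∘ g)(x) = f(g(x))."""
    return lambda x: f(g(x))

tonic = [(0, 0)]                      # the logical form [〈0,0〉]

# Example 3.4: apply G7 to C, then Dm7 to the result (right-branching).
by_application = leftonto(leftonto(tonic))

# Example 3.5: compose Dm7 with G7 first, then apply to C (left-branching).
by_composition = compose(leftonto, leftonto)(tonic)

print(by_application)
print(by_application == by_composition)
```

The list case of apply_pred mirrors the reduction pred([X_0, X_1, ...]) ⇒ [pred(X_0), X_1, ...], so the result remains a list whichever derivation is used.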

3.2.4 Coordination of Cadences

Logical forms representing unresolved cadences can be coordinated to represent their

sharing of the eventual resolution. This is represented by a special operator ∧. This

simply conjoins the cadence logical forms, which are always functions expecting the

resolution as their argument.

Example 3.6 shows the conjunction of two unresolved cadences into a single logi-

cal form. (The predicate leftonto is abbreviated in the following examples to L for the

sake of space.)

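(3.6)   A7         Dm7        G7         Dm7        G7
        λx.L(x)    λx.L(x)    λx.L(x)    λx.L(x)    λx.L(x)
                   ------------------ >B ------------------ >B
                   λx.L(L(x))            λx.L(L(x))
        ----------------------------- >B
        λx.L(L(L(x)))
        --------------------------------------------------- &
        λx.L(L(L(x)))∧λx.L(L(x))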

The ∧ symbol is used because of the structural similarity between this phenomenon

and coordinating conjunction in language, whose semantics uses the logical conjunc-

tion operator ∧. The musical ∧, however, does not reduce in a way that might be

expected of its logical counterpart: the functions denoting partial cadences are not ac-

tually applied to their resolution, but maintained separately in the logical form. Note

also that, unlike the logical ∧, this operator must preserve the order of its arguments.

A ∧ B ≢ B ∧ A

The operator is associative and will always be normalized to its unbracketed form on

the left-hand side of the equivalence:

A ∧ B ∧ C ≡ (A ∧ B) ∧ C

          ≡ A ∧ (B ∧ C)

The result of a coordination is treated as a function that can be applied to its reso-

lution, as in example 3.7. Theoretically, each of the individual partial cadences could

be applied to the resolution to retrieve its semantics, but for now it is important for

the logical form to retain the structure of the coordination that was represented for a

similar cadence in the tree of figure 2.2.

(3.7)   A7 Dm7 G7          Dm7 G7         C
        λx.L(L(L(x)))      λx.L(L(x))     [〈0,0〉]
        ------------------------------ &
        λx.L(L(L(x)))∧λx.L(L(x))
        -------------------------------------------- >
        [(λx.L(L(L(x)))∧λx.L(L(x)))(〈0,0〉)]

More than two cadences can be coordinated to share the same resolution:

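(3.8)   Dm7 G7         Dm7 G7         Dm7 G7         C
        λx.L(L(x))     λx.L(L(x))     λx.L(L(x))     [〈0,0〉]
        -------------------------- &
        λx.L(L(x))∧λx.L(L(x))
        ----------------------------------------- &
        λx.L(L(x))∧λx.L(L(x))∧λx.L(L(x))
        ------------------------------------------------------ >
        [(λx.L(L(x))∧λx.L(L(x))∧λx.L(L(x)))(〈0,0〉)]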

The result of a coordination, once applied to its resolution, may become the resolution

of a preceding recursive cadence step, as in example 3.9.

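(3.9)   A7         Dm7 G7         Dm7 G7         C
        λx.L(x)    λx.L(L(x))     λx.L(L(x))     [〈0,0〉]
                   -------------------------- &
                   λx.L(L(x))∧λx.L(L(x))
                   ----------------------------------------- >
                   [(λx.L(L(x))∧λx.L(L(x)))(〈0,0〉)]
        ---------------------------------------------------- >
        [L((λx.L(L(x))∧λx.L(L(x)))(〈0,0〉))]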

However, the tonal space path dictated by this logical form is identical to that pro-

duced by composing the A7 with the following Dm7 G7 before coordinating, derived

in example 3.6, leading to the following definition of equivalence.

A((B ∧ ...)(C)) ≡ (A ◦ B ∧ ...)(C)

In order that this equivalence is easily recognizable, expressions of the form of the

left-hand side will be reduced to the form of the right-hand side wherever possible. The

logical form of example 3.9 thus reduces to that of example 3.7, in which the

coordination derived in example 3.6 is applied to its resolution.
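The behaviour of ∧ described in this section (order-preserving, associative, left unapplied, and absorbing an outer predicate into its first conjunct) can be illustrated with a small Python sketch. The encoding is an assumption made for the illustration: an unresolved cadence is a tuple of predicate names, outermost first, a coordination is a list of such tuples, and the names coordinate, resolve and apply_outer are hypothetical. Nested coordination inside a conjunct is not handled.

```python
def coordinate(*cadences):
    """f ∧ g ∧ ...: keep the cadences side by side, in order."""
    flat = []
    for cad in cadences:
        # A bracketed coordination is spliced in, giving the unbracketed form.
        flat.extend(cad if isinstance(cad, list) else [cad])
    return flat

def resolve(coordination, resolution):
    """(f ∧ g ∧ ...)(resolution), with the conjuncts left unapplied."""
    return ("coord", tuple(coordination), resolution)

def apply_outer(pred, resolved):
    """Reduce A((B ∧ ...)(C)) to (A∘B ∧ ...)(C)."""
    tag, branches, resolution = resolved
    first = (pred,) + branches[0]            # A ∘ B
    return (tag, (first,) + branches[1:], resolution)

L = "leftonto"

# Example 3.9: coordinate the two Dm7 G7 cadences, resolve, then apply A7.
ex_3_9 = apply_outer(L, resolve(coordinate((L, L), (L, L)), (0, 0)))

# Example 3.7: compose A7 with the first cadence before coordinating.
ex_3_7 = resolve(coordinate((L, L, L), (L, L)), (0, 0))

print(ex_3_9 == ex_3_7)
```

Under this encoding the reduced form of example 3.9 coincides with the resolved coordination of example 3.7, as the equivalence requires.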

3.2.5 Multiple Cadences: Development

The language is so far capable of expressing a tonal space interpretation of a complex

cadence structure, including recursion and coordination. A cadence structure may

resolve to a tonic, or a tonic may constitute the whole structure. In order to be able to

interpret a full piece of music, we must be able to join together many of these structures

in sequence. Recall that all fully reduced logical forms are lists (so far, single-element

lists). A sequence of such tonal structures is represented simply as the concatenation

of these lists, an operation I shall refer to as development.

Example 3.10 shows a pair of resolved cadences being combined in this way and

example 3.11 shows a tonic passage combining with a subsequent resolved cadence.

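(3.10)  Dm7                G7                 C           G7                 C
        λx.leftonto(x)     λx.leftonto(x)     [〈0,0〉]    λx.leftonto(x)     [〈0,0〉]
                           ----------------------------- > ---------------------------- >
                           [leftonto(〈0,0〉)]              [leftonto(〈0,0〉)]
        ------------------------------------------------ >
        [leftonto(leftonto(〈0,0〉))]
        -------------------------------------------------------------------------------- dev
        [leftonto(leftonto(〈0,0〉)), leftonto(〈0,0〉)]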

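(3.11)  C           Dm7                G7                 C
        [〈0,0〉]    λx.leftonto(x)     λx.leftonto(x)     [〈0,0〉]
                                       ----------------------------- >
                                       [leftonto(〈0,0〉)]
                    ------------------------------------------------ >
                    [leftonto(leftonto(〈0,0〉))]
        ------------------------------------------------------------ dev
        [〈0,0〉, leftonto(leftonto(〈0,0〉))]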

3.2.6 Colouration: Empty Semantics

Some brief harmonic excursions add colouration to a piece, but contribute little to the

functional structure of the harmony. It is convenient to ignore these for the purposes

of harmonic analysis. A typical example is the sequence I IV I, often played during

long passages of a I chord. This could be analysed as a form of plagal cadence and in

a fine-grained analysis this might be appropriate. Another example is passing chords,

which resemble diminished seventh chords or diminished triads, but have no harmonic

function.

Such chords can be ignored by assigning them a logical form that is the identity

function: λx.x. Example 3.12 shows this in action on a I IV I sequence.

(3.12)  C        F        C
        λx.x     λx.x     [〈0,0〉]
                 ------------------ >
                 [〈0,0〉]
        --------------------------- >
        [〈0,0〉]

3.2.7 Extracting the Tonal Space Path

3.2.7.1 Path Constraints

A logical form can be seen as denoting a set of constraints on the path through the

tonal space traced by the roots of the chords. The full path can be inferred from a fully

formed logical form, given an arbitrary starting point. Since the tonal space is a space

of tonal relations between chords, absolute position in the space is meaningless. The

arbitrary choice of the starting point of a path sets a reference point to begin the series

of tonal relations.

Let us first examine the constraints encoded in the various types of predicate. The

movement made by leftonto(p) begins one step in the grid to the right of the first

point of the path p. Given an unambiguous resolution to a point (x,y), the whole path

leftonto(leftonto((x,y))) is also unambiguous. Two cadences that share a resolution

through coordination are constrained to end at the same point, since their paths are

constrained relative to their shared resolution. There is no obvious constraint between

items in a list, which are either tonics or resolved cadences. They are, therefore, linked

by the closest tonal relation (the smallest possible Manhattan distance) that satisfies all

other constraints.

For example, consider the two logical forms shown in figure 3.2 with their tonal

space paths. The start of the second item in logical form (a) is dependent, ultimately,

on the cadence resolution 〈0,0〉. This point, as distinct from (0,0), is ambiguous:

we can choose for it any of the infinitely many points that lie at (0,0) within their

enharmonic block. Taking the start of the path to be at the central (0,0), we choose the

same point for the end of the second item, since it puts the start of the second item,

leftonto(leftonto(〈0,0〉)), as close as possible to (0,0). A choice of (−4,1) (dashed)

for the final resolution would also have been permitted by other constraints, but would

have resulted in a larger jump between the two path fragments.

In logical form (b), on the other hand, the second item begins at a point further

from its ending, since it includes three left steps. In this case we must choose (−1,1)

as the start point for the second item by putting its resolution 〈0,0〉 at (−4,1).
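The choice of resolution in the two cases of figure 3.2 can be reproduced by enumerating enharmonic equivalents and minimizing the Manhattan distance. The Python sketch below assumes fifths along the horizontal axis and major thirds along the vertical, so that moving by (4,−1) or by (0,3) leaves the pitch class unchanged; the function names and the finite search window are assumptions made for the illustration.

```python
def equivalents(x, y, radius=2):
    """Full-space points equated with (x, y) by equal temperament.

    Only a finite window of the infinite set is enumerated.
    """
    return [(x + 4 * i, y - i + 3 * j)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def place_resolution(left_steps, previous_end, resolution=(0, 0)):
    """Choose the resolution whose cadence starts closest to previous_end."""
    def start(res):
        # Each leftonto puts the preceding chord one step to the right.
        return (res[0] + left_steps, res[1])
    return min(equivalents(*resolution),
               key=lambda res: manhattan(start(res), previous_end))

print(place_resolution(2, (0, 0)))   # figure 3.2(a): (0, 0)
print(place_resolution(3, (0, 0)))   # figure 3.2(b): (-4, 1)
```

With two left steps the cadence starting at (2,0) is nearest to the preceding tonic, whereas with three left steps the cadence starting at (−1,1) is nearer than the one starting at (3,0), which is why the resolution moves to (−4,1).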

3.2.7.2 The Algorithm

Algorithm 1 uses the information encoded in a logical form to produce the corresponding

interpretation as a path through the tonal space. As well as demonstrating that any

logical form is interpretable as an analysis in the tonal space, this transformation is

of practical use. In section 5.3.7.1, the path is used to define a harmonic similarity

metric to compare a tonal space interpretation output by the system to a

human-annotated interpretation. The paths compared are those produced by this algorithm.

(a) [〈0,0〉, leftonto(leftonto(〈0,0〉))]

(b) [〈0,0〉, leftonto(leftonto(leftonto(〈0,0〉)))]

Figure 3.2: Tonal space paths corresponding to two logical forms. Arrows join consecutive chord roots into a path. (a) begins at C, (0,0), jumps to D, (2,0), and left-steps back to C. (b) also begins at C, but jumps to A, (−1,1), and left-steps to a different C, (−4,1).

Algorithm 1 finds the path through the tonal space, expressed as a list of fully

specified two-dimensional coordinates, that respects the constraints on tonal relations

expressed in a logical form. Recall that a logical form is a list of fully resolved cadence

and tonic interpretations. The algorithm iterates over the items in the list, calling the

procedure cadencepath on each, which finds a path for the item that respects its in-

ternal tonal relations. The only constraint between these items is that the first point on

a path produced by cadencepath is taken as that most closely tonally related to the

last on the previous path. This can only be enforced once the constraints internal to

the cadence have been satisfied, at which point the resulting path is shifted in the tonal

space such that it starts as close as possible to the point reached prior to its beginning.

Algorithm 1: path(lf) – output tonal space path for a logical form

    endpoint ← (0,0)
    foreach cadence in lf do
        points ← cadencepath(cadence)
        shift points to make closest connection with endpoint
        print(points)
        endpoint ← last(points)

Procedure cadencepath(cad) – tonal space path for a cadence (see algorithm 1)

    path ← [root(cad)]
    foreach predicate applied to root of cad, innermost first do
        if predicate = rightonto then prepend(path, path_0 + (−1,0))
        else if predicate = leftonto then prepend(path, path_0 + (1,0))
        else if predicate = (C_0 & . . . & C_n) then
            resolution ← path_0
            foreach C_i ← C_n, . . . , C_0 do
                cadpath ← cadencepath(C_i(resolution))
                remove last point from cadpath
                prepend(path, cadpath)
    return path

The procedure cadencepath produces a path from the logical form of a single

cadence. It begins by taking the root of the predicate structure – that is, the tonic to

which it eventually resolves – to be that closest to (0,0), relative to which the rest of

the path fragment will be built. The algorithm then works its way outwards through

the predicate applications, adding points to the path. Predicates leftonto and rightonto

take a single argument and add a point to the start of the path respectively one step to

the right and left of the current first point. A coordination structure contains several

structures representing unresolved cadences. They all must resolve to the argument to

which the coordination is applied. Therefore, each cadence in turn is transformed into a

path by a recursive call to cadencepath, taking the resolution to be the point currently

at the start of the path. The resulting path fragment for each cadence is prepended to

the path (removing the point that was treated as the resolution, so that it is not repeated

at the end of each cadence).
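A minimal Python sketch of algorithm 1 and the procedure cadencepath follows. It is an illustration under stated assumptions rather than the implementation used in this work: predicate applications are encoded as tuples ("leftonto", arg), a resolved coordination as ("coord", [C0, ..., Cn], arg) with each Ci a Python function, and shifts are restricted to a finite window of offsets (4i, 3j−i), which assumes fifths along x and major thirds along y. The root is taken as written and the whole fragment is shifted afterwards, and the path is returned rather than printed.

```python
def shifts(radius=3):
    """Offsets equated with (0, 0) by equal temperament (finite window)."""
    return [(4 * i, 3 * j - i)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)]

def closest_shift(point, target):
    """Shift bringing point nearest to target by Manhattan distance."""
    return min(shifts(), key=lambda s: abs(point[0] + s[0] - target[0])
                                       + abs(point[1] + s[1] - target[1]))

def cadence_path(term):
    """Procedure cadencepath: path for a tonic or a resolved cadence."""
    preds = []
    while isinstance(term[0], str):      # peel predicates off, outermost first
        preds.append(term[:-1])
        term = term[-1]
    path = [term]                        # the root: the eventual resolution
    for pred in reversed(preds):         # innermost first
        x, y = path[0]
        if pred[0] == "rightonto":
            path.insert(0, (x - 1, y))
        elif pred[0] == "leftonto":
            path.insert(0, (x + 1, y))
        else:                            # ("coord", [C0, ..., Cn])
            resolution = path[0]
            for branch in reversed(pred[1]):
                cadpath = cadence_path(branch(resolution))
                path[0:0] = cadpath[:-1]     # drop the repeated resolution
    return path

def tonal_path(lf):
    """Algorithm 1, returning the points instead of printing them."""
    endpoint, result = (0, 0), []
    for item in lf:
        points = cadence_path(item)
        dx, dy = closest_shift(points[0], endpoint)
        points = [(x + dx, y + dy) for x, y in points]
        result.extend(points)
        endpoint = points[-1]
    return result

L = lambda x: ("leftonto", x)
print(tonal_path([(0, 0), L(L((0, 0)))]))        # figure 3.2(a)
print(tonal_path([(0, 0), L(L(L((0, 0))))]))     # figure 3.2(b)
```

On the two logical forms of figure 3.2 the sketch yields the paths described in the caption: the first returns to the central C through (2,0) and (1,0), and the second runs from (−1,1) to the C at (−4,1).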

A hearing is scheduled on the issue today.

Figure 3.3: Linguistic syntactic dependency graph.

3.2.8 Representation as Dependency Graphs

A dependency graph is a common representation of syntactic relations between the

words of a sentence in NLP. Figure 3.3 shows an example of a syntactic dependency

graph for a sentence. The syntactic structure of the sentence is represented on the

sentence itself by means of directed arcs drawn wherever a syntactic dependency exists

between a pair of words. A label may be associated with each arc to indicate the

type of dependency. For example, hearing (along with its dependent a) is marked

as dependent on is as its subject, and today as a temporal adverb attaching to the verb

scheduled. A single root node dominates the whole graph, with the single highest-level

node attaching to it, denoted by a vertical arrow.

The harmonic relations expressed in the musical logical forms can be fully repre-

sented as dependency graphs, similar to linguistic syntactic dependency graphs, giving

a more easily intelligible representation of analyses. Dependencies are included in the

graph wherever a constraint on the form of the tonal space path exists between two

chords in the logical form. Each tension chord is marked as dependent on its resolu-

tion. The arc is labelled with the name of the predicate (leftonto or rightonto). Each

chord interpreted as a tonic is connected by an arc to the unique ROOT node, labelled

with the tonal space coordinate interpretation.

For example, the chord sequence

C E7 A7 Dm7 G7 Dm7 D♭7 C

could be interpreted with the logical form:

[〈0,0〉,(λx. leftonto(leftonto(leftonto(leftonto(x))))∧λx. leftonto(leftonto(x)))(〈0,0〉)]

This interpretation is fully (and more clearly) represented as the labelled dependency

graph over the chord sequence in figure 3.4. A longer example, including several

levels of embedded coordination, is Call Me Irresponsible, whose harmonic analysis

is shown as a dependency graph in figure 3.5.

Figure 3.4: Dependencies in a cadential chord sequence.

Despite their similarity to linguistic syntactic dependency graphs, these graphs rep-

resent the information encoded in the logical forms (the semantics) of a harmonic in-

terpretation. A benefit of this representation over the logical forms already described,

apart from the clarity of the visual representation, is that it permits the use of eval-

uation metrics used on linguistic dependency graphs, as described in section 5.3.7.2.

The dependency graphs used here do not express relations analogous to linguistic syn-

tactic dependencies, but the harmonic relations between chords in terms of the theory

of functional expectations and resolutions outlined in the previous chapter. They can,

therefore, be thought of as the equivalent of linguistic semantic dependency graphs,

which represent predicate-argument relations between words.
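The construction of a dependency graph from a logical form can be sketched in Python for the chord sequence given above. The encoding is a simplifying assumption: each list item is either a tonic coordinate or a pair of coordinated cadences (lists of predicate names in chord order) and their shared resolution, with no nested coordination. Attaching the last chord of every coordinated cadence to the shared resolution is my reading of the sharing relation, and the function name dependencies is hypothetical.

```python
ROOT = -1   # index standing for the ROOT node

def dependencies(lf):
    """Return arcs (dependent, head, label) over chord positions 0, 1, 2, ..."""
    arcs, i = [], 0
    for item in lf:
        if isinstance(item[0], int):                 # a tonic: (x, y)
            arcs.append((i, ROOT, item))
            i += 1
            continue
        branches, resolution = item                  # coordinated cadences + tonic
        res_index = i + sum(len(b) for b in branches)
        for branch in branches:
            for k, pred in enumerate(branch):
                ends_branch = k == len(branch) - 1
                # Head is the next chord, or the shared resolution at a branch end.
                arcs.append((i, res_index if ends_branch else i + 1, pred))
                i += 1
        arcs.append((res_index, ROOT, resolution))
        i += 1
    return arcs

chords = ["C", "E7", "A7", "Dm7", "G7", "Dm7", "D♭7", "C"]
L = "leftonto"
lf = [(0, 0), ([[L, L, L, L], [L, L]], (0, 0))]

for dep, head, label in dependencies(lf):
    print(chords[dep], "->", "ROOT" if head == ROOT else chords[head], label)
```

For this sequence the sketch produces six leftonto arcs and two arcs to ROOT labelled with 〈0,0〉, with both G7 and D♭7 attached to the final C.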

3.3 CCG for Harmonic Syntax

We have seen how logical forms representing interpretations of the chords of a har-

monic passage can be combined to produce a single interpretation of the whole se-

quence. The order in which constituents may be combined and the operations that may

combine them are not constrained by the logical form itself. In parsing natural lan-

guage, syntactic types and rules provide these constraints, as described in section 2.4.

To express the syntax of harmony, I define below a grammar formalism similar to the

standard CCG for English. Instead of the linguistic syntactic categories, such as NP

and S, it uses harmonic syntactic categories that define cadential expectation, follow-

ing Steedman (1996) and Wilding (2008). It includes some combinatory rules closely

related to those described in section 2.4 and some additional rules specific to harmonic

syntax. Each category, lexical or derived, pairs a syntactic type with a logical form in

the formalism described above.

An atomic type carries information about the tonality at the start and end of the

passage it spans. This is the only harmonic information relevant to constraining how it

can combine with adjacent categories. Both ends have a harmonic root, in the form of

an equally tempered pitch class (A, B♭, B, . . .), and a chord function, one of T (tonic),

D (dominant) and S (subdominant), written as a superscript. For example, a passage

starting on a tonic C and ending on a tonic A♭ would have syntactic type C^T–A♭^T.

For brevity, where the start and end parts of an atomic type are the same, just one is

written: C^T–C^T is abbreviated to C^T. Such a type is associated with a single tonic

chord.

Figure 3.5: Harmonic interpretation of the second half of Call Me Irresponsible represented as a dependency graph, featuring multiple levels of embedded coordination. The interpretation treats several pairs of consecutive chords as a continuation [...] underlying harmonic root, marked [...] joining the chords.

A forward-facing slash type X/Y gives the starting tonality Y expected for the

category to its right (its argument) and the starting tonality X that will be used for the

result of applying it to such an argument. Such a type is used in the interpretation of

a dominant chord. We are now able to associate syntactic types of this sort with the

logical forms of section 3.2 in order to constrain the ways adjacent constituents can be

combined, according to the definitions of the combinatorial rules, given in full below.

Example 3.13 adds syntactic types to example 3.11. As is conventional, the syntactic

type is given first, separated from the logical form by a colon.

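(3.13)  C              Dm7                 G7                  C
        C^T            D^D/G^{D|T}         G^D/C^{D|T}         C^T
        : [〈0,0〉]     : λx.leftonto(x)    : λx.leftonto(x)    : [〈0,0〉]
                                           --------------------------------- >
                                           G^D–C^T : [leftonto(〈0,0〉)]
                       ----------------------------------------------------- >
                       D^D–C^T : [leftonto(leftonto(〈0,0〉))]
        -------------------------------------------------------------------- dev
        C^T : [〈0,0〉, leftonto(leftonto(〈0,0〉))]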

The categories assumed by Dm7 and G7 are identical, relative to the roots of the chord

labels, and are specified by a single schema I^D/IV^{D|T} : λx. leftonto(x) in the lexicon,

using Roman numerals to express pitch relative to the chord root. A primitive dominant

chord could be interpreted with the syntactic type schema I^D/IV^T, reflecting the

expectation of a resolution down a perfect fifth to follow. Extended cadences, such as

the one in example 3.13, are handled syntactically by allowing the dominant category

to take as its resolution either a tonic or another dominant by using the type I^D/IV^{D|T}.
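The specialization of a Roman-numeral schema to a particular chord root amounts to a transposition, as the following Python sketch illustrates. Superscripts are written with ^, the flat-based spelling of the twelve pitch classes is an arbitrary choice, and the names PITCHES, NUMERALS, DOMINANT and specialize are hypothetical.

```python
# Equally tempered pitch classes, spelled with flats (an arbitrary choice).
PITCHES = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]

# Semitones above the chord root for the numerals used in the schema.
NUMERALS = {"I": 0, "IV": 5}

# The dominant schema I^D/IV^{D|T}, as (numeral, functions) for each side.
DOMINANT = (("I", "D"), ("IV", "D|T"))

def specialize(schema, root):
    """Instantiate a result/argument schema at a concrete chord root."""
    base = PITCHES.index(root)
    parts = []
    for numeral, functions in schema:
        pitch = PITCHES[(base + NUMERALS[numeral]) % 12]
        sup = functions if len(functions) == 1 else "{" + functions + "}"
        parts.append(f"{pitch}^{sup}")
    return "/".join(parts)

print(specialize(DOMINANT, "D"))   # the type assumed by Dm7
print(specialize(DOMINANT, "G"))   # the type assumed by G7
```

Applied to the roots D and G, the sketch gives the two types used for Dm7 and G7 in example 3.13, which differ only in their roots.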

The combinatory rule for function composition (as in example 3.5) performs the

appropriate manipulation of the syntactic categories:

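(3.14)  C              Dm7                 G7                  C
        C^T            D^D/G^{D|T}         G^D/C^{D|T}         C^T
        : [〈0,0〉]     : λx.leftonto(x)    : λx.leftonto(x)    : [〈0,0〉]
                       ----------------------------------- >B
                       D^D/C^{D|T} : λx.leftonto(leftonto(x))
                       ----------------------------------------------------- >
                       D^D–C^T : [leftonto(leftonto(〈0,0〉))]
        -------------------------------------------------------------------- dev
        C^T : [〈0,0〉, leftonto(leftonto(〈0,0〉))]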

Backward slash (\) categories are precisely the reverse of forward slash categories.

They are able to combine with an argument to their left and specify the end tonality

required of the argument (after the slash) and the end tonality that the result of combi-

nation will have (before the slash). They are much less used in the musical grammar

than /.

A combinatory rule resembling the one used for natural language coordination

(Steedman, 2000) allows interpretation of coordinated cadences. A further rule, de-

velopment, allows tonic passages and resolved cadences to be combined into a single

derivation, performing the concatenation of the list logical forms. The rules are for-

mally defined in the next section. Example 3.15 makes use of the coordination and

development rules. (Once again leftonto is abbreviated to L.)

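(3.15)  C              Dm7 G7            A7 Dm7 G7            C6
        C^T            D^D/C^{D|T}       A^D/C^{D|T}          C^T
        : [〈0,0〉]     : λx.L(L(x))      : λx.L(L(L(x)))      : [〈0,0〉]
                       ------------------------------------ &
                       D^D/C^{D|T} : λx.L(L(x))∧λx.L(L(L(x)))
                       ------------------------------------------------------ >
                       D^D–C^T : [(λx.L(L(x))∧λx.L(L(L(x)))) (〈0,0〉)]
        --------------------------------------------------------------------- dev
        C^T : [〈0,0〉,(λx.L(L(x))∧λx.L(L(L(x)))) (〈0,0〉)]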

The categories that may be used to interpret individual chords, including those de-

scribed above for tonic and dominant chords, are defined in the lexicon of the grammar.

A full derivation of an interpretation, such as those in the above examples, is produced

by choosing a category from the lexicon for each chord and combining them using

combinatory rules.

Chord substitution is handled in the lexicon. For example, jazz musicians may re-

place a dominant seventh chord using the tritone substitution (see section 2.5.1). A

special lexical category allows a dominant chord that has undergone a tritone substitu-

tion to be interpreted in the same way as the chord for which it is a substitute. Other

similar substitutions are handled likewise by adding a new line to the lexical schemata

in table 3.1 below. The cadence in example 3.16 from Can’t Help Lovin’ Dat Man (in

the key of E♭) is shown here with its syntactic types and contains an example of the

tritone substitution, where the B7 replaces an F7.

(3.16)  Gm7              Cm7              B7                B♭7                E♭6
        G^D/C^{D|T}      C^D/F^{D|T}      F^D/B♭^{D|T}      B♭^D/E♭^{D|T}

3.4 A Grammar for Jazz

3.4.1 The Rules

Since CCG is a strongly lexicalized formalism, most of the work of the grammar is

done in the lexicon. Only six rules are used to build derivations. Rules are applied to

combine simultaneously the syntactic categories and the logical forms. Each rule has

a symbol used to identify its use in derivations. Rules are written as productions, with

inputs to the left of a ⇒ and the result of applying the rule to the right. A rule may be

applied wherever syntactic types for adjacent spans match the forms of the schemata

on the left-hand side of the production. No conditions are placed on the logical forms

of the inputs.

Function application and function composition are merely adaptations of their con-

ventional forms, described in section 2.4, to the musical formalism. For the sake of

simplicity, requirements on the tonal functions (superscript T, D and S) are omitted

here, but note that the rule must also check that, for example, the function of Y in the

first category of forward application matches that of the second category, including

permitting a Y^{T|D} in the first category to be matched by a Y^T or Y^D in the second.

Function application:
    Forward  (>)     X/Y : f    Y–Z : x    ⇒   X–Z : f(x)
    Backward (<)     X–Y : x    Z\Y : f    ⇒   X–Z : f(x)

Function composition:
    Forward  (>B)    X/Y : f    Y/Z : g    ⇒   X/Z : f ◦ g
    Backward (<B)    X\Y : g    Z\X : f    ⇒   Z\Y : f ◦ g

The coordination rule combines two unresolved cadences to behave as a single un-

resolved cadence and requires that they are expecting the same resolution. The two

cadences are required to be either both authentic (dominant) or both plagal (subdomi-

nant). The logical form of the result is treated as a function that will be applied to the

resolution.

Coordination:
    (&)     X^F/Y : f    Z^F/Y : g    ⇒   X^F/Y : f ∧ g    where F ∈ {D,S}

The trivial development rule joins together fully resolved passages. It requires its

inputs to be atomic categories and not slash categories, which need to be combined

with something by function application before they can be considered resolved. Syn-

tactic constraints and the composition of the lexicon ensure that such an atomic cate-

gory will always have a list logical form, so this rule can simply concatenate the two

lists.

Development:
    (dev)   V–W : l_0    X–Y : l_1    ⇒   V–Y : l_0 ⊕ l_1

This is a permissive rule: it permits any two consecutive passages interpreted in-

dividually as harmonically stable to be conjoined, regardless of key. Modulation (key

change) can occur freely in music at any time. We, therefore, do not use the grammar

to put any restrictions on what sorts of modulation to permit. Whilst some modula-

tions are more common than others, such preferences are better captured by statistical

models, such as those introduced in chapter 5, than by syntactic constraints.
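The effect of the rules on syntactic types can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the parser's implementation: only forward slashes are modelled, logical forms are omitted, a rule returns None where it does not apply, and the class and function names (Half, Atomic, Forward, h, matches and the rule functions) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Half:
    """One end of a type: a root pitch and the chord functions it admits."""
    root: str
    funcs: frozenset

@dataclass(frozen=True)
class Atomic:          # start–end, e.g. D^D–C^T
    start: Half
    end: Half

@dataclass(frozen=True)
class Forward:         # result/argument, e.g. D^D/G^{D|T}
    res: Half
    arg: Half

def h(root, funcs):
    return Half(root, frozenset(funcs))

def matches(expected, found):
    """Y^{T|D} is matched by Y^T or Y^D; the roots must be identical."""
    return expected.root == found.root and found.funcs <= expected.funcs

def apply_forward(f, x):                      # X/Y  Y–Z  =>  X–Z
    if isinstance(f, Forward) and isinstance(x, Atomic) and matches(f.arg, x.start):
        return Atomic(f.res, x.end)

def compose_forward(f, g):                    # X/Y  Y/Z  =>  X/Z
    if isinstance(f, Forward) and isinstance(g, Forward) and matches(f.arg, g.res):
        return Forward(f.res, g.arg)

def coordinate(f, g):                         # X^F/Y  Z^F/Y  =>  X^F/Y
    if (isinstance(f, Forward) and isinstance(g, Forward) and f.arg == g.arg
            and f.res.funcs == g.res.funcs and f.res.funcs in ({"D"}, {"S"})):
        return Forward(f.res, f.arg)

def develop(a, b):                            # V–W  X–Y  =>  V–Y
    if isinstance(a, Atomic) and isinstance(b, Atomic):
        return Atomic(a.start, b.end)

# Example 3.13: C  Dm7  G7  C
c = Atomic(h("C", "T"), h("C", "T"))
dm7 = Forward(h("D", "D"), h("G", "DT"))
g7 = Forward(h("G", "D"), h("C", "DT"))

cadence = apply_forward(dm7, apply_forward(g7, c))     # D^D–C^T
same = apply_forward(compose_forward(dm7, g7), c)      # via >B, as in example 3.14
print(cadence == same)
print(develop(c, cadence) == c)                        # C^T–C^T, i.e. C^T
```

The development function places no condition on the roots of its inputs, reflecting the permissiveness of the rule with respect to key.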

In addition to the lexical schemata given in the next section, the lexicon is expanded

by the application of two unary rules. These provide categories to handle the repeti-

tion of chords without a change in harmonic function. This occurs sometimes simply

because of notational conventions – a chord will often be repeated at the beginning of

a new line or section, even if it is really just a continuation of the previous chord – and

sometimes because there is a change in the chord at a level which is not reflected in

the analysis. The former case could be handled simply by preprocessing the input to

remove repeated chords, but the latter cannot be dismissed as easily.

For example, say a tonic chord CM7 is followed by a chord on the same root with

a different chord type C6. The change in chord type is important for performers, but

is irrelevant to the analysis. A chord might also be followed by another on a different

root which behaves as a substitute for the first. In this case too, the chords should

be combined early in the analysis into a single unit. For example, McCoy Tyner’s

Contemplation begins on a Cm7(11) chord, establishing the tonic of the piece, for the

first two lines. The third line is on an A♭M7 chord, behaving as a substitute for the Cm

chord (really nothing more than an inversion). This can be interpreted as a substitution

using a special lexical category (Ton-bVI, see the lexicon in the next section). The two

chords are then treated as a single tonic passage.

Dominant and subdominant chords too can be prolonged in this manner, similarly

permitting movement between different substitutions for the same chord. The follow-

ing two rules produce lexical schemata that can be applied to the first of such a pair of

chords in place of the schema that it would have had on its own. The resulting lexical

schemata will be referred to using a mnemonic distinguished from that of the original

schema by the addition of ‘-rep’.

Tonic repetition:

    (TRep)  $X^T : f \Rightarrow X^T/X^T : \lambda x.x$

Cadence repetition:

    (CRep)  $X^F/Y : f \Rightarrow X^F/X^F : \lambda x.x$   where $F \in \{D, S\}$

The tonic passage of Contemplation can now be interpreted as shown in exam-

ple 3.17, using the Ton-bVI schema and Ton-rep, which is produced from the Ton

schema (see next section) using the tonic repetition rule.

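(3.17)
    Cm7(11)                      A♭7
    $C^T/C^T : \lambda x.x$      $C^T : [\langle 0,0\rangle]$
    ---------------------------------------------------------- >
    $C^T : [\langle 0,0\rangle]$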

The unary rules of tonic repetition and cadence repetition are restricted to the role of
expanding the lexicon and cannot be applied at any other point in a derivation.³ For
example, the category resulting from the derivation in example 3.17 cannot be used as
input to the tonic repetition rule to produce the category $C^T/C^T : \lambda x.x$.
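The restriction of the two unary rules to lexicon expansion can be illustrated with a minimal Python sketch. The tuple-based representation of syntactic types, the `Schema` class and the function names are assumptions made for illustration only; they are not taken from the parser described in this thesis.

```python
from typing import NamedTuple, Optional

IDENTITY = "λx.x"


class Schema(NamedTuple):
    name: str
    result: tuple          # (root numeral, function), e.g. ("I", "D")
    slash: Optional[str]   # "/" for a forward slash type, None for an atomic type
    arg: Optional[tuple]   # argument type, None for an atomic type
    lf: str                # logical form, kept as a plain string here


def tonic_repetition(s):
    """TRep: X^T : f  =>  X^T/X^T : λx.x (atomic tonic schemata only)."""
    if s.slash is None and s.result[1] == "T":
        return Schema(s.name + "-rep", s.result, "/", s.result, IDENTITY)
    return None


def cadence_repetition(s):
    """CRep: X^F/Y : f  =>  X^F/X^F : λx.x, where F is D or S."""
    if s.slash == "/" and s.result[1] in ("D", "S"):
        return Schema(s.name + "-rep", s.result, "/", s.result, IDENTITY)
    return None


def expand_lexicon(lexicon):
    """Apply each unary rule once to the original entries only."""
    expanded = list(lexicon)
    for schema in lexicon:  # never iterate over the newly added '-rep' entries
        for rule in (tonic_repetition, cadence_repetition):
            new = rule(schema)
            if new is not None:
                expanded.append(new)
    return expanded


base = [
    Schema("Ton", ("I", "T"), None, None, "[<0,0>]"),
    Schema("Dom", ("I", "D"), "/", ("IV", "D|T"), "λx.leftonto(x)"),
]
lexicon = expand_lexicon(base)
print([s.name for s in lexicon])
```

Because the rules are run as a single pass over the original schemata, a category built later in a derivation can never be fed back into them, which mirrors the restriction stated above.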

3.4.2 The Lexicon

We are now able to define in full the lexicon of a grammar of jazz chord sequences,

shown in table 3.1. Each entry is a lexical schema and has a mnemonic label to serve

as an identifier, a surface chord class, a syntactic type and a logical form. The surface

chord class generalizes over chord roots X and the syntactic type is given relative to the

root of the surface chord symbol, using Roman numeral notation. For example, if the

first entry, Ton, were assigned to a chord G, the syntactic type of the category would be
$G^T$. During parsing, a lexical schema may be specialized to a particular root pitch to
produce a category. The category may be assigned to a chord with that root provided
the type of the chord falls into the surface chord class represented to the left of the :=
in table 3.1. Each schema is made up of a generalized syntactic type and a logical form
in the language of compositional harmonic interpretations described in section 3.2.

³ A similar restriction is typically put on the type raising rule in CCG for natural language.
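The specialization of a schema's syntactic type to a particular root can be sketched as follows. This is only an illustration: the string encoding of types (with `b` standing for a flat), the table of note names and the function names are my assumptions, and the transposition of the logical form's coordinate is deliberately left out.

```python
import re

NUMERALS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}
# Flats are used throughout, as in the lexicon itself.
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def numeral_to_interval(numeral):
    """Semitones above the chord root denoted by a (possibly flattened) numeral."""
    plain = numeral.lstrip("b")
    flats = len(numeral) - len(plain)
    return (NUMERALS[plain] - flats) % 12


def specialize(schema_type, root):
    """Replace every Roman numeral in a schema type with a concrete root name."""
    root_pc = NOTE_NAMES.index(root)

    def replace(match):
        return NOTE_NAMES[(root_pc + numeral_to_interval(match.group(0))) % 12]

    return re.sub(r"b*[IV]+", replace, schema_type)


print(specialize("I^D/IV^{D|T}", "G"))     # Dom assigned to G7
print(specialize("bV^D/VII^{D|T}", "Db"))  # Dom-tritone assigned to Db7
```

Both calls yield the same type, `G^D/C^{D|T}`, matching the example column of the lexicon: the tritone substitute receives exactly the category that the substituted dominant would have had.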

Table 3.1: The lexicon for the jazz grammar. For each schema, a typical example is
given of a chord in the key of C and the syntactic type of the category that would be
assigned to that chord.

Mnemonic label  Category schema                                                          Example chord  Example syntactic type
Ton             X(m)  := $I^T : [\langle 0,0\rangle]$                                    CM7            $C^T$
Ton-III         Xm    := $\flat VI^T : [\langle 0,2\rangle]$                             Em             $C^T$
Ton-bVI         X     := $III^T : [\langle 0,1\rangle]$                                  A♭M7           $C^T$
Dom             X(m)7 := $I^D/IV^{D|T} : \lambda x.\mathrm{leftonto}(x)$                 G7             $G^D/C^{D|T}$
Dom-backdoor    X(m)7 := $VI^D/II^{D|T} : \lambda x.\mathrm{leftonto}(x)$                B♭7            $G^D/C^{D|T}$
Dom-tritone     X(m)7 := $\flat V^D/VII^{D|T} : \lambda x.\mathrm{leftonto}(x)$          D♭7            $G^D/C^{D|T}$
Dom-bartok      X(m)7 := $\flat III^D/\flat VI^{D|T} : \lambda x.\mathrm{leftonto}(x)$   E7             $G^D/C^{D|T}$
Subdom          X(m)  := $I^S/V^{S|T} : \lambda x.\mathrm{rightonto}(x)$                 F              $F^S/C^{S|T}$
Subdom-bIII     X     := $VI^S/III^{S|T} : \lambda x.\mathrm{rightonto}(x)$              A♭             $F^S/C^{S|T}$
Dim-bVII        X°    := $IV^D/\flat VII^{D|T} : \lambda x.\mathrm{leftonto}(x)$         D°7            $G^D/C^{D|T}$
Dim-V           X°    := $II^D/V^{D|T} : \lambda x.\mathrm{leftonto}(x)$                 F°7            $G^D/C^{D|T}$
Dim-III         X°    := $VII^D/III^{D|T} : \lambda x.\mathrm{leftonto}(x)$              A♭°7           $G^D/C^{D|T}$
Dim-bII         X°    := $\flat VI^D/\flat II^{D|T} : \lambda x.\mathrm{leftonto}(x)$    B°7            $G^D/C^{D|T}$
Pass-I          X°    := $I^T/I^T : \lambda x.x$                                         C°7            $C^T/C^T$
                X°    := $I^D/I^D : \lambda x.x$                                         G°7            $G^D/G^D$
Pass-VI         X°    := $VI^T/VI^T : \lambda x.x$                                       A°7            $C^T/C^T$
                X°    := $VI^D/VI^D : \lambda x.x$                                       E°7            $G^D/G^D$
Pass-bV         X°    := $\flat V^T/\flat V^T : \lambda x.x$                             G♭°7           $C^T/C^T$
                X°    := $\flat V^D/\flat V^D : \lambda x.x$                             D♭°7           $G^D/G^D$
Pass-bIII       X°    := $\flat III^T/\flat III^T : \lambda x.x$                         E♭°7           $C^T/C^T$
                X°    := $\flat III^D/\flat III^D : \lambda x.x$                         B♭°7           $G^D/G^D$
Aug-bII         X7    := $\flat VI^D/\flat II^{D|T} : \lambda x.\mathrm{leftonto}(x)$    Baug           $G^D/C^{D|T}$
Aug-VI          X7    := $III^D/VI^{D|T} : \lambda x.\mathrm{leftonto}(x)$               E♭aug          $G^D/C^{D|T}$
Colour-IVf      X(m)  := $V^T/V^T : \lambda x.x$                                         F              $C^T/C^T$
Colour-IVb      X(m)  := $V^T\backslash V^T : \lambda x.x$                               F              $C^T\backslash C^T$
Colour-IIf      X(m)  := $\flat VII^T/\flat VII^T : \lambda x.x$                         Dm             $C^T/C^T$
Colour-IIb      X(m)  := $\flat VII^T\backslash\flat VII^T : \lambda x.x$                Dm             $C^T\backslash C^T$
Dom-IVm         Xm    := $II^D/V^T : \lambda x.\mathrm{leftonto}(x)$                     Fm(6)          $G^D/C^T$

Class   Chord types
X       X, XM7, X7, Xaug,M7, X♭5,M7, Xsus4, Xsus4,7
Xm      Xm, Xm7, XmM7
X7      X7, Xaug, Xaug,7, X, X♭5,7, X♭5, Xsus4, Xsus4,7
Xm7     Xm7, Xø7, X°7, Xm, Xm♭5, XmM7
X°      X°7, Xø7

Table 3.2: Definitions of the chord types included in each chord class, as used in the
lexicon. This component of the grammar is flexible and may be altered to suit notational
conventions and genre-specific chord function conventions. The definitions in this table
are those that I have used to parse jazz chord sequences.

Table 3.2 gives an example definition of the chord classes used in the lexical entries.
If a chord in the input has a type in the class X7, it may be assigned a category
corresponding to any of the lexical entries associated with that class. This component
of the grammar is somewhat flexible. It may be modified to reflect different transcription
conventions: for example, a C major seventh chord is transcribed variously as
$C^{M7}$, CM7, $C^{\triangle}$, C△, Cmaj7, etc. It may also be modified to reflect a genre’s
conventions on the association of chord types with functions: for example, the chord type
X7 is included in the class X, since it is not uncommon in jazz (especially in blues
numbers) to use this chord type as a tonic chord.⁴

Figure 3.6: The 4×3 space in which the logical form of an isolated tonic chord is
specified. Which part of the full tonal space this block is projected onto in relation to the
rest of the analysis depends on constraints expressed by the full logical form. Circled
are the points corresponding to (a) a C chord, or any substitute for a C chord, and
(b) a B chord or substitute. Note that the enharmonic spelling of the pitches in these
diagrams is not meaningful, since the projection into the full tonal space is ambiguous.
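The chord classes amount to a simple lookup from a chord type to the classes that contain it. The following sketch encodes table 3.2 as a Python dictionary; the ASCII spellings of the type symbols (`%7` for half-diminished, `o7` for diminished, `b5` for a flattened fifth, the empty string for a plain triad) and the class key `Xo` for X° are assumptions made for the example.

```python
# Chord classes of table 3.2, keyed by class name.
CHORD_CLASSES = {
    "X":   {"", "M7", "7", "aug,M7", "b5,M7", "sus4", "sus4,7"},
    "Xm":  {"m", "m7", "m,M7"},
    "X7":  {"7", "aug", "aug,7", "", "b5,7", "b5", "sus4", "sus4,7"},
    "Xm7": {"m7", "%7", "o7", "m", "m,b5", "m,M7"},
    "Xo":  {"o7", "%7"},
}


def classes_for(chord_type):
    """All chord classes whose lexical entries may be assigned to this type."""
    return sorted(name for name, types in CHORD_CLASSES.items() if chord_type in types)


print(classes_for("7"))   # a dominant seventh may also act as a tonic
print(classes_for("%7"))
```

Adapting the grammar to another notational convention or genre would then only require editing this table, leaving the lexical schemata untouched.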

All surface chords are assumed to be transcribed in equal temperament and the

transcriber is not assumed to have distinguished between enharmonic equivalents, like

G♯ and A♭. Indeed, this disambiguation is part of the analysis performed during parsing

and may be inferred from the semantics resulting from a full parse. The constraints

expressed by the syntactic types operate prior to this analysis, so do not make these

distinctions. For consistency, I have arbitrarily used flats throughout the lexicon.

The mnemonic label Ton is used to identify a simple tonic chord function. The

corresponding syntactic type takes on the same root that the surface chord had. Like

the syntactic types, the coordinate in the logical form of a tonic category implicitly

generalizes over the possible roots of the surface chord. For example, if the surface

chord has root C, the logical form will become 〈0,0〉, whilst if the root is B the logical

form is 〈1,1〉, both shown in figure 3.6.

The mnemonic Dom identifies a rule that says a surface chord G7 can be interpreted

with the syntactic type $G^D/C^{D|T}$, which expects to find its resolution (rooted a perfect

fifth below) to its right. Its logical form denotes a leftward movement in the tonal space

to its resolution.

Example 3.18 shows these two schemata in action, using the combinatory rules
defined above in section 3.4.1. The logical form here is right branching, due to the
embedding of the full remainder of the cadence interpretation in the argument to a leftonto
predicate, as required to express the notion that the relationship of the extended
dominant chord to its key is explained by the recursive relations between the sequence of
localized dominant resolutions. This same embedding is produced in the interpretation
whether the syntactic derivation that builds it takes the form of a left-branching tree,
as here, or a right-branching tree, as in example 3.13.⁵

⁴ The use of X7 as a tonic reflects the ambiguity of the equally tempered flattened seventh tone
between the minor seventh ($\flat VII$) and the dominant seventh ($\flat VII^{-}$).

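(3.18)
    Dm7                                               G7                                                C
    $D^D/G^{D|T} : \lambda x.\mathrm{leftonto}(x)$    $G^D/C^{D|T} : \lambda x.\mathrm{leftonto}(x)$    $C^T : [\langle 0,0\rangle]$
    ------------------------------------------------------------------------------------------------ >B
    $D^D/C^{D|T} : \lambda x.\mathrm{leftonto}(\mathrm{leftonto}(x))$
    ------------------------------------------------------------------------------------------------------------------------------- >
    $D^D\text{–}C^T : [\mathrm{leftonto}(\mathrm{leftonto}(\langle 0,0\rangle))]$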
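The claim that the same embedding results from either derivation order can be checked with a small Python sketch of the semantics alone. Logical forms are modelled here as ordinary functions and nested tuples, which is an illustrative assumption; syntactic types and the list brackets of the full logical form are ignored.

```python
def leftonto(x):
    """The leftonto predicate, represented as a tagged tuple."""
    return ("leftonto", x)


def compose(f, g):
    """Semantics of forward composition (>B): λx.f(g(x))."""
    return lambda x: f(g(x))


dm7 = leftonto   # D^D/G^{D|T} : λx.leftonto(x)
g7 = leftonto    # G^D/C^{D|T} : λx.leftonto(x)
c = (0, 0)       # C^T : [<0,0>]

# Left-branching derivation: compose Dm7 with G7, then apply the result to C.
left_branching = compose(dm7, g7)(c)
# Right-branching derivation: apply G7 to C, then apply Dm7 to the result.
right_branching = dm7(g7(c))

print(left_branching == right_branching)  # True
```

Both derivations produce `("leftonto", ("leftonto", (0, 0)))`, the right-branching embedding described above.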

The mnemonic Dom-tritone in table 3.1 identifies the tritone substitution of a dominant

function chord, introduced in section 2.5.1. The syntactic type and logical form are

identical to those that would have been assigned to the substituted chord (that rooted

on the tritone), interpreted as a dominant. In other words, this entry allows us to
interpret a chord D♭7 exactly as if it had been a G7 chord. Example 3.19 shows this

schema in action in the cadence from Can’t Help Lovin’ Dat Man, which we saw first

in example 3.16. Other schemata serve to interpret other substitutions or harmonic

functions. Dom-backdoor and Dom-bartok, for example, handle substitutions of dom-

inant chords described by Cork (1996) and Elliott (2009) (and derive their names from

the same source).

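(3.19)
    Gm7                             Cm7                             B7                                   B♭7                                       E♭6
    $G^D/C^{D|T}$ :                 $C^D/F^{D|T}$ :                 $F^D/B\flat^{D|T}$ :                 $B\flat^D/E\flat^{D|T}$ :                 $E\flat^T$ :
    $\lambda x.\mathrm{leftonto}(x)$  $\lambda x.\mathrm{leftonto}(x)$  $\lambda x.\mathrm{leftonto}(x)$       $\lambda x.\mathrm{leftonto}(x)$            $[\langle 0,1\rangle]$
                                                                                                         ------------------------------------------------------------ >
                                                                                                         $B\flat^D\text{–}E\flat^T : [\mathrm{leftonto}(\langle 0,1\rangle)]$
                                                                    ------------------------------------------------------------------------------------------------- >
                                                                    $F^D\text{–}E\flat^T : [\mathrm{leftonto}(\mathrm{leftonto}(\langle 0,1\rangle))]$
                                    --------------------------------------------------------------------------------------------------------------------------------- >
                                    $C^D\text{–}E\flat^T : [\mathrm{leftonto}(\mathrm{leftonto}(\mathrm{leftonto}(\langle 0,1\rangle)))]$
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------- >
    $G^D\text{–}E\flat^T : [\mathrm{leftonto}(\mathrm{leftonto}(\mathrm{leftonto}(\mathrm{leftonto}(\langle 0,1\rangle))))]$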

⁵ The logical forms feature both right-branching embedding (due to recursive application of
predicates) and left-branching (due to coordination). Dependency graphs, such as that in figure 3.5,
fully represent the same structures. A right-branching embedding is implicit in a sequence of chords
connected by arcs. That is,
$A7 \xrightarrow{\mathrm{leftonto}} Dm7 \xrightarrow{\mathrm{leftonto}} G7 \xrightarrow{\mathrm{leftonto}} C6$
represents the embedding (explicit in the logical form)
$A7 \xrightarrow{\mathrm{leftonto}} (Dm7 \xrightarrow{\mathrm{leftonto}} (G7 \xrightarrow{\mathrm{leftonto}} C6))$.

Figure 3.7: Harmonic analysis of the first two bars of Bach’s Ermuntre Dich, mein

schwacher Geist, simplified from that of Rohrmeier (2011, figure 3).

3.5 Key Structure

Neither the formal language of harmonic interpretation nor the syntactic grammar pre-

sented above provides a means of interpreting hierarchical relations between resolved

cadences or tonic passages. Each cadence is individually interpreted as a structure

made up of tension-resolution relationships and these structures are presented in se-

quence as an interpretation of an entire piece. Modulation and tonicization (a form of

short-term modulation) are permitted, since the grammar does not constrain the struc-

tures to remain in the same key throughout. However, this analysis overlooks one

aspect of the structure of harmony – the hierarchical relationships between these local

structures.

Rohrmeier (2011) captures these relationships in his harmonic grammar. He per-

mits a full recursive cadence structure to serve not only as an element in the global

sequence of phrases – rather like the top-level list structure of the logical forms pre-

sented here – but also as a functional region within another cadence structure. This

allows an entire local key structure to be interpreted, for example, as functioning as a

dominant chord with respect to a tonic at a higher level.

Consider the analysis of the opening of Bach’s Ermuntre Dich, mein schwacher

Geist given by Rohrmeier (2011, figure 3), presented in a simplified form in figure 3.7.

The non-terminals TR and DR represent tonic and dominant functional regions and


carry a feature denoting their key, written here as a subscript. The first two chords are

interpreted as a tonic passage in G major. The following chords, C D G, are treated as

a subdominant-dominant-tonic progression, still in G major. The jazz grammar of this

chapter does not include rules to handle this progression, reflecting a difference in the

intended domains of the two grammars. Where these five chords are combined, we see

the first example of a level of structural interpretation not analysed in the present thesis.
The tree⁶ maintains the hierarchical structure of the combination of tonic regions
$TR_G$ into higher-level $TR_G$s: the G and Em first, then this subtree with the tree of the

following three chords.

The final three chords, G A D, are again interpreted as a subdominant-dominant-

tonic progression, but this time in the key of D major. A unary expansion allows

this whole $TR_D$ tonic region to be treated as a dominant region $DR_G$ in a higher-level
structure. Rohrmeier’s grammar includes a rule $TR \rightarrow TR\ DR$, allowing a tonic region

to be followed by a dominant region. This leads to the full interpretation, in which

the initial tonic region is combined with the cadence in D major by the same rule that

would allow it to be followed by a single D major chord.

This example includes two types of structure ignored by the present thesis: the

hierarchical structure of tonic regions (as in the combination of $TR_G$s on the left side)
and the analysis of whole tonic regions as fulfilling harmonic function in another key
at a higher level (as in the $DR_G \rightarrow TR_D$ expansion). The first extension required to

permit the present grammar to produce these analyses would be to the formal language

of section 3.2. Syntactic rules or lexical entries could then be added to derive the

structures. The hierarchical structure of cadences could be interpreted, for example,

by modification of the development rule. The interpretation of resolved passages as

having a function in the key of an adjacent passage could, at least in some cases, be

lexicalized using the music theoretic concept of a pivot chord – a chord at the junction

of two passages that has one function in the first and another in the second. The concept

was used in Rohrmeier’s example, where the fifth and sixth chords are in fact a single

chord interpreted with respect to both the preceding and following trees.

Example 3.20 sketches a syntactic derivation that would support a hierarchical in-

terpretation of modulation. It demonstrates how a pivot chord can be used to bear the

lexical interpretation of the modulation structure. A category for a subdominant chord

preparing a dominant is assumed to exist. A full account along these lines would
require further consideration of the ways in which pivot chords can function and some
modifications to the existing grammar, which has not been designed with this level of
structure in mind.

⁶ Note that Rohrmeier’s trees, unlike CCG derivation trees, represent the harmonic semantics, so are
comparable to the logical forms of this thesis.

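(3.20)
    G            Em       C            D                G                            A                D
    $G^T/G^T$    $G^T$    $C^S/D^D$    $D^D/G^{D|T}$    $G^T/(A^D\text{–}D^T)$       $A^D/D^{D|T}$    $D^T$
    ------------------ >  ---------------------------- >B                            ---------------------- >
    $G^T$                 $C^S/G^{D|T}$                                              $A^D\text{–}D^T$
                          $C^S\text{–}G^T$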

CHAPTER 4

Building an Annotated Corpus

4.1 Introduction

This chapter describes the construction of a corpus of jazz chord sequences annotated

with grammatical analyses using the lexicon and combinatory rules presented in chap-

ter 3. It is common in NLP to use human-annotated datasets to train statistical models

of the parsing process and to use the statistics to guide and speed up the parsing pro-

cess. We shall see in the following chapters how some such techniques from NLP can

be adapted to the music parsing task. Annotated resources serve two purposes: training

the statistical models and testing them to find out how well they can automatically pro-

duce the human annotations. It is common to divide datasets into two parts for these

two purposes, often with a third development (or validation) division used as test data

to compare models during system development.

Whilst some annotated musical datasets exist, none was suitable for training the

current parsing models. Among the most commonly used is the Essen Folksong Col-

lection (Schaffrath & Huron, 1995), a corpus of melodies of European folksongs, with

phrase boundaries annotated, but without harmonic analyses. Temperley (2007) uses

the Kostka-Payne corpus containing metrical and chordal harmonic analyses for 46

short excerpts from tonal music from the common practice era, excerpts and analyses

due to Kostka & Payne (2004). The pieces included do not typically exhibit the sort of


hierarchical structures that provide the motivation for the grammatical formalism we

propose. Moreover, the harmonic annotations consist of chord roots only, so, where

such structure does exist, it is not made explicit in the annotations. Other authors, in-

cluding Kroger et al. (2008) and Sapp (2007), use corpora of music by J. S. Bach. As

with the Kostka-Payne corpus, annotations are only in the form of chord roots and lack

further explicit information about hierarchical harmonic structure.

In the absence of any suitable existing corpus, the models described in the follow-

ing chapters are trained on a new corpus of chord sequences annotated with harmonic

analyses. This chapter describes the new corpus of jazz chord sequences and the pro-

cess of annotating them with gold-standard harmonic analyses in a form suitable for

training statistical models in chapter 5. The corpus consists of 76 annotated sequences,

totalling roughly 3,000 chords.

It is worth noting that it is becoming increasingly common in NLP to make use of
unsupervised statistical methods, in which models are trained using unlabelled data,
without annotations of the output that the model should ideally produce, and of
semi-supervised methods, which use a mix of labelled and unlabelled data or data annotated
with only partial information about the intended output. It would have been premature
to explore the application of such learning strategies to harmonic analysis in
the present work, but the models used do have extensions to unsupervised or
semi-supervised scenarios and, in the case of the parsing models, this is an active field
of current research.

The dataset is available to download from:

http://jazzparser.granroth-wilding.co.uk/JazzCorpus.

4.2 Chord Sequence Data

4.2.1 Data Format

To train the statistical parsing models, I collected and annotated a small corpus of

jazz chord sequences. The sequences are taken from lead sheets transcribed for jazz

performers to play from.

Table 4.1 shows the data structure used to store each chord. Each chord sequence
is stored as a list of chords, along with the name of the song, the time signature and the
main key.¹ The roots of the chords are stored as equally tempered pitch classes: that is,
chords rooted on A♭ and G♯ are represented identically in the corpus. Correct
enharmonic spelling is informative to harmonic analysis (Rohrmeier & Graepel, 2012), but
is excluded from this dataset for two reasons. Firstly, the enharmonic spelling of chord
roots (and melodies) in jazz lead sheets is unreliable and is likely to be misleading
if used as a guide for harmonic analysis. Unlike on classical scores, spelling choices
are often made on the basis of ease of reading for performers, an issue almost entirely
orthogonal to decisions about tonality. Secondly, instead of being considered as a
constraint on harmonic analysis, enharmonic spelling is treated here as a problem that can
be solved using a harmonic analysis of pitch class-based input. The same approach is
taken by Rohrmeier (2011). This paves the way for extension of the automatic analysis
techniques to take input in the form of symbolic performance data (as demonstrated
in chapter 6) or audio recordings of performances, in which such enharmonic spelling
information is not available. All pitch classes in the chord data structure are given
relative to the key. This is purely for ease of annotation; none of the models in
chapter 5 relies on knowing the key of its input and all are invariant under transposition.
Some example chords, decomposed into their representation in the corpus, are given
in table 4.2.

¹ Time signature and key do not play a role in any of the models described in this thesis, but were
included in case they turn out to be of use in future.

Field                  Description
Root                   Integer pitch class of transcribed chord root (0, …, 11), relative to the main key
Type                   Main chord type (tetrad) symbol, chosen from a small vocabulary (e.g. ‘’, ‘m’, ‘M7’)
Additions (optional)   Any further transcribed tones added to the chord, but not included in the type symbol (e.g. ‘♭13’)
Bass (optional)        Pitch class of a bass note, if one is given in the chord symbol (e.g. the B in ‘CM7/B’)
Duration               Duration of the chord in beats

Table 4.1: Fields of the data structure used to store each chord in the jazz corpus.

Chord symbol   Root   Type   Additions   Bass   Duration
E♭M7           0      M7                        4
Dø7            11     %7                        2
Gaug7          4      aug7                      2
Cm7            9      m7                        4
F7             2      7                         4
B♭7            7      7                         2
Fm7/C          2      m7                 9      2
D♭m6           10     m      6                  2
B♭7/D          7      7                  11     2
E♭6            0             6                  4

Table 4.2: Example representation of a chord sequence in the jazz corpus, taken from
the beginning of All the Way.
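The fields of table 4.1 map naturally onto a small record type. The sketch below is illustrative only: the class and function names are my own, and the main key of E♭ is inferred from table 4.2, where E♭M7 has root 0.

```python
from dataclasses import dataclass
from typing import Optional

# Equally tempered pitch classes; enharmonic equivalents share one number.
PITCH_CLASSES = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
                 "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}


@dataclass
class Chord:
    root: int                   # pitch class relative to the main key
    type: str                   # main chord type symbol, e.g. "", "m", "M7"
    duration: int               # duration in beats
    additions: str = ""         # optional added tones, e.g. "6"
    bass: Optional[int] = None  # optional bass pitch class, relative to the key


def relative_pc(note, key):
    """Pitch class of a note relative to the main key."""
    return (PITCH_CLASSES[note] - PITCH_CLASSES[key]) % 12


def make_chord(root, chord_type, duration, key, additions="", bass=None):
    return Chord(
        root=relative_pc(root, key),
        type=chord_type,
        duration=duration,
        additions=additions,
        bass=None if bass is None else relative_pc(bass, key),
    )


half_dim = make_chord("D", "%7", 2, key="Eb")           # Dø7
slash = make_chord("F", "m7", 2, key="Eb", bass="C")    # Fm7/C
print(half_dim.root, slash.root, slash.bass)
```

The printed values, 11, 2 and 9, agree with the corresponding rows of table 4.2.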

4.2.2 Cross-Validation

Corpora for supervised training of statistical models are typically divided into two

parts: a training set, used to train the models, and a smaller test set, only ever used

at the end of training to report the models’ performance on unseen data. Often a third

division is made: a development set, used during the development of the models to test

them on data that was not in the training set without compromising the test set, which

is maintained strictly as unseen data until the final experiments. Due to its small size,

the jazz corpus contains no held-out test set or development set. Instead, models are
tested using cross-validation. Each experiment is run 10 times,² using 9/10 of the data
to train the model and the remaining 1/10 to evaluate the trained model. This means that

the full corpus is used for evaluation, but no model is tested on the same data that was

used to train it.
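The partitioning can be sketched in a few lines of Python. How the sequences are assigned to partitions is not specified here, so the interleaved split below is an assumption made purely for illustration, as is the function name.

```python
def cross_validation_folds(items, k=10):
    """Yield (train, test) pairs so that each item is tested exactly once."""
    folds = [items[i::k] for i in range(k)]  # k roughly equal partitions
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


sequences = list(range(76))  # stand-ins for the 76 annotated sequences
splits = list(cross_validation_folds(sequences))

for train, test in splits:
    # No model is ever evaluated on a sequence it was trained on.
    assert not set(train) & set(test)

print(len(splits), sorted(len(test) for _, test in splits))
```

With 76 sequences the ten test partitions contain seven or eight sequences each, and together they cover the whole corpus.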

It is important to stress that, whilst this reduces the problem of over-fitting models
to their training data, it is not a satisfactory alternative to using completely unseen test
data. Over-fitting still occurs, as models are generally tested repeatedly during
development and selected or parameterized in response to the results. For example, in the
experiments reported in chapter 5, parameters are chosen for the supertagging model
on the basis of the model’s supertagging accuracy, using cross-validation over the
corpus. The choice of parameters is optimized to this particular dataset, the same dataset
that must be used to evaluate the parser that uses the supertagger. The results from
cross-validation should be thought of as a preliminary evaluation on a development set.

² The number of divisions of the dataset may vary, but 10-fold cross-validation, used here, is a
common choice. Another commonly used strategy is leave-one-out cross-validation, in which the model
is tested on every entry in the dataset, each time training on the full dataset minus that entry.

Algorithm 2: gold_parse(seq) – gold-standard analysis by parsing annotations

     1  stack ← ∅
     2  while length(seq) > 0 or length(stack) > 1 do
     3      switch stack do
     4          case [A/B, C/D, …]          reduce(stack, >B)
     5          case [A\B, C\D, …]          reduce(stack, <B)
     6          case [A, B/C, …]            reduce(stack, >)
     7          case [A\B, C, …]            reduce(stack, <)
     8          case [A, B, …]              reduce(stack, dev)
     9          case [⋈, A/B, ◁, C/D, …]    reduce(stack, &)
    10          otherwise                   shift(stack, seq)

4.3 Annotating a Unique Gold-Standard Interpretation

Every chord is annotated with a choice of category from the lexicon of the jazz gram-

mar, identified by its mnemonic label (see table 3.1). Since CCG is a strongly lexical-

ized grammar formalism and we only use a small set of combinatory rules, the category

annotations provide almost enough information to construct a unique logical form for

each sequence, a fact that is exploited by supertagging in chapter 5. The additional

annotation of the points where coordination occurs is sufficient to determine a unique

tonal space analysis of every sequence.

Algorithm 2 specifies a procedure by which a logical form in the form given in

the previous chapter can be produced for a valid set of annotations. This procedure

is crucial to the annotation strategy adopted here. It is the existence of the algorithm

that means it is sufficient to annotate a choice of lexical category for each chord and

the locations of coordination in order to achieve a full annotation of a gold-standard

tonal space path for each sequence. The algorithm can be used in practice to produce

the logical form and, by extension, the tonal space path encoded in the annotations.
During the process of manual annotation, the algorithm can be used as a sanity-checker
for the annotations: a failure of the algorithm to find a full parse or a failure of any of
the rule applications carried out in each of the reductions indicates that there is an error
in annotation.³

The algorithm produces a single gold-standard analysis of any chord sequence fully
annotated with a choice of lexical schema for each chord and with markers on the chords
that occur at the end of the first constituent of a coordination and on those at the end of
the second. It

defines deterministic rules for a shift-reduce parser (Kozen, 1997, pp. 181–190) based

on the form of each category (atomic, backward slash or forward slash) and whether

the next input chord is marked as the end of a coordination constituent. The parser

works left to right and greedily performs forward function composition wherever pos-

sible, applying the result to an atomic category when it is reached. This does not rule

out any possible correct results, since any result that can be produced by function ap-

plication alone can also be produced using function composition first, then function

application. Furthermore, the result of coordinating two constituents and then com-

posing with a preceding category (an order which would be excluded by the current

strategy) produces a result identical to that produced by first composing the category

with the first constituent and then performing coordination. This can be seen in ex-

ample 4.1. The logical forms are omitted, but are also identical, due to the reduction

defined in section 3.2.4.

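(4.1)
    A7               Dm7              A♭7
    $A^D/D^{D|T}$    $D^D/G^{D|T}$    $D^D/G^{D|T}$
    ------------------------------
    $A^D/G^{D|T}$
    ----------------------------------------------- &
    $A^D/G^{D|T}$

    A7               Dm7              A♭7
    $A^D/D^{D|T}$    $D^D/G^{D|T}$    $D^D/G^{D|T}$
                     ------------------------------ &
                     $D^D/G^{D|T}$
    -----------------------------------------------
    $A^D/G^{D|T}$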

The algorithm maintains a stack of categories and iterates, at each step either reduc-

ing the categories on the top of the stack (the left end) or adding the next category from

the annotations onto the stack. The reduction applies one of the rules from section 3.4.1

(identified by its symbol) and replaces the categories on the stack with the single result-

ing category. It does not need to check the syntactic types of the categories other than to

determine whether each is a backward slash category, forward slash category or atomic

category. The function reduce() combines the top two categories using a grammatical
rule. The algorithm assumes that this application will succeed without performing
the necessary checks on the syntactic types: failure implies an error in the annotations
which would prevent a full derivation. In addition to the categories themselves, the
annotations include the special symbols ⋈, after each category marked by the annotator
as the end of the first constituent of a coordination, and ◁, after each marked as the end
of the second constituent. The markings are included in figure 4.1, showing how they
ensure that the algorithm performs derivation (a). A category marked as being both (as
happens where more than two constituents are coordinated) is followed by first ◁ and
then ⋈.

³ Success of the algorithm does not, of course, guarantee that the analysis found is the one that was
intended by the annotator, but at least ensures that the corpus contains no nonsensical annotations.

(a)
    G7 ⋈             C7               F7                    …    G7 ◁             C
    $G^D/C^{D|T}$    $C^D/F^{D|T}$    $F^D/B\flat^{D|T}$         $G^D/C^{D|T}$    $C^T$
                     ----------------------------------------------------------
                     $C^D/C^{D|T}$
    --------------------------------------------------------------------------- &
    $G^D/C^{D|T}$
    ------------------------------------------------------------------------------------
    $G^D\text{–}C^T$

(b)
    G7 ⋈             C7               F7                    …    G7 ◁             C
    $G^D/C^{D|T}$    $C^D/F^{D|T}$    $F^D/B\flat^{D|T}$         $G^D/C^{D|T}$    $C^T$
    ------------------------------ >B ×
    $G^D/F^{D|T}$
    ---------------------------------------------------------------------------
    $G^D/C^{D|T}$
    ------------------------------------------------------------------------------------
    $G^D\text{–}C^T$

Figure 4.1: An example that demonstrates the necessity of annotations to mark the
midpoint between two constituents that should be coordinated. The checks on the
form of the syntactic types in algorithm 2 in the absence of the ⋈ and ◁ annotations
would lead to the composition of the first two cadential chords, as in (b), instead of the
intended interpretation, given by (a). (The two result in the same syntactic type, but a
different semantics.) In this (rather extreme) case, even checking the syntactic types
themselves would not prevent composition. The ⋈ ensures that the composition made
in (b) is forbidden.

Lines 4–9 define the conditions under which the categories on the top of the stack
are reduced. Although line 4 performs the composition of C/D with A/B,⁴ it does not
check that A=D. Any time categories of this form are seen adjacent to one another, the
only combinatory rule that can combine them is forward composition. The ordering
of the argument conditions ensures that all possible reductions are performed on the
categories on the stack before a new category is shifted. Finally, if no further reductions
can be applied and the input is not exhausted, line 10 shifts the next input category onto
the stack.

⁴ This might seem back to front, but A/B is the category most recently added to the stack.

The algorithm terminates once all of the categories in the annotations have been

shifted onto the stack and the stack has been reduced to a single category. If there

are no more categories to be shifted, but all reductions are applied and more than one

category remains on the stack, the shift() in line 10 will fail, indicating that there is

an error in the annotations. The algorithm is deterministic and produces exactly one

analysis (by one derivation). This means that, given soundness and completeness of

the algorithm, any valid set of annotations corresponds to a unique harmonic analysis.
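The control structure of algorithm 2 can be sketched in Python as below. This is a simplified illustration, not the implementation used to build the corpus: each category is reduced to its form (forward slash, backward slash or atomic), reduce() merely records which rule was applied instead of computing types and logical forms, the top of the stack is the end of a Python list, and the coordination case is written on the assumption that the end-of-second-constituent marker is the item shifted last. All names are hypothetical.

```python
from dataclasses import dataclass

MID, END = "⋈", "◁"  # markers: end of first / end of second coordinated constituent

# Form of the category produced by each rule.
RESULT_KIND = {">B": "fwd", "<B": "bwd", ">": "atom", "<": "atom", "dev": "atom", "&": "fwd"}


@dataclass
class Item:
    kind: str   # "fwd" (forward slash), "bwd" (backward slash) or "atom"
    deriv: str  # bracketed record of the derivation built so far


def kind(entry):
    """Form of a stack entry, or None for a coordination marker."""
    return entry.kind if isinstance(entry, Item) else None


def gold_parse(seq):
    seq = list(seq)
    stack = []  # NB: the top of the stack is the END of this list

    def reduce(rule, left, right, n_pop):
        # Stand-in for reduce(): only the form of the result is tracked.
        del stack[-n_pop:]
        stack.append(Item(RESULT_KIND[rule], f"({left.deriv} {rule} {right.deriv})"))

    while seq or len(stack) > 1:
        top = stack[-1] if stack else None
        below = stack[-2] if len(stack) > 1 else None
        forms = (kind(below), kind(top))
        if forms == ("fwd", "fwd"):
            reduce(">B", below, top, 2)
        elif forms == ("bwd", "bwd"):
            reduce("<B", below, top, 2)
        elif forms == ("fwd", "atom"):
            reduce(">", below, top, 2)
        elif forms == ("atom", "bwd"):
            reduce("<", below, top, 2)
        elif forms == ("atom", "atom"):
            reduce("dev", below, top, 2)
        elif (len(stack) >= 4 and top == END and stack[-3] == MID
              and kind(stack[-2]) == "fwd" and kind(stack[-4]) == "fwd"):
            reduce("&", stack[-4], stack[-2], 4)
        elif seq:
            stack.append(seq.pop(0))  # shift
        else:
            raise ValueError("annotations admit no full derivation")
    if not stack:
        raise ValueError("empty input")
    return stack[0]


# The chord sequence of figure 4.1, without the elided chords.
marked = [Item("fwd", "G7"), MID, Item("fwd", "C7"), Item("fwd", "F7"),
          Item("fwd", "G7"), END, Item("atom", "C")]
unmarked = [entry for entry in marked if isinstance(entry, Item)]

print(gold_parse(marked).deriv)    # ((G7 & ((C7 >B F7) >B G7)) > C)
print(gold_parse(unmarked).deriv)  # ((((G7 >B C7) >B F7) >B G7) > C)
```

The two outputs correspond to derivations (a) and (b) of figure 4.1: with the markers present, the first chord is held back for coordination, whereas without them the greedy composition absorbs it immediately. An input that leaves more than one category on the stack raises an error, mirroring the failure of shift() described above.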

A formal proof of soundness and completeness of the algorithm will not be pre-

sented in full here, but could be constructed according to the following form. Proving

soundness involves showing that, if the algorithm produces an analysis from a pairing

of a chord sequence and its annotations, that analysis must be a valid interpretation

permitted by the annotations, and that it is unique. This part of the proof is trivial.

Each of the reductions corresponds to the combination of two adjacent categories by

a grammatical rule and succeeds only if that rule application is permissible under the

grammar. Line 9 is the only reduction that processes the coordination annotations and

it ensures that the coordination rule is used on the constituents they delimit. Since the

algorithm is deterministic, allowing only one derivation, any successful termination

must yield a single legal grammatical interpretation.

Proving completeness involves showing that, if a set of annotations is valid – that is,

if there exists some valid derivation from its lexical categories obeying the constraints

on coordination – the algorithm will find a derivation of the full sequence. A proof

can be constructed by considering in turn each case in which the algorithm could fail,

either by attempting to use an inapplicable rule or by leaving more than one item on

the stack once the entire input is consumed, and showing that there must have been an

error in the annotations – that is, that there is no possible derivation that observes the

constraints of the annotations. For example, consider the first reduction, on line 4:

case [A/B, C/D, . . .] reduce(stack, >B)

This stack state represents the following state of the derivation tree:

    $x_0 \cdots x_i$      $x_{i+1} \cdots x_j$      $x_{j+1} \cdots x_k$      $x_{k+1} \cdots x_{N-1}$
    $\cdots$              $C/D$                     $A/B$

The algorithm will fail if C/D and A/B cannot be combined by the forward com-

position rule. In proving completeness, we must show that, in any case where this

happens, there exists no possible derivation permitted by the annotations. In the gram-

mar of the previous chapter, a category A/B can only either be a lexical category or

have been derived by a combination of compositions and coordinations. If it is not a

lexical category, the leftmost lexical category from which it was derived must have
been a category A/E. Since there is no ◁ or ⋈ between $x_j$ and $x_{j+1}$, when A/E was

first shifted onto the stack the derivation tree would have looked as follows and the

stack would have been subjected to the same reduction of line 4:

x_0 ... x_i | x_{i+1} ... x_j | x_{j+1} | x_{j+2} ... x_{N-1}
    ...     |       C/D       |   A/E   |

The algorithm would have either composed C/D with A/E or failed and in neither case

could it have reached the current state. A/B must, therefore, be a lexical category –

that is, j+1 = k. Any derivation that could be produced from these annotations must

at some stage combine A/B with an adjacent category, either to its left or its right. If

this can be shown not to be possible, there can be no full derivation.

Since C/D has failed to compose with A/B, we know that D cannot unify with A.

Furthermore, x_j must have had a lexical category F/D. It is a property of the grammar
that any combination of F/D with categories to its left will produce a category that
ends at x_j and has a forward slash with argument type D. But D and A do not unify, so

A/B can never combine with a category to its left.

Any combination of A/B with categories to its right will produce either a category

A/G (if it is combined by composition and/or coordination) or a category A–H (by

any combination of forward application and other rules). In any case, the resulting

category can never combine with F/D, since the combination would depend on the

unification of D and A. We must conclude, then, that in any case where line 4 fails,

no derivation is possible. Similar arguments on each of the other reductions lead to

the conclusion that any failure of the algorithm to produce a derivation implies that no

derivation exists, thus proving completeness.
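The unification condition on which this argument turns can be illustrated with a minimal Python sketch. It assumes a drastically simplified representation in which a forward-slash category X/Y is just a pair of root names and unification is plain equality; the names `Category` and `compose_forward` are hypothetical and are not taken from the grammar's implementation.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    """A forward-slash category result/argument, e.g. C/D (simplified assumption)."""
    result: str
    argument: str

    def __str__(self) -> str:
        return f"{self.result}/{self.argument}"


def compose_forward(left: Category, right: Category) -> Optional[Category]:
    """Forward composition (>B): X/Y  Y/Z  =>  X/Z.

    Unification is modelled as equality of root names, an assumption made
    purely for illustration. Returns None when the argument of the left
    category does not match the result of the right category.
    """
    if left.argument != right.result:
        return None
    return Category(left.result, right.argument)


# E/A composes with A/D because the argument A matches the result A.
print(compose_forward(Category("E", "A"), Category("A", "D")))  # E/D
# C/D cannot compose with A/B: D does not unify with A (the failure case above).
print(compose_forward(Category("C", "D"), Category("A", "B")))  # None
```

Under these assumptions, the failure of line 4 corresponds exactly to `compose_forward` returning `None`.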

Example 4.2 is a cadence from Alfie, using the annotations from the corpus. The

annotations include the markers of points of coordination ⋈ and ◁. The series of shift

and reduce operations carried out by the algorithm on this input is shown in table 4.3.

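(4.2)
Chord    Category        Marker
CM7      C^T
Dm7      C^T\C^T
Em7      E^D/A^{D|T}
A7       A^D/D^{D|T}
Dm7      D^D/G^{D|T}
G7       G^D/C^{D|T}     ⋈
Em7      E^D/A^{D|T}
A7       A^D/D^{D|T}
Dm7      D^D/G^{D|T}     ⋈
E♭°7     A^D/D^{D|T}
Dm7      D^D/G^{D|T}     ◁
Gaug7    G^D/C^{D|T}     ◁
CM7      C^T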

4.4 Annotation Procedure

I annotated each chord sequence in the corpus with a single analysis. Each chord

received a single lexical schema, chosen from the lexicon of table 3.1, and any chord

might be marked as the end of a non-final coordination constituent, the end of a final

coordination constituent, or both. Once a sequence was fully annotated, it was parsed

using the annotations according to algorithm 2 by way of a sanity check to detect errors

preventing a normal-form derivation. Furthermore, the result of a successful parse was

examined to ensure that the interpretation was the one intended.

Ultimately, a decision about a chord’s function or a tension-resolution dependency

must be made on the basis of intuition. Conceivably, some of these decisions could

be made (or at least constrained) on the basis of enharmonic pitch spelling of score

notation or chord roots, reflecting the reading of the composer or transcriber. Two

problems stand in the way of this. Firstly, most chord root motion is by cadences,

along the horizontal dimension of the tonal space. As a result, the most common tonal

ambiguity is between notes separated by the syntonic comma – (4,−1) in the tonal

space – which is not distinguished by pitch spelling. For example, if we were able to

distinguish a dominant seventh tone from a minor seventh tone of a chord we would

know whether the chord has a dominant function. However, pitch spelling leaves us

in the dark as far as this distinction is concerned. Secondly, jazz lead sheets prioritize

ease of reading for performers over music theory. Consequently, even where pitch

spelling in chord roots or melodies appears to give clues as to functional analysis, it

cannot be relied on. The annotation procedure, then, must rely on a lengthy process

of repeatedly listening to recordings, playing from the lead sheets and experimenting

with substitutions and other harmonic modifications that are dependent on particular

interpretations.

Step  Operation   Stack
 1.   Shift       C^T
 2.   Shift       C^T, C^T\C^T
 3.   Reduce <    C^T
 4.   Shift       C^T, E^D/A^{D|T}
 5.   Shift       C^T, E^D/A^{D|T}, A^D/D^{D|T}
 6.   Reduce >B   C^T, E^D/D^{D|T}
 7.   Shift       C^T, E^D/D^{D|T}, D^D/G^{D|T}
 8.   Reduce >B   C^T, E^D/G^{D|T}
 9.   Shift       C^T, E^D/G^{D|T}, G^D/C^{D|T}
10.   Reduce >B   C^T, E^D/C^{D|T}
11.   Shift       C^T, E^D/C^{D|T}, ⋈
12.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/A^{D|T}
13.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/A^{D|T}, A^D/D^{D|T}
14.   Reduce >B   C^T, E^D/C^{D|T}, ⋈, E^D/D^{D|T}
15.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/D^{D|T}, D^D/G^{D|T}
16.   Reduce >B   C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}
17.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, ⋈
18.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, ⋈, A^D/D^{D|T}
19.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, ⋈, A^D/D^{D|T}, D^D/G^{D|T}
20.   Reduce >B   C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, ⋈, A^D/G^{D|T}
21.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, ⋈, A^D/G^{D|T}, ◁
22.   Reduce &    C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}
23.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/G^{D|T}, G^D/C^{D|T}
24.   Reduce >B   C^T, E^D/C^{D|T}, ⋈, E^D/C^{D|T}
25.   Shift       C^T, E^D/C^{D|T}, ⋈, E^D/C^{D|T}, ◁
26.   Reduce &    C^T, E^D/C^{D|T}
27.   Shift       C^T, E^D/C^{D|T}, C^T
28.   Reduce >    C^T, E^D–C^T
29.   Reduce dev  C^T

Table 4.3: The series of shift and reduce operations carried out by the algorithm on a
cadence from Alfie.
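The sequence of operations in table 4.3 can be replayed with a short Python sketch. This is not algorithm 2 itself: it is a simplified shift-reduce loop whose rules are inferred from the rows of the table, under several assumptions. Function superscripts (T, D, D|T) are dropped so that categories carry root names only, unification is plain equality, the non-final and final coordination markers are written ⋈ and ◁, and the sketch simply shifts when no rule applies rather than failing early. All names (`Cat`, `reduce_once`, `parse`) are hypothetical.

```python
from typing import List, NamedTuple, Optional, Union

MID, END = "⋈", "◁"  # non-final / final coordination markers (assumed rendering)
SLASH = {"fwd": "/", "bwd": "\\"}


class Cat(NamedTuple):
    kind: str   # "fwd" for X/Y, "bwd" for X\Y, "atom" for a span X-Y
    left: str
    right: str

    def __str__(self) -> str:
        if self.kind == "atom":
            return self.left if self.left == self.right else f"{self.left}-{self.right}"
        return f"{self.left}{SLASH[self.kind]}{self.right}"


Item = Union[Cat, str]


def reduce_once(stack: List[Item]) -> Optional[str]:
    """Apply one reduction to the top of the stack; return the rule name or None."""
    # Coordination (&): X/Z MID W/Z END  =>  X/Z, as in steps 22 and 26.
    if len(stack) >= 4 and stack[-1] == END and stack[-3] == MID:
        a, b = stack[-4], stack[-2]
        if (isinstance(a, Cat) and isinstance(b, Cat)
                and a.kind == b.kind == "fwd" and a.right == b.right):
            stack[-4:] = [a]
            return "&"
    if len(stack) < 2:
        return None
    a, b = stack[-2], stack[-1]
    if not (isinstance(a, Cat) and isinstance(b, Cat)):
        return None  # a marker is on top: nothing to combine yet
    kinds = (a.kind, b.kind)
    if kinds == ("fwd", "fwd") and a.right == b.left:      # X/Y Y/Z => X/Z
        result, rule = Cat("fwd", a.left, b.right), ">B"
    elif kinds == ("fwd", "atom") and a.right == b.left:   # X/Y Y => X-Y
        result, rule = Cat("atom", a.left, b.right), ">"
    elif kinds == ("atom", "bwd") and a.right == b.right:  # Y X\Y => X
        result, rule = Cat("atom", a.left, b.left), "<"
    elif kinds == ("atom", "atom"):                        # development
        result, rule = Cat("atom", a.left, b.right), "dev"
    else:
        return None
    stack[-2:] = [result]
    return rule


def parse(tokens: List[Item]) -> List[str]:
    """Shift each token, reducing greedily after every shift; return the log."""
    stack: List[Item] = []
    log: List[str] = []
    for token in tokens:
        stack.append(token)
        log.append("Shift       " + ", ".join(str(i) for i in stack))
        rule = reduce_once(stack)
        while rule is not None:
            log.append(f"Reduce {rule:<4} " + ", ".join(str(i) for i in stack))
            rule = reduce_once(stack)
    if len(stack) != 1:
        raise ValueError("annotations do not yield a single derivation")
    return log


def fwd(x: str, y: str) -> Cat:
    return Cat("fwd", x, y)


C = Cat("atom", "C", "C")
alfie = [C, Cat("bwd", "C", "C"),
         fwd("E", "A"), fwd("A", "D"), fwd("D", "G"), fwd("G", "C"), MID,
         fwd("E", "A"), fwd("A", "D"), fwd("D", "G"), MID,
         fwd("A", "D"), fwd("D", "G"), END,
         fwd("G", "C"), END,
         C]
steps = parse(alfie)
for number, step in enumerate(steps, start=1):
    print(f"{number:>2}. {step}")
```

On this input the sketch logs 29 operations, 17 shifts and 12 reductions, in the same order as the table. It makes no attempt to reproduce the early-failure behaviour discussed in the completeness argument.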

Harmonic analysis is both somewhat subjective and somewhat ambiguous: of the

many conceivable analyses of a particular piece, some people will hear one whilst oth-

ers another; and there are circumstances where multiple possible analyses may be con-

sidered simultaneously plausible. (Indeed, this is frequently exploited by composers

to disorient listeners.) These are two distinct difficulties for annotation and must be

considered separately.

The problem of subjectivity gives rise to two specific annotation problems: agree-

ment between multiple listeners (annotators) and consistency between analyses pro-

duced by the same listener on different hearings. The first, annotator agreement, can

be measured by involving multiple annotators, each being given a set of inputs to an-

notate such that each input is annotated by several annotators. The second, annotator

consistency, can be measured by asking the same annotator to annotate the same input

again a suitable length of time after the first occasion and measuring the agreement

between the two analyses. Both procedures require a large amount of additional time

and expense, measurement of annotator agreement in particular, since it involves train-

ing multiple annotators. For this reason, I have not attempted to measure annotator

agreement in this work. We must bear in mind, therefore, that the models presented

in chapter 5 are being trained to reproduce the analyses of one individual listener and

that others’ hearings might differ. Measuring annotator consistency, on the other hand,

is a simpler matter. An experiment to judge the annotation consistency is described in

section 4.5.1.
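As a purely illustrative sketch of what measuring agreement between two annotation passes might look like, the Python fragment below computes the proportion of chords that received identical annotations on both occasions. This is an assumed, deliberately simple measure and is not necessarily the one used in the experiment of section 4.5.1; the encoding of an annotation as a (schema name, coordination marker) pair and the function name `consistency` are hypothetical.

```python
from typing import Sequence, Tuple

# (lexical schema name, coordination marker or "") for one chord; hypothetical encoding.
Annotation = Tuple[str, str]


def consistency(first: Sequence[Annotation], second: Sequence[Annotation]) -> float:
    """Proportion of chords given identical annotations on two occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("both passes must annotate the same non-empty sequence")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)


# Two invented passes over a four-chord fragment, differing on the second chord.
first_pass = [("Ton", ""), ("Colour-IIb", ""), ("Dom", ""), ("Dom", "")]
second_pass = [("Ton", ""), ("Dom", "non-final"), ("Dom", ""), ("Dom", "")]
print(consistency(first_pass, second_pass))  # 0.75
```

A per-chord measure of this kind treats a disagreement on a schema and a disagreement on a coordination marker alike; a finer-grained comparison would score them separately.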

The ambiguity of harmonic analysis is elegantly modelled by the present grammat-

ical approach. Ambiguities of interpretation can be reflected in the choice of lexical

categories and some of the choices of rule application during the derivation process.

For example, different lexical categories interpret a chord as having a dominant or sub-

dominant function, or as a dominant function chord subjected to a tritone substitution.

Decisions about where cadences are left unresolved and which chord eventually supplies
their resolution are made by choosing the order in which to apply the combinatory

rules, in particular when the coordination rule is used. The parser described in chap-

ter 5 uses an algorithm that maintains multiple possible interpretations of the chord

sequence at any particular point in the parsing process. A probabilistic model asso-

ciates weights with the alternative interpretations. In this way, a notion of ambiguity is

built into the interpretative mechanism which, given a suitable probabilistic model, em-

ulates the ambiguity of human interpretation. A similar approach to modelling musical

ambiguity is suggested by Jackendoff (1991). When annotating gold-standard derivations,
however, for practical reasons, only one analysis of any particular sequence is
included in the corpus⁵. Many ambiguities that exist when only a part of a chord
sequence has been heard are later resolved and it is the resulting final resolution that the

annotations represent. In order to produce the same interpretation, a parser will often

have to allow for local ambiguities – multiple interpretations of a part of the sequence

– of which many will later turn out to be incompatible with other parts of the sequence.

Much of the time, of the remaining ambiguities in a final interpretation, one may be

selected as preferred, as in the Autumn Leaves example of section 2.6.5.

4.4.1 Analysis Decisions

In the process of annotating a chord sequence the annotator must choose a particular

analysis of the harmonic structure by deciding the function of each chord, the substitu-

tion if one is used and the role that the chord plays in the harmonic structure. Certain

principles can be established to help resolve difficult cases of ambiguity, introduced

below by means of an example. It is important to be aware that the annotation deci-

sions discussed below are the conclusion of a process of playing and listening to the

music in question. Some decisions require extensive consideration and experimenta-

tion, whilst others barely warrant discussion. That is not to say that the interpretation

falls unambiguously out of a simple algorithmic analysis, but merely that the intuitions

of a listener familiar with the style of music lead to a strong preference for a particular

interpretation.

Let us consider again example 4.2, repeated here in example 4.3 with the names of

lexical entries used for each chord, chosen from the lexicon of table 3.1.

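(4.3)
Chord    Lexical entry   Category        Marker
CM7      Ton             C^T
Dm7      Colour-IIb      C^T\C^T
Em7      Dom             E^D/A^{D|T}
A7       Dom             A^D/D^{D|T}
Dm7      Dom             D^D/G^{D|T}
G7       Dom             G^D/C^{D|T}     ⋈
Em7      Dom             E^D/A^{D|T}
A7       Dom             A^D/D^{D|T}
Dm7      Dom             D^D/G^{D|T}     ⋈
E♭°7     Dom-tritone     A^D/D^{D|T}
Dm7      Dom             D^D/G^{D|T}     ◁
Gaug7    Dom             G^D/C^{D|T}     ◁
CM7      Ton             C^T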

5 Marcus et al. (1993) describe the inclusion of a small amount of ambiguity in the annotation of a linguistic treebank corpus, motivated by a wish to avoid forcing annotators to make arbitrary decisions in cases of genuine linguistic ambiguity. The ambiguity is limited to permitting multiple part-of-speech tags for individual words and specific attachment ambiguities in syntactic trees, in which one attachment is chosen as preferred.

Much of the annotation here is uncontroversial, given the treatment of extended ca-

dence structures introduced in chapter 2 and the corresponding grammar of chapter 3.

The first and last chords are tonics of the main key of the piece, so are interpreted using

the Ton schema. Most of the remaining chords are dominant function chords without

substitution, interpreted using the Dom schema, which covers not only a primitive

dominant function but also the extended, recursive function. Their function is strongly

signalled by the addition of the dominant seventh note in each case and the sequence

of downward fifths between roots leads to a clear extended dominant function in each

of the cadential constituents – [Em7 A7 Dm7 G7], [Em7 A7 Dm7] and [Dm7 Gaug7].

The E♭°7 prepares the Dm7 and serves the same harmonic function as an A7 would.
The use of the Dom-tritone schema may seem odd, since this is a much weaker form
of substitution than a tritone substitution: it is in fact merely an inversion of an A°7 chord
and is only written with an E♭ root for the performer’s convenience. The grammar’s

lexicon, however, does not include a separate schema for each of the three inversions

of a diminished seventh chord with this interpretation. If schemata were added for

these, they would have both identical syntactic types and identical logical forms to the

schemata for dominant substitutions (Dom-backdoor, Dom-tritone and Dom-bartok).

The inverted A°7 chord, then, may be interpreted using the Dom-tritone schema.

The addition of annotations to mark points of coordination forces the derived structure
to treat the G7 chord’s resolution as delayed until the final tonic (thanks to the
◁ preceding the tonic) and the penultimate Dm7’s resolution as momentarily delayed
until the Gaug7.

There remains now one major ambiguity. The Dm7 after the first chord could be

interpreted in two quite different ways. It could be treated as an extended dominant,

using the Dom category, followed by a ⋈ to delay its resolution until after [Em7 A7

Dm7]. Alternatively, it can be treated (as in these annotations) as a substitute for an F

chord, a small excursion to the subdominant prior to the cadence, comparable to the

IV chord in the common I IV I elaboration of a tonic. In this case, it is no easy matter

to decide between the two quite plausible analyses and the decision to annotate the

latter should not be taken to claim that the other is wrong. In fact, the decision was

made after a lengthy process of playing the chords on a piano with the tune, trying a

variety of alternative substitutions that would be permitted by each interpretation. The

eventual result was a weak preference for the reading given. However, cases with this

degree of difficulty in annotator interpretation are relatively rare, the majority of the

material being closer to the remainder of this example.
