n9351833 krishna nikhil sumanth behara thesis · 2019. 9. 3. · ORIGIN-DESTINATION MATRIX...
ORIGIN-DESTINATION MATRIX
ESTIMATION USING BIG TRAFFIC DATA:
A STRUCTURAL PERSPECTIVE
Krishna Nikhil Sumanth Behara
Master of Civil (Transportation) Engineering, Birla Institute of Technology and Science (BITS), Pilani, India, 2012
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy (PhD)
School of Civil Engineering and Built Environment
Science and Engineering Faculty
Queensland University of Technology
2019
Origin-Destination Matrix Estimation using Big Traffic Data: A Structural Perspective i
Keywords
Bi-level; Bluetooth; subpaths; Brisbane city; BSTM; clustering OD matrices;
DBSCAN; gradient descent; Mean geographical window based SSIM (GSSI); Mean
Levenshtein distance for OD matrices (NLOD); non-assignment-based; local sliding
window; origin destination (OD) matrix; OD matrix estimation; OD matrix structure;
single-level formulation; statistical performance measures; structural correlations;
structural consistency; structure of trips; structural proximity measures; structural
similarity index (SSIM); subspace analysis; turning proportions; typical OD matrix;
travel pattern; structural comparison of OD matrices; trajectories; optimum parameters
of DBSCAN; under-determinacy problem.
Abstract
Origin-destination (OD) matrices and the knowledge of travel patterns are key
inputs into most transport models aimed at both long-term strategic planning and
short-term traffic control and management. OD matrices are not mere
representations of individual OD flows. The distribution of OD flows between
different OD pairs indicates the inherent structural information of the OD matrix that
cannot be neglected while comparing OD matrices. The structural knowledge of OD
matrices aids in understanding and analysing travel demand patterns.
An OD matrix is generally unobserved; thus, its estimation is often formulated as
an optimisation problem. However, optimisation models generally depend on
point-based traffic counts (from loop detectors) to update the outdated prior OD
matrix, and lack the ability to describe the distribution of trips (or
“structure” of travel patterns) across the network. To maintain structural consistency
during OD estimation, existing methods generally impose survey-based constraints,
such as trip productions/attractions or ratios of OD flows, or augment the
objective function with deviations from the target OD matrix.
However, the major drawback is that these constraints and formulations rely on travel
surveys that are generally outdated. The most popular statistical measures, used either
for the general comparison of OD matrices or for comparing the quality of OD matrices
estimated by different optimisation algorithms, depend on individual cell-based
statistics and fail to account for the inherent structural information of OD matrices.
With advancements in technology, there is growing interest in exploiting big
traffic data sources, such as Bluetooth, in travel demand modelling. However,
knowledge about travel demand obtained from these data sources may not reveal a
detailed demographic and contextual picture of commuter trips. For instance,
Bluetooth data captures only a fraction of the actual demand, providing incomplete
information about trips, and most importantly, the penetration rate of Bluetooth trips
remains unknown due to the unavailability of ground truth. Nevertheless, it provides
higher spatial and temporal resolution than travel surveys. Thus, despite their
abundant availability, more effort is required to integrate advanced data sources, such
as Bluetooth, into mainstream traffic modelling.
To this end, the research emphasises the importance of the structural
knowledge of travel demand (either OD or path flows) and makes four major
contributions. First, it develops two statistical metrics, the Mean Geographical window based
Structural Similarity Index (GSSI) and the Mean Normalised Levenshtein Distance for
OD matrices (NLOD), for the structural comparison of OD matrices. Compared to
traditional SSIM, the GSSI technique is computationally efficient, can capture local
travel patterns and preserves geographical integrity. NLOD is a novel approach to
capture the “structural” information of OD matrices through the preference of
destinations and distribution of origin flows. It is an optimisation-based metric and is
computationally more efficient than another popular metric, the Wasserstein distance. The
sensitivity analysis performed on both metrics showed that they are robust.
Second, the study enhances the bi-level formulation by integrating structural
knowledge of Bluetooth trips (in terms of Bluetooth subpath flows) into the existing
objective function without the need to know their penetration rates. Third, the study
develops a novel non-assignment-based approach to estimate OD matrices from
observed turning proportions and structural knowledge of Bluetooth trips. This
single-level formulation does not depend on simulation-based assignment and is thus
computationally faster than the bi-level OD estimation method. Finally, the fourth
contribution is a methodological framework (three-level approach)
to cluster multi-density OD matrices using the DBSCAN algorithm. It highlights the
importance of accounting for the structural information of OD matrices in the proximity
measures of clustering algorithms. The methodology is tested in a real case study
that identifies typical travel patterns of the Brisbane City Council (BCC)
region.
Although the proposed methods were tested using Bluetooth data and
demonstrated using the BCC case study, they are generic in nature and suitable for any
other emerging data sources that can provide similar types of measurements over any
other study network.
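As a rough illustration of the window-based structural comparison summarised above, the following is a minimal sketch that averages the standard SSIM of Wang et al. (2004) over region-to-region blocks of two OD matrices. It is not the thesis's GSSI definition, which is developed in Chapter 3. The function names `ssim` and `mean_window_ssim`, the `zone_region` grouping, the stabilising constants and the sample flows are all assumptions made for illustration.

```python
import numpy as np


def ssim(x, y, c1=1e-10, c2=1e-10):
    """Standard SSIM between two equally shaped arrays (c1, c2 are assumed constants)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_window_ssim(od_a, od_b, zone_region):
    """Mean SSIM over blocks formed by (origin region, destination region) pairs.

    zone_region[i] is the (hypothetical) region label of zone i.
    """
    od_a = np.asarray(od_a, dtype=float)
    od_b = np.asarray(od_b, dtype=float)
    zone_region = np.asarray(zone_region)
    regions = np.unique(zone_region)
    scores = []
    for origin_region in regions:
        rows = np.where(zone_region == origin_region)[0]
        for dest_region in regions:
            cols = np.where(zone_region == dest_region)[0]
            block = np.ix_(rows, cols)  # one geographical window
            scores.append(ssim(od_a[block], od_b[block]))
    return float(np.mean(scores))


# Made-up 4-zone example: zones 0-1 form one region, zones 2-3 another.
od_day1 = np.array([[0, 10, 5, 2],
                    [8, 0, 3, 1],
                    [4, 6, 0, 9],
                    [2, 3, 7, 0]])
od_day2 = 2 * od_day1  # same structure, doubled demand
zones = [0, 0, 1, 1]
print(mean_window_ssim(od_day1, od_day1, zones))
print(mean_window_ssim(od_day1, od_day2, zones))
```

Because each window is a block of zones that belong together geographically, the number of windows depends on the number of regions rather than on a sliding window size, which is one plausible reading of why such a measure can be cheaper than a sliding-window SSIM.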
Table of Contents
Keywords .................................................................................................................................. i
Abstract .................................................................................................................................... ii
Table of Contents .................................................................................................................... iv
List of Figures ......................................................................................................................... ix
List of Tables ........................................................................................................................ xvi
List of Publications .............................................................................................................. xvii
Notations ............................................................................................................................... xix
Abbreviations ....................................................................................................................... xxii
Statement of Original Authorship ....................................................................................... xxiv
Acknowledgements ...............................................................................................................xxv
Chapter 1: Introduction ...................................................................................... 1
1.1 Background .....................................................................................................................1
1.1.1 Origin-Destination (OD) matrix ...........................................................................2
1.1.2 OD matrix estimation problem .............................................................................4
1.1.3 The “structure” of OD matrix/trips .......................................................................8
1.1.4 Advanced traffic data sources ............................................................................11
1.2 Research Problem .........................................................................................................14
1.2.1 Problem of under-determinacy ...........................................................................14
1.2.2 Mapping relationship between link flows and OD flows ...................................15
1.2.3 Computation cost ................................................................................................16
1.2.4 Lack of potential performance measures ............................................................17
1.2.5 The need for typical OD matrices that represent typical travel patterns ............17
1.2.6 Unknown penetration rates of trips inferred from advanced data sources .........18
1.3 Research Motivation .....................................................................................................19
1.4 Research Questions, Aim, and Objectives ....................................................................19
1.5 Research Methodology .................................................................................................20
1.5.1 Task-1 .................................................................................................................21
1.5.2 Task-2 .................................................................................................................22
1.5.3 Task-3 .................................................................................................................23
1.5.4 Task-4 .................................................................................................................23
1.6 Significance and scope .................................................................................................23
1.7 Definitions ....................................................................................................................25
1.8 Thesis Outline ...............................................................................................................26
Chapter 2: Literature Review ........................................................................... 28
2.1 Background of OD matrix estimation ...........................................................................28
2.2 Problem Formulation ....................................................................................................30
2.2.1 Static OD formulation - uncongested networks .................................................30
2.2.2 Static OD formulation - congested networks .....................................................34
2.2.3 Dynamic OD formulation ...................................................................................41
2.2.4 Quasi-Dynamic formulation ...............................................................................44
2.3 The solution algorithms ................................................................................................45
2.4 OD matrix structural information .................................................................................46
2.5 Statistical performance measures..................................................................................48
2.6 Indirect/partial measurements of OD flows ..................................................................51
2.6.1 Point sensors .......................................................................................................52
2.6.2 Point to point sensors (AVI data) .......................................................................52
2.7 Summary of literature review .......................................................................................54
Chapter 3: Development of Statistical Metrics for the Structural Comparison
of OD Matrices ......................................................................................................... 57
3.1 Background ...................................................................................................................57
3.2 Structural Similarity (SSIM) index ...............................................................................58
3.2.1 Local sliding window .........................................................................................61
3.3 Mean Geographical window-based SSIM (GSSI) ........................................................64
3.3.1 Structural comparison of local travel patterns ....................................................67
3.3.2 Geographical window vs sliding window ..........................................................68
3.3.3 Computational efficiency ...................................................................................69
3.4 Levenshtein Distance ....................................................................................................69
3.4.1 Traditional Levenshtein distance ........................................................................70
3.4.2 Proposed Levenshtein distance for structural comparison of OD matrices ........74
3.4.3 Levenshtein vs Wasserstein distances ................................................................79
3.5 Sensitivity analysis of GSSI and NLOD.......................................................................84
3.5.1 Experimental criteria ..........................................................................................86
3.5.2 Results of uniform scaling effects ......................................................................88
3.5.3 Results of random scaling effects .......................................................................89
3.6 Summary .......................................................................................................................91
Chapter 4: Assignment-based OD Matrix Estimation: Exploiting the
Structure of Bluetooth Trips ................................................................................... 93
4.1 Background ...................................................................................................................93
4.1.1 B-OD structure-based method (or B-OD method) .............................................94
4.1.2 B-SP structure-based method (or B-SP method) ................................................95
4.2 Study network and data ................................................................................................97
4.2.1 Development of observed B-OD flows ( ) ......................................................100
4.2.2 Development of observed B-SP flows ( ) .......................................................101
4.3 Bi-level Framework: Matlab - Aimsun Integration ....................................................103
4.4 B-OD method: OD matrix estimation using B-OD structure .....................................103
4.4.1 Objective function formulation ........................................................................104
4.4.2 OD matrix estimation algorithm .......................................................................106
4.4.3 Experiments – ideal and near-ideal scenarios of B-OD method.......................109
4.4.4 Results for the ideal scenario of B-OD method ................................................110
4.4.5 Results for the near-ideal scenario of B-OD method .......................................114
4.4.6 Discussion ........................................................................................................118
4.5 B-SP method: OD matrix estimation using B-SP structure ........................................119
4.5.1 Objective function formulation ........................................................................121
4.5.2 OD matrix estimation algorithm .......................................................................122
4.5.3 Experiments for B-SP method ..........................................................................123
4.5.4 Results for B-SP method ..................................................................................123
4.5.5 Discussion ........................................................................................................127
4.6 Comparison of B-OD and B-SP methods ...................................................................127
4.7 B-SP method for lower penetration rates of Bluetooth trajectories ............................129
4.7.1 Experiments for B-SP method (lower penetration rates) .................................131
4.7.2 Results for B-SP method (lower penetration rates) ..........................................131
4.7.3 Discussion ........................................................................................................133
4.8 Summary .....................................................................................................................134
Chapter 5: Non-Assignment-based OD Matrix Estimation: Exploiting
Observed Turning Proportions and Structure of Bluetooth Trips ................... 136
5.1 Background .................................................................................................................136
5.2 OD matrix estimation: Traditional versus proposed approach ...................................137
5.3 Study networks ...........................................................................................................138
5.3.1 Toy network .....................................................................................................139
5.3.2 TMR network ...................................................................................................139
5.4 Concept of possible paths ...........................................................................................140
5.4.1 Possible paths in the toy network .....................................................................140
5.4.2 Possible paths in the TMR network ..................................................................141
5.5 OD matrix estimation methodology ...........................................................................142
5.5.1 Link flows estimation from turning proportion matrix ....................................142
5.5.2 The structural comparison of OD flows ...........................................................145
5.5.3 OD matrix estimation formulation ...................................................................146
5.6 Experiments and Results: Toy network ......................................................................147
5.6.1 Convergence of gradient descent algorithm .....................................................149
5.6.2 Structural consistency .......................................................................................149
5.6.3 Under-determinacy problem .............................................................................150
5.6.4 Optimal percentage of Bluetooth connectivity .................................................151
5.7 Experiments and Results: TMR network ....................................................................151
5.7.1 Non-assignment-based vs assignment-based experiments ...............................152
5.7.2 RMSE results ....................................................................................................153
5.7.3 GSSI results ......................................................................................................154
5.7.4 Computational time: Non-assignment-based vs assignment-based ..................155
5.8 Summary .....................................................................................................................156
Chapter 6: Methodology to Cluster B-OD Matrices and Identify Typical
Travel Patterns: Case Study Application of the BCC region ............................ 159
6.1 Background .................................................................................................................159
6.2 Methodology to cluster B-OD matrices and identify typical travel patterns ..............162
6.2.1 Traditional DBSCAN approach .......................................................................162
6.2.2 Three-level approach for identifying DBSCAN parameters ............................165
6.2.3 Distance measures for clustering B-OD matrices.............................................167
6.3 Experiments and results ..............................................................................................168
6.3.1 Experiment-1: dGSSI as proximity measure .......................................................170
6.3.2 Experiment-2: dNLOD as proximity measure......................................................171
6.3.3 Experiment-3: dRMSN as proximity measure .....................................................173
6.3.4 Typical B-OD flows .........................................................................................173
6.3.5 Discussion ........................................................................................................174
6.4 Summary .....................................................................................................................178
Chapter 7: Conclusion ..................................................................................... 179
7.1 Brief summary ............................................................................................................179
7.2 Research findings........................................................................................................181
7.3 Recommendations for future research ........................................................................182
Bibliography ........................................................................................................... 184
Appendices .............................................................................................................. 202
List of Figures
Figure 1.1: Trends in the social cost of congestion in AUD for different scenarios
(Transport & Economics, 2007) .................................................................... 1
Figure 1.2: Illustration of OD matrix for a spatial distribution of travel demand ........ 2
Figure 1.3: Statistical Areas of BCC region: SA2 (left) and SA3 (right) .................... 3
Figure 1.4: TAZs in Greater Brisbane region (BSTM, 2015) ...................................... 4
Figure 1.5: The overview of OD matrix estimation process ........................................ 5
Figure 1.6: Traditional bi-level framework .................................................................. 6
Figure 1.7: Demonstration of (a) the skeleton/structure of OD and (b)
corresponding mass/OD flows ....................................................................... 9
Figure 1.8: Example of OD matrix structural dimension ............................................. 9
Figure 1.9: Location of Bluetooth Scanners within the BCC region ......................... 12
Figure 1.10: BMS and loop detectors at an intersection in Brisbane city .................. 12
Table 1.1: Sample Bluetooth data from Brisbane, Australia (Bluetooth data from
Brisbane City Council, 2016) ...................................................................... 13
Figure 1.11: Comparison of turning proportions: Bluetooth vs SCATS (Chung,
2016) ............................................................................................................ 13
Figure 1.12: Consistency of Bluetooth trajectories during regular weekdays ........... 14
Figure 1.13: Demonstration of under-determinacy problem using (a) example
network, (b) feasible solutions ..................................................................... 15
Figure 1.14: Research methodology framework ........................................................ 21
Figure 2.1: Pictorial representation of some of the widely-used sensor types in
OD estimation problem ................................................................................ 52
Figure 3.1: Comparison of MR with OD matrices M1 and M2 .................................. 58
Figure 3.2: (a) Comparison of Images (source Wang et al., 2004) vs (b)
comparison of OD matrices (source Djukic et al., 2013) ............................ 59
Figure 3.3: An example of sliding window for SSIM calculation. ............................ 61
Figure 3.4: Sensitivity of MSSIM towards local window size .................................. 63
Figure 3.5: An example to illustrate the proposed geographical window-based
approach ....................................................................................................... 65
Figure 3.6: Splitting (a) Monday and (b) Sunday OD matrices into geographical
(SA4) windows ............................................................................................ 66
Figure 3.7: Insights into local travel patterns using geographical local window:
(left) Brisbane South to Brisbane North and (right) Brisbane South to
Brisbane West .............................................................................................. 67
Figure 3.8: GSSI vs sliding windows based MSSIM for weekends .......................... 68
Figure 3.9: GSSI vs sliding windows based MSSIM for weekdays .......................... 69
Figure 3.10: Comparison of computational costs: Sliding windows based SSIM
vs SSIM ........................................................................................................ 69
Figure 3.11: Example to demonstrate Generalised Levenshtein Distance ................. 70
Figure 3.12: Matrix demonstration of traditional Levenshtein approach
(Algorithm 1) ............................................................................................... 72
Figure 3.13: Comparison of strings “Monday” and “Saturday” using GLD ............. 73
Figure 3.14: Example to demonstrate Levenshtein distance application for OD
matrices comparison .................................................................................... 74
Figure 3.16: Matrix demonstration of Algorithm 2 ................................................... 78
Figure 3.17: Matrix (L) demonstration for ......................................... 79
Figure 3.18: Demonstration of Wasserstein distance through an example ................ 80
Figure 3.19: (a) Sample network and (b) OD matrices XR and XQ with their
corresponding paths and travel costs. .......................................................... 82
Figure 3.20: Results of uniform scaling for GSSI and NLOD ................................... 88
Figure 3.21: Results of random scaling effects for (a) GSSI and (b) its structure
component .................................................................................................... 89
Figure 3.22: Results of random scaling effects for (a) NLOD and (b) its structure
component .................................................................................................... 90
Figure 4.1: Sample network (with installed BMS), paths and OD matrices .............. 95
Figure 4.2: (a) Study site installed with Bluetooth scanners and loop detectors
(b) spatial structure of Brisbane City core network ..................................... 98
Figure 4.3: Splitting the study OD matrix into geographical windows ................... 100
Figure 4.4: Generation of for the near-ideal scenario of the B-OD method......... 101
Figure 4.5: Generation of for the B-SP method .................................................... 102
Figure 4.6: MATLAB-Aimsun integration framework ........................................... 103
Figure 4.7: (a) Traditional link counts-based method vs (b) proposed B-OD
method........................................................................................................ 104
Figure 4.8: RMSE w.r.t. Xtrue for the traditional and ideal scenario cases of the
B-OD method ............................................................................................. 110
Figure 4.9: Percentage of improvement in RMSE w.r.t. Xprior for traditional and
ideal scenario cases of the B-OD method .................................................. 111
Figure 4.10: Percentage of improvement in RMSE w.r.t. traditional method for
ideal scenario case of B-OD method ......................................................... 111
Figure 4.11: StrOD w.r.t. Xtrue for the traditional and ideal scenario cases of the
B-OD method ............................................................................................. 112
Figure 4.12: Percentage of improvement in the StrOD w.r.t. Xprior for the
traditional and ideal scenario cases of the B-OD method .......................... 112
Figure 4.13: Percentage of improvement in the StrOD w.r.t. traditional method
for the ideal scenario cases of the B-OD method ...................................... 112
Figure 4.14: GSSI w.r.t. Xtrue for the traditional and ideal scenario cases of the
B-OD method ............................................................................................. 113
Figure 4.15: Percentage of improvement in the GSSI w.r.t. Xprior for the
traditional and ideal scenario cases of the B-OD method .......................... 113
Figure 4.16: Percentage of improvement in the GSSI w.r.t. traditional method
for the ideal scenario cases of the B-OD method ...................................... 113
Figure 4.17: RMSE results w.r.t. Xtrue - near-ideal B-OD method
Figure 4.18: The percentage of improvement in the RMSE w.r.t. Xprior for near-ideal B-OD method
Figure 4.19: The percentage of improvement in the RMSE w.r.t. traditional method for the near-ideal B-OD method
Figure 4.20: StrOD results w.r.t. Xtrue - near-ideal B-OD method
Figure 4.21: The percentage of improvement in the StrOD w.r.t. Xprior for the near-ideal B-OD method
Figure 4.22: The percentage of improvement in the StrOD w.r.t. traditional method for the near-ideal B-OD method
Figure 4.23: GSSI results w.r.t. Xtrue - near-ideal B-OD method
Figure 4.24: The percentage of improvement in the GSSI w.r.t. Xprior for the near-ideal B-OD method
Figure 4.25: The percentage of improvement in the GSSI w.r.t. traditional method for the near-ideal B-OD method
Figure 4.26: Proposed B-SP method
Figure 4.28: RMSE w.r.t. Xtrue, B-SP experiments
Figure 4.29: Percentage of improvement in RMSE w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.30: Percentage of improvement in the RMSE w.r.t. traditional method
Figure 4.31: StrOD w.r.t. Xtrue for the prior, traditional, and B-SP experiments
Figure 4.32: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.33: Percentage of improvement in the StrOD w.r.t. traditional method
Figure 4.34: GSSI w.r.t. Xtrue for the prior, traditional, and B-SP experiments
Figure 4.35: Percentage of improvement in GSSI w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.36: Percentage of improvement in the GSSI w.r.t. traditional method for B-SP experiments
Figure 4.37: RMSE comparison of the B-OD (ideal, near-ideal) and B-SP methods with prior OD and traditional methods
Figure 4.38: StrOD comparison of B-OD (ideal, near-ideal) and B-SP methods
Figure 4.39: GSSI comparison of the B-OD (ideal, near-ideal) and B-SP methods
Figure 5.1: Non-assignment-based OD matrix estimation methodology
Figure 5.2: Sketch of the toy network
Figure 5.3: TMR network
Figure 5.4: Paths traversed by vehicles in simulation
Figure 5.5: Traversed paths from all origins until link, l14
Figure 5.6: Possible paths from all origins until link, l14
Figure 5.7: The number of possible paths from all origins until the detector locations of TMR network
Figure 5.8: Schematic representation of an isolated intersection and associated turning proportions
Figure 5.9: Sample network used by Bar-Gera et al. (2006)
Figure 5.10: Convergence of RMSE for all cases
Figure 5.11: Convergence of StrOD for all cases
Figure 5.12: RMSE comparison with
Figure 5.13: StrOD comparison with
Figure 5.14: RMSE results for non-assignment-based and assignment-based approaches
Figure 5.15: Percent improvement in RMSE with respect to Xprior - non-assignment vs assignment-based methods
Figure 5.16: Percent improvement in RMSE with respect to traditional method - non-assignment vs assignment-based methods
Figure 5.17: GSSI results for non-assignment-based and assignment-based approaches
Figure 5.18: Percent improvement in GSSI with respect to Xprior - non-assignment vs assignment-based methods
Figure 5.19: Percent improvement in GSSI with respect to traditional method - non-assignment vs assignment-based methods
Figure 6.1: Typical shape of sorted k-dist graph
Figure 6.2: Sample data points (left) along with kth nearest neighbour and k-dist of all points (right)
Figure 6.3: Sorted k-dist graphs for k=1, k=2 and k=3 and the resulting clusters
Figure 6.4: Demonstration of two density levels through sorted k-dist plot
Figure 6.5: Three level approach to cluster B-OD matrices
Figure 6.6: Sorted k-dist plots for experiment-1
Figure 6.7: Sorted k-dist plots for experiment-2
Figure 6.8: Sorted k-dist plots for experiment-3
Figure 6.9: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for subspace-1 of experiment-1
Figure 6.10: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for subspace-2 of experiment-1
Figure 6.11: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for subspace-1 of experiment-2
Figure 6.12: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for subspace-2 of experiment-2
Figure 6.13: (a) Number of clusters vs MinPts and proportion of clusters; and (b) vs for subspace-2 of experiment-3
Figure 6.14: Box-Whisker plot demonstrating the difference among the typical B-OD flows for OD pair – Mt. Gravatt and Brisbane CBD (results of experiment-1)
Figure 6.15: Classification of day types
Figure 6.16: Comparison of clusters resulted from all three experiments
List of Tables
Table 3.1: Comparison results using the traditional metrics
Table 3.2: GSSI and local SSIM values: Monday vs Sunday B-OD matrices
Table 3.3: Algorithm 1 for Normalised Levenshtein distance for strings comparison (see Figure 3.12)
Table 3.4: Algorithm 2 for Levenshtein distance for OD matrices (see Figure 3.16)
Table 3.5: Computation of Wasserstein distance for the example problem
Table 3.6: Structural comparison of sample OD matrices using the proposed metrics
Table 4.1: Path flows for example network
Table 4.2: Demonstrating the difference between true and Bluetooth subpath flows for the given example
Table 4.3: Comparison of Xprior with Xtrue for all three replications
Table 5.1: Demonstration of equation (62) for l14
Table 5.2: Paths and path flows for Bar-Gera et al. (2016) network
Table 5.3: Link flows at link, l5-2 estimated using the proposed approach
Table 5.4: Comparison of link flows for the selected links
Table 5.5: Comparison of OD demand flows
Table 5.6: Comparison of computational times: Non-assignment-based vs assignment-based methods
List of Publications
JOURNALS
Behara, K. N., A. Bhaskar, and E. Chung. Levenshtein distance for the
structural comparison of origin-destination matrices (Chapter 3 of thesis and under
review in Transportation Research Part C: Emerging Technologies).
Behara, K. N., A. Bhaskar, and E. Chung. Geographical window based
structural similarity index for OD matrices comparison (Chapter 3 of thesis and under
review in Journal of Intelligent Transportation Systems).
Behara, K. N., A. Bhaskar, and E. Chung. OD matrix estimation using observed
traffic counts and Bluetooth subpath flows (Chapter 4 of thesis and to be submitted to
Transportation Research Part C: Emerging Technologies by 31st July 2019).
Behara, K. N., A. Bhaskar, and E. Chung. A non-assignment-based approach to
estimate OD matrices using observed turning proportions and structural knowledge of
Bluetooth trips (Chapter 5 of thesis and to be submitted to IEEE Transactions on
Intelligent Transportation Systems by 7th Aug 2019).
Behara, K. N., A. Bhaskar, and E. Chung. Clustering multi-density OD matrices
datasets using structural proximity measures: A case study on Brisbane Bluetooth
based OD (Chapter 6 of thesis and to be submitted to a Q1 journal by 21st Aug 2019).
CONFERENCES
Behara, K. N., Bhaskar, A., & Chung, E. (2017). Insights into geographical
window based SSIM for comparison of OD matrices. In 39th Australasian Transport
Research Forum (ATRF), 27-29 November 2017, Auckland, New Zealand (abridged
version).
Behara, K. N., Bhaskar, A., & Chung, E. (2017). Classification of typical
Bluetooth OD matrices based on structural similarity of travel patterns- Case study on
Brisbane city. In Transportation Research Board 97th Annual Meeting, 7th-11th
January 2018, Washington D.C., USA.
Behara, K. N., Bhaskar, A., & Chung, E. (2018). Novel approach for OD
estimation based on observed turning proportions and Bluetooth structural
information: Proof of the concept. In 40th Australasian Transport Research Forum
(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia
(abridged version).
Behara, K. N., Bhaskar, A., & Chung, E. (2018). Levenshtein distance for the
structural comparison of OD matrices. In 40th Australasian Transport Research Forum
(ATRF), 30-31 October 2018, Darwin Convention Centre, Darwin, Australia
(abridged version).
Behara, K. N., Bhaskar, A., & Chung, E. (2019). Estimating OD matrices from
observed trajectories and link counts. In World Conference on Transport Research -
WCTR 2019, 26-31 May 2019, Mumbai, India (abridged version).
Notations
It refers to the origin zone number e.g. oth origin
Number of zones which serve as origin points
It refers to the destination zone number e.g. dth destination
Number of zones which serve as destination locations
It refers to the OD pair e.g. wth OD pair
It is the number of OD pairs in the OD matrix; w ϵ W
OD vector to be estimated
Target OD vector
True OD matrix
Prior OD matrix
OD matrix in Aimsun format
The flows of wth OD pair in
The flows of wth OD pair in
General dimensions of OD matrix whenever expressed in matrix form
Trips produced from oth zone
Origin flows vector to be estimated
Trips attracted to dth zone
It refers to the link number e.g. lth link
It is the total number of selected links in the network
It is the simulated/estimated flow on lth link
It is the observed flow on lth link
It is the estimated link flows vector of size L*1
It is the observed link flows vector of size L*1
It refers to the path number connecting wth OD pair
It refers to the path number connecting lth link with oth origin
It refers to the number of paths connecting wth OD pair
It refers to the number of possible paths connecting lth link with oth origin
It is flow on kth path
Kronecker Delta function. It is equal to 1, if lth link is present in kth
path, and 0 otherwise.
Weight factor of OD flows deviation from target OD matrix in the
objective function
Weight factor of link flows deviation in the objective function
It refers to the travel cost for lth link.
It is the path cost through kth path between wth OD pair
It is the cost on the shortest route for wth OD pair,
It represents the observed Bluetooth OD vector
It represents the observed Bluetooth subpath flows vector
It represents the consolidated vector of Bluetooth subpath flows observed from several days of similar travel patterns
Path flows from the model (Aimsun)
It represents the vector of OD flows that are Bluetooth connected
It represents the vector of true OD flows that are Bluetooth connected
Incidence matrix that converts X to X*
It is the proportional assignment matrix linking link flows with OD
User equilibrium assignment (link-proportion) matrix (either analytical or simulated)
User equilibrium path-proportion matrix (either analytical or simulated)
The proportion of Xw flowing in lth link
Local window ID
Number of local windows
Likelihood
Error term for the OD matrix (difference between and X)
Error term for the link flows (difference between and Y)
Error term for the link flows (difference between and AX)
It is a dispersion parameter to describe road users’ perception of travel costs
Link OD matrix
It is the incidence matrix; that is, a network-based information
Time slice
It is the trips generated from oth origin during tth time slice
It is the proportion of trips generated from oth origin to dth destination
It is the OD flow between oth origin and dth destination during time-
slice, t
Correlation coefficient between and Y
Scale factor expressed as a sum of ratios of and
compares the mean values ( ) of the group of OD pairs (i.e. x and y) from both matrices, X and Y.
compares the standard deviations ( ) of the group of OD pairs
compares the structure by computing correlation between the normalised group of OD pairs (i.e. x and y) from both matrices, X and Y.
Sequence of Levenshtein edit operations to transform strings or sorted
kth Levenshtein edit operation
The Levenshtein matrix for comparing strings
It is set including that is ith preferred destination from oth origin
It is set including that is the corresponding demand value of from oth origin
It is the sorted set of destination IDs ( ) and the corresponding demand from oth origin ( )
Penetration rate of Bluetooth inferred trips
Percentage of Bluetooth connected OD pairs
Step length at kth iteration
Objective function value
Step length parameter to scale-up by times
Step length parameter to scale-down by times
Turning Proportion matrix developed from observed turning proportions
It refers to the intersection number
It is the turning proportion observed at intersection present along (kl,o)th path
It refers to the probability of origin flows passing through (kl,o)th path and observed at lth link
It is the total probability of trips generated from oth origin observed at lth link
Abbreviations
ABS Australian Bureau of Statistics
ANOVA Analysis of variance
ATAP Australian Transport Assessment and Planning
AVI Automatic vehicle identification
BCC Brisbane City Council
BMS Bluetooth media access control scanner
B-OD Bluetooth based Origin Destination matrix
BPR Bureau of Public Roads
B-SP Bluetooth based subpaths
BSTM Brisbane Strategic Transport Model
CBD Central Business District
CDA Combined distribution and assignment
DBSCAN Density-based spatial clustering of applications with noise
EBM Eigenvalue-based measure
EM Entropy maximisation
GEH Geoffrey E. Havers statistic
GPS Global positioning system
GLD Generalised Levenshtein distance
GLS Generalised least squares
GU Global Theil measure of fit
HTS Household travel survey
IM Information minimisation
ITS Intelligent transport systems
KF Kalman filter
LOD Levenshtein distance for OD matrices
LSQR Least squares
LW Long weekend
MAE% Mean absolute error percent
MAER Mean absolute error ratio
MAPE% Mean absolute percent error
GSSI Mean geographical window based structural similarity index
ML Maximum likelihood
MLPP Most likely possible paths
NLOD Mean Normalised Levenshtein distance for OD matrices
MPAE Maximum possible absolute error
MSE Mean square error
NLD Normalised Levenshtein distance
OD Origin-destination
PH Public holidays
RE Relative error
RMSE Root mean square error
RMSN Normalised root mean square error
RSD Relative standard deviation
SA Statistical area
SCATS Sydney Coordinated Adaptive Traffic System
SEQTS South East Queensland Travel Survey
SPSA Simultaneous perturbation stochastic approximation
SSIM Structural similarity index
StrUE Strategic user equilibrium
SATR Saturdays regular
SATSH Saturdays during school holidays
SUNR Sundays regular
SUNSH Sundays during school holidays
TAZ Traffic Analysis Zone
TMR Transport and Main Roads
TDD Total demand deviation
TLAP Time lapse aerial photography
WDR Weekday regular
WDSH Weekday during school holiday
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, the thesis contains no material previously published or
written by another person except where due reference is made.
Signature:
Date: ______19/07/2019___________________
QUT Verified Signature
Acknowledgements
I would like to thank the following people for their involvement at various stages
of my PhD journey.
My gurus – Dr Ashish Bhaskar and Professor Edward Chung. In Sanskrit – “gu”
means “darkness and ignorance” and “ru” means “that which removes”. Combined, the
two terms form the word “guru”. This PhD journey would not have been possible
without the continuous support of my gurus. They helped me to understand
myself and explore my strengths. Dr Ashish, my principal supervisor, is a source of
inspiration. I have learnt a lot from him, both academic and otherwise, and for this I
am highly indebted. He has always been very positive and encouraged brainstorming
discussions, the results of which have always been helpful for my research.
I also thank Professor Edward Chung, my associate supervisor, for his continuous
guidance and encouragement. I have been very lucky to receive guidance from him. I am thankful
for the many thought-provoking discussions that helped shape this thesis into a quality
piece of work. Special thanks to teachers from India - Dr Shriniwas Arkatkar and
Professor AK Sarkar who were the main reasons for beginning this PhD journey.
Minh and Gabriel for being sources of inspiration. Minh was the first person I
met at QUT and he was part of my supervision team for some time. Gabriel visited
QUT during his holiday to Brisbane, where we had a friendly chat and an encouraging
discussion about his work and research achievements. His motivating words about
looking at my PhD as just a part of life helped me to think outside the box, which
helped me to achieve my aims with ease and passion.
All of my friends and colleagues, too many to list, who made my Australian
experience very pleasant. However, five of them are very special to me –
Umashankar, Narendra, Kiran, Mahadeesh and Yasir. Uma was my dearest and closest
friend, who always encouraged me. Both of us had planned to start an IES coaching centre
in India after finishing my PhD. However, I lost him in an unfortunate car accident
that should never have happened. He always reminds me that life is too short and that
I should make the utmost use of every moment. Narendra, Kiran and Mahadeesh are excellent beings
whom I can always trust without a second thought. They have always been there
extending their support during the toughest rides of my journey. Yasir has always been
supportive and ready to help at any moment. Every time we sat down for a
good research discussion, I felt more empowered and confident about my skills.
Queensland University of Technology for providing the HDR tuition fee waiver,
QUT postgraduate and top-up scholarships, and providing the necessary infrastructure
that fostered the research developments. The staff are very friendly, and my special
thanks go to the HDR research support team for providing guidance during the many
phases of my research at QUT. I am also very thankful to all those I have encountered
outside QUT during my stay in Australia. People here are very friendly, kind, and
helpful. I would also like to thank professional editor, Kylie Morris, who provided
copyediting and proofreading services, according to university-endorsed guidelines
and the Australian Standards for editing research theses. I am grateful to the thesis
examiners and reviewers of my academic papers who provided valuable comments
and appreciated my work.
Many thanks to my family members - my mother, aunts, uncles, grandparents,
sisters and cousins, who have always been very supportive. Especially my uncle
(srimammu) and my grandfather (thathagaru), who have constantly encouraged and
motivated me; both of them have always been my strength. My nephew – Akki entered
my life during this PhD journey – he is very very special to me!
Finally, this acknowledgement would be incomplete without conveying my
special thanks to my dear friend, well-wisher, role-model, motivator, and guide - the
all attractive Kṛṣṇa, the Supreme God Himself. This PhD thesis is a homage to Him.
मन्मना भव मद्भक्तो मद्याजी मां नमस्कुरु |
मामेवैष्यसि सत्यं ते प्रतिजाने प्रियोऽसि मे || BG 18.65||
Always think of Me and become My devotee. Worship Me and offer your
homage unto Me. Thus, you will come to Me without fail. I promise you this because
you are My very dear friend.
Chapter 1: Introduction
This chapter discusses the background of this research (Section 1.1); the research
problem (Section 1.2); research motivation (Section 1.3); research questions, aim, and
objectives (Section 1.4); research significance and scope (Section 1.5); definitions
(Section 1.7); and finally, provides an outline of the thesis (Section 1.8).
1.1 BACKGROUND
With ever-increasing urban sprawl, cities are witnessing more serious problems from
traffic congestion. Policy decisions to mitigate traffic problems can have a huge impact
on a nation’s economy, environment, and society (Australian Transport Assessment
and Planning (ATAP), 2017). For instance, the social cost of traffic congestion on
Australian roads for the year 2020 (predicted from the base year 2005 cost of $9.4
billion) is estimated to be nearly $20.4 billion (Transport & Economics, 2007). Figure
1.1 illustrates the near-linear escalation of the congestion cost (base case) for Australia.
Figure 1.1: Trends in the social cost of congestion in AUD for different scenarios (Transport & Economics, 2007)
It is extremely important to have an accurate estimation and prediction of travel
demand for strategic planning and control and for the success of any major transport
infrastructure projects, as the lack of such could result in huge economic losses. Thus,
the accurate knowledge of how, when, and where people move on the road network is
important before making any policy decisions. While this sounds simple,
understanding how a city moves is a highly complicated process due to challenges
related to indirect, incomplete, and inaccurate measurements, and errors in modelling
realistic travel patterns.
1.1.1 Origin-Destination (OD) matrix
Fundamentally, a city is a geographical entity divided into many statistical zones.
While the geographical structure of any city implies spatial distribution of urban
centres connected by transportation networks, the distribution of travel demand
between different zones defines the structure of the travel patterns in a city. In transport
planning, travel demand between zonal pairs and their distribution pattern (also
referred to as structure; see Section 1.1.3) is generally represented using an
origin-destination (OD) demand matrix (see Figure 1.2). The yellow cell in the OD
matrix represents trips (say, by car) between the OD pair of Z1 and Z2, and similarly
for the other cells of the OD matrix.
Figure 1.2: Illustration of OD matrix for a spatial distribution of travel demand
The demand for an OD pair, which is the number of trips between an origin and
a destination, is a given number (for a time interval) equal to the sum of the path flows
on the paths connecting them. These flows can change due to changes in route choice,
but the total amount remains unchanged for that time interval. Broadly speaking, there are two
types of OD matrices: static and dynamic. For static OD matrices, the time-period
considered is sufficiently large (of the order of hours) so that the traffic observed at the
detectors is from the demand departing during the same time interval. Every trip is
assumed to be completed within a single analysis time-period. On the other hand,
dynamic OD matrices assume a shorter time-period and the traffic observed at the
detector must be assigned to different departure time intervals.
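The relation between path flows and OD demand described above can be illustrated with a minimal Python sketch. The zone names follow Figure 1.2, but the path identifiers and the split of flow between paths are hypothetical values chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical path flows (trips per time interval), keyed by
# (origin, destination, path id). The split between paths is an assumption.
path_flows = {
    ("Z1", "Z2", "path_a"): 3000,
    ("Z1", "Z2", "path_b"): 1500,
    ("Z2", "Z1", "path_a"): 3500,
}


def od_demand(flows):
    """Sum the path flows over all paths connecting each OD pair."""
    demand = defaultdict(float)
    for (origin, destination, _path), flow in flows.items():
        demand[(origin, destination)] += flow
    return dict(demand)


demand = od_demand(path_flows)

# A change in route choice moves flow between paths of the same OD pair,
# but the OD total for the interval stays the same.
rerouted = dict(path_flows)
rerouted[("Z1", "Z2", "path_a")] -= 500
rerouted[("Z1", "Z2", "path_b")] += 500
rerouted_demand = od_demand(rerouted)
```

In this sketch, `demand` and `rerouted_demand` are identical even though the individual path flows differ, which is the sense in which the OD demand is fixed for a given time interval.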
[Figure 1.2 panels: OD matrix for a typical peak-period (zones Z1 and Z2; cell values 0, 4500, 3500, 0); spatial representation of OD pairs and directions of OD flows]
The zones that produce trips are referred to as origins and those that attract trips
are destinations. In Australia, two types of zones are popularly used for strategic
planning: statistical areas (SAs) and traffic analysis zones (TAZs).
1.1.1.1 Statistical Areas (SAs)
According to the Australian Bureau of Statistics (ABS) (ASGS, 2017), “A statistical
geography provides the extra dimension of location to statistics”. The ABS defines
the hierarchy of geographical areas for the release of statistical information. This
includes statistical areas (SA) at four levels: Statistical Area Level 1 (SA1) to
Statistical Area Level 4 (SA4). SA1 has a population of between 200 and 800 persons,
SA2 normally reflects the suburban level and is an aggregation of SA1, SA3 is
designed at the regional level and is an aggregation of SA2, and SA4 reflects the labour
market within each state and territory and is an aggregation of SA3. Figure 1.3
illustrates the SA2 (left) and SA3 (right) zones of the Brisbane City Council (BCC)
region (the Moreton Bay Islands are excluded in this study).
Figure 1.3: Statistical Areas of BCC region: SA2 (left) and SA3 (right)
While the term “statistical area” is popularly used in Australia, urban
structures around the world have their own representations of zonal hierarchy. For
instance, the zones in the US are referred to as metropolitan statistical areas (USCensus,
2019); see Naoki (2013) for the hierarchy of geographical boundaries defined
for Japanese cities.
1.1.1.2 Traffic Analysis Zones (TAZs)
The geographical units generally used in transport planning models are referred
to as traffic analysis zones (TAZs). In Australia, a TAZ generally covers a population
of approximately 3,000 people. The number of trips between zonal pairs is dependent
on the size and shape of the zone. In addition to population, other factors that
differentiate TAZs from SAs are potential future alternatives to existing road
infrastructure, network details, etc. The size of the zones is smaller within the central
business district (CBD) region and larger in far-away suburbs and rural/regional areas.
Each TAZ can be a combination of SA1s and/or each SA2 can contain multiple
portions of TAZs. The Brisbane Strategic Transport Model (BSTM)1 considers the
TAZ boundaries of the Greater Brisbane area, which includes Brisbane City Council (key
partner), Redland City Council, Logan City Council, Ipswich City Council, and
Moreton Bay Regional Council, as shown in Figure 1.4 (BSTM, 2015).
Figure 1.4: TAZs in Greater Brisbane region (BSTM, 2015)
1.1.2 OD matrix estimation problem
As the complete distribution of travel demand across the network cannot be
observed directly, an OD matrix needs to be estimated. In practice, traffic demand
forecasts rely heavily on the base (reference) year OD matrix and establishing this is
critical before implementing any major transport projects (ATAP, 2016b).
1 BSTM is a multi-modal transport model for medium to long-term strategic planning for the Greater
Brisbane region. The model helps transport planners to estimate, forecast, and assess travel patterns and
behaviour across the region.
Traditionally, the base year OD demand for large-scale networks is estimated
using the four-step model. However, there have been many concerns over the effectiveness
of the four-step modelling approach. For example, it was unable to address the road
congestion problems of the 1990s, which led the US Department of Transportation to
heavily sponsor Travel Demand Improvement Programs with more focus on
activity-based travel demand models (McNally, 2008). Although, to date, no
alternative frameworks have challenged the theoretical construction of activity-based models,
researchers are still working to enhance their predictive capabilities. The main
difficulty with the activity-based approach is the combinatorics involved in
multi-dimensional choice modelling at an individual (agent) level; the approach also relies
heavily on travel surveys. On the other hand, seamless observations of traffic counts are able to provide
up-to-date information related to traffic demand, and thus researchers have begun to
consider OD estimation as an optimisation problem.
The following sub-sections provide an overview of OD matrix
estimation, the optimisation modelling approach, the structural significance of OD
matrix/trips, and the role of advanced data sources in OD matrix estimation problems.
1.1.2.1 Overview of OD matrix estimation
An overview of the OD matrix estimation process is shown in Figure 1.5, where
the key elements are: inputs (observed link flows, target OD matrix, and any other
measurements, such as path flows, travel time, etc.), optimisation model (solution
algorithm and assignment model), outputs (OD matrix estimates, user equilibrium link
flows, and travel time, etc.), and a reliability check of the OD matrix estimates using
performance measures (e.g., root mean square error (RMSE)).
Figure 1.5: The overview of OD matrix estimation process
[Figure 1.5 elements: inputs; optimisation model; outputs; reliability check using performance measures]
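As an illustration of the reliability check, the following Python sketch computes the RMSE between an estimated OD vector and a reference OD vector using the standard definition of RMSE. The numerical values are hypothetical and are not results from this research.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between two OD vectors of equal length."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("vectors must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical OD vectors (a 2 x 2 OD matrix flattened row by row).
x_reference = [0.0, 4500.0, 3500.0, 0.0]
x_estimated = [0.0, 4400.0, 3600.0, 0.0]
error = rmse(x_estimated, x_reference)
```

A lower value indicates that the individual OD flows of the estimate are closer to those of the reference matrix; the measure by itself does not describe how the flows are distributed across OD pairs.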
1.1.2.2 Optimisation model
Most studies generally adopt a bi-level framework for the OD matrix estimation
process, as represented using a flowchart in Figure 1.6. In the bi-level formulation, the
upper level minimises the objective function (generally deviations of
traffic counts) and the lower level runs traffic assignment (generally user-equilibrium). The
assignment and the OD matrix are interdependent, and as such, the former
plays a significant role in the OD matrix estimation process. The optimisation process
begins with a prior OD matrix that is generally developed from traffic surveys and
socio-economic data in a four-step transport modelling approach. The traffic
simulation model (which could be built in Aimsun (2019)) considers the prior OD
matrix as an input, and runs traffic assignment (either as a stochastic route choice or
dynamic user equilibrium) over the study network.
Figure 1.6: Traditional bi-level framework
The most general outputs of this simulation are the user-equilibrium link flows
and assignment matrix. Once the lower-level assignment is complete, the value of the
objective function; that is, the deviations between the user-equilibrium (estimated) and
observed link flows, is computed in the upper-level formulation. The OD matrix is
then updated using any popular search direction techniques (such as gradient-based
methods) to estimate the OD matrix for the next iteration. In this way, the OD matrix
is constantly updated until the convergence criteria are reached. Further details about
the bi-level estimation process are provided in Chapter 4. Earlier bi-level methods
depended on analytical models for assignment. However, it is preferable to choose
simulation-based assignment for its capacity to model realistic congestion effects over
the network.
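The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the implementation used in this thesis: the lower level is replaced by a fixed, hypothetical link-proportion matrix `A` (a real study would run a simulator such as Aimsun at that step), the upper level takes a plain gradient step on the squared count deviations, and every number, function name and the step size are invented for the example.

```python
# Toy bi-level loop. Assumption: a fixed link-proportion matrix stands in for
# the lower-level assignment, which in practice would be a simulation run.
A = [[1.0, 1.0, 0.0, 0.0],   # hypothetical link 1 carries OD pairs 1 and 2
     [0.0, 0.0, 1.0, 1.0]]   # hypothetical link 2 carries OD pairs 3 and 4


def assign(x):
    """Lower level (stand-in): map OD flows x to modelled link flows."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in A]


def count_deviation(y, y_obs):
    """Upper-level objective: half the sum of squared link-count deviations."""
    return 0.5 * sum((ym - yo) ** 2 for ym, yo in zip(y, y_obs))


def estimate_od(x_prior, y_obs, step=0.25, tol=1e-6, max_iter=200):
    """Update the OD vector by gradient steps until the objective is below tol."""
    x = list(x_prior)
    for iteration in range(max_iter):
        y = assign(x)                                  # lower level
        if count_deviation(y, y_obs) < tol:            # convergence check
            break
        residual = [ym - yo for ym, yo in zip(y, y_obs)]
        # Gradient of the objective w.r.t. each OD flow, treating A as constant.
        grad = [sum(A[l][i] * residual[l] for l in range(len(A)))
                for i in range(len(x))]
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]  # upper level
    return x, iteration


x_est, n_iter = estimate_od([400.0, 400.0, 900.0, 900.0], [1000.0, 2000.0])
print(x_est, assign(x_est), n_iter)
```

In this toy case the estimate reproduces the observed counts while keeping the equal split of the prior, which hints at how strongly the starting matrix shapes the solution.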
Objective function formulation
The traditional method for expressing link counts deviation within the objective
function ($Z_1$), as described by Spiess (1990), is shown in Equation (1).
$Z_1 = \frac{1}{2}\sum_{a}\left(y_a - \hat{y}_a\right)^2$ (1)
Where the modelled link flows ($Y$, with elements $y_a$) for every iteration are retrieved from the
simulation, and $\hat{Y}$ (with elements $\hat{y}_a$) is the observed link flows.
The size of OD matrix is generally far greater than the size of the link flows
vector. As such, there is an imbalance between unknowns and knowns, and the
traditional traffic counts-based formulation leads to the problem of under-determinacy.
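Thus, most previous studies (Cascetta & Postorino, 2001; Yang, 1995) have used
deviations from the target OD matrix ($\hat{X}$) as an additional objective in the formulation,
as shown in Equation (2).
$Z = w_1 \cdot \frac{1}{2}\sum_{a}\left(y_a - \hat{y}_a\right)^2 + w_2 \cdot \frac{1}{2}\sum_{i}\left(x_i - \hat{x}_i\right)^2$ (2)
Where $w_1$ and $w_2$ are the weight factors given to the corresponding objectives,
$y_a$ and $\hat{y}_a$ are the modelled and observed flows on link $a$, and $X$ (with elements $x_i$)
and $\hat{X}$ (with elements $\hat{x}_i$) are the estimated OD matrix and target OD matrix, respectively.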
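The effect of the additional target term can be illustrated with a short Python sketch. It assumes squared deviations with a factor of one half for both terms, equal weights, a hypothetical two-link network and a hypothetical target OD vector; none of these choices is taken from the cited studies.

```python
def weighted_objective(x, x_target, y, y_obs, w1=1.0, w2=1.0):
    """Weighted sum of link-count deviation and target-OD deviation."""
    count_term = 0.5 * sum((ym - yo) ** 2 for ym, yo in zip(y, y_obs))
    target_term = 0.5 * sum((xi - xt) ** 2 for xi, xt in zip(x, x_target))
    return w1 * count_term + w2 * target_term


def link_flows(x):
    # Hypothetical network: link 1 carries x[0] + x[1], link 2 carries x[2] + x[3].
    return [x[0] + x[1], x[2] + x[3]]


y_obs = [1000.0, 2000.0]
x_target = [350.0, 650.0, 1100.0, 900.0]      # hypothetical target OD flows
candidate_a = [300.0, 700.0, 1200.0, 800.0]   # reproduces y_obs exactly
candidate_b = [600.0, 400.0, 500.0, 1500.0]   # also reproduces y_obs exactly

z_a = weighted_objective(candidate_a, x_target, link_flows(candidate_a), y_obs)
z_b = weighted_objective(candidate_b, x_target, link_flows(candidate_b), y_obs)
print(z_a, z_b)  # 12500.0 422500.0
```

Both candidates have a zero count term, so only the target term separates them; the candidate closer to the target matrix is preferred, whether or not that target is up to date.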
Mapping relationship between link flows and OD matrix
Since the observed traffic counts are indirect measurements of OD matrix, a
mapping relationship between observed link flows and the OD matrix must be present.
Fundamentally, traffic counts on any link are the result of the OD matrix assigned over
a network. Thus, the relationship between both is an assignment model (de Dios
Ortuzar & Willumsen, 2011). Different types of traffic assignment models have been
used in transport modelling, such as all-or-nothing assignment, incremental
assignment, capacity restraint assignment, user-equilibrium assignment, stochastic
user equilibrium assignment, system optimal assignment, etc. (Patriksson, 2015).
In general, traffic assignment is modelled using Wardrop’s (1952) user
equilibrium principle. According to this principle, users choose routes so as to
minimise their own travel cost, and it is assumed that these route choice decisions at an
individual level create an “equilibrium” at the network level. The link flows are said
to be in equilibrium when no user can further improve his/her travel cost by unilaterally
shifting to any other route. This state is referred to as Wardrop’s (1952) user
equilibrium. The results of this assignment are the user-equilibrium link flows, and the
matrix that represents the proportions of OD flows passing through the selected links
is referred to as either a link-proportion matrix, or in general, an assignment matrix.
The general relationship between link flows and the OD matrix is shown in Equation
(3).
$Y = A(X)\,X$ (3)
Where $A(X)$ is the assignment matrix. Equation (3) shows that the assignment matrix is dependent on OD flows and is
generally obtained as an output of the simulation.
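The dependence of the assignment matrix on the OD flows can be seen in a small Python sketch of user equilibrium for a single OD pair served by two parallel routes. The linear cost functions and their coefficients are hypothetical and chosen only so that the equilibrium can be written in closed form.

```python
def ue_route_flows(demand, a1=10.0, b1=0.01, a2=15.0, b2=0.005):
    """User-equilibrium split of one OD demand over two parallel routes.

    Assumed (hypothetical) linear route costs: t1 = a1 + b1*f1, t2 = a2 + b2*f2.
    """
    # Equal-cost condition a1 + b1*f1 = a2 + b2*(demand - f1), solved for f1.
    f1 = (a2 - a1 + b2 * demand) / (b1 + b2)
    f1 = min(max(f1, 0.0), demand)  # if one route is always cheaper, it takes all flow
    return f1, demand - f1


for demand in (200.0, 1000.0, 2000.0):
    f1, f2 = ue_route_flows(demand)
    share_route1 = f1 / demand  # entry of the assignment (link-proportion) matrix
    print(demand, round(f1, 1), round(f2, 1), round(share_route1, 3))
```

The share of demand using route 1 falls as demand grows, so the proportion that would populate the assignment matrix is itself a function of the OD flow.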
1.1.3 The “structure” of OD matrix/trips
The word “structure” refers to “the arrangement of and relations
between the parts or elements of something complex” (Oxford, 2018). In general, the
word “structure” is used either with respect to material structure (either man-made or
natural) or an abstract structure. “Material structure” refers to the arrangement of
physical things. In transportation terms, the road network is an example of man-made
structure (Dandy, Daniell, Foley, & Warner, 2017). An “abstract structure” basically
includes the precise rules of behaviour, such as chords of music (Cooper, 1977) or in
transport terms – the travel behaviour of commuters during working weekdays
(Hensher, 1976).
In this study, the structure of an OD matrix is defined as “the arrangement of and
the correlations that exist between OD pairs within the OD matrix”. To avoid
ambiguity, in this research the following terms are defined:
a) Structure is the skeletal framework of the OD matrix, where the skeleton is
expressed as the preference/arrangement of the destinations from each origin. For
instance, refer to Figure 1.7, where the skeleton/structure of the OD matrix (shown at
the top) is illustrated in Figure 1.7a. Here, the columns for each row (origin) are arranged
in order of the destination preferences. The correlations, if they exist, between OD pairs
due to sharing similar activities, geographical zones, trip productions/attractions, etc.
are referred to as structural correlations (Antoniou et al., 2016).
b) The OD flows corresponding to the structure (skeleton) of the OD matrix are termed
the mass. The corresponding mass for the structure illustrated in Figure 1.7a is
presented in Figure 1.7b.
Figure 1.7: Demonstration of (a) the skeleton/structure of OD and (b) corresponding mass/OD flows

OD matrix
      D1   D2   D3   D4
O1     3    4    6   10
O2     7    4    5   11
O3    12    8    5    6
O4    13    7    9    6

(a) Skeleton/structure of OD matrix
      Dest. Choice-1   Dest. Choice-2   Dest. Choice-3   Dest. Choice-4
O1    D4               D3               D2               D1
O2    D4               D1               D3               D2
O3    D1               D2               D4               D3
O4    D1               D3               D2               D4

(b) Mass/OD flows on skeleton/structure
      Dest. Choice-1   Dest. Choice-2   Dest. Choice-3   Dest. Choice-4
O1    10               6                4                3
O2    11               7                5                4
O3    12               8                6                5
O4    13               9                7                6

Different methods are used to quantify the similarity between two OD matrices.
If the structure of the OD matrices is also considered in the similarity estimation, then
it is termed structural similarity. Two OD matrices have perfect structural
similarity if their structures are similar with zero differences in the OD flows. Perfect
structural similarity is possible only when the OD matrices are exactly the same.
One of the ways to capture the skeleton/structure of an OD matrix is through the
correlation coefficient. The relationship between the correlation coefficient and the
preference/arrangement of destinations can be explained with the example shown in
Figure 1.8, where the two OD matrices represent the distribution of trips during, for
example, a Sunday and Australia Day. The order of destination preferences is the same, that
is, A, B, and C, on both days, and the correlation coefficient between the two OD matrices
takes its highest value, which is one.
Figure 1.8: Example of OD matrix structural dimension

Sunday OD
      A     B     C
O1    200   100   50

Australia Day OD
      A     B     C
O1    160   80    40

Skeleton/structure (Sunday OD scaled down by 50 times; Australia Day OD scaled down by 40 times)
      A     B     C
O1    4     2     1
Since the correlation coefficient operates on normalised values, consider the
skeleton/structure of the normalised flows in both OD matrices. To achieve this, the
Sunday OD and Australia Day OD are normalised by scaling down by 50 times and 40
times, respectively. Figure 1.8 (bottom) shows that the “skeleton” of both
OD matrices is the same. In other words, although both OD matrices have different
sets of individual OD flows, they have the same skeleton or structure, which is reflected in
the same destination preferences and a correlation coefficient of one. Note that in this
example a uniform scale factor is assumed for all OD pairs for ease of demonstration
only.
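The skeleton and the correlation coefficient from the two figures can be reproduced with a short Python sketch. It uses only the flows printed in Figures 1.7 and 1.8; the helper names `skeleton` and `pearson` are introduced here for illustration, and the ranking is a simple sort by flow, which is one plausible reading of “destination preference”.

```python
from math import sqrt


def skeleton(flows, destinations):
    """Destinations of one origin ordered from most to least preferred."""
    ranked = sorted(zip(flows, destinations), key=lambda pair: -pair[0])
    return [dest for _, dest in ranked]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length flow vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


# OD matrix of Figure 1.7 (rows O1..O4, columns D1..D4).
od = [[3, 4, 6, 10], [7, 4, 5, 11], [12, 8, 5, 6], [13, 7, 9, 6]]
dests = ["D1", "D2", "D3", "D4"]
for origin, row in zip(["O1", "O2", "O3", "O4"], od):
    # Skeleton (ordered destinations) followed by the mass (ordered flows).
    print(origin, skeleton(row, dests), sorted(row, reverse=True))

# O1 rows of Figure 1.8.
sunday = [200, 100, 50]
australia_day = [160, 80, 40]
print(skeleton(sunday, "ABC"), skeleton(australia_day, "ABC"))
print(round(pearson(sunday, australia_day), 6))
```

Because one row is a uniform rescaling of the other, both the ordering and the correlation coefficient are unaffected by the difference in total demand.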
The significance of “structure” based information is that it yields classification
through patterns. For instance, in biology, the structure of organisms is analysed, and
the classification is based on the similarity of patterns defined by their structures
(Kroeber, 1943). Similarly, if there are a large number of observations from a specific
study region, there is an inherent structure attached to those observations. For example,
the structure of trips helps to classify travel patterns by analysing the demand
variations among different days or different times of a day. Because an OD matrix
defines the structure of travel patterns between different geographical locations, a
structural comparison of OD matrices also reflects structural comparison of travel
patterns.
Most transport planning models depend on the knowledge of travel patterns
expressed in terms of OD matrices for both short-term intelligent transport systems
applications, such as effective route guidance strategies, etc. (ATAP, 2016a), and long-
term strategic planning, such as transport network planning and service design. The
knowledge of travel patterns is also helpful for certain policy decisions, such as
shifting public holidays towards weekends. In Japan, as a strategic move to improve
the nation’s ailing economy, public holidays have been shifted to long weekends
(Chung, 2003).
Although the importance of OD matrix structural information is acknowledged
in the literature (see Section 2.4), the measures adopted to maintain structural
consistency during the OD estimation process are either based on traffic counts
measurements or travel surveys. It is possible that neither of these approaches may
capture the true structure of OD matrix because: a) the traffic counts on any link are
only point-based observations and cannot capture the distribution (structure) of trips
over a larger spatial context; and b) the constraints, such as trip productions/attractions
or ratio of OD flows or deviations from target OD matrix, are generally based on a
target OD matrix that is generally outdated. The most popular performance measures
are generally based on deviations of individual OD flows, and therefore cannot capture
the structure of the OD matrix.
1.1.4 Advanced traffic data sources
The availability of automated traffic counts has lessened the burdens of
cumbersome conventional methods of traffic modelling. Despite many issues and
challenges, the traffic counts-based approach has been widely adopted, mainly due to
the unavailability of alternative data sources at a larger spatial-temporal scale.
However, advancements in information and communication technologies,
such as Bluetooth, GPS, smart cards, e-tags, mobile phones, etc., have drawn
considerable interest to travel behaviour research based on these data sources rather than on conventional methods.
Knowledge about travel demand obtained from big data sources may not reveal a
detailed demographic and contextual picture about commuter trips, but could provide
high spatial and temporal resolution when compared to travel surveys (Toole et al.,
2015). In cities such as Brisbane, Bluetooth data sets are currently used for travel time
and speed analysis (Bhaskar & Chung, 2013). However, with a good penetration rate
and detection layout, Bluetooth observations can also be used to construct and estimate
the OD matrices for large scale networks (Barceló, Gilliéron, Linares, Serch, &
Montero, 2012; Carpenter, Fowler, & Adler, 2012; Michau, Pustelnik, et al., 2017).
Thus, the era of big data is creating new avenues for the development of alternative
methods. The following section provides some detailed insights about the Bluetooth
data from Brisbane, Australia.
1.1.4.1 Brisbane Bluetooth data
The Brisbane City Council (BCC) region is equipped with more than 1200
Bluetooth media access control scanners (BMS), the locations of which are shown in
Figure 1.9.
Figure 1.9: Location of Bluetooth Scanners within the BCC region
Most Bluetooth observations are taken from cars equipped with Bluetooth
devices (Bhaskar et al., 2015). In Brisbane City, the traffic signal box at an
intersection is generally connected to both magnetic loop detectors and a BMS.
Figure 1.10 provides a snapshot of a Bluetooth-equipped
vehicle approaching an intersection equipped with magnetic loop detectors and a BMS.
The BMS records the device MAC ID (of the Bluetooth-equipped car) and the time-stamp of
detection within a scanning range of roughly 100 metres (Bhaskar & Chung, 2013).
Figure 1.10: BMS and loop detectors at an intersection in Brisbane city
The format specifications of raw Bluetooth data for the Brisbane region
(Bluetooth data from Brisbane City Council, 2016) are shown in Table 1.1. Here,
Select ID is the record number, Device ID is the encrypted MAC-ID of the Bluetooth
device, Area ID is the ID of the scanner location, and Time-stamp gives the date
(7 March 2016 in Table 1.1) and the time at which the device was detected within the
communication range of the Bluetooth scanner. The last column, Duration, represents
the difference between the time of the first and last discovery of the Bluetooth device.
In this study, the Duration data were not required for constructing trajectories and OD
matrices. Table 1.1 shows that the device with Device ID 35 travelled from Area ID 10110
to Area ID 10277 between approximately 7:30 A.M. and 7:45 A.M.
Table 1.1: Sample Bluetooth data from Brisbane, Australia (Bluetooth data from Brisbane City Council, 2016)
Select ID   Device ID   Area ID   Time-stamp            Duration (s)
25055749    35          10110     2016:03:07 07:31:21   20
25055996    35          10224     2016:03:07 07:31:41   20
…           …           …         …                     …
…           …           …         …                     …
25113737    35          10277     2016:03:07 07:43:15   58
The raw Bluetooth data shown in Table 1.1 can be used to a) estimate travel
times, b) estimate turning proportions, c) retrieve trajectories, and d) estimate OD
flows between two locations.
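As a rough illustration of how such records turn into a trajectory and a travel time, the following Python sketch processes the three rows of Table 1.1 that are printed in full (the elided rows are left out rather than guessed). The function name and the tuple layout are assumptions made for the example, and treating the first and last scanner as the ends of the trip is a simplification, since Bluetooth trip ends need not be true trip ends.

```python
from datetime import datetime

# The three fully printed rows of Table 1.1, deliberately out of time order.
# Columns: Select ID, Device ID, Area ID, Time-stamp, Duration (s).
records = [
    (25055749, 35, 10110, "2016:03:07 07:31:21", 20),
    (25113737, 35, 10277, "2016:03:07 07:43:15", 58),
    (25055996, 35, 10224, "2016:03:07 07:31:41", 20),
]


def build_trajectories(rows):
    """Group detections by device and order them by detection time."""
    by_device = {}
    for _select_id, device_id, area_id, stamp, _duration in rows:
        when = datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
        by_device.setdefault(device_id, []).append((when, area_id))
    return {dev: sorted(hits) for dev, hits in by_device.items()}


trajectories = build_trajectories(records)
hits = trajectories[35]
scanner_sequence = [area for _, area in hits]
travel_time_s = (hits[-1][0] - hits[0][0]).total_seconds()
od_pair = (scanner_sequence[0], scanner_sequence[-1])  # first and last scanner seen
print(scanner_sequence, travel_time_s, od_pair)
```

Counting how often each scanner pair appears as `od_pair` across devices would give a Bluetooth-based OD flow between two scanner locations.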
Figure 1.11 demonstrates the consistency of the Bluetooth turning proportions
as compared to Sydney Coordinated Adaptive Traffic System (SCATS) traffic counts
data from an intersection in the Milton area, Brisbane (Chung, 2016).
Figure 1.11: Comparison of turning proportions: Bluetooth vs SCATS (Chung, 2016)
Figure 1.12 illustrates the consistency of Bluetooth inferred trajectories
represented as a sequence of BMSs; that is, 1107-1168-1064-1513 (represented by
green coloured pins) over a period of four regular working weekdays in the month of
July 2014.
Figure 1.12: Consistency of Bluetooth trajectories during regular weekdays
1.2 RESEARCH PROBLEM
There are several challenges and problems associated with OD matrix
estimation. Some of the most challenging research problems have been identified in
this study, and these are discussed in further detail in the following sections.
1.2.1 Problem of under-determinacy
Because the number of OD pairs is far greater than the number of equations
mapping the relationship between OD flows and link flows, the OD matrix estimate
has no unique solution (Antoniou, et al., 2016). In other words, when loaded
onto the network, many OD matrices can reproduce a similar set of link counts. For
instance, Figure 1.13 shows that the size of the OD matrix (i.e., 4 × 1) is greater than
the size of the link flows vector (i.e., 2 × 1). Due to this imbalance, more than one OD
matrix can produce the same set of link flows. Let the link flows observed
at detector-1 and detector-2 be y1 and y2, and let the flows between O1-D1, O1-D2, O2-D1 and
O2-D2 be x1, x2, x3 and x4, respectively. Then y1 is the result
of OD flows x1 and x2 (y1 = x1 + x2), and y2 is the result of x3 and x4 (y2 = x3 + x4).
Thus, multiple combinations of x1 and x2 can produce the same y1 flows, and the same
holds for x3, x4 and y2 at the other detector. This example clearly highlights the problem of under-determinacy in
traffic counts-based OD matrix estimations.
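The example can be checked numerically with a few lines of Python. The two OD vectors and the link counts are those shown in Figure 1.13; the function names and the grid of (a, b) values are introduced here only for illustration.

```python
def link_flows(x):
    """Link counts of the Figure 1.13 example: y1 = x1 + x2, y2 = x3 + x4."""
    x1, x2, x3, x4 = x
    return (x1 + x2, x3 + x4)


def feasible_od(a, b, y1=1000, y2=2000):
    """Any 0 <= a <= y1 and 0 <= b <= y2 gives an OD vector matching the counts."""
    return (a, y1 - a, b, y2 - b)


od_matrix_1 = (300, 700, 1200, 800)
od_matrix_2 = (600, 400, 500, 1500)
print(link_flows(od_matrix_1), link_flows(od_matrix_2))   # (1000, 2000) (1000, 2000)

# Two equations, four unknowns: every (a, b) pair on a grid is another solution.
solutions = [feasible_od(a, b) for a in range(0, 1001, 250) for b in range(0, 2001, 500)]
print(len(solutions), all(link_flows(x) == (1000, 2000) for x in solutions))  # 25 True
```

Link counts alone cannot distinguish between these candidates, which is why additional information about the trip distribution is needed.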
Figure 1.13: Demonstration of under-determinacy problem using (a) example network, (b) feasible solutions
(a) Example network: origins O1 and O2, destinations D1 and D2, Detector-1 (y1) and Detector-2 (y2)
(b) Feasible solutions
Observed link counts: y1 = 1000, y2 = 2000
OD matrix-1: x1 = 300, x2 = 700, x3 = 1200, x4 = 800
OD matrix-2: x1 = 600, x2 = 400, x3 = 500, x4 = 1500
…
OD matrix-i: x1 = a, x2 = 1000 - a, x3 = b, x4 = 2000 - b
Researchers previously introduced a target OD matrix within the objective
function (Cascetta & Nguyen, 1988; Yang, 1995) to minimise the problem of
under-determinacy and maintain structural consistency in the solution estimates. It was
assumed that an a priori (target) matrix contains important structural information; that
is, the patterns of trip distribution. Because the actual OD matrix is generally
unobserved, the structural consistency within the estimates can be preserved by
minimising the deviations between target and estimated matrices. However, by doing
so, the solution search space tends to be biased around target OD matrix, and this may
not improve the quality of the OD estimate because the target matrix is often
constructed from outdated surveys (Yang, 1995).
1.2.2 Mapping relationship between link flows and OD flows
The mapping relationship between link flows and OD flows (or assignment
model) often relies on the assumptions of route choices between OD pairs. The
“assignment model” is in itself a broad area of research, and realistic assignment of
the traffic on the network remains a challenging research problem (Balakrishna,
Ben-Akiva, & Koutsopoulos, 2007; Ben-Akiva, Gao, Wei, & Wen, 2012; Shafiei, Gu, &
Saberi, 2018; Toledo et al., 2003). The most common issues related to the assignment
matrix formulation in OD matrix estimation methods are:
• The assignment matrix and OD matrix are mutually dependent (see
Equation (3)). The first estimated assignment matrix is generally dependent on the
prior OD matrix, and both the OD and assignment matrices are updated in turn
until convergence. However, if the structure of the prior OD matrix (generally an
outdated matrix) or target OD matrix is poor, then the convexity condition might
not be satisfied and a perfect Stackelberg condition (see Section 2.2.2.3) might not
be obtained (Kim, Baek, & Lim, 2001).
• The link flows are a function of the OD flows. Their relationship (i.e., assignment)
is non-separable and generally obtained from simulation (Antoniou, et al., 2016).
As such, the bi-level method is also treated as a non-convex problem.
• The objective of most assignment-based methods is to minimise the deviations
between the observed link flows and the user-equilibrium flows. Because the
observed flows may not always represent a user-equilibrium state, minimising the
deviations between the two might not be justified (Yang, 1995).
1.2.3 Computation cost
A further challenge arises from the computational cost associated with the size
of the OD optimisation problem (Osorio, 2017). For a smaller sized network, such as
intersections and linear networks, an assignment matrix is not complex, as it does not
involve route choices. However, for the large-scale networks, an assignment matrix
(either analytical or simulation based) plays a crucial role in increasing the OD
dimensionality problem. The complexity further increases in the case of dynamic OD
matrix computation due to the additional temporal-dimension involved (Djukic, Van
Lint, & Hoogendoorn, 2012). For instance, a city like Brisbane has around 1,500
Traffic Analysis Zones (TAZs) that contribute to around two million OD pairs to be
estimated for static OD matrix. Estimating dynamic OD matrices for four consecutive
time periods implies that the number of OD variables is eight million, which demands
a high computational requirement. Note that for simulation-based optimisation
algorithms, dimensions of the order 200 are generally considered to be high-
dimensional (Wang, Wan, & Chang, 2016).
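The orders of magnitude quoted above follow from simple counting, as the short Python check below shows. It assumes exactly 1,500 zones and four time periods; the text rounds the resulting figures.

```python
zones = 1500
periods = 4

od_pairs_static = zones * zones               # every origin-destination combination
od_pairs_no_intrazonal = zones * (zones - 1)  # if intra-zonal pairs are excluded
od_variables_dynamic = od_pairs_static * periods

print(od_pairs_static)          # 2250000
print(od_pairs_no_intrazonal)   # 2248500
print(od_variables_dynamic)     # 9000000
```

Either way, the number of unknowns is several orders of magnitude above the roughly 200 dimensions already regarded as high-dimensional for simulation-based optimisation.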
The bi-level formulation is computationally expensive due to the complex
user-equilibrium assignment required for every iteration. Furthermore, the inseparable
non-linear relationship between the assignment matrix and OD matrix is a major hindrance
to solving the upper-level objective function, and updating the OD matrix is consequently
quite challenging. Some researchers have proposed linearizing the assignment as an
alternative solution. However, linearization requires two assignment solutions, which
means two simulations per iteration, which further adds to the computational cost
(Maher, Zhang, & Van Vliet, 2001).
1.2.4 Lack of potential performance measures
In the literature, less attention has been paid to developing statistical measures
for structural comparison of OD matrix estimates. Performance measures, such as
RMSE, etc., are widely used in practice because they are mathematically convenient
and simple to use. However, the major limitation of most traditional metrics is that
they compare individual cells of OD matrices and do not compute statistics on groups
of OD pairs that are correlated (Djukic, Hoogendoorn, & Van Lint, 2013). Section
1.1.3 emphasised the importance of the inherent structural information of an OD
matrix. See Figure 3.1 and Table 3.1, for an example of the values of traditional metrics
being the same for both OD matrices, although they are structurally different.
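The limitation can be illustrated with a small, hypothetical example in Python (the matrices below are invented for this sketch and are not those of Figure 3.1 and Table 3.1). Two estimates have the same RMSE against a reference matrix, yet only one of them preserves the destination preference of each origin.

```python
from math import sqrt


def rmse(estimate, reference):
    """Root mean square error over all OD cells."""
    diffs = [(e - r) ** 2 for e_row, r_row in zip(estimate, reference)
             for e, r in zip(e_row, r_row)]
    return sqrt(sum(diffs) / len(diffs))


def preferred_destination(matrix):
    """Column index of the largest flow in each row (origin)."""
    return [row.index(max(row)) for row in matrix]


reference = [[10, 8], [6, 4]]     # hypothetical "true" OD matrix
estimate_1 = [[12, 10], [8, 6]]   # every cell +2: destination ranking preserved
estimate_2 = [[8, 10], [4, 6]]    # cells -2 / +2: destination ranking reversed

print(rmse(estimate_1, reference), rmse(estimate_2, reference))   # 2.0 2.0
print(preferred_destination(reference))    # [0, 0]
print(preferred_destination(estimate_1))   # [0, 0]
print(preferred_destination(estimate_2))   # [1, 1]
```

A cell-by-cell metric treats the two estimates as equally good, although the second one reverses the trip distribution pattern of every origin.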
1.2.5 The need for typical OD matrices that represent typical travel patterns
Most traditional transport planning models focus on mode-specific, trip-
purpose-based, and time-of-the-day OD matrices, but are limited to weekday and
weekend patterns only (ATAP, 2016a). The Household Travel Survey (HTS) for South
East Queensland (SEQTS, 2010) was conducted for over 10 weeks from mid-April
through late-June and in July in 2009. However, the survey period avoided
school/university holidays, which means travel patterns during those periods were
unobserved. In addition, the seasonal variations in demand patterns are also
generally unknown because the survey is conducted only during a particular period of
the year. To estimate typical OD matrices, the typical travel patterns need to be
identified first. In this context, the following questions with respect to travel patterns
are intriguing:
• What are the major travel patterns observed other than weekdays and weekends?
• How do travel patterns during Saturdays differ from those of Sundays?
• Are travel patterns during public holidays different from those on weekends?
• Do school holidays during weekdays have different patterns from regular working weekdays?
• Are there any seasonal trends in travel patterns?
Estimating typical OD matrices that are representative of the aforementioned
travel patterns is not easy using current state-of-the-art techniques because: a)
traditional surveys are expensive, and as such, they are only conducted for weekdays
and weekends; and b) traffic counts data from loop detectors are point-based
measurements, and as such, they cannot provide the trip distribution information
necessary for travel patterns analysis.
Thus, there is a great need for advanced sources of data that can provide seamless
trip distribution related information to better understand the structural changes in travel
patterns over a network in both space and time.
1.2.6 Unknown penetration rates of trips inferred from advanced data sources
Lastly, emerging traffic data sources, such as Bluetooth, etc., capture only a
fraction of the actual demand, providing incomplete information about trips, and most
importantly, the penetration rate of Bluetooth trips is unknown due to the
unavailability of ground truth (Bhaskar & Chung, 2013). Bluetooth observations are
also random due to many factors, such as the socio-economic characteristics of zones,
distance between the scanners, speed of vehicles, etc. (Michau, Nantes, & Chung,
2013).
In the past, few efforts have been made to exploit Bluetooth data, and these are
limited to travel time observations only (Antoniou et al., 2014; Barceló, Montero,
Bullejos, Serch, & Carmona, 2013). From the validation perspective of Bluetooth OD
flows, previous studies have been limited only to intersections due to the availability
of ground truth from observed entry and exit counts (Carpenter, et al., 2012; Chitturi,
Shaw, Campbell IV, & Noyce, 2014). In terms of trajectories, although Michau et al.
(2016) developed a method to estimate OD flows, the method is not practical, because
the penetration rate of Bluetooth counts is considered a proxy for the penetration of
OD flows, which is not true in general. This is because the trajectories inferred from
Bluetooth do not represent complete trip sequences. In other words, the actual origins
and destinations of the trips cannot be observed from Bluetooth. Zhou and
Mahmassani (2006) proposed a method to avoid estimating penetration rates of
link-to-link split fractions. However, no technique has been proposed until now to incorporate
vehicle trajectory information (say, from Bluetooth) into the OD estimation
formulation without the need to estimate the unknown penetration rates.
1.3 RESEARCH MOTIVATION
Although past studies (Michau, et al., 2016) have foreseen the practical
applications, there has not yet been any direct implementation of Bluetooth flows into
OD matrix optimisation models. The transport departments in most metropolitan
cities, especially in Brisbane, Australia (Department of Transport and Main Roads
(TMR) and Brisbane City Council (BCC)), are working towards data-driven
approaches for traffic demand estimation and prediction (TMR, 2017). Both TMR and
BCC have supported transport-related research by sharing encrypted Bluetooth data
with the Queensland University of Technology under a license agreement for many
years. Brisbane is one of the few cities in the world collecting massive quantities of
big traffic data over a larger spatial and temporal context (TMR, 2017). The challenges
of OD matrix estimation coupled with the availability of emerging data sources, such
as Bluetooth, form the key motivation for exploring new perspectives on the OD
matrix estimation problem in this study. This research is focussed on estimating
vehicle trips through Bluetooth observations. Since most of the Bluetooth trips are
inferred from the Bluetooth equipped cars, the unit of travel demand can be considered
as car trips.
1.4 RESEARCH QUESTIONS, AIM, AND OBJECTIVES
Based on the research problems, this study aims to answer the following research
questions:
RQ1: How can the structural comparison of OD matrices be achieved?
RQ2: How can Bluetooth data be incorporated into the existing OD matrix
estimation process?
RQ3: How can Bluetooth data be used to address the challenges of bi-level
optimisation methods?
RQ4: How can Bluetooth data be used to infer typical travel patterns of large-
scale networks?
This research aims to develop statistical metrics for the structural comparison of
OD matrices; develop methodological approaches to improve the quality of OD matrix
estimates using big-traffic data (Bluetooth and loop-detector); and cluster Bluetooth
based OD matrices to identify typical travel patterns for large-scale networks.
Corresponding to the research questions above, the objectives are:
Objective-1: To develop statistical metrics for the structural comparison of
OD matrices. While traditional metrics account for deviation of individual
OD flows, the developed metrics should account for the structure of OD
matrix/trips distribution.
Objective-2: To develop methods for incorporating structural information of
Bluetooth trips into the bi-level OD matrix estimation.
Objective-3: To advance the OD estimation methodology by relaxing the
dependence on assignment matrix using big traffic data.
Objective-4: To devise a methodological approach for clustering Bluetooth
based OD (B-OD) matrices and identify typical travel patterns for large-scale
networks using a case-study application on the BCC region.
1.5 RESEARCH METHODOLOGY
Following a comprehensive literature review in Chapter 2, the methodology
was systematically defined using five tasks that address the objectives and research
questions, as shown in Figure 1.14. These tasks include:
Task-1: This task was used to develop new statistical metrics and addresses RQ1
and Objective-1.
Task-2: This task was used to develop assignment-based methods and addresses
RQ2 and Objective-2.
Task-3: This task was used to develop a non-assignment-based method and
addresses RQ3 and Objective-3.
Task-4: This task was used to develop a detailed methodological approach to
cluster B-OD matrices, and identify typical travel patterns for large scale
networks with a case study application on the BCC region. This addresses RQ4
and Objective-4.
Further insights into the individual tasks are discussed in the following sections.
Figure 1.14: Research methodology framework
1.5.1 Task-1
This task focussed on developing statistical metrics for the structural comparison
of OD matrices after discussing the limitations of existing metrics. The fundamental
concepts of the proposed metrics; that is, the Mean Geographical Window based
Structural Similarity Index (GSSI) and Mean Normalised Levenshtein distance for OD
matrices (NLOD) were borrowed from other disciplines and extended to exploit the
structural information of OD matrices. Finally, the robustness of the proposed
metrics was tested through sensitivity analysis.
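To give a feel for the kind of measure borrowed from other disciplines, the sketch below computes the plain Levenshtein (edit) distance between two destination-preference sequences in Python. It is only the textbook distance applied to hypothetical sequences; the normalisation and any weighting that define NLOD are given in Chapter 3 and are not reproduced here.

```python
def levenshtein(seq_a, seq_b):
    """Minimum number of insertions, deletions and substitutions turning seq_a into seq_b."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Hypothetical destination-preference sequences of one origin on three days.
day_1 = ["D4", "D3", "D2", "D1"]
day_2 = ["D4", "D2", "D3", "D1"]
day_3 = ["D1", "D2", "D3", "D4"]

print(levenshtein(day_1, day_1))  # 0
print(levenshtein(day_1, day_2))  # 2
print(levenshtein(day_1, day_3))  # 4
```

The distance grows as the ordering of destinations diverges, which is the property that makes an edit distance a candidate for comparing OD matrix structure.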
1.5.2 Task-2
This task aimed to develop assignment-based OD matrix estimation methods
using additional structural knowledge of Bluetooth trips. Here, the Bluetooth trips
were incorporated through structural comparison of estimated and Bluetooth OD/path
flows within the objective function formulation. Task-2 was further divided into two
methods - B-OD method and B-SP method.
• Bluetooth OD (B-OD) method: Here, the objective function formulation
included the structure of Bluetooth trips expressed in terms of Bluetooth OD
flows. It was further divided into two scenarios, ideal and near-ideal. Both
scenarios were based on B-OD flows built with the assumption that the
Bluetooth trip ends represented the true trip ends, so the method is suitable for
networks (in cities such as Brisbane) highly equipped with Bluetooth
scanners.
o The ideal scenario assumed that the structure of the Bluetooth based
OD matrix represented the exact structure of the true OD matrix with a
fixed (20%) penetration rate of Bluetooth OD flows.
o The near-ideal scenario assumed randomness in the structure of a
Bluetooth based OD matrix developed by randomly selecting 20% of
Bluetooth trajectories.
The B-OD method was tested for different percentages of Bluetooth connected
OD pairs in a controlled environment established in Aimsun.
• Bluetooth subpath (B-SP) method: Here, the objective function formulation
included the structure of Bluetooth trips expressed in terms of subpath
flows. The B-SP method was closer to reality because it included the actual
Bluetooth observations, which were only a sample, random and incomplete.
Since it is based on subpath flows, this method works even when Bluetooth
trip ends do not represent true trip ends. This method was tested for different
penetration rates of Bluetooth inferred trajectories in a controlled
environment established in Aimsun Next. This method can be applied over
networks equipped with a lower density of scanners.
1.5.3 Task-3
This task developed a non-assignment-based OD matrix estimation method.
Specifically, the task focussed on:
• A methodology to replace the traditional assignment-based mapping
relationship between link flows and OD flows with an observed turning
proportions-based relationship.
• Maintaining the structural consistency using additional knowledge of the
Bluetooth OD structure in the objective function (similar to the near-ideal
scenario in the B-OD method used in Task-2).
In this part of the research, the experiments were designed for different percentages
of Bluetooth connected OD pairs. The methodology was tested on a sample network
with sufficient route choice options and on a real network. To validate the results, the true
observations of OD flows, link flows and Bluetooth OD flows were obtained from
simulation in Aimsun Next (2019) and were compared with the results of both the
non-assignment-based and assignment-based methods.
1.5.4 Task-4
This task developed a methodological approach to identify typical travel patterns
by clustering multi-density B-OD matrices. The methodology to cluster B-OD
matrices specifically included:
Deploying proposed statistical metrics (explained in Chapter 3) as structural
proximity measures in a simple three-level approach. The methodology
identifies optimum DBSCAN parameters and clusters multi-density datasets
(OD matrices).
A practical demonstration of the proposed clustering methodology with a
case study application using real Brisbane Bluetooth data from 415 days.
1.6 SIGNIFICANCE AND SCOPE
This thesis contributes to the future of travel demand estimation by exploiting
the additional knowledge from emerging data sources, such as Bluetooth. The
significance of this research is two-fold:
1. Improve the current practice of OD demand estimation: State-of-the-art
techniques and practice depend greatly on observed traffic counts for traffic
demand estimation. However, accurate estimation of traffic counts close to
observed ones does not guarantee the correct estimation of traffic demand
due to under-determinacy. Thus, additional knowledge from big traffic data
should fill this gap.
2. Alternatives to demand modelling: Traffic modelling is a computationally
intensive and expensive process. There is growing interest in alternative
techniques that are mostly data-driven. With the availability of pervasive
data sets, a good blend of theoretical and empirical models should reduce
the dependence on developing complex mathematical models for traffic
simulation. Data-driven empirical models that use partially observed OD/path
flow information can relax the assumptions involved in an assignment
matrix that maps the relationship between unobserved OD flows and
observed traffic counts. This has huge computational benefits, especially in
the space of dynamic OD estimation.
While this study focusses on the major issues of state-of-the-art techniques and
practice, it has the following limitations:
1. This study is limited to static OD demand estimation only. However, it can
be extended to dynamic and quasi-dynamic models.
2. This study focusses only on improving the objective function formulation
and does not focus on new solution algorithms. The classical gradient descent
algorithm was used for testing the proposed approaches. However, the
proposed objective function can be readily used in state-of-the-art
algorithms, such as simultaneous perturbation stochastic approximation
(SPSA) (see Section 2.3).
3. The non-assignment-based methodology was based on the assumption that
turning proportions are available at all intersections. The formulation was
based on Bluetooth OD flows and not on Bluetooth subpath flows.
1.7 DEFINITIONS
The definitions of a few key terms used in this study are provided below:
Assignment matrix: is the mapping relationship between the OD matrix and link
flows. Thus, it is also referred to as the link-proportion matrix because it represents the
proportion of the OD flows passing through a link.
Bluetooth subpath: is the Bluetooth trajectory represented as the sequence of
BMSs that is a part of the actual trip sequence.
Geographical window: refers to the group of OD pairs within the geographical
boundaries of a higher-order OD pair.
Link flows/traffic counts: the flows observed on any link are referred to as link
flows.
Local window: the local window from an OD matrix refers to a sub-matrix that
consists of a group of OD pairs.
OD matrix: is a tableau representation of travel demand (in terms of trips)
between different origin and destination pairs. Each cell of the OD matrix represents
an OD pair and the value refers to OD pair demand or simply OD flows.
Path flows: refer to the portion of OD flows passing through any path.
Path proportion matrix: represents the proportions of OD flows passing through
a path.
Structure of the OD matrix: is defined as the arrangement of, and the correlations
that exist between, OD pairs within the OD matrix.
Structure of Bluetooth trips: refers to the inherent structural information present
within a group of Bluetooth inferred trips.
Structural correlation: is the correlation that exists between groups of OD pairs
or paths when they share similar activities, travel costs, zones of similar geography,
etc.
Target/Prior OD matrix: refers to the best historical estimate developed from an
outdated travel survey.
Turning proportion: refers to the ratio of the turning volumes to the approach
volumes at an intersection.
Typical OD matrix: refers to an OD matrix that represents a typical travel pattern
observed in the OD matrices belonging to a certain type (cluster).
1.8 THESIS OUTLINE
The chapters for the remainder of the thesis are outlined below:
Chapter 2: provides a comprehensive review of the studies pertaining to OD
matrix estimation, with special attention paid to the problem formulation, solution
algorithms, OD matrix structural information, statistical performance measures, and
the types of measurements widely used in OD estimation problems.
Chapter 3: discusses the development of statistical metrics for structural
comparison of OD matrices. The robustness of the proposed metrics is further tested
through sensitivity analysis.
Chapter 4: discusses the development of an assignment-based OD matrix
estimation methodology using additional structural knowledge of Bluetooth trips.
Chapter 5: discusses the development of non-assignment-based OD matrix
estimation methodology using observed turning proportions and Bluetooth structural
knowledge.
Chapter 6: proposes a methodological approach for clustering B-OD matrices
and identifying typical travel patterns with a case study application using Brisbane
Bluetooth data.
Chapter 7: provides the conclusion for this research, and also includes the future
scope and recommendations.
Chapter 2: Literature Review
This chapter provides a comprehensive review of the literature with respect to
the OD matrix estimation problem. First, the methods of OD matrix estimation are
broadly classified in Section 2.1. A comprehensive review of previous studies is then
provided from five different perspectives: problem formulations (Section 2.2), solution
algorithms (Section 2.3), OD matrix structural information (Section 2.4), statistical
performance measures (Section 2.5), and indirect/partial measurements of OD flows
(Section 2.6). Finally, a summary of the chapter is provided in Section 2.7.
2.1 BACKGROUND OF OD MATRIX ESTIMATION
Willumsen (1978) classified the methods of origin-destination matrix estimation
into three major categories: a) survey-based, b) trip-distribution model-based, and c)
traffic counts-based.
a) Survey-based: Here, the OD matrix is estimated through direct household
surveys and/or roadside interviews. Such surveys are generally expensive and
cumbersome, and it is almost impossible to ‘truly’ capture the entire demand,
especially for larger networks. Therefore, researchers apply different sampling
techniques, such as cluster sampling and geographic and demographic
stratification, to collect data (generally a travel diary) on a smaller scale
and use grossing-up factors to estimate a full OD matrix (Willumsen,
1978). The OD matrix from this method approximates OD patterns for a
particular period and, as such, is often outdated for current practical
applications.
b) Trip distribution model-based: Trip distribution models estimate the trip
interchanges between zones based on land use, transportation characteristics,
and the distance between the zones. Models used in the trip distribution stage
are generally gravity models based on the gravitational theory of Newtonian
physics (Martin & McGuckin, 1998; Wilson, 1967). The generic form of the
gravity model is shown in Equation (4).
$$X_w = P_o \, \frac{A_d \, F_{od} \, K_{od}}{\sum_{d'=1}^{D} A_{d'} \, F_{od'} \, K_{od'}} \qquad (4)$$
Where, $X_w$ is the OD demand of the wth OD pair (oth origin and dth destination);
$P_o$ are the trips produced from the oth zone; $A_d$ are the trips attracted to the dth zone;
$F_{od}$ is the friction factor (existing/estimated travel times or distances) indicating the
temporal/spatial separation between the two zones; $K_{od}$ is the trip-distribution
adjustment factor for the interchanges between the two zones; and $D$ refers to the
number of destinations. The drawback of these models is that
model calibration is expensive and they are generally not transferable over
space and time.
c) Traffic counts-based: The third method, which became quite attractive in the early
1980s, is to estimate OD matrices from observed traffic counts. Traffic count
data, previously limited to traffic control, accident studies,
maintenance planning, or road construction and intersection improvements,
were later extended to OD demand estimation by researchers such as Robillard
(1975). Traffic flows are generally used in two ways to estimate the OD matrix:
(i) to calibrate the parameters of demand models; and (ii) in an optimisation
formulation, where the OD matrix estimation is solved as an inverse of the
assignment problem (Cascetta, 1984).
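As an illustration of the gravity model described in item b), the following minimal Python sketch distributes the trips produced in each origin zone across destinations in proportion to attraction, friction factor, and adjustment factor. It assumes a production-constrained form; the function name, the zone totals, and the friction values are hypothetical and are not taken from the cited studies.

```python
def gravity_model(productions, attractions, friction, k_factors=None):
    """Production-constrained gravity model (illustrative sketch).

    productions[o]  : trips produced in origin zone o
    attractions[d]  : trips attracted to destination zone d
    friction[o][d]  : friction factor between zones o and d
    k_factors[o][d] : optional adjustment factors (default 1.0)
    Returns trips[o][d], whose rows sum to productions[o].
    """
    n_o, n_d = len(productions), len(attractions)
    if k_factors is None:
        k_factors = [[1.0] * n_d for _ in range(n_o)]
    trips = []
    for o in range(n_o):
        # Relative attractiveness of each destination as seen from origin o.
        weights = [attractions[d] * friction[o][d] * k_factors[o][d]
                   for d in range(n_d)]
        total = sum(weights)
        trips.append([productions[o] * w / total for w in weights])
    return trips


# Hypothetical two-zone example: intra-zonal travel has lower impedance.
productions = [100.0, 200.0]
attractions = [150.0, 150.0]
friction = [[1.0, 0.5], [0.5, 1.0]]
od = gravity_model(productions, attractions, friction)
```

Each row of `od` sums to the corresponding production total, and destinations with a higher friction factor receive a larger share of the trips.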
Among the above-mentioned methods, the traffic counts-based method has
gained more importance, as it is cost effective and can provide seamless traffic data
that facilitates tracking of traffic evolution in a convenient and efficient way (Cascetta,
1984). Over the last three decades, almost all studies have been based on these indirect
measurements, although the quality of an OD matrix estimated based on this approach
is still being questioned (Stathopoulos & Tsekeris, 2003). Several researchers have
addressed the problem of OD estimation from different perspectives, such as temporal
dimension – static (Yang, 1995) to time-dependent/dynamic (Ashok, 1996), spatial
dimension – intersection (Cremer & Keller, 1981) to large scale networks (Osorio,
2017), traffic assignment – uncongested (Hazelton, 2000) to congested networks
(Frederix, Viti, & Tampère, 2011), optimisation algorithms – deterministic (Cascetta,
1984) to stochastic (Ma & Qian, 2018), and performance measures – simple cell-based
(Cascetta, 1984) to structure-based (Djukic, et al., 2013). Nevertheless, most
methods are still dependent on traffic counts-based formulations only.
2.2 PROBLEM FORMULATION
The OD matrix adjustment became a relevant research and practical problem
after the research community started to treat it as an optimisation problem in the early
1980s (Cascetta, 1984; Fisk & Boyce, 1983). The mathematical foundations were
initially laid by considering it a static problem and were then extended to dynamic and
quasi-dynamic conditions, as discussed in detail in the following sub-sections.
2.2.1 Static OD formulation - uncongested networks
Early research contributions were limited to uncongested networks
under proportional assignment, which assumes that link cost
(travel time) does not depend on link flows and is thus independent of demand (Bell,
1983; Robillard, 1975). The following sub-sections present the different mathematical
models proposed for static OD demand estimation in uncongested traffic conditions.
2.2.1.1 Information Minimisation/Entropy Maximisation
In these methods, the OD matrix is estimated by minimising the measure of
distance (or maximising the entropy) from the target/historical trip matrix adhering to
the constraint that the observed flows are reproduced back on some of the links after
assigning the estimated OD demand onto the network (Van Zuylen, 1978; Van Zuylen
& Willumsen, 1980; Willumsen, 1984b).
(5)
Equation (5) minimises the information matrix, IM and the corresponding
solution is shown in Equation (5a):
(5a)
Where, is the proportion of passing through lth link and is the
corresponding weight factor.
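Entropy-type estimators of this kind typically yield a solution in which each prior OD flow is scaled by one multiplicative factor per counted link. A minimal Python sketch of such a balancing procedure is given below. It is an illustration under simplifying assumptions, not the formulation of Equation (5a): each OD pair is assumed to use a counted link either fully or not at all, the counts are assumed to be mutually consistent, and the function name and the data are hypothetical.

```python
def balance_to_counts(prior, link_use, counts, sweeps=100):
    """Scale a prior OD vector so that assigned flows reproduce link counts.

    prior[w]    : prior (target) flow of OD pair w
    link_use[l] : list of OD pair indices whose trips traverse link l
    counts[l]   : observed flow on link l
    Each sweep visits every counted link and rescales the OD pairs that
    use it by the ratio of observed to assigned flow.
    """
    x = list(prior)
    for _ in range(sweeps):
        for l, users in enumerate(link_use):
            assigned = sum(x[w] for w in users)
            if assigned > 0:
                factor = counts[l] / assigned
                for w in users:
                    x[w] *= factor
    return x


# Hypothetical example: three OD pairs, two counted links.
prior = [10.0, 10.0, 10.0]
link_use = [[0, 1], [1, 2]]   # link 0 carries OD pairs 0 and 1, link 1 carries 1 and 2
counts = [30.0, 24.0]
estimate = balance_to_counts(prior, link_use, counts)
flows = [sum(estimate[w] for w in users) for users in link_use]
```

Because the update is purely multiplicative, OD pairs with a zero prior flow stay at zero, which is one reason the quality of the target matrix matters for this family of methods.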
2.2.1.2 Maximum Likelihood approach
OD demands and traffic observations are random in nature and are
subject to sampling and measurement errors, respectively. Therefore, statistical
inference for OD matrix estimation has become relevant, with methods such as the
maximum likelihood (ML) approach being developed (Spiess, 1987).
In Spiess's (1987) ML approach, the likelihood of observing the “target” OD
matrix and the observed traffic counts is maximised conditional on the
estimated OD matrix. The elements of the target OD matrix are obtained from a simple
random sample and are assumed to follow a multinomial distribution. For larger
samples, the distribution can be Poisson. The traffic observations are also assumed to
be Poisson distributed. The distributions of both the target OD matrix and traffic counts
are assumed to be independent; thus, the likelihood of observing both sets is the same
as the product of the two likelihoods. The likelihood of observing the target OD
demand, and observed traffic flows, is expressed in Equation (6):
(6)
The solution of Equation (6) is the X that maximises the product of the likelihoods.
Since maximising the logarithm of Equation (6) is the same as maximising the likelihoods,
it is convenient to use a logarithmic form for estimating the solution. Assuming that
the sample OD flows follow a MNL distribution, and is the sampling
fraction for trips. For example, is obtained by observing an independent Poisson
process with a mean of , the probability of observing is
; (6a)
(6b)
The logarithm of the likelihood/probability can then be expressed by
Equation 6c,
(6c)
Similarly, if the sample of observed traffic counts is very small, then Poisson
distribution can be used to express as shown in Equation 6d,
(6d)
In Equation 6d, is the flow volume in lth link resulting after assigning the
OD flows, .
Assuming a proportional assignment where, is the proportion of Xw flowing
in lth link, the OD demand estimation problem is formulated as shown in Equations 6e,
6f, and 6g.
(6e)
(6f)
(6g)
The value of X is estimated by maximising Equation (6e) with respect to X.
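To make the ML idea concrete, the sketch below evaluates a Poisson log-likelihood for a candidate OD vector under proportional assignment. It is a minimal illustration, assuming that the sampled OD flows are Poisson with mean equal to a sampling fraction times the true OD flow and that link counts are Poisson with mean equal to the assigned flow. Constant terms that do not depend on X are dropped, and all names and numbers are hypothetical.

```python
import math


def poisson_log_likelihood(x, target, rho, counts, proportions):
    """Log-likelihood (up to a constant) of a candidate OD vector x.

    target[w]         : sampled (target) flow of OD pair w
    rho[w]            : sampling fraction of OD pair w
    counts[l]         : observed traffic count on link l
    proportions[l][w] : share of OD pair w passing through link l
    """
    ll = 0.0
    # Contribution of the sampled OD flows.
    for w, x_w in enumerate(x):
        mean = rho[w] * x_w
        ll += target[w] * math.log(mean) - mean
    # Contribution of the observed link counts.
    for l, row in enumerate(proportions):
        flow = sum(p * x_w for p, x_w in zip(row, x))
        ll += counts[l] * math.log(flow) - flow
    return ll


# Hypothetical data generated without noise from x_true.
x_true = [100.0, 50.0]
rho = [0.1, 0.1]
target = [10.0, 5.0]
proportions = [[1.0, 0.0], [0.5, 1.0]]
counts = [100.0, 100.0]

ll_true = poisson_log_likelihood(x_true, target, rho, counts, proportions)
ll_other = poisson_log_likelihood([120.0, 40.0], target, rho, counts, proportions)
```

With noise-free data the generating vector scores higher than the perturbed candidate, because each Poisson term is maximised when its mean equals the observation.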
2.2.1.3 “Generalised Least Squares” (GLS) approach
This approach is similar to the ML approach, but assumes that the target OD
matrix and traffic volumes are generated by some sort of probability distribution
functions. By estimating the parameters of these distributions, the OD matrix is finally
estimated. The attractive feature of the GLS approach is that it allows the combination of
survey data and observed traffic counts to estimate an OD demand matrix, in addition
to accounting for the relative accuracies of both data sources. If the traffic counts and
target OD matrix are assumed to follow multivariate normal distributions, then the
GLS estimator coincides with a maximum likelihood estimator. Since the OD matrix
and observed flows are probabilistic in nature, the direct estimate (target OD) of OD
matrix and of flow vector Y can be expressed in matrix notations, as shown in
Equations (7) and 7a:
(7)
(7a)
In Equation (7), the mean values of , and are assumed to be zero, with no
distributional assumptions. It is generally assumed that is same as . The
dispersion matrices of and are V and W, respectively. In Equation (7a), is
the function of and where is the proportional assignment matrix, the cells of
which express the proportions of the OD demand flowing in each of the network
links. The proportional assignment method simplifies the estimation process by
assuming that the is independent of . Thus, the other advantage of a statistical
approach is that it can accommodate sampling variance and also the temporal
fluctuation of the OD demand if it is significant (Cascetta, 1984). According to
Cascetta (1984), even “heavier” approximations of dispersion matrices can produce
results better than those of a maximum entropy estimator.
The vectors and are mutually independent, and inverse of the
corresponding dispersion matrices; that is, V and W, are used as weight factors
and , respectively, as expressed in Equation (2).
The estimate, X is obtained by solving Equation (8).
(8)
(8a)
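The GLS idea can be sketched numerically as follows. The example minimises a weighted sum of squared deviations from the target OD vector and from the observed counts, using the inverse variances as weights, with plain gradient descent. It is a minimal sketch that assumes diagonal dispersion matrices and a fixed proportional assignment matrix; the function names, step size, and data are hypothetical, and a closed-form solution could be used instead for this linear case.

```python
def gls_objective(x, target, var_target, counts, var_counts, proportions):
    """Weighted squared deviation from the target OD vector and the counts."""
    z = sum((t - x_w) ** 2 / v for t, x_w, v in zip(target, x, var_target))
    for row, y_obs, var in zip(proportions, counts, var_counts):
        flow = sum(p * x_w for p, x_w in zip(row, x))
        z += (y_obs - flow) ** 2 / var
    return z


def gls_gradient(x, target, var_target, counts, var_counts, proportions):
    """Gradient of gls_objective with respect to x."""
    grad = [-2.0 * (t - x_w) / v for t, x_w, v in zip(target, x, var_target)]
    for row, y_obs, var in zip(proportions, counts, var_counts):
        flow = sum(p * x_w for p, x_w in zip(row, x))
        for w, p in enumerate(row):
            grad[w] -= 2.0 * p * (y_obs - flow) / var
    return grad


def gls_estimate(target, var_target, counts, var_counts, proportions,
                 step=0.1, iterations=2000):
    """Gradient descent from the target matrix, keeping flows non-negative."""
    x = list(target)
    for _ in range(iterations):
        grad = gls_gradient(x, target, var_target, counts, var_counts, proportions)
        x = [max(0.0, x_w - step * g) for x_w, g in zip(x, grad)]
    return x


# Hypothetical data: an unreliable target (large variance), reliable counts.
target = [80.0, 70.0]
var_target = [400.0, 400.0]
counts = [100.0, 100.0]          # consistent with a true OD vector of [100, 50]
var_counts = [1.0, 1.0]
proportions = [[1.0, 0.0], [0.5, 1.0]]
estimate = gls_estimate(target, var_target, counts, var_counts, proportions)
```

Because the counts are weighted far more heavily than the target in this example, the estimate is pulled close to the OD vector that generated the counts; increasing the confidence in the target (smaller `var_target`) shifts the estimate back towards it.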
2.2.1.4 Bayesian approach
Maher (1983) proposed a Bayesian approach to estimate the posterior probability
of X from the prior probability distribution, ( ) and observed traffic flows,
. In this approach, the target OD matrix is considered a prior probability
function of the estimated OD matrix (note that no distribution is assumed for prior OD
flows in MLE). If the prior OD matrix is completely reliable, then however
remarkable the random observations of traffic counts are, they will have no effect
on the estimated OD matrix. Observed flows will have some impact only when there
is little confidence in the prior OD matrix. Bayesian techniques can be used when there
are varying degrees of belief for different OD pairs. For example, recent transport
survey data can establish more confidence in the prior OD flows of a few OD pairs. At an
intersection level, turning flows for a few approaches may be known more accurately;
for example, if certain turning movements are banned or impossible.
Although the statistical properties are similar, the role played by differentiates
Bayesian from ML and GLS techniques (Maher, 1983). In ML and GLS approaches,
is the parameter of the likelihood functions, and in Bayesian technique,
is the random variable with given prior distributions, ( ). The posterior
probability of observing is conditional on the observed traffic counts. The
expressions for posterior probability and optimisation function to estimate from the
feasible solution set ℛ are shown in Equations (9) and (10), respectively.
(9)
(10)
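The role of prior confidence described above can be illustrated with a small linear-Gaussian update. The sketch below assumes a normal prior with independent OD pairs, a single observed link count with normally distributed error, and proportional assignment. Under these assumptions the posterior mean has a simple closed form; this is a generic textbook update and may differ in detail from Equations (9) and (10). All names and values are hypothetical.

```python
def bayes_update(prior_mean, prior_var, proportions, count, count_var):
    """Posterior mean of OD flows after observing one link count.

    prior_mean[w], prior_var[w] : prior mean and variance of OD pair w
    proportions[w]              : share of OD pair w using the counted link
    count, count_var            : observed count and its error variance
    """
    predicted = sum(p * m for p, m in zip(proportions, prior_mean))
    innovation = count - predicted
    innovation_var = sum(p * p * v for p, v in zip(proportions, prior_var)) + count_var
    # Each OD pair absorbs the discrepancy in proportion to its prior variance.
    return [m + v * p * innovation / innovation_var
            for m, v, p in zip(prior_mean, prior_var, proportions)]


prior_mean = [80.0, 70.0]
proportions = [0.5, 1.0]
count = 100.0

# Fully reliable prior (zero variance): the count has no effect.
reliable = bayes_update(prior_mean, [0.0, 0.0], proportions, count, 25.0)

# Very vague prior: the posterior reproduces the count almost exactly.
vague = bayes_update(prior_mean, [1e6, 1e6], proportions, count, 25.0)
vague_flow = sum(p * x for p, x in zip(proportions, vague))
```

Assigning different prior variances to different OD pairs expresses the varying degrees of belief mentioned in the text: pairs with small variance barely move, while uncertain pairs absorb most of the correction.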
2.2.2 Static OD formulation - congested networks
Although the above-mentioned models laid strong mathematical foundations for
the OD matrix estimation problem, they also suffer from a few limitations. Firstly,
traditional entropy maximisation models never consider the probabilistic nature of OD
flows and link flows. Normality also does not hold well for larger flow values. Thus,
a Bayesian approach might fail if normal distribution is considered for both OD flows
and link flows. Normal distribution is widely considered because it is compatible with
proportional assignment. Poisson distribution can replace normal distribution, but at
the cost of computational effort.
The second and major limitation is that these models were developed assuming
uncongested network conditions. The strong assumption that the route choice
proportions are determined independently, outside the OD demand estimation process,
is unrealistic for congested networks, so there is a clear need to account for congestion in OD
flow estimation. The following sections discuss some of the seminal contributions on
congested networks.
2.2.2.1 Nguyen (1976)’s approach
One way to address the issues related to proportional assignment is to assume an
equilibrium assignment in the modelling framework. The first equilibrium-based
approach was proposed by Nguyen in the late 1970s. According to this method, the
anticipated solution matrix is one where the assigned network can reproduce travel
times close to the ones that correspond to the observed link flows. The relationship
between link flows and travel times is given by a link cost function, C(Y), which
is assumed to be a known monotone increasing function (Nguyen, 1976, 1977).
The drawback of this approach is that it is not strictly convex w.r.t. X, and so it does not
guarantee a unique solution. The generic form of Nguyen's (1976) approach is shown
in Equation (11). The set of Equations (11a)-(11c) details the formulation of Nguyen's
(1976) equilibrium model.
(11)
(11a)
; w ϵ W (11b)
; l ϵ L (11c)
(11d)
Where, , , , , are link flows on link l , travel cost on link l,
average cost of travel for wth OD pair, set of paths for wth OD pair, path flow on path
k, and Kronecker Delta function for link l in path k, respectively.
At equilibrium, the path travel cost is given by Equation (11e), where is
the equilibrium link flow on link l.
(11e)
Equation (11e) is convex in the link flows, Y, but might not be strictly convex w.r.t.
X. This implies that it might not lead to a unique solution for X. To address this, some
researchers have proposed the generic form shown in Equation (12) to reduce the
solution search space.
(12)
For example, Gur (1980a) (see Equation (13)) and Jornsten and Nguyen (1979)
(see Equation (14)) proposed to solve for the X closest to the target matrix from the set
of optimal solutions, ℛ.
(13)
(14)
2.2.2.2 Combined Distribution and Assignment
Instead of solving this as two separate equations (i.e., Equations (11) and (12)),
Fisk and Boyce (1983) proposed a combined formulation known as combined
distribution and assignment (CDA). While it is promising, it has the following
drawbacks: first, it assumes that there are no inconsistencies in the observations of
traffic counts; and second, it requires all links in the network to contribute to the
observed link flows. The generic form is shown in Equation (15) and the detailed
formulation for CDA is shown in Equation (15a).
(15)
(15a)
Where, is the cost on the shortest route for wth OD pair, and are the
weight factors for the two objectives. The rest of the terms have meanings as defined
previously.
2.2.2.3 Basic bi-level formulation
The previous approaches solved for X either using two separate formulations or
through a combined formulation. In the bi-level formulation, by contrast, the equilibrium
assignment and the OD matrix are solved as two mutually dependent sub-problems.
This approach is similar to a Stackelberg game in game theory, where
the leader moves first and estimates X, in accordance with its constraints, to
minimise its objective function while considering the reaction of the follower; that
is, the user-equilibrium assignment (Yang, Sasaki, Iida, & Asakura, 1992).
Bi-level means that the OD matrix estimation is not a straight-forward
optimisation in a single formulation. A bi-level approach can be used as an efficient
approach to estimate the OD matrix and route choice simultaneously under congested
traffic conditions (Yang, et al., 1992).
The bi-level formulation has three advantages. First, the model always yields
a feasible solution, even if the traffic counts are inconsistent, although there is
no guarantee that the solution is close to the ground observations. Second, the model only
requires a subset of link flows; that is, not all links in the network need to
contribute observed link flows. Third, the route choice proportions are determined
endogenously, and the equilibrium link flows and OD matrix are determined
simultaneously (Yang, 1995).
The generalised framework of the bi-level problem is shown in Equation (16).
Upper level:
(16)
Lower level: (16a)
Most studies assume a generalised least squares or entropy maximising model in
the upper-level, and equilibrium assignment as a lower-level problem.
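The interplay between the two levels can be sketched with a toy example. The code below alternates between a lower level that computes a user-equilibrium split of one OD pair over two parallel routes and an upper level that updates the demand by minimising a weighted least-squares objective with the route share held fixed. This alternating scheme is only one simple heuristic, shown under strong assumptions (a single OD pair, a count on route 1 only, BPR-type cost functions); it is not the algorithm of any study cited here, and all names and parameter values are hypothetical.

```python
def bpr(t0, flow, cap, alpha=0.15, beta=4.0):
    """BPR-type link cost: free-flow time inflated by the volume/capacity ratio."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)


# (free-flow time, capacity) of the two parallel routes; hypothetical values.
ROUTES = [(10.0, 600.0), (15.0, 600.0)]


def route1_flow_at_equilibrium(demand):
    """Lower level: flow on route 1 when both used routes have equal cost."""
    (t0_a, cap_a), (t0_b, cap_b) = ROUTES
    if bpr(t0_a, demand, cap_a) <= bpr(t0_b, 0.0, cap_b):
        return demand          # route 1 is cheaper even when carrying everything
    if bpr(t0_b, demand, cap_b) <= bpr(t0_a, 0.0, cap_a):
        return 0.0             # route 2 is cheaper even when carrying everything
    lo, hi = 0.0, demand
    for _ in range(60):        # bisection on the cost difference
        mid = 0.5 * (lo + hi)
        if bpr(t0_a, mid, cap_a) < bpr(t0_b, demand - mid, cap_b):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_demand(target, count_route1, w_target=0.01, w_count=1.0, iterations=50):
    """Alternate between the lower level (assignment) and upper level (demand update)."""
    demand = target
    for _ in range(iterations):
        share = route1_flow_at_equilibrium(demand) / demand
        # Minimiser of w_target*(target - X)^2 + w_count*(count - share*X)^2 over X.
        demand = (w_target * target + w_count * share * count_route1) / (
            w_target + w_count * share ** 2)
    return demand


true_demand = 1000.0
observed = route1_flow_at_equilibrium(true_demand)     # synthetic count on route 1
estimated = estimate_demand(target=600.0, count_route1=observed)
reproduced = route1_flow_at_equilibrium(estimated)
```

In this toy setting the estimated demand reproduces the observed count far better than the target does, yet it can remain well below the demand that generated the count, which is a small-scale illustration of why counts alone may not pin down the OD matrix.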
In comparison, Nguyen (1976)’s and Fisk and Boyce (1983)’s CDA approaches
neglect the second term (i.e., is zero) of the objective function shown in Equation
(16). In contrast, Spiess (1990) considered to be zero, and assumed equilibrium link
flows.
Despite its advantages, there are a few limitations to the bi-level approach. Bi-
level programming problems are generally difficult to solve because the objective
function in the upper-level can be evaluated only after solving the optimisation
problem in the lower level. This framework is non-convex and non-differentiable, and
as such, may not lead to a global optimum solution. Because the second term is convex,
Spiess (1990) relaxed the first term by assuming that the target matrix is almost
accurate and traffic counts can be used to arrive at an OD matrix estimate as close as
possible to the target matrix. Although it is suitable for large networks, the
methodology limits the solution to a local optimum only.
To overcome the limitations of bi-level formulation, Yang (1995) incorporated
a network equilibrium model in terms of variational inequalities, and claimed that, “the
convexity of the upper level formulation is so strong that it is most likely to converge
at global optimum”. Yang (1995) proposed a GLS formulation for the upper level, as
shown in Equation (17). Note that in GLS, and are and , respectively.
(17)
; s.t. (17a)
Lower level: (17b)
subject to ; w ϵ W (17c)
(17d)
Where, is the user-optimal link flows vector; is the set of feasible link
flows solutions for OD matrix, X; and C(Y) is the vector of network link travel costs.
The meanings of the other terms were provided previously.
2.2.2.4 Stochastic bi-level formulation
Although the equilibrium assignment approach might capture congested traffic
conditions, it still lacks the ability to estimate an OD matrix X that can reproduce the
observed flows due to errors and inconsistencies of the observed link flows. Fisk
(1989) mentioned that no OD matrix X assigned to the network can satisfy the
observed link flows, because most of the models assume that traffic counts are
available from all links, and that inconsistencies are removed by certain pre-processing
techniques. To address this, some researchers have proposed a stochastic approach in
the bi-level formulations. Jörnsten and Wallace (1993) considered traffic flows to be
random variables. Because user equilibrium-based models assume that the
perception of travel costs does not vary among travellers, it is more realistic to
consider randomness through stochastic user equilibrium. For instance, between-driver
variability, expressed through a dispersion parameter that describes road users’
perception of travel costs (larger values of the parameter mean little between-driver
variation in perceived costs), can be considered within the logit models for stochastic
loading (Maher, 1998; Maher, et al., 2001). The upper-level formulation is similar to
Equation (17) and the lower-level stochastic formulation is shown in Equation (18).
Lower level:
(18)
Where, is the satisfaction function arising from stochastic loading
based on link flow . It is calculated as shown in Equation (18a), where is the path
cost through the kth path of the wth OD pair (Maher, 1998).
= (18a)
Some researchers proposed using even in the upper level formulation (Lo &
Chan, 2003; Wang et al., 2016), as shown in Equation (19). The lower level formulation
is similar to Equation (18a).
(19)
In Equation (19), is the dispersion matrix of , and the other terms have their
usual meanings.
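The effect of the dispersion parameter in logit-based stochastic loading can be sketched as follows. The example computes logit path shares and the corresponding log-sum "satisfaction" (the expected minimum perceived cost under the logit model) for one OD pair. It assumes fixed path costs and a multinomial logit choice model; the symbol `theta` for the dispersion parameter, the function names, and the cost values are assumptions made for illustration.

```python
import math


def logit_path_shares(path_costs, theta):
    """Share of an OD pair's demand loaded onto each path under a logit model."""
    min_cost = min(path_costs)   # subtract the minimum for numerical stability
    weights = [math.exp(-theta * (c - min_cost)) for c in path_costs]
    total = sum(weights)
    return [w / total for w in weights]


def satisfaction(path_costs, theta):
    """Log-sum expected minimum perceived cost: -(1/theta) * ln(sum(exp(-theta*c)))."""
    min_cost = min(path_costs)
    log_sum = math.log(sum(math.exp(-theta * (c - min_cost)) for c in path_costs))
    return min_cost - log_sum / theta


costs = [10.0, 12.0]                            # hypothetical path costs
shares_low = logit_path_shares(costs, 0.01)     # strong between-driver variability
shares_high = logit_path_shares(costs, 5.0)     # little between-driver variability
s_high = satisfaction(costs, 5.0)
```

With a small `theta` the demand is spread almost evenly over the two paths, whereas with a large `theta` nearly all demand takes the cheaper path and the satisfaction approaches the minimum path cost, which is the deterministic user-equilibrium limit.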
2.2.2.5 StrUE bi-level formulation
Dixit, Gardner, and Waller (2013) introduced stochasticity into the user-
equilibrium through the concept of strategic equilibrium (StrUE), and stated that the
path travelled by each user, in a given demand scenario, is chosen regardless of the
realized travel demand on a given day. Because user-equilibrium link flows are
dependent on demand and its distribution, insensitivity to demand realisation implies
that the actual link flow observations may not be from the user-equilibrium state. In
other words, it can be considered that the user-equilibrium exists stochastically across
all demand realisations.
In the StrUE-based bi-level framework, the upper level provides the total mean
demand and its variance to the lower level (StrUE model), which in turn provides the
mean and variance of link flows to the upper level. Incorporating higher order
variables, that is, mean and variance, allows the optimisation model to capture
the daily variations in the link flows (Wen, Cai, Gardner, Dixit, & Waller, 2014). The
upper level formulation tries to minimise the deviation between: a) the mean of the
observed link flows and the estimated mean of the link flows, and b) the standard
deviation of observed link flows and estimated standard deviation of the link flows, as
described in Equation (20) (Wen, et al., 2014).
Upper level:
(20)
Lower level:
(20a)
s.t. (20b)
Where, and are the expected value and standard deviation of total demand, T;
and are the expected value and standard deviation of ; that is, observed link
flows on link l; g(T) is the log-normal density function of total demand, T; pl is the
proportion of T on link, l; cl is the travel cost for link, l; is the link cost for link,
l at free flow condition; and are Bureau of Public Roads (BPR) parameters
(Manual, 1964). Note that in Equation (20), total demand T is the variable and not OD
demand, which is later estimated from proportions that are assumed to be known.
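For reference, the link cost function that the BPR parameters refer to can be sketched as below. The functional form is the widely used BPR curve; the default values 0.15 and 4 are the conventional ones and are an assumption here, since the parameter values used in the cited study are not stated in this section. The function and argument names are hypothetical.

```python
def bpr_travel_cost(free_flow_cost, flow, capacity, alpha=0.15, beta=4.0):
    """BPR link cost: free-flow cost inflated by (flow / capacity) ** beta."""
    return free_flow_cost * (1.0 + alpha * (flow / capacity) ** beta)


cost_empty = bpr_travel_cost(10.0, 0.0, 1000.0)          # no traffic on the link
cost_at_capacity = bpr_travel_cost(10.0, 1000.0, 1000.0)  # flow equal to capacity
cost_over = bpr_travel_cost(10.0, 1500.0, 1000.0)         # flow 50% above capacity
```

The cost equals the free-flow cost on an empty link, is 15% higher when flow equals capacity under the default parameters, and grows steeply beyond that point.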
2.2.2.6 Single-level formulation
While many efforts, as discussed above, have been made with respect to
randomness, errors, and inconsistencies of observations and user-equilibrium models,
the bi-level framework still depends only on traffic counts-based methods.
Considering the limitations of bi-level methods and the under-determinacy problem of
traffic counts-based formulations, the search for better methods has been ongoing.
With the availability of additional data from emerging sources, such as Bluetooth
trajectories, it has recently become possible to move beyond purely traffic counts-based
methods.
Michau, et al. (2016) proposed a methodological framework to relax the need
for assignment formulation, which means that there is no need for a bi-level
framework. They developed a link dependent OD matrix (Link-OD) method that directly
includes assignment information through the observations of Bluetooth inferred
trajectories. The observed trajectories are represented in terms of a Link-OD matrix.
The CDA method proposed by Fisk and Boyce (1983) is also a single-level
formulation. The only difference between the CDA and Link-OD methods is
that the latter implicitly includes assignment through observed path flows, while the
former estimates it through optimisation. The single-level formulation proposed by
Michau, et al. (2016) is expressed as shown in Equation (21).
(21)
(21a)
(21b)
Where represents the Link OD matrix, is the portion of Bluetooth OD
flows between oth origin and dth destination flowing on link, l; and is the ratio of
Bluetooth counts to observed traffic counts on link, l. Objectives F1, F2, and F3
explicitly deal with the other conditions; that is, consistency constraint, Kirchhoff’s
law, and total variation, respectively. The consistency constraint ensures that the total
flow should be greater than the Bluetooth flow. Kirchhoff’s law (from physics)
conserves the flows at an intersection, and the total variation function lays down the
constraint that two paths with close origins and the same destinations should be the
same. The estimated Link-OD is then converted to the OD matrix through the
formulation of Equation (22).
(22)
Where, E1 is the incidence matrix; that is, network-based information
connecting nodes to links (see Michau, et al. (2016) for further details). Note that E1
and Q are multiplied through a Hadamard product.
However, the major limitation of this method is that the penetration rate of
Bluetooth trips is assumed to be equal to the penetration rate of Bluetooth link counts.
This does not necessarily hold, because the penetration rate of Bluetooth trips is generally unknown
due to the unavailability of ground truth.
2.2.3 Dynamic OD formulation
The methods described in the previous sub-sections have one thing in common:
they are all based on static formulations and cannot capture the dynamics of traffic
flows, such as hourly demand variation. Dynamic expressions of OD demand
are more appropriate than static OD demand for real-time applications.
During the same period that witnessed a great surge in static model
formulations, there was also growing interest in capturing traffic dynamics from
time-varying measurements. Several researchers used time-dependent
variables for estimating OD flows as another means to tackle the under-determinacy
problem (Cremer & Keller, 1981; Cremer & Keller, 1987).
The pioneering work in defining the dynamic relationships between time-dependent
OD flows and traffic flows can be attributed to Cascetta, Inaudi, and
Marquis (1993), who extended the GLS estimator from static to dynamic
conditions and proposed two methods: simultaneous and sequential. The
“simultaneous” method considers time-dependent link flows from all time intervals in
a single set and estimates time-dependent OD flows in one step. On the other hand, the
“sequential” method estimates the OD matrix for each time interval based on link
counts and OD flows from the previous intervals. Although the “sequential” method
provided the foundation for subsequent dynamic models, it lacks predictive
capabilities. Thus, it is not suitable for real-time applications, such as predicting OD
flows for future time-steps.
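The contrast between the two estimators can be sketched on a toy network. The fragment below is illustrative only and is not the estimator of Cascetta et al. (1993): the matrices `A0` and `A1` are hypothetical fixed assignment fractions, flows are assumed to reach the counted links only in their departure interval and the next one, and ordinary least squares stands in for the GLS weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_links, n_od, n_steps = 4, 3, 3

# Hypothetical assignment fractions: A0 maps flows departing in interval t to
# counts in t; A1 maps flows departing in t-1 to counts in t.
A0 = rng.uniform(0.2, 0.8, size=(n_links, n_od))
A1 = 0.3 * rng.uniform(0.2, 0.8, size=(n_links, n_od))
x_true = rng.uniform(50.0, 150.0, size=(n_steps, n_od))


def link_counts(x):
    """Noise-free link counts implied by time-dependent OD flows x."""
    y = np.zeros((n_steps, n_links))
    for t in range(n_steps):
        y[t] = A0 @ x[t]
        if t > 0:
            y[t] += A1 @ x[t - 1]
    return y


def simultaneous(y):
    """Stack all intervals and estimate every OD flow in one step."""
    big = np.zeros((n_steps * n_links, n_steps * n_od))
    for t in range(n_steps):
        rows = slice(t * n_links, (t + 1) * n_links)
        big[rows, t * n_od:(t + 1) * n_od] = A0
        if t > 0:
            big[rows, (t - 1) * n_od:t * n_od] = A1
    solution, *_ = np.linalg.lstsq(big, y.ravel(), rcond=None)
    return solution.reshape(n_steps, n_od)


def sequential(y):
    """Estimate one interval at a time, reusing earlier estimates."""
    x_hat = np.zeros((n_steps, n_od))
    for t in range(n_steps):
        rhs = y[t].copy()
        if t > 0:
            rhs -= A1 @ x_hat[t - 1]  # remove the share explained by t-1
        x_hat[t], *_ = np.linalg.lstsq(A0, rhs, rcond=None)
    return x_hat


y_obs = link_counts(x_true)
x_simultaneous = simultaneous(y_obs)
x_sequential = sequential(y_obs)
```

With noise-free counts and more links than OD pairs, both variants recover `x_true`; the difference in practice lies in how measurement errors propagate and in the size of the problem solved at each step.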
Early works on dynamic OD estimation were based on the state space modelling
framework, especially Kalman filter (KF) algorithms (Ashok, 1996). Although the
Kalman filter algorithm first appeared in the transportation field in the early 1970s, it
was limited to the estimation of traffic densities (Gazis & Knapp, 1971). Okutani and
Stephanedes (1984) extended its formulation by considering an auto-regressive
process. However, the method was not appropriate because it treated OD flows as
state variables, whereas the Kalman filter algorithm assumes that state variables are
normally distributed. Because OD flows cannot be considered normally distributed, they
are unsuitable as state variables. Most research works (Cremer & Keller, 1981) during that period
were also constrained to small-scale (closed) networks, that is, intersections, where the
dynamic relationship between the state variables and observed traffic measurements is
uncomplicated because travel time plays no role in the assignment.
The decade from 1990 to 2000, however, witnessed several research works
related to dynamic OD demand estimation for open networks (Ashok, 1996; Bell,
1991; Chang & Wu, 1994; Hai, Akiyama, & Sasaki, 1998; Hu, 1996; Kang, 1999; Van
Der Zijpp, 1997). Among them, the concept of deviations of OD flows proposed by
Ashok (1996) has been cited by many other researchers. Ashok (1996) proposed the
use of deviations of OD flows from historical estimates as state variables, and
conducted experiments on open (linear) networks. Ashok’s (1996) method is an
extension of that of Okutani and Stephanedes (1984), but is limited to an auto-regressive
formulation.
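The role of deviations as state variables can be sketched with a single linear Kalman filter step. This is a minimal illustration and not Ashok's (1996) implementation: the function name `kf_step`, the fixed assignment matrix `A`, the auto-regressive matrix `F`, and the noise covariances `Q` and `R` are all hypothetical.

```python
import numpy as np


def kf_step(d_prev, P_prev, y_obs, x_hist, A, F, Q, R):
    """One predict/update cycle on OD-flow deviations d = x - x_hist."""
    # Predict: deviations follow a first-order auto-regressive process.
    d_pred = F @ d_prev
    P_pred = F @ P_prev @ F.T + Q
    # Update: compare observed counts with those implied by the prediction.
    innovation = y_obs - A @ (x_hist + d_pred)
    S = A @ P_pred @ A.T + R
    K = P_pred @ A.T @ np.linalg.inv(S)
    d_new = d_pred + K @ innovation
    P_new = (np.eye(len(d_prev)) - K @ A) @ P_pred
    return d_new, P_new


# Toy example: two OD pairs, each observed directly on its own link.
x_hist = np.array([100.0, 200.0])   # historical OD flows
A = np.eye(2)                       # assumed assignment matrix
F = 0.5 * np.eye(2)                 # assumed auto-regressive factor
Q = 100.0 * np.eye(2)               # large process noise
R = 1e-6 * np.eye(2)                # almost noise-free counts
y_obs = np.array([110.0, 190.0])

d_new, P_new = kf_step(np.zeros(2), np.eye(2), y_obs, x_hist, A, F, Q, R)
x_est = x_hist + d_new              # estimated OD flows for this interval
```

Because the state is the deviation rather than the flow itself, the historical matrix carries the regular spatial and temporal pattern, and the filter only has to track departures from it.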
Most of the previous works assumed the complete availability of information
related to input and output flows, which might not always be true. However, the
beginning of the 21st century witnessed several advancements in information and
communication technology that provide additional traffic data. Observed
time-dependent traffic data, such as sample OD demand (AVI), turning ratios, travel
times, and trajectories of probes, have begun to find their place in the measurement
equations of state space models (Antoniou, Ben-Akiva, & Koutsopoulos, 2006;
Asakura, Hato, & Kashiwadani, 2000; Barceló Bugeda, Montero Mercadé, Marqués,
& Carmona, 2010; Dixon, 2000; Dixon & Rilett, 2002; Kwon & Varaiya, 2005;
Mishalani, Coifman, & Gopalakrishna, 2002).
Many improvements have been made with respect to KF-based methods. Zhou
and Mahmassani (2006) addressed the limitation of the KF method with respect to
the auto-regressive assumption (as previously used by Ashok (1996)), which considers OD
flows to be stationary. This assumption does not hold in general, because prevailing
demand patterns often differ from regular patterns.
Most efforts have not focussed on extending the bi-level framework to the dynamic
OD space due to the lack of analytical dynamic traffic assignment models (Ashok &
Ben-Akiva, 2000). KF-based techniques are predominantly based on a fixed assignment
matrix; however, the possibility of obtaining equilibrium assignment from traffic
simulation models has encouraged some researchers to extend the bi-level framework
to the dynamic OD space (Cipriani, Florian, Mahut, & Nigro, 2010, 2011; Lu, Rao, Wu,
Guo, & Xia, 2015; Tavana, 2001; Zhou & Mahmassani, 2007; Zhu, 2007).
Until 2010, most studies were limited to linear networks because
this avoided the additional complexity of the route-choice dimension in the dynamic
assignment formulation and the associated computational burden. However, some
researchers proposed alternative methods that are computationally efficient for larger
networks. To exploit the sparsity of the dynamic assignment matrix, Bierlaire and Crittin
(2004) developed an LSQR algorithm as an alternative to the KF method for large-scale
problems. Verbas, Mahmassani, and Zhang (2011) tried to solve the non-linear
problem in the upper level of the bi-level formulation using the robust optimisation
software KNITRO. Barceló, Montero, Bullejos, Linares, and Serch (2013) suggested using a subset
of the most likely used paths, obtained as a result of dynamic user equilibrium, to relax
the complexity of dynamic assignment. Their work demonstrated further improvement
in computational efficiency because the Bluetooth-based travel time facilitated a linear
Kalman filter instead of a non-linear Kalman filter. Djukic (2014) proposed the
principal component analysis method to reduce the dimensionality of
dynamic OD estimation. Frederix, Viti, and Tampère (2013) tried a different approach
for solving large-scale network problems, and proposed a hierarchy-based approach
where the OD demand estimation is performed on each level separately. The outputs
of higher-level estimation are used as inputs for OD demand estimation at the lower
level.
2.2.4 Quasi-Dynamic formulation
There has been a growing interest in estimating OD matrices under quasi-dynamic
conditions. Considering the dimensionality of the dynamic OD matrix estimation
problem, Cascetta, Papola, Marzano, Simonelli, and Vitiello (2013) proposed the
quasi-dynamic approach to minimise the imbalance between knowns (link flows and
OD flows mapping equations) and unknowns (OD flows) by using quasi-dynamic-based
generalised least squares to estimate time-dependent OD matrices. A quasi-dynamic
condition refers to a state that lies in between the static and dynamic conditions
of traffic flows. Traffic dynamics are known to change within-the-day and day-to-day
due to different travel-activity patterns. However, the quasi-dynamic assumption states
that for a larger reference period (say a whole day), the distribution shares of OD flows
remain constant even though the origin flows change, as shown in Equation
(23):
(23)
In Equation (23), the OD flow between the oth origin and dth destination during
time-slice t is expressed as the product of the trips generated from the oth origin during t
and the proportion of those trips travelling from the oth origin to the dth destination.
The quasi-dynamic assumption states that the factors affecting the generated trips change
inherently during the larger time period (say within a day); however, the factors
affecting the distribution proportions are relatively constant.
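A small numerical sketch may help to fix the idea. The zone names, time slices, shares, and trip totals below are invented for illustration; only the product of generated trips and a constant distribution share follows the assumption described above.

```python
# Hypothetical distribution shares, held constant over the whole day.
shares = {
    "O1": {"D1": 0.75, "D2": 0.25},
    "O2": {"D1": 0.5, "D2": 0.5},
}

# Hypothetical trips generated from each origin in each time slice.
generated = {
    "07:00-08:00": {"O1": 100.0, "O2": 40.0},
    "08:00-09:00": {"O1": 200.0, "O2": 80.0},
}


def quasi_dynamic_od(generated_t, shares):
    """OD flows for one time slice: generated trips times the constant share."""
    return {
        (o, d): trips * share
        for o, trips in generated_t.items()
        for d, share in shares[o].items()
    }


od_by_slice = {t: quasi_dynamic_od(g, shares) for t, g in generated.items()}
```

Under this assumption, a problem with T time slices, n_o origins and n_d destinations has at most n_o·n_d + T·n_o free quantities instead of T·n_o·n_d, which is the source of the reduced imbalance between knowns and unknowns.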
Experiments by Cascetta, et al. (2013) showed that the quasi-dynamic
assumption yielded better results than the simultaneous estimator of dynamic
OD estimation. Dynamic OD matrix estimation/prediction algorithms, such as a
Kalman filter, performed better when the prior time-dependent OD matrices were
estimated using the quasi-dynamic approach instead of simultaneous estimators.
Aggregating time-dependent OD matrices estimated with the quasi-dynamic approach
also seemed to provide better estimates of OD matrices for larger time periods (say for a
peak period or even a daily OD matrix).
While the quasi-dynamic assumption worked well for off-line OD matrix
estimation, some efforts have also been made to introduce this concept into online
estimation/prediction algorithms such as a Kalman filter. Marzano, Papola, Simonelli,
and Papageorgiou (2018) proposed a quasi-dynamic augmented extended Kalman
filter, with the results showing an improvement over both a simultaneous
estimator and the quasi-dynamic-based generalised least squares technique.
Bauer et al. (2018) also extended the quasi-dynamic assumption to traffic
assignment, by assuming that the proportion of path flows generated from an origin in
any time period of the day remains constant across days of a specific category. Equation
(24) demonstrates this assumption at the path-flow level.
(24)
Here, the terms carried over from Equation (23) have the same meaning as stated
previously; the path proportion is the share of the OD flow that passes through the kth
path to reach the dth destination; and the resulting quantity represents
the path flow between the oth origin and dth destination through the kth path.
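The path-level version of the assumption can be sketched in the same way. All names and numbers are hypothetical, and the sketch assumes that the path flow is simply the product of generated trips, distribution share, and path proportion.

```python
def path_flows(origin_trips, od_share, path_share):
    """Path flow = trips generated at o * share going to d * share using path k."""
    return {
        (o, d, k): origin_trips[o] * od_share[(o, d)] * p_k
        for (o, d, k), p_k in path_share.items()
    }


# Hypothetical inputs for one origin and one time period.
origin_trips = {"O1": 200.0}
od_share = {("O1", "D1"): 0.75, ("O1", "D2"): 0.25}
path_share = {
    ("O1", "D1", "k1"): 0.5,
    ("O1", "D1", "k2"): 0.5,
    ("O1", "D2", "k1"): 1.0,
}

flows = path_flows(origin_trips, od_share, path_share)
```

Since the path proportions of each OD pair sum to one, the path flows of an origin add up to the trips generated there.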
2.3 THE SOLUTION ALGORITHMS
Many solution algorithms have been proposed to solve the OD matrix estimation
problem (Antoniou, et al., 2016). It is hard to judge which algorithm is better than
another due to the unavailability of ground truth. Prior to any practical application,
a database of OD matrices needs to be developed through
off-line estimation techniques. The solution algorithms used for off-line OD matrix
estimation can be broadly categorised into four types: fixed-point approaches (Cascetta
& Postorino, 2001), gradient-based techniques (Spiess, 1990), stochastic optimisation
methods (finite difference stochastic approximation; see Spall (1992)), and
evolutionary algorithms (genetic algorithms; see Kim, et al. (2001)).
Among the above-mentioned methods, gradient-based techniques are quite
popular due to their computational efficiency for large-scale networks. There have
been many improvements suggested in gradient-descent frameworks, such as iterative
estimation-assignment (Yang, et al., 1992), constrained descent method (Florian &
Chen, 1995), mini-batch gradient descent (Li, Zhang, Chen, & Smola, 2014), extended
gradient method (Shafiei, Nazemi, & Seyedabrishami, 2015), projected gradient
method (Lundgren & Peterson, 2008b), and the stochastic gradient method (Masip,
Djukic, Breen, & Casas, 2018).
Gradient-based techniques fundamentally assume that the assignment is locally
constant. However, due to the non-linear relationship between the assignment matrix and
the OD matrix, it is not possible to directly compute the gradient of the link flow deviation.
Several heuristics have been proposed to approximate the gradient, such as the
sensitivity analysis-based method (Yang, 1995), simultaneous perturbation stochastic
approximation (SPSA) path search methods (Cipriani, et al., 2011), adaptive SPSA
(Cantelmo, Cipriani, Gemma, & Nigro, 2014), weighted SPSA (W-SPSA) (Lu, Xu,
Antoniou, & Ben-Akiva, 2015), and cluster-wise SPSA (c-SPSA) (Tympakianaki,
Koutsopoulos, & Jenelius, 2018). However, SPSA-based methods have several
limitations: they depend on several algorithmic parameters, so that the larger the
network, the higher the computational time (Bullejos, Barceló Bugeda, & Montero
Mercadé, 2014); they are sensitive to the selection of initial parameter values, and
adjusting these values is a cumbersome task (Tympakianaki, et al., 2018); they can
become trapped in local minima; and the approximated gradient values can be very noisy,
leading to convergence and stability issues (Tympakianaki, et al., 2018).
To address the challenges of SPSA-based techniques, metamodels have recently been
proposed that approximate the assignment during every iteration and minimise the
objective function through gradient computations (Osorio, 2019).
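The basic SPSA gradient approximation can be sketched as follows. This is a generic illustration of the simultaneous perturbation idea (Spall, 1992) rather than any of the cited OD-specific variants; the assignment matrix `A`, the observed counts `y_obs`, and the step size `c` are hypothetical, and the assignment is held fixed so that the exact gradient is available for comparison.

```python
import numpy as np


def spsa_gradient(loss, x, c, rng):
    """One SPSA estimate of the gradient of `loss` at `x`."""
    # Perturb all components at once with random +/-1 signs.
    delta = rng.choice([-1.0, 1.0], size=x.shape)
    loss_plus = loss(x + c * delta)
    loss_minus = loss(x - c * delta)
    # Two loss evaluations give an estimate for every component.
    return (loss_plus - loss_minus) / (2.0 * c * delta)


# Hypothetical example: 3 OD pairs observed on 2 links.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y_obs = np.array([300.0, 250.0])


def link_flow_deviation(x):
    """Sum of squared deviations between assigned and observed link flows."""
    return float(np.sum((A @ x - y_obs) ** 2))


rng = np.random.default_rng(1)
x0 = np.array([100.0, 100.0, 100.0])
estimates = [spsa_gradient(link_flow_deviation, x0, 1.0, rng) for _ in range(2000)]
avg_gradient = np.mean(estimates, axis=0)
exact_gradient = 2.0 * A.T @ (A @ x0 - y_obs)
```

Each single estimate is noisy because every component shares the same scalar loss difference; only the average over many perturbations approaches `exact_gradient`. This is the noise that the text above refers to.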
2.4 OD MATRIX STRUCTURAL INFORMATION
Because traffic count-based OD estimation is an under-determined problem,
most studies have emphasised methods to constrain the “structure” of the OD matrix during
the OD estimation process. The structural knowledge related to OD flows has been used
either as constraints or through deviations from a target OD matrix in the objective
function. For instance, Van Zuylen (1978) used the Brillouin information measure to
formulate the total OD information contained in the observed link flows and estimated
the OD matrix by minimising this information. To reduce the under-determinacy
problem, Gur (1980a) proposed using a target trip matrix to provide additional
information on the structure of the OD matrix. Willumsen (1984a) proposed a method to
determine whether the structure of an estimated OD matrix is close to that of the true OD matrix.
Based on the ratios of link flows, he introduced a scale factor as a proxy to assess
the structure of the estimated OD matrix. The formulation is shown in Equation (25).
(25)
Equation (25) is expressed in terms of the estimated and observed link flows on the lth
link of a network with L links in total. Furthermore, Yang et al. (1992) used
a correlation coefficient between the observed and estimated link flows, in
addition to the scale factor, to explain the structural degradation of OD flows. For
example, let X be the estimated OD matrix, to be compared with the target OD matrix. Four
different cases are possible from the combinations of the scale factor and the
correlation coefficient, as follows:
Case-1: If the scale factor is close to 1 and the correlation coefficient is close to 1,
then X and the target matrix are structurally similar to each other.
Case-2: If the scale factor is close to 1 and the correlation coefficient is small,
then X and the target matrix are structurally different, with random variations.
Case-3: If the scale factor is greater than or less than 1 and the correlation coefficient
is close to 1, then X and the target matrix have the same structure; however, the total
demand in X is greater or lower than that in the target matrix.
Case-4: If the scale factor differs from 1 and the correlation coefficient is small,
then X and the target matrix are structurally different at a larger random scale.
However, the limitation of this approach is that the two statistical indicators
compare link flows to interpret the structural variation in OD
matrices, which is generally not valid due to the one-to-many relationship between link
flows and OD flows.
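The four cases can be illustrated with a short sketch. Several assumptions are made here that are not taken from the cited works: the scale factor is computed as the mean ratio of estimated to observed link flows, the correlation is the Pearson coefficient, and the thresholds `scale_tol` and `rho_min` that decide what counts as "close to 1" or "small" are arbitrary.

```python
import math


def scale_factor(estimated, observed):
    """Assumed form: mean ratio of estimated to observed link flows."""
    ratios = [e / o for e, o in zip(estimated, observed)]
    return sum(ratios) / len(ratios)


def correlation(a, b):
    """Pearson correlation coefficient between two lists of link flows."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def structural_case(estimated, observed, scale_tol=0.05, rho_min=0.9):
    """Return the case number (1 to 4) under the hypothetical thresholds."""
    scale_ok = abs(scale_factor(estimated, observed) - 1.0) <= scale_tol
    rho_ok = correlation(estimated, observed) >= rho_min
    if scale_ok and rho_ok:
        return 1
    if scale_ok:
        return 2
    if rho_ok:
        return 3
    return 4


observed = [100.0, 200.0, 300.0, 400.0]
case_scaled = structural_case([110.0, 220.0, 330.0, 440.0], observed)
case_noisy = structural_case([150.0, 100.0, 450.0, 200.0], observed)
```

Both indicators are computed on link flows only, so two very different OD matrices that happen to load the links similarly would fall into the same case, which is the limitation noted above.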
Other efforts have been made to account for additional structural
information. Bierlaire and Toint (1995) proposed the Matrix Estimation Using Structure
Explicitly (MEUSE) method to exploit the structural information from parking surveys to
improve the structure of estimated OD matrices. Kim et al. (2001) stated that if the
structure of the target OD matrix is different from the structure of the true OD matrix, then
the bi-level solution might meet a perfect Stackelberg condition. They defined the
OD matrix structure as the ratio of OD flows to origin flows, and used it as a constraint
to preserve the structure of OD demand during OD matrix optimisation. Stathopoulos
and Tsekeris (2005) emphasised the necessity to incorporate information degradation
of demand patterns in the general OD estimation framework to account for short-term
and long-term traffic dynamics. Djukic, et al. (2013) emphasised the structural
correlations that exist between the OD pairs and the significance of accounting for
them when comparing OD matrices.
The importance of OD matrix structural information has been widely
acknowledged in the dynamic OD estimation/prediction problem. To incorporate the
structural information (spatial and temporal trip-making patterns) of OD demand,
Ashok (1996) formulated a state space model in terms of deviations of OD flows from
historical estimates. The structural deviations of OD flows preserve the structural
integrity and capture the uncertainties that can arise from conditions such as severe
weather, special events, and travellers’ reactions to information
management measures. Following Ashok’s (1996) work, most dynamic OD
formulations are based on the deviations of OD flows. Djukic et al. (2013)
discussed the importance of considering the structural differences between the OD
matrices within a dynamic OD matrix estimation process and as a performance
measure to benchmark various dynamic OD estimation methods.
2.5 STATISTICAL PERFORMANCE MEASURES
The role of statistical performance measures is very significant in OD matrix
estimation. The quality of estimated OD matrices and the performance of
estimation/prediction methods are assessed based on the results of statistical
performance measures (Ciuffo & Punzo, 2010; Hollander & Liu, 2008). Some notable
statistical metrics are:
• root mean square error (RMSE), the most common indicator used by
  researchers (Ashok & Ben-Akiva, 2002; Barceló Bugeda, et al., 2010; Tamin
  & Willumsen, 1989);
• normalised root mean square error (RMSN) (Antoniou, Ben-Akiva, &
  Koutsopoulos, 2004);
• mean square error (MSE) (Cascetta, 1984);
• mean absolute error percent (MAE%) (Nanda, 1997);
• mean absolute error ratio (MAER) (Kim, Kim, & Rilett, 2005);
• mean absolute percent error (MAPE) (Cools, Moons, & Wets, 2010);
• global Theil measure (GU) (Barceló, Montero, Bullejos, Linares, et al., 2013);
• maximum possible absolute error (MPAE) (Yang, Iida, & Sasaki, 1991);
• relative error (RE) (Gan, Yang, & Wong, 2005);
• total demand deviation (TDD) (Bera & Rao, 2011);
• correlation coefficient (Yang, et al., 1992); and
• R-squared (R2) (Tavassoli, Alsger, Hickman, & Mesbah, 2016a).
For a thorough review of the statistical measures that are widely used in
transport applications, see Ciuffo and Punzo (2010) and Hollander and Liu (2008).
The formulations of some of the metrics are described in the following equations.
Note that the comparison is made with the target OD matrix.
(26)
(27)
(28)
(29)
(30)
(31)
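As a rough companion to these formulations, the sketch below computes four of the listed metrics on flattened OD matrices. The definitions used are the commonly quoted ones and may differ in normalisation from the equations above; in particular, the RMSN is assumed to be normalised by the sum of target flows, and the MAPE is assumed to skip OD pairs with zero target flow.

```python
import math


def flatten(matrix):
    """Turn a list-of-rows OD matrix into a flat list of OD flows."""
    return [value for row in matrix for value in row]


def mse(estimated, target):
    return sum((e - t) ** 2 for e, t in zip(estimated, target)) / len(target)


def rmse(estimated, target):
    return math.sqrt(mse(estimated, target))


def rmsn(estimated, target):
    """RMSE normalised by the mean target flow (assumed definition)."""
    squared = sum((e - t) ** 2 for e, t in zip(estimated, target))
    return math.sqrt(len(target) * squared) / sum(target)


def mape(estimated, target):
    """Mean absolute percent error over OD pairs with non-zero target flow."""
    terms = [abs(e - t) / t for e, t in zip(estimated, target) if t != 0]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical 2 x 2 matrices with empty diagonals.
target = flatten([[0.0, 100.0], [200.0, 0.0]])
estimated = flatten([[0.0, 110.0], [190.0, 0.0]])
results = {
    "MSE": mse(estimated, target),
    "RMSE": rmse(estimated, target),
    "RMSN": rmsn(estimated, target),
    "MAPE": mape(estimated, target),
}
```

All four values are aggregates of cell-by-cell deviations, so they are insensitive to where in the matrix the deviations occur.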
The GEH statistic (named after Geoffrey E. Havers) is widely preferred by
transport practitioners (Lu, Rao, et al., 2015). If at least 85% of the flow values have a GEH
of less than 5, the model is considered to perform well. Equation (32) represents the
GEH.
(32)
An expression similar to Equation (32) can be used for OD matrices comparison,
as shown in Equation (33).
(33)
The percentage of OD pairs that have a GEH equal to or less than 5 is computed
to indicate the level of proximity between the two OD matrices.
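A minimal sketch of this check is given below, using the standard form of the GEH statistic, sqrt(2(a - b)^2 / (a + b)). The function names, the example matrices, and the treatment of cells where both flows are zero (GEH taken as 0) are assumptions made for illustration.

```python
import math


def geh(estimated, observed):
    """GEH statistic for a single pair of flows."""
    total = estimated + observed
    if total == 0:
        return 0.0  # assumed convention when both flows are zero
    return math.sqrt(2.0 * (estimated - observed) ** 2 / total)


def share_within_threshold(est_matrix, obs_matrix, threshold=5.0):
    """Fraction of OD pairs whose GEH is equal to or less than the threshold."""
    values = [
        geh(e, o)
        for est_row, obs_row in zip(est_matrix, obs_matrix)
        for e, o in zip(est_row, obs_row)
    ]
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical 2 x 2 OD matrices.
estimated = [[0.0, 110.0], [300.0, 0.0]]
target = [[0.0, 100.0], [200.0, 0.0]]
share = share_within_threshold(estimated, target)
```

Because the denominator grows with the flow, the same absolute deviation is penalised less on high-flow OD pairs than on low-flow ones.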
Some authors have used Theil’s goodness-of-fit measure (GU) to compare the target OD
flows with estimated flows (Barceló, Montero, Bullejos, Serch, et al., 2013). Theil’s
inequality coefficient is popularly used when comparing two time series. It lies between
0 and 1. A value of “0” means there is a perfect fit between the two time series, and a
value of “1” implies the worst possible fit. Equation (34) shows the formulation of GU for
comparing OD matrices.
(34)
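The bounded behaviour of this measure can be checked numerically. The sketch assumes the commonly used form of Theil's inequality coefficient, in which the root of the summed squared deviations is divided by the sum of the root sums of squares of the two series; the function name and the convention for two all-zero inputs are hypothetical.

```python
import math


def theil_gu(estimated, target):
    """Theil's inequality coefficient for two flattened OD matrices."""
    numerator = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, target)))
    denominator = (math.sqrt(sum(e * e for e in estimated))
                   + math.sqrt(sum(t * t for t in target)))
    if denominator == 0:
        return 0.0  # assumed convention when both inputs are all zero
    return numerator / denominator


target = [100.0, 200.0]
gu_identical = theil_gu(target, target)
gu_scaled = theil_gu([110.0, 220.0], target)   # uniform 10% over-estimate
gu_empty = theil_gu([0.0, 0.0], target)        # estimate with no demand at all
```

For a uniform scaling by a factor of 1.1, the coefficient reduces to 0.1/2.1 regardless of the matrix, so a structurally identical but scaled estimate is still penalised.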
There has been growing attention paid to the development of statistical
performance measures that can account for intrinsic details of OD matrix estimation
and the “structure” of OD matrices. For instance, Bierlaire (2002) proposed a total
demand scale to measure the intrinsic under-determinacy of the OD matrix estimation
problem that arises due to uncertainty in the network topology and assignment. Ruiz
de Villa, Casas, and Breen (2014) extended the Wasserstein metric (popularly used in
mass-transportation problems) to compare the structural differences between OD
matrices by accounting for the network topology. Djukic, et al. (2013) extended the
Structural Similarity index (SSIM), popularly used in the structural comparison of
images, to compare OD matrices.
The SSIM is still theoretical in nature and has a few limitations with respect to
OD matrix comparison, which are discussed in detail in Section 3.2. With
regard to the Wasserstein metric, Ruiz de Villa et al. (2014) mentioned that “One of
the main drawbacks of this method is in computing the Wasserstein distance on large
networks”. Thus, this method is generally considered impractical for large networks.
Another measure that focuses on the similarity of OD matrices is the eigenvalue-based
measure (EBM) of Tavassoli, Alsger, Hickman, and Mesbah (2016b). Here, the
similarity of OD matrices is analysed by comparing their corresponding vectors of
eigenvalues. The lower the distance value, the greater the similarity.
(35)
Where eig(·) denotes the vector containing the eigenvalues of a square matrix,
evaluated for X and for the OD matrix with which it is compared.
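The idea can be sketched with NumPy as follows. The exact form of the distance is an assumption: the sketch takes the Euclidean norm of the difference between the two eigenvalue vectors and sorts the eigenvalues first so that they are compared in a consistent order. The function name is hypothetical.

```python
import numpy as np


def eigenvalue_distance(x, x_ref):
    """Euclidean distance between the sorted eigenvalue vectors of two square matrices."""
    # OD matrices are generally not symmetric, so eigenvalues may be complex;
    # sort_complex orders them by real part, then imaginary part.
    eig_x = np.sort_complex(np.linalg.eigvals(x))
    eig_ref = np.sort_complex(np.linalg.eigvals(x_ref))
    return float(np.linalg.norm(eig_x - eig_ref))


# Hypothetical matrices whose eigenvalues can be read off the diagonal.
X = np.diag([1.0, 2.0])
X_ref = np.diag([1.0, 4.0])
distance = eigenvalue_distance(X, X_ref)
```

A distance of zero does not imply identical matrices, since different matrices can share the same eigenvalues, so the measure is best read as an indicator of similarity rather than equality.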
The entropy measure (E) quantifies the similarity between OD matrices
(Ros-Roca, Montero, Schneck, & Barceló, 2018). The formulation for the entropy measure
is shown in Equation (36).
(36)
2.6 INDIRECT/PARTIAL MEASUREMENTS OF OD FLOWS
Limited traffic observations from a single data source generally lead to non-unique
solutions and large estimation errors. Thus, there is a great need for effective modelling
approaches that include provisions for measurements from alternative data sources
(Zhou, 2004). The different types of sensor data that are widely used in OD matrix
estimation problems are discussed below (see Figure 2.1 for a diagrammatic
representation of a few sensors that can aid in estimating flows, for example, for OD
pairs with O1 and O2 as origins and D1 and D2 as destinations).
Figure 2.1: Pictorial representation of some of the widely-used sensor types in OD estimation problem
2.6.1 Point sensors
Point sensors are the most used detectors in transport models. These sensors
include inductive loop detectors generating inductive signatures, laser-based detection
systems (for vehicle-length), and video-based vehicle signatures. These technologies
assist with indirectly identifying anonymous vehicles from their physical features.
However, they do not directly assist with OD demand estimation, because they do not
provide any traffic information beyond detection points and are generally limited to
shorter road segments (Kwon & Varaiya, 2005; Mishalani, et al., 2002).
2.6.2 Point to point sensors (AVI data)
Automatic vehicle identification (AVI) data is widely used in traffic control and
management. The sensors that provide AVI data can detect vehicles at multiple
locations in a network. These are generally license plate/tag-based readers, mobile
phone-based systems, global positioning systems (GPS), and Bluetooth and Wi-Fi scanners.
Vehicle detections are used in extracting turning counts (Alibabai & Mahmassani, 2008),
travel times (Barceló, Montero, Bullejos, Linares, et al., 2013), vehicle trajectories
(Michau, Nantes, Chung, Abry, & Borgnat, 2014), and sometimes partial observations
of OD flows (Antoniou, et al., 2006; Dixon & Rilett, 2002). Further details about the
data from some of these sensors are discussed below.
2.6.2.1 License plate/tag reader–based AVI system
These systems generally consist of two CCD cameras located 5-10 kilometres
apart. The AVI camera/e-tag reader that is fixed above the lane captures a still
picture of the license plate/electronically reads the tag of the passing vehicle (Kwon &
Varaiya, 2005). The travel time of the vehicle is generally calculated by detecting the
same vehicle (license plate) at two consecutive AVI cameras (Asakura, et al.,
2000). AVI data in combination with traffic counts has previously been used for
dynamic origin-destination matrix estimation (Van Der Zijpp, 1997). The partially
observed trajectories from AVI data are another source of information that can be used
in OD estimation (Kwon & Varaiya, 2005). Zhou and Mahmassani (2006)
exploited AVI data without the need to know the market penetration rates.
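The travel time calculation described above amounts to matching identifiers between two readers. The sketch below is a simplified illustration with invented plate numbers and timestamps (in seconds); it keeps the first detection of each plate at each reader and ignores the outlier filtering that a practical system would need.

```python
def match_travel_times(upstream, downstream):
    """Travel time per vehicle seen first upstream and later downstream."""
    first_seen = {}
    for plate, timestamp in upstream:
        first_seen.setdefault(plate, timestamp)  # keep the first detection
    travel_times = {}
    for plate, timestamp in downstream:
        matched = plate in first_seen and plate not in travel_times
        if matched and timestamp > first_seen[plate]:
            travel_times[plate] = timestamp - first_seen[plate]
    return travel_times


# Hypothetical detections as (plate, timestamp in seconds) pairs.
upstream = [("ABC123", 0.0), ("XYZ789", 12.0), ("LMN456", 20.0)]
downstream = [("XYZ789", 312.0), ("ABC123", 330.0), ("QRS000", 340.0)]

travel_times = match_travel_times(upstream, downstream)
```

Vehicles detected at only one of the two readers (here `LMN456` and `QRS000`) produce no travel time, which is one reason why AVI data yields only a sample of the traffic between the two locations.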
2.6.2.2 Mobile data
Some recent researchers (Calabrese, Di Lorenzo, Liu, & Ratti, 2011) have
proposed ways to use mobile phone location data as a source
of data for OD demand estimation. Mobile phone operators have access to the locations
of mobile phone devices, and they primarily use this information for management and
billing purposes. The basic geographic unit for mobile phone-based data is referred to
as a “cell”. Mobile phone-based trips are generally expressed as a sequence of cell
IDs. As the number of “active” mobile phones is smaller than the number of idle mobile
phones, another geographical unit, the location area (a collection of cells), is generally used.
The data associated with a location area is called the location area update. The most
popular type of data available from mobile phone-based datasets is call detail records.
The main attributes of call detail record datasets are: the details of the connection
event (call, message or internet), time-stamps of the start and end of the event, the duration of
the event, the location of the connected tower, and the cell ID. Mobile phone-based
observations have been used in different areas of transport analysis, such as the
division of traffic analysis zones (Dong et al., 2015), identifying trip end locations
(Ahas, Silm, Järv, Saluveer, & Tiru, 2010), OD matrix estimation (Alexander, Jiang,
Murga, & González, 2015), and human mobility patterns (Jiang, Ferreira, & González,
2017).
With respect to OD matrix estimation, mobile phone-based observations have a
few challenges that need to be addressed. First, mobile phone data is highly
sensitive and expensive to obtain. Second, the details of mobile phone trajectories seem to be
very coarse (Perera, Bhattacharya, Kulik, & Bailey, 2015). Third, they might not be
able to capture the actual origins and destinations of users (Iqbal, Choudhury, Wang,
& González, 2014).
2.6.2.3 Bluetooth data
Data collection in the field of transportation has become much easier with the
advent of Bluetooth scanners. The quality of the data is as good as that of licence plate
recognition and video-captured data (Blogg, Semler, Hingorani, & Troutbeck, 2010).
Bluetooth data has a higher penetration rate than other technologies, such as
GPS (Gabriel, 2016).
Bluetooth is a key technology for in-car communication and infotainment
systems and has been identified as a complementary data source for transport
applications, such as travel time/speed estimation (Bhaskar, Qu, & Chung, 2015;
Khoei, Bhaskar, & Chung, 2013; Respati, Bhaskar, Zheng, & Chung, 2017), pedestrian
mobility patterns (Abedi, Bhaskar, & Chung, 2013, 2014; Abedi, Bhaskar, Chung, &
Miska, 2015), trajectory identification (Michau, Nantes, et al., 2017), and OD
demand estimation (Michau, et al., 2016). The validity of Bluetooth OD data has been
confirmed in the past by using data from other sources, such as video and automatic
license plate recognition OD data (Blogg, Semler, Hingorani, & Troutbeck, 2010) and
vehicle tracking using time lapse aerial photography (TLAP) (Chitturi, et al., 2014).
Interested readers can refer to Bhaskar and Chung (2013) for a fundamental
understanding of Bluetooth MAC scanner (BMS) data as complementary transport
data.
2.7 SUMMARY OF LITERATURE REVIEW
In summary, this comprehensive review of the literature identified the following
major research gaps:
1. Most studies have focused on developing formulations and solution
algorithms for improving the quality of OD matrix estimates.
Specifically, these studies adopted a bi-level framework for OD
estimation. The focus has also shifted from static to dynamic, and
recently, to quasi-dynamic formulations. However, there has been less
focus on exploiting the higher dimensions of OD flows; that is, the
structural information of OD matrices that cannot be neglected in either
the OD matrix estimation process or the formulation of statistical
performance measures.
2. Most studies are entirely dependent on traffic count-based observations
because loop detectors are the dominant source of traffic data. Although
advancements in technology seem to provide additional data sources,
their integration into, and contribution to, the existing transport models
still seem to be challenging.
By addressing these gaps, this study aims to develop statistical methods that
exploit the structural information of OD matrices for the comparison of OD matrices,
and to develop methods that incorporate the structural knowledge of Bluetooth trips into
the OD matrix estimation process, in the forthcoming chapters.
Chapter 3: Development of Statistical Metrics for the Structural Comparison of OD Matrices 57
Chapter 3: Development of Statistical
Metrics for the Structural
Comparison of OD Matrices
This chapter begins with a background (Section 3.1); introduces and discusses
the limitations of SSIM (Section 3.2); develops GSSI (Section 3.3); introduces the
traditional Levenshtein distance, extends its formulation for the comparison of OD
matrices (NLOD), and compares it with the Wasserstein metric (Section 3.4); performs a
sensitivity analysis for the proposed GSSI and NLOD (Section 3.5); and finally
provides a summary of the chapter in Section 3.6.
3.1 BACKGROUND
Mathematical formulations of some of the widely used traditional metrics for
comparison of OD matrices were previously discussed in Section 2.5. These metrics
compare the individual cells of OD matrices and compute a single statistic value by
aggregating/averaging the deviation over individual cells. However, they lack the
ability to capture structural information about the matrices. To demonstrate this,
consider an example of comparing OD matrices M1 and M2 with a reference OD
matrix MR (Figure 3.1). Here, M1 is simply 1.1 times MR, and M2 is chosen randomly.
The results of comparing matrices M1 and M2 with MR using traditional metrics (MSE,
RMSE, GU, and MAE) are presented in Table 3.1. The first column of Table 3.1
presents the metrics, and the second and third columns are the metric values for
the two cases, respectively. When both are compared with the same reference matrix, visual
inspection shows that the demand distribution (or structure) of M1 is closer to that
of MR than that of M2 is. This is obvious, because matrix M1 is a scaled version (1.1 times) of
the reference matrix. This example demonstrates that traditional metrics yield
the same results for both cases (Table 3.1) and fail to capture the structural differences
between OD matrices. Structural comparison therefore calls for
new metrics in addition to the existing traditional ones. Addressing this need, the
Structural Similarity index (SSIM) has been applied in the literature, the details of which are
presented in the following section.
Figure 3.1: Comparison of MR with OD matrices M1 and M2
Table 3.1: Comparison results using the traditional metrics
Traditional metric    Comparison of (M1, MR)    Comparison of (M2, MR)
MSE                   17370                     17370
RMSE                  131.8                     131.8
GU                    0.05                      0.05
MAE                   0.10                      0.10
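The effect shown in Table 3.1 can be reproduced in spirit with a small constructed example. The matrices below are not those of Figure 3.1: `MR` is invented, `M1` is 1.1 times `MR`, and `M2` adds the same deviations as `M1` but in reversed cell order, so only the MSE is guaranteed to coincide. The origin-share comparison is used here as a simple, assumed proxy for structure.

```python
import numpy as np

# Hypothetical reference matrix with an empty diagonal.
MR = np.array([[0.0, 120.0, 300.0],
               [80.0, 0.0, 150.0],
               [400.0, 60.0, 0.0]])
M1 = 1.1 * MR                                        # scaled copy of MR
M2 = MR + 0.1 * MR.ravel()[::-1].reshape(MR.shape)   # same deviations, shuffled


def mse(m, ref):
    return float(np.mean((m - ref) ** 2))


def origin_shares(m):
    """Share of each origin's trips going to each destination."""
    return m / m.sum(axis=1, keepdims=True)


def max_share_change(m, ref):
    """Largest change in any origin share relative to the reference."""
    return float(np.max(np.abs(origin_shares(m) - origin_shares(ref))))


mse_1, mse_2 = mse(M1, MR), mse(M2, MR)
change_1, change_2 = max_share_change(M1, MR), max_share_change(M2, MR)
```

The two MSE values coincide, yet the origin shares of `M1` are unchanged while those of `M2` shift, which a cell-by-cell metric cannot distinguish.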
3.2 STRUCTURAL SIMILARITY (SSIM) INDEX
The SSIM is borrowed from the field of image processing. Wang et al. (2004)
discussed the limitations of traditional metrics to capture structural differences in
images. They proposed the SSIM index as a quantitative measure to compare the
quality of two natural images and observed that statistical measures such as MSE may
fail to measure the structural degradation of one image with respect to another. As
shown in Figure 3.2a, two images estimated from two different algorithms, namely
gradient ascent and gradient descent, can each have the same MSE of 2,500 but
different SSIM values of 0.9337 and -0.5411, respectively.
Djukic, et al. (2013) applied the SSIM rationale in the context of OD matrices
and demonstrated that two OD matrices can have the same MSE value but different SSIM
values. For instance, Figure 3.2b shows two comparisons with an MSE of 69 each, while the
SSIM values are 0.8724 and 0.9702.
Figure 3.2: (a) Comparison of Images (source Wang et al., 2004) vs (b) comparison of OD matrices (source Djukic et al., 2013)
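The formulation for local SSIM, as provided by Djukic et al. (2013), is based
on the product of three individual components (Equations (37), (37a) and (37b)) related
to the means, standard deviations, and correlation coefficient of the groups of
OD pairs.
$l(x, \hat{x}) = \dfrac{2\mu_x \mu_{\hat{x}} + C_1}{\mu_x^2 + \mu_{\hat{x}}^2 + C_1}$ (37)
$c(x, \hat{x}) = \dfrac{2\sigma_x \sigma_{\hat{x}} + C_2}{\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2}$ (37a)
$s(x, \hat{x}) = \dfrac{\sigma_{x\hat{x}} + C_3}{\sigma_x \sigma_{\hat{x}} + C_3}$ (37b)
$SSIM(x, \hat{x}) = [l(x, \hat{x})]^{\alpha} \, [c(x, \hat{x})]^{\beta} \, [s(x, \hat{x})]^{\gamma}$; $\alpha, \beta, \gamma > 0$; (37c)
Assuming $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$:
$SSIM(x, \hat{x}) = \dfrac{(2\mu_x \mu_{\hat{x}} + C_1)(2\sigma_{x\hat{x}} + C_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + C_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2)}$; $[-1 \le SSIM \le 1]$; (37d)
$MSSIM(X, \hat{X}) = \dfrac{1}{K} \sum_{k=1}^{K} SSIM(x_k, \hat{x}_k)$; $[-1 \le MSSIM \le 1]$ (37e)
Where,
$X$ and $\hat{X}$ represent the two OD matrices to be compared, while $x_k$ and $\hat{x}_k$
represent the groups of OD pairs within the $k$th local window in both matrices (the window
index $k$ is omitted in Equations (37) to (37d) for brevity). The concept
of local windows is further explained in Section 3.2.1.
$l(x, \hat{x})$ compares the mean values ($\mu_x$, $\mu_{\hat{x}}$) of the groups of OD pairs in
both matrices;
$c(x, \hat{x})$ compares the standard deviations ($\sigma_x$, $\sigma_{\hat{x}}$) of the groups of OD pairs
in both matrices;
$s(x, \hat{x})$ compares the structure by computing the correlation between the
normalised groups of OD pairs in both matrices. The normalised $x$ and $\hat{x}$, with unit
standard deviation and zero mean, are equal to $(x - \mu_x)/\sigma_x$ and $(\hat{x} - \mu_{\hat{x}})/\sigma_{\hat{x}}$, respectively;
$C_1$, $C_2$ and $C_3$ are constants that stabilise the result when either the mean or the
standard deviation is close to zero. $C_3$ is generally assumed to be $C_2/2$. Previous
studies have suggested small positive values for $C_1$ and $C_2$ (Pollard,
Taylor, van Vuren, & MacDonald, 2013). For the analysis conducted in this research,
the OD values in the SSIM window were not all zero; hence, both $C_1$ and $C_2$ were
assumed to be zero.
The parameters $\alpha$, $\beta$ and $\gamma$ are used to adjust the relative importance of the mean,
standard deviation and structural components, respectively. Generally, they are
assumed to be equal to 1.
$SSIM(x, \hat{x})$ is the structural similarity of the local windows from both
matrices.
$MSSIM(X, \hat{X})$ is the overall similarity of the OD matrices $X$ and $\hat{X}$, computed by
taking the average of the SSIM values of the $K$ local windows.
The values of SSIM and MSSIM range between -1 and 1. A value of
1 implies that the matrices are identical, while a value of -1 implies the opposite, that is, perfectly inverse structures.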
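The three components in Equations (37) to (37b) can be illustrated with a minimal sketch. It assumes that a window is passed as a flat list of OD flows and that sample (n-1) variances are used; the function name `ssim_components` and the toy window are illustrative and do not come from the thesis.

```python
def ssim_components(x, y, c1=0.0, c2=0.0):
    """Return the (l, c, s) components for two equally sized groups of OD flows.

    Assumption: sample (n - 1) variances. With c1 = c2 = 0 the choice of
    (n - 1) or n does not change the result because it cancels out.
    With c1 = c2 = 0 the windows must not be constant (zero variance).
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    sd_x, sd_y = var_x ** 0.5, var_y ** 0.5
    c3 = c2 / 2
    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # mean term
    c = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)           # std. dev. term
    s = (cov + c3) / (sd_x * sd_y + c3)                         # structure term
    return l, c, s


window = [10, 20, 30, 40]                 # toy group of OD flows
scaled = [1.1 * v for v in window]        # same structure, 10% more trips
reversed_window = window[::-1]            # same mean and spread, inverted structure

l_s, c_s, s_s = ssim_components(window, scaled)
ssim_scaled = l_s * c_s * s_s

l_r, c_r, s_r = ssim_components(window, reversed_window)
ssim_reversed = l_r * c_r * s_r
```

In this toy case the scaled window keeps a structure term of 1 and is penalised only through the mean and standard deviation terms, whereas the reversed window keeps both of those terms at 1 and is penalised entirely through the structure term.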
3.2.1 Local sliding window
The local window is generally a square box that is much smaller than the OD
matrix. It is often referred to as a “local sliding window” because the traditional SSIM
computes statistics on the local window (consisting of a group of pixels or OD pairs) that
slides pixel by pixel, or cell by cell, over the entire image or OD matrix. The concept of
sliding was originally used for the comparison of images, where it allows SSIM
to compute local statistical characteristics so that local image distortions are better
accounted for (Brooks, Zhao, & Pappas, 2008). For ease of explanation, consider the
example presented in Figure 3.3. Here, two 4 × 4 OD matrices, X and Y, are presented
in columns one and two, respectively. These two OD matrices are compared using
SSIM. A local sliding window of size 2 × 2 is considered and represented as
coloured cells. This window slides cell by cell over the entire OD matrix and, in the
current example, results in (4 - 2 + 1) × (4 - 2 + 1) = 9 pairs of sub-matrices to compare, as illustrated in Figure 3.3.
The local SSIM computes the structural similarity between the sub-matrices
corresponding to the windows from both OD matrices. The final SSIM value,
represented as mean SSIM (MSSIM), is computed by averaging the local SSIM values
computed for all sliding windows. In the example, the SSIM value for the local window
in Figure 3.3a is 0.5963 and the MSSIM over all 9 local SSIM values is 0.6777.
Figure 3.3: An example of sliding window for SSIM calculation.
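The sliding-window procedure can be sketched as follows. The 4 × 4 matrices below are hypothetical (the values of Figure 3.3 are not reproduced here), the constants C1 and C2 are set to zero as in this study, and the helper names are illustrative.

```python
def local_ssim(x, y):
    """Simplified SSIM (alpha = beta = gamma = 1, C1 = C2 = 0) of two flat windows."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return (2 * mu_x * mu_y) * (2 * cov) / ((mu_x ** 2 + mu_y ** 2) * (var_x + var_y))


def sliding_windows(matrix, size):
    """Yield every size x size window (flattened), moving one cell at a time."""
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [matrix[r + i][c + j] for i in range(size) for j in range(size)]


def mssim(a, b, size):
    """Return (mean of the local SSIM values, number of local windows)."""
    scores = [local_ssim(wa, wb)
              for wa, wb in zip(sliding_windows(a, size), sliding_windows(b, size))]
    return sum(scores) / len(scores), len(scores)


# Hypothetical 4 x 4 OD matrices (not the values of Figure 3.3)
X = [[12, 5, 8, 3],
     [7, 14, 2, 9],
     [4, 6, 11, 10],
     [9, 3, 5, 13]]
Y = [[10, 6, 9, 2],
     [8, 11, 3, 9],
     [5, 4, 12, 8],
     [7, 4, 6, 15]]

mssim_xy, n_windows = mssim(X, Y, size=2)   # 2 x 2 window on a 4 x 4 matrix
mssim_xx, _ = mssim(X, X, size=2)           # a matrix compared with itself
```

With C1 = C2 = 0, a window whose cells are all equal has zero variance and would need the stabilising constants; the hypothetical matrices above avoid that case.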
The differences between the structural comparison of images and of OD matrices
include:
• In images, the nearby pixels are correlated with respect to the contrast
and other features. However, in an OD matrix, the correlations between
the OD pairs depend on many factors. Generally, OD pairs sharing
similar activities, trip attractions, trip productions, distances, travel costs
or geographical locations are correlated. According to
Djukic (2014), correlations between OD pairs are reflected in their
demand volumes (especially if volumes are high) and, by matrix
reordering, correlated OD pairs can be placed in the same neighbourhood; that
is, all high volume OD pairs on one side and the remaining pairs on the other side.
Djukic (2014) proposed to re-order the OD matrix (i.e., sorting each row
of the OD matrix in the order of OD pair volumes). However, if the
arrangement of zonal IDs in the two matrices differs upon re-ordering,
then reordering is avoided.
• The cell of an OD matrix is equivalent to the pixel of an image. However,
pixel values range between 0 and 255 for greyscale images, whereas the
range of OD flows is large and depends on many factors, such as
activities and distance.
Although the formulation of SSIM seems to be holistic, its existing application
still has the following shortcomings.
First, SSIM is sensitive to the size of the local window and, as such, there is
no clear consensus on the final MSSIM value. To circumvent this ambiguity, Djukic
(2014) suggested computing the SSIM over the entire OD matrix without using any
local window. However, doing so results in a statistic that is less
sensitive to structural changes within the OD matrix. According to the law of large
numbers, the variability of sample statistics tends to decrease as the sample size increases.
Since larger window dimensions imply a greater number of OD pairs to be compared,
the variance (distortion) and covariance (correlation distortion) parameters that capture
structural changes within and between OD matrices become less responsive to local differences. In other words,
SSIM is less sensitive to correlation distortions when the covariance is computed over
larger window sizes.
To demonstrate the sensitivity of SSIM towards window size, consider the mean
SSIM (MSSIM) values computed using different window sizes (3 × 3 to 20 × 20) for
two pairs of OD matrices constructed from the BCC data: Monday and Sunday, and Monday and Tuesday.
Figure 3.4 presents the results, where the blue line is for the Monday and
Sunday comparison and the orange line is for the Monday and Tuesday comparison. The x-axis
represents the size of the local window and the y-axis shows the MSSIM value. The order
of OD pairs is the same in the matrices for Sunday, Monday, and Tuesday. As the size
of the sliding window increases, the sensitivity of SSIM towards subtle differences within
the OD matrix decreases, and the MSSIM values increase. Similar results were observed by Brooks et al. (2008) when comparing
images using different window sizes. The MSSIM values increase more slowly
for the Monday and Tuesday pair than for the Monday and Sunday pair. This is
due to similar travel patterns on Monday and Tuesday (both working days) and
less similar patterns on Monday and Sunday. There is no clear consensus
reported in the literature regarding the acceptable sliding window size
and the resulting SSIM values.
Figure 3.4: Sensitivity of MSSIM towards local window size (x-axis: size of the sliding window; y-axis: MSSIM values)
Window size | 3 × 3 | 6 × 6 | 8 × 8 | 15 × 15 | 20 × 20
Mon and Sun | 0.7337 | 0.7892 | 0.807 | 0.8164 | 0.8292
Mon and Tue | 0.9939 | 0.9975 | 0.9985 | 0.9986 | 0.9986
Second, the local SSIM value computed on a group of OD pairs does not have
any physical meaning or significance attached to it unless the OD pairs are correlated.
Groups of OD pairs sharing similar structural properties or travel patterns are generally
correlated. Djukic (2014) tried to capture these correlations among the OD pairs from
their flow values (especially if volumes are high) by matrix reordering (i.e., sorting
each row of the OD matrix in the order of OD pair volumes). However, the structural
properties of an OD matrix include many other underlying factors, such as the distribution
of trips, geographical integrity and network topology, which, if accounted for, could capture
better OD structural information.
To this end, this study develops mean geographical window-based SSIM (GSSI)
as an extension to Djukic (2014)’s SSIM approach. It is further discussed in the
following section.
3.3 MEAN GEOGRAPHICAL WINDOW-BASED SSIM (GSSI)
The application of the SSIM was undertaken in this study by first arranging the
origins and destinations of the OD matrix in order of geographical similarity, and
subsequently defining the windows for a SSIM analysis consistent with the
geographical boundaries. Here, the window size varied with the geographical
boundaries considered in the rearranged OD matrix. This is different from the
traditional SSIM application, where the size of the window is fixed. The window
associated with the geographical boundary is termed as a geographical window and
the SSIM computed over the geographical windows is termed as geographical window
based SSIM, hereafter. This process is explained with the help of an example from the
Brisbane City Council (BCC), as detailed below.
The proposed geographical window has a physical significance associated with
it, to ensure geographical integrity and capture spatial correlation by computing
statistics on all lower zonal level OD pairs belonging to the same higher zonal level
OD pair. For instance, the higher zonal level is SA4, with SA3 as the lower level for
the BCC region. The size and shape of a geographical window is defined by the
number of SA3 zonal pairs present within the respective SA4 zonal pair. Therefore, in
this approach, the local geographical window need not always be a square matrix.
Figure 3.5 shows that each cell of the OD matrix represents an SA3 level OD
pair. Here, the OD matrix is rearranged so that the SA3 level origins (rows) and
destinations (columns) are grouped into their respective SA4 levels. For instance, SA3 (1)
to SA3 (j) from SA4 (1) are arranged together. The SA4 level boundaries now
define the geographical SSIM windows. The yellow shaded region represents a
window covering OD pairs from SA4 (1) to SA4 (2).
Figure 3.5: An example to illustrate the proposed geographical window-based approach
Figure 3.6 demonstrates the application of the SA4 based geographical windows
for comparing SA3 (20 × 20) OD matrices of a Monday (Figure 3.6a) and a Sunday
(Figure 3.6b). The SA4 zones used in designing the geographical windows are: Brisbane
East, Brisbane North, Brisbane South, Brisbane West, and Brisbane Inner. For
example, consider the geographical window of the SA4 OD pair Brisbane East to
Brisbane North, which consists of the SA3 OD pairs from 30101 to 30201, 30202, 30203
and 30204, and from 30103 to 30201, 30202, 30203 and 30204. These SA3 OD pairs are
geographically correlated because they belong to the same SA4 origin (Brisbane East) and
SA4 destination (Brisbane North). Here, Brisbane East and Brisbane North consist of
2 and 4 lower level (SA3) zones, respectively. The size of the corresponding local
geographical window is therefore 2 × 4.
The local SSIM values are calculated for each geographical window separately,
and the mean geographical window based SSIM (GSSI) is the average of all local
SSIM values. In the above example, the total number of geographical windows
considered is equal to the number of higher order OD pairs, which is 5 × 5 = 25. The GSSI for
the Sunday-Monday pair of matrices is 0.7231. See Table 3.2 for the local geographical
window based SSIM values and the GSSI.
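The construction of the geographical windows and the averaging step can be sketched as follows. The SA3 counts for Brisbane East (2) and Brisbane North (4) are stated above; the counts for Brisbane South (6) and Brisbane West (4) match the SA3 codes listed in Figure 3.7, and Brisbane Inner (4) is inferred here from the 20 × 20 matrix size, so it should be read as an assumption. The local SSIM values are those reported in Table 3.2, and the helper name is illustrative.

```python
from itertools import accumulate

# Number of SA3 zones in each SA4 zone (Brisbane Inner = 4 is an inferred value).
sa4_sizes = {
    "Brisbane East": 2,
    "Brisbane North": 4,
    "Brisbane South": 6,
    "Brisbane West": 4,
    "Brisbane Inner": 4,
}


def geographical_windows(sizes):
    """Map each (origin SA4, destination SA4) pair to its row and column index ranges."""
    names = list(sizes)
    ends = list(accumulate(sizes.values()))
    starts = [0] + ends[:-1]
    spans = {name: range(s, e) for name, s, e in zip(names, starts, ends)}
    return {(o, d): (spans[o], spans[d]) for o in names for d in names}


windows = geographical_windows(sa4_sizes)
rows, cols = windows[("Brisbane East", "Brisbane North")]
window_shape = (len(rows), len(cols))

# Local SSIM values from Table 3.2 (rows: origin SA4, columns: destination SA4).
table_3_2 = [
    [0.8319, 0.2437, 0.7650, 0.9517, 0.7755],
    [0.3311, 0.7353, 0.4034, 0.7378, 0.6299],
    [0.7771, 0.4653, 0.8062, 0.8037, 0.8117],
    [0.8340, 0.7754, 0.7562, 0.8884, 0.8165],
    [0.7716, 0.6265, 0.8257, 0.8385, 0.8750],
]
local_values = [v for row in table_3_2 for v in row]
gssi = sum(local_values) / len(local_values)   # unweighted mean over the 25 windows
```

Because GSSI is an unweighted mean, a small 2 × 2 window contributes as much to the final value as a 6 × 6 window.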
Figure 3.6: Splitting (a) Monday and (b) Sunday OD matrices into geographical (SA4) windows
Note that the afore-mentioned example is explained from the perspective of the
statistical zones used in Australia. However, the proposed geographical window-based
approach holds good for any other study region with its own hierarchical zonal
structure. Although the method demonstrated geographical windows using SA4 zones
on SA3 OD pairs, any combination of higher and lower level OD pairs can be used for
the same purpose; for instance, SA3 OD pairs can be used as higher level geographical
windows for SA1 OD pairs, etc. The geographical window based SSIM approach has
the following advantages over traditional SSIM.
3.3.1 Structural comparison of local travel patterns
While the GSSI value provides the overall structural comparison, the local
geographical window based SSIM value has its own practical significance. For
instance, it provides opportunities to compare the local travel demand distribution
(travel patterns) between different suburbs of a region, which a sliding local window is
not capable of determining. Figure 3.7 illustrates that Sunday travel patterns differ
markedly from Monday patterns for the suburb pair Brisbane South to Brisbane North. This is reflected by a
local SSIM value of 0.4653 (see Figure 3.7 (left) and the bold value in Table 3.2). On
the other hand, for another suburb pair, Brisbane South to Brisbane West, the Sunday
travel patterns are similar (if not identical) to those of Monday, with an SSIM value of 0.8037
(see Figure 3.7 (right) and the bold value in Table 3.2).
Figure 3.7: Insights into local travel patterns using geographical local window: (left) Brisbane South to Brisbane North and (right) Brisbane South to Brisbane West
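The two local SSIM values quoted for Figure 3.7 can be checked with a short sketch that applies the simplified SSIM form with C1 = C2 = 0 to each whole geographical window. The flows are the SA3 level values listed in Figure 3.7; the names `panel_1` and `panel_2` are placeholders, since SSIM is symmetric and the result does not depend on which panel is Monday and which is Sunday.

```python
def window_ssim(a, b):
    """Simplified SSIM (alpha = beta = gamma = 1, C1 = C2 = 0) over one whole window."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / (n - 1)
    var_y = sum((q - mu_y) ** 2 for q in y) / (n - 1)
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / (n - 1)
    return (2 * mu_x * mu_y) * (2 * cov) / ((mu_x ** 2 + mu_y ** 2) * (var_x + var_y))


# Rows: Brisbane South origins 30301-30306.
# Columns: Brisbane North destinations 30201-30204.
north_panel_1 = [[26, 54, 206, 122], [74, 178, 312, 93], [42, 54, 195, 85],
                 [55, 104, 238, 76], [32, 40, 219, 65], [11, 25, 100, 36]]
north_panel_2 = [[15, 32, 50, 63], [46, 163, 163, 79], [11, 33, 56, 53],
                 [6, 36, 76, 35], [8, 24, 43, 18], [6, 14, 36, 24]]

# Columns: Brisbane West destinations 30401-30404.
west_panel_1 = [[23, 371, 117, 48], [135, 65, 594, 228], [51, 37, 231, 106],
                [71, 25, 443, 163], [184, 9, 505, 60], [38, 8, 90, 26]]
west_panel_2 = [[16, 289, 82, 45], [86, 26, 473, 218], [44, 25, 156, 75],
                [54, 15, 193, 34], [102, 7, 263, 24], [31, 4, 75, 21]]

ssim_north = window_ssim(north_panel_1, north_panel_2)   # reported as 0.4653
ssim_west = window_ssim(west_panel_1, west_panel_2)      # reported as 0.8037
```

Under these assumptions both windows reproduce the reported values to four decimal places, which also indicates that each geographical window is evaluated as a single 6 × 4 block.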
Table 3.2: GSSI and local SSIM values: Monday vs Sunday B-OD matrices
Origin (rows) / Destination (columns) | Brisbane East | Brisbane North | Brisbane South | Brisbane West | Brisbane Inner
Brisbane East | 0.8319 | 0.2437 | 0.7650 | 0.9517 | 0.7755
Brisbane North | 0.3311 | 0.7353 | 0.4034 | 0.7378 | 0.6299
Brisbane South | 0.7771 | 0.4653 | 0.8062 | 0.8037 | 0.8117
Brisbane West | 0.8340 | 0.7754 | 0.7562 | 0.8884 | 0.8165
Brisbane Inner | 0.7716 | 0.6265 | 0.8257 | 0.8385 | 0.8750
GSSI | 0.7231
[Figure 3.7 data: SA3 level OD flows from Brisbane South origins (rows) to Brisbane North (30201 to 30204) and Brisbane West (30401 to 30404) destinations. The panels are labelled SUNDAY and MONDAY in the figure; the extracted text does not preserve which label belongs to which panel.]
Panel 1:
Origin | 30201 | 30202 | 30203 | 30204 | 30401 | 30402 | 30403 | 30404
30301 | 26 | 54 | 206 | 122 | 23 | 371 | 117 | 48
30302 | 74 | 178 | 312 | 93 | 135 | 65 | 594 | 228
30303 | 42 | 54 | 195 | 85 | 51 | 37 | 231 | 106
30304 | 55 | 104 | 238 | 76 | 71 | 25 | 443 | 163
30305 | 32 | 40 | 219 | 65 | 184 | 9 | 505 | 60
30306 | 11 | 25 | 100 | 36 | 38 | 8 | 90 | 26
Panel 2:
Origin | 30201 | 30202 | 30203 | 30204 | 30401 | 30402 | 30403 | 30404
30301 | 15 | 32 | 50 | 63 | 16 | 289 | 82 | 45
30302 | 46 | 163 | 163 | 79 | 86 | 26 | 473 | 218
30303 | 11 | 33 | 56 | 53 | 44 | 25 | 156 | 75
30304 | 6 | 36 | 76 | 35 | 54 | 15 | 193 | 34
30305 | 8 | 24 | 43 | 18 | 102 | 7 | 263 | 24
30306 | 6 | 14 | 36 | 24 | 31 | 4 | 75 | 21
Brisbane South to Brisbane North: Local SSIM = 0.4653; Brisbane South to Brisbane West: Local SSIM = 0.8037
3.3.2 Geographical window vs sliding window
The size of the geographical window is defined by the size of the SA4 zones
(i.e., the number of SA3 OD pairs present in an SA4 OD pair). Thus, the local window
has a physical meaning associated with it, since it takes geographical integrity into
account through physical SA4 boundaries. Regarding size, the proposed geographical
windows are not of fixed dimensions. They have different sizes, such as 2 × 2, 2 × 4, 4
× 4, 6 × 6, etc., as shown in Figure 3.6. However, the GSSI values so computed are
found to be equivalent to MSSIM values from sliding local windows of smaller dimensions, as explained
below.
The sliding window equivalence of the geographical window is demonstrated in
Figure 3.8 for weekends and Figure 3.9 for weekdays. Figure 3.8 illustrates the
comparison of the Monday OD matrix with 40 OD matrices from both
Saturdays (Figure 3.8a) and Sundays (Figure 3.8b). A similar analysis with nearly 45
OD matrices from Tuesday, Wednesday, Thursday, and Friday is illustrated in Figure
3.9. The GSSI value is shown to be equivalent to that of a 2 × 2 sliding window for
weekends and to that of a 3 × 3 sliding window for weekdays. Both Figure 3.8 and
Figure 3.9 show 12 different plots (11 of which correspond to sliding windows
of sizes ranging from 2 × 2 to 20 × 20, and one of which is based on GSSI). For each plot, the
x-axis corresponds to the different daily OD matrices and the y-axis shows the MSSIM or GSSI values.
Figure 3.8: GSSI vs sliding windows based MSSIM for weekends: (a) Saturdays vs typical Monday; (b) Sundays vs typical Monday (x-axis: OD matrices; y-axis: MGeoSSIM; series: sliding windows of 2 × 2, 3 × 3, 4 × 4, 6 × 6, 8 × 8, 10 × 10, 12 × 12, 14 × 14, 16 × 16, 18 × 18 and 20 × 20, and GeoSSIM)
Figure 3.9: GSSI vs sliding windows based MSSIM for weekdays: (a) Tuesdays, (b) Wednesdays, (c) Thursdays and (d) Fridays vs typical Monday (x-axis: OD matrices; y-axis: MGeoSSIM; series as in Figure 3.8)
3.3.3 Computational efficiency
Computationally, GSSI was found to be 10-11 times more efficient than
MSSIM computed using a sliding window of size 2 × 2. The test was
performed on a Dell computer with an Intel(R) Core(TM) i7-4770 CPU (3.40 GHz) and 16 GB
RAM. Figure 3.10 illustrates that the computational time required by GSSI to
compare 415 OD matrices with the Monday OD matrix (Figure 3.6a) was 3.92 seconds,
whereas that of the 2 × 2 sliding window based MSSIM was 39.5 seconds. This is because
comparing 20 × 20 OD matrices via a 2 × 2 sliding window requires the local SSIM to be
computed (20-1) × (20-1) = 361 times per pair of matrices. GSSI, on the other hand, is the average
of only 25 local SSIM values.
Figure 3.10: Comparison of computational costs: Sliding windows based SSIM vs GSSI
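The counts behind this comparison can be reproduced with a few lines of arithmetic. The sketch below only restates the figures quoted above (20 × 20 matrices, a 2 × 2 window, 25 geographical windows, and the two reported run times); the function name is illustrative.

```python
def sliding_window_count(n, w):
    """Number of w x w windows when sliding cell by cell over an n x n matrix."""
    return (n - w + 1) ** 2


n_sliding = sliding_window_count(20, 2)   # local SSIM evaluations per matrix pair
n_geographical = 5 * 5                    # one window per SA4 OD pair

evaluation_ratio = n_sliding / n_geographical   # ratio of local SSIM evaluations
runtime_ratio = 39.5 / 3.92                     # ratio of the reported run times
```

The ratio of evaluations (about 14) and the ratio of run times (about 10) need not coincide; one plausible reason is that geographical windows contain more cells on average than a 2 × 2 window.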
3.4 LEVENSHTEIN DISTANCE
The distribution of the origin flows to different destinations provides insights
into the structural knowledge of travel patterns. For example, the preference of
destinations could differ on different types of days, such as the choice of destinations
on Mondays differing from that on Sundays. This is due to different
activities and their schedules on the two days. Even if destination choices are the same
on both days, the number of trips could differ. This implies that the structure of
traffic flows differs if destination choices and the number of trips differ between the
same set of OD pairs. The OD estimation problem is another example, where target and
estimated OD matrices hardly differ in the structure (order of destination choices).
Comparing OD matrices from this perspective requires a statistical metric that
can exploit this additional structural information. For this purpose, the
traditional Levenshtein distance (details provided in Section 3.4.1) is extended through a
new approach (presented in Section 3.4.2) to make it applicable to the structural
comparison of OD matrices.
3.4.1 Traditional Levenshtein distance
Levenshtein distance, developed by Levenshtein (1966), is a measure of
proximity between two strings. It is mainly applied to compare sequences in the linguistics
domain, such as in plagiarism detection and speech recognition, and in molecular biology
for comparing sequences of macromolecules. For transport applications, the
metric has been used to compare license plates (Oliveira-Neto, Han, & Jeong, 2012) and
to cluster activity-travel patterns (Zhang, Kang, Axhausen, & Kwon, 2018).
The Levenshtein distance calculates the least expensive set of insertions,
deletions, or substitutions required to transform one string into another. For example,
when comparing two strings, such as “MONDAY” and “SATURDAY”, one of the
optimum ways is to insert the letters “S” and “A” and substitute “M”, “O” and “N”
with “T”, “U” and “R”, respectively, leading to a generalised Levenshtein
distance (GLD) of 5 (assuming a unit distance for each operation), as shown in Figure
3.11.
Figure 3.11: Example to demonstrate Generalised Levenshtein Distance
To understand the GLD technique and its formulation, let
X represent any string expressed as $X = x_1 x_2 \ldots x_q$, where $x_i$ is the ith character of
X. The substring of X is represented as $X_{i \ldots j}$, which includes the characters from $x_i$ to $x_j$, where
1 ≤ i ≤ j ≤ q. Its length is defined as $|X_{i \ldots j}| = j - i + 1$, and a string of length 0 is termed the null string (ε).
Any general edit operation for a pair of characters (a, b) is expressed as
$a \rightarrow b$.
If string X is the result of the operation $a \rightarrow b$ applied to string Y, then it can be written
as $Y \Rightarrow X$ via $a \rightarrow b$. The notations for the three operations are expressed as follows:
Insertion: $a \rightarrow b$ if $a = \varepsilon$ and $b \neq \varepsilon$;
Deletion: $a \rightarrow b$ if $a \neq \varepsilon$ and $b = \varepsilon$; and
Substitution: $a \rightarrow b$ if $a \neq \varepsilon$ and $b \neq \varepsilon$, where a and b are characters.
If $S = S_1, S_2, \ldots, S_l$ is defined as the sequence of edit operations that
transforms $Y \Rightarrow X$, and $c(S_i)$ is the cost associated with the ith edit operation,
then the GLD is the minimum total cost required to transform Y into X (see
Equation (38)).
$GLD(X, Y) = \min_{S} \left\{ \sum_{i=1}^{l} c(S_i) \right\}$ (38)
The normalised Levenshtein distance (NLD) is the GLD normalised by the sum
of the lengths of the two strings (Equation (39)). This metric always lies between 0 and 1
(Yujian & Bo, 2007).
$NLD(X, Y) = \dfrac{GLD(X, Y)}{|X| + |Y|}$ (39)
Algorithm 1 presents the pseudo code for computing GLD and NLD for two
strings X and Y, where X =x1…xq and Y = y1…yp (Heeringa, 2004). The lengths of
strings X and Y are q and p, respectively. For ease of explanation, the matrix
demonstration of Algorithm 1 is given in Figure 3.12. The computation of GLD via
Algorithm 1 for two strings presented in Table 3.3 is illustrated in Figure 3.13.
Table 3.3: Algorithm 1 for Normalised Levenshtein distance for strings comparison (see Figure 3.12)
Create an empty matrix “K” of size (p+1) × (q+1), where the row and
column headers correspond to the characters of strings Y and X, respectively.
Assign the values 0…q and 0…p to the first row and first column, respectively
for j = 2 to q+1
  for i = 2 to p+1
    Estimate the cost as $C_{i,j} = 0$ if $y_{i-1} = x_{j-1}$, and $C_{i,j} = 1$ otherwise
    Set the cell K (i, j) = min (K (i-1, j) + 1, K (i, j-1) + 1, K (i-1, j-1) + $C_{i,j}$), where:
      o K (i-1, j) + 1 represents the cell value immediately above the current
        cell plus 1
      o K (i, j-1) + 1 represents the cell value immediately to the left of the
        current cell plus 1
      o K (i-1, j-1) + $C_{i,j}$ represents the cell value immediately diagonally
        above and to the left of the current cell plus the cost
The GLD is the value of the cell K (p+1, q+1) and the NLD = K (p+1, q+1) / (p+q)
The above pseudo code is explained in terms of edit operations
with the matrix demonstration in Figure 3.12. There are multiple
paths (i.e., different combinations of arrows) possible to arrive at the final cell K(p+1,q+1).
Each path is a combination of edit operations represented as the following moves
on the matrix grid: a downward movement along the diagonal is a substitution
operation, an eastward movement is a deletion operation, and a vertical downward
movement is an insertion operation (Oliveira-Neto et al., 2012).
Figure 3.12: Matrix demonstration of traditional Levenshtein approach (Algorithm 1)
[Figure 3.12 layout: matrix K of size (p+1) × (q+1). The columns are headed by String X ($x_1, x_2, \ldots, x_q$) and the first row holds 0, 1, 2, …, q. The rows are headed by String Y ($y_1, y_2, \ldots, y_p$) and the first column holds 0, 1, 2, …, p. The first comparison is stored in K(2,2); each cell is computed from its neighbours K(i-1,j-1), K(i-1,j) and K(i,j-1) as K(i,j) = Min {K(i-1,j)+1, K(i,j-1)+1, K(i-1,j-1)+$C_{i,j}$}; the last cell is K(p+1,q+1).]
In the traditional Levenshtein approach, the numbering of rows and columns
of matrix (K) commence with “0”. This is done to facilitate the comparison of the first
character from both strings X and Y and store the value in K (2, 2) (see Figure 3.12).
The comparison is made by traversing the matrix row by row and then column wise
until all characters in both strings are compared. Because the overall comparison of all
characters ends at the last cell of the matrix, K (p+1, q+1) is chosen as the GLD value.
Figure 3.13: Comparison of strings “Monday” and “Saturday” using GLD
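Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration with unit costs; Python lists are zero-indexed, so the cell written as K(p+1, q+1) in the text is `K[p][q]` here. The NLD line follows the verbal definition given earlier (GLD divided by the sum of the two string lengths), which is an assumption about the exact form of Equation (39).

```python
def levenshtein(x, y):
    """Return (GLD, NLD) for strings x and y with unit cost per edit operation."""
    q, p = len(x), len(y)
    # K has (p + 1) rows (string Y) and (q + 1) columns (string X).
    K = [[0] * (q + 1) for _ in range(p + 1)]
    for j in range(q + 1):
        K[0][j] = j          # first row: 0 ... q
    for i in range(p + 1):
        K[i][0] = i          # first column: 0 ... p
    for j in range(1, q + 1):
        for i in range(1, p + 1):
            cost = 0 if y[i - 1] == x[j - 1] else 1
            K[i][j] = min(K[i - 1][j] + 1,         # cell above + 1
                          K[i][j - 1] + 1,         # cell to the left + 1
                          K[i - 1][j - 1] + cost)  # diagonal cell + cost
    gld = K[p][q]
    nld = gld / (p + q) if (p + q) else 0.0
    return gld, nld


gld, nld = levenshtein("SATURDAY", "MONDAY")
```

For the “MONDAY” and “SATURDAY” example, the common suffix “DAY” is matched at no cost, and the remaining characters account for the five edit operations.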
In the literature, the use of Levenshtein distance for transport applications is
relatively scarce. Oliveira-Neto et al. (2012), for instance, applied this technique to
compare license plates. Here, the sequences of characters on the license plates observed
at upstream and downstream stations were compared. Zhang et al. (2018) applied the
Levenshtein technique to compare the sequences of trip purposes and cluster activity-travel
patterns. Other researchers have used similar techniques (such as the sequence
alignment method (SAM)) to compare any two activity-travel patterns
(Allahviranloo and Recker, 2015) and sequences of trips (Crawford, Watling,
and Connors, 2018). The commonality among these studies is that they
compare one-dimensional strings with a unit cost for each operation. However,
OD matrices are two-dimensional arrays consisting of OD flows between different
origin and destination pairs, which means that direct application of such traditional
techniques is not possible. In light of this, the following section proposes a detailed
methodology to extend the applicability of the traditional Levenshtein distance to the
structural comparison of OD matrices.
3.4.2 Proposed Levenshtein distance for structural comparison of OD matrices
As discussed in the previous section, the Levenshtein distance is an effective
metric to identify differences in the order/arrangement of any string. To apply the
Levenshtein distance to the comparison of OD matrices, we propose to:
a) Consider each row of an OD matrix independently. The values in each row
correspond to the flows from an origin to different destinations. For a given
origin we define a ‘string’, where each character is a destination ID arranged
in descending order of OD flows; this string is referred to as a ‘sorted row’. To
compare the structure of two OD matrices, we compare the order of destination
IDs in each sorted row of the OD matrices.
b) Include OD flows in the formulation of Levenshtein distance, the details for
which are presented later.
Hereafter, the proposed modified approach is termed Levenshtein distance for OD
matrices. Before describing the proposed formulation, let us consider an example as
shown in Figure 3.14a, where two OD matrices X (reference matrix) and Y (query
matrix), each of dimensions M × M, are to be compared. Here, the origin IDs are
expressed as O1, O2, O3 and O4 and the destination IDs are expressed as N, E, W and S
(thus, M = 4 in this example). In Figure 3.14b, the rows of each matrix are sorted
individually in descending order of their OD volumes. For instance, for origin O1
(row 1) of matrix Y, the sequence of destinations in descending order of demand is S,
W, N and E with 16, 12, 10 and 9 trips, respectively (refer to row O1 of the sorted
query matrix in Figure 3.14b).
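The construction of sorted rows can be sketched as follows, using the OD flows of Figure 3.14a. The dictionary layout and the function name are illustrative; ties between equal flows, which do not occur in this example, would keep their original column order because Python's sort is stable, and that tie rule is an assumption rather than something stated in the text.

```python
destinations = ["N", "E", "W", "S"]

# OD flows from Figure 3.14a (rows: origins; columns: N, E, W, S)
X = {"O1": [3, 4, 6, 10], "O2": [7, 4, 5, 11], "O3": [12, 8, 5, 6], "O4": [13, 7, 9, 6]}
Y = {"O1": [10, 9, 12, 16], "O2": [17, 10, 13, 11], "O3": [11, 14, 12, 18], "O4": [12, 13, 19, 15]}


def sorted_row(flows, dest_ids):
    """Return (destination ID, trips) pairs in descending order of trips."""
    return sorted(zip(dest_ids, flows), key=lambda pair: pair[1], reverse=True)


sorted_X = {origin: sorted_row(flows, destinations) for origin, flows in X.items()}
sorted_Y = {origin: sorted_row(flows, destinations) for origin, flows in Y.items()}

# Destination 'strings' for origin O1, e.g. "SWEN" for X and "SWNE" for Y
string_x1 = "".join(dest for dest, _ in sorted_X["O1"])
string_y1 = "".join(dest for dest, _ in sorted_Y["O1"])
```

For origin O1, the two strings differ only in the order of the last two destinations, which is the kind of structural difference the proposed distance is meant to pick up.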
Figure 3.14: Example to demonstrate Levenshtein distance application for OD matrices comparison
a) Original OD matrices: Reference (X) and Query (Y)
(X) Reference Matrix
Origin / Dest | N | E | W | S
O1 | 3 | 4 | 6 | 10
O2 | 7 | 4 | 5 | 11
O3 | 12 | 8 | 5 | 6
O4 | 13 | 7 | 9 | 6
(Y) Query Matrix
Origin / Dest | N | E | W | S
O1 | 10 | 9 | 12 | 16
O2 | 17 | 10 | 13 | 11
O3 | 11 | 14 | 12 | 18
O4 | 12 | 13 | 19 | 15
b) Row sorted Reference (X) and Query (Y) matrices, with each cell given as (Dest., Trips)
Row sorted Reference Matrix (X)
Origin | Choice 1 | Choice 2 | Choice 3 | Choice 4
O1 | (S,10) | (W,6) | (E,4) | (N,3)
O2 | (S,11) | (N,7) | (W,5) | (E,4)
O3 | (N,12) | (E,8) | (S,6) | (W,5)
O4 | (N,13) | (W,9) | (E,7) | (S,6)
Row sorted Query Matrix (Y)
Origin | Choice 1 | Choice 2 | Choice 3 | Choice 4
O1 | (S,16) | (W,12) | (N,10) | (E,9)
O2 | (N,17) | (W,13) | (S,11) | (E,10)
O3 | (S,18) | (E,14) | (W,12) | (N,11)
O4 | (W,19) | (S,15) | (E,13) | (N,12)
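For OD matrix Y, the sorted set of destination IDs and the corresponding
demand from the nth origin is expressed as $Y_n^s = (D_{Y_n}, y_n) = [(D_{Y_n}^1, y_n^1), (D_{Y_n}^2, y_n^2), \ldots, (D_{Y_n}^M, y_n^M)]$.
Here, $D_{Y_n}^i$ and $y_n^i$ are the ith preferred
destination and its corresponding demand value, respectively, from the nth origin of Y.
Similarly, we express $X_n^s = (D_{X_n}, x_n)$ for matrix X. The null pair is represented as (ε,
0). The length of each of the sets $(D_{Y_n}, y_n)$ and $(D_{X_n}, x_n)$ is M. If $(D_{X_n}, x_n)$ is the result of any
edit operations applied to $(D_{Y_n}, y_n)$, then it can be written as $(D_{Y_n}, y_n) \Rightarrow (D_{X_n}, x_n)$.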
3.4.2.1 Proposed edit operations
Compared to the traditional Levenshtein approach, the edit operations in the
proposed Levenshtein distance for OD matrices differ in the following ways:
a) We compute the cost of each edit operation in terms of flows, because OD
demand is another attribute besides the destination IDs.
b) The destination IDs in both OD matrices are the same and only their order varies, so
we do not need any substitution operation.
c) We propose an additional edit operation, the absolute trips-difference, which accounts
for the differences in the OD flows when the ith preferred destination is the same in
both sorted rows.
Any edit operation towards the transformation of the sorted query row $Y_n^s$ to the sorted
reference row $X_n^s$ can be expressed as $(D_a, y) \rightarrow (D_b, x)$. The following operations are possible:
1) Absolute trips-difference: if the destination ID, D, is the same in both $Y_n^s$ and $X_n^s$,
i.e., $(D, y) \rightarrow (D, x)$, then the associated cost is the absolute difference in the
demand, $|x - y|$.
2) Insertion of trips, i.e., $(\varepsilon, 0) \rightarrow (D, x)$: here, the destination ID, D, is inserted
in $Y_n^s$. The associated cost is the demand, $x$.
3) Deletion of trips, i.e., $(D, y) \rightarrow (\varepsilon, 0)$: here, the destination ID, D, is deleted
from $Y_n^s$. The associated cost is the demand, $y$.
Let $S = S_1, S_2, \ldots, S_l$ be the sequence of edit operations (the edit sequence) that
transforms $Y_n^s \Rightarrow X_n^s$, and let the costs (in terms of trips) associated with the edit operations
be $c(S_1), c(S_2), \ldots, c(S_l)$, respectively. Then, the Levenshtein distance for OD matrices
computed for the nth row (LODn) is the minimum total cost needed for $Y_n^s \Rightarrow X_n^s$
(Equation (40)). Because the minimum cost is required, this is an optimisation problem.
Figure 3.15 demonstrates two possible combinations of edit operations for
the second sorted row (origin O2) of the example shown in Figure 3.14b.
While the LODn formulation is an absolute comparison of sorted rows, we can
also have a relative comparison with respect to each row of the OD matrix. This can be
achieved by considering the trip productions (sum of OD flows in a row) from both sorted
rows during the comparison. This relative comparison is a normalised version of LODn
and is expressed on a scale of 0 to 1. It is referred to as NLODn and is
expressed as shown in Equation (41). Here, NLODn is obtained by normalising over
the sum of the origin flows for the nth row from both matrices. If the number of origins is N,
then we have N values of LODn and NLODn.
The overall comparison between the OD matrices is obtained through the mean
Levenshtein distance (LOD), which is the average of all LODn values, and the mean
normalised Levenshtein distance (NLOD), which is the average of all NLODn values, as shown in
Equation (42) and Equation (43), respectively.
$LOD_n(X_n^s, Y_n^s) = \min_{S} \left\{ \sum_{i=1}^{l} c(S_i) \right\}$ (40)
$NLOD_n(X_n^s, Y_n^s) = \dfrac{LOD_n(X_n^s, Y_n^s)}{\sum_{i=1}^{M} x_n^i + \sum_{i=1}^{M} y_n^i}$ (41)
$LOD(X, Y) = \dfrac{1}{N} \sum_{n=1}^{N} LOD_n(X_n^s, Y_n^s)$ (42)
$NLOD(X, Y) = \dfrac{1}{N} \sum_{n=1}^{N} NLOD_n(X_n^s, Y_n^s)$ (43)
To explain the possible combinations of the edit operators, an example is
presented in Figure 3.15. Consider the two sorted rows from the previous
example (refer Figure 3.14b). The transformation of one row into the other can be
achieved by a multitude of edit operation combinations. Two such combinations are
presented in Figure 3.15. The cost of the operations in Figure 3.15a is higher than that of Figure
3.15b; that is, a total cost of 54 trips (NLOD2 = 54/(27+51) = 0.69) and 46 trips
(NLOD2 = 46/(27+51) = 0.59), respectively.
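The arithmetic behind the two totals can be checked with a few lines of Python. This is only an illustrative sketch: the edit costs are read from the two panels of Figure 3.15, and the variable names are assumptions introduced here.

```python
# Edit costs (in trips) read from the two panels of Figure 3.15.
costs_a = [11, 10, 8, 4, 11, 10]   # panel (a): one possible edit sequence
costs_b = [11, 10, 8, 11, 6]       # panel (b): the minimum-cost edit sequence

# Sum of trip productions of the two sorted rows (27 and 51 trips).
production_sum = 27 + 51

for label, costs in (("a", costs_a), ("b", costs_b)):
    total = sum(costs)
    print(label, total, round(total / production_sum, 2))
# a 54 0.69
# b 46 0.59
```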
Figure 3.15: A possible combination of edit operations vs minimum total cost of edit operations
3.4.2.2 Algorithm to compute Levenshtein distance for OD matrices
Algorithm 2, presented in Table 3.4, demonstrates the approach adopted to estimate the
Levenshtein distance for comparing OD matrices Y and X, each of size M * M. LODn
and NLODn are estimated for each origin (n = 1 to M) individually and are later
aggregated to estimate LOD and NLOD, respectively.
Note that when the destination IDs are different, the cost (Ci,j in Algorithm 2)
is estimated as the sum of the two demands. One can ask why the cost is not
the average of the two demands. The average is always lower than the sum, and to be
conservative we would like to have a higher cost for different destination IDs.
The matrix demonstration of Algorithm 2 is illustrated in Figure
3.16. Similar to the traditional approach, the numbering of rows and columns of matrix
(L) commences with 0. However, to account for the OD flows in Algorithm 2, we
replace the first row and column with the cumulative sum of trips distributed to the
destinations of the sorted reference and query rows.
Table 3.4: Algorithm 2 for Levenshtein distance for OD matrices (see Figure 3.16)
For each origin n (n = 1 to N):
  1) Define the sorted reference row and the sorted query row for origin n, each as a list of destination IDs with their corresponding demands.
  2) Create an empty matrix L of size (M+1)*(M+1), where the row header and column header correspond to the sorted query row and the sorted reference row, respectively (refer Figure 3.16). The first cell is set to 0.
  3) Assign the cumulative flows of the sorted reference row and of the sorted query row to the first row and first column, respectively.
  4) for j = 1 to M
       for i = 1 to M
         Estimate the cost Ci,j as the absolute difference of the two demands if the destination IDs are the same, and as the sum of the two demands otherwise.
         Set the cell L(i,j) = min (L(i-1,j) + query demand, L(i,j-1) + reference demand, L(i-1,j-1) + Ci,j), where:
         a) L(i-1,j) + query demand represents the cell value immediately above the current cell plus the demand of the query destination in the current row;
         b) L(i,j-1) + reference demand represents the cell value immediately to the left of the current cell plus the demand of the reference destination in the current column;
         c) L(i-1,j-1) + Ci,j represents the cell value immediately in diagonal above and to the left of the current cell plus the cost Ci,j.
  5) The local Levenshtein distance is LODn = L(M+1, M+1), and the normalised Levenshtein distance is NLODn = LODn / (sum of the trip productions of origin n in both rows).
Mean Levenshtein distance values are computed as LOD = (sum of LODn)/N and NLOD = (sum of NLODn)/N.

Figure 3.15 panel contents (destination IDs followed by their demands in brackets; edit costs in trips):
(a) One of the possible ways of edit operations; total cost of edit operations = 54 trips (NLOD2 = 0.69)
  S N W E [11 7 5 4]
  -> deletion of S, 11 trips -> N W E [7 5 4]
  -> absolute trips-difference for N, 10 trips -> N W E [17 5 4]
  -> absolute trips-difference for W, 8 trips -> N W E [17 13 4]
  -> deletion of E, 4 trips -> N W [17 13]
  -> insertion of S, 11 trips -> N W S [17 13 11]
  -> insertion of E, 10 trips -> N W S E [17 13 11 10]
(b) Optimal edit operations for minimum total cost; total cost of edit operations = 46 trips (NLOD2 = 0.59)
  S N W E [11 7 5 4]
  -> deletion of S, 11 trips -> N W E [7 5 4]
  -> absolute trips-difference for N, 10 trips -> N W E [17 5 4]
  -> absolute trips-difference for W, 8 trips -> N W E [17 13 4]
  -> insertion of S, 11 trips -> N W S E [17 13 11 4]
  -> absolute trips-difference for E, 6 trips -> N W S E [17 13 11 10]
Figure 3.16: Matrix demonstration of Algorithm 2
[Figure content: matrix L of size (M+1, M+1); the columns follow the sorted reference row for origin n and the rows follow the sorted query row for origin n; the top-left cell is 0; each cell is computed as L(i+1,j+1) = Min {L(i,j+1) + query demand, L(i+1,j) + reference demand, L(i,j) + Cij}; the last cell is L(M+1,M+1).]
Similar to the traditional Levenshtein approach, there are multiple possible
paths (i.e. different combinations of arrows) to arrive at the final L(M+1,M+1). Each
path is a combination of editing operations represented through the following moves
on the matrix grid: downward movement along the diagonal is for the absolute trips-
difference operation, eastward movement is for the deletion operation, and vertical
downward movement is for the insertion operation.
The application of Algorithm 2 on the sorted rows of the example shown in Figure
3.14 is presented in Figure 3.17. Here, the arrows point towards the
optimal combination of edit operations for minimum total cost. The last cell of matrix
L, i.e. L(5,5), gives the minimum total cost of 46 trips. This value is the same as that of the
operations shown in Figure 3.15b; that is, first a deletion operation (eastward arrow), then two
consecutive absolute trips-difference operations (diagonal downward arrows), one
insertion operation (vertical downward arrow), and finally one more absolute trips-
difference operation (diagonal downward arrow).
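The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the thesis: the function name `lod_row`, the `(destination_id, trips)` row representation and the 0-based list indexing are assumptions made here, and the reference row is placed along the columns following the labels of Figure 3.16. The example rows are those of the worked example.

```python
from itertools import accumulate


def lod_row(reference, query):
    """Levenshtein distance (in trips) between two sorted OD rows.

    Each row is a list of (destination_id, trips) pairs, already sorted.
    Columns of L follow the reference row; rows of L follow the query row.
    """
    ref_trips = [t for _, t in reference]
    qry_trips = [t for _, t in query]
    rows, cols = len(query) + 1, len(reference) + 1
    L = [[0] * cols for _ in range(rows)]
    # First row / first column: cumulative trips (pure deletions / insertions).
    L[0][1:] = list(accumulate(ref_trips))
    for i, cumulative in enumerate(accumulate(qry_trips), start=1):
        L[i][0] = cumulative
    for i in range(1, rows):
        q_id, q = query[i - 1]
        for j in range(1, cols):
            r_id, r = reference[j - 1]
            # Same destination: absolute trips-difference; otherwise sum of demands.
            c_ij = abs(q - r) if q_id == r_id else q + r
            L[i][j] = min(L[i - 1][j] + q,         # insertion (vertical move)
                          L[i][j - 1] + r,         # deletion (eastward move)
                          L[i - 1][j - 1] + c_ij)  # diagonal move
    return L[-1][-1], L


reference_row = [("S", 11), ("N", 7), ("W", 5), ("E", 4)]
query_row = [("N", 17), ("W", 13), ("S", 11), ("E", 10)]

lod, L = lod_row(reference_row, query_row)
productions = sum(t for _, t in reference_row) + sum(t for _, t in query_row)
nlod = lod / productions
print(lod, round(nlod, 2))  # 46 0.59
```

The returned matrix `L` reproduces the cell values of Figure 3.17, so the sketch can also be used to trace the optimal path cell by cell.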
Figure 3.17: Matrix (L) demonstration for the example shown in Figure 3.14
                 S    N    W    E
                11    7    5    4
            0   11   18   23   27
  N   17   17   28   21   26   30
  W   13   30   41   34   29   33
  S   11   41   30   37   40   44
  E   10   51   40   47   50   46
3.4.3 Levenshtein vs Wasserstein distances
The mathematical formulation for most traditional metrics is straightforward and
does not involve an optimisation approach. On the contrary, the NLOD comparison is
based on an optimisation formulation, and as a result, it is computationally expensive
compared to GSSI. In the literature, the Wasserstein distance is another metric that can
structurally compare OD matrices based on an optimisation formulation. Thus, this
section compares the proposed Levenshtein distance with the Wasserstein distance in
the context of OD matrices comparison.
3.4.3.1 Wasserstein distance
The Wasserstein distance is primarily used in mass transportation problems. It is
based on the Monge-Kantorovich mass transportation technique, initially developed by
the French mathematician Monge (1781) and later advanced by the
Soviet mathematician Kantorovich (1942). The Wasserstein distance is defined as the
minimum cost required to optimally transfer objects from one set of locations to
another set of locations. Thus, in terms of formulation, the Wasserstein distance can
be expressed as follows (Equation (44)):
Wasserstein distance (s, h) = $\min_{v} \left[ \sum_{i} \sum_{j} v_{ij}\, d_{ij} \right]$ (44)
For example, in Equation (44), $v_{ij}$ is the amount of sand transferred via distance
$d_{ij}$ between the locations of sand (s) and holes (h), as discussed in the example
shown in Figure 3.18. Here, the Wasserstein distance is used to calculate the minimum
amount of work (optimum cost) required to transfer “sand” (amount in kg) into
“holes” (capacity in kg). The grid lines are the paths to be traversed between the
locations of the “sand” and “holes”, and the x and y axes represent the distance in meters.
The amount of sand to be transferred from locations s1, s2, and s3 is 5, 6, and 9 kg,
respectively. The capacity of holes h1, h2, and h3 is 6, 3, and 11 kg, respectively.
Figure 3.18: Demonstration of Wasserstein distance through an example
In this example, if v (in kg) is the amount of sand to be transferred from its
location (si) to the hole location (hj) via distance (d) in meters, then cost (c) is computed
as v*d (kg-meters). The total minimum cost is achieved from the optimal combinations
of si and hj, as shown in Table 3.5. The Wasserstein distance is then computed as the
total cost (in kg-meters) divided by the total amount (in kg), which is equal to 33/20 =
1.65 meters.
Table 3.5: Computation of Wasserstein distance for the example problem
si    hj    v (kg)    d (m)    c = v*d (kg-meters)
s1    h1    3         2        6
s1    h3    2         3        6
s2    h1    3         3        9
s2    h2    3         1        3
s3    h3    9         1        9
Total Wasserstein distance in kg-meters: 33
Mean Wasserstein distance in meters: 1.65
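The figures in Table 3.5 can be verified with a short Python check. This sketch only confirms that the tabulated plan moves all the sand, fills every hole to capacity, and yields the stated totals. It does not search for the optimum, because the distances of the unused sand-hole combinations are not listed in the text. All variable names are illustrative assumptions.

```python
supply = {"s1": 5, "s2": 6, "s3": 9}      # kg of sand at each location
capacity = {"h1": 6, "h2": 3, "h3": 11}   # kg each hole can take

# (sand location, hole, amount v in kg, distance d in m), as listed in Table 3.5
plan = [("s1", "h1", 3, 2), ("s1", "h3", 2, 3), ("s2", "h1", 3, 3),
        ("s2", "h2", 3, 1), ("s3", "h3", 9, 1)]

shipped = {s: 0 for s in supply}
received = {h: 0 for h in capacity}
total_cost = 0
for s, h, v, d in plan:
    shipped[s] += v
    received[h] += v
    total_cost += v * d   # cost in kg-meters

plan_is_feasible = shipped == supply and received == capacity
mean_distance = total_cost / sum(supply.values())
print(plan_is_feasible, total_cost, mean_distance)  # True 33 1.65
```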
3.4.3.2 Wasserstein distance for OD matrices comparison
Ruiz de Villa et al. (2014) used the concept of the Wasserstein distance for
structural comparison of OD matrices by accounting for network topology in terms of
travel time. This is solved as a linear programming problem (see Villani (2003) for
further detail). It is defined as the minimum vehicle-minutes required to assign trips
between OD pairs of query matrix (XQ) with a distribution similar to that of reference
OD matrix (XR) and vice-versa. The Wasserstein distance of matrix XQ to the matrix
XR is defined as shown in Equation (45).
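Wasserstein (XQ, XR) = $\min_{f} \left[ \sum_{p \in X_Q} \sum_{q \in X_R} f_{pq}\, c_{pq} \right]$ (45)
Here, XQ and XR are the OD matrices to be compared; (p, q) is a pair of
OD pairs, with p from XQ and q from XR; and $f_{pq}$ is the volume of traffic assigned from p to q. The travel cost between the
OD pairs is given by $c_{pq}$ and is defined as the mean travel time between the centroids.
For example, if the origins and destinations of OD pairs p and q are ($O_p$, $D_p$) and ($O_q$, $D_q$),
respectively, then $c_{pq}$ is computed as follows (Equation (46)), where $t(\cdot,\cdot)$ is the mean travel time between two centroids.
$c_{pq} = t(O_p, O_q) + t(D_p, D_q)$ (46)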
Like Equation (45), the Wasserstein distance of matrix XR to the matrix XQ (i.e.,
Wasserstein (XR, XQ)) is computed. The minimum of Wasserstein (XQ, XR) and
Wasserstein (XR, XQ) gives the final comparison between OD matrices XQ and XR.
To demonstrate the Wasserstein approach, consider the example in Figure 3.19, where O1
and O2 are origins and D1 and D2 are destinations in both matrices XR and XQ. The OD
pairs from XQ considered in the analysis are (O1-D2)Q and (O2-D2)Q, while those from the
reference XR are (O1-D1)R and (O2-D1)R. In this example, it is assumed that one vehicle
corresponds to one trip, and the travel time between origin and destination is not included
in the comparison of OD matrices.
Figure 3.19: (a) Sample network and (b) OD matrices XR and XQ with their corresponding paths and travel costs.
In this example, the paths traversed by vehicles in XR and XQ are shown in Figure
3.19(b). For instance, l1-l4 is the path for OD pair (O1-D2)Q. Here, the trips of XQ are
assigned using the distribution of XR in Case-1, and vice-versa in Case-2, respectively
as discussed below.
Case 1: Optimal assignment of XQ (i.e. OD flows (O1-D2)Q and (O2-D2)Q) using
the distribution of XR
1) Assignment of (O1-D2)Q: Here, 80 trips between (O1-D2)Q need to originate
from O1 and travel via the paths of XR before reaching D2. The only paths used
in XR are l2 and l5, with a distribution of 60 and 100 flows between OD
pairs (O1-D1)R and (O2-D1)R, respectively. Thus, either 80 trips can be
assigned to l5, or 60 trips to l2 and 20 trips to l5. Since the latter option results
in the optimal assignment, it is chosen. Now, the travel cost between OD
pairs (O1-D2)Q and (O1-D1)R is 10 minutes (since the distance between O1-
O1 is zero and the distance between D2-D1 is 10 minutes), and the travel cost
between (O1-D2)Q and (O2-D1)R is 2+10 = 12 minutes (here, 2 is the
travel cost between O1-O2 and 10 is that between D2-D1). Thus, the cost
associated with this assignment is 60*(10) + 20*(12) = 840 veh-minutes.
2) Assignment of (O2-D2)Q: OD pair (O2-D1)R can still accommodate 80 trips
after the 20 trips assigned above. This implies that the
80 trips from (O2-D2)Q can be assigned to (O2-D1)R via l5. The
distance between OD pairs (O2-D2)Q and (O2-D1)R is 10 minutes. Thus,
the cost associated with this assignment is 80*(10) = 800 veh-minutes.
Thus, the total travel cost between XQ and XR is 840+800 = 1640 veh-minutes.
In terms of travel time per trip, it is 1640/(80+80) = 10.25 minutes per trip.
Case 2: Optimal assignment of XR using the distribution of XQ
1) Assignment of (O1-D1)R: The distance between (O1-D1)R and (O1-D2)Q is
10 minutes and that between (O1-D1)R and (O2-D2)Q is 2+10 = 12
minutes. Since 10 < 12, the optimum method is to assign the 60 trips of (O1-D1)R
via the path of (O1-D2)Q, i.e. l1-l4. Thus, the cost associated with this assignment
is 60*(10) = 600 veh-minutes.
2) Assignment of (O2-D1)R: Following the above assignment, (O1-D2)Q can
only accommodate 20 more trips. Thus, amongst the 100 trips of (O2-D1)R, 20
are sent to (O1-D2)Q and the remaining 80 trips are assigned to (O2-D2)Q.
The distance between (O2-D1)R and (O1-D2)Q is 2+10 = 12 minutes and that between
OD pairs (O2-D1)R and (O2-D2)Q is 10 minutes. Thus, the cost
associated with this assignment is 20*(12) + 80*(10) = 1040 veh-minutes.
Thus, the total cost between both OD matrices in Case 2 is 600+1040 = 1640
veh-minutes. In terms of travel time per trip, it is 1640/(60+100) = 10.25 minutes per
trip.
From the above two cases, the Wasserstein distance between the two OD
matrices is the minimum distance value of 10.25 minutes per trip.
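Because the example involves only two OD pairs per matrix and equal total flows, the optimal assignment can be confirmed by brute-force enumeration instead of a linear programming solver. The sketch below is illustrative only: it assumes symmetric travel times (2 minutes between O1 and O2, 10 minutes between D1 and D2, as stated in the example), and the function and variable names are introduced here.

```python
# Travel times (minutes) stated in the example; symmetry is assumed here.
T_ORIGINS = 2        # between O1 and O2
T_DESTINATIONS = 10  # between D1 and D2


def pair_cost(p, q):
    """Travel cost between OD pairs p and q (origin part plus destination part)."""
    origin_part = 0 if p[0] == q[0] else T_ORIGINS
    destination_part = 0 if p[1] == q[1] else T_DESTINATIONS
    return origin_part + destination_part


query = {("O1", "D2"): 80, ("O2", "D2"): 80}       # flows of XQ
reference = {("O1", "D1"): 60, ("O2", "D1"): 100}  # flows of XR
(q1, q2), (r1, r2) = list(query), list(reference)

best_cost, best_plan = None, None
# With equal totals, a 2 x 2 problem has one free variable: trips sent from q1 to r1.
for f11 in range(min(query[q1], reference[r1]) + 1):
    f12 = query[q1] - f11
    f21 = reference[r1] - f11
    f22 = query[q2] - f21
    if min(f12, f21, f22) < 0 or f12 + f22 != reference[r2]:
        continue
    plan = {(q1, r1): f11, (q1, r2): f12, (q2, r1): f21, (q2, r2): f22}
    cost = sum(trips * pair_cost(q, r) for (q, r), trips in plan.items())
    if best_cost is None or cost < best_cost:
        best_cost, best_plan = cost, plan

print(best_cost, best_cost / sum(query.values()))  # 1640 10.25
```

Since the cost matrix is symmetric in this example, assigning XR over XQ gives the same minimum, which is consistent with both cases yielding 1640 veh-minutes.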
Note: In this example, both cases yielded the same results. This might not be
possible if both OD matrices have unequal OD flows. For cases of different OD
volumes, Ruiz de Villa et al. (2014) proposed building a virtual OD pair to equalise
the total OD flows.
Although both the Levenshtein and Wasserstein metrics are optimisation-based,
they differ from each other, as discussed below.
First, LOD computes the structural differences between OD matrices in terms of
OD flows. On the other hand, the Wasserstein metric is expressed in terms of travel
cost.
Second, the Wasserstein approach ignores the travel time between the origin and
destination zones when comparing OD matrices. If the purpose is to compare OD
matrices from different solution algorithms but from the same time-period, then it
might be justified to ignore the travel time. However, if the OD matrices to be
compared belong to different time instances/days, then the travel time cannot be
ignored. This is because the travel time between two locations could differ for different
days (e.g., during Sunday and Monday or a regular weekday and weekday during a
school holiday), and even within a day, such as peak/off-peak periods. Thus, the travel
time between the zones plays a significant role and cannot be neglected. On the other
hand, LOD has no such issues, as it is not based on travel time.
Third, the Wasserstein metric is computationally expensive compared to LOD.
This is because the solution search space for the Wasserstein metric (Equation (45)) is
spread over the entire OD matrix. That is, the travel cost for all combinations of OD
pairs needs to be checked for an optimum distance, whereas the local LOD is computed
separately for each row, and as such, the solution search space is constrained to OD
pairs originating from a specific origin only.
To compare the computational performance of the two metrics, a Monday matrix was
compared with a Sunday matrix (see Figure 3.6). As mentioned before, evaluating
travel time is an issue with respect to the Wasserstein metric; thus, the experiment was
conducted using the travel distance between the zones. The test was conducted on a Dell
computer with an Intel(R) Core(TM) i7-4770 CPU (3.40 GHz) and 16 GB RAM, and the time
taken for computation was 0.33 seconds for LOD and 1690 seconds for the
Wasserstein approach. According to Ruiz de Villa et al. (2014), sparseness in the OD
matrix could reduce the computational cost of the Wasserstein method. However, if the OD
matrices are not sparse (as shown in Figure 3.6), the Wasserstein approach is not
computationally efficient.
3.5 SENSITIVITY ANALYSIS OF GSSI AND NLOD
The primary aim of the sensitivity analysis was to test the robustness of the proposed
metrics - GSSI and NLOD. If both metrics were well designed, then their
similarity/distance values should increase/decrease as OD matrices are structurally
closer to each other and decrease/increase otherwise. Also, the structure component of
both metrics should observe no change for uniform scaling effects.
While the sensitivity of NLOD can be tested directly from its formulation, there is no
explicit representation of the “structure” component of NLOD. This is because NLOD
captures both “skeleton” and “mass” together; the structural information of OD
matrices (i.e. the “skeleton”), in terms of destination preferences, is implicitly considered
in its formulation. However, the sensitivity of its latent “structure” component can be
analysed by deploying NLOD on the normalised OD flows. To this end, two sets of
experiments are designed to analyse the sensitivity of both NLOD and its latent
“structure” component to different structural changes within OD matrices. The first
set of experiments analyses sensitivity towards uniform scaling effects, and
the second set of experiments analyses sensitivity towards random scaling effects
in OD flows.
The study site, data and the design of experimental set up are briefly discussed
below.
1) Study site and data: The Brisbane City Council (BCC) region is the study
area, and the data consists of Bluetooth observations observed from more
than 845 Bluetooth scanners located within the study region (refer Figure
1.9). The reference OD matrix, X is the Bluetooth based OD matrix (20 x
20) observed on Monday, 7th March 2016 (refer to Behara, Bhaskar, and
Chung (2018)). The OD pairs are represented at statistical area (SA)-3 level.
More details about the development of the Bluetooth OD matrix from Bluetooth
observations are described by Michau, et al. (2014) (also refer Appendix-A).
The query OD matrices are developed specific to the experiments and are
obtained by perturbing the reference OD matrix. The details are
provided in section 3.5.1.
2) Experiments: We generally encounter three possible situations while
comparing OD matrices. They are:
Situation-1: OD matrices have the same structure and different OD flows;
Situation-2: OD matrices are structurally different and have different OD
flows; and
Situation-3: OD matrices are exactly similar
It would be interesting to see how the structure component of both metrics
performs in all these situations. The structure component of GSSI has an explicit
formulation (similar to Equation (37b)); however, it is implicit in the formulation
of NLOD. Thus, the performance of NLOD’s structural component is tested by
deploying NLOD on the normalised OD flows. This is done to nullify the effect
of mass/OD flows while comparing OD matrices. Two experiments for each
metric are designed for the sensitivity analysis. They are as follows:
1) Uniform scaling effect- Here, the query matrices have the same
skeleton/structure as that of reference OD matrix while the mass/OD flows
vary. If the uniform scale factor is one, then both OD matrices are exactly
similar. Thus, abovementioned situation-1 and situation-3 are tested here.
2) Random scaling effect- Here, the skeleton/structure and mass/OD flows vary
between query and reference OD matrices. Thus, abovementioned situation-
2 is tested here.
3.5.1 Experimental criteria
3.5.1.1 Criteria for uniform scaling effects
Here, the sensitivity of GSSI and NLOD, along with their corresponding structural
components, is tested for different uniform scaling percentages. The reference OD
matrix, X, is compared with Yi, where Yi is X multiplied by a uniform scaling factor
chosen from [0.1, 0.2, 0.3, ..., 1.9, 2.0].
1) The condition for GSSI’s structure component to be robust:
a) The value should be equal to one for any scaling factor between 0.1 and 2.
2) The conditions for GSSI to be robust:
a) It should increase with increase in scaling percentage for scaling factors from 0.1 up to (but excluding) 1.
b) It should be equal to one for a scaling factor of 1.
c) It should decrease with increase in scaling percentage for scaling factors above 1 and up to 2.
3) The condition for NLOD’s structure component to be robust:
a) The results should be zero for any scaling factor between 0.1 and 2.
4) The conditions for NLOD to be robust:
a) NLOD should decrease with increase in scaling percentage for scaling factors from 0.1 up to (but excluding) 1.
b) NLOD should be zero for a scaling factor of 1.
c) NLOD should increase with increase in scaling percentage for scaling factors above 1 and up to 2.
3.5.1.2 Criteria for random scaling effects
Here, the sensitivity of GSSI and NLOD, and of their structural components, is tested for
four different random scaling percentages, i.e. [5%, 10%, 15%, 20%], over
three types of demand scenarios. These demand scenarios are generally encountered
in traffic demand modelling (refer Djukic et al. (2015)) and are as follows:
1) Outdated surveys (low demand),
2) The best historical estimates (medium demand), and
3) Congested traffic conditions (high demand).
Note that the scenarios are named as low (l), medium (m), and high (h) in
reference to the total daily demand on the network, and do not refer to the demands of
individual OD pairs. In each demand scenario, the reference OD (X) is
compared with 100 replications of query ODs. The details of the demand scenarios
are as follows:
Low demand scenario: Here, GSSI/NLOD compare X with query matrices obtained by
randomly scaling down the flows of X. For instance, if the random scaling percentage
is 20%, then the query OD flows range between 60% and 80% of X, and similarly for
other values of the random scaling percentage.
Medium demand scenario: Here, GSSI/NLOD compare X with query matrices whose flows
are randomly scaled to just below those of X. For instance, if the random scaling
percentage is 20%, then the query OD flows range between 80% and 100% of X, and
similarly for other values of the random scaling percentage.
High demand scenario: Here, GSSI/NLOD compare X with query matrices obtained by
randomly scaling up the flows of X. The OD matrices for the high demand
scenario represent demand during congested periods. For example, high daily demand can be
witnessed during major events, such as the Commonwealth Games. For instance, if the
random scaling percentage is 20%, then the query OD flows range between 105% and
125% of X, and similarly for other values of the random scaling percentage.
The condition for GSSI, NLOD and their structural components to be robust
towards random effects is:
1) They should reflect the random structural differences that exist between the OD
matrices. The GSSI (and its structural component) values should decrease/
increase with increase/decrease in the magnitude of random scaling effects for
all three demand scenarios, and vice versa for NLOD and its structural
component.
3.5.2 Results of uniform scaling effects
The results of uniform scaling for GSSI and NLOD, along with their corresponding
structural components, are shown in Figure 3.20(a) and Figure 3.20(b), respectively.
The plots illustrate that GSSI and NLOD satisfy the conditions specified in section
3.5.1.1, i.e. GSSI values increased from 0.04 to 1 as the scaling factor increased from 0.1 to 1 and decreased from
1 to 0.64 as it increased from 1 to 2, while NLOD values decreased from 0.8 to 0 as the scaling factor increased from 0.1 to 1
and increased from 0 to 0.3 as it increased from 1 to 2. Similarly, the structural components of
GSSI and NLOD remained unaffected (i.e. equal to 1 and equal to 0, respectively) for
both scaling-up and scaling-down cases (as required in section 3.5.1.1). Thus, the
results show that both metrics and their structural components are robust towards uniform
scaling effects.
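The behaviour of NLOD under uniform scaling can be illustrated on a single row. The sketch below is not the thesis implementation. It reuses the row-wise recurrence of Algorithm 2, assumes that rows are sorted by descending flow, and assumes that "normalised OD flows" means dividing each row by its own total. The function names and the toy row are introduced here for illustration. Under these assumptions, a uniformly scaled row keeps the same destination order, so only absolute trips-difference operations are needed and NLODn reduces to |1 - a| / (1 + a) for a scaling factor a (roughly 0.82 at a = 0.1 and 0.33 at a = 2).

```python
def sort_row(flows):
    """Sort a {destination_id: trips} row by descending trips (assumed ordering)."""
    return sorted(flows.items(), key=lambda item: item[1], reverse=True)


def nlod_row(reference, query):
    """Normalised Levenshtein distance between two sorted rows (row-wise DP)."""
    previous = [0]
    for _, r in reference:                 # first row: cumulative reference trips
        previous.append(previous[-1] + r)
    for q_id, q in query:
        current = [previous[0] + q]        # first column: cumulative query trips
        for j, (r_id, r) in enumerate(reference, start=1):
            c = abs(q - r) if q_id == r_id else q + r
            current.append(min(previous[j] + q,         # insertion
                               current[j - 1] + r,      # deletion
                               previous[j - 1] + c))    # diagonal move
        previous = current
    total = sum(r for _, r in reference) + sum(q for _, q in query)
    return previous[-1] / total


def normalise(flows):
    """Divide each flow by the row total (assumed meaning of normalised flows)."""
    row_total = sum(flows.values())
    return {d: t / row_total for d, t in flows.items()}


row = {"N": 17, "W": 13, "S": 11, "E": 10}   # toy reference row
for a in (0.1, 0.5, 1.0, 1.5, 2.0):
    scaled = {d: a * t for d, t in row.items()}
    nlod = nlod_row(sort_row(row), sort_row(scaled))
    structure = nlod_row(sort_row(normalise(row)), sort_row(normalise(scaled)))
    print(a, round(nlod, 3), round(abs(1 - a) / (1 + a), 3), round(structure, 6))
```

The last printed column stays at zero (up to floating-point rounding) for every scaling factor, which mirrors the invariance expected of the structure component.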
Figure 3.20: Results of uniform scaling for (a) GSSI and its structure component and (b) NLOD and its structure component (x-axis: scaling factor from 0.1 to 2; y-axis: metric values)
3.5.3 Results of random scaling effects
The box plots shown in Figure 3.21 demonstrate that as the magnitude of random
fluctuations increases, the similarity measured by both GSSI and its structural component
decreases. For instance, the values for GSSI, as illustrated in Figure 3.21 (a), for the low
demand scenario are 0.7759, 0.7675, 0.7489 and 0.7307 for random scaling percentages of 5%, 10%, 15% and 20%,
respectively. The results showed a similar decreasing trend for all three demand
scenarios in Figure 3.21 (a) and Figure 3.21 (b).
Figure 3.21: Results of random scaling effects for (a) GSSI and (b) its structure component
The plots shown in Figure 3.22 demonstrate that as the magnitude of random
fluctuations increases, the distance measured by NLOD and its structural component also
increases. For instance, the values for NLOD, as illustrated in Figure 3.22(a), for the low
demand scenario are 0.14, 0.24, 0.31 and 0.39 for random scaling percentages of 5%, 10%, 15% and 20%,
respectively. The results showed a similar increasing trend for all three demand
scenarios (Figure 3.22(a) and Figure 3.22(b)). Thus, the results show that NLOD and
its structure component are robust towards random scaling effects.
Figure 3.22: Results of random scaling effects for (a) NLOD and (b) its structure component
From the results of both experiments, it can be concluded that GSSI and NLOD
are sensitive to the structural differences within the OD matrices and are robust
statistical measures. Following the sensitivity test, a real case study analysis is used to
demonstrate the practical application of NLOD.
To further demonstrate their potential over the limitations of traditional metrics,
the same example discussed in Section 3.1 could be considered. The proposed metrics,
NLOD and GSSI were deployed to see if they could account for structural differences
between M1 and M2 in comparison to MR (see Figure 3.1). The results of the GSSI
(considering one window of size 4*4) and NLOD are presented in Table 3.6. Both
metrics identified the differences and this indicates that M1 was structurally closer to
MR than M2. Note that the GSSI is a similarity value, which means the higher the
similarity, the lower the distance value.
Table 3.6: Structural comparison of sample OD matrices using the proposed metrics
Proposed metrics    M1 and MR    M2 and MR
GSSI                0.9910       0.8213
NLOD                0.0476       0.0734
3.6 SUMMARY
To summarise, the chapter began with a discussion on the limitations of
traditional metrics that are generally based on cell by cell comparison and often neglect
OD matrix structural information within their formulations. To overcome this
problem, this chapter adopts and extends two existing metrics: Structural Similarity
Index (SSIM) and Levenshtein distance. The proposed metrics named, Mean
Geographical window based SSIM (GSSI) and Mean Normalised Levenshtein
Distance for OD matrices (NLOD) exploit the structure of OD matrices and provide
comparison results with physical significance.
Compared to traditional SSIM, the GSSI technique is computationally effective,
can capture local travel patterns and preserves geographical integrity. Further, the
proposed NLOD is an optimisation-based metric and is computationally better than
another popular metric, the Wasserstein distance. While GSSI computes statistics on
groups of OD pairs that are geographically correlated, NLOD performs analysis on OD
pairs belonging to one specific origin.
The sensitivity of the proposed metrics is further tested towards uniform scaling
and random scaling effects. The findings of the sensitivity analysis suggest that GSSI
and NLOD approaches are robust statistical metrics and have potential for practical
applications involving OD matrices comparison.
Chapter 4: Assignment-based OD Matrix Estimation: Exploiting the Structure of Bluetooth Trips 93
Chapter 4: Assignment-based OD
Matrix Estimation:
Exploiting the Structure of
Bluetooth Trips
This chapter presents the background about the issues related to traffic counts-
based OD estimation methods in Section 4.1, description of the study network in
Section 4.2, Matlab-Aimsun bi-level framework in Section 4.3, OD estimation using
the additional structural knowledge of B-OD flows in Section 4.4, using structural
knowledge of B-SP flows in Section 4.5, comparison of B-OD and B-SP methods in
Section 4.6, demonstration of the B-SP method for lower penetration rates of Bluetooth
trajectories in Section 4.7 and finally, summary of the chapter in Section 4.8.
4.1 BACKGROUND
This chapter discusses assignment-based methods for estimating OD matrices based
on the structural knowledge about Bluetooth trips and observed link counts. As
discussed in Chapter 2, most OD matrix estimation methods are dependent on traffic
counts observations only. One of the key challenges of the traffic counts-based method
is the problem of under-determinacy. In the past, several efforts (Bierlaire & Toint,
1995; Gur, 1980b; Kim, et al., 2001) have been made to minimise this problem by
maintaining structural consistency within the OD matrix estimates. This is generally
achieved by either incorporating target OD information within the objective function
formulation or using additional constraints based on trip productions and attractions in
the solution algorithms. However, this additional information is based on outdated
surveys and the solution is always biased. Nevertheless, with the availability of
additional up-to-date structural information from emerging data sources, such as
Bluetooth, the current problem of under-determinacy can be reduced, and may
therefore improve the quality of estimated OD matrices.
In this light, the current chapter proposes methods to incorporate this additional
information within the objective function formulation of bi-level optimisation. The
structural knowledge from Bluetooth trips can be represented in two ways: Bluetooth
OD (B-OD) flows, and Bluetooth subpath (B-SP) flows. Two methods, namely B-OD
structure-based method and B-SP structure-based method are proposed to exploit the
structural knowledge of B-OD flows and B-SP flows, respectively. The B-OD method
is applicable for networks that have a good connectivity of Bluetooth scanners. For
instance, the sub-network comprising regions in and around the Brisbane inner city
has very good connectivity of Bluetooth scanners. In the situations where the
penetration rate of Bluetooth trajectories is low, the B-SP method can be implemented.
Both the proposed methods depend on Aimsun simulation for assignment, which is
why they are also referred to as assignment-based methods.
4.1.1 B-OD structure-based method (or B-OD method)
This method was designed from the structural perspective of B-OD flows and
further divided into two scenarios: an ideal scenario and near-ideal scenario. In the
ideal scenario, the Bluetooth OD demand sample rate was assumed to be η=20% of
the true OD flows, and this scenario is termed as “ideal” because of the following
reasons. First, the trips-ends inferred from Bluetooth are assumed to be the actual
origins and destinations of the trips. Second, the penetration rate of Bluetooth OD
flows is assumed to be fixed (here, η=20%), and thus the structure of the B-OD is
assumed to be an exact representation of the true OD structure.
However, the penetration rate of Bluetooth OD flows might not be same for all
OD pairs in reality. Thus, the near-ideal scenario was designed to include randomness
in the penetration rate of B-OD flows. To do this, 20% of total trajectories were
randomly selected to represent Bluetooth trips (trajectories), which were further used
to construct an observed B-OD matrix of random structure. Despite introducing
randomness in the B-OD flows, this approach is referred to as near-ideal because it is
assumed that the Bluetooth trip ends are the actual origins and destinations (in other
words it is assumed that Bluetooth trajectories infer complete paths traversed by
vehicles), and the penetration rate of Bluetooth trajectories is assumed to be known
(i.e. 20%).
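The difference between the two scenarios can be illustrated with a small Python sketch. Everything in it is hypothetical: the OD flows, the seed and the variable names are chosen only for illustration and are not taken from the study network or from the Matlab-Aimsun framework. In the ideal scenario every OD pair is sampled at exactly η=20%, whereas in the near-ideal scenario 20% of all trajectories are drawn at random, so the realised penetration rate varies from one OD pair to another.

```python
import random
from collections import Counter

# Hypothetical true OD flows (trips per OD pair); not taken from the thesis.
true_od = {("O1", "D1"): 100, ("O1", "D2"): 200,
           ("O2", "D1"): 300, ("O2", "D2"): 400}
eta = 0.20  # assumed known penetration rate

# Ideal scenario: every OD pair is observed at exactly the same rate.
ideal_bod = {od: eta * trips for od, trips in true_od.items()}

# Near-ideal scenario: one trajectory per trip, 20% of them sampled at random.
trajectories = [od for od, trips in true_od.items() for _ in range(trips)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(trajectories, k=round(eta * len(trajectories)))
near_ideal_bod = Counter(sample)

for od, trips in true_od.items():
    realised_rate = near_ideal_bod[od] / trips
    print(od, ideal_bod[od], near_ideal_bod[od], round(realised_rate, 3))
```

The total number of sampled trajectories matches the ideal case, but the per-OD-pair rates in the last column scatter around 20%, which is the randomness in structure that the near-ideal scenario is meant to introduce.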
Both scenarios were tested for different percentage connectivity (Ω) of OD pairs
with Bluetooth. However, the ideal case scenario was meant to be the proof of the
Chapter 4: Assignment-based OD Matrix Estimation: Exploiting the Structure of Bluetooth Trips 95
concept; it was run for Ω=25%, 50%, 75% and 100% with one
random prior OD demand only. The near-ideal scenario, on the other hand, was
tested for Ω=20%, 40%, 60%, 80%, and 100% with three random prior OD
demands.
4.1.2 B-SP structure-based method (or B-SP method)
The formulation of the B-SP method was similar to that of the B-OD method. The
difference is that it incorporates the structural knowledge directly from
Bluetooth subpath flows (B-SP) and not from B-OD. The underlying concept behind
the B-SP method formulation is that the actual observations of Bluetooth trajectories
might not be the complete representation of trips, and trip ends might not be the actual
ones. In other words, the Bluetooth paths are only subpaths of actual paths traversed
by vehicles. Thus, exploiting the structure of Bluetooth trips from the perspective of
subpaths is more realistic.
The fundamental difference between B-OD flows and B-SP flows is further
explained with an example network, as shown in Figure 4.1a. The true OD for the
sample network is shown in Figure 4.1c. The path flows per OD pair are shown in
Table 4.1.
Figure 4.1: Sample network (with installed BMS), paths and OD matrices
Table 4.1: Path flows for example network
OD pair   O1D1            O1D2            O2D1            O2D2
Path      P1   P2   P3    P4   P5   P6    P7   P8   P9    P10  P11  P12
Flow      20   30   50    100  75   25    100  150  50    150  150  100
Assuming Ω = 100%, all OD pairs (i.e., O1-D1 to O2-D2), and all paths (i.e.,
P1 to P12) are Bluetooth connected. This implies that the path connecting any OD pair
can be completely represented as a sequence of Bluetooth scanners. For instance, the
path P1 of O1-D1 (see Figure 4.1b) can be represented as Bv1–B5-Bv3. Here, Bv1 and
Bv3 indicate virtual Bluetooth scanners that directly connect to zonal centroids, and
thus are essential for building B-OD matrices. In other words, the B-OD method
depends on the complete sequence of Bluetooth inferred trajectories.
The ideal scenario considers B-OD to be 20% of the true OD and is shown in
Figure 4.1d, and the structures of true OD and B-OD (ideal scenario) are the same. The
B-OD for the near-ideal scenario (Figure 4.1e) is developed by randomly selecting
20% of total trajectories; that is, randomly selecting 200 out of 1,000 trips (note that
the sum of all OD flows in true OD matrix is 1,000). The random selection ensures the
structure of B-OD is random and differs from that of true OD.
To explain the concept of B-SP flows, assume that the virtual scanners (that is,
Bv1, Bv2, Bv3 and Bv4) are not present, and that the scanner B5 is either unavailable or not
working. In such situations, the Bluetooth trajectories are not the complete
representation of actual trips, and as such, they can only provide trip information at the
subpath level. For instance, in Figure 4.1a, trips through paths P1, P4, P7, and P10 are
not available (due to unavailability of B5 and the virtual scanners), and the available
subpaths are only B1-B3-B4, and B1-B2-B4.
The complete paths that pass through subpaths B1-B3-B4 are P2, P5, P8, and
P11. Similarly, the paths consisting of subpaths B1-B2-B4 are P3, P6, P9, and P12.
The true subpath flows for B1-B3-B4 and B1-B2-B4 are shown in Table 4.2. Thus, the
total subpath flows are (30+75+150+150) + (50+25+50+100) = 630.
The experiments related to the B-SP method (see Section 4.5.3) were conducted
for different penetration rates of Bluetooth trajectories, which can be explained with the
help of the same example as follows. In this example, a random selection of η = 10%
means that 63 out of 630 sub-trajectories (since each sub-trajectory corresponds to one unit
of subpath flow) are selected randomly. Suppose this yields B-SP flows of
20 and 43 for subpaths B1-B3-B4 and B1-B2-B4, respectively. Since it is a random
selection, the penetration rate of subpath (B-SP) flows for individual subpaths is also
random. For instance, the penetration of B-SP flows is 5% and 19% for flows in
subpaths B1-B3-B4 and B1-B2-B4, respectively, and are different from the overall η
=10% (see 3rd column of Table 4.2).
Table 4.2: Demonstrating the difference between true and Bluetooth subpath flows for the given example
Subpath     True subpath flows        B-SP flows
B1-B3-B4    30+75+150+150 = 405       20 (5% of 405)
B1-B2-B4    50+25+50+100 = 225        43 (19% of 225)
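The arithmetic of the example network can be reproduced with a short Python sketch. It aggregates the path flows of Table 4.1 into the two observable subpaths and then draws a random 10% sample of sub-trajectories. The function name `sample_bsp` and the random seed are illustrative assumptions, so the sampled B-SP flows will generally differ from the 20 and 43 quoted in the text.

```python
import random
from collections import Counter

# Path flows from Table 4.1 (P1..P12).
path_flows = {"P1": 20, "P2": 30, "P3": 50, "P4": 100, "P5": 75, "P6": 25,
              "P7": 100, "P8": 150, "P9": 50, "P10": 150, "P11": 150, "P12": 100}

# Paths that traverse each observable subpath when B5 and the virtual
# scanners are not available.
subpath_paths = {
    "B1-B3-B4": ["P2", "P5", "P8", "P11"],
    "B1-B2-B4": ["P3", "P6", "P9", "P12"],
}

# True subpath flow = sum of the flows of all paths using that subpath.
true_sp = {sp: sum(path_flows[p] for p in paths)
           for sp, paths in subpath_paths.items()}
total = sum(true_sp.values())


def sample_bsp(true_flows, eta, seed):
    """Randomly select a share eta of all sub-trajectories (hypothetical helper)."""
    rng = random.Random(seed)
    # One list entry per sub-trajectory (one unit of subpath flow).
    pool = [sp for sp, flow in true_flows.items() for _ in range(flow)]
    picked = rng.sample(pool, round(eta * len(pool)))
    return Counter(picked)


bsp = sample_bsp(true_sp, eta=0.10, seed=1)
for sp in true_sp:
    rate = 100 * bsp[sp] / true_sp[sp]
    print(f"{sp}: true={true_sp[sp]}, B-SP={bsp[sp]}, penetration={rate:.1f}%")
```

Although the overall selection rate is fixed at 10%, the per-subpath penetration printed in the last column varies from run to run, which is the effect the example is meant to illustrate.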
4.2 STUDY NETWORK AND DATA
To test the proposed methodology, the study network should have the following
properties:
1. It should be realistic and representative of the existing infrastructure;
2. It should have sufficient route choice options;
3. It should have a combination of at least two different types of road hierarchy
i.e. motorway and arterial;
4. OD pairs should have sufficient overlap between the paths;
5. It should have sufficient Bluetooth connectivity; and
6. It should have loop detectors located on major paths.
The analysis for this study was performed in the controlled traffic simulation
environment of Aimsun Next (2019). A synthetic Brisbane city network was built from
OpenStreetMap data imported into Aimsun Next (Figure 4.2a). It comprised 15
centroids, 24 loop detectors (red squares in Figure 4.2a), and 51 Bluetooth scanners
(blue circles in Figure 4.2a). The loop detectors are placed on major roadways such
as the Pacific Motorway, Clem Jones Tunnel, Coronation Drive, Inner City Bypass, and
Kelvin Grove Road. The OD matrix was designed at a zonal level equivalent to
Statistical Area 2 (SA2) (ASGS, 2017) and was 15 x 15 in size. Internal trips were
excluded from the analysis; thus, the total number of OD pairs considered was 15*15-15
= 210.
Figure 4.2: (a) Study site installed with Bluetooth scanners and loop detectors (b) spatial structure of Brisbane City core network
Figure 4.2b shows the spatial structure of Brisbane City, its neighbouring
suburbs, and the primary transport network. The 15 zonal centroids shown are: 1) West
End-South Bank-Highgate Hill; 2) Gabba; 3) Brisbane (BNE) Inner East; 4) New
Farm; 5) Fortitude Valley; 6) Spring Hill; 7) Central Business District (CBD); 8)
Newstead-Bowen Hills; 9) Kelvin Grove–Herston; 10) Red Hill–Milton–
Auchenflower; and five external zonal centroids; that is, 11) Ext-1, 12) Ext-2, 13) Ext-
3, 14) Ext-4, and 15) Ext-5, respectively.
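To check the efficiency of the proposed methods, the OD matrix estimates
resulting from these methods were compared with Xtrue, using RMSE
(Equation (47)), StrOD (Equation (48)), and GSSI (Equation (49)), as described below:
\mathrm{RMSE}(X, X_{true}) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(x_n - x_n^{true}\right)^2} (47)
\mathrm{StrOD}(X, X_{true}) = \frac{(X-\mu_X)^{T}(X_{true}-\mu_{X_{true}})}{\lVert X-\mu_X\rVert \, \lVert X_{true}-\mu_{X_{true}}\rVert} (48)
\mathrm{GSSI}(X, X_{true}) = \frac{1}{W}\sum_{w=1}^{W}\frac{(2\mu_{x_w}\mu_{x_w^{true}}+C_1)(2\sigma_{x_w x_w^{true}}+C_2)}{(\mu_{x_w}^2+\mu_{x_w^{true}}^2+C_1)(\sigma_{x_w}^2+\sigma_{x_w^{true}}^2+C_2)} (49)
Where, \mu_X and \mu_{X_{true}} are the means of the OD vectors X and Xtrue; x_w and
x_w^{true} are the OD flows from the wth geographical window of X and Xtrue; and \mu_{x_w} and \mu_{x_w^{true}}, and
\sigma_{x_w}^2 and \sigma_{x_w^{true}}^2 are the means and variances of x_w and x_w^{true}, respectively. See the
notations section for information about the other terms.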
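The three performance measures can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: StrOD is taken to be the Pearson correlation between the two OD vectors (the text describes it as a correlation-based structure component), GSSI is taken to be the mean of a standard SSIM score over geographical windows, and the stabilising constants `c1` and `c2` as well as the window grouping are hypothetical choices.

```python
import numpy as np


def rmse(x, x_true):
    """Root mean square error between two OD vectors."""
    x, x_true = np.asarray(x, float), np.asarray(x_true, float)
    return float(np.sqrt(np.mean((x - x_true) ** 2)))


def str_od(x, x_true):
    """Structural similarity as the correlation of two OD vectors (assumed form)."""
    x, x_true = np.asarray(x, float), np.asarray(x_true, float)
    dx, dt = x - x.mean(), x_true - x_true.mean()
    return float(dx @ dt / (np.linalg.norm(dx) * np.linalg.norm(dt)))


def gssi(od, od_true, row_groups, col_groups, c1=1e-10, c2=1e-10):
    """Mean SSIM over geographical windows (assumed form; c1, c2 are placeholders)."""
    od, od_true = np.asarray(od, float), np.asarray(od_true, float)
    scores = []
    for rows in row_groups:
        for cols in col_groups:
            a = od[np.ix_(rows, cols)].ravel()
            b = od_true[np.ix_(rows, cols)].ravel()
            mu_a, mu_b = a.mean(), b.mean()
            cov = np.mean((a - mu_a) * (b - mu_b))
            num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
            den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
            scores.append(num / den)
    return float(np.mean(scores))


# Toy 4x4 OD matrix with two higher-level zones of two zones each.
x_true = np.array([[0, 40, 10, 5],
                   [30, 0, 20, 15],
                   [10, 25, 0, 60],
                   [5, 10, 50, 0]], float)
x_est = 0.2 * x_true            # same structure, different magnitude
groups = [[0, 1], [2, 3]]

print("RMSE :", rmse(x_est.ravel(), x_true.ravel()))
print("StrOD:", str_od(x_est.ravel(), x_true.ravel()))
print("GSSI :", gssi(x_est, x_true, groups, groups))
```

A uniformly scaled matrix keeps a StrOD of one because correlation ignores magnitude, while RMSE and the SSIM-based score still penalise the difference in flow levels.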
Because GSSI depends on the knowledge of higher zonal level OD pairs, the 15
statistical zones shown in Figure 4.2a were further classified into higher-level zones
based on their geographical proximity (see Figure 4.2b). The OD matrix that is split
into geographical windows is illustrated in Figure 4.3 (refer to Section 3.3 for further
details about the geographical window concept).
Figure 4.3: Splitting the study OD matrix into geographical windows
The study network was loaded with a true OD vector (Xtrue), and the link flows
that resulted from Xtrue were taken as the observed link flows at each link l. The link
flows from the selected 24 links (l = 1, ..., L) together form the observed link flow vector. Note that
the analysis conducted in this chapter assumed no errors in observed link flows. The
prior OD matrix considered for both methods was generated using Equation (50).
Xprior = (50)
By generating three replications of Xprior from Equation (50), three random prior
OD matrices were obtained, namely Xprior1, Xprior2, and Xprior3. The RMSE, StrOD,
and GSSI values of the three prior OD matrices with respect to Xtrue are shown in Table
4.3.
Table 4.3: Comparison of Xprior with Xtrue for all three replications
Replication      RMSE (Xprior, Xtrue)    StrOD (Xprior, Xtrue)    GSSI (Xprior, Xtrue)
Replication-1    14.02                   0.8142                   0.7248
Replication-2    13.24                   0.8178                   0.7406
Replication-3    12.34                   0.7964                   0.7297
4.2.1 Development of observed B-OD flows
The B-OD method depends on observations of Bluetooth OD (B-OD) flows.
Because this method is further categorised into ideal and near-ideal scenarios,
the observed B-OD flows are generated differently for each. Equation (51) represents
the way they are generated for the ideal scenario of the B-OD method.
[Figure 4.3 labels: the rows and columns of the OD matrix, Z1 to Z15, are listed in the order West End-South Bank-Highgate Hill, Ext-5, Gabba, BNE Inner East, New Farm, Valley, Spring Hill, CBD, Newstead-Bowen Hills, Ext-2, Ext-4, Ext-1, Kelvin Grove-Herston, Red Hill-Milton-Auchenflower, Ext-3, and are grouped into higher-level zones HZ1 to HZ6]
B-OD = η · X*true (51)
The true OD flows for OD pairs that are Bluetooth connected are represented by
X*true. Thus, for 100% connectivity, X*true = Xtrue. From Equation (51), it can be said
that StrOD (B-OD, X*true) = 1 for the ideal scenario. However, for the near-ideal scenario,
the approach described in Figure 4.4 is adopted to generate random B-OD flows. After
averaging over 100 replications, the StrOD (B-OD, Xtrue) was observed to be 0.8778.
Figure 4.4: Generation of B-OD flows for the near-ideal scenario of the B-OD method
4.2.2 Development of observed B-SP flows
For the B-SP method, the B-SP flows are used as observations in addition to
the observed link counts in the objective function. Since this method tests the
effectiveness of Bluetooth SP-based structural information, the observed B-SP flows
are generated for different penetration rates (η) of Bluetooth sub-trajectories, as shown
in the flowchart depicted in Figure 4.5. The B-SP flow vector corresponding to a given
penetration rate η is produced by averaging over I=5 replications.
[Figure 4.4 flowchart: Start; Xtrue is loaded into the Aimsun model to produce BMS data and Bluetooth trajectories; η=20% of the trajectories are randomly selected and OD matrices are developed for Replication-1 to Replication-100; the resulting B-OD flows are averaged; Stop]
Figure 4.5: Generation of B-SP flows for the B-SP method
The development of the B-SP vector is explained as follows:
1. First, assign Xtrue in the study network model in Aimsun Next. The resulting
trajectories are stored as a complete sequence of scanner IDs. The first and
last scanner IDs of the complete trajectory are directly linked to the actual
origin and destination zones of the complete trip. The resulting number of
trajectories in this study was 5,273.
2. Convert the trajectories to sub-trajectories by removing the first and last
scanner IDs from the complete trajectory sequence (this is done because
actual Bluetooth trajectories do not always begin and end at the true trip
ends). The resulting number of sub-trajectories after this process was 3,875
for this study.
3. Now, randomly select η% of the sub-trajectories. The sub-trajectories with
the same sequence of BMS IDs are identified as a Bluetooth subpath, and
their total number gives the subpath flow. The vector of such subpath
flows, averaged over 5 replications, forms the B-SP flow vector.
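The three steps above can be sketched in Python as follows. This is an illustrative sketch only: the trajectory data are made up, the helper names are hypothetical, and it assumes that a trajectory left with no scanner after trimming is discarded (the thesis does not state how the count drops from 5,273 to 3,875).

```python
import random
from collections import Counter


def to_sub_trajectories(trajectories):
    """Drop the first and last scanner ID; discard trajectories left empty (assumption)."""
    subs = [tuple(t[1:-1]) for t in trajectories]
    return [s for s in subs if s]


def bsp_flows(sub_trajs, eta, replications=5, seed=0):
    """Average subpath flows over several random selections of a share eta."""
    rng = random.Random(seed)
    n_pick = round(eta * len(sub_trajs))
    totals = Counter()
    for _ in range(replications):
        # Identical scanner sequences are counted as the same subpath.
        totals.update(rng.sample(sub_trajs, n_pick))
    return {sp: count / replications for sp, count in totals.items()}


# Made-up complete trajectories: virtual scanners (Bv*) mark the trip ends.
trajectories = (
    [["Bv1", "B1", "B3", "B4", "Bv3"]] * 6
    + [["Bv1", "B1", "B2", "B4", "Bv4"]] * 4
    + [["Bv1", "Bv3"]] * 2           # nothing remains after trimming
)

subs = to_sub_trajectories(trajectories)
flows = bsp_flows(subs, eta=0.5)
for subpath, flow in flows.items():
    print("-".join(subpath), flow)
```

Using tuples for the trimmed sequences makes them hashable, so a `Counter` can group sub-trajectories that share the same sequence of scanner IDs.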
4.3 BI-LEVEL FRAMEWORK: MATLAB - AIMSUN INTEGRATION
The OD matrix estimation algorithms for both the B-OD method and B-SP
method were based on a bi-level framework, in which the objective function was
minimised in the upper level and user-equilibrium assignment was performed in the lower level. The
codes for optimisation were written in MATLAB (2017 version), and Aimsun Next
(2019) was used to run the microscopic simulation. The default parameter values were
used for both demand scenarios and experiments in Aimsun Next. A Python script,
Autorun.py (see Appendix E), was written to integrate the optimisation model (in
MATLAB) with the traffic assignment (in Aimsun Next). However, MATLAB is the
primary platform: it writes OD data into the Aimsun Next OD format, runs the
simulation, executes the Python script, and reads the simulation outputs for the
subsequent optimisation process. The integration of both platforms is shown in Figure 4.6.
Figure 4.6: MATLAB-Aimsun integration framework
4.4 B-OD METHOD: OD MATRIX ESTIMATION USING B-OD STRUCTURE
This section details the approach adopted for the B-OD method, along with the
design of the experiments. The B-OD method differs from the traditional traffic counts-based
approach in the way the objective function is modified using additional structural
knowledge from the B-OD matrix (see Figure 4.7 for the comparison between the traditional
method and the proposed B-OD method).
Figure 4.7: (a) Traditional link counts-based method vs (b) proposed B-OD method
The basic difference between the two flow charts lies in the two additional input
boxes, that is, the Bluetooth OD vector and the Bluetooth OD connectivity matrix IX
(which represents the random Ω% of OD pairs that are Bluetooth connected), and in
the objective function formulation.
4.4.1 Objective function formulation
This study proposes an approach to integrate the observed B-OD structural
information into the traditional formulation, as shown in Equation (52).
(52)
; (52a)
Where the first argument of StrOD is the vector of OD flows from Bluetooth observations (B-OD). The matrix IX
is an incidence matrix that selects only those random Ω% of OD pairs that are
Bluetooth connected. Thus, when it is multiplied with X, the vector X is transformed
to X*. The formulation comprises two sub-functions: the deviation of user
equilibrium link flows (Y) from the observed link flows, and the structural comparison
of the estimated (X*) and observed Bluetooth OD flows, expressed through
StrOD (B-OD, X*). See Equation (53) for the formulation of StrOD (B-OD, X*).
StrOD (B-OD, X*) = (53)
In Equation (53), the mean vector of B-OD has dimensions equal to those of B-OD, with each cell value
equal to the mean B-OD flow; the mean vector of X* is defined similarly. Here, the constant c is used to convert the
similarity measure StrOD (B-OD, X*) into a dissimilarity measure, i.e. c - StrOD (B-OD, X*).
The dissimilarity measure acts as a scaling factor to the main objective function, i.e. the
deviation of traffic counts. The value of StrOD (B-OD, X*) lies between -1 and
1, which means that this part of the objective function ranges from c+1 to c-1. For the
objective function to be stable, the value of c needs to be greater than 1. Assuming c=2, the second
part of the formulation becomes 2 - StrOD (B-OD, X*). This means that when the
structures of B-OD and X* are the same, StrOD (B-OD, X*) is equal to 1 and the objective function reduces to the
traditional link counts deviation. This implies that the
simulated trip distribution matches the actual distribution, and simply minimising
traffic count deviations should be sufficient to estimate the OD matrix. On the other hand, when
the structures of B-OD and X* are completely opposite, StrOD (B-OD, X*) reaches its minimum
value of -1, and the objective function is multiplied by (c+1)^2, that is,
by 9 if c=2. This implies that the deviation between traffic counts is
amplified by c+1=3 times due to extreme variations in the distribution of trips. In other
words, the proposed objective function accounts for any structural differences between the
estimated/simulated and observed trip distributions from the perspective of B-OD
flows.
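The scaling behaviour described above can be checked numerically with a small Python sketch. The exact form of Equation (52) is not reproduced here; the sketch assumes the objective is the sum of squared link-flow deviations multiplied by (c - StrOD)^2, which matches the limiting cases stated in the text (a factor of 1 for identical structures and (c+1)^2 = 9 for opposite structures when c=2). The function names, the index array standing in for IX, and the toy numbers are hypothetical.

```python
import numpy as np


def str_od(b_od, x_star):
    """Correlation-based structural similarity of two OD vectors (assumed form)."""
    db, dx = b_od - b_od.mean(), x_star - x_star.mean()
    return float(db @ dx / (np.linalg.norm(db) * np.linalg.norm(dx)))


def objective(y_sim, y_obs, b_od, x, connected, c=2.0):
    """Assumed objective: link-count deviation scaled by (c - StrOD)^2."""
    x_star = x[connected]                    # plays the role of IX multiplied by X
    link_dev = float(np.sum((y_sim - y_obs) ** 2))
    scale = (c - str_od(b_od, x_star)) ** 2
    return scale * link_dev, link_dev


x = np.array([10.0, 20.0, 30.0, 40.0])       # current OD estimate
connected = np.arange(4)                     # all OD pairs Bluetooth connected
y_obs = np.array([100.0, 50.0])              # observed link flows
y_sim = np.array([110.0, 45.0])              # simulated link flows

same_structure = 0.2 * x                     # B-OD proportional to X: StrOD = 1
opposite_structure = 0.2 * x[::-1]           # reversed ordering: StrOD = -1

z_same, dev = objective(y_sim, y_obs, same_structure, x, connected)
z_opp, _ = objective(y_sim, y_obs, opposite_structure, x, connected)
print(dev, z_same, z_opp)
```

With identical structures the scaled objective equals the plain link-count deviation, and with fully opposite structures it is nine times larger, as the text describes for c=2.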
The advantages of the proposed formulation are two-fold. First, the Bluetooth
OD observations are up-to-date. Second, the need for unknown weight factors is
relaxed because StrOD ( , X*) is a normalised value. Note: since Bluetooth
observations are only a fraction, StrOD (which is the structure component of GSSI)
suits well because it captures the structure through correlation coefficient and does not
require Bluetooth penetration rates that are generally unknown.
4.4.2 OD matrix estimation algorithm
This study adopted the gradient descent algorithm proposed by Spiess (1990) for
the OD matrix estimation. A property of this algorithm is that it always follows the
direction of the largest yield with the goal of minimising the objective function. It was
coded in MATLAB and run for the different experiments of the B-OD method, which are
further discussed in Section 4.4.3.
The gradient descent optimisation method is based on two major factors: search
direction and step-size. One of the ways to arrive at the search direction is by
computing the gradient of the objective function at the current solution. The
step-size parameter, on the other hand, determines the number of iterations required for
convergence. Lower step-size values ensure that the descent path is smooth
but are computationally expensive. Higher step-size values, in contrast, can
lead to higher values of the objective function, and convergence could be affected.
Thus, both search direction and step-size play a crucial role in gradient descent
optimisation. For the proposed objective function formulation, the search
direction and step-size are discussed in detail as follows.
4.4.2.1 Search direction
The gradient of the objective function is computed using Equation (54) and its
sub-equations.
(54)
(54a)
The derivative of StrOD (B-OD, IX X) with respect to X is obtained as
follows:
First, the StrOD formulation is simplified into Equation (55), and the
derivative of StrOD (B-OD, IX X) with respect to X is then given by Equation (55a).
(55)
(55a)
4.4.2.2 Step-size
After determining the search direction, the step-size needs to be defined for
updating the OD matrix Xk for the next iteration, k+1, i.e. Xk+1. The updating step is
performed using Equation (56). Here, Z1 and X refer to the values
corresponding to iteration k.
= (56)
(56a)
Because OD flows are always non-negative, the optimum step-size can be derived by
solving Equation (57) subject to the constraint shown in Equation (56a).
(57)
However, in the current study, the bold-driver technique is used to adapt the
value of the step-size to the value of the objective function. This technique is commonly
used in annealing the learning rate (Battiti, 1989; Vogl, Mangis, Rigler, Zink, & Alkon,
1988). According to this approach, a prior value of the step-size is chosen and then
modified in every iteration based on the value of the objective function in consecutive
iterations. For instance, if the value of the objective function in the current step, k, is not
greater than that in the (k-1)th step (i.e., Z1(k) <= Z1(k-1)), then the step-size is multiplied by an increase factor.
Otherwise, the optimisation parameters (i.e., Xk) are reset to those of the (k-1)th iteration (i.e.,
Xk=Xk-1) and the step-size is multiplied by a decrease factor. The values of the increase and decrease factors are generally chosen
based on the examination of convergence.
For the experiments discussed in this chapter, the increase factor was chosen as
either 1.25 or 1.5, and the decrease factor was tested for values of 0.7, 0.8 and 0.9.
In the current study, the maximum number of iterations was predetermined to be 20,
which represents the termination criterion for the optimisation problem (in the past,
researchers such as Bullejos, et al. (2014) also ran the estimation for 20
iterations).
The sequence of steps for the proposed B-OD method is explained below:
Step 1: Choose the prior OD demand (Xprior), the observed B-OD flows, and the
observed link flows.
Step 2: Set k=1, Xk = Xprior, and the step-size to its prior value.
Step 3: Run the Aimsun_matrix.m function (refer to Function 2, Appendix D) in
MATLAB, which converts Xk to an OD matrix in Aimsun format and then
loads the study network in Aimsun with this demand, followed by
dynamic user equilibrium (DUE) assignment. After executing Aimsun.m (see
Function 3, Appendix D), the following simulation outputs are obtained: a)
SQLite database of link flows (see Function 4, Appendix D); and b)
assignment matrix text file (see Function 6, Appendix D).
Step 4: Aggregate the link flows (Yk) from the SQLite database for one hour
(note that the OD matrix input is also for one hour).
Step 5: For the Bluetooth connectivity rate of Ω%, estimate X*.
Step 6: Compute Z1 using Equation (52), and calculate the gradient of Z1
using Equation (54).
Step 7: For k>1, if Z1(k) <= Z1(k-1), then multiply the step-size by the increase
factor and go to Step 8; else multiply the step-size by the decrease factor, set
Xk=Xk-1, and go to Step 3.
Step 8: Set k=k+1 and update the demand (Xk) for the next iteration using Equation
(56).
Step 9: Check the termination criterion. If it is not met, go to Step 3; otherwise,
terminate the optimisation, and the value of Xk is the final estimated OD matrix
(Xest).
Step 10: Compare the quality of the estimated OD matrix, Xest, with Xtrue using
RMSE, StrOD and GSSI.
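The overall loop in Steps 1 to 10 can be illustrated with a self-contained Python sketch. Several simplifications are assumed and should not be read as the thesis implementation: the Aimsun DUE assignment is replaced by a fixed linear assignment matrix `A`, the objective takes the assumed form of link-count deviation scaled by (c - StrOD)^2, the gradient is computed numerically instead of with Equation (54), and the update is a plain additive gradient step with clipping at zero instead of Equation (56). All names and numbers are hypothetical.

```python
import numpy as np


def str_od(b, x):
    """Correlation-based structural similarity (assumed form of StrOD)."""
    db, dx = b - b.mean(), x - x.mean()
    return float(db @ dx / (np.linalg.norm(db) * np.linalg.norm(dx)))


def z1(x, A, y_obs, b_od, connected, c=2.0):
    """Stand-in for Equation (52): link deviation scaled by (c - StrOD)^2."""
    dev = np.sum((A @ x - y_obs) ** 2)       # A @ x replaces the Aimsun assignment
    return float((c - str_od(b_od, x[connected])) ** 2 * dev)


def num_grad(f, x, h=1e-4):
    """Central-difference gradient, used instead of the analytical Equation (54)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


def estimate(x_prior, A, y_obs, b_od, connected,
             step=1e-3, up=1.25, down=0.8, max_iter=20):
    """Gradient descent with bold-driver step-size adaptation."""
    f = lambda x: z1(x, A, y_obs, b_od, connected)
    x, z = x_prior.copy(), f(x_prior)
    history = [z]
    for _ in range(max_iter):
        x_new = np.maximum(x - step * num_grad(f, x), 0.0)   # keep flows non-negative
        z_new = f(x_new)
        if z_new <= z:                # improvement: accept and lengthen the step
            x, z, step = x_new, z_new, step * up
        else:                         # deterioration: keep previous X, shorten the step
            step *= down
        history.append(z)
    return x, history


rng = np.random.default_rng(0)
x_true = rng.uniform(20, 100, 6)             # 6 OD pairs
A = rng.uniform(0, 1, (3, 6))                # 3 observed links: under-determined
y_obs = A @ x_true
b_od = 0.2 * x_true                          # ideal scenario: fixed 20% sample
connected = np.arange(6)                     # 100% Bluetooth connectivity
x_prior = x_true * rng.uniform(0.7, 1.3, 6)

x_est, history = estimate(x_prior, A, y_obs, b_od, connected)
print("objective: start", history[0], "end", history[-1])
```

Because a trial point is only accepted when the objective does not increase, the recorded objective values never rise from one iteration to the next; the increase and decrease factors of 1.25 and 0.8 are taken from the ranges quoted in the text.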
4.4.3 Experiments – ideal and near-ideal scenarios of B-OD method
The experiments for the B-OD method were divided into ideal and near-ideal
scenarios and compared with the traditional traffic counts-based formulation, as
discussed below:
Traditional case: Here, the deviation between the observed and user-equilibrium
link flows was minimised.
The ideal scenario of the B-OD method was further tested for four different cases
(Case-1 to Case-4) where the structural comparison of the B-OD flows with OD matrix
estimates was deployed in the formulation for different values of Ω. The four different
cases were:
B-OD-ideal Case-1: Here, Ω was 25%. Thus, 53 OD pairs were randomly
selected to provide B-OD structural knowledge.
B-OD-ideal Case-2: Here, Ω was 50%. Thus, 105 OD pairs were randomly
selected to provide B-OD structural knowledge.
B-OD-ideal Case-3: Here, Ω was 75%. Thus, 158 OD pairs were randomly
selected to provide B-OD structural knowledge.
B-OD-ideal Case-4: Here, Ω was 100%. Thus, all 210 OD pairs contributed
towards B-OD structural information.
The experiments for the ideal scenario were tested for only one prior OD matrix;
that is, Xprior1.
The near-ideal scenario of B-OD method was further divided into five different
cases (Case-1 to Case-5) for different values of Ω.
B-OD-near-ideal, case-1: Here, Ω was 20%. Thus, structural information was
taken from only 42 randomly selected OD pairs.
B-OD-near-ideal, case-2: Here, Ω was 40%. Thus, only 84 OD pairs were
randomly selected.
B-OD-near-ideal, case-3: Here, Ω was 60%. Thus, only 126 OD pairs were
randomly selected.
B-OD-near-ideal, case-4: Here, Ω was 80%. Thus, only 168 OD pairs were
randomly selected.
B-OD-near-ideal, case-5: Here, Ω was 100%. Thus, all 210 OD pairs were
selected.
The experiments for the near-ideal scenario were tested for three prior OD
matrices: Xprior1, Xprior2, and Xprior3.
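The number of OD pairs selected in each case, and the way a connectivity matrix picks them out of the OD vector, can be sketched as follows. The sketch assumes that the number of selected pairs is Ω% of 210 rounded half up (which reproduces 53 and 158 for 25% and 75%) and that IX is a 0/1 matrix with one row per selected OD pair; the function name and seed are hypothetical.

```python
import numpy as np


def connectivity_matrix(n_pairs, omega_pct, seed=0):
    """Randomly select omega_pct % of OD pairs and build a 0/1 selection matrix."""
    rng = np.random.default_rng(seed)
    n_sel = (omega_pct * n_pairs + 50) // 100      # integer round-half-up
    chosen = np.sort(rng.choice(n_pairs, size=n_sel, replace=False))
    ix = np.zeros((n_sel, n_pairs))
    ix[np.arange(n_sel), chosen] = 1.0             # one 1 per row
    return ix, chosen


n_pairs = 15 * 15 - 15                             # 210 OD pairs without internal trips
ideal_counts = [connectivity_matrix(n_pairs, w)[0].shape[0] for w in (25, 50, 75, 100)]
near_ideal_counts = [connectivity_matrix(n_pairs, w)[0].shape[0] for w in (20, 40, 60, 80, 100)]
print(ideal_counts, near_ideal_counts)

# Applying the matrix to an OD vector keeps only the Bluetooth-connected pairs.
x = np.arange(n_pairs, dtype=float)
ix, chosen = connectivity_matrix(n_pairs, 25)
x_star = ix @ x
```

Integer arithmetic is used for the rounding because Python's built-in `round` rounds halves to the nearest even number, which would give 52 instead of 53 for 25% of 210.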
4.4.4 Results for the ideal scenario of B-OD method
In this section, the estimated OD matrices from all four cases of the ideal
scenario of B-OD method, traditional method, and prior OD (Xprior1) are compared
using performance measures – RMSE, StrOD, and GSSI, as further discussed in the
following sections.
4.4.4.1 RMSE results
Figure 4.8 shows that the RMSE results of the ideal-scenario cases were better
than those of Xprior (14.02) and the traditional approach (12.34). The percentage
improvement with respect to Xprior (as shown in Figure 4.9) was 11.98% for the
traditional method. On the other hand, the improvement increased from 20.11% to
31.38% as Ω increased from 25% to 100%, respectively for the ideal scenario cases.
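The percentage improvements quoted for RMSE follow from a simple relative-change calculation, sketched below with the RMSE values reported for Figure 4.8. The function name is a hypothetical label; since a lower RMSE is better, improvement is measured as the reduction relative to the baseline.

```python
def pct_improvement(baseline, value):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline - value) / baseline


rmse_prior = 14.02
rmse_traditional = 12.34
rmse_ideal = {25: 11.2, 50: 10.41, 75: 9.83, 100: 9.62}   # keyed by Ω in %

# Improvement with respect to the prior OD matrix.
wrt_prior = {omega: round(pct_improvement(rmse_prior, v), 2)
             for omega, v in rmse_ideal.items()}
traditional_wrt_prior = round(pct_improvement(rmse_prior, rmse_traditional), 2)

# Improvement with respect to the traditional method.
wrt_traditional = {omega: round(pct_improvement(rmse_traditional, v), 2)
                   for omega, v in rmse_ideal.items()}

print(traditional_wrt_prior, wrt_prior, wrt_traditional)
```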
Figure 4.8: RMSE w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method
[Figure 4.8 data, RMSE by experiment: Prior 14.02; Traditional 12.34; Ω=25% 11.2; Ω=50% 10.41; Ω=75% 9.83; Ω=100% 9.62]
Figure 4.9: Percentage of improvement in RMSE w.r.t. Xprior for traditional and ideal scenario cases of the B-OD method
Compared to the traditional method, Figure 4.10 illustrates that the results for
the cases of the ideal scenario showed improvement in RMSE from 9.24% to 22.04%
as Ω increased from 25% to 100%, respectively.
Figure 4.10: Percentage of improvement in RMSE w.r.t. traditional method for ideal scenario case of B-OD method
4.4.4.2 StrOD results
The results based on the StrOD measure demonstrate the
level of structural consistency maintained within the OD matrix estimates for the
traditional, as well as ideal scenario experiments. Figure 4.11 shows that the StrOD
(Xprior, Xtrue) was 0.8142. Although the RMSE results showed improvement (11.98%)
in the previous section, the quality (in terms of structure) of the OD matrix estimated
from the traditional method did not show any improvement. Instead, there was a
decrease in the value from 0.8142 to 0.8107 in Figure 4.11. This could be attributed to
the fact that the traditional link counts-based method is highly under-specified, and as
such, although there was improvement in the RMSE, the structure of the matrix could
not be improved. The percentage degradation of the OD matrix structure was 0.43%
for the traditional method (see Figure 4.12).
[Figure 4.9 data, % improvement in RMSE w.r.t. prior OD: Traditional 11.98; Ω=25% 20.11; Ω=50% 25.75; Ω=75% 29.89; Ω=100% 31.38]
[Figure 4.10 data, % improvement in RMSE w.r.t. traditional OD: Ω=25% 9.24; Ω=50% 15.64; Ω=75% 20.34; Ω=100% 22.04]
On the other hand, the results showed an increase in the quality of OD estimates
as Ω increased from 25% to 100% for the ideal scenario cases from 0.8440 to 0.8880,
respectively (see Figure 4.11). The percentage improvement in the structure of the OD
matrix with respect to Xprior (except the traditional case), and with respect to traditional
method are shown in Figure 4.12 and Figure 4.13, respectively.
Figure 4.11: StrOD w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method
Figure 4.12: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method
Figure 4.13: Percentage of improvement in the StrOD w.r.t. traditional method for the ideal scenario cases of the B-OD method
[Figure 4.11 data, StrOD by experiment: Prior 0.8142; Traditional 0.8107; Ω=25% 0.844; Ω=50% 0.8716; Ω=75% 0.8805; Ω=100% 0.888]
[Figure 4.12 data, % improvement in StrOD w.r.t. prior OD: Traditional -0.43; Ω=25% 3.66; Ω=50% 7.05; Ω=75% 8.14; Ω=100% 9.06]
[Figure 4.13 data, % improvement in StrOD w.r.t. traditional OD: Ω=25% 4.11; Ω=50% 7.51; Ω=75% 8.61; Ω=100% 9.53]
4.4.4.3 GSSI results
Figure 4.14 demonstrates that GSSI (Xprior, Xtrue) was 0.7248, which improved
to 0.7556 (with 4.25% improvement in Figure 4.15) using the traditional method. The
value of GSSI and its percentage improvement was even better for other cases of the
ideal scenario, as shown in Figure 4.14 and Figure 4.15, respectively. Figure 4.16
shows that the ideal scenario cases outperformed the traditional method with a higher
percentage improvement of 10.35% for the Ω=100% scenario.
Figure 4.14: GSSI w.r.t. Xtrue for the traditional and ideal scenario cases of the B-OD method
Figure 4.15: Percentage of improvement in the GSSI w.r.t. Xprior for the traditional and ideal scenario cases of the B-OD method
Figure 4.16: Percentage of improvement in the GSSI w.r.t. traditional method for the ideal scenario cases of the B-OD method
[Figure 4.14 data, GSSI (axis label: MGeoSSIM) by experiment: Prior 0.7248; Traditional 0.7556; Ω=25% 0.8026; Ω=50% 0.8248; Ω=75% 0.8269; Ω=100% 0.8338]
[Figure 4.15 data, % improvement in GSSI w.r.t. prior OD: Traditional 4.25; Ω=25% 10.73; Ω=50% 13.80; Ω=75% 14.09; Ω=100% 15.04]
[Figure 4.16 data, % improvement in GSSI w.r.t. traditional OD: Ω=25% 6.22; Ω=50% 9.16; Ω=75% 9.44; Ω=100% 10.35]
4.4.5 Results for the near-ideal scenario of B-OD method
This section presents the comparison of the estimated OD matrices from all five
cases of the near-ideal scenario of the B-OD method, traditional method, and the Xprior
made using the performance measures of RMSE, StrOD, and GSSI. These experiments
were conducted for three different replications of Xprior; that is, Xprior1, Xprior2, and
Xprior3, as discussed in Table 4.3. Similar to the ideal scenario observations, the near-
ideal scenario experiments also showed improvement with respect to the traditional
method. The performance measures of RMSE, StrOD, and GSSI discussed below also
demonstrate this.
4.4.5.1 RMSE results
For experiments with Xprior1, RMSE was reduced from 14.02 (prior) to
10.71 (Ω=100%) (see Figure 4.17). Similarly, the results for the experiments initiated
with Xprior2 and Xprior3 also showed improvement.
Figure 4.17: RMSE results w.r.t. Xtrue- Near-ideal, B-OD method
The percentage improvements in RMSE with respect to the prior OD matrices Xprior1,
Xprior2, and Xprior3 are illustrated in Figure 4.18 for the traditional method and all near-
ideal experiments. The percentage improvements for Ω=20% (13.34, 12.46, and 6.23)
and Ω=40% (13.41, 18.28, and 8.17) were better than the improvement for the
traditional method (11.98, 10.88, and 3.88). However, a significant increase in
improvement was observed at Ω=60% (18.90, 21.22, and 11.49), and continued to be
relatively stable for Ω=80% (22.40, 21.37, and 13.03) and Ω=100% (23.61, 22.05, and
13.03), respectively.
[Figure 4.17 data, RMSE (X, Xtrue)]
          Prior   Traditional  Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   14.02   12.34        12.15   12.14   11.37   10.88   10.71
Xprior2   13.24   11.8         11.59   10.82   10.43   10.41   10.32
Xprior3   12.36   11.88        11.59   11.35   10.94   10.75   10.75
Figure 4.18: The percentage of improvement in the RMSE w.r.t. Xprior for near-ideal B-OD method
The percentage improvements in RMSE with respect to the traditional method
are shown in Figure 4.19 for all near-ideal cases. The percentage improvement in
RMSE became significant at Ω=60% for Xprior1 and Xprior3, and at Ω=40% for Xprior2.
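The percentage improvements reported in this section can be reproduced from the tabulated values. The following is a minimal sketch, assuming the improvement is expressed relative to the reference value (the prior OD or the traditional estimate); the function name and the `lower_is_better` flag are illustrative and not part of the thesis. The same helper covers measures for which larger values are better, such as StrOD and GSSI.

```python
def pct_improvement(reference: float, value: float, lower_is_better: bool) -> float:
    """Percentage improvement of `value` relative to `reference`.

    For error measures such as RMSE a decrease is an improvement;
    for similarity measures such as StrOD or GSSI an increase is.
    """
    change = reference - value if lower_is_better else value - reference
    return 100.0 * change / reference


# RMSE values quoted in the text for Xprior1 (prior, traditional, Ω=100%).
rmse_prior, rmse_traditional, rmse_bod = 14.02, 12.34, 10.71

# Improvement of the near-ideal B-OD estimate w.r.t. the prior OD.
print(round(pct_improvement(rmse_prior, rmse_bod, lower_is_better=True), 2))        # 23.61
# Improvement of the same estimate w.r.t. the traditional method.
print(round(pct_improvement(rmse_traditional, rmse_bod, lower_is_better=True), 2))  # 13.21

# StrOD for Xprior1: 0.8142 (prior) and 0.8604 (Ω=100%); larger is better.
print(round(pct_improvement(0.8142, 0.8604, lower_is_better=False), 2))             # 5.67
```

The three printed values agree with the entries reported for Xprior1 at Ω=100%.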
Figure 4.19: The percentage of improvement in the RMSE w.r.t. traditional method for the near-ideal B-OD method
4.4.5.2 StrOD results
For experiments with Xprior1, the StrOD value improved from 0.8142 to 0.8604
(see Figure 4.20). Similarly, the results for experiments initiated with Xprior2 and Xprior3
also witnessed improvements.
Data for Figure 4.18 (vertical axis: % improvement in RMSE w.r.t. prior OD):
          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   11.98         13.34   13.41   18.90   22.40   23.61
Xprior2   10.88         12.46   18.28   21.22   21.37   22.05
Xprior3   3.88          6.23    8.17    11.49   13.03   13.03
Data for Figure 4.19 (vertical axis: % improvement in RMSE w.r.t. traditional method):
          Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   1.54    1.62    7.86    11.83   13.21
Xprior2   1.78    8.31    11.61   11.78   12.54
Xprior3   2.44    4.46    7.91    9.51    9.51
Figure 4.20: StrOD results w.r.t. Xtrue for the near-ideal B-OD method
The StrOD plots in Figure 4.21 showed a sudden rise in the structural
improvements at Ω=60% for Xprior1 (3.97) and Xprior3 (5.04), and at Ω=40% for Xprior2
(3.91), respectively.
Figure 4.21: The percentage of improvement in the StrOD w.r.t. Xprior for the near-ideal B-OD method
With respect to the traditional method, the percentage improvements
in StrOD are shown in Figure 4.22 for all near-ideal cases. A significant structural
enhancement was seen for Ω greater than or equal to 60% for Xprior1 and Xprior3, and
for Ω greater than or equal to 40% for Xprior2.
Data for Figure 4.20 (vertical axis: StrOD (X, Xtrue)):
          Prior    Traditional   Ω=20%    Ω=40%    Ω=60%    Ω=80%    Ω=100%
Xprior1   0.8142   0.8107        0.8219   0.824    0.8465   0.8567   0.8604
Xprior2   0.8178   0.8196        0.8302   0.8498   0.8599   0.8608   0.863
Xprior3   0.7964   0.8054        0.8172   0.8244   0.8365   0.8413   0.842
Data for Figure 4.21 (vertical axis: % improvement in StrOD w.r.t. prior OD):
          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   -0.43         0.95    1.20    3.97    5.22    5.67
Xprior2   0.22          1.52    3.91    5.15    5.26    5.53
Xprior3   1.13          2.61    3.52    5.04    5.64    5.73
Figure 4.22: The percentage of improvement in the StrOD w.r.t. traditional method for the near-ideal, B-OD method
4.4.5.3 GSSI results
For experiments with Xprior1, GSSI improved from 0.7248 to 0.7969 (see Figure 4.23),
and a similar improvement was observed for experiments with Xprior2 and Xprior3.
Figure 4.23: GSSI results w.r.t. Xtrue for the near-ideal B-OD method
The comparison results with Xprior show that the near-ideal experiments
performed better than the traditional method (see Figure 4.24). The improvement was
more significant for Ω>=40% for Xprior2 and for Ω>=60% for Xprior1 and Xprior3,
respectively.
Data for Figure 4.22 (vertical axis: % improvement in StrOD w.r.t. traditional method):
          Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   1.38    1.64    4.42    5.67    6.13
Xprior2   1.29    3.68    4.92    5.03    5.30
Xprior3   1.47    2.36    3.86    4.46    4.54
Data for Figure 4.23 (vertical axis: GSSI (X, Xtrue)):
          Prior    Traditional   Ω=20%    Ω=40%    Ω=60%    Ω=80%    Ω=100%
Xprior1   0.7248   0.7545        0.7698   0.7729   0.7847   0.7946   0.7969
Xprior2   0.7406   0.7705        0.7781   0.794    0.7946   0.7998   0.8033
Xprior3   0.7297   0.7536        0.7615   0.7692   0.7820   0.7829   0.7842
Figure 4.24: The percentage of improvement in the GSSI w.r.t. Xprior for the near-ideal B-OD method
Data for Figure 4.24 (vertical axis: % improvement in GSSI w.r.t. prior OD):
          Traditional   Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   4.25          6.21    6.64    8.26    9.63    9.95
Xprior2   4.04          5.06    7.21    7.29    7.99    8.47
Xprior3   3.28          4.36    5.41    7.17    7.29    7.47
The percentage improvements in GSSI with respect to the traditional method are shown in
Figure 4.25. The results illustrate that the rate of improvement increased from Ω=40%
for Xprior1 and Xprior2, and was almost stable after Ω=60% for Xprior3.
Figure 4.25: The percentage of improvement in the GSSI w.r.t. traditional method for the near-ideal B-OD method
Data for Figure 4.25 (vertical axis: % improvement in GSSI w.r.t. traditional method):
          Ω=20%   Ω=40%   Ω=60%   Ω=80%   Ω=100%
Xprior1   1.88    2.29    3.85    5.16    5.47
Xprior2   0.99    3.05    3.13    3.80    4.26
Xprior3   1.05    2.07    3.77    3.89    4.06
4.4.6 Discussion
The experiments conducted for the ideal and near-ideal scenarios of the B-OD
method demonstrate the ability of Bluetooth trips (in the form of B-OD structure) to
improve the quality of OD estimates.
Although the ideal scenarios showed significant improvement, it is unlikely that
the structure of the observed B-OD ( ) is error free. As a remedy for this, the
near-ideal experiments were conducted based on a random B-OD structure. Despite
the introduced randomness, the near-ideal scenarios demonstrated significant
improvement in the quality of OD estimates as the percentage of Bluetooth
connectivity increased from Ω=20% to Ω=100%. For instance, RMSE reduced from
14.02 to 9.62 for Ω=100% in the ideal scenario (see Figure 4.8), and the random
B-OD structure also performed fairly well, with RMSE improving from 14.02 to 10.71
for Ω=100% in the near-ideal scenario for Xprior1 (see Figure 4.17).
Moreover, the randomness in the observed B-OD reduced the value of the StrOD ( , Xtrue)
from 1 (ideal scenario) to 0.8778 (near-ideal scenario). Thus, the StrOD value of the
estimated OD matrices could reach at most 0.8778 in the near-ideal scenario, compared
with 1 in the ideal scenario. For the same reason, the StrOD
values for Ω=100% could not be improved beyond 0.8778 (for instance, StrOD=
0.8604, 0.863, and 0.8420 for Ω=100% for the experiments based on Xprior1, Xprior2,
and Xprior3, respectively), whereas the StrOD reached 0.8880 for the
ideal scenario (Figure 4.11). Similar improvements were observed for GSSI as well.
The percentage improvement in the RMSE, StrOD, and GSSI was higher for
Ω>=50% in the ideal scenario. Similarly, for the near-ideal experiments, sudden
improvements were observed at Ω=40% and Ω=60% among the three different
replications. This shows that even though the Bluetooth observations were random, a
significant improvement in the quality of OD estimates could be achieved for a
Bluetooth connectivity rate of between 40% and 60%. In other words, the overall quality
of the OD matrix estimate could be significantly improved if at least 84 to 126 OD pairs
out of a total of 210 OD pairs were randomly connected by Bluetooth sensors.
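The counts of 84 and 126 OD pairs follow directly from applying Ω=40% and Ω=60% to the 210 OD pairs. The sketch below reproduces this arithmetic and also illustrates one possible way of drawing the connected pairs at random; uniform sampling without replacement, the seed, and the function names are assumptions made for illustration only.

```python
import random

TOTAL_OD_PAIRS = 210  # total number of OD pairs stated in the text


def connected_pair_count(omega_pct: int, total: int = TOTAL_OD_PAIRS) -> int:
    """Number of OD pairs connected by Bluetooth at a connectivity of omega_pct percent."""
    return omega_pct * total // 100


def sample_connected_pairs(omega_pct: int, seed: int, total: int = TOTAL_OD_PAIRS) -> list:
    """Randomly choose which OD pairs (by index) are connected.

    Uniform sampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(total), connected_pair_count(omega_pct, total)))


counts = {omega: connected_pair_count(omega) for omega in (20, 40, 60, 80, 100)}
print(counts)  # {20: 42, 40: 84, 60: 126, 80: 168, 100: 210}

pairs_at_40 = sample_connected_pairs(40, seed=1)
print(len(pairs_at_40))  # 84
```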
4.5 B-SP METHOD: OD MATRIX ESTIMATION USING B-SP STRUCTURE
This section discusses the development of the B-SP method followed by the set
of experiments and results to demonstrate its efficiency compared to the traditional
method (Figure 4.7a). The proposed B-SP method incorporated the observed subpath
flows in the form of the B-SP vector ( ) within the objective function formulation, as
shown in Figure 4.26.
Figure 4.26: Proposed B-SP method
[Flowchart: starting with X = Xprior, a prior step length, and k=1, the Aimsun model
provides the estimated link flows, the assignment, and the path-proportion matrix;
together with the observed link flows and the Bluetooth subpath flows vector, these are
used to update the step length and the OD matrix until convergence is achieved.]
The differences between the B-SP method and B-OD method are outlined below:
• While the B-OD method assumes a complete sequence of trajectories, the
B-SP method computes statistics on sub-trajectories that are randomly
selected to develop Bluetooth subpath flows. Because the B-SP method
depends on subpath flows, the percentage connectivity (Ω) of Bluetooth
connected OD pairs is not relevant.
• Because the penetration rate of Bluetooth trips is low, random, and unknown,
the experiments in the B-SP method are performed for different penetration
rates of Bluetooth trajectories (η = 10% to 50%), while the B-OD method is
based on a fixed penetration rate of 20%. In reality, a penetration rate of
Bluetooth trajectories greater than 20% is very unlikely. However, for
demonstration purposes the maximum value of η is chosen to be 50%. Refer
to Section 4.7 for the B-SP method at lower penetration rates.
• In the B-OD method, the structural consistency in Xk (i.e., X for the kth iteration) is
maintained by minimising the structural deviation between X*k and B* in
every iteration. However, in the B-SP method, the structural deviation
between the estimated subpath flows Pk and the observed B-SP vector is minimised
in every iteration, k. Note that Pk is the result of assigning Xk over the network.
• Finally, in the B-OD method, the maximum improvement in the quality of
OD estimates is limited by the quality of the observed B-OD structure used in the
objective function; that is, StrOD ( , X*true). On the other hand, in the B-SP method,
the maximum improvements in the final OD estimates are controlled by
StrSP ( , Ptrue) for different values of ƞ (as shown in Figure 4.27).
Figure 4.27: StrSP ( , Ptrue) for different ƞ%.
4.5.1 Objective function formulation
The objective function (Equation (58)) is expressed in terms of the deviation
between the observed and estimated link flows and the structural comparison of the
estimated B-SP flows (P) with the observed B-SP flows ( ) expressed as StrSP ( , P).
(58)
; (58a)
StrSP ( , )= (58b)
The matrix AP in Equation (58a) is the path proportion matrix that maps OD
flows to path flows. In Equation (58b), each mean vector has the same dimensions as
the corresponding B-SP flow vector, with every cell equal to the mean of that vector;
one mean vector is defined for the observed flows and one for the estimated flows.
The StrSP values range between -1 and 1. The stability of this combined formulation
can be explained in a similar fashion to that of the B-OD method. When the structures
of the observed and estimated B-SP flows are the same, StrSP is equal to “1”, and the
objective function reduces to a
traditional link count deviation; that is, . On the other hand, when
StrSP ( , ) reaches its minimum value of “-1”, the objective function becomes
(assuming c=2), a physical interpretation of which was explained in
Section 4.4.2.
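The description of the mean vectors and the range of -1 to 1 suggests that StrSP behaves like a Pearson-type correlation between the observed and estimated subpath flow vectors. The sketch below illustrates that reading only; the functional form, the function name `strsp`, and the example flow values are assumptions and should not be taken as the exact expression of Equation (58b).

```python
import math


def strsp(p_obs, p_est):
    """Pearson-type correlation between two subpath flow vectors.

    Assumed form: each vector is centred on its own mean, and the inner product
    of the centred vectors is normalised by the product of their norms.
    """
    if len(p_obs) != len(p_est):
        raise ValueError("flow vectors must have the same length")
    mean_obs = sum(p_obs) / len(p_obs)
    mean_est = sum(p_est) / len(p_est)
    dev_obs = [v - mean_obs for v in p_obs]
    dev_est = [v - mean_est for v in p_est]
    numerator = sum(a * b for a, b in zip(dev_obs, dev_est))
    denominator = (math.sqrt(sum(a * a for a in dev_obs))
                   * math.sqrt(sum(b * b for b in dev_est)))
    if denominator == 0.0:
        raise ValueError("StrSP is undefined for a constant flow vector")
    return numerator / denominator


observed = [12.0, 5.0, 0.0, 8.0]               # hypothetical Bluetooth subpath counts
same_structure = [5.0 * v for v in observed]   # same pattern at a higher sampling level
mirrored = [12.0 - v for v in observed]        # pattern reversed around the maximum

print(strsp(observed, same_structure))  # close to 1: identical structure
print(strsp(observed, mirrored))        # close to -1: opposite structure
```

Under this reading, StrSP depends only on the pattern of the subpath flows and not on their scale, which is consistent with using a low and unknown penetration rate of Bluetooth observations.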
4.5.2 OD matrix estimation algorithm
This study also adopted a gradient descent algorithm for the B-SP method. The
search direction was defined by computing the gradient of the objective function, Z2
(Equation (58)), at the current solution. The prior step-size ( ) was chosen as 0.005
and was adjusted in every iteration using the bold-driver technique with increase and
decrease parameters of 1.05 and 0.9, respectively. The termination criterion for the
OD matrix adjustment was 20 iterations.
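The step-size control described above can be illustrated on a toy problem. The sketch below applies gradient descent with one common variant of the bold-driver rule (grow the step by 1.05 after a successful iteration, otherwise reject the move and shrink the step by 0.9), starting from a step of 0.005 and stopping after 20 iterations. The objective used here is a plain link-count deviation on a hypothetical two-link, three-OD-pair network, not the objective function Z2 of this study, and all names and numbers are illustrative.

```python
def objective(x, a, y):
    """Toy link-count deviation 0.5 * ||A x - y||^2 and its residual (not Z2)."""
    residual = [sum(aij * xj for aij, xj in zip(row, x)) - yi for row, yi in zip(a, y)]
    return 0.5 * sum(r * r for r in residual), residual


def gradient(residual, a, n):
    """Gradient of the toy objective with respect to x: A^T (A x - y)."""
    return [sum(a[i][j] * residual[i] for i in range(len(a))) for j in range(n)]


def bold_driver_descent(x0, a, y, step=0.005, grow=1.05, shrink=0.9, iterations=20):
    """Gradient descent with bold-driver step adaptation for a fixed number of iterations."""
    x = list(x0)
    value, residual = objective(x, a, y)
    history = [value]
    for _ in range(iterations):
        grad = gradient(residual, a, len(x))
        candidate = [xj - step * gj for xj, gj in zip(x, grad)]
        new_value, new_residual = objective(candidate, a, y)
        if new_value <= value:
            # Successful move: accept it and be bolder in the next iteration.
            x, value, residual = candidate, new_value, new_residual
            step *= grow
        else:
            # Unsuccessful move: keep the current solution and reduce the step.
            step *= shrink
        history.append(value)
    return x, history, step


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # hypothetical assignment proportions
Y_OBS = [30.0, 50.0]                     # hypothetical observed link flows
X_PRIOR = [8.0, 25.0, 26.0]              # hypothetical prior OD flows

x_est, history, final_step = bold_driver_descent(X_PRIOR, A, Y_OBS)
print(history[0], history[-1], final_step)
```

With such a small initial step, the objective decreases in every iteration, so the step length only grows; a larger initial step would exercise the shrink branch as well.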
The gradient of the objective function, Z2 was computed using Equation (59)
and its subsets.
(59)
(59b)
The derivative of StrSP ( , APX) with respect to X is obtained as
follows:
First, Equation (59a) is simplified into Equation (60), and the derivative of StrSP
( , APX) with respect to X is then given by Equation (60a).
(60)
(60a)
After determining the search direction (Equation (59)), the updating step is
performed using Equation (61). Here, Z2 and X in refer to the values
corresponding to iteration k.
= (61)
(61a)
4.5.3 Experiments for B-SP method
This section discusses the experiments for the B-SP method, designed for
different penetration rates (η) of Bluetooth trajectories, which are then
compared with the traditional traffic counts-based approach. The experiments for the
B-SP method were divided into the following five cases:
B-SP case-1: η = 10% with 5 replications. Thus, only 388 out of
3875 sub-trajectories were randomly selected in each replication.
B-SP case-2: η = 20% with 5 replications. Thus, only 775 out of
3875 sub-trajectories were randomly selected in each replication.
B-SP case-3: η = 30% with 5 replications. Thus, only 1163 out
of 3875 sub-trajectories were randomly selected in each replication.
B-SP case-4: η = 40% with 5 replications. Thus, only 1550 out
of 3875 sub-trajectories were randomly selected in each replication.
B-SP case-5: η = 50% with 5 replications. Thus, only 1938 out
of 3875 sub-trajectories were randomly selected in each replication.
The experiments for the B-SP method were tested for three prior OD matrices;
that is, Xprior = Xprior1, Xprior2, and Xprior3.
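The sample sizes in the five cases above correspond to η% of the 3875 sub-trajectories, rounded to the nearest integer. The sketch below reproduces these sizes and shows one possible way of drawing the five replications; the half-up rounding rule, uniform sampling without replacement, the seed, and the function names are assumptions of this illustration.

```python
import random

POOL_SIZE = 3875       # sub-trajectories available, as stated in the text
N_REPLICATIONS = 5     # replications per case, as stated in the text


def sample_size(eta_pct: int, pool_size: int = POOL_SIZE) -> int:
    """eta_pct percent of the pool, rounded to the nearest integer (halves round up)."""
    return (eta_pct * pool_size + 50) // 100


def draw_replications(eta_pct, n_rep=N_REPLICATIONS, pool_size=POOL_SIZE, seed=0):
    """Draw n_rep independent random selections of sub-trajectory indices."""
    rng = random.Random(seed)
    k = sample_size(eta_pct, pool_size)
    return [rng.sample(range(pool_size), k) for _ in range(n_rep)]


sizes = {eta: sample_size(eta) for eta in (10, 20, 30, 40, 50)}
print(sizes)  # {10: 388, 20: 775, 30: 1163, 40: 1550, 50: 1938}

replications = draw_replications(10)
print(len(replications), len(replications[0]))  # 5 388
```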
4.5.4 Results for B-SP method
The results (Xest) of all five cases were compared with the OD estimates from
the traditional method and the prior OD (Xprior) using RMSE, StrOD, and GSSI and
are discussed in the following sections.
4.5.4.1 RMSE results
The plot illustrated in Figure 4.28 shows a gradual improvement in RMSE from
14.02 (prior) to 11.29 (for η=50%) for Xprior1. Similarly, the results for the experiments
initiated with Xprior2 and Xprior3 also showed improvement.
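For reference, the RMSE between an estimated and a true OD matrix can be computed as in the sketch below. It assumes the usual definition (root of the mean squared cell-wise difference over all cells, including the diagonal); whether intra-zonal cells are excluded in this study is not stated here, and the example matrices are hypothetical.

```python
import math


def rmse(x_est, x_true):
    """Root mean square error between two OD matrices given as lists of rows."""
    est = [v for row in x_est for v in row]
    true = [v for row in x_true for v in row]
    if not est or len(est) != len(true):
        raise ValueError("matrices must be non-empty and of the same shape")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / len(est))


x_true = [[0.0, 4.0], [4.0, 0.0]]   # hypothetical true OD flows
x_est = [[0.0, 2.0], [2.0, 0.0]]    # hypothetical estimated OD flows
print(rmse(x_est, x_true))          # sqrt(2), approximately 1.414
```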
Figure 4.28: RMSE w.r.t. Xtrue, B-SP experiments
The rate of improvement with respect to Xprior (Figure 4.29) began to rise at η =
10%, and a slight improvement was observed for η > 30% for all three prior ODs.
Similar observations were found with respect to the traditional method (Figure 4.30).
Figure 4.29: Percentage of improvement in RMSE w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.30: Percentage of improvement in the RMSE w.r.t. traditional method
Data for Figure 4.28 (vertical axis: RMSE; horizontal axis: experiments):
          Prior   Traditional   ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   14.02   12.34         11.84   11.76   11.68   11.41   11.29
Xprior2   13.24   11.80         11.34   11.26   11.16   11.08   10.93
Xprior3   12.36   11.88         11.42   11.31   11.30   11.22   11.17
Data for Figure 4.29 (vertical axis: % improvement in RMSE w.r.t. prior OD):
          Traditional   ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   11.98         15.55   16.12   16.69   18.62   19.47
Xprior2   10.88         14.32   14.94   15.71   16.28   17.46
Xprior3   3.88          7.58    8.47    8.56    9.25    9.64
Data for Figure 4.30 (vertical axis: % improvement in RMSE w.r.t. traditional OD):
          ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   4.05    4.70    5.35    7.54    8.51
Xprior2   3.87    4.56    5.42    6.07    7.39
Xprior3   3.84    4.77    4.87    5.59    5.99
4.5.4.2 StrOD results
The StrOD results in Figure 4.31 show that the quality of the structure improved
as η increased from 10% to 50%. The rate of improvement appeared to be
better than that of the traditional method, even for a penetration rate of η=10%
(Figure 4.32). However, the next rise in the rate was observed for η>=30%. On the
other hand, some decent improvements were observed with respect to the traditional
method at η=10%, and they were significant for η>=30% (Figure 4.33).
Figure 4.31: StrOD w.r.t. Xtrue for the prior, traditional, and B-SP experiments
Figure 4.32: Percentage of improvement in the StrOD w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.33: Percentage of improvement in the StrOD w.r.t. traditional method
Data for Figure 4.31 (vertical axis: StrOD; horizontal axis: experiments):
          Prior    Traditional   ƞ=10%    ƞ=20%    ƞ=30%    ƞ=40%    ƞ=50%
Xprior1   0.8142   0.8107        0.8307   0.8327   0.8405   0.8415   0.8438
Xprior2   0.8178   0.8196        0.8383   0.8391   0.8405   0.8442   0.8468
Xprior3   0.7952   0.8054        0.8227   0.8236   0.8280   0.8289   0.8316
Data for Figure 4.32 (vertical axis: % improvement in StrOD w.r.t. prior OD):
          Traditional   ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   -0.43         2.03    2.27    3.23    3.35    3.64
Xprior2   0.22          2.50    2.60    2.77    3.23    3.54
Xprior3   1.28          3.46    3.57    4.13    4.23    4.58
Data for Figure 4.33 (vertical axis: % improvement in StrOD w.r.t. traditional method):
          ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   2.47    2.71    3.68    3.80    4.08
Xprior2   2.28    2.38    2.55    3.00    3.32
Xprior3   2.14    2.26    2.81    2.91    3.25
4.5.4.3 GSSI results
The GSSI results in Figure 4.34 also demonstrate a decent improvement as η
increased from 10% to 50%. The results appear to be better than those of the traditional
method, even for a penetration rate of η=10%. The rate of improvement also seemed to be
stable for η>=30% (see Figure 4.35 and Figure 4.36).
Figure 4.34: GSSI w.r.t. Xtrue for the prior, traditional, and B-SP experiments
Figure 4.35: Percentage of improvement in GSSI w.r.t. Xprior for the traditional and B-SP experiments
Figure 4.36: Percentage of improvement in the GSSI w.r.t. traditional method for B-SP experiments
Data for Figure 4.34 (vertical axis: GSSI; horizontal axis: experiments):
          Prior    Traditional   ƞ=10%    ƞ=20%    ƞ=30%    ƞ=40%    ƞ=50%
Xprior1   0.7248   0.7545        0.7696   0.7741   0.7807   0.7808   0.7813
Xprior2   0.7406   0.7705        0.7780   0.7834   0.7851   0.7861   0.7864
Xprior3   0.7297   0.7536        0.7651   0.7671   0.7728   0.7739   0.7741
Data for Figure 4.35 (vertical axis: % improvement in GSSI w.r.t. prior OD):
          Traditional   ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   4.10          6.18    6.80    7.71    7.73    7.80
Xprior2   4.04          5.06    5.78    6.01    6.15    6.19
Xprior3   3.28          4.86    5.12    5.91    6.06    6.08
Data for Figure 4.36 (vertical axis: % improvement in GSSI w.r.t. traditional method):
          ƞ=10%   ƞ=20%   ƞ=30%   ƞ=40%   ƞ=50%
Xprior1   2.00    2.60    3.47    3.49    3.55
Xprior2   0.98    1.67    1.89    2.03    2.07
Xprior3   1.53    1.79    2.55    2.70    2.72
4.5.5 Discussion
The results of the B-SP method indicate that the quality of OD estimates improved
with the increase in the penetration rate (η) of Bluetooth trajectories. The
experiments in the B-SP method were more realistic compared to those of the B-OD
method, because the Bluetooth inferred trips were not complete sequences of trips
(thus, they were termed subpaths), with the actual trip ends generally being
unobserved. The knowledge about Bluetooth subpath flows improved the estimates
even for η=10% (i.e., 388 out of 3875 sub-trajectories). This shows that a few samples
of Bluetooth trajectories, for example through key corridors such as major arterials and
motorways, should serve the purpose of enhancing the quality of OD estimates for
large scale urban networks. From the results of the experiments, significant
improvement with respect to both the prior OD and the traditional method was observed
for η >30% in the RMSE and StrOD comparisons, while in the GSSI comparisons the
improvement was almost stable for η >=30%.
4.6 COMPARISON OF B-OD AND B-SP METHODS
Although the designs of the experiments for the B-OD (based on Ω) and B-SP
(based on η) methods were different, they could be compared when Ω=100% for the
B-OD method and η =20% for the B-SP method. The B-OD method at Ω=100% and a 20%
trajectory penetration rate implies that 1,054 out of 5,273 trajectories (of complete
length) were used. On the other hand, η =20% for the B-SP method implies that 775
sub-trajectories were used. Intuitively, the B-OD method should therefore yield better
results than the B-SP method because the B-OD matrices were developed from complete
trajectories.
Comparisons of B-OD-ideal Case-4 (i.e., Ω=100%), B-OD-near-ideal case-5
(i.e., Ω=100%) with B-SP case-2 (i.e., η =20%) for Xprior1 are shown in terms of
RMSE, StrOD, and GSSI in Figure 4.37, Figure 4.38, and Figure 4.39, respectively.
The B-OD-ideal case-4 (i.e., Ω=100%) results were superior to the results from
other methods because the structure of the B-OD was an exact representation of the
true OD. The results from B-OD-near-ideal case-5 (i.e., Ω=100%) were next to those of
the ideal case despite having a random B-OD structure. The results from the B-SP
case-2 (i.e., η =20%) were next in the order, followed by the traditional method and
the prior OD. The computational time required for both the B-OD and the B-SP methods
was roughly 15 minutes for each experiment.
Figure 4.37: RMSE comparison of the B-OD (ideal, near-ideal) and B-SP methods with prior OD and traditional methods
Figure 4.38: StrOD comparison of B-OD (ideal, near-ideal) and B-SP methods
Figure 4.39: GSSI comparison of the B-OD (ideal, near-ideal) and B-SP methods
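Data for Figures 4.37 to 4.39 (Xprior1; horizontal axis: experiments; column labels
matched to the bar values from the results reported above):
        Prior    Traditional   B-SP (η=20%)   B-OD near-ideal (Ω=100%)   B-OD ideal (Ω=100%)
RMSE    14.02    12.34         11.18          10.71                      9.62
StrOD   0.8142   0.8107        0.8491         0.8604                     0.888
GSSI    0.7248   0.7556        0.7841         0.7969                     0.8338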
4.7 B-SP METHOD FOR LOWER PENETRATION RATES OF BLUETOOTH TRAJECTORIES
Since the penetration rate of Bluetooth trajectories (ƞ) is generally very low and
random, the B-SP vector ( ) can have low flow values for most subpaths. One way to
address the low sample rate is to generate a B-SP vector by combining B-SP
flows observed over several days with similar travel patterns. For instance, the observed
B-SP flows from “D” regular working Mondays can be used to develop a consolidated
vector of observed B-SP flows (denoted by ) for a typical working Monday.
Thus, consolidating observations from several days with similar travel patterns in
a controlled environment can be achieved through the following step-by-step
procedure (refer to Figure 4.40).
Step-1: Develop a database of “n=5” OD matrices that are structurally similar to
each other. One of these matrices is the base OD matrix (Xtrue) and the rest
are generated by randomly perturbing Xtrue with a standard deviation of 5%. Set
i=1.
Step-2: Load the Aimsun network with the ith OD matrix, Xtrue,i, and run r=5
replications. This implies n*r = 25 simulations in total. The resulting
trajectories from each replication are stored as a complete sequence of scanner
IDs. The first and last scanner IDs of the complete trajectory are directly linked to
the actual origin and destination zones of the complete trip. The total number of
trajectories is 5,273 for this study.
Step-4: Convert the trajectories to sub-trajectories by de-selecting a few scanner
IDs from the beginning and end of the complete trajectory sequence (this is
done because actual Bluetooth trajectories do not always begin and end at
true trip ends). From the resulting sub-trajectories, identify the total number of
unique subpaths. In this study, the maximum number of unique subpaths identified
is 113.
Step-5: Since the penetration rate (ƞ %) of Bluetooth trajectories is very low, the
B-SP vector for each simulation is generated for a range of ƞ % = 2.5%, 5%, 7.5%
and 10%. To mimic the randomness of a real-world scenario, ƞ % of the Bluetooth
trajectories are randomly selected from the total pool of sub-trajectories (3,875 for
this study) and are used to generate the B-SP vector for the ith OD matrix and rth
replication ( ) of size 113*1. Note that a random selection of ƞ% might not
cover all subpaths. This implies that some of the subpaths can contain zero flow
values.
Step-6: Combine the subpath flows from all r=5 replications to create a
consolidated vector of B-SP flows for the ith OD matrix ( ). Each
B-SP vector can be considered as a representation of Bluetooth subpath flows
(ƞ%) from a particular day. In other words, we have a database from D=25 different
days with similar traffic patterns for a particular ƞ%.
Step-7: Set i=i+1. If i is less than or equal to n=5, then GO TO Step-2. Else GO TO
Step-8.
Step-8: Combine the B-SP flow vectors from each OD matrix to develop the consolidated
vector of observed B-SP flows ( ) and terminate the simulation.
Step-9: Repeat Step-1 to Step-8 for the rest of the ƞ values.
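The consolidation procedure in the steps above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the Aimsun replications are replaced by a hypothetical generator of trajectories along a six-scanner corridor (`fake_replication`), the trimming rule in `to_subtrajectory` is invented for the example, and the consolidation is taken to be a simple average, first over replications and then over OD matrices. None of these names or choices come from the study itself.

```python
import random
from collections import Counter

N_OD, N_REP = 5, 5   # n OD matrices and r replications (Step-1 and Step-2)


def fake_replication(rng, n_trips=400):
    """Hypothetical stand-in for one simulation run: trips as tuples of scanner IDs."""
    trips = []
    for _ in range(n_trips):
        start = rng.randint(0, 2)
        end = rng.randint(start + 3, 5)
        trips.append(tuple(range(start, end + 1)))
    return trips


def to_subtrajectory(trajectory, rng):
    """Step-4: drop up to one scanner ID from each end (illustrative trimming rule)."""
    head, tail = rng.randint(0, 1), rng.randint(0, 1)
    return tuple(trajectory[head:len(trajectory) - tail])


def bsp_vector(subtrajs, subpath_index, eta_pct, rng):
    """Step-5: subpath flow counts from a random eta_pct sample of sub-trajectories."""
    k = int(len(subtrajs) * eta_pct / 100 + 0.5)
    counts = Counter(rng.sample(subtrajs, k))
    return [counts.get(subpath, 0) for subpath in subpath_index]


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def consolidated_bsp(simulations, eta_pct, seed=0):
    """simulations[i][r] holds the complete trajectories of OD matrix i, replication r."""
    rng = random.Random(seed)
    subtrajs = [[[to_subtrajectory(t, rng) for t in rep] for rep in od]
                for od in simulations]
    subpath_index = sorted({sp for od in subtrajs for rep in od for sp in rep})
    # Step-6: average over the replications of each OD matrix.
    per_od = [average([bsp_vector(rep, subpath_index, eta_pct, rng) for rep in od])
              for od in subtrajs]
    # Step-8: average over the OD matrices.
    return subpath_index, average(per_od)


data_rng = random.Random(42)
simulations = [[fake_replication(data_rng) for _ in range(N_REP)] for _ in range(N_OD)]
subpaths, flows = consolidated_bsp(simulations, eta_pct=2.5)
print(len(subpaths), round(sum(flows), 6))
```

Because every replication contributes the same number of sampled sub-trajectories, the consolidated vector keeps the same total flow as a single day (10 trips here, i.e. 2.5% of 400), while subpaths missed on one day can still receive a non-zero average from the others.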
Figure 4.40: Generation of consolidated B-SP flows vector, ( ) for a particular ƞ%
[Flowchart: for each OD matrix Xtrue,i (i = 1 to n), the Aimsun model is run for
replications Rep-1 to Rep-r; the resulting trajectories (Traj-1 to Traj-r) are converted
to sub-trajectories (Subtraj-1 to Subtraj-r), a percentage of sub-trajectories is
selected, and the unique subpaths give the B-SP flows of each replication. These are
averaged per OD type and then over the “n” OD types with “r” replications each.]
4.7.1 Experiments for B-SP method (lower penetration rates)
Here, the experiments are conducted using the traditional traffic counts-based approach
and four different penetration rates (η) for each Xprior. Thus, 5 cases for 3 prior OD
scenarios imply 15 experiments in total. The 5 cases are described as follows:
1. Traditional case: Only observed link flows are used for OD estimation.
2. B-SP case-1: Observed link flows and observed Bluetooth subpath flows at ƞ =
2.5% are used in OD estimation.
3. B-SP case-2: Observed link flows and observed Bluetooth subpath flows at ƞ =
5% are used in OD estimation.
4. B-SP case-3: Observed link flows and observed Bluetooth subpath flows at ƞ =
7.5% are used in OD estimation.
5. B-SP case-4: Observed link flows and observed Bluetooth subpath flows at ƞ =
10% are used in OD estimation.
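The resulting experimental design can be enumerated as follows. This is only an illustrative sketch; the dictionary keys and the use of `None` to mark the traditional case are choices made for the example.

```python
from itertools import product

PRIORS = ("Xprior1", "Xprior2", "Xprior3")
CASES = (
    ("Traditional", None),   # observed link flows only
    ("B-SP case-1", 2.5),    # link flows plus subpath flows at the given penetration rate (%)
    ("B-SP case-2", 5.0),
    ("B-SP case-3", 7.5),
    ("B-SP case-4", 10.0),
)

experiments = [
    {"prior": prior, "case": name, "eta_pct": eta, "use_subpath_flows": eta is not None}
    for prior, (name, eta) in product(PRIORS, CASES)
]

print(len(experiments))  # 15
```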
4.7.2 Results for B-SP method (lower penetration rates)
The quality of the OD estimates (Xest) from all experiments is assessed using the
goodness of fit criteria described in the sections below.
4.7.2.1 RMSE results
The plot illustrated in Figure 4.41 shows a gradual improvement in RMSE
from 14.02 (prior) to 11.34 (for η=10%) for the set of experiments initiated with
Xprior1. Similarly, the results for the experiments initiated with Xprior2 and Xprior3
also demonstrated improvement.
Figure 4.41: RMSE for all experiments compared with prior OD
Based on the average of the RMSE results from the three prior OD
scenarios, the percent improvements in RMSE for all 5 cases are illustrated
in Figure 4.42. The traditional case (based on link flows only) showed an
8.89% improvement in RMSE, which increased significantly to 13.41% for ƞ=2.5%
(consolidation of 25 days). The results showed a gradual improvement for the rest of the
penetration rates.
Figure 4.42: Average percentage of improvement in RMSE w.r.t. Xprior for all cases
4.7.2.2 StrOD results
The results shown in Figure 4.43 demonstrate that there is structural
improvement in the OD estimates as η increased from 2.5% to 10%. Figure 4.43
also highlights that the traditional traffic counts-based approach could not bring any
significant structural improvements in the OD estimates unless additional information
from Bluetooth subpath flows is introduced.
Figure 4.43: StrOD for all experiments compared with prior OD
Data for Figure 4.42 (vertical axis: % RMSE improvement; horizontal axis: cases):
Traditional           8.89
ƞ% = 2.5%, 25 days    13.41
ƞ% = 5%, 25 days      13.87
ƞ% = 7.5%, 25 days    14.29
ƞ% = 10%, 25 days     15.26
Based on the average of the StrOD results from the three prior OD
scenarios, the percent improvement in StrOD for all 5 cases is illustrated
in Figure 4.44. The rates of improvement for ƞ =2.5% to ƞ =10% are
better than that of the traditional method. The traditional method could achieve only a
0.36% improvement in StrOD, which increased sharply to 2.72% for ƞ=2.5%,
and then to 3.09%, 3.15%, and 3.64% for ƞ=5%, 7.5%, and 10%, respectively.
Figure 4.44: Average percentage improvement in StrOD w.r.t. Xprior for all cases
4.7.3 Discussion
The goodness of fit measurements, namely RMSE and StrOD,
showed significant improvement with respect to both the prior OD and the traditional
method even at lower penetration rates of Bluetooth trips (i.e. ƞ =2.5% observed from
25 days). The results for ƞ >2.5% were better than those for ƞ =2.5%. However, in
practice, ƞ =2.5% is more likely than ƞ =10%, and the significant improvement
in the results at ƞ =2.5% demonstrates the practical significance of the proposed
methodology. For instance, a few samples of Bluetooth trajectories through key
corridors that serve higher traffic demand, such as major arterials and motorways,
should serve the purpose of enhancing the quality of OD estimates for large scale urban
networks.
The trend of improvement in both RMSE and StrOD is the same for
the cases based on Bluetooth subpath flows. However, the traditional method did not
show any significant structural enhancement (see Figure 4.43) although the
RMSE measure improved (see Figure 4.41). This shows that preserving
the structure of the OD matrix using additional path-based information from Bluetooth
short trips (which we referred to as subpath flows) helps to direct OD convergence
towards a better solution estimate instead of ‘getting stuck’ in local optima.
4.8 SUMMARY
This chapter presented methods for integrating the structural information about
Bluetooth trips into the objective function of the bi-level formulation with the purpose
of improving the quality of OD matrix estimates. To achieve this, the study proposed
the B-OD and B-SP methods. The B-OD method is applicable for networks (such as
Brisbane city) that have a good connectivity of Bluetooth scanners. However, when
the penetration rate of Bluetooth trajectories is low (for instance, if the OD needs to be
estimated for a sub-network defined at the outer suburbs of Brisbane city), the B-SP
method is more practical.
The proof of the concept was first tested by assuming that the structure of
Bluetooth trips exactly represented the structure of the true OD through an ideal
scenario of the B-OD method. Having achieved considerable improvements in the
results, randomness was then introduced into the structure of the B-OD flows in the
near-ideal scenario of the B-OD method. The experiments for the near-ideal scenario
of the B-OD method demonstrated significant improvements in the quality of the OD
matrix estimate for different rates (Ω) of Bluetooth connected OD pairs.
The B-SP method was specifically designed to closely represent the realistic
observations of Bluetooth trajectories through the concept of subpath flows. The
experiments for the B-SP method also demonstrated significant improvements in the
quality of OD matrix estimates as measured through RMSE, StrOD, and GSSI. While
the results of the B-SP method were not superior to those of the B-OD methods, it
must be understood that: a) the results of the B-SP method were far better than those
of the traditional method, and b) the B-SP method was more practically applicable to
real Bluetooth observations compared to the B-OD methods.
The results of the experiments among all methods suggest another interesting
finding with respect to the Bluetooth connectivity rate and penetration rate of
Bluetooth trips/trajectories. The ideal scenario suggested that 50%, and the near-ideal
scenario that 40%-60%, Bluetooth connectivity of OD pairs (with a 20%
penetration rate of Bluetooth trajectories) is needed for significant improvement in the
results. The B-SP experiments concluded that even a minimum penetration rate of 10% of
Bluetooth trajectories (subpaths) would result in considerable improvements
compared to the traditional method, and that a penetration rate of 30% would achieve
greater improvement in the quality of the OD estimates.
The study also showed that the proposed B-SP method is robust for a lower sample
rate (i.e. ƞ =2.5%) of random Bluetooth observations from several days of similar
travel patterns. The Brisbane City Council (BCC) and the Department of Transport
and Main Roads (TMR) have been recording Bluetooth observations on a continuous
basis, and it is possible to have a database of traffic observations from several days
representing similar travel patterns (Behara et al., 2018). Thus, the B-SP method is
ready for practical implementation on real world networks with trajectory and loop
count databases.
Chapter 5: Non-Assignment-based OD Matrix Estimation: Exploiting Observed Turning Proportions and Structure of Bluetooth Trips
This chapter presents the background to the issues related to assignment-based
OD estimation methods in Section 5.1 and compares traditional bi-level models with
the proposed approach in Section 5.2. The study networks are described in Section 5.3,
the underlying concept of possible paths is outlined in Section 5.4, and the OD matrix
estimation methodology is discussed in Section 5.5. The experiments and results for
the two networks are described in Sections 5.6 and 5.7, and the chapter is summarised
in Section 5.8.
5.1 BACKGROUND
The design of a bi-level OD estimation framework is such that the dependence
on “assignment” has become crucially important. Since both are unknown, the OD
matrix and “assignment” are mutually estimated until convergence to obtain the final
estimate of the OD matrix. The non-separable relationship between both (see Equation
(3)) makes the bi-level problem non-convex and non-differentiable.
While the mapping relationship between link flows and OD flows is well-
established, there are several problems (as discussed in Chapter 1) associated with
the accuracy of the assignment models and, most importantly, with the
computational costs of the bi-level framework.
Several researchers have proposed alternative methods/heuristics to reduce the
problem’s complexity, especially that related to the assignment formulation (either
analytical or simulation-based). Minimising the use of an assignment matrix has been
the prime focus of most recent studies on OD matrix estimation. For example,
Cheung, Wong, and Tong (2006) demonstrated the method of successive averages
(MSA) to approximate a simulated assignment. However, these methods ignore the
discrepancy that exists between a fixed assignment matrix and the updated OD matrix
and its corresponding assignment matrix. Some researchers have proposed an update
to the assignment matrix during the OD matrix estimation iterative process (Yun &
Park, 2005; Zhu, 2007), while others have suggested linearizing the assignment;
however, this requires two simulations per iteration (Lundgren & Peterson, 2008a;
Maher, et al., 2001). Some researchers have suggested one simulation per iteration but
to linearize the assignment using first order Taylor expansion (Toledo & Kolechkina,
2013), while others have suggested a weighted average of previous assignments
(Masip, et al., 2018). Osorio (2019) recently proposed a metamodel to derive the
analytical formulation of the simulated link counts as a function of OD flows so that
gradient-based algorithms could be easily employed. However, there is always a trade-
off between the computational cost and the accuracy of the OD matrix estimates due
to assignment approximations.
On the other hand, the integration of different big traffic data sources could
provide more opportunities for a good blend of empirical (data-driven) and
theoretical methodologies, such as relaxing the complete dependence on an explicit
assignment formulation in an OD optimisation formulation. As confidence in these
data sources grows, non-assignment-based OD estimation methods may soon become
achievable. Arising from one such data-driven idea, this chapter develops a
non-assignment-based method to estimate OD matrices from observed turning
proportions and the structure of Bluetooth OD flows. The complexity of bi-level
optimisation is reduced to a single-level formulation in this chapter.
Further discussion of the proposed methodology is provided in the following
sections.
5.2 OD MATRIX ESTIMATION: TRADITIONAL VERSUS
PROPOSED APPROACH
The flowchart in Figure 5.1 illustrates the proposed non-assignment-based OD
matrix estimation method. This method can be compared with the traditional bi-level
method shown in Figure 1.6. The major differences between the two methods are:
The key idea of the proposed approach is that it considers link flows to be
a proportion of origin flows and not a proportion of OD flows (as in
traditional models);
The traditional approach is based only on observed link counts, while the
proposed approach depends on observed turning proportions and the
structure of Bluetooth OD flows in addition to observed link counts; and
The traditional approach is a bi-level process in which the OD matrix and
assignment matrix are optimised in the upper and lower levels,
respectively. The proposed approach, however, is independent of the
assignment matrix and is therefore a single-level approach. The mapping
relationship is directly derived from observed turning counts; thus, only the
OD matrix is updated.
Figure 5.1: Non-assignment-based OD matrix estimation methodology
Before providing a detailed explanation of the proposed methodology (Figure
5.1) in Section 5.5, the study networks and the key difference between the traversed
and possible paths are described in Sections 5.3 and 5.4, respectively.
5.3 STUDY NETWORKS
The proposed methodology was tested on both a toy network with sufficient
route choice options between OD pairs to represent realistic traffic behaviour
(see Figure 5.2 for a sketch of the network) and a realistic network of Brisbane city
(see Figure 5.3). Both networks are described in more detail below.
[Figure 5.1 flowchart elements: prior OD flows (Xprior) and prior step length; start at k=1; Bluetooth OD vector and Bluetooth OD connectivity (Ix); OD flows and origin flows linked through the incidence matrix; all possible sub-paths and observed turning proportions forming the turning proportion matrix S; estimated and observed link flows; step length update; OD matrix update; convergence check (No: repeat; Yes: end).]
5.3.1 Toy network
The number of origins was two (O1 and O2) and the number of destinations was
three (D1, D2 and D3), giving a total of 6 OD pairs (in the order O1-D1, O1-D2,
O1-D3, O2-D1, O2-D2, O2-D3). There were 23 links in total, but the number of
selected links with installed detectors (d2, d5, d9, d14, d15 and d16) was L = 6.
Figure 5.2 shows the origins, destinations, nodes, links, detector locations, and
observed turning proportions at all intersections.
Figure 5.2: Sketch of the toy network
5.3.2 TMR network
A medium-scale network was developed from the Brisbane City road network
that is under the control of the Department of Transport and Main Roads (TMR, 2016).
It had 9 zones, each acting as both an origin and a destination. Ignoring internal trips,
there were 9*9-9=72 OD pairs. Among the nine zones, only the two zones corresponding
to the CBD and Garden City were chosen as internal, while the remaining seven zones
were chosen as external zones. The network had 12 loop detector count locations, and
turning proportions were observed at all 35 intersections. Because each
intersection was equipped with a BMS unit, there were 35 Bluetooth scanners. The
study network is shown in Figure 5.3.
To facilitate the computation of GSSI, the OD matrices were further split into
geographical windows using the knowledge of higher-level zones (see the
geographical window concept in 3.3 for further details). The higher-level zones (hz)
were formed as combinations of lower-level zones: hz1 included Z1 and Z8; hz2
included Z6 and Z5; hz3 included Z4, Z3 and Z7; and hz4 included Z9 and Z2.
Figure 5.3: TMR network
5.4 CONCEPT OF POSSIBLE PATHS
The proposed methodology attempted to estimate link flows from the origin
flows and the turning proportion (S) matrix, and not from the OD flows and assignment
matrix. The turning proportions, as the critical construct of the matrix S, were observed
from the sequence of intersections that connected the link count locations with all
origins. In this study, the turning proportions were assumed to be known. The
sequences of intersections from each origin represented the set of all possible paths
among which the paths traversed by vehicles were only a subset (see Section 5.4.2 for
further details).
5.4.1 Possible paths in the toy network
For the toy network shown in Figure 5.2, the total number of possible paths
between all OD pairs is K=54, of which only 8 are traversed by vehicles (see Figure
5.4 for all eight paths traversed by vehicles in simulation).
Figure 5.4: Paths traversed by vehicles in simulation
Using the same concept, the paths leading to any link can be categorised as
traversed and possible paths. To demonstrate this, consider the traversed paths (Figure
5.5) and possible paths (Figure 5.6) until link, l14.
Figure 5.5: Traversed paths from all origins until link, l14
Figure 5.6: Possible paths from all origins until link, l14
5.4.2 Possible paths in the TMR network
Although numerous possible paths are feasible from each origin until each
counting location (detector), only a few key corridors were chosen in this study to
limit the complexity. The design parameter, maximum likely possible paths (MLPP),
was chosen to be a maximum of 8 per OD pair. Figure 5.7 illustrates the number of
possible paths chosen from each origin zone (Zi) to each detector location.
Figure 5.7: The number of possible paths from all origins until the detector locations of TMR network
5.5 OD MATRIX ESTIMATION METHODOLOGY
The proposed method updated the prior OD matrix by iteratively minimising the
objective function until convergence. The objective function formulation was based
on the deviations of the link flows (observed and estimated) and structural comparison
of the OD flows (observed Bluetooth and estimated), which were expressed as a
function of the OD matrix. The estimated link flows were obtained through the new
mapping relationship based on the turning proportion matrix explained in Section
5.5.1. The structural comparison of OD flows is explained in Section 5.5.2. Finally,
the proposed objective function formulation is described in Section 5.5.3.
5.5.1 Link flows estimation from turning proportion matrix
Turning proportions/probabilities generally refer to the ratio of the turning volume
to the approach volume at an intersection. Figure 5.8 represents an isolated intersection
with turning movements and associated proportions. The flows on link l5 are diverted
towards l12, l15 and l11 via left, through and right movements, respectively. Thus, the
number of turn movements is m=3, and their corresponding turning proportions are
0.076, 0.836 and 0.088, respectively.
[Figure 5.7 chart: bars for detectors D1 to D12; axis: number of possible paths until each detector location (0 to 40); legend: origin zones Z1 to Z9.]
Figure 5.8: Schematic representation of an isolated intersection and associated turning proportions
The turning proportion (S) matrix is a tableau representation of the proportions
of origin flows that pass through selected links. It has dimensions of $L \times O$,
where $O$ is the number of origins, and each cell value is represented by
$S(l,o) = s_{l,o}$ (see Equation (62)).
For example, the $(k_{l,o})$th path connects the $l$th link with the $o$th origin, and the
total number of such possible paths is $K_{l,o}$. The turning proportion at the $i$th
intersection along the $(k_{l,o})$th path is denoted by $p_{i}^{k_{l,o}}$. There are
$n_{k_{l,o}}$ intersections along the $(k_{l,o})$th path.
$$s_{l,o} = \sum_{k_{l,o}=1}^{K_{l,o}} P_{k_{l,o}} \tag{62}$$
$$P_{k_{l,o}} = \prod_{i=1}^{n_{k_{l,o}}} p_{i}^{k_{l,o}} \tag{62a}$$
The product of the turning proportions along the $(k_{l,o})$th path yields the
probability of origin flows passing through the $(k_{l,o})$th path and being observed at
the $l$th link, and is represented by $P_{k_{l,o}}$ (see Equation (62a)).
Summing the probabilities over all $K_{l,o}$ paths connecting the $o$th origin to the
$l$th link gives the total probability of trips generated from the $o$th origin being
observed at the $l$th link. This total probability with respect to the $o$th origin is
represented by $s_{l,o}$, as shown in Equation (62).
The link flow generated from the $o$th origin is given by multiplying $s_{l,o}$ with the
total trips produced from the $o$th origin (i.e., $G_o$), as shown in Equation (63), and
the total link flow that results from the flows produced from all origins is given by
Equation (64).
$$\hat{y}_{l,o} = s_{l,o}\, G_o \tag{63}$$
$$\hat{y}_{l} = \sum_{o=1}^{O} s_{l,o}\, G_o \tag{64}$$
To demonstrate the above formulations, consider Figure 5.6a for all possible
paths from O1 until link l14. Table 5.1 shows the computations of the aforementioned
formulations for both origins, which are then used to estimate the link flow at l14.
Table 5.1: Demonstration of Equation (62) for l14
| Possible path from O1 | Product of turning proportions | Possible path from O2 | Product of turning proportions |
|---|---|---|---|
| l1-l2-l14 | 0.25*1 = 0.25 | l8-l7-l4-l2-l14 | 0.59*0*0.25*1 = 0 |
| l1-l3-l5-l12-l14 | 0.75*1*0.076*1 = 0.057 | l8-l7-l5-l12-l14 | 0.59*1*0.076*1 = 0.045 |
| l1-l3-l6-l9-l10-l12-l14 | 0.75*0*1*0*0*1 = 0 | l8-l9-l10-l12-l14 | 0.41*0*0*1 = 0 |
| S(l14, O1) | 0.25+0.057+0 = 0.307 | S(l14, O2) | 0+0.045+0 = 0.045 |
If the numbers of trips generated from O1 and O2 are G1 = 425 and G2 = 490,
respectively, then the link flows from O1 and O2 until l14 are 425*0.307 = 130.48 and
490*0.045 = 22.05. Thus, the total flow on l14 is 130.48+22.05 = 152.53, or
approximately 153, which matches the flow observed from simulation.
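The worked example for l14 can be retraced with a few lines of Python. This is a minimal sketch rather than the implementation used in the study: the dictionary layout and the function name `origin_share` are illustrative choices, while the turning proportions and origin flows are taken from Table 5.1 and the text above.

```python
from math import prod

# Turning proportions along each possible path until link l14 (Table 5.1).
paths_to_l14 = {
    "O1": [
        [0.25, 1.0],                       # l1-l2-l14
        [0.75, 1.0, 0.076, 1.0],           # l1-l3-l5-l12-l14
        [0.75, 0.0, 1.0, 0.0, 0.0, 1.0],   # l1-l3-l6-l9-l10-l12-l14
    ],
    "O2": [
        [0.59, 0.0, 0.25, 1.0],            # l8-l7-l4-l2-l14
        [0.59, 1.0, 0.076, 1.0],           # l8-l7-l5-l12-l14
        [0.41, 0.0, 0.0, 1.0],             # l8-l9-l10-l12-l14
    ],
}
origin_flows = {"O1": 425, "O2": 490}      # trips generated from each origin


def origin_share(paths):
    """Sum over paths of the product of turning proportions (Equation (62))."""
    return sum(prod(path) for path in paths)


shares = {origin: origin_share(paths) for origin, paths in paths_to_l14.items()}

# Link flow at l14 as the share-weighted sum of origin flows (Equation (64)).
link_flow = sum(shares[origin] * origin_flows[origin] for origin in shares)

print({origin: round(share, 4) for origin, share in shares.items()})
print(round(link_flow, 2))
```

Without intermediate rounding the share for O2 is 0.0448 rather than 0.045, so the script gives a link flow of about 152.4 instead of 152.53; both are within one vehicle of the 153 observed in simulation.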
The proof of this concept is further demonstrated using the network (see Figure
5.9) considered by Bar-Gera et al. (2006) in their study.
Figure 5.9: Sample network used by Bar-Gera et al. (2006)
The paths and path flows for the sample network shown in Figure 5.9 are
described in Table 5.2.
Table 5.2: Paths and path flows for Bar-Gera et al. (2006) network
| Origin | Path | Path flow |
|---|---|---|
| A | [A,1,4,5,6,D] | 10 |
| A | [A,1,2,3,6,D] | 5 |
| B | [B,3,2,1,4,C] | 6 |
| B | [B,3,6,5,4,C] | 4 |
| C | [C,4,5,2,3,B] | 8 |
| C | [C,4,5,6,D] | 20 |
| D | [D,6,5,2,1,A] | 12 |
| D | [D,6,5,4,C] | 1 |
From Table 5.2, the total traffic count observed on link l5-2 is 8+12=20. The
possible paths contributing to the flows on link l5-2 are shown in Table 5.3.
Table 5.3: Link flows at link, l5-2 estimated using the proposed approach
| Origin | Origin flow | Possible path | Product of origin flow and turning proportions along the path until link l5-2 |
|---|---|---|---|
| A | 15 | A-1-4-5-2 | 15*0.667*0.625*0.211 = 1.316 |
| A | 15 | A-1-2-3-6-5-2 | 15*0.333*1*0.385*0.444*0.706 = 0.603 |
| B | 10 | B-3-6-5-2 | 10*0.4*0.556*0.706 = 1.569 |
| B | 10 | B-3-2-1-4-5-2 | 10*0.6*1*0.667*0.625*0.285 = 0.714 |
| C | 28 | C-4-5-2 | 28*1*0.211 = 5.895 |
| D | 13 | D-6-5-2 | 13*1*0.8 = 10.400 |
| Estimated link flow at link l5-2 | | | 20.497 |
From Table 5.3, the estimated link flow on link l5-2 is 20.497, which is close
to the actual flow of 20. Thus, the proposed approach proves valid and is a good
alternative to assignment-based estimation of link flows.
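Several of the turning proportions used in Table 5.3 can be recovered directly from the path flows in Table 5.2. The sketch below assumes that a turning proportion is the share of the flow arriving at a node from a given upstream node that continues to a given downstream node; the variable and function names are illustrative only.

```python
from collections import defaultdict

# Path flows from Table 5.2 (node sequence, vehicles on the path).
path_flows = [
    (["A", 1, 4, 5, 6, "D"], 10), (["A", 1, 2, 3, 6, "D"], 5),
    (["B", 3, 2, 1, 4, "C"], 6), (["B", 3, 6, 5, 4, "C"], 4),
    (["C", 4, 5, 2, 3, "B"], 8), (["C", 4, 5, 6, "D"], 20),
    (["D", 6, 5, 2, 1, "A"], 12), (["D", 6, 5, 4, "C"], 1),
]

link_flow = defaultdict(float)   # flow on directed link (u, v)
turn_flow = defaultdict(float)   # flow making the movement u -> v -> w

for nodes, flow in path_flows:
    for u, v in zip(nodes, nodes[1:]):
        link_flow[(u, v)] += flow
    for u, v, w in zip(nodes, nodes[1:], nodes[2:]):
        turn_flow[(u, v, w)] += flow


def turning_proportion(u, v, w):
    """Share of the flow approaching v from u that continues to w."""
    return turn_flow[(u, v, w)] / link_flow[(u, v)]


print(link_flow[(5, 2)])                          # 20.0
print(round(turning_proportion("A", 1, 4), 3))    # 0.667
print(round(turning_proportion(1, 4, 5), 3))      # 0.625
print(round(turning_proportion(4, 5, 2), 3))      # 0.211
print(round(turning_proportion(6, 5, 2), 3))      # 0.706
```

The first value is the observed count on link l5-2, and the remaining four are proportions that appear in the products of Table 5.3.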
While the proposed turning proportions-based approach seems promising,
numerous paths can add to the complexity of the problem for realistic medium- to
large-scale networks. In such situations, it is recommended to treat the number of
most-likely possible paths (MLPP) as a design parameter. For instance, restricting
MLPP to less than or equal to 10 might be a solution. The TMR network described in
Section 5.3.2 was designed especially to demonstrate that this approach performs well
with MLPP as a design parameter. Note: The analysis conducted in this chapter assumed
no errors in observed turning proportions and link flows.
5.5.2 The structural comparison of OD flows
Another goal of the proposed objective function formulation is to minimise the
structural deviation (or maximise the structural similarity) between the estimated and
observed Bluetooth OD flows expressed using the formulation described in Equation
(53). The concept of incorporating “structural comparison of OD flows” was discussed
in Section 4.4.1.
5.5.3 OD matrix estimation formulation
The OD vector, X, is related to the origin flows, G, through an incidence matrix, I,
as shown in Equation (65).
$$G = I\,X \tag{65}$$
For the study network shown in Figure 5.2, the incidence matrix is represented
as shown in Equation (66).
$$I = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \tag{66}$$
In Equation (66), the first and second rows correspond to origins O1 and O2,
and the columns correspond to the OD pairs O1-D1, O1-D2, O1-D3, O2-D1, O2-D2
and O2-D3, respectively. A value of “1” indicates that the OD pair belongs to
the origin of that row, and the value is “0” otherwise.
Based on Equation (66), Equation (65) can be written as Equation (67) for the
study network.
$$\begin{bmatrix} G_1 \\ G_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \\ X_6 \end{bmatrix} \tag{67}$$
Where X1 to X6 are the OD flows corresponding to the OD pairs O1-D1, O1-D2,
O1-D3, O2-D1, O2-D2 and O2-D3; and G1 and G2 are the origin flows
of O1 and O2 of the study network.
The relationship between the link flows vector (Y) and the OD vector, X, is
shown in Equation (68). Here, S is the turning proportion matrix whose cells are
S(l, o) (see Equation (62)). For the study experiments (see Section 5.6), the size of S
was 6 x 2; thus, the size of Y was 6 x 1.
$$Y = S\,G = S\,I\,X \tag{68}$$
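The mapping from OD flows to origin flows and then to link flows can be checked numerically for the toy network. The following sketch uses the OD flows listed in Table 5.5 and only the l14 row of S, since that is the only row whose values are given explicitly (Table 5.1); the helper `matvec` is an illustrative name, not part of the thesis.

```python
od_pairs = ["O1-D1", "O1-D2", "O1-D3", "O2-D1", "O2-D2", "O2-D3"]
origins = ["O1", "O2"]

# Incidence matrix I: entry is 1 if the OD pair starts at the row's origin.
incidence = [[1 if pair.startswith(origin + "-") else 0 for pair in od_pairs]
             for origin in origins]


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


x_true = [107, 91, 227, 199, 92, 199]    # true OD flows (Table 5.5)
x_prior = [101, 40, 50, 80, 36, 90]      # prior OD flows (Table 5.5)

# Only the l14 row of S is reproduced here (0.307 for O1, 0.045 for O2).
s_l14 = [[0.307, 0.045]]

g_true = matvec(incidence, x_true)       # origin flows, G = I X
g_prior = matvec(incidence, x_prior)
y_true = matvec(s_l14, g_true)           # link flow on l14, Y = S G
y_prior = matvec(s_l14, g_prior)

print(g_true, g_prior)                   # [425, 490] [191, 206]
print(round(y_true[0]), round(y_prior[0]))   # 153 68
```

The two rounded link flows agree with the true and prior values reported for l14 in Table 5.4.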
The formulation of the objective function (Z3) consisting of the deviations of
link flows and the structural comparison of OD flows is shown in Equation (69).
(69)
The notations and the stability of this combined formulation are explained in
Section 4.4.1, and the value of c is assumed to be 2 for the entire analysis discussed
in this chapter.
5.5.3.1 Gradient (search direction) computation
The gradient descent algorithm explained in Section 4.4.2 was adopted for this
method. The search direction was defined by computing the gradient of the objective
function (Z3) with respect to the OD vector (X), as explained through Equations (70)
to (70b). The derivative of the structural comparison term with respect to X is already
explained in Equation (55).
(70)
(70a)
Where, (70b)
5.5.3.2 Updating step of OD matrix
The OD matrix X is updated using the formulation shown in Equation (71). Here,
$X^{k}$ and $X^{k+1}$ are the OD matrices for the current (kth) and next ((k+1)th)
iterations, respectively. The optimum step length is obtained using Equation (57).
= (71)
Where, (71a)
The termination criterion for the optimisation problem in this study was chosen as
100 iterations.
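The overall loop (estimate link flows, compute the gradient, update the OD vector, stop after 100 iterations) can be sketched as follows. This is an illustration under several assumptions and not the formulation of Equations (69) to (71): the objective contains only the link-flow deviation term (as in a counts-only case), the update is assumed to be multiplicative, the step length is a fixed hypothetical value instead of the optimum step length of Equation (57), and only the l14 row of S (Table 5.1) is used.

```python
# Hypothetical single-link setting: only the l14 row of S is known (Table 5.1).
S = [[0.307, 0.045]]                       # links x origins
I = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1]]                   # origins x OD pairs
y_obs = [153.0]                            # observed flow on l14 (Table 5.4)
x = [101.0, 40.0, 50.0, 80.0, 36.0, 90.0]  # prior OD flows (Table 5.5)
step = 0.01                                # assumed fixed step length


def link_flows(x):
    """Estimated link flows: G = I X followed by Y = S G."""
    g = [sum(i * xj for i, xj in zip(row, x)) for row in I]
    return [sum(s * go for s, go in zip(row, g)) for row in S]


def gradient(x):
    """Gradient of 0.5 * sum((Y_est - Y_obs)^2) with respect to X."""
    resid = [ye - yo for ye, yo in zip(link_flows(x), y_obs)]
    # dZ/dG_o = sum over links of S[l][o] * residual[l];
    # every OD pair inherits the value of its origin through I.
    dz_dg = [sum(S[l][o] * resid[l] for l in range(len(S)))
             for o in range(len(I))]
    return [sum(I[o][j] * dz_dg[o] for o in range(len(I)))
            for j in range(len(x))]


history = []
for k in range(100):                       # terminate after 100 iterations
    history.append(abs(link_flows(x)[0] - y_obs[0]))
    grad = gradient(x)
    # Assumed multiplicative update: each OD flow is scaled, so it stays positive here.
    x = [xj * (1.0 - step * gj) for xj, gj in zip(x, grad)]

print(round(link_flows(x)[0], 2), [round(v, 1) for v in x])
```

Because all OD pairs of an origin receive the same gradient in this counts-only setting, the update rescales the prior flows of each origin by a common factor and leaves the split between destinations unchanged, which is why additional structural information is needed to improve the OD structure.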
5.6 EXPERIMENTS AND RESULTS: TOY NETWORK
True OD flows (Xtrue), observed Bluetooth OD flows, observed link counts, and
turning proportions for every turning movement at each intersection of the study
network (Figure 5.2) were synthesized using Aimsun Next. The Bluetooth OD matrix
was generated as 20% of the true OD flows of the Bluetooth-connected OD pairs,
with random fluctuations of +/- 5%, as shown in Equation (72), where the rand ()
function chooses any value between 0 and 1.
) (72)
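One way to synthesise such a Bluetooth OD vector is sketched below. The exact form of Equation (72) is not reproduced here, so the interpretation is an assumption: each connected OD pair is scaled by a penetration rate drawn uniformly between 15% and 25% (20% +/- 5 percentage points). The function name, the seed, and the choice of connected pairs are hypothetical.

```python
import random


def synthesise_bluetooth_od(x_true, connected, rate=0.20, spread=0.05, seed=0):
    """Scale the true OD flows of Bluetooth-connected pairs by rate +/- spread."""
    rng = random.Random(seed)
    x_bt = []
    for flow, is_connected in zip(x_true, connected):
        if not is_connected:
            x_bt.append(0.0)               # pair not observed by Bluetooth
            continue
        # rng.random() lies in [0, 1); map it to [-spread, +spread).
        fluctuation = spread * (2.0 * rng.random() - 1.0)
        x_bt.append(flow * (rate + fluctuation))
    return x_bt


x_true = [107, 91, 227, 199, 92, 199]                  # Table 5.5
connected = [True, False, True, True, False, False]    # hypothetical: 3 of 6 pairs
x_bt = synthesise_bluetooth_od(x_true, connected)
print([round(v, 1) for v in x_bt])
```

If the fluctuation is instead meant to be relative to the 20% rate (that is, 19% to 21%), only the line computing `fluctuation` would change.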
The experiments performed in this study consisted of six different cases, as
discussed below:
Case 1: Here, the objective function only minimised the deviation of link
counts.
Case 2: In this case, the objective function minimised the deviation of link
counts and maintained structural consistency using the structure of
Bluetooth OD flows from Ω=33% of OD pairs (i.e., only 2 out of 6 OD pairs
were Bluetooth connected).
Case 3: Here, the objective function minimised the deviation of link counts
and maintained structural consistency. Here, Ω=50% (i.e., only 3 out of 6
OD pairs were Bluetooth connected).
Case 4: The objective function minimised both deviation of link counts and
maintained structural consistency. Here, Ω=67% (i.e., only 4 out of 6 OD
pairs were Bluetooth connected).
Case 5: The objective function minimised both link counts deviation and
maintained structural consistency using Ω=83% (i.e., only 5 out of 6 OD
pairs were Bluetooth connected).
Case 6: The objective function minimised both link counts deviation and
maintained structural consistency using Ω=100% (i.e., all 6 OD pairs were
Bluetooth connected).
The results of the above-mentioned six cases were compared using two statistical
performance measures, RMSE and StrOD, as previously discussed
in Equation (47) and Equation (49), respectively.
5.6.1 Convergence of gradient descent algorithm
The convergence of the gradient descent algorithm is demonstrated here by
plotting the RMSE and StrOD from consecutive iterations for all six cases, as shown
in Figure 5.10 and Figure 5.11.
Figure 5.10: Convergence of RMSE for all cases
Figure 5.11: Convergence of StrOD for all cases
Figure 5.10 shows that the RMSE converged for all six cases, with the highest
value for Case-1 and lowest for Case-6. Similarly, in Figure 5.11, the similarity of the
structures between estimated and true OD improved from Case-1 to Case-6.
5.6.2 Structural consistency
Although the chosen prior OD had a high error value (102.59) and a poorer
structure (0.2782) compared with the true OD, Figure 5.12 demonstrates a
significant reduction in the RMSE value from 102.59 to 67.83 in both Case-1 and
Case-2. However, Figure 5.13 shows little improvement in structure (StrOD
=0.3104 and 0.3105 for Case-1 and Case-2, respectively). This is because no additional
structural knowledge of Bluetooth OD flows was used in Case-1, and only two OD
pairs were Bluetooth connected in Case-2. It is also clear from Table 5.5
that Case-1 overestimated the OD flow for O1-D1 (224.7) and underestimated that for
O1-D3 (111.3). Thus, if the quality of the prior OD is poor, a dependence
on only the deviations of link counts might not improve the quality of OD matrix
estimates. However, with the availability of additional structural knowledge, the
quality of OD estimates can be enhanced by maintaining structural consistency despite
starting with a poor prior OD. The other cases in Figure 5.12 and Figure 5.13
show that the improvement increased with Ω.
Figure 5.12: RMSE comparison with Xtrue
[Figure 5.12 bar values (OD from different cases): Prior OD 102.59; Case1 67.83; Case2 67.83; Case3 51.63; Case4 26.23; Case5 22.08; Case6 14.36.]
Figure 5.13: StrOD comparison with Xtrue
[Figure 5.13 bar values (OD from different cases): Prior OD 0.2782; Case1 0.3104; Case2 0.3105; Case3 0.6155; Case4 0.9507; Case5 0.9539; Case6 0.9729.]
5.6.3 Under-determinacy problem
The results also highlight the under-determinacy problem of traffic counts-based
OD matrix estimation. In other words, many possible OD matrices can reproduce
the same set of link counts. For instance, the link flows estimated
from Case-1 exactly matched the true link counts (see Case 1 in Table 5.4). In fact, the
estimated link flows from all six cases (Table 5.4) exactly matched the true flows,
while their corresponding OD matrix estimates were different from each other
(Table 5.5). In Table 5.4, YPrior shows the initial estimate of link counts from the
prior OD matrix.
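Table 5.4: Comparison of link flows for the selected links
| Links | Observed (true) | YPrior | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 |
|---|---|---|---|---|---|---|---|---|
| l2 | 107 | 48 | 107 | 107 | 107 | 107 | 107 | 107 |
| l5 | 609 | 265 | 609 | 609 | 609 | 609 | 609 | 609 |
| l9 | 199 | 84 | 199 | 199 | 199 | 199 | 199 | 199 |
| l14 | 153 | 68 | 153 | 153 | 153 | 153 | 153 | 153 |
| l15 | 509 | 222 | 509 | 509 | 509 | 509 | 509 | 509 |
| l16 | 253 | 107 | 253 | 253 | 253 | 253 | 253 | 253 |
Table 5.5: Comparison of OD demand flows
| OD pairs | Xtrue | XPrior | X case 1 | X case 2 | X case 3 | X case 4 | X case 5 | X case 6 |
|---|---|---|---|---|---|---|---|---|
| O1-D1 | 107 | 101 | 225 | 225 | 142 | 106 | 108 | 115 |
| O1-D2 | 91 | 40 | 89 | 89 | 128 | 72 | 84 | 93 |
| O1-D3 | 227 | 50 | 111 | 111 | 155 | 247 | 233 | 217 |
| O2-D1 | 199 | 80 | 190 | 190 | 151 | 169 | 174 | 224 |
| O2-D2 | 92 | 36 | 86 | 86 | 68 | 76 | 74 | 72 |
| O2-D3 | 199 | 90 | 214 | 214 | 272 | 246 | 242 | 194 |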
5.6.4 Optimal percentage of Bluetooth connectivity
The results show a sudden jump followed by stabilisation beyond a certain
value of Ω. There was only a marginal improvement in both RMSE and StrOD
values when Bluetooth OD connectivity increased beyond 67% (i.e., Case-4). The
network considered in this study might be simple, but for realistic networks,
knowledge of the optimum Bluetooth connectivity has significant financial implications,
such as a reduction in the installation and maintenance costs of the infrastructure.
5.7 EXPERIMENTS AND RESULTS: TMR NETWORK
True OD flows (Xtrue), Bluetooth OD flows, observed link counts, and
turning proportions for every turning movement at each intersection of the
study network (Figure 5.3) were synthesized in Aimsun. The Bluetooth OD matrix
was randomly generated, with a mean of 0.1 times the true OD flows of the
Bluetooth-connected OD pairs and a standard deviation of +/- 10%, as shown in
Equation (73), where the rand () function chooses any value between 0 and 1.
(73)
Only the true OD flows of the OD pairs that are Bluetooth connected enter
Equation (73). To compare the performance of the proposed non-assignment-based
approach with a bi-level approach, two sets of experiments, non-assignment-based and
assignment-based, were designed. Both sets were further divided into five
cases, which are discussed in the following sections. Note that Xprior is generated
as described in Equation (50). The quality of Xprior can be expressed through RMSE
(Xprior, Xtrue) = 64.8 and GSSI (Xprior, Xtrue) = 0.6956.
5.7.1 Non-assignment-based vs assignment-based experiments
Case 1: Here, the objective function in both experiments was based only on
the deviation of link counts, without any additional knowledge of Bluetooth
OD. Note that traditional method for the non-assignment-based approach
implies that link counts were obtained from the proposed turning
proportions-based formulation, as discussed in Section 5.5.1. On the other
hand, they were obtained from simulation in the assignment-based method.
Case 2: In this case, the objective functions in both experiments included
the deviation of link counts and structural comparison of estimated OD
flows with Bluetooth OD flows for Ω=25% of OD pairs (i.e., only 18 out of
72 OD pairs were Bluetooth connected).
Case 3: This is similar to Case-2, with only a difference in the number of
Bluetooth connected OD pairs; that is, Ω=50% (i.e., only 36 out of 72 OD
pairs were Bluetooth connected).
Case 4: This is similar to Case-2 except that Ω=75% (i.e., only 54 out of 72
OD pairs were Bluetooth connected).
Case 5: This is similar to Case-2 except that Ω=100% (i.e., all 72 OD pairs
were Bluetooth connected).
The results of the above-mentioned five cases were compared using two statistical
performance measures, RMSE and GSSI, as previously discussed
in Equation (47) and Equation (49), respectively. These are further discussed in the
following sections.
5.7.2 RMSE results
The RMSE results of both non-assignment-based and assignment-based methods
showed that the OD matrices estimated from all cases were better than Xprior (Figure
5.14).
Figure 5.14: RMSE results for non-assignment-based and assignment-based approaches
The percentage improvement with respect to Xprior increased from 11.11%
to 32.72% for the non-assignment-based method and from 8.95% to 27.93% for the
assignment-based method (Figure 5.15).
Similarly, the percentage improvement with respect to the traditional method
(i.e., Case-1) increased from 6.25% to 24.31% for the non-assignment-based method
and from 5.85% to 20.85% for the assignment-based method (Figure 5.16).
Figure 5.15: Percent improvement in RMSE with respect to Xprior - non-assignment vs assignment-based methods
[Figure 5.14 values (RMSE for Prior, Traditional, Ω=25%, Ω=50%, Ω=75%, Ω=100%): Non-Assgn. 64.8, 57.60, 54.00, 49.60, 47.40, 43.60; Assgn. Based 64.8, 59.0, 55.6, 53.8, 50.2, 46.7.]
[Figure 5.15 values (% improvement in RMSE w.r.t. Prior OD for Traditional, Ω=25%, Ω=50%, Ω=75%, Ω=100%): Non-Assgn. 11.11, 16.67, 23.46, 26.85, 32.72; Assgn. Based 8.95, 14.27, 16.98, 22.53, 27.93.]
Figure 5.16: Percent improvement in RMSE with respect to traditional method- non-assignment vs assignment-based methods
5.7.3 GSSI results
The GSSI results of both the non-assignment-based and assignment-based
methods showed that the OD matrices estimated from all cases were better than Xprior
(Figure 5.17).
Figure 5.17: GSSI results for non-assignment-based and assignment-based approaches
The percentage improvement with respect to Xprior increased from 1.81% to
19.67% for the non-assignment-based method and from 0.92% to 17.80% for the
assignment-based method (Figure 5.18).
Similarly, the percentage improvement with respect to the traditional method
(i.e., Case-1) increased from 4.12% to 17.54% for the non-assignment-based method
and from 5.33% to 16.72% for the assignment-based method (Figure 5.19).
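The percentage improvements quoted in Sections 5.7.2 and 5.7.3 follow from the RMSE and GSSI values of Figures 5.14 and 5.17. The sketch below assumes that improvement is measured relative to the baseline value, as a reduction for RMSE and as an increase for GSSI; the function name and the `higher_is_better` flag are illustrative.

```python
def pct_improvement(baseline, value, higher_is_better=False):
    """Relative change from the baseline, signed so that improvement is positive."""
    change = (value - baseline) if higher_is_better else (baseline - value)
    return 100.0 * change / baseline


# (prior, traditional i.e. Case 1, full connectivity i.e. Case 5) per method.
rmse = {"non-assignment": (64.8, 57.6, 43.6), "assignment": (64.8, 59.0, 46.7)}
gssi = {"non-assignment": (0.6956, 0.7082, 0.8324),
        "assignment": (0.6956, 0.7020, 0.8194)}

for method, (prior, traditional, full) in rmse.items():
    print(method, "RMSE",
          round(pct_improvement(prior, traditional), 2),
          round(pct_improvement(prior, full), 2),
          round(pct_improvement(traditional, full), 2))

for method, (prior, traditional, full) in gssi.items():
    print(method, "GSSI",
          round(pct_improvement(prior, traditional, True), 2),
          round(pct_improvement(prior, full, True), 2),
          round(pct_improvement(traditional, full, True), 2))
```

For the non-assignment-based method this gives 11.11, 32.72 and 24.31 for RMSE and 1.81, 19.67 and 17.54 for GSSI, in line with the figures quoted in the text.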
[Figure 5.16 values (% improvement in RMSE w.r.t. traditional OD for Ω=25%, Ω=50%, Ω=75%, Ω=100%): Non-Assgn. 6.25, 13.89, 17.71, 24.31; Assgn. Based 5.85, 8.81, 14.92, 20.85.]
[Figure 5.17 values (GSSI for Prior, Traditional, Ω=25%, Ω=50%, Ω=75%, Ω=100%): Non-Assgn. 0.6956, 0.7082, 0.7374, 0.7738, 0.7948, 0.8324; Assgn. Based 0.6956, 0.7020, 0.7394, 0.7639, 0.7822, 0.8194.]
Figure 5.18: Percent improvement in GSSI with respect to Xprior - non-assignment vs assignment-based methods
Figure 5.19: Percent improvement in GSSI with respect to traditional method- non-assignment vs
assignment-based methods
5.7.4 Computational time: Non-assignment-based vs assignment-based
While both methods performed fairly well compared to the prior OD (Xprior) and
the traditional method, there were two key differences between the two approaches.
First, the RMSE values showed that the non-assignment-based method performed
better than the assignment-based method, which could be due to a) no errors being
considered in the observations of turning proportions, and b) modelling errors in the
traffic assignment. Second, the bi-level method was computationally expensive
compared to the non-assignment-based method. While the assignment-based method
took 11.70 to 15.14 minutes for 20 iterations, the non-assignment-based method
required only around 0.17 to 0.24 minutes for 100 iterations (refer to Table 5.6).
[Figure 5.18 data: % improvement in GSSI w.r.t. prior OD, by experiment (Traditional, Ω=25%, 50%, 75%, 100%). Non-Assgn.: 1.81, 6.01, 11.24, 14.26, 19.67; Assgn. Based: 0.92, 6.30, 9.82, 12.45, 17.80]
[Figure 5.19 data: % improvement in GSSI w.r.t. traditional OD, by experiment (Ω=25%, 50%, 75%, 100%). Non-Assgn.: 4.12, 9.26, 12.23, 17.54; Assgn. Based: 5.33, 8.82, 11.42, 16.72]
Table 5.6: Comparison of computational times: Non-assignment-based vs assignment-based methods
Case     Assignment-based method (minutes)   Non-assignment-based method (minutes)
Case-1   15.14                               0.17
Case-2   14.00                               0.24
Case-3   14.01                               0.19
Case-4   11.70                               0.20
Case-5   15.11                               0.24
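Because the two methods were run for different numbers of iterations (20 and 100), a per-iteration comparison gives a fairer picture of the gap than the raw totals. The following is a minimal sketch using the values in Table 5.6; the per-iteration normalisation is an illustrative assumption and is not a calculation reported in the thesis.

```python
# Total run times in minutes from Table 5.6: (assignment-based, non-assignment-based).
run_times = {
    "Case-1": (15.14, 0.17),
    "Case-2": (14.00, 0.24),
    "Case-3": (14.01, 0.19),
    "Case-4": (11.70, 0.20),
    "Case-5": (15.11, 0.24),
}

ASSIGNMENT_ITERATIONS = 20       # iterations behind the assignment-based totals
NON_ASSIGNMENT_ITERATIONS = 100  # iterations behind the non-assignment-based totals

per_iteration_ratio = {}
for case, (assignment_min, non_assignment_min) in run_times.items():
    assignment_per_iter = assignment_min / ASSIGNMENT_ITERATIONS
    non_assignment_per_iter = non_assignment_min / NON_ASSIGNMENT_ITERATIONS
    # How many times slower one assignment-based iteration is.
    per_iteration_ratio[case] = assignment_per_iter / non_assignment_per_iter

for case, ratio in per_iteration_ratio.items():
    print(f"{case}: one assignment-based iteration takes about {ratio:.0f}x longer")
```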
5.8 SUMMARY
This chapter discussed a novel approach for estimating OD matrices using
observed turning proportions and the structure of Bluetooth OD flows. The
contribution of this methodology is twofold.
Firstly, the observations of turning proportions relax the dependence on
conventional assignment-based models. This implies that there is no longer a bi-level
framework and the issues associated with it, especially the computational cost, are
therefore minimised.
Secondly, the structural knowledge from observed Bluetooth OD flows was used
to maintain structural consistency in OD matrix estimates. This implies that a better
estimate can be obtained even when the prior OD matrix has a poor structure.
A few methods have been proposed in the past to minimise the dependency on
the assignment; for instance, Nie, Zhang, and Recker (2005) and Barceló, Montero,
Bullejos, Serch, et al. (2013) proposed methods to estimate OD from estimated path
flows. While Nie, Zhang, and Recker (2005) proposed to decouple the user-
equilibrium based OD estimation problem through K-Shortest path ranking procedure,
Barceló, Montero, Bullejos, Serch, et al. (2013) considered path flows as the state
variables and used travel times from Bluetooth for mapping the link flows to origin
flows. However, the differences between their approaches and the one proposed in this
thesis are as follows:
1. While both expressed the objective function in terms of path flows, in this
research the objective function is expressed directly in terms of OD flows.
2. In both approaches, link flows are estimated from the estimated path flows. In
this thesis, link flows are estimated from the origin flows and observed turning
proportions.
While the study demonstrated an alternative non-assignment-based approach, it
has a few limitations. The following points discuss these limitations and plausible
solutions to address them.
While the study demonstrates that turning proportions observed at every
intersection are required to replace the need for assignment, in practice these
might not be easy to obtain. For instance, the presence of shared lanes might
make it difficult to estimate turning movements from loop detectors.
However, in such situations, the turning proportions from the intersections
of critical paths connecting higher level zones can be used to estimate the
OD matrix at a higher zonal level (say SA4) using the proposed non-
assignment-based approach. This higher zonal level OD can further be used
as an additional constraint for estimating OD at a lower zonal level (say SA2 or
SA3) using assignment-based approaches (such as the one proposed in Chapter
4). This way the proposed non-assignment-based approach can act as a
higher order constraint in OD optimisation.
In this chapter, the methodology was demonstrated using the structure of
Bluetooth OD flows. As discussed in Section 4.5, the actual observations
from Bluetooth do not provide the complete sequence of trajectories and are
therefore not true trip ends. To address this, the proposed methodology can
be re-formulated using the structural knowledge of Bluetooth subpath flows
instead of Bluetooth OD flows.
Chapter 6: Methodology to Cluster B-OD Matrices and Identify Typical Travel Patterns: Case Study Application
of the BCC region 159
Chapter 6: Methodology to Cluster B-
OD Matrices and Identify
Typical Travel Patterns:
Case Study Application of
the BCC region
The previous chapters focussed on the development of methods primarily from
the structural perspective of OD matrices, such as: a) development of statistical metrics
for the structural comparison of OD matrices, and b) integrating the structural
information of Bluetooth trips into OD matrix estimation methods (both assignment
and non-assignment-based methods). However, this chapter focusses on the practical
application of the structural knowledge of B-OD matrices and the proposed statistical
metrics i.e. GSSI and NLOD. Both metrics are deployed independently as structural
proximity measures for a clustering algorithm to identify typical travel patterns and
the corresponding typical OD matrices from real Bluetooth data of the BCC region.
Due to the lack of a large-scale database of loop counts, the travel patterns from B-OD
matrices constructed from 415 days of Bluetooth data are analysed in this study.
The outline of this chapter is as follows: first, background about travel
patterns is provided with a detailed review of similar studies in Section 6.1; the
proposed clustering-based methodology is discussed in Section 6.2; the experiments
and results based on the structural proximity metrics (GSSI and NLOD) and a traditional
metric (RMSN) are discussed in Section 6.3; and finally the summary of the chapter is
provided in Section 6.4.
6.1 BACKGROUND
A pattern means the “repeated or regular way in which something happens”
(Dictionary, 2018). A travel pattern can be defined as a repeated travel behaviour
related to various features, such as the origin and destination (OD) of travel (Kieu,
Bhaskar, & Chung, 2015), mode selection (De Haas, 2016), route selection (Lee &
Sohn, 2015), and activity (Lee & McNally, 2003). The focus of this chapter is
OD-related travel patterns, and in this study the term “travel pattern” should be read
in that sense.
The previous studies on the analysis of travel patterns, based on the level of detail
they provide, can be categorized into three types: traffic counts-based, travel
time/speed-based, and trajectories-based patterns.
The traffic counts/travel times-based studies analyse patterns from time-
series plots. For instance, a few researchers analysed travel patterns by classifying traffic
volume time series (Weijermars & Van Berkum, 2005; Wild, 1997) and travel time
series (Chung, 2003). With the availability of location-aware technologies such as
GPS, mobile phones and Bluetooth, more studies have focussed on mobility patterns
in the form of trajectories. Based on varying space-time characteristics, Guo, Zhu, Jin,
Gao, and Andris (2012) classified trajectories-based information into three types:
point-based trajectories, point-based OD pairs and area-based OD pairs. Among these
types, most studies focused on spatial clustering of trajectories that have common
attributes such as spatial contiguity (Guo, et al., 2012), similar sub-trajectories (Lee,
Han, Li, & Gonzalez, 2008), and link flows/speeds (Laharotte et al., 2015). With regard
to their application, some studies inferred spatio-temporal patterns of activities (Gong,
Liu, Wu, & Liu, 2016), origin-destination hotspots (Gonzalez, Hidalgo, & Barabasi,
2008), etc. While trajectory-based information provides more mobility detail than
OD flows, the latter are computationally more efficient for analysing larger spatio-
temporal dimensions of travel patterns (say the daily mobility of a large-scale city)
(Guo, et al., 2012).
Very few studies are found in the literature with regard to the classification of days
based on traffic data such as speed/occupancy (Rakha & Van Aerde, 1995), travel time
series (Chung, 2003), traffic load profiles (Friedrich, Immisch, Jehlicka, Otterstätter,
& Schlaich, 2010) and OD flows (Andrienko, Andrienko, Fuchs, & Wood, 2017; Guo,
et al., 2012; Yang, Yan, & Xu, 2017). With respect to OD flow related patterns, graph
partitioning methods have become more popular. For example, Guo, et al. (2012)
applied dynamic graph partitioning to represent day-of-the-week patterns using smart
card and Bluetooth data; and Naveh and Kim (2018) used trip ends of taxi trajectory
data to spatially cluster the GPS points and analyse their patterns across space and
time. However, the representation of OD flows in dynamic graphs is computationally
expensive due to the huge spatio-temporal dimensions of OD. Addressing this, previous
studies have proposed dimensionality reduction methods such as Principal Component
Analysis (PCA) and Singular Value Decomposition (SVD) (Yang, et al., 2017); Non-
Negative Tensor Factorization methods (Guo, et al., 2012); and spatial abstraction
methods (converting graphs into multi-dimensional vectors) (Andrienko, et al.,
2017). While these methods can capture most of the OD flow information, they might miss
the subtle differences within the underlying patterns. For instance, PCA and SVD may
not be appropriate if the data points lie in different subspaces/density regimes
(Steinbach, Ertöz, & Kumar, 2004); and in spatial abstraction methods, the discretization
of flows and distances might place different values within the same class.
With regard to exploiting the hidden structure of OD matrices, Laharotte, et al. (2015)
presented a Latent Dirichlet Allocation (LDA) approach to identify temporal patterns of
the Brisbane network based on different LDA templates such as high level of traffic,
even peak (high) or leisure etc. While Laharotte, et al. (2015) reduced the B-OD
matrices into LDA (B-OD) templates and clustered those B-OD pairs, this study
proposes to cluster daily B-OD matrices to identify day-to-day variations. Although
past studies (Djukic, et al., 2013; Ruiz de Villa, et al., 2014) proposed structural
similarity metrics to compare OD matrices, the clustering of daily OD matrices and
the identification of typical OD matrices based on their structural proximity have not
been addressed before.
With respect to travel patterns, many questions were raised in Section 1.2.5.
To answer these questions, this chapter explores a clustering-based approach to
classify individual B-OD matrices into specific groups, where OD matrices within
the same group should have similar travel patterns. Raw Bluetooth data from 845
BMSs (Figure 1.9) were obtained for 415 days (June, July, August and December
of 2015, and all months of 2016 excluding April). In the following section, a
detailed methodology is discussed to cluster high-dimensional (Osorio (2017)
emphasizes that a dimension of 200 is generally considered high) and multi-density B-
OD matrices and to identify typical OD matrices of typical travel patterns.
6.2 METHODOLOGY TO CLUSTER B-OD MATRICES AND
IDENTIFY TYPICAL TRAVEL PATTERNS
The following sections discuss the traditional DBSCAN approach followed by
the proposed three-level approach and distance measures for the clustering algorithm.
6.2.1 Traditional DBSCAN approach
A density-based spatial clustering of applications with noise (DBSCAN)
algorithm was selected for the current application. The algorithm, originally proposed
by Ester, Kriegel, Sander, and Xu (1996), is widely used to cluster data points based on
their density. The advantage of the DBSCAN algorithm is that it does not require a
predetermined number of clusters and the size of a cluster is not fixed (Kieu, et al.,
2015). The following paragraphs provide a conceptual framework for the algorithm,
where a data point in the current application should be read as a B-OD matrix.
The algorithm first marks all of the data points as “non-visited”, starting with an
arbitrary selection of a “non-visited” point and identifying all other data points within
ε distance (distance threshold). These data points, if any, are termed as neighbourhood
points. If the number of neighbourhood points is at least MinPts (size threshold) then
the data point under consideration becomes the first point of a new cluster where the
neighbourhood points are part of the same cluster; otherwise, the data point is labelled
as noise. In either case, the data point is now marked as “visited”. If a cluster is
identified, then the above process for defining neighbourhood points is repeated for all
of the new points identified as neighbourhoods in the current cluster and the number
of points in the cluster is extended. Thereafter, a new “non-visited” point is selected,
and the process is repeated until all of the points are marked as “visited”. This leads to
each point being either assigned to a cluster or marked as noise.
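The procedure described above can be sketched as follows. This is a minimal illustration on a pre-computed distance matrix, not the implementation used in this study; the function names are hypothetical. It follows the description in counting only the other points within ε as the neighbourhood, and it re-labels a point first marked as noise if it later falls within the neighbourhood of a cluster (the usual border-point rule, assumed here).

```python
NOISE = -1


def region_query(dist, i, eps):
    """Indices of all other points within eps of point i."""
    return [j for j in range(len(dist)) if j != i and dist[i][j] <= eps]


def dbscan(dist, eps, min_pts):
    """Cluster points given a symmetric distance matrix (list of lists).

    Returns one label per point: 1, 2, ... for clusters and NOISE for noise.
    """
    labels = [None] * len(dist)  # None marks a "non-visited" point
    cluster = 0
    for i in range(len(dist)):
        if labels[i] is not None:
            continue
        neighbours = region_query(dist, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of the current cluster
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(dist, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # expand the cluster from j
    return labels


# Toy example: seven points on a line, two dense groups and one outlier.
points = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(a - b) for b in points] for a in points]
labels = dbscan(dist, eps=1.5, min_pts=2)
print(labels)
```

Library implementations may count the point itself towards the size threshold (scikit-learn's `min_samples` does), so the MinPts value would need to be shifted by one when comparing results.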
From the above, it is clear that the algorithm does not require a number of pre-
determined clusters, as in k-NN (Altman, 1992), and is able to define clusters with
varying density. It also identifies outliers as noise. However, as the algorithm is
sensitive to the setting of its parameters (ε and MinPts), the algorithm does not perform
well for multi-density data sets (Huang, Yu, Li, & Zeng, 2009). Moreover, in the
current application, where data points are high dimensional matrices, a relevant
indicator is required to define the ε. To address these needs, the following sub-sections
discuss setting DBSCAN parameters and distance measures for B-OD clustering.
Following this, the experiments and results are discussed.
6.2.1.1 Setting DBSCAN parameters
The optimum DBSCAN parameters in the traditional approach are identified using
a simple and interactive heuristic proposed by Ester et al. (1996), as discussed below
(see Figure 6.1):
Step 1: First, a k-dist function is defined to map each data point, p, to the
distance value (k-dist (p)) corresponding to its kth-nearest neighbour.
Step 2: For a given value of k, identify the kth nearest neighbour of every point in
the database and plot the points (x-axis) in the descending order of k-dist values
(y-axis). The graph resulting from this distribution is referred to as the sorted k-
dist graph.
Step 3: The shape of the sorted k-dist graph further helps to identify the
threshold point. The parameter MinPts is set to k and is chosen corresponding
to the valley of the sorted k-dist graph. The valley points are identified through
a visual observation, and as such, this technique is an interactive approach. All
points on the left side of the threshold point (i.e., higher k-dist value) are
considered noise and the remaining points are assigned to some clusters.
Figure 6.1: Typical shape of sorted k-dist graph
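Steps 1 and 2 of the heuristic can be sketched as below. This is an illustrative sketch operating on a pre-computed distance matrix; the function names are hypothetical and the toy data are not from the thesis.

```python
def k_dist(dist, k):
    """k-dist(p) for every point: the distance to its k-th nearest neighbour."""
    values = []
    for i, row in enumerate(dist):
        # Distances to all other points, nearest first.
        others = sorted(d for j, d in enumerate(row) if j != i)
        values.append(others[k - 1])
    return values


def sorted_k_dist(dist, k):
    """(point index, k-dist) pairs in descending order of k-dist."""
    values = k_dist(dist, k)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [(i, values[i]) for i in order]


# Toy example: four points on a line; the last one is isolated.
points = [0, 1, 2, 10]
dist = [[abs(a - b) for b in points] for a in points]

print(sorted_k_dist(dist, k=1))
print(sorted_k_dist(dist, k=2))
```

Plotting the second element of each pair against its position gives the sorted k-dist graph; the isolated point appears at the far left with the largest k-dist value.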
For ease of explanation, the above technique is presented with an example.
Figure 6.2 (left) shows five data points (P1, P2, P3, P4, and P5) that need to be
clustered using their k-dist values corresponding to their kth nearest neighbour. Here,
the value presented on the link joining two points is the distance between them.
The kth nearest neighbour and k-dist (within brackets) of all points are shown in Figure
6.2 (right) for k=1, k=2, and k=3. For instance, the 1st, 2nd, and 3rd nearest neighbours
of P3 are P2, P4, and P1, respectively. The sorted k-dist plots for k=1, k=2, and k=3
along with the corresponding valley points are illustrated in Figure 6.3. Here, the y-axis
represents k-dist values and the x-axis shows the order of points. It should be noted
that the order of points changes as k changes. After setting MinPts equal to k, the
optimal ε values are the k-dist values corresponding to the valley points of the
sorted k-dist plots. For instance, the optimal ε value is 3 for MinPts=1, 3.2 for
MinPts=2, and 7 for MinPts=3. The points on the left side of the valley correspond to noise
and the rest form clusters, as shown in Figure 6.3. As can be seen, for MinPts=1,
clusters can be formed using points that are in proximity (i.e., P1, P2, P3, and P4) while
considering one point (P5) as noise. Similarly, for MinPts=2 and MinPts=3, clusters
can be formed using P1, P3, and P4 while considering P2 and P5 as noise.
Alternatively, it can also be observed from Figure 6.2 that P2 and P5 are slightly away
from the rest of the points. Thus, they have a higher possibility of being noise
compared to the others.
Ester et al. (1996) identified that k-dist graphs for k > 4 did not significantly
differ from the 4-dist graph. Thus, they fixed MinPts to be 4 and identified the
threshold corresponding to the valley of 4-dist graph.
Figure 6.2: Sample data points (left) along with kth nearest neighbour and k-dist of all points (right)
Figure 6.3: Sorted k-dist graphs for k=1, k=2 and k=3 and the resulting clusters
[Figure 6.2 and Figure 6.3 content: sample points P1 to P5; sorted 1-dist, 2-dist and 3-dist plots (y-axis: k-dist; x-axis: order of the points) with the valley, noise and cluster regions marked. Order of the points: k=1: P5, P2, P3, P1, P4; k=2: P2, P5, P1, P3, P4; k=3: P2, P5, P4, P1, P3]
A traditional DBSCAN algorithm performs poorly if the data points are of varied
density (multi-density data sets). To address this, some researchers have suggested
dividing datasets into different density levels (referred to as subspaces) prior to the
clustering process (Elbatta & Ashour, 2013; Parsons, Haque, & Liu, 2004). The
difference in density levels can be observed from sorted k-dist plots. For instance,
Figure 6.4 shows a typical sorted k-dist plot if there are two density levels in the data
points. Thus, the decision to consider subspace clustering is made based on the density
distribution. As such, major subspaces/clusters are initially identified within the
datasets and the clustering process is then performed within the subspaces.
Figure 6.4: Demonstration of two density levels through sorted k-dist plot
6.2.2 Three-level approach for identifying DBSCAN parameters
This section discusses the methodology developed to identify the optimum
DBSCAN parameters using a three-level approach. Figure 6.5 illustrates the overview
of the three-level approach based on DBSCAN clustering algorithm. It describes the
methodology adopted to estimate typical OD matrices of typical travel patterns by
clustering β (in the study β= 415) B-OD matrices. The step by step approach is
described as follows.
Figure 6.5: Three level approach to cluster B-OD matrices
First level: Identifying the possible subspaces
o Step 1: First, the density distribution of data points is observed from
sorted k-dist plots for k=1 to k=15. Based on the experiments in this
study, for k>15 the number of clusters formed was less than or equal
to 2. Thus, an upper limit of 15 was selected for k. If the plots show v
distinct valleys, then it is a v-density dataset, and the data points are
further split into v subspaces for subspace clustering. If the plots
show only one valley, then no subspace clustering is undertaken.
Second level: Identifying the initial set of DBSCAN parameters
o Step 2: Unlike the approach adopted by Ester, et al. (1996), that is,
visually inferring the threshold from the valley of sorted k-dist plots, it is
proposed in this study that the shortest distance from origin criterion is
used to identify the initial set of DBSCAN parameters, that is, one
(ε, MinPts) pair for each value of k. According to this
criterion, the valley of a sorted k-dist graph corresponds to the point at the
shortest distance from the origin of the axes formed by k-dist values on the
y-axis and sorted data points (OD matrices) on the x-axis.
Third level: Identifying the optimum set of DBSCAN parameters (ε, MinPts) and the
resulting clusters
o Step 3: DBSCAN clustering is now performed using the set of
parameters (ε, MinPts) identified in the second level.
o Step 4: Although a good number of clusters is required, at the same time
unimportant clusters are not wanted. Thus, those parametric
combinations of ε and MinPts that result in c homogeneous clusters,
where cl <= c <= cu, are selected. The lower and upper limits are at the analyst’s
discretion, and in this study 3 <= c <= 6 is considered. The selected
parameters are referred to as the optimum parameters (ε, MinPts) and the rest of
the parametric combinations are ignored. The homogeneous clusters belonging to
these parametric combinations are the final clusters.
Section 6.3 explains the above process further, with an example from the real
data.
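Steps 2 and 4 of the three-level approach can be sketched as follows. This is a hedged illustration only: the thesis does not state how the two axes are scaled, so the sketch assumes raw axis units, with the x-coordinate taken as the rank of the point (starting at 1) and the y-coordinate as its k-dist value. The function names and the sample numbers are hypothetical, and the check for homogeneous (consistently proportioned) clusters is not implemented here.

```python
import math


def valley_by_origin_distance(sorted_kdist):
    """Shortest distance from origin criterion (Step 2).

    `sorted_kdist` holds k-dist values in descending order. Returns the
    (rank, k-dist) pair closest to the origin; the k-dist value is taken
    as the initial epsilon for MinPts = k.
    ASSUMPTION: raw axis units, ranks starting at 1, no normalisation.
    """
    return min(
        enumerate(sorted_kdist, start=1),
        key=lambda item: math.hypot(item[0], item[1]),
    )


def select_parameters(cluster_counts, c_lower=3, c_upper=6):
    """Keep the (eps, min_pts) pairs whose cluster count lies in [c_lower, c_upper] (Step 4).

    `cluster_counts` maps (eps, min_pts) to the number of clusters DBSCAN produced.
    """
    return [
        params
        for params, count in cluster_counts.items()
        if c_lower <= count <= c_upper
    ]


# Hypothetical sorted k-dist values for one value of k.
kdist_values = [30, 12, 5, 4, 3.5, 3, 2.8]
rank, eps = valley_by_origin_distance(kdist_values)
print(rank, eps)

# Hypothetical cluster counts for three parameter pairs.
counts = {(34.0, 4): 5, (35.0, 5): 5, (36.0, 12): 2}
print(select_parameters(counts))
```

Because the rank axis and the k-dist axis have different units, the position of the valley found this way depends on their relative scale; normalising both axes before measuring the distance would be one alternative.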
6.2.3 Distance measures for clustering B-OD matrices
The two statistical metrics proposed in Chapter 3, that is, the GSSI and NLOD,
were deployed as the structural proximity measures for comparison of OD matrices.
In this research, the applicability of these metrics was independently tested as distance
measures for clustering B-OD matrices. First, the formulations and characteristics of
both statistical metrics are discussed and thereafter the distance measures are defined.
Since the DBSCAN algorithm considers a distance matrix for clustering process,
GSSI values are initially converted into distance values (dGSSI) using Equation (74).
The pre-computed 415*415 GSSI matrix is multiplied by 1,000 so that the distance
value is close to one decimal place.
(74)
The NLOD in itself is a distance value; thus, it requires no further conversion.
However, to be consistent with GSSI the 415*415 NLOD matrix is multiplied by 1000
as shown in Equation (75).
dNLOD = 1000*(NLOD( )) (75)
To compare the results of experiments based on the structural proximity measures
with a traditional metric that does not account for the structural information of the OD
matrix, the normalized root mean square error (RMSN) is chosen. The formulation for
RMSN is taken from Antoniou, et al. (2004) and is shown in Equation (76). To be consistent
with the other distance measures, the equivalent distance measure for RMSN is obtained
by multiplying Equation (76) by 1000, as shown in Equation (77).
RMSN ( ) =
(76)
dRMSN = 1000*(RMSN ) (77)
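The three distance measures can be sketched as below. Two points are assumptions and should be checked against Equations (74) and (76): the GSSI is assumed to be converted to a distance as 1 − GSSI before scaling, and RMSN is written in its commonly used form, sqrt(N·Σ(x̂ − x)²)/Σx, where x is the reference matrix. The function names are illustrative only.

```python
import math

SCALE = 1000  # multiplier applied to all three measures


def rmsn(reference, estimate):
    """Normalised RMSE between two equal-shaped OD matrices (nested lists).

    ASSUMPTION: the commonly used form sqrt(N * sum of squared errors) / sum(reference).
    """
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    squared_error = sum((e - r) ** 2 for r, e in zip(ref, est))
    return math.sqrt(len(ref) * squared_error) / sum(ref)


def d_rmsn(reference, estimate):
    """Distance based on RMSN, scaled by 1000 as in Equation (77)."""
    return SCALE * rmsn(reference, estimate)


def d_nlod(nlod_value):
    """NLOD is already a distance; it is only scaled, as in Equation (75)."""
    return SCALE * nlod_value


def d_gssi(gssi_value):
    """ASSUMPTION: the similarity is turned into a distance as 1 - GSSI, then scaled."""
    return SCALE * (1.0 - gssi_value)


a = [[10, 20], [30, 40]]
b = [[12, 18], [33, 37]]
print(round(d_rmsn(a, b), 2), d_nlod(0.1), round(d_gssi(0.97), 2))
```

Written this way, RMSN is normalised by the reference matrix only, so it is not symmetric in its two arguments; how the symmetric 415*415 distance matrix was obtained is not stated in this section.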
6.3 EXPERIMENTS AND RESULTS
This section details the experiments conducted using dGSSI (Experiment-1) and
dNLOD (Experiment-2) as proximity measures; their results are
compared against Experiment-3, which is based on dRMSN.
The initial observations from sorted k-dist plots indicated a possibility of two
different density regimes in the datasets for all three experiments (Figure 6.6, Figure
6.7 and Figure 6.8). Thus, based on Step-1 of the three-level approach (Section 6.2.2),
all data points were first divided into two different subspaces. It was observed that the
first 129 points (in the order shown by x-axis) defined subspace-1 and consisted of
Saturdays, Sundays, public holidays, and long weekends. The rest of the data points
were pre-classified as subspace-2, which consisted of regular weekdays (WDR) and
weekday school holidays (WDSH). The experiments for individual subspaces are
described in the following subsections.
Figure 6.6: Sorted k-dist plots for experiment-1
Figure 6.7: Sorted k-dist plots for experiment-2
Figure 6.8: Sorted k-dist plots for experiment-3
6.3.1 Experiment-1: dGSSI as proximity measure
6.3.1.1 Subspace-1 analysis
Here, the analysis was performed on the 129 data points of subspace-1. The initial
set of DBSCAN parameters, that is, the (ε, MinPts) pairs, was identified based on the
shortest distance from origin criterion. Figure 6.9a presents the number of clusters
formed for different MinPts. The pie-chart represents a consistent proportion of clusters
(homogeneous clusters) for MinPts=4 to MinPts=9. The relationship between the
optimum parameters (ε and MinPts) was observed to be linear, with an R² value of
0.8932 (see Figure 6.9b).
The clusters of subspace-1 from Experiment-1 were:
Cluster-1 (C1) included weekends, public holidays and long weekends of
January to June 2016;
Cluster-2 (C2) included Sundays of spring and summer, 2016;
Cluster-3 (C3) included Saturdays of spring and summer, 2016;
Cluster-4 (C4) included Sundays of Winter, 2015; and
Cluster-5 (C5) included Saturdays of Winter, 2015.
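The linear relationship between the optimum ε and MinPts, and its R² value, can be checked with an ordinary least-squares fit. The sketch below is illustrative: the (MinPts, ε) pairs are hypothetical values chosen to resemble the range reported for this subspace, not the thesis data, and the function name is an assumption.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R² value."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# HYPOTHETICAL (MinPts, optimum epsilon) pairs, for illustration only.
min_pts = [4, 5, 6, 7, 8, 9]
optimum_eps = [31.0, 32.1, 32.9, 34.2, 34.8, 36.0]

slope, intercept, r2 = linear_fit_r2(min_pts, optimum_eps)
print(f"eps = {intercept:.2f} + {slope:.2f} * MinPts, R2 = {r2:.4f}")
```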
Figure 6.9: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-1 of experiment-1
[Figure 6.9 content: (a) number of clusters (y-axis) vs MinPts (x-axis, 0 to 15), with cluster proportions C1 48%, C2 15%, C3 12%, C4 13%, C5 12%; (b) linear fit with R² = 0.8932]
6.3.1.2 Subspace-2 analysis
Similar to the previous analysis, Figure 6.10a presents the number of clusters
formed for different MinPts and Figure 6.10b shows the linear relationship
(R² = 0.94) between the optimal DBSCAN parameters. The following are
the observed clusters:
Cluster-1 (C1) included regular weekdays of 2016 except summer;
Cluster-2 (C2) included regular weekdays, 2015;
Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and
Cluster-4 (C4) included regular weekdays of November 2016
Figure 6.10: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-1
[Figure 6.10 content: (a) number of clusters (y-axis) vs MinPts (x-axis, 0 to 15), with cluster proportions C1 44%, C2 24%, C3 24%, C4 8%; (b) linear fit with R² = 0.94]
6.3.2 Experiment-2: dNLOD as proximity measure
6.3.2.1 Subspace-1 analysis
The relationship between MinPts and the number of clusters is illustrated in
Figure 6.11a, while Figure 6.11b shows the linear relationship between the optimal
DBSCAN parameters (R² = 0.8977) that resulted in the following clusters:
Cluster-1 (C1) included weekends, public holidays and long weekends of
January to June 2016;
Cluster-2 (C2) included Sundays of Winter, 2015; and
Cluster-3 (C3) included Saturdays of Winter, 2015.
Figure 6.11: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-1 of experiment-2
[Figure 6.11 content: (a) number of clusters (y-axis) vs MinPts (x-axis, 0 to 15), with cluster proportions C1 37%, C2 33%, C3 30%; (b) linear fit with R² = 0.8977]
6.3.2.2 Subspace-2 analysis
The relationship between MinPts and the number of clusters is shown in Figure
6.12a. The relationship between ε and MinPts for subspace-2 of experiment-2 was also
found to be linear, with an R² value of 0.9716 (Figure 6.12b). The clusters resulting
from this analysis were:
Cluster-1 (C1) included regular weekdays of 2016 except Summer;
Cluster-2 (C2) included regular weekdays, 2015;
Cluster-3 (C3) included weekday school holidays, 2015 and 2016; and
Cluster-4 (C4) included regular weekdays of November 2016.
Figure 6.12: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-2
[Figure 6.12 content: (a) number of clusters (y-axis) vs MinPts (x-axis, 0 to 15), with cluster proportions C1 46%, C2 23%, C3 22%, C4 9%; (b) linear fit with R² = 0.9716]
6.3.1 Experiment-3: dRMSN as proximity measure
6.3.1.1 Subspace-1 analysis:
The distance measure dRMSN resulted in only one major cluster for subspace-1.
It included all Saturdays, Sundays and public holidays of 2015 and 2016 except the
Saturdays of spring and summer, 2016, which were considered to be noise.
6.3.1.2 Subspace-2 analysis:
A total of 4 homogeneous clusters were formed for MinPts ranging from 4 to 13.
The relationship between MinPts and the number of clusters is illustrated in Figure
6.13(a), and Figure 6.13(b) shows the linear relationship between the optimal
DBSCAN parameters (R² = 0.9832) that resulted in the following clusters:
Cluster-1 (C1) includes WDR of 2016 except summer;
Cluster-2 (C2) includes WDR, 2015;
Cluster-3 (C3) includes WDSH, 2015 and 2016; and
Cluster-4 (C4) includes WDR of November 2016.
Figure 6.13: (a) Number of clusters vs MinPts and proportion of clusters; and (b) optimum ε vs MinPts for subspace-2 of experiment-3
[Figure 6.13 content: (a) number of clusters (y-axis) vs MinPts (x-axis, 0 to 15), with cluster proportions C1 41%, C2 24%, C3 23%, C4 12%; (b) optimum ε (y-axis) vs MinPts (x-axis), linear fit with R² = 0.9832]
6.3.2 Typical B-OD flows
One of the ways to derive typical B-OD matrices and typical OD flows for
individual OD pairs is by taking the average of all B-OD matrices within each cluster.
To give an example of the difference among the typical OD flows, the OD flows for the
OD pair Mt. Gravatt and Brisbane CBD are shown in the Box-Whisker plot (Figure
6.14). The plot is shown for the clusters resulting from experiment-1, where the first 5
clusters on the x-axis correspond to C-1 to C-5 of subspace-1 and the last 4 clusters
correspond to C-1 to C-4 of subspace-2, respectively. The y-axis represents the OD
flow values.
Figure 6.14: Box-Whisker plot demonstrating the difference among the typical B-OD flows for OD pair – Mt. Gravatt and Brisbane CBD (results of experiment-1)
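The averaging step described above can be sketched as follows. This is a minimal illustration with hypothetical 2×2 matrices; it assumes that each daily B-OD matrix carries the cluster label assigned by the clustering step and that days labelled as noise are left out of the averages. The function name and the noise label are assumptions.

```python
def typical_od_matrices(od_matrices, labels, noise_label=-1):
    """Element-wise mean of the OD matrices in each cluster.

    `od_matrices` is a list of equal-shaped nested lists (one per day) and
    `labels` the cluster label of each day. Noise days are skipped.
    """
    groups = {}
    for matrix, label in zip(od_matrices, labels):
        if label == noise_label:
            continue
        groups.setdefault(label, []).append(matrix)

    typical = {}
    for label, members in groups.items():
        n_rows, n_cols = len(members[0]), len(members[0][0])
        typical[label] = [
            [sum(m[r][c] for m in members) / len(members) for c in range(n_cols)]
            for r in range(n_rows)
        ]
    return typical


# HYPOTHETICAL daily OD matrices (2 zones) and their cluster labels.
days = [
    [[0, 10], [20, 0]],
    [[0, 14], [22, 0]],
    [[0, 100], [50, 0]],
    [[0, 1], [1, 0]],
]
day_labels = [1, 1, 2, -1]

typical = typical_od_matrices(days, day_labels)
print(typical[1])
print(typical[2])
```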
6.3.3 Discussion
Since the ground truth is unknown, one of the ways to compare the clusters
resulting from all three experiments is to see how well they are able to reproduce the
pre-classified day types. The number of days in each category of day type is shown in
Figure 6.15. (Refer to Figure 6.16 or the notations section for the expansion of the terms
used in Figure 6.15).
Figure 6.15: Classification of day types
[Figure 6.15 data, number of days per day type: SATR 39, SATSH 18, SUNR 40, SUNSH 16, WDR 219, WDSH 67, PH 5, LW 11]
While the comparison in Figure 6.16 shows that PH (Public Holidays), LW
(Long Weekends), and school holidays during Saturdays and Sundays could not form
standalone clusters, both GSSI (9 clusters) and NLOD (7 clusters) could represent the
pre-classification better than RMSN (5 clusters). The similarities in the clusters resulting
from GSSI and NLOD are explained in detail below.
• Both metrics were able to differentiate weekday and weekend patterns. In
fact, there was no single typical weekend travel pattern, because travel patterns
on Saturdays and Sundays were found to differ from each other.
• Both metrics observed seasonal trends in travel patterns. For instance,
Saturdays during the Australian winter of 2015 were observed to have
different travel patterns compared to the rest of the Saturdays. A similar
observation was made for the Sundays of winter 2015.
• Both metrics identified a group of Saturdays and Sundays during the
school holiday season that shared similar travel patterns with a few
well-known public holidays of Australia.
• The classification of subspace-2 (i.e., WDR and WDSH) was the same in
both experiments. This showed that the travel patterns during WDSH
differed from those of the WDR. Interestingly, WDSH from both 2015 and
2016 were grouped into one single cluster by both metrics.
• Both metrics identified that WDR travel patterns during November 2016
differed from those of other regular working weekdays. The difference in
travel patterns during November 2016 could be attributed to major events
held in that month. The annual report published by the Royal National
Agricultural and Industrial Association of Queensland (RNA, 2016)
estimated that, in 2016, the Brisbane Showgrounds attracted almost a
million people by hosting more than 250 events, an increase of 20%
compared to 2015. November was the busiest month of 2016, hosting a
total of 35 events.
The only difference between the two metrics is that GSSI separated the Saturdays
and Sundays of the Australian spring and summer of 2016 into two individual clusters,
which NLOD failed to differentiate. The lower sensitivity of NLOD in this regard can
be attributed to the fact that it computes statistics on OD pairs belonging to one specific
origin, whereas GSSI computes statistics on groups of OD pairs belonging to more
than one origin. As a result, GSSI is able to capture subtle structural differences in
travel patterns during the aforementioned days.
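The idea of computing a similarity statistic on groups of OD pairs that span more than one origin can be illustrated with a minimal sketch. It applies the standard SSIM expression to user-defined windows of an OD matrix and averages the result. The window definition, the stabilising constants c1 and c2, and the toy matrices are assumptions made for illustration only; this is not the exact GSSI formulation developed in this thesis.

```python
from statistics import mean


def ssim(x, y, c1=1e-6, c2=1e-6):
    """SSIM of two equally long flow vectors (the OD pairs of one window)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def windowed_ssim(od_a, od_b, windows):
    """Mean SSIM over windows given as (origin indices, destination indices)."""
    scores = []
    for origins, destinations in windows:
        flows_a = [od_a[i][j] for i in origins for j in destinations]
        flows_b = [od_b[i][j] for i in origins for j in destinations]
        scores.append(ssim(flows_a, flows_b))
    return mean(scores)


# Hypothetical 4-zone matrices; zones 0-1 and 2-3 form two geographical groups.
od_day1 = [[0, 50, 30, 20], [10, 0, 5, 85], [40, 40, 0, 20], [25, 25, 50, 0]]
# Same flows, but the trips of origins 0 and 1 are exchanged.
od_day2 = [od_day1[1], od_day1[0], od_day1[2], od_day1[3]]
windows = [([0, 1], [0, 1, 2, 3]), ([2, 3], [0, 1, 2, 3])]

print(windowed_ssim(od_day1, od_day1, windows))  # identical matrices: about 1
print(windowed_ssim(od_day1, od_day2, windows))  # exchanged origins: below 1
```

Because each window pools the flows of several origins, the score reacts to how trips are distributed across the group, not only within a single origin.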
On the other hand, the clusters produced by experiment-3 (based on RMSN)
demonstrated seasonal trends in subspace-2 travel patterns and were similar to the
results of the other experiments. However, RMSN failed to distinguish the differences
among the daily travel patterns during Saturdays, Sundays and Public Holidays. Because
it produced one major cluster, it was unable to recognize seasonal variations within the
other days in subspace-1. This is because RMSN is based on deviations of individual OD
flows, and so it could not identify the structural differences within the respective B-OD
matrices.
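The limitation of a deviation-based metric can be illustrated with a minimal sketch. It uses the commonly quoted form of RMSN, the square root of N times the sum of squared deviations divided by the sum of reference flows, and compares it with a simple structural indicator (the destination shares of each origin). The toy matrices and the choice of destination shares as the indicator are assumptions made for illustration.

```python
import math


def rmsn(reference, estimate):
    """Root mean square error normalised by the reference flows."""
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    squared = sum((r - e) ** 2 for r, e in zip(ref, est))
    return math.sqrt(len(ref) * squared) / sum(ref)


def destination_shares(od):
    """Share of each origin's trips that goes to each destination."""
    return [[v / sum(row) for v in row] for row in od]


reference = [[0, 100, 50], [80, 0, 40], [60, 20, 0]]

# Every OD pair deviates by 20% in both matrices; only the signs differ.
scaled = [[1.2 * v for v in row] for row in reference]
signs = [[1, 1, -1], [-1, 1, 1], [1, -1, 1]]
distorted = [
    [v * (1 + 0.2 * s) for v, s in zip(row, sign_row)]
    for row, sign_row in zip(reference, signs)
]

print(rmsn(reference, scaled), rmsn(reference, distorted))  # same RMSN
print(destination_shares(reference)[0])   # shares of origin 0 in the reference
print(destination_shares(scaled)[0])      # unchanged by uniform scaling
print(destination_shares(distorted)[0])   # changed by the mixed deviations
```

Both matrices are equally far from the reference according to RMSN, although only the second one alters how trips are distributed among destinations.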
The typical OD flows from each cluster (see Section 6.3.2 for the results of
experiment-1) demonstrated typical travel patterns of Brisbane city and are more
detailed than the observations from a similar study by Guo, et al. (2012) conducted on
Brisbane city over the same time period. Guo, et al. (2012) could identify only three
types of travel patterns, namely Saturday, Sunday and weekday patterns, perhaps because
that study analysed travel patterns on dimensionally reduced OD matrices. In contrast,
the present study identifies additional patterns, highlighting the strength of structural
proximity measures in identifying more typical travel patterns.
For travel demand modelling, the knowledge of travel patterns can be used for
estimating typical OD matrices using bi-level solution algorithms. Moreover, the
knowledge of travel patterns is important for effective policy decisions. For example,
shifting public holidays with similar travel patterns towards weekends can create more
long weekends (Chung, 2003). This would encourage the public to spend more during the
holidays, thus boosting the nation’s economy. Further, the knowledge of the seasonal
distribution of travel patterns helps transport planners to schedule travel surveys
across the study network over any period. For instance, the Household Travel Survey
(HTS) for South East Queensland (SEQTS, 2010) was conducted for over 10 weeks
from mid-April through late June and in July 2009. However, the survey period
avoided the days during school/university holidays. Since the present study showed that
travel patterns differ during school holidays and across seasons, distributing the
survey period over a year based on the knowledge of Bluetooth travel patterns can
better capture the travel patterns of a study region. There are also short-term
ITS applications of identifying typical OD matrices. For instance, developing a
database of typical historical time-sliced OD matrices can improve the performance of
OD prediction algorithms (such as the Kalman filter) used for real-time traffic
management and decision making in systems such as Aimsun Live (Aimsunlive, 2017).
Figure 6.16: Comparison of clusters resulting from all three experiments
[Figure: for each experiment and subspace, the clusters are tabulated against the pre-classified day types for 2015 and 2016. Day types: Weekends, split into Saturdays during school holidays (SATSH), regular Saturdays (SATR), Sundays during school holidays (SUNSH) and regular Sundays (SUNR); Weekdays, split into regular weekdays (WDR) and school holidays during weekdays (WDSH); Public Holidays, split into normal public holidays (PH) and long weekends (LW).]
Experiment-1: GSSI
Subspace-1: (1) Weekends, PH and LW, Jan-Jun 2016; (2) Sundays, spring and summer 2016; (3) Saturdays, spring and summer 2016; (4) Sundays, winter 2015; (5) Saturdays, winter 2015
Subspace-2: (6) WDR, 2016 except summer; (7) WDR, 2015; (8) WDSH, 2015 and 2016; (9) WDR, November 2016
Experiment-2: NLOD
Subspace-1: (1) Weekends, PH and LW, Jan-Jun 2016; (2) Sundays, winter 2015; (3) Saturdays, winter 2015
Subspace-2: (4) WDR, 2016 except summer; (5) WDR, 2015; (6) WDSH, 2015 and 2016; (7) WDR, November 2016
Experiment-3: RMSN
Subspace-1: (1) Weekends, PH and LW, 2015 and 2016
Subspace-2: (2) WDR, 2015; (3) WDSH, 2015 and 2016; (4) WDR, November 2016; (5) WDR, 2016 except summer
6.4 SUMMARY
Although the DBSCAN clustering algorithm is not new, the study makes two major
contributions:
Firstly, clustering multi-density OD matrices based on structural proximity measures
to identify typical daily travel patterns of a large-scale network has not previously
been addressed in the literature.
Secondly, the proposed three-level clustering approach is simple and effective in
identifying the OD clusters. The prior identification of subspaces addresses the
inability of classical DBSCAN to handle multi-density datasets. The identification
of the set of optimum DBSCAN parameters demonstrates that different parameter
combinations can produce homogeneous clusters and that the relationship between the
optimum parameters is nearly linear.
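For readers unfamiliar with DBSCAN, the following minimal sketch shows how the algorithm labels items from a precomputed distance matrix, which is the form in which a structural proximity measure between daily OD matrices would enter the clustering within one subspace. The toy distances, the parameter values, and the function name are assumptions for illustration; the sketch covers only the classical algorithm and not the three-level procedure proposed in this chapter.

```python
def dbscan(dist, eps, min_pts):
    """Classical DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbours = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbours) < min_pts:
            labels[p] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbours if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbours = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is a core point: keep expanding
    return labels


# Hypothetical one-dimensional "days"; in practice dist[i][j] would hold a
# structural distance between the OD matrices of day i and day j.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]

print(dbscan(dist, eps=0.5, min_pts=3))
```

With these toy values the first three and the next three items form two clusters and the last item is labelled as noise. Repeating the call over a grid of eps and min_pts values is one simple way to explore which parameter combinations yield the same clusters.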
The clustering results demonstrated many typical travel patterns for the BCC
region. All three experiments showed that there were seasonal variations in the travel
patterns for weekdays, and that the travel patterns during weekday school holidays and
November 2016 were unique. The experiments based on structural proximity measures
could identify the seasonal variations even among the travel patterns during Saturdays
and Sundays. On the other hand, RMSN failed to identify any unique travel patterns
within subspace-1 because of its inability to capture the subtle structural differences
within those patterns. This highlights the importance of accounting for the structural
information of OD matrices, which has many practical benefits for both long-term
strategic and short-term transport planning applications.
Chapter 7: Conclusion
This chapter contains the conclusions, limitations, and recommendations related
to the research. First, a summary of this thesis is provided in Section 7.1. Second, the
findings of the study and their connection to the research questions raised in Chapter
1 are reflected upon in Section 7.2. Lastly, based on the understanding gained in this
research, new and pertinent questions for future research are discussed in Section 7.3.
7.1 BRIEF SUMMARY
Estimating OD matrices has been a subject of transport modelling research for
more than three decades. Ever since traffic counts began to be treated as indirect
observations of OD flows, “matrix estimation” has been considered an optimisation
problem. Since then, many methods have been proposed and implemented with respect
to solution algorithms, assignment models, rule-based heuristics, objective function
formulations, measurements from alternate data sources, and statistical performance
measures. While most of the methods developed thus far fall under the bi-level
modelling framework, many challenges are yet to be resolved. First, a traffic
count-based bi-level method is an under-determined problem, and to address this most
methods still depend on an outdated target OD matrix to maintain structural
consistency in OD matrix estimation. Second, assignment models remain
challenging due to modelling errors and their inseparable dependency on the OD matrix.
Third, bi-level methods are computationally challenging due to the dimensionality of
the OD matrix and the lower-level user-equilibrium assignment problem. Fourth, most
existing statistical performance measures do not account for the structural information
of OD matrices. Fifth, there is a great need to identify typical travel patterns and their
corresponding typical OD matrices in demand modelling. The last challenge is related
to bridging the gap between the availability of massive amounts of big traffic data and
their direct implementation in transport models, especially tackling the issue of
unknown market penetration rates of trips inferred from advanced data sources.
This research is an attempt to review the literature, understand the state-of-the-art
techniques, and propose methods to address some of these challenges. Specifically,
this study proposes methods to exploit the additional structural knowledge available
from other big data sources, such as Bluetooth, to maintain structural consistency and
address the problem of under-determinacy; develops an alternative methodology to the
existing bi-level framework; develops new statistical performance measures for
the structural comparison of OD matrices; and proposes a methodological approach to
cluster B-OD matrices and identify typical travel patterns based on the structural
proximity measures, using a case study application on real Bluetooth datasets from
the BCC region.
The Brisbane City network is already equipped with several Bluetooth scanners.
This Bluetooth data is a good source of travel-related information in both spatial and
temporal contexts. While the current applications are limited to travel time
estimation, the unexplored potential of trip-related information formed the strong
motivation for the current research. Taking one step beyond the existing
implementation, the current study investigated the potential of Bluetooth data and
proposed new methods for improving the quality of OD matrix estimates using
additional knowledge (the “structure of trips” and/or turning proportions) from
Bluetooth observations. A few analyses were conducted as a part of this research (see
Appendix B) to add more confidence in the structural knowledge of real Bluetooth
observations from the BCC region. However, in the absence of ground truth,
simulation-based experiments are the only way to strengthen the argument that the
“structure” of Bluetooth trips could improve the quality of OD estimates. Although
the current research is based on Bluetooth observations and applied to the BCC region,
the methodology is applicable to data from any similar source that can
provide additional information related to the structure of trips over any other study
network.
Overall, the entire study is based on enhancing the existing research with respect
to the comparison of OD matrices (through structural similarity measures), OD matrix
estimation (through the knowledge of Bluetooth trips/turning proportions), and the
identification of typical travel patterns and typical OD matrices (through a structural
proximity-based clustering method).
7.2 RESEARCH FINDINGS
The study identified major research gaps, which led to the development of four
research questions (Chapter 1) following a comprehensive review of the literature
(Chapter 2). In conjunction with the research questions, the research findings are
discussed as follows:
• The sensitivity analysis results from Chapter 3 demonstrated that GSSI and
NLOD are robust statistical performance measures with sufficient
potential to structurally compare OD matrices, which answered the first
research question (RQ-1).
• The findings of Chapter 4 answered RQ-2, as follows:
o The B-OD method demonstrated that the additional structural
knowledge of Bluetooth OD flows can improve the quality of OD
matrix estimates. The B-OD method is suitable for networks (such
as the BCC region) that have good connectivity of Bluetooth scanners.
Although the B-OD method assumes that the trip ends are exactly
known, the methodology still holds for observations from any
other emerging data source that can provide more confidence about
trip ends than Bluetooth.
o The B-SP method suits situations in which the penetration rate of
Bluetooth trajectories is low. This method demonstrated the
applicability of Bluetooth subpath flows. The quality of the OD matrix
estimates was found to be better than that of the traditional traffic
counts-based approach even for a 2.5% penetration rate of Bluetooth trips.
o Since the core of both methods is based on the structural information of
Bluetooth trips, the need to estimate unknown penetration rates of
Bluetooth trips is relaxed.
• The findings of Chapter 5 answered RQ-3, as follows:
o They demonstrated the ability of the proposed turning-proportion-based
technique to serve as an alternative to assignment-based
models.
o The improvement in the quality of the OD matrix estimates through
additional knowledge of Bluetooth trips strengthened the proposed
single-level formulation. In fact, knowledge about traffic assignment
was implicitly considered in the observed turning proportions and
Bluetooth trips.
• The core of Chapter 6 was to develop a methodological approach to cluster a
multi-density B-OD matrices database and identify typical travel patterns with a real
case study application on the BCC region. This chapter addressed RQ-4. The
major findings of the clustering analysis were:
o The clusters resulting from experiment-1 and experiment-2
demonstrated the ability of the proposed statistical metrics, GSSI and
NLOD, as potential structural proximity measures for the DBSCAN
clustering algorithm.
o The clusters from experiment-3, which is based on RMSN, failed to
distinguish travel patterns during the weekends and public holidays.
This is because most traditional metrics do not account for the structure
of OD matrices in their mathematical formulation, and so they
could not identify the subtle structural differences in the
aforementioned travel patterns.
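The Chapter 4 finding that structural information relaxes the need to estimate penetration rates can be illustrated with a minimal sketch. Under the simplifying assumption that every OD pair is sampled at the same, unknown penetration rate, the share of trips carried by each OD pair is identical in the sampled and the true matrix. The matrix values and the helper name `trip_structure` are hypothetical.

```python
import math


def trip_structure(od):
    """Share of all observed trips carried by each OD pair."""
    total = sum(sum(row) for row in od)
    return [[flow / total for flow in row] for row in od]


# Hypothetical true demand between three zones.
true_od = [[0, 400, 200], [300, 0, 100], [150, 50, 0]]
reference = trip_structure(true_od)

for rate in (0.025, 0.10, 0.30):
    # Assumption: every OD pair is sampled at the same penetration rate.
    bluetooth_od = [[rate * flow for flow in row] for row in true_od]
    sampled = trip_structure(bluetooth_od)
    same = all(
        math.isclose(a, b, abs_tol=1e-12)
        for row_a, row_b in zip(reference, sampled)
        for a, b in zip(row_a, row_b)
    )
    print(rate, same)
```

When penetration rates vary between OD pairs the shares are no longer preserved exactly, which is why the assumption is stated explicitly here.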
7.3 RECOMMENDATIONS FOR FUTURE RESEARCH
This section discusses future research directions and some pertinent
questions:
• Although introducing randomness in Bluetooth flows demonstrated
improvement in the quality of OD flow estimates, to achieve more realistic
modelling, the experiments could include errors and inconsistencies in the
observed traffic counts and turning proportions.
• In this study, Bluetooth subpaths were created by trimming the first and last
IDs of BMS from the complete sequence of trips. However, as shown in
Figure 1.12, there could be mis-detections within the Bluetooth trajectories.
Accounting for these mis-detections before incorporating them into the
optimisation model would be even more realistic.
• Future studies could test state-of-the-art solution algorithms,
such as versions of SPSA (Tympakianaki, et al., 2018) or metamodels
(Osorio, 2019), and compare them with other solution algorithms,
such as a genetic algorithm (Kim, et al., 2001), over a benchmark
network. More improvements could be made with respect to the parameters
of gradient-based algorithms. For instance, in the present study, the prior
step-size was chosen through trial and error. However, the sensitivity of OD
flows to different values of step-sizes and the rate of change of step-sizes
needs to be investigated. The step-sizes could also be sensitive to the OD
flow values; that is, higher and lower flow values. Convergence criteria
could also be tested in future investigations.
• The current research focussed only on utilising the knowledge of Bluetooth
trips in the objective function formulation. As vehicle trajectories can be
inferred from Bluetooth observations, they could be used to calibrate the
assignment model in future research.
• This study can be extended to the dynamic OD space. Current state-of-the-art
techniques to estimate better quality time-dependent OD matrices use
quasi-dynamic approaches. Thus, the methods proposed in this research could
incorporate a quasi-dynamic assumption with respect to the distribution of
origin flows and estimate better time-dependent offline OD matrices.
Quasi-dynamic Kalman filter algorithms could then be investigated with additional
measurements from Bluetooth observed flows for real-time estimation of
OD flows.
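The step-size sensitivity raised in the list above can be illustrated with a minimal sketch: fixed-step gradient descent on a least-squares objective between modelled and observed link counts. The toy network, the fixed link-use proportions, the starting flows, and the two step-sizes are assumptions for illustration and do not reproduce the objective function or the solution algorithm used in this thesis.

```python
def objective(proportions, counts, flows):
    """Sum of squared deviations between modelled and observed link counts."""
    return sum(
        (sum(p * x for p, x in zip(row, flows)) - c) ** 2
        for row, c in zip(proportions, counts)
    )


def gradient_descent(proportions, counts, start, step, iterations=200):
    """Fixed-step gradient descent with OD flows kept non-negative."""
    flows = list(start)
    for _ in range(iterations):
        residuals = [
            sum(p * x for p, x in zip(row, flows)) - c
            for row, c in zip(proportions, counts)
        ]
        gradient = [
            2 * sum(row[k] * r for row, r in zip(proportions, residuals))
            for k in range(len(flows))
        ]
        flows = [max(0.0, x - step * g) for x, g in zip(flows, gradient)]
    return flows


# Hypothetical network: three counted links, two OD pairs.
proportions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [100.0, 50.0, 150.0]   # consistent with OD flows [100, 50]
start = [80.0, 80.0]

for step in (0.001, 0.1):
    flows = gradient_descent(proportions, counts, start, step)
    print(step, [round(x, 2) for x in flows],
          round(objective(proportions, counts, flows), 4))
```

With the same number of iterations, the smaller step-size is still far from the flows that reproduce the counts, while the larger one has converged. A systematic version of this comparison over many step-sizes is one way to carry out the sensitivity study suggested above.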
Bibliography
Abedi, N., Bhaskar, A., & Chung, E. (2013). Bluetooth and Wi-Fi MAC address based crowd data collection and monitoring: benefits, challenges and enhancement.
Abedi, N., Bhaskar, A., & Chung, E. (2014). Tracking spatio-temporal movement of human in terms of space utilization using Media-Access-Control address data. Applied Geography, 51, 72-81.
Abedi, N., Bhaskar, A., Chung, E., & Miska, M. (2015). Assessment of antenna characteristic effects on pedestrian and cyclists travel-time estimation based on Bluetooth and WiFi MAC addresses. Transportation Research Part C: Emerging Technologies, 60, 124-141.
ABS. (2017). More than two in three drive to work, Census reveals. Retrieved from http://www.abs.gov.au/ausstats/abs@.nsf/mediareleasesbyReleaseDate/7DD5DC715B608612CA2581BF001F8404?OpenDocument
ABS. (2018). Census of Population and Housing: Community Profile, DataPack and TableBuilder Templates, Australia, 2016. Retrieved from http://www.abs.gov.au/AUSSTATS/abs@.nsf/Latestproducts/2079.0Main%20Features42016?opendocument&tabname=Summary&prodno=2079.0&issue=2016&num=&view=
Ahas, R., Silm, S., Järv, O., Saluveer, E., & Tiru, M. (2010). Using mobile positioning data to model locations meaningful to users of mobile phones. Journal of Urban Technology, 17, 3-27.
Aimsun. (2019). Aimsun Next 8.4 User's Manual. Aimsun, Barcelona, Spain. Retrieved from https://www.aimsun.com/
Aimsunlive. (2017). Gold Coast: Predictive Solutions Trial. Retrieved from https://www.aimsun.com/gold-coast-predictive-solutions-trial/
Alexander, L., Jiang, S., Murga, M., & González, M. C. (2015). Origin–destination trips by purpose and time of day inferred from mobile phone data. Transportation Research Part C: Emerging Technologies, 58, 240-250.
Alibabai, H., & Mahmassani, H. (2008). Dynamic origin-destination demand estimation using turning movement counts. Transportation Research Record: Journal of the Transportation Research Board, (2085), 39-48.
Allahviranloo, M., & Recker, W. (2015). Mining activity pattern trajectories and allocating activities in the network. In Transportation (pp. 1-19).
Altman, N. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3), 175-185.
Andrienko, G., Andrienko, N., Fuchs, G., & Wood, J. (2017). Revealing patterns and trends of mass mobility through spatial and temporal abstraction of origin-destination movement data. IEEE Transactions on Visualization & Computer Graphics, (1), 1-1.
Antoniou, C., Barceló, J., Breen, M., Bullejos, M., Casas, J., Cipriani, E., . . . Marzano, V. (2016). Towards a generic benchmarking platform for origin–destination flows estimation/updating algorithms: Design, demonstration and validation. Transportation Research Part C: Emerging Technologies, 66, 79-98.
Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. (2004). Incorporating automated vehicle identification data into origin-destination estimation. Transportation Research Record: Journal of the Transportation Research Board, (1882), 37-44.
Antoniou, C., Ben-Akiva, M., & Koutsopoulos, H. N. (2006). Dynamic traffic demand prediction using conventional and emerging data sources. In IEE Proceedings-Intelligent Transport Systems (Vol. 153, pp. 97-104): IET.
Antoniou, C., Ciuffo, B., Montero, L., Casas, J., Barceló, J., Cipriani, E., . . . Bullejos, M. (2014). A framework for the benchmarking of OD estimation and prediction algorithms. In 93rd Transportation Research Board Annual Meeting.
Asakura, Y., Hato, E., & Kashiwadani, M. (2000). Origin-destination matrices estimation model using automatic vehicle identification data and its application to the Han-Shin expressway network. Transportation, 27(4), 419-438.
ASGS. (2017). Australian Statistical Geography Standard (ASGS). Retrieved from http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Australian+Statistical+Geography+Standard+(ASGS)
Ashok, K. (1996). Estimation and prediction of time-dependent origin-destination flows. In Doctoral Dissertation.
Ashok, K., & Ben-Akiva, M. E. (2000). Alternative approaches for real-time estimation and prediction of time-dependent origin–destination flows. Transportation Science, 34(1), 21-36.
Ashok, K., & Ben-Akiva, M. E. (2002). Estimation and prediction of time-dependent origin-destination flows with a stochastic mapping to path flows and link flows. Transportation Science, 36(2), 184-198.
ATAP. (2016a). Australian Transport Assessment and Planning Guidelines. Retrieved from https://atap.gov.au/tools-techniques/travel-demand-modelling/files/T1_Travel_Demand_Modelling.pdf
ATAP. (2016b). Overview of transport modelling. Retrieved from https://atap.gov.au/tools-techniques/travel-demand-modelling/2-overview.aspx
Australian Transport Assessment and Planning (ATAP). (2017). Retrieved from https://atap.gov.au/tools-techniques/travel-demand-modelling/1-introduction.aspx
Balakrishna, R., Ben-Akiva, M., & Koutsopoulos, H. (2007). Offline calibration of dynamic traffic assignment: simultaneous demand-and-supply estimation. Transportation Research Record: Journal of the Transportation Research Board, (2003), 50-58.
Bar-Gera, H., Mirchandani, P. B., & Wu, F. (2006). Evaluating the assumption of independent turning probabilities. Transportation Research Part B: Methodological, 40(10), 903-916.
Barceló Bugeda, J., Montero Mercadé, L., Marqués, L., & Carmona, C. (2010). A Kalman-filter approach for dynamic OD estimation in corridors based on bluetooth and Wi-Fi data collection. In 12th World Conference on Transportation Research WCTR, 2010.
Barceló, J., Gilliéron, F., Linares, M., Serch, O., & Montero, L. (2012). Exploring link covering and node covering formulations of detection layout problem. Transportation Research Record: Journal of the Transportation Research Board, (2308), 17-26.
Barceló, J., Montero, L., Bullejos, M., Linares, M., & Serch, O. (2013). Robustness and Computational Efficiency of Kalman Filter Estimator of Time-Dependent Origin-Destination Matrices: Exploiting Traffic Measurements from Information and Communications Technologies. Transportation Research Record: Journal of the Transportation Research Board, (2344), 31-39.
Barceló, J., Montero, L., Bullejos, M., Serch, O., & Carmona, C. (2013). A Kalman filter approach for exploiting bluetooth traffic data when estimating time-dependent OD matrices. Journal of Intelligent Transportation Systems, 17(2), 123-141.
Battiti, R. (1989). Accelerated backpropagation learning: Two optimization methods. Complex Systems, 3(4), 331-342.
Bauer, D., Richter, G., Asamer, J., Heilmann, B., Lenz, G., & Kölbl, R. (2018). Quasi-Dynamic Estimation of OD Flows From Traffic Counts Without Prior OD Matrix. IEEE Transactions on Intelligent Transportation Systems, 19(6), 2025-2034.
Behara, K. N., Bhaskar, A., & Chung, E. (2018, 7-11 January). Classification of typical Bluetooth OD matrices based on structural similarity of travel patterns-Case study on Brisbane city. In Transportation Research Board 97th Annual Meeting.
Bell, M. G. (1983). The estimation of an origin-destination matrix from traffic counts. Transportation Science, 17(2), 198-217.
Bell, M. G. (1991). The estimation of origin-destination matrices by constrained generalised least squares. Transportation Research Part B: Methodological, 25(1), 13-22.
Ben-Akiva, M. E., Gao, S., Wei, Z., & Wen, Y. (2012). A dynamic traffic assignment model for highly congested urban networks. Transportation Research Part C: Emerging Technologies, 24, 62-82.
Bera, S., & Rao, K. (2011). Estimation of origin-destination matrix from traffic counts: the state of the art. European Transport - Trasporti Europei, 49, 2-23.
Bhaskar, A., & Chung, E. (2013). Fundamental understanding on the use of Bluetooth scanner as a complementary transport data. Transportation Research Part C: Emerging Technologies, 37, 42-72.
Bhaskar, A., Qu, M., & Chung, E. (2015). Bluetooth vehicle trajectory by fusing bluetooth and loops: motorway travel time statistics. IEEE Transactions on Intelligent Transportation Systems, 16(1), 113-122.
Bhaskar, A., Qu, M., Nantes, A., Miska, M., & Chung, E. (2015). Is bus overrepresented in Bluetooth MAC scanner data? Is MAC-ID really unique? International Journal of Intelligent Transportation Systems Research, 13(2), 119-130.
Bierlaire, M. (2002). The total demand scale: a new measure of quality for static and dynamic origin–destination trip tables. Transportation Research Part B: Methodological, 36, 837-850.
Bierlaire, M., & Crittin, F. (2004). An efficient algorithm for real-time estimation and prediction of dynamic OD tables. Operations Research, 52(1), 116-127.
Bierlaire, M., & Toint, P. L. (1995). Meuse: An origin-destination matrix estimator that exploits structure. Transportation Research Part B: Methodological, 29(1), 47-60.
Blogg, M., Semler, C., Hingorani, M., & Troutbeck, R. (2010). Travel time and origin-destination data collection using Bluetooth MAC address readers. In Australasian Transport Research Forum (pp. 1-15).
Bluetooth data from Brisbane City Council. (2016).
Brooks, A. C., Zhao, X., & Pappas, T. N. (2008). Structural similarity quality metrics in a coding context: Exploring the space of realistic distortions. IEEE Transactions on Image Processing, 17(8), 1261-1273.
BSTM (Cartographer). (2015). Traffic Analysis Zonal network on Google Earth.
BSTM. (2016). Brisbane Strategic Transport Demand Model.
Bullejos, M., Barceló Bugeda, J., & Montero Mercadé, L. (2014). A DUE based bilevel optimization approach for the estimation of time sliced OD matrices. In Proceedings of the International Symposia of Transport Simulation (ISTS) and the International Workshop on Traffic Data Collection and its Standardisation (IWTDCS), ISTS'14 and IWTCDS'14.
Calabrese, F., Di Lorenzo, G., Liu, L., & Ratti, C. (2011). Estimating origin-destination flows using mobile phone location data. IEEE Pervasive Computing, 10(4), 36-44.
Cantelmo, G., Cipriani, E., Gemma, A., & Nigro, M. (2014). An adaptive bi-level gradient procedure for the estimation of dynamic traffic demand. IEEE Transactions on Intelligent Transportation Systems, 15(3), 1348-1361.
Carpenter, C., Fowler, M., & Adler, T. (2012). Generating route-specific origin-destination tables using Bluetooth technology. Transportation Research Record: Journal of the Transportation Research Board, (2308), 96-102.
Cascetta, E. (1984). Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator. Transportation Research Part B: Methodological, 18(4-5), 289-299.
Cascetta, E., Inaudi, D., & Marquis, G. (1993). Dynamic estimators of origin-destination matrices using traffic counts. Transportation Science, 27(4), 363-373.
Cascetta, E., & Nguyen, S. (1988). A unified framework for estimating or updating origin/destination matrices from traffic counts. Transportation Research Part B: Methodological, 22(6), 437-455.
Cascetta, E., Papola, A., Marzano, V., Simonelli, F., & Vitiello, I. (2013). Quasi-dynamic estimation of o–d flows from traffic counts: Formulation, statistical validation and performance analysis on real data. Transportation Research Part B: Methodological, 55, 171-187.
Cascetta, E., & Postorino, M. N. (2001). Fixed point approaches to the estimation of O/D matrices using traffic counts on congested networks. Transportation Science, 35(2), 134-147.
Chang, G.-L., & Wu, J. (1994). Recursive estimation of time-varying origin-destination flows from traffic counts in freeway corridors. Transportation Research Part B: Methodological, 28(2), 141-160.
Cheung, W., Wong, S., & Tong, C. (2006). Estimation of a time-dependent origin-destination matrix for congested highway networks. Journal of Advanced Transportation, 40(1), 95-117.
Chitturi, M. V., Shaw, J. W., Campbell IV, J. R., & Noyce, D. A. (2014). Validation of Origin–Destination Data from Bluetooth Reidentification and Aerial Observation. Transportation Research Record, 2430(1), 116-123.
Chung, E. (2003). Classification of traffic pattern. In Proc. of the 11th World Congress on ITS (pp. 687-694).
Chung, E. (2016). Use of Bluetooth and Wifi for Measuring Vehicles and People Movements, PATREC. Retrieved from http://www.patrec.uwa.edu.au/announcements/use-of-bluetooth-and-wifi-for-measuring-vehicles-and-people-movements
Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2010). Investigating the efficiency of a gradient approximation approach for the solution of dynamic demand estimation problems. Chapters.
Cipriani, E., Florian, M., Mahut, M., & Nigro, M. (2011). A gradient approximation approach for adjusting temporal origin–destination matrices. Transportation Research Part C: Emerging Technologies, 19(2), 270-282.
Ciuffo, B., & Punzo, V. (2010). Verification of traffic micro-simulation model calibration procedures: Analysis of goodness-of-fit measures. In Proceeding of the 89th Annual Meeting of the Transportation Research Record, Washington, DC.
Cools, M., Moons, E., & Wets, G. (2010). Assessing the quality of origin-destination matrices derived from activity travel surveys: Results from a Monte Carlo experiment. Transportation Research Record: Journal of the Transportation Research Board, (2183), 49-59.
Cooper, R. (1977). Abstract Structure and the Indian Rāga System. In Ethnomusicology (pp. 1-32).
Crawford, F., Watling, D. P., & Connors, R. D. (2018). Identifying road user classes based on repeated trip behaviour using Bluetooth data. Transportation research part A: policy and practice, 113, 55-74. Retrieved from
Cremer, M., & Keller, H. (1981). Dynamic identification of flows from traffic counts
at complex intersections. In Proc., 8th International Symposium on Transportation and Traffic Theory (pp. 121-142): University of Toronto Press, Canada.
Cremer, M., & Keller, H. (1987). A new class of dynamic methods for the
identification of origin-destination flows. Transportation Research Part B: Methodological, 21(2), 117-132. Retrieved from
Dandy, G., Daniell, T., Foley, B., & Warner, R. (2017). Planning and design of
engineering systems: CRC Press.
de Dios Ortuzar, J., & Willumsen, L. G. (2011). Modelling transport: John Wiley &
Sons.
De Haas, M. (2016). Travel pattern transitions: A study on the effects of life events on
changes in travel patterns.
Dictionary. (Ed.) (2018) Cambridge online dictionary. Cambridge, UK.
Dixit, V., Gardner, L. M., & Waller, S. T. (2013). Strategic User Equilibrium
Assignment Under Trip Variability. In Transportation Research Board 92nd Annual Meeting (Vol. 9).
Dixon, M. P. (2000). Incorporation of automatic vehicle identification data into the
synthetic OD estimation process. Ph.D. thesis, Texas A&M University, College Station, TX.
Dixon, M. P., & Rilett, L. (2002). Real‐Time OD Estimation Using Automatic Vehicle
Identification and Traffic Count Data. Computer‐Aided Civil and Infrastructure Engineering, 17(1), 7-21. Retrieved from
Djukic, T. (2014). Dynamic OD demand estimation and prediction for dynamic traffic
management. In PhD Thesis.
Djukic, T., Barceló Bugeda, J., Bullejos, M., Montero Mercadé, L., Cipriani, E., van
Lint, H., & Hoogendoorn, S. (2015). Advanced traffic data for dynamic od demand estimation: The state of the art and benchmark study. In TRB 94th Annual Meeting Compendium of Papers (pp. 1-16).
Djukic, T., Hoogendoorn, S., & Van Lint, H. (2013). Reliability assessment of dynamic
OD estimation methods based on structural similarity index.
Djukic, T., Van Lint, J., & Hoogendoorn, S. (2012). Application of principal
component analysis to predict dynamic origin-destination matrices.
Transportation Research Record: Journal of the Transportation Research Board(2283), 81-89. Retrieved from
Dong, H., Wu, M., Ding, X., Chu, L., Jia, L., Qin, Y., & Zhou, X. (2015). Traffic zone
division based on big data from mobile phone base stations. In Transportation Research Part C: Emerging Technologies (Vol. 58, pp. 278-291).
Elbatta, M. T., & Ashour, W. M. (2013). A dynamic method for discovering density
varied clusters. Int. Journal of Signal Processing, Image Processing, and Pattern Recognition, 6(1), 123-134. Retrieved from
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for
discovering clusters in large spatial databases with noise. In Kdd (Vol. 96, pp. 226-231).
Fisk, C. (1989). Trip matrix estimation from link traffic counts: The congested network
case. Transportation Research Part B: Methodological, 23(5), 331-336. Retrieved from
Fisk, C. S., & Boyce, D. E. (1983). A note on trip matrix estimation from link traffic
count data. Transportation Research Part B: Methodological, 17(3), 245-250. Retrieved from
Florian, M., & Chen, Y. (1995). A Coordinate Descent Method for the Bi‐level O–D
Matrix Adjustment Problem. International Transactions in Operational Research, 2(2), 165-179. Retrieved from
Frederix, R., Viti, F., & Tampère, C. M. (2011). A hierarchical approach for dynamic
origin-destination matrix estimation on large-scale congested networks. In 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1543-1548): IEEE.
Frederix, R., Viti, F., & Tampère, C. M. (2013). Dynamic origin–destination
estimation in congested networks: theoretical findings and implications in practice. Transportmetrica A: Transport Science, 9(6), 494-513. Retrieved from
Friedrich, M., Immisch, K., Jehlicka, P., Otterstätter, T., & Schlaich, J. (2010).
Generating origin-destination matrices from mobile phone trajectories. Transportation Research Record: Journal of the Transportation Research Board(2196), 93-101. Retrieved from
Gan, L., Yang, H., & Wong, S. C. (2005). Traffic counting location and error bound
in origin-destination matrix estimation problems. Journal of Transportation Engineering, 131(7), 524-534. Retrieved from
Gazis, D. C., & Knapp, C. H. (1971). On-line estimation of traffic densities from time-
series of flow and speed data. Transportation Science, 5(3), 283-301. Retrieved from
Gong, L., Liu, X., Wu, L., & Liu, Y. (2016). Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartography and Geographic Information Science, 43(2), 103-114. Retrieved from
Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A.-L. (2008). Understanding individual
human mobility patterns. Nature, 453(7196), 779.
Guo, D., Zhu, X., Jin, H., Gao, P., & Andris, C. (2012). Discovering spatial patterns
in origin‐destination mobility data. Transactions in GIS, 16(3), 411-429. Retrieved from
Gur, Y. J. (1980a). Estimation of an origin-destination trip table based on observed
link volumes and turning movements. Executive summary.
Gur, Y. J. (1980b). Estimation of an origin-destination trip table
based on observed link volumes and turning movements. Executive summary.
Hai, Y., Akiyama, T., & Sasaki, T. (1998). Estimation of time-varying origin-
destination flows from traffic counts: A neural network approach. Mathematical and computer modelling, 27(9), 323-334. Retrieved from
Hazelton, M. L. (2000). Estimation of origin–destination matrices from link flows on
uncongested networks. Transportation Research Part B: Methodological, 34(7), 549-566. Retrieved from
Heeringa, W. J. (2004). Measuring dialect pronunciation differences using
Levenshtein distance. Citeseer.
Hensher, D. A. (1976). The structure of journeys and nature of travel patterns. In
Environment and Planning A (Vol. 8, pp. 655-672).
Hollander, Y., & Liu, R. (2008). The principles of calibrating traffic microsimulation
models. Transportation, 35(3), 347-362.
Hu, S. (1996). An adaptive Kalman filtering algorithm for the dynamic estimation and
prediction of freeway origin-destination matrices (Order No. 9725558). Available from ProQuest Dissertations & Theses Global. (304264559).
Huang, T.-q., Yu, Y.-q., Li, K., & Zeng, W.-f. (2009). Reckon the parameter of
DBSCAN for multi-density data sets with constraints. In Artificial Intelligence and Computational Intelligence, 2009. AICI'09. International Conference on (Vol. 4, pp. 375-379): IEEE.
Iqbal, M. S., Choudhury, C. F., Wang, P., & González, M. C. (2014). Development of
origin–destination matrices using mobile phone call data. In Transportation Research Part C: Emerging Technologies (Vol. 40, pp. 63-74).
Jiang, S., Ferreira, J., & González, M. C. (2017). Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore. In IEEE Transactions on Big Data (Vol. 3, pp. 208-219).
Jornsten, K., & Nguyen, S. (1979). On the estimation of a trip matrix from network
data. Publication No. 153, Centre de Recherche sur les Transports, Université de Montréal, Montréal.
Jörnsten, K., & Wallace, S. W. (1993). Overcoming the (apparent) problem of
inconsistency in origin-destination matrix estimations. Transportation science, 27(4), 374-380. Retrieved from
Kang, Y. (1999). Estimation and prediction of dynamic origin-destination (OD)
demand and system consistency control for real-time dynamic traffic assignment operation.
Kantorovich, L. V. (1942). On the translocation of masses. In Dokl. Akad. Nauk. USSR
(NS) (Vol. 37, pp. 199-201).
Khoei, A. M., Bhaskar, A., & Chung, E. (2013). Travel time prediction on signalised
urban arterials by applying SARIMA modelling on Bluetooth data. In 36th Australasian transport research forum (ATRF) 2013.
Kieu, L.-M., Bhaskar, A., & Chung, E. (2015). A modified Density-Based Scanning
Algorithm with Noise for spatial travel pattern analysis from Smart Card AFC data. Transportation Research Part C: Emerging Technologies, 58, 193-207. Retrieved from
Kieu, L. M., Bhaskar, A., & Chung, E. (2012). Bus and car travel time on urban
networks: integrating bluetooth and bus vehicle identification data. Retrieved from
Kim, H., Baek, S., & Lim, Y. (2001). Origin-destination matrices estimated with a
genetic algorithm from link traffic counts. Transportation Research Record: Journal of the Transportation Research Board(1771), 156-163. Retrieved from
Kim, S.-J., Kim, W., & Rilett, L. (2005). Calibration of microsimulation models using
nonparametric statistical techniques. Transportation Research Record: Journal of the Transportation Research Board(1935), 111-119. Retrieved from
Kroeber, A. L. (1943). Structure, function and pattern in biology and anthropology.
The Scientific Monthly, 56(2), 105-113.
Kwon, J., & Varaiya, P. (2005). Real-time estimation of origin-destination matrices
with partial trajectories from electronic toll collection tag data. Transportation Research Record: Journal of the Transportation Research Board(1923), 119-126. Retrieved from
Laharotte, P.-A., Billot, R., Come, E., Oukhellou, L., Nantes, A., & El Faouzi, N.-E.
(2015). Spatiotemporal analysis of Bluetooth data: Application to a large urban
network. IEEE Transactions on Intelligent Transportation Systems, 16(3), 1439-1448. Retrieved from
Lee, J.-G., Han, J., Li, X., & Gonzalez, H. (2008). TraClass: trajectory classification
using hierarchical region-based and trajectory-based clustering. Proceedings of the VLDB Endowment, 1(1), 1081-1094. Retrieved from
Lee, M., & Sohn, K. (2015). Inferring the route-use patterns of metro passengers based
only on travel-time data within a Bayesian framework using a reversible-jump Markov chain Monte Carlo (MCMC) simulation. Transportation Research Part B: Methodological, 81, 1-17. Retrieved from
Lee, M. S., & McNally, M. G. (2003). On the structure of weekly activity/travel
patterns. Transportation Research Part A: Policy and Practice, 37(10), 823-839. Retrieved from
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and
reversals. In Soviet physics doklady (Vol. 10, pp. 707-710).
Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for
stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 661-670): ACM.
Lo, H.-P., & Chan, C.-P. (2003). Simultaneous estimation of an origin–destination
matrix and link choice proportions using traffic counts. Transportation Research Part A: Policy and Practice, 37(9), 771-788. Retrieved from
Lu, L., Xu, Y., Antoniou, C., & Ben-Akiva, M. (2015). An enhanced SPSA algorithm
for the calibration of Dynamic Traffic Assignment models. Transportation Research Part C: Emerging Technologies, 51, 149-166. Retrieved from
Lu, Z., Rao, W., Wu, Y. J., Guo, L., & Xia, J. (2015). A Kalman filter approach to
dynamic OD flow estimation for urban road networks using multi‐sensor data. Journal of Advanced Transportation, 49(2), 210-227. Retrieved from
Lundgren, J. T., & Peterson, A. (2008a). A heuristic for the bilevel origin–destination-
matrix estimation problem. Transportation Research Part B: Methodological, 42(4), 339-354. Retrieved from
Lundgren, J. T., & Peterson, A. (2008b). A heuristic for the bilevel origin–destination-
matrix estimation problem. In Transportation Research Part B: Methodological (Vol. 42, pp. 339-354).
Ma, W., & Qian, Z. S. (2018). Statistical inference of probabilistic origin-destination
demand using day-to-day traffic data. In Transportation Research Part C: Emerging Technologies (Vol. 88, pp. 227-256).
Maher, M. (1983). Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6), 435-447. Retrieved from
Maher, M. (1998). Algorithms for logit-based stochastic user equilibrium assignment.
Transportation Research Part B: Methodological, 32(8), 539-549. Retrieved from
Maher, M. J., Zhang, X., & Van Vliet, D. (2001). A bi-level programming approach
for trip matrix estimation and traffic control problems with stochastic user equilibrium link flows. Transportation Research Part B: Methodological, 35(1), 23-40. Retrieved from
Manual, T. A. (1964). Bureau of public roads. In US Department of Commerce.
Martin, W. A., & McGuckin, N. A. (1998). Travel estimation techniques for urban
planning (Vol. 365): National Academy Press Washington, DC.
Marzano, V., Papola, A., Simonelli, F., & Papageorgiou, M. (2018). A Kalman Filter
for Quasi-Dynamic od Flow Estimation/Updating. IEEE Transactions on Intelligent Transportation Systems(99), 1-9. Retrieved from
Masip, D., Djukic, T., Breen, M., & Casas, J. (2018). Efficient OD Matrix Estimation
Based on Metamodel for Nonlinear Assignment Function. Paper presented at Australasian Transport Research Forum 2018 Proceedings, Darwin, Australia.
McNally, M. G. (2008). The four step model. Center for Activity Systems Analysis.
Michau, G. (2016). Link dependent origin-destination matrix estimation: nonsmooth
convex optimisation with Bluetooth-inferred trajectories. Université de Lyon.
Michau, G., Nantes, A., Bhaskar, A., Chung, E., Abry, P., & Borgnat, P. (2017).
Bluetooth data in an urban context: Retrieving vehicle trajectories. IEEE Transactions on Intelligent Transportation Systems, 18(9), 2377-2386. Retrieved from
Michau, G., Nantes, A., & Chung, E. (2013). Towards the retrieval of accurate OD
matrices from Bluetooth data: lessons learned from 2 years of data. Retrieved from
Michau, G., Nantes, A., Chung, E., Abry, P., & Borgnat, P. (2014, 17-18 February
2014). Retrieving trip information from a discrete detectors network: The case of Brisbane Bluetooth detectors. In 32nd Conference of Australian Institutes of Transport Research (CAITR 2014).
Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung,
E. (2016). A Primal-Dual Algorithm for Link Dependent Origin Destination Matrix Estimation. arXiv preprint arXiv:1604.00391. Retrieved from
Michau, G., Pustelnik, N., Borgnat, P., Abry, P., Nantes, A., Bhaskar, A., & Chung, E. (2017). A primal-dual algorithm for link dependent origin destination matrix estimation. IEEE Transactions on Signal and Information Processing over Networks, 3(1), 104-113. Retrieved from
Mishalani, R. G., Coifman, B., & Gopalakrishna, D. (2002). Evaluating Real-Time
Origin-Destination Flow Estimation Using Remote Sensing Based Surveillance Data. In Proceeding of the 7th International Conference on the Applications of Advanced Technology in Transportation, ASCE, Cambridge, MA.
Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Histoire de
l'Académie Royale des Sciences de Paris, 177, 666-704.
Nanda, D. (1997). A Method to Enhance the Performance of Synthetic Origin-
Destination (OD) Trip Table Estimation Models. In Masters Thesis.
Naoki, M. (2013). Geographic Boundaries of Population Census of Japan. Retrieved
from http://ggim.un.org/meetings/2013-ISGI-NY/documents/ESA_STAT_AC.279_P20_Geographic%20Boundaries%20of%20Population%20Census%20of%20Japan02.pdf
Naveh, K. S., & Kim, J. (2018). Urban Trajectory Analytics: Day-of-Week Movement
Pattern Mining Using Tensor Factorization. IEEE Transactions on Intelligent Transportation Systems. Retrieved from
Nguyen, S. (1976). A unified approach to equilibrium methods for traffic assignment.
In Traffic equilibrium methods (pp. 148-182): Springer.
Nguyen, S. (1977). Estimating an OD Matrix from Network Data: a Network
Equilibrium Approach. Montréal: Université de Montréal, Centre de recherche sur les transports. Retrieved from
NPTEL. (2009). Data collection. I. Madras (Ed.) Retrieved from
https://nptel.ac.in/courses/105101087/06-Ltexhtml/p8/p.html
Okutani, I., & Stephanedes, Y. J. (1984). Dynamic prediction of traffic volume through
Kalman filtering theory. Transportation Research Part B: Methodological, 18(1), 1-11. Retrieved from
Oliveira-Neto, F. M., Han, L. D., & Jeong, M. K. (2012). Online license plate matching
procedures using license-plate recognition machines and new weighted edit distance. Transportation research part C: emerging technologies, 21(1), 306-320. Retrieved from
Osorio, C. (2017). High-dimensional offline OD calibration for stochastic traffic
simulators of large-scale urban networks. In Technical Report: Massachusetts Institute of Technology.
Osorio, C. (2019). Dynamic origin-destination matrix calibration for large-scale network simulators. In Transportation Research Part C: Emerging Technologies (Vol. 98, pp. 186-206).
Oxford. (Ed.) (2018) English Oxford living Dictionaries.
Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional
data: a review. Acm Sigkdd Explorations Newsletter, 6(1), 90-105. Retrieved from
Patriksson, M. (2015). The traffic assignment problem: models and methods: Courier
Dover Publications.
Perera, K., Bhattacharya, T., Kulik, L., & Bailey, J. (2015). Trajectory inference for
mobile devices using connected cell towers. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 23): ACM.
Pollard, T., Taylor, N., van Vuren, T., & MacDonald, M. (2013). Comparing the
Quality of OD Matrices in Time and Between Data Sources. In Proceedings of the European Transport Conference.
Pool, B. (2014). Brisbane Strategic Transport Model-Multi-Modal (BSTM-MM):
model improvement program. In Australian Institute of Traffic Planning and Management (AITPM) National Conference, 2014, Adelaide, South Australia, Australia.
Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in
real-time traffic flow data. Transportation research record, 26-34. Retrieved from
Respati, W. S., Bhaskar, A., Zheng, Z., & Chung, E. (2017). Systematic Identification
of Peak Traffic Period. Paper presented at Australasian Transport Research Forum 2017 Proceedings, Auckland, New Zealand.
RNA. (2016). RNA Annual Report. Retrieved from
https://www.rna.org.au/media/881637/2016%20rna%20annual%20report.pdf
Robillard, P. (1975). Estimating the OD matrix from observed link volumes.
Transportation Research, 9(2), 123-128.
Ros-Roca, X., Montero, L., Schneck, A., & Barceló, J. (2018). Investigating the
performance of SPSA in simulation-optimization approaches to transportation problems. In Transportation research procedia (Vol. 34, pp. 83-90).
Ruiz de Villa, A., Casas, J., & Breen, M. (2014). OD matrix structural similarity:
Wasserstein metric. In Transportation Research Board 93rd Annual Meeting.
SEQTS. (2010). South-East Queensland Travel Survey 2009. In Queensland
Transport and Main Roads.
Shafiei, M., Nazemi, M., & Seyedabrishami, S. (2015). Estimating time-dependent
origin–destination demand from traffic counts: extended gradient method. Transportation Letters, 7(4), 210-218. Retrieved from
Shafiei, S., Gu, Z., & Saberi, M. (2018). Calibration and validation of a simulation-
based dynamic traffic assignment model for a large-scale congested network. Simulation Modelling Practice and Theory, 86, 169-186. Retrieved from
Spall, J. C. (1992). Multivariate stochastic approximation using a simultaneous
perturbation gradient approximation. IEEE transactions on automatic control, 37(3), 332-341. Retrieved from
Spiess, H. (1987). A maximum likelihood model for estimating origin-destination
matrices. Transportation Research Part B: Methodological, 21(5), 395-412. Retrieved from
Spiess, H. (1990). A gradient approach for the OD matrix adjustment problem.
CENTRE DE RECHERCHE SUR LES TRANSPORTS PUBLICATION, 1(693), 2. Retrieved from
Stathopoulos, A., & Tsekeris, T. (2003). Framework for analysing reliability and
information degradation of demand matrices in extended transport networks. Transport Reviews, 23(1), 89-103. Retrieved from
Stathopoulos, A., & Tsekeris, T. (2005). Methodology for Validating Dynamic
Origin–Destination Matrix Estimation Models with Implications for Advanced Traveler Information Systems. Transportation Planning and Technology, 28(2), 93-112. Retrieved from
Steinbach, M., Ertöz, L., & Kumar, V. (2004). The challenges of clustering high
dimensional data. In New directions in statistical physics (pp. 273-309): Springer.
Stone, J. R., Han, Y., Khattak, A. J., Fan, Y., Huntsinger, L. F., & Bing Mei, P. (2007).
Guidelines for Developing Travel Demand Models: Medium Communities and Metropolitan Planning Organizations.
Tamin, O., & Willumsen, L. (1989). Transport demand model estimation from traffic
counts. Transportation, 16(1), 3-26.
Tavana, H. (2001). Internally-Consistent Estimation of Dynamic Network Origin-
Destination Flows from Intelligent Transportation Systems Data Using Bi-Level Optimization. Ph.D. Dissertation, The University of Texas at Austin.
Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016a). How close the models
are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum (ATRF), 38th, 2016, Melbourne, Victoria, Australia.
Tavassoli, A., Alsger, A., Hickman, M., & Mesbah, M. (2016b). How close the models
are to the reality? Comparison of Transit Origin-Destination Estimates with Automatic Fare Collection Data. In Australasian Transport Research Forum 2016 Proceedings.
TMR. (2016). BSTM data. In Department of Transport Main Roads.
TMR. (2017). The Future of Transport. Retrieved from
https://blog.tmr.qld.gov.au/blog/2017/02/09/the-future-of-transport/
Toledo, T., & Kolechkina, T. (2013). Estimation of Dynamic Origin-Destination
Matrices Using Linear Assignment Matrix Approximations. IEEE Trans. Intelligent Transportation Systems, 14(2), 618-626. Retrieved from
Toledo, T., Koutsopoulos, H., Davol, A., Ben-Akiva, M., Burghout, W., Andréasson,
I., . . . Lundin, C. (2003). Calibration and validation of microscopic traffic simulation tools: Stockholm case study. Transportation Research Record: Journal of the Transportation Research Board(1831), 65-75. Retrieved from
Toole, J. L., Colak, S., Sturt, B., Alexander, L. P., Evsukoff, A., & González, M. C.
(2015). The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies, 58, 162-177. Retrieved from
Bureau of Transport and Regional Economics. (2007). Estimating urban
traffic and congestion cost trends for Australian cities. Canberra: Department of Transport and Regional Services.
Tympakianaki, A., Koutsopoulos, H. N., & Jenelius, E. (2018). Robust SPSA
algorithms for dynamic OD matrix estimation. Procedia computer science, 130(C), 57-64. Retrieved from
USCensus. (2019). 2005 Metropolitan and Micropolitan Statistical Areas (CBSAs) of
the United States and Puerto Rico. Retrieved from https://www2.census.gov/geo/maps/metroarea/us_wall/Dec2005/cbsa_us_1205.pdf?#.
Van Der Zijpp, N. (1997). Dynamic origin-destination matrix estimation from traffic
counts and automated vehicle identification data. Transportation Research Record: Journal of the Transportation Research Board(1607), 87-94. Retrieved from
Van Zuylen, H. (1978). The information minimising method: validity and applicability
to transport planning. New developments in modelling travel demand and urban systems. Retrieved from
Van Zuylen, H. J., & Willumsen, L. G. (1980). The most likely trip matrix estimated
from traffic counts. Transportation Research Part B: Methodological, 14(3), 281-293. Retrieved from
Verbas, İ., Mahmassani, H., & Zhang, K. (2011). Time-dependent origin-destination
demand estimation: Challenges and methods for large-scale networks with multiple vehicle classes. Transportation Research Record: Journal of the Transportation Research Board(2263), 45-56. Retrieved from
Villani, C. (2003). Topics in optimal transportation: American Mathematical Soc.
Vogl, T. P., Mangis, J., Rigler, A., Zink, W., & Alkon, D. (1988). Accelerating the
convergence of the back-propagation method. Biological cybernetics, 59(4-5), 257-263. Retrieved from
Wang, W., Wan, H., & Chang, K.-H. (2016). Randomized block coordinate
descendant STRONG for large-scale Stochastic Optimization. In Winter Simulation Conference (WSC), 2016 (pp. 614-625): IEEE.
Wang, Y., Ma, X., Liu, Y., Gong, K., Henricakson, K. C., Xu, M., & Wang, Y. (2016).
A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice. PloS one, 11(1), e0146850. Retrieved from
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600-612. Retrieved from
Weijermars, W., & Van Berkum, E. (2005). Analyzing highway flow patterns using
cluster analysis. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE (pp. 308-313): IEEE.
Wen, T., Cai, C., Gardner, L., Dixit, V., & Waller, S. T. (2014). A Least Squares
Method For Origin-Destination Estimation Incorporating Variability of Day-to-Day Travel Demand. Retrieved from
Wild, D. (1997). Short-term forecasting based on a transformation and classification
of traffic volume time series. International Journal of Forecasting, 13(1), 63-72. Retrieved from
Willumsen, L. (1984a). Estimating time-dependent trip matrices from traffic counts.
In Ninth International Symposium on Transportation and Traffic Theory (pp. 397-411): VNU Science Press Utrecht.
Willumsen, L. (1984b). Estimating time-dependent trip matrices from traffic counts.
In Ninth International Symposium on Transportation and Traffic Theory, VNU Science Press (pp. 397-411).
Willumsen, L. G. (1978). Estimation of an OD Matrix from Traffic Counts–A Review.
Wilson, A. G. (1967). A statistical theory of spatial distribution models.
Transportation research, 1(3), 253-269. Retrieved from
Yang, C., Yan, F., & Xu, X. (2017). Daily metro origin-destination pattern recognition
using dimensionality reduction and clustering methods. In Intelligent Transportation Systems (ITSC), 2017 IEEE 20th International Conference on (pp. 548-553): IEEE.
Yang, H. (1995). Heuristic algorithms for the bilevel origin-destination matrix
estimation problem. In Transportation Research Part B: Methodological (Vol. 29, pp. 231-242).
Yang, H., Iida, Y., & Sasaki, T. (1991). An analysis of the reliability of an origin-
destination trip matrix estimated from traffic counts. Transportation Research Part B: Methodological, 25(5), 351-363. Retrieved from
Yang, H., Sasaki, T., Iida, Y., & Asakura, Y. (1992). Estimation of origin-destination
matrices from link traffic counts on congested networks. Transportation Research Part B: Methodological, 26(6), 417-434. Retrieved from
Yujian, L., & Bo, L. (2007). A normalized Levenshtein distance metric. IEEE
transactions on pattern analysis and machine intelligence, 29(6), 1091-1095. Retrieved from
Yun, I., & Park, B. (2005). Estimation of dynamic origin destination matrix: A genetic
algorithm approach. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE (pp. 522-527): IEEE.
Zhang, A., Kang, J. E., Axhausen, K. W., & Kwon, C. (2018). Multi-day activity-
travel pattern sampling based on single-day data. In 97th Annual Meeting of the Transportation Research Board (TRB 2018): TRB Annual Meeting.
Zhou, X. (2004). Dynamic origin-destination demand estimation and prediction for
off-line and on-line dynamic traffic assignment operation.
Zhou, X., & Mahmassani, H. S. (2006). Dynamic origin-destination demand
estimation using automatic vehicle identification data. IEEE Transactions on intelligent transportation systems, 7(1), 105-114. Retrieved from
Zhou, X., & Mahmassani, H. S. (2007). A structural state space model for real-time
traffic origin–destination demand estimation and prediction in a day-to-day learning framework. Transportation Research Part B: Methodological, 41(8), 823-840. Retrieved from
Zhu, K. (2007). Time-dependent origin-destination estimation: Genetic algorithm-
based optimization with updated assignment matrix. KSCE Journal of Civil Engineering, 11(4), 199-207. Retrieved from
Appendices
Appendix A
Methodology to develop B-OD matrix
Knowledge of trajectories can further help in developing Bluetooth-based
OD matrices at the scanner level as well as at the zonal level. The methodology to
develop a Bluetooth-based OD matrix (B-OD) at the zonal level is explained using the
flowchart shown in Figure A1.1.
To develop a B-OD matrix, raw Bluetooth data from a particular day are spatially
and temporally matched to define individual Bluetooth vehicle trajectories, which are
then split into trips (Michau, et al., 2014). Here, the Bluetooth dataset for the study
date is downloaded from the BCC server and the unique Device IDs are identified.
Records are retrieved individually for each Device ID and sorted by detection
time-stamp for further analysis. Within the record of each Device ID, the difference
in time-stamps between successive detections, δ, is used to identify unique
trips/trajectories. If successive detections are from the same scanner, a new trip is
identified when δ is at least 10 minutes. If successive detections are from different
scanners, a new trip is identified when δ is at least 30 minutes. The threshold values
are chosen in accordance with a similar study on Brisbane Bluetooth datasets by
Michau et al. (2017). In this way, all individual trips/trajectories of each Device ID
are identified and then used to infer OD trips at the scanner level, forming the sOD
matrix. The size of the sOD matrix is 845 × 845, and it is further transformed into a
B-OD matrix at either the SA2 or the SA3 level. For this, the concordance between BMS
locations and SA zones, obtained from the BCC, is used. The process is repeated over
415 days to generate a B-OD matrix for each day.
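The trip-splitting rule and the aggregation into OD flows described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the thesis: the function names (split_trips, build_od_matrix), the (time-stamp, scanner ID) record layout and the scanner-to-zone dictionary are assumptions made for the example. It further assumes that a trip's origin and destination are its first and last detections, and that trips with a single detection are not counted.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Thresholds on the gap (delta) between successive detections, as stated in the text.
SAME_SCANNER_GAP = timedelta(minutes=10)
DIFF_SCANNER_GAP = timedelta(minutes=30)


def split_trips(detections):
    """Split one device's detections into trips.

    detections: iterable of (timestamp, scanner_id) tuples (hypothetical layout).
    Returns a list of trips, each a time-ordered list of detections.
    """
    ordered = sorted(detections, key=lambda d: d[0])
    trips, current = [], []
    for det in ordered:
        if current:
            prev_time, prev_scanner = current[-1]
            delta = det[0] - prev_time
            same_scanner = det[1] == prev_scanner
            threshold = SAME_SCANNER_GAP if same_scanner else DIFF_SCANNER_GAP
            if delta >= threshold:
                # Gap is long enough: close the current trip and start a new one.
                trips.append(current)
                current = []
        current.append(det)
    if current:
        trips.append(current)
    return trips


def build_od_matrix(trips_by_device, scanner_to_zone=None):
    """Count trips per OD pair.

    Without scanner_to_zone the result is at scanner level (sOD); with it,
    scanner IDs are mapped to zones first (for example SA2 or SA3 zones).
    Assumption: trips with fewer than two detections are skipped.
    """
    od = defaultdict(int)
    for trips in trips_by_device.values():
        for trip in trips:
            if len(trip) < 2:
                continue
            origin, destination = trip[0][1], trip[-1][1]
            if scanner_to_zone is not None:
                origin = scanner_to_zone[origin]
                destination = scanner_to_zone[destination]
            od[(origin, destination)] += 1
    return dict(od)
```

Calling split_trips once per Device ID and passing the resulting dictionary to build_od_matrix gives the scanner-level counts; supplying a scanner-to-zone mapping gives the zonal counts. A sparse dictionary is used here instead of a dense 845 × 845 array purely to keep the sketch short.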
Figure A1.1: Methodology to develop B-OD matrix at zonal level
[Flowchart. Trajectories construction: from the BCC Bluetooth dataset, select a
Device ID; retrieve the detection record (R) of the Device ID and sort it by
time-stamp; select two successive detections, from the first to the last detection
in record R; if the detections are from the same scanner and δ >= 10 mins, or from
different scanners and δ >= 30 mins, record a new trip for the Device ID; repeat
until the last record for the Device ID. OD matrix development: identify trip ends
and add the trips of the Device ID to the OD flows of the corresponding OD pairs,
using exogenous information relating scanner locations to SA2 zones; repeat until
the last Device ID, then end.]
Appendix B
Can the structure of Bluetooth trips be a proxy for true OD?
1. Background
Although Bluetooth observations capture only a fraction of the actual OD
demand, the observed trip distribution patterns can provide some insight into real
travel behaviour within a network. Knowledge of Bluetooth trips therefore has the
potential to contribute to the OD matrix estimation process. However, it is important
to validate this knowledge before any practical implementation. Since the ground
truth is unknown, Bluetooth trips cannot be validated directly. In the absence of true
OD flows, confidence in the Bluetooth trips can instead be gained using surrogate
measures that are considered to reflect the structural properties of OD matrices
(Antoniou, et al., 2016).
Because Bluetooth trips are only partial observations, they might not yield a
complete sequence of trajectories. However, at a macroscopic level, the structure of
Bluetooth trips might still provide valuable trip-related information.
In this context, few analyses were conducted to check if the “structure” of
Bluetooth trips preserves the integrity of the actual demand distribution and can be
used as a proxy for the actual distribution of trips. This hypothesis was validated by
testing the following four surrogate measures: a) screenline counts, b) the Brisbane
Strategic Transport Model (BSTM) (BSTM, 2016) travel time distributions, c) car
users (as drivers) taken from the 2016 Census (ABS, 2018), and d) BSTM OD flows.
2. Bluetooth vs Screenline counts
Screenlines divide the region into larger zones. They run along natural barriers,
such as riversides, that have few crossing points, or along major road
corridors/tunnels (NPTEL, 2009). They are primarily used to calibrate and validate
base year transport models, such as the BSTM (Pool, 2014). See Figure A2.1(a) for the
screenlines and the locations of screenline counts (blue coloured Google pins), and
Figure A2.1(b) for a closer look at the alignment of the screenlines with the locations of
BMSs (red coloured circles) within the BCC region.
Figure A2.1: (a) Locations of screenline counts and screenlines for the BCC region; (b) closer
look at the alignment of BMS locations with the screenlines (BSTM, 2016)
A good correlation between screenline counts and the number of Bluetooth
observations from BMS scanners upstream and downstream of the screenline count
location should enhance confidence in using Bluetooth data. For the current analysis,
selected locations of the screenline survey (blue coloured Google pins) and the
corresponding BMS locations (red coloured circles) are shown in Figure A2.2. For
each selected location (both directions of flow), BMS scanners were identified
upstream and downstream, such that the detected Bluetooth devices pass through
the screenline count location. Eight screenline count locations, distributed
throughout the study region, were selected (see Figure A2.2). The comparison used
weekday traffic data from the year 2016.
Figure A2.2: Selected screen line and BMS locations
Figure A2.3 presents the relationship between the two counts. Bluetooth counts
increase with screenline counts, with an R² value of 0.7594 and a correlation
coefficient (ρ) of 0.8714. This alignment and the high correlation between the two
sets of observations support the suitability of Bluetooth data for
transport applications.
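As a consistency check on the two statistics quoted above: for a least-squares straight line with an intercept, R² equals the square of the Pearson correlation coefficient, and 0.8714² ≈ 0.759 agrees with the reported R² of 0.7594. The sketch below illustrates this identity in Python. The count values are hypothetical and are not the data of this study; the function names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical AM peak counts, for illustration only (not the thesis data).
screenline = [1200, 2500, 3100, 4800, 5200, 6900, 7400, 8800]
bluetooth = [260, 430, 700, 850, 1150, 1200, 1600, 1650]

rho = pearson(screenline, bluetooth)
slope, intercept, r2 = linear_fit_r2(screenline, bluetooth)
print(rho ** 2, r2)          # the two values agree up to rounding error
print(0.8714 ** 2)           # square of the reported correlation coefficient
```

The same check applies to the other (ρ, R²) pairs reported in this appendix.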
The penetration rate of Bluetooth counts, that is, the ratio of Bluetooth to
screenline counts, is illustrated for the selected locations in Figure A2.4. The average
penetration rate is nearly 20%, with values spread between 15% and 35%, which is
consistent with the 12%-30% reported for Brisbane City for the year 2014
(Michau, 2016). Note that the slope of the fitted line in Figure A2.3 also reflects the
penetration rate of Bluetooth counts. Although the traffic counts from the two
data sources do not provide any “structure” or trip distribution related information, the
consistency of the Bluetooth penetration rate between the literature and the
current study provides some initial confidence in the Bluetooth observations.
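The average quoted above can be reproduced from the eight location values shown in Figure A2.4. The short Python sketch below does this; the helper `penetration_rate` and its example counts are hypothetical and only illustrate the definition of the ratio.

```python
# Penetration rates of the eight selected locations, as shown in Figure A2.4.
rates = [0.15, 0.34, 0.19, 0.19, 0.16, 0.22, 0.15, 0.19]

mean_rate = sum(rates) / len(rates)
print("mean = %.3f, min = %.2f, max = %.2f" % (mean_rate, min(rates), max(rates)))


def penetration_rate(bluetooth_count, screenline_count):
    """Ratio of Bluetooth observations to the screenline traffic count."""
    if screenline_count <= 0:
        raise ValueError("screenline count must be positive")
    return bluetooth_count / screenline_count


# Hypothetical counts at one location, for illustration only.
print(penetration_rate(960, 4800))
```

The mean of the eight values is close to both the plotted average and the slope of the fitted line in Figure A2.3 (0.1923).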
Figure A2.3: Bluetooth vs screenline counts
[Scatter plot. x-axis: screenline counts, AM peak (7 AM-9 AM); y-axis: Bluetooth counts, AM peak (7 AM-9 AM). Fitted line: y = 0.1923x + 2.8692, R² = 0.7594; correlation coefficient = 0.8714.]
Figure A2.4: The penetration rate of Bluetooth counts at the selected study locations
[Bar chart. Axis: Bluetooth penetration rate (0.00 to 0.40), by selected screenline location. Locations as listed: Walter Taylor Bridge, Breakfast Creek Rd, William Jolly Bridge, Compton Rd, Sherwood Road, Wynnum Rd, South Pine Road, Beckett Road, average penetration. Values as listed: 0.15, 0.34, 0.19, 0.19, 0.16, 0.22, 0.15, 0.19, 0.20.]
3. Trip length (travel time) distribution
Trip length distribution tables are generally used to compare and validate a
modelled trip distribution (such as that of a gravity model) against survey data (Stone
et al., 2007). The trip length distribution plots of existing demand models can also be
used for comparison with distributions developed from other data sources. In this study, a
similar analysis was carried out to compare the Bluetooth travel time
distribution with that of the BSTM.
First, the raw travel times from Bluetooth observations were filtered using a
median absolute deviation filter with f=2 (Kieu, Bhaskar, & Chung, 2012) and the
Bluetooth travel times were estimated for trips between SA2 zones. Similarly, the
BSTM travel times were aggregated from BSTM zonal level to SA2 level for a fair
comparison.
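A median absolute deviation (MAD) filter of this kind can be sketched in Python as below. The sketch assumes the common form in which observations outside median ± f × MAD are discarded, with the MAD left unscaled; the exact variant used by Kieu, Bhaskar, & Chung (2012) is not restated here, so this should be read as an assumption. The function name and the travel times are hypothetical.

```python
from statistics import median


def mad_filter(values, f=2.0):
    """Keep the values lying within median +/- f * MAD (unscaled MAD, an assumption)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    lower, upper = med - f * mad, med + f * mad
    return [v for v in values if lower <= v <= upper]


# Hypothetical raw travel times (minutes) between one pair of zones.
travel_times = [11.0, 12.5, 12.0, 13.0, 11.5, 45.0, 12.2, 3.0, 12.8]
filtered = mad_filter(travel_times, f=2.0)
print(filtered)
```

In this example the 45-minute and 3-minute observations fall outside the band and are removed before the travel times are aggregated.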
The travel time distribution plots for the Bluetooth observations and the BSTM
model are shown in Figure A2.5. Here, the x-axis represents the travel time in minutes
between SA2 zonal pairs and the y-axis represents the proportion of car trips during
the AM peak period. The mean travel times of trips from the BSTM and
Bluetooth were 15.87 minutes and 12.96 minutes, respectively, with
corresponding standard deviations of 19.70 minutes and 15.33 minutes.
The highest proportions of car trips (the peaks) for the Bluetooth
and BSTM plots were at 10 and 15 minutes, respectively. The difference between
the two plots could be due to modelling errors in the BSTM, or because the
Bluetooth travel time is measured between BMS scanner locations,
which is not consistent with the BSTM zone-to-zone travel time. Another reason
for the negative shift is that Bluetooth detections at the first and last signalised
intersections are not necessarily captured. Thus, proper care must be taken when
using Bluetooth data. Nevertheless, the general shape of the distribution and the values
are acceptable for the current surrogate comparison.
Figure A2.5: BSTM vs Bluetooth travel time distribution
4. Trip productions: Bluetooth vs Census
In this section, Bluetooth trips produced from SA2 zones during the AM peak
period are compared to the 2016 Census “Method of travel to work” observations
(ABS, 2018). The following assumption was made before the comparison: since most
of the Bluetooth trips came from detections of in-built car systems, they could be
considered a proxy for car trips within the study region.
According to the 2016 Census, most work-based trips in Brisbane (75.3%) were made by
car (as driver) (ABS, 2017). Since most work-based trips are
generally observed during the AM peak period, car users (who travelled to
work as drivers) from the 2016 Census data were used as a proxy for the actual car trips
produced.
The comparison between trips produced by Bluetooth (x-axis) and car users (as
drivers) from the 2016 Census (y-axis) is demonstrated using a scatter plot in Figure
A2.6. Bluetooth observations were found to closely correspond to the 2016 Census
data, with a correlation coefficient (ρ) of 0.8467 and an R² value of 0.7168. Bluetooth
trips also constituted approximately 4.3% of the census car trips. Interestingly, this
observation is consistent with the average Bluetooth trips capture rate of 4.4%
validated by Chitturi et al. (2014).
[Figure A2.5 plot. x-axis: travel time (minutes), 5 to 90; y-axis: proportion of trips, 0.000 to 0.250; series: BSTM and Bluetooth.]
Figure A2.6: Bluetooth vs 2016 Census – trips productions at SA2 level
5. BSTM OD flows vs Bluetooth based OD flows
In this section, BSTM OD flows are compared with Bluetooth-based OD (B-OD)
flows at the SA3 level for the AM peak period. The BSTM base year OD matrix
was generated using extensive modelling techniques, whereas the B-OD
flows were developed by inferring vehicle trajectories (see Appendix A
for the methodology adopted to develop the B-OD matrix).
Because the B-OD flows are only a fraction of the actual OD flows, whereas the BSTM
flows represent scaled-up demand, this section first analyses the variation in the
capture rates of B-OD flows with respect to BSTM OD flows, and then compares the two
through R² and the correlation coefficient.
The total numbers of BSTM OD flows and B-OD flows to be compared were
235,556 and 56,542, respectively. This implies that Bluetooth captured almost 24% of
the total BSTM flows (this value also lies within the 15%-35% range of the penetration
rate of Bluetooth counts in Section 2). However, it must be noted that the capture rate
of Bluetooth OD flows differs from the penetration rate of counts, and varies between
OD pairs due to many factors, such as distance and socio-economic
characteristics. To illustrate these variations, the comparison between
BSTM OD flows and B-OD flows is shown using a Pareto distribution plot (see Figure
A2.7), where the x-axis represents the ratio of Bluetooth to BSTM OD flows,
arranged in the order of frequency; the left y-axis represents the
proportion of total OD pairs for different values of this ratio; and the right y-axis
represents the cumulative percentage of OD pairs. Interestingly, for 75% of the OD pairs,
this ratio varied between 5% and 35%. Compared to the penetration rate of Bluetooth
counts (i.e. 15%-35% from Section 2), the capture rate of the OD flows had a
higher variation. Note that although the BSTM flows are modelled, for understanding
purposes this ratio can be considered a proxy for the actual capture rates of the OD
flows.
Figure A2.7: Pareto distribution of the ratio of Bluetooth OD to BSTM OD flows
Nevertheless, a good correlation was observed between BSTM OD flows and B-OD
flows (ρ = 0.8878 in Figure A2.8). The line of fit between the two sets of OD flows also
shows a decent alignment, with R² = 0.7883, and the slope of the fit suggests that the
B-OD flows were nearly 25% of the BSTM OD flows. Although the ratio of Bluetooth to
BSTM OD flows was widely spread, the good correlation with the BSTM OD flows provides
more confidence in the structure of Bluetooth trips.
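The two aggregate capture rates quoted in this section follow directly from the reported totals and from the fitted line in Figure A2.8 (BSTM OD = 4.0379 × B-OD + 114.47). The Python sketch below reproduces this arithmetic; the variable names are illustrative only.

```python
bstm_total = 235556      # total BSTM OD flows compared (from the text)
bod_total = 56542        # total B-OD flows compared (from the text)
slope = 4.0379           # slope of the fitted line in Figure A2.8

# Share of the BSTM demand that is observed as Bluetooth OD flows.
overall_capture = bod_total / bstm_total

# The fit regresses BSTM flows on B-OD flows, so the implied share is 1 / slope.
capture_from_slope = 1.0 / slope

print("overall capture rate: %.1f%%" % (100 * overall_capture))
print("capture rate implied by the slope: %.1f%%" % (100 * capture_from_slope))
```

Both values are aggregate figures; as noted above, the ratio for individual OD pairs is spread much more widely.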
Figure A2.8: B-OD flows vs BSTM OD flows
From the above comparisons of the OD matrix structural properties (over four
surrogate measures), it can be concluded that although Bluetooth observations are
partial and only constitute a sample, the structure of the Bluetooth trips is reasonable
and can probably be used as a proxy for the actual distribution of trips. However, given
the absence of ground truth, and because the discrepancies arising from statistical and
model errors in the Bluetooth data and the other data sources are difficult to disentangle,
a more detailed investigation is recommended for future research.
[Figure A2.8 scatter plot. x-axis: B-OD flows (0 to 1600); y-axis: BSTM OD flows (0 to 6000). Fitted line: y = 4.0379x + 114.47, R² = 0.7883; correlation coefficient = 0.8878.]
Appendix C
MATLAB optimisation code for B-OD/B-SP methods
clc
clear all
currentFolder = pwd;
True_OD_matrix = load(fullfile(currentFolder, 'inputs', 'OD.txt'));
W = size(True_OD_matrix,1)*size(True_OD_matrix,2); % Size of OD vector
True_transpose = True_OD_matrix';
OD_True_Vector = True_transpose(:); % True OD vector
Prior_OD_matrix = load(fullfile(currentFolder, 'inputs', 'Prior_OD_matrix.txt'));
Prior_transpose = Prior_OD_matrix';
Prior_OD_vector = Prior_transpose(:); % Prior OD vector
load(fullfile(currentFolder, 'inputs', 'zones.txt'));
load(fullfile(currentFolder, 'inputs', 'ObsCounts.txt')); % Observed link flows
load(fullfile(currentFolder, 'inputs', 'det_sec.txt')); % The IDs of loop detectors and corresponding links (sections)
y_obs = ObsCounts(:,2); % Observed link counts
OD = Prior_OD_matrix;
OD_Tranp = OD';
OD_Vector = OD_Tranp(:);
%% method_case = 1 for the B-OD method and method_case = 2 for the B-SP method
method_case = 1;
if method_case == 1
    load(fullfile(currentFolder, 'inputs', 'BOD_vector.mat')); % Vector of B-OD flows
    BOD_matrix = reshape(BOD_vector, size(zones,1), size(zones,1));
    BOD_matrix = BOD_matrix'; % B-OD matrix
    pen = Omega*210; % Omega is the share of connected OD pairs (excluding internal OD pairs), expressed as a fraction. So, for 210 OD pairs, Omega = 1 (100%); for 168 OD pairs, Omega = 0.8 (80%); and so on.
    [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen); % Refer to the "Bluetooth_connected_ODpairs" function
elseif method_case == 2
    load(fullfile(currentFolder, 'inputs', 'EndDet_Zone.txt')); % Look-up table relating BMS at trip ends with zonal IDs
    load(fullfile(currentFolder, 'inputs', 'Subpathfreq_obs.mat')); % The 1st column is for subpath flows; 2nd and 3rd (last) columns for origin and destination zones
    Subpathflows_obs = Subpathfreq_obs(:,1); % Vector of subpath flows
end
lambda = lambda_prior; % choose any prior step length as lambda_prior
StrOD_Prior = corr2(Prior_OD_vector, OD_True_Vector);
StrOD_BT = corr2(BOD_vector, OD_True_Vector);
Obj_ite = []; y_est_ite = []; Demand = OD_Vector; Values_Ite = [];
l_up = 1.5; l_down = 0.9; % choose l_up and l_down by trial and error
[GSSI_PriorOD] = GSSI_computation(Prior_OD_matrix, True_OD_matrix); % Refer to the "GSSI_computation" function
Objective = 2; % Objective=1 corresponds to the obj. function of the traditional method and Objective=2 is for the B-OD/B-path method
for ite = 1:20 % the number of iterations
    [OD_Id_Sno] = Aimsun_matrix(OD, zones);
    [terminal] = Aimsun(); % Refer to the function "Aimsun.m"
    system(terminal); % Executing "terminal"
    [extracted_data] = SQLITE(); % Refer to the function "SQLITE"
    extracted_data = cell2mat(extracted_data);
    diff = abs(OD_True_Vector - OD_Vector);
    SumdiffSq = sum((diff).*(diff));
    RMSE_OD = sqrt(SumdiffSq/size(OD_True_Vector,1));
    load('BNE.matrix'); % BNE is the output from 'AutoRun_BNE.py' saved as a text file. Refer to the Python script in Appendix E.
    [y_est, Sections, LinkPropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno); % Refer to the "Assignment" function
    diff2 = abs(y_obs - y_est);
    SumdiffSq2 = sum((diff2).*(diff2));
    RMSE_linkflows = sqrt(SumdiffSq2/size(y_obs,1));
    if method_case == 1
        [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, LinkPropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen); % Refer to the "Obj_Grad" function
    elseif method_case == 2
        BTraw = readtable("Det2DetDataALLDETECTOR.txt"); % this text file is output from Aimsun through a separately scripted API. It resembles the raw Bluetooth observations from BMSs.
        [Traj3_table] = BTpaths_secs(BTraw, det_sec, EndDet_Zone); % refer to the "BTpaths_secs" function
        [SubTraj3_table] = Subpathsanalysis(Traj3_table); % refer to the "Subpathsanalysis" function
        MLSPNo = 1; % Only one Most Likely Subpath per OD pair is considered
        [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo); % refer to the "MostLikelySubpaths" function
        Subpathprop = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones);
        [Obj, Gradient, StrSP] = Obj_Grad_subpathflows(y_obs, y_est, LinkPropMat, Subpathprop, Subpathflows_obs, Subpathflows_est, Objective);
    end
    Obj_ite = [Obj_ite; Obj];
    if size(Obj_ite,1) > 1
        if Obj <= Obj_ite(end-1)
            lambda = lambda*l_up;
        else
            lambda = lambda*l_down;
            % Deleting the parameter values of current iteration
            Demand(:,end) = []; Obj_ite(end) = []; y_est_ite(:,end) = []; Values_Ite(end,:) = [];
            % Setting the OD vector to previous iteration
            OD_Vector = Demand(:,end);
        end
    end
    OD_Vector = OD_Vector.*(1-lambda.*(Gradient)); % Updating OD vector
    temp5 = reshape(OD_Vector, [size(OD,1), size(OD,2)]);
    OD = temp5'; % Reshaping OD vector into matrix
    if method_case == 1
        values = [StrBOD, RMSE_OD, RMSE_linkflows, Obj];
    elseif method_case == 2
        values = [StrSP, RMSE_OD, RMSE_linkflows, Obj];
    end
    Values_Ite = [Values_Ite; values];
    y_est_ite = [y_est_ite, y_est];
    Demand = [Demand, OD_Vector];
    fclose(fopen('matrix.txt','w')); % deleting flow values in the text file "matrix.txt"
    delete 'BNE.ang.sqlite'; % deleting the Aimsun sqlite database
    delete 'BNE.ang.old'; % deleting the Aimsun back-up
    delete 'BNE.matrix'; % deleting the assignment related text file
    if method_case == 2
        delete 'Det2DetDataALLDETECTOR.txt';
    end
end
IteNo = length(Obj_ite);
tempe2 = reshape(Demand(:,IteNo), [size(OD,1), size(OD,2)]);
Final_OD_matrix = tempe2'; % This is the final estimated OD matrix
[GSSI_OD] = GSSI_computation(Final_OD_matrix, True_OD_matrix);
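The step-length rule used in the iteration above (lengthen lambda by l_up after an improving iteration; otherwise shorten it by l_down and return to the previous OD vector) can be illustrated with the following Python sketch. It is a toy example under stated assumptions: `objective_and_gradient` is a hypothetical least-squares stand-in for the Aimsun assignment and the objective functions of Appendix D, and on rejection the sketch reuses the gradient of the last accepted point, which may differ from the listing.

```python
def objective_and_gradient(x, target):
    """Hypothetical stand-in for the simulation-based objective (least squares)."""
    residual = [a - b for a, b in zip(x, target)]
    return 0.5 * sum(r * r for r in residual), residual


def estimate(x0, target, lam=0.001, l_up=1.5, l_down=0.9, iterations=20):
    """Gradient descent with a multiplicative update and adaptive step length."""
    x = list(x0)
    best_obj, best_grad = objective_and_gradient(x, target)
    history = [best_obj]
    for _ in range(iterations):
        # Multiplicative update, mirroring OD_Vector .* (1 - lambda .* Gradient)
        trial = [xi * (1.0 - lam * gi) for xi, gi in zip(x, best_grad)]
        obj, grad = objective_and_gradient(trial, target)
        if obj <= best_obj:
            x, best_obj, best_grad = trial, obj, grad   # accept the step
            lam *= l_up                                   # and lengthen the next one
        else:
            lam *= l_down                                 # reject the step and shorten it
        history.append(best_obj)
    return x, history


# Hypothetical prior and target OD flows, for illustration only.
prior = [100.0, 50.0, 20.0]
target = [120.0, 40.0, 25.0]
estimated, history = estimate(prior, target)
print(history[0], history[-1])
```

Because rejected steps are discarded, the stored objective values never increase from one iteration to the next.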
Appendix D
Functions
Function-1: Bluetooth_connected_ODpairs.m
function [BpenStr, OD_ind] = Bluetooth_connected_ODpairs(OD, pen)
diagind = [];
for ind = 1:size(OD,1)
    diagind = [diagind; size(OD,1)*(ind-1)+ind]; % indices of diagonal elements
end
SNO = [1:size(OD,1)*size(OD,1)];
filter = [~ismember(SNO, diagind)];
OD_ind = SNO(:, filter); % Indices of all OD pairs except those on the diagonal
BpenStr = datasample(OD_ind, pen-size(OD,1), 'Replace', false); % Indices of OD pairs that are Bluetooth connected
end
Function-2: Aimsun_matrix.m
function [OD_Id_Sno] = Aimsun_matrix(OD, zones)
for j = 1:size(OD,2)
    m = OD;
    % Save the matrix into a .txt file compliant with AIMSUN standards
    filename = strcat('matrix','.txt');
    fid = fopen(filename,'w');
    fprintf(fid, 'id\t');
    fprintf(fid, '%i\t', zones);
    fprintf(fid, '\n');
    fclose(fid);
    fid = fopen(filename, 'a');
    for i = 1:length(zones)
        fprintf(fid, '%i\t', zones(i));
        fprintf(fid, '%5.2f\t', m(i,:));
        fprintf(fid, '\n');
    end
    fclose(fid);
end
OD_Id = [];
for i = 1:length(zones)
    for j = 1:length(zones)
        OD_Id = [OD_Id; zones(i) zones(j)];
    end
end
OD_Id_Sno = [[1:size(OD,1)*size(OD,1)]' OD_Id];
end
Function-3: Aimsun.m
function [terminal] = Aimsun()
AIMSPath = ('C:\Program Files\Aimsun\Aimsun Next 8.2\aconsole.exe');
Autorunpath = ('C:\.....\AutoRun_BNE.py');
Angpath = ('C:\.....\BNE.ang');
Detpath = ('C:\.....\det_sec.txt');
terminal = horzcat('"', AIMSPath, '"', ' -script ', '"', Autorunpath, '"', ' ', '"', Angpath, '"', ' ', '"', Detpath, '"');
end
Function-4: SQLITE.m
function [extracted_data] = SQLITE()
Sqlitepath = ('C:\.....\BNE.ang.sqlite');
conn = database(Sqlitepath, '', '', 'org.sqlite.JDBC', 'jdbc:sqlite:C:\......\BNE.ang.sqlite');
sqlQuery = 'SELECT oid, did, sid, ent, countveh, speed, occupancy, density FROM MIDETEC ORDER BY oid, ent;'; % Selected fields of sqlite database
extracted_data = fetch(conn, sqlQuery);
close(conn);
end
Function-5: GSSI_computation.m
function [GSSI] = GSSI_computation(X, Y)
% In this function geographical windows are created for 15 x 15 OD matrix
% Higher zones (hz) are created as follows:
% hz1: Westend-Southbank-Highgate Hill, Ext5, Gabba
% hz2: BNE Inner East, New Farm; hz3: Valley, Spring Hill, CBD i.e. 9, 14,2
% hz4: Newstead-Bowen Hills, Ext 2, Ext 4; hz5: Ext 1, Kelvin Grove-Herston
% hz6: Red Hill-Milton-Auchenflower, Ext 3
hz = [1;2;3;4;5;6]; % 6 hzs
Zonal_IDs = [3,2,5,4,6,4,1,3,5,2,4,6,3,1,1]; % hz IDs for all 15 small zones, in the order of the OD matrix loaded into Aimsun, which is not in the sequence of hz
hz = unique(Zonal_IDs);
for i = 1:length(hz)
    for j = 1:length(hz)
        Filter_Row = [ismember(Zonal_IDs, hz(i))];
        Filter_Col = [ismember(Zonal_IDs, hz(j))]';
        X_Geo = X(Filter_Row, Filter_Col);
        Y_Geo = Y(Filter_Row, Filter_Col);
        mean_comp(i,j) = 2*mean2(X_Geo)*mean2(Y_Geo)/(mean2(X_Geo)^2+mean2(Y_Geo)^2);
        std_comp(i,j) = 2*std2(X_Geo)*std2(Y_Geo)/(std2(X_Geo)^2+std2(Y_Geo)^2);
        Covariance = cov(X_Geo, Y_Geo);
        str_comp(i,j) = Covariance(1,2)/(std2(X_Geo)*std2(Y_Geo));
        SSIM(i,j) = mean_comp(i,j)*std_comp(i,j)*str_comp(i,j);
    end
end
GSSI = mean2(SSIM);
end
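For readers without MATLAB, the computation in Function-5 can be sketched in plain Python as below. This is an illustrative re-expression under assumptions, not a verified port: the names `window_ssim` and `gssi`, the 4 x 4 matrices and the two-group zoning are hypothetical, sample (n - 1) normalisation is assumed for both the standard deviation and the covariance, and windows with constant flows (zero standard deviation) are not handled.

```python
from math import sqrt


def _mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)   # sample variance
    return mean, variance


def window_ssim(x, y):
    """Product of the mean, standard deviation and structure comparisons."""
    mx, vx = _mean_and_variance(x)
    my, vy = _mean_and_variance(y)
    sx, sy = sqrt(vx), sqrt(vy)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    mean_comp = 2 * mx * my / (mx ** 2 + my ** 2)
    std_comp = 2 * sx * sy / (sx ** 2 + sy ** 2)
    str_comp = cov / (sx * sy)
    return mean_comp * std_comp * str_comp


def gssi(X, Y, zone_groups):
    """Mean SSIM over geographical windows.

    X, Y: square OD matrices as lists of lists.
    zone_groups: higher-zone label of each small zone, in matrix order.
    """
    groups = sorted(set(zone_groups))
    scores = []
    for g_row in groups:
        rows = [i for i, g in enumerate(zone_groups) if g == g_row]
        for g_col in groups:
            cols = [j for j, g in enumerate(zone_groups) if g == g_col]
            xw = [X[i][j] for i in rows for j in cols]
            yw = [Y[i][j] for i in rows for j in cols]
            scores.append(window_ssim(xw, yw))
    return sum(scores) / len(scores)


# Hypothetical 4 x 4 example with two higher zones of two small zones each.
zone_groups = [1, 1, 2, 2]
X = [[5.0, 12.0, 3.0, 8.0],
     [7.0, 2.0, 9.0, 4.0],
     [6.0, 11.0, 1.0, 10.0],
     [13.0, 14.0, 15.0, 16.0]]
Y = [[v / 2.0 for v in row] for row in X]    # same pattern, half the volume
print(gssi(X, X, zone_groups), gssi(X, Y, zone_groups))
```

In this example the second matrix keeps the pattern of the first but halves every flow, so the structure comparison stays at 1 while the mean and standard deviation comparisons each drop to 0.8.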
Function-6: Assignment.m
function [y_est, Sections, PropMat] = Assignment(det_sec, extracted_data, BNE, OD_Id_Sno)
Detectors = unique(extracted_data(:,1)); % as 1st column represents detectors
y_est = []; Sections = [];
for q = 1:length(Detectors)
    Filter = [det_sec(:,1)==Detectors(q)];
    Filter0 = [extracted_data(:,1)==Detectors(q)];
    temp0 = extracted_data(Filter0,:);
    if sum(Filter) ~= 0
        Sections = [Sections; det_sec(Filter,2)]; % Links equipped with detectors
        y_est = [y_est; max(temp0(:,5))]; % User equilibrium link flows
    end
end
PropMat = zeros(24,225); % 24 links and 225 OD pairs (including diagonals)
for w = 1:length(Sections)
    Filter2 = [BNE(:,4)==Sections(w)];
    K = BNE(Filter2,:);
    for c = 1:size(K,1)
        for d = 1:size(OD_Id_Sno,1)
            if OD_Id_Sno(d,2)==K(c,1) & OD_Id_Sno(d,3)==K(c,2)
                PropMat(w, OD_Id_Sno(d,1)) = K(c,7);
            end
        end
    end
end
end
Function-7: MostLikelySubpaths.m
function [SubMLP, SubPathFreq] = MostLikelySubpaths(SubTraj3_table, zones, det_sec, MLSPNo)
SubMLP = []; SubPathFreq = [];
for z1 = 1:size(zones,1)
    for z2 = 1:size(zones,1)
        if z1 ~= z2
            Filter = [SubTraj3_table.Zorg==zones(z1) & SubTraj3_table.Zdest==zones(z2)];
            if sum(Filter) > 0
                Org_trips = SubTraj3_table.trip_det(Filter,:);
                subpath_id = []; subpath_id_str = [];
                for p = 1:size(Org_trips,1)
                    p1 = cell2mat(Org_trips(p));
                    subpath_str = [];
                    for j = 1:size(p1,1)
                        result = strcat(num2str(p1(j)));
                        subpath_str = [subpath_str result];
                    end
                    subpath_id = [subpath_id; str2num(subpath_str)];
                    subpath_id_str = [subpath_id_str; {subpath_str}];
                    subpath_str = [];
                end
                a = unique(subpath_id); % "a" gives all path IDs that are unique
                temp2 = sortrows([a, histc(subpath_id(:),a)],2); % temp2 gives the frequency of each unique subpath_id
                a1 = unique(subpath_id_str);
                MLPTab = table; MLPTab.a1 = a1; a2 = [];
                for r = 1:size(a1,1)
                    MLPTab.a1_no(r) = str2num(char(a1(r)));
                    a2 = [a2; MLPTab.a1_no(r)];
                end
                MLPTab_sort = sortrows(MLPTab,{'a1_no'});
                a3 = zeros(size(a2));
                for i = 1:size(a2,1) % Replaced
                    a3(i) = sum(subpath_id(:) == a2(i));
                end
                if size(temp2,1) > MLSPNo
                    dp = MLSPNo;
                else
                    dp = size(temp2,1);
                end
                MLPath_id = []; MLPath_freq = [];
                for i = size(temp2,1):-1:(size(temp2,1)-dp+1)
                    MLPath_id = [MLPath_id; temp2(i)];
                    MLPath_freq = [MLPath_freq; [temp2(i,2) zones(z1) zones(z2)]];
                end
                DetIDs = det_sec(:,1);
                Dig_DetID = numel(num2str(fix(abs(DetIDs(1))))); % Dig_DetID = No of digits in a detector ID
                MLP_Det = []; MLPaths = [];
                for i = 1:size(MLPath_id,1)
                    AllDet_in_Path = [];
                    filter = [MLPTab.a1_no(:)==MLPath_id(i)];
                    y = char(MLPTab.a1(filter));
                    Dig_PathID = numel(y); % Dig_PathID = No of digits in a pathID
                    for j = 1:Dig_DetID:Dig_PathID % here step length of 3 is taken because the no of digits in each detector is 3
                        Det_in_path = sscanf(y(j:j+Dig_DetID-1), '%d');
                        AllDet_in_Path = [AllDet_in_Path Det_in_path];
                    end
                    te = struct('f1', AllDet_in_Path);
                    MLPaths = [MLPaths; [struct2cell(te) zones(z1) zones(z2)]];
                end
                SubMLP = [SubMLP; MLPaths];
                SubPathFreq = [SubPathFreq; MLPath_freq];
                MLPaths = [];
            end
        end
    end
end
end
------------------------------------------------------------------------------------------------------
Function-8: Subpath proportion matrix
function [Subpathprop] = Sub_path_proportion_matrix(Subpathfreq_obs, SubPathFreq, OD_True_Vector, OD, zones)
u = unique(Subpathfreq_obs(:,2:3),'rows'); temp = [];
for f = 1:size(u,1)
    filter1 = [Subpathfreq_obs(:,2)==u(f,1) & Subpathfreq_obs(:,3)==u(f,2)];
    filter2 = [SubPathFreq(:,2)==u(f,1) & SubPathFreq(:,3)==u(f,2)];
    p1 = Subpathfreq_obs(filter1,:);
    p2 = SubPathFreq(filter2,:);
    if size(p1,1) == size(p2,1)
        temp = [temp; p2];
    elseif size(p1,1) > size(p2,1)
        diff = size(p1,1)-size(p2,1);
        if size(p2,1) == 0
            temp2 = repmat([0 p1(1,2) p1(1,3)], diff, 1);
            temp = [temp; p2; temp2];
        else
            temp2 = repmat([0 p2(1,2) p2(1,3)], diff, 1);
            temp = [temp; p2; temp2];
        end
    else % size(p1,1) is smaller than size(p2,1)
        temp = [temp; p2(1:size(p1,1),:)];
    end
end
Subpathfreq_est = temp;
Subpathflows_est = Subpathfreq_est(:,1);
OD_noDiag = []; OD_Flows_Zones = [];
for q1 = 1:size(OD,1)
    for q2 = 1:size(OD,1)
        OD_Flows_Zones = [OD_Flows_Zones; OD(q1,q2), zones(q1), zones(q2)]; % Creating an OD vector with origin and destination IDs
        if q1 ~= q2
            OD_noDiag = [OD_noDiag; OD(q1,q2), zones(q1), zones(q2)];
        end
    end
end
% constructing subpath proportion matrix based on estimated subpath flows
Subpathprop = zeros(size(Subpathfreq_obs,1), size(OD_True_Vector,1));
for q3 = 1:size(Subpathprop,1)
    filter1 = [OD_Flows_Zones(:,2)==Subpathfreq_est(q3,2)];
    filter2 = [OD_Flows_Zones(:,3)==Subpathfreq_est(q3,3)];
    filter3 = filter1.*filter2;
    ODflow = OD_Flows_Zones(logical(filter3),1);
    f = find(filter3==1);
    Subpathprop(q3,f) = Subpathfreq_est(q3,1)/ODflow;
end
end
Function-9: Obj_Grad.m
function [Obj, Gradient, StrBOD] = Obj_Grad(y_obs, y_est, PropMat, Objective, BpenStr, OD_Vector, BOD_vector, pen)
OD_Vec_sample = OD_Vector(BpenStr',:);
BOD_Vec_sample = BOD_vector(BpenStr',:);
B1 = BOD_Vec_sample-mean(BOD_Vec_sample);
OD1 = OD_Vec_sample-mean(OD_Vec_sample);
OD_ones = repmat(1, size(OD_Vector,1), 1);
StrBOD = corr2(OD_Vec_sample, BOD_Vec_sample);
c1 = sum(B1.*OD1)/sum(OD1.^2);
c2 = sqrt(sum(B1.^2)*sum(OD1.^2));
Grad_Str_sample = (B1-c1*OD1)/c2;
Grad_Str_x = zeros(size(OD_Vector,1),1);
for u = 1:pen
    Grad_Str_x(BpenStr(u)) = Grad_Str_sample(u);
end
G1 = (y_est-y_obs)*(2-StrBOD);
G2 = (2-StrBOD)*PropMat';
G3 = Grad_Str_x*(y_est-y_obs)';
if Objective == 1 % Link flows deviation
    Obj = 0.5*(sum(((y_est-y_obs).^2)));
    Gradient = (PropMat')*((y_est-y_obs));
    return
elseif Objective == 2 % Link flows deviations and structural deviation of OD flows
    Obj = 0.5*(sum(((y_est-y_obs)*(2-StrBOD)).^2));
    Gradient = (G2-G3)*(G1);
    return
end
end
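The structural term in Function-9 relies on the gradient of the Pearson correlation between the estimated and the Bluetooth OD flows with respect to the estimated flows, which the listing writes as (B1 - c1*OD1)/c2. The Python sketch below evaluates the same expression and compares it with a central finite-difference approximation. The function names and the flow values are hypothetical and serve only as a numerical check of the formula.

```python
from math import sqrt


def pearson(x, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, mb = sum(x) / n, sum(b) / n
    sxb = sum((p - mx) * (q - mb) for p, q in zip(x, b))
    sxx = sum((p - mx) ** 2 for p in x)
    sbb = sum((q - mb) ** 2 for q in b)
    return sxb / sqrt(sxx * sbb)


def correlation_gradient(x, b):
    """Gradient of pearson(x, b) with respect to x: (B1 - c1 * OD1) / c2."""
    n = len(x)
    mx, mb = sum(x) / n, sum(b) / n
    od1 = [v - mx for v in x]            # mean-centred estimated flows
    b1 = [v - mb for v in b]             # mean-centred Bluetooth flows
    sxx = sum(v * v for v in od1)
    sbb = sum(v * v for v in b1)
    c1 = sum(p * q for p, q in zip(b1, od1)) / sxx
    c2 = sqrt(sbb * sxx)
    return [(bk - c1 * xk) / c2 for bk, xk in zip(b1, od1)]


def finite_difference(x, b, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for k in range(len(x)):
        up = list(x)
        down = list(x)
        up[k] += h
        down[k] -= h
        grad.append((pearson(up, b) - pearson(down, b)) / (2 * h))
    return grad


# Hypothetical estimated OD flows and B-OD flows, for illustration only.
od_est = [120.0, 40.0, 75.0, 10.0, 220.0]
bod = [30.0, 8.0, 22.0, 1.0, 50.0]

analytic = correlation_gradient(od_est, bod)
numeric = finite_difference(od_est, bod)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_gap)
```

A small gap between the two gradients indicates that the closed-form expression is the derivative of the correlation coefficient, which is the quantity the structural term seeks to increase.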
Function-10: Obj_Grad_subpathflows.m
function [Obj, Gradient, StrSP] = Obj_Grad_subpathflows(y_obs, y_est, PropMat, pathprop, Subpathflows_obs, Subpathflows_est, Objective)
Pathflowsdiff = Subpathflows_obs-mean(Subpathflows_obs);
Estpathflowsdiff = Subpathflows_est-mean(Subpathflows_est);
StrSP = corr2(Subpathflows_est, Subpathflows_obs);
c1 = sum(Pathflowsdiff.*Estpathflowsdiff)/sum(Estpathflowsdiff.^2);
c2 = sqrt(sum(Pathflowsdiff.^2)*sum(Estpathflowsdiff.^2));
Grad_Str_x = pathprop'*(Pathflowsdiff-c1*Estpathflowsdiff)/c2;
G1 = (y_est-y_obs)*(2-StrSP);
G2 = (2-StrSP)*PropMat';
G3 = Grad_Str_x*(y_est-y_obs)';
if Objective == 1 % Link flows
    Obj = 0.5*(sum(((y_est-y_obs).^2)));
    Gradient = (PropMat')*((y_est-y_obs));
    return
elseif Objective == 2 % Link flows deviations and structural deviation of subpath flows
    Obj = 0.5*(sum(((y_est-y_obs)*(2-StrSP)).^2));
    Gradient = (G2-G3)*(G1);
    return
end
end
------------------------------------------------------------------------------------------------------
Appendix E
Python script – Autorun.py
from __future__ import division
import sys, os
import sqlite3
from PyANGAimsun import *
from PyANGBasic import *
from PyANGConsole import *
from PyANGKernel import *
from PyMesoPlugin import *
IdRep = [486954]     # Simulation Replication ID
IdScenario = 479272  # Simulation Scenario ID
IdDemand = 479284    # Demand ID
def getExternalMatrices(model):
    matrix = []
    objType = model.getType("GKODMatrix")
    for types in model.getCatalog().getUsedSubTypesFromType(objType):
        for obj in types.itervalues():
            matrix.append(obj)
    matrix.sort()
    return matrix
def main(argv):
    for i in range(len(IdRep)):
        if len(argv) < 3:
            print "usage: %s ANG_FILE_NAME MATRIX_ID" % argv[0]
            return -1
        angFileName = argv[1]
        angAbsName = os.path.basename(angFileName)
        angName = os.path.splitext(angAbsName)[0]  # Motorway is [0] and .ang is [1]
        assignMatrixFileName = os.path.dirname(angFileName) + os.sep + angName
        detectorsFileName = argv[2]
        console = ANGConsole()
        if console.open(argv[1]):
            model = console.getModel()
            # Create a backup
            console.save(argv[1] + ".old")
            for matrix in getExternalMatrices(model):
                newPath = os.path.dirname(str(model.getDocumentFileName())) + "\matrix.txt"
                matrix.setLocation(newPath)
                matrix.restoreExternalMatrix()
            plugin = GKSystem.getSystem().getPlugin("GGetram")
            scenario = model.getCatalog().find(IdScenario)
            demand = model.getCatalog().find(IdDemand)
            simulator = plugin.getCreateSimulator(model)
            replication = model.getCatalog().find(IdRep[i])
            if simulator.isBusy() == False:
                print "usage: %s simulator.isBusy() == False"
                # sections_det code is to get only those sections with detectors installed
                sections_det = list()
                file = open(detectorsFileName, 'r')
                if file != None:
                    for line in file.readlines():
                        idDetector = line.split(";")
                        det = model.getCatalog().find(int(idDetector[0]))
                        if det != None:
                            sections_det.append(det.getBottomObject())
                # links code is to get all sections in the network
                links = list()
                linkType = model.getType("GKSection")
                for segs in model.getCatalog().getUsedSubTypesFromType(linkType):
                    for lk in segs.itervalues():
                        links.append(lk)
                if replication.getExperiment().getSimulatorEngine() == GKExperiment.eMicro:
                    simulator.addSimulationTask(GKSimulationTask(replication, GKReplication.eBatch))
                    # 'turnings' is not defined in this listing; it has to be built before this call
                    simulator.setGatherProportions(True, assignMatrixFileName + '.matrix', sections_det, turnings, 0)
                    simulator.simulate()
            console.close()
        else:
            console.getLog().addError("Cannot load the network")
            print "cannot load network"
if __name__ == "__main__":
    sys.exit(main(sys.argv))