Efficient Algorithms for Large-Scale GIS Applications
![Page 1: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/1.jpg)
Efficient Algorithms for
Large-Scale GIS Applications
Laura Toma
Duke University
![Page 2: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/2.jpg)
Why GIS?
How it all started...
• Duke Environmental researchers:
  • computing flow accumulation for Appalachian Mountains took 14 days (with 512MB memory)
  – 800km x 800km at 100m resolution ~64 million points
GIS (Geographic Information Systems)
• System that handles spatial data
  • Visualization, processing, queries, analysis
• Indispensable tool
  • Modeling, analysis, prediction, decision making
• Rich area of problems for Computer Science
  • Graphics, graph theory, computational geometry, etc.
![Page 3: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/3.jpg)
GIS and the Environment
Monitoring: keep an eye on the state of earth systems using satellites and monitoring stations (water, ecosystems, urban development)
Modeling, simulation: predict consequences of human actions and natural processes
Analysis and risk assessment: find the problem areas and analyse the possible causes (soil erosion, groundwater pollution, traffic jams…)
Planning and decision support: provide information and tools for better management of natural and socio-economic resources
![Page 4: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/4.jpg)
Precipitation in Tropical South America
Lots of rain
Dry
H. Mitasova
![Page 5: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/5.jpg)
Nitrogen in Chesapeake Bay
High nitrogen concentrations
H. Mitasova
![Page 6: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/6.jpg)
Jockey’s Ridge evolution
Combining IR-DOQQ, LIDAR and RTK GPS to assess the change: decreasing elevation, extending towards homes and a road
[Figure: map with locations A, B, C and north marker N]
H. Mitasova
![Page 7: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/7.jpg)
Bald Head Island Renourishment
1998: LIDAR shoreline 1998
2000: LIDAR shoreline 2000
2001, Dec.: RTK GPS shoreline
surface is 1998 LIDAR
H. Mitasova
![Page 8: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/8.jpg)
Sediment flow
H. Mitasova
![Page 9: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/9.jpg)
Computations on Terrains
Reality: Height of terrain is a continuous function of two variables f(x,y)
• Estimate, predict, simulate
  • Flooding, pollution
  • Erosion, deposition
  • Vegetation structure
  • ….
GIS: DEM (Digital Elevation Model) is a set of sample points and their heights {x, y, h_xy}
• Compute indices
![Page 10: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/10.jpg)
DEM Representations
[Figure: the same sample heights (3 2 4 / 7 5 8 / 7 1 9) shown in four representations]
• Sample points
• Grid
• TIN
• Contour lines
![Page 11: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/11.jpg)
Panama DEM
![Page 12: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/12.jpg)
Modeling Flow on Terrains
What happens when it rains?
• Predict areas susceptible to floods.
• Predict location of streams.
• Compute watersheds.
Flow is modeled using two basic attributes
• Flow Direction (FD)
  • The direction water flows at a point
• Flow Accumulation (FA)
  • Total amount of water that flows through a point (if water is distributed according to the flow directions)
![Page 13: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/13.jpg)
Panama DEM - Flow Accumulation
![Page 14: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/14.jpg)
![Page 15: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/15.jpg)
![Page 16: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/16.jpg)
Uses
Flow direction and flow accumulation are used for:
Computing other hydrological attributes
• river network
• moisture indices
• watersheds and watershed divides
Analysis and prediction of sediment and pollutant movement in landscapes.
Decision support in land management, flood and pollution prevention and disaster management
![Page 17: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/17.jpg)
Massive Terrain Data
Remote sensing technology
• Massive amounts of terrain data
• Higher resolutions (1km, 100m, 30m, 10m, 1m,…)
NASA-SRTM
• Mission launched in 2000
• Acquired data for 80% of earth at 30m resolution
• 5TB
USGS
• Most of US at 10m resolution
LIDAR
• 1m res
![Page 18: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/18.jpg)
Example: LIDAR Terrain Data
Massive (irregular) point sets (1-10m resolution) Relatively cheap and easy to collect
Example: Jockey’s ridge (NC coast)
![Page 19: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/19.jpg)
It’s Growing!
Appalachian Mountains
Area is approx. 800 km x 800 km
Sampled at:
• 100m resolution: 64 million points (128MB)
• 30m resolution: 640 million points (1.2GB)
• 10m resolution: 6400 million = 6.4 billion points (12GB)
• 1m resolution: 640 billion points (1.2TB)
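The growth in the list above is plain arithmetic, and a short sketch makes it easy to redo for other areas. The function names are illustrative, and the 2 bytes per sample is an assumption inferred from the slide's "64 million points (128MB)". The exact 30m count comes out near 711 million, a little above the slide's rounded figure.

```python
def grid_points(side_km: float, resolution_m: float) -> int:
    """Samples in a square grid covering side_km x side_km at the given spacing."""
    cells_per_side = round(side_km * 1000 / resolution_m)
    return cells_per_side ** 2


def raw_size_bytes(points: int, bytes_per_point: int = 2) -> int:
    """Raw storage, assuming a fixed number of bytes per elevation sample."""
    return points * bytes_per_point


if __name__ == "__main__":
    for res in (100, 30, 10, 1):
        n = grid_points(800, res)
        print(f"{res:>4} m: {n:,} points, {raw_size_bytes(n) / 1e9:.2f} GB")
```

Every tenfold increase in resolution multiplies the point count by one hundred, which is why 1m data lands in the terabyte range.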
![Page 20: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/20.jpg)
Computing on Massive Data
GRASS (open source GIS)
• Killed after running for 17 days on a 6700 x 4300 grid (approx 50 MB dataset)
TARDEM (research, U. Utah)
• Killed after running for 20 days on a 12000 x 10000 grid (approx 240 MB dataset)
• CPU utilization 5%, 3GB swap file
ArcInfo (ESRI, commercial GIS)
• Can handle the 240MB dataset
• Doesn’t work for datasets bigger than 2GB
![Page 21: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/21.jpg)
Outline
Introduction
Flow direction and flow accumulation
• Definitions, assumptions, algorithm outline
Scalability to large terrains
• Why not?
I/O-efficient algorithms
• I/O-efficient flow accumulation
• TerraFlow
Theoretical results
Conclusion
![Page 22: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/22.jpg)
Flow Direction (FD) on Grids
Water flows downhill
• follows the gradient
On grids: Approximated using 3x3 neighborhood
• SFD (Single-Flow Direction):
  • FD points to the steepest downslope neighbor
• MFD (Multiple-Flow Direction):
  • FD points to all downslope neighbors
![Page 23: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/23.jpg)
Flow accumulation with MFD
![Page 24: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/24.jpg)
Flow accumulation with SFD
![Page 25: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/25.jpg)
Computing FD
Goal: compute FD for every cell in the grid (FD grid)
Algorithm:
• For each cell compute SFD/MFD by inspecting 8 neighbor cells
Analysis: O(N) time for a grid of N cells
Is this all?
• NO! Flat areas: plateaus and sinks
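A minimal sketch of the per-cell step on slopes, ignoring flat areas. It assumes the grid is a list of rows, and it measures steepness as height drop divided by distance (1 for axis neighbors, sqrt(2) for diagonals). That slope definition and the function names are assumptions for illustration, not taken from the slides.

```python
import math

# Offsets of the 8 neighbors in the 3x3 window around a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def downslope_neighbors(grid, r, c):
    """Return [(slope, (row, col)), ...] for all strictly lower neighbors."""
    rows, cols = len(grid), len(grid[0])
    found = []
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] < grid[r][c]:
            slope = (grid[r][c] - grid[nr][nc]) / math.hypot(dr, dc)
            found.append((slope, (nr, nc)))
    return found


def sfd(grid, r, c):
    """Single-flow direction: the steepest downslope neighbor, or None."""
    down = downslope_neighbors(grid, r, c)
    return max(down)[1] if down else None


def mfd(grid, r, c):
    """Multiple-flow direction: every downslope neighbor."""
    return [cell for _, cell in downslope_neighbors(grid, r, c)]
```

A cell for which `sfd` returns `None` has no lower neighbor, which is exactly the flat-area case (plateau or sink) that needs the extra handling.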
![Page 26: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/26.jpg)
FD on Flat Areas
…no obvious flow direction
Plateaus
• Assign flow directions such that each cell flows towards the nearest spill point of the plateau
Sinks
• Either catch the water inside the sink
• Or route the water outside the sink using uphill flow directions
  • Model steady state of water and remove (fill) sinks by simulating flooding: uniformly pouring water on terrain until steady state is reached
  • Assign uphill flow directions on the original terrain by assigning downhill flow directions on the flooded terrain
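One way to compute a flooded terrain in memory is a priority-flood pass that grows inward from the grid edge. This is a swapped-in, commonly used technique, not necessarily the method behind these slides. It assumes water can drain off any edge cell of the grid, and the function name is illustrative.

```python
import heapq


def fill_sinks(grid):
    """Raise every cell to the lowest level from which water can reach the grid edge."""
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    heap = []
    # Assumption: every edge cell is an outlet, so the flood starts from the edge.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (grid[r][c], r, c))
                seen[r][c] = True
    while heap:
        level, r, c = heapq.heappop(heap)  # lowest water level reached so far
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    # A lower neighbor is inside a sink: flood it up to this level.
                    filled[nr][nc] = max(grid[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

Filled sinks become flat areas at the spill elevation, so flow directions on them can then be assigned as for plateaus.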
![Page 27: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/27.jpg)
Flow Accumulation (FA) on Grids
FA models water flow through each cell with “uniform rain”
• Initially one unit of water in each cell
• Water distributed from each cell to neighbors pointed to by its FD
  • Flow conservation: if several FD, distribute proportionally to height difference
• Flow accumulation of cell is total flow through it
Goal: compute FA for every cell in the grid (FA grid)
![Page 28: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/28.jpg)
Computing FA
FD graph:
• node for each cell
• (directed) edge from cell a to b if FD of a points to b
FD graph must be acyclic
• ok on slopes, be careful on plateaus
FD graph depends on the FD method used
• SFD graph: a tree (or a set of trees)
• MFD graph: a DAG (or a set of DAGs)
![Page 29: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/29.jpg)
Computing FA: Plane Sweeping
Input: flow direction grid FD
Output: flow accumulation grid FA (initialized to 1)
Process cells in topological order. For each cell:
• Read its flow from FA grid and its direction from FD grid
• Update flow for downslope neighbors (all neighbors pointed to by cell flow direction)
Correctness
• One sweep enough
Analysis
• O(sort(N)) + O(N) time for a grid of N cells
Note: Topological order means decreasing height order (since water flows downhill).
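The sweep can be sketched in a few lines for the in-memory case. This version uses MFD with flow split proportionally to height difference, and assumes a terrain without flat areas (only strictly lower neighbors receive flow). The function name is illustrative.

```python
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_accumulation(grid):
    """Plane sweep: visit cells by decreasing height and push flow downslope."""
    rows, cols = len(grid), len(grid[0])
    fa = [[1.0] * cols for _ in range(rows)]  # one unit of rain per cell
    order = sorted(
        ((grid[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,  # topological order = decreasing height
    )
    for height, r, c in order:
        lower = [
            (height - grid[r + dr][c + dc], r + dr, c + dc)
            for dr, dc in NEIGHBORS
            if 0 <= r + dr < rows and 0 <= c + dc < cols and grid[r + dr][c + dc] < height
        ]
        total_drop = sum(drop for drop, _, _ in lower)
        for drop, nr, nc in lower:
            # MFD: split the cell's flow proportionally to height difference.
            fa[nr][nc] += fa[r][c] * drop / total_drop
    return fa
```

One sweep suffices because a cell's flow is final by the time it is visited: everything that can flow into it is higher and was processed earlier.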
![Page 30: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/30.jpg)
Scalability Problem
We can compute FD and FA using simple O(N)-time algorithms
...but what happens for large data sets?
[Figure: plot against dataset size (log scale)]
![Page 31: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/31.jpg)
Scalability Problem: Why?
Most (GIS) programs assume data fits in memory
• minimize only CPU computation
But... massive data does not fit in main memory!
• OS places data on disk and moves data between memory and disk as needed
Disk systems try to amortize large access time by transferring large contiguous blocks of data
When processing massive data, disk I/O is the bottleneck, rather than CPU time!
[Figure: disk drive showing track, magnetic surface, read/write arm, read/write head]
![Page 32: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/32.jpg)
Disks are Slow
“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)
![Page 33: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/33.jpg)
Scalability to Large Data
Example: reading an array from disk
• Array size N = 10 elements
• Disk block size = 2 elements
• Memory size = 4 elements (2 blocks)
Algorithm 1: 1 5 2 6 3 8 9 4 7 10 (loads 10 blocks)
Algorithm 2: 1 2 10 9 5 6 3 4 8 7 (loads 5 blocks)
N blocks >> N/B blocks
Block size is large (32KB, 64KB), so N >> N/B
• N = 256 x 10^6, B = 8000, 1ms disk access time
• N I/Os take 256 x 10^3 sec = 4266 min = 71 hr
• N/B I/Os take 256/8 sec = 32 sec
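The two running times follow directly from the slide's figures. The helper name below is illustrative, and the model charges a flat 1ms per I/O with no overlap or caching.

```python
def io_time_seconds(num_ios: int, access_ms: float = 1.0) -> float:
    """Total time if every I/O costs one disk access of access_ms milliseconds."""
    return num_ios * access_ms / 1000.0


N = 256 * 10**6  # elements
B = 8000         # elements per disk block

unblocked_seconds = io_time_seconds(N)     # one I/O per element
blocked_seconds = io_time_seconds(N // B)  # one I/O per block

if __name__ == "__main__":
    print(f"N I/Os:   {unblocked_seconds / 3600:.0f} hours")
    print(f"N/B I/Os: {blocked_seconds:.0f} seconds")
```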
![Page 34: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/34.jpg)
I/O model
I/O-operation
• Read/write one block of data from/to disk
I/O-complexity
• Number of I/O-operations (I/Os) performed by the algorithm
External memory or I/O-efficient algorithms:
• Minimize I/O-complexity
RAM model
CPU-operation
CPU-complexity
• Number of CPU-operations performed by the algorithm
Internal memory algorithms:
• Minimize CPU-complexity
![Page 35: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/35.jpg)
I/O-Efficient Algorithms
O(N) I/Os is bad!!
• Improve to O(N/B) I/Os (if possible)
Minimize the number of blocks transferred between main memory and disk
• Compute on whole block while it is in memory
• Avoid loading a block each time
• Use techniques from PRAM algorithms
![Page 36: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/36.jpg)
Sorting
Mergesort illustrates often used features:
• Main memory sized chunks (for N/M runs)
• Multi-way merge (repeatedly merge M/B of them)
$O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os
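The two features can be seen in a small in-memory simulation, where Python lists stand in for files on disk. The function name and the returned pass count are illustrative, and the sketch performs no real I/O.

```python
import heapq


def external_merge_sort(data, M, B):
    """Simulate external mergesort; return (sorted list, number of merge passes)."""
    fan_in = max(2, M // B)  # how many runs can be merged at once
    # Run formation: sort memory-sized chunks, giving about N/M sorted runs.
    runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
    passes = 0
    while len(runs) > 1:
        # Multi-way merge: combine up to M/B runs at a time into longer runs.
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

Each pass reads and writes all the data once, which costs O(N/B) I/Os, and the number of runs shrinks by a factor of M/B per pass, which gives the logarithm with base M/B in the bound.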
![Page 37: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/37.jpg)
Computing FA: I/O-Analysis
Algorithm: O(N) time
Process (sweep) cells in topological order. For each cell:
• Read flow from FA grid and direction from FD grid
• Update flow in FA grid for downslope neighbors
Problem: cells of the same height are distributed over the terrain, so accesses to the FA grid and FD grid are scattered and touch O(N) blocks
![Page 38: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/38.jpg)
I/O-Efficient Flow Accumulation
Eliminating scattered accesses to FD grid
• Store FD grid in topological order
Eliminating scattered accesses to FA grid
• Obs: flow to neighbor cell is only needed when its time comes to be processed:
  • Topological rank = time when cell is processed = priority
  • Push flow by inserting flow increment in priority queue with priority equal to neighbor’s priority
  • Flow of cell obtained using DeleteMin operations
  • Note: Augment each cell with priority of 8 neighbors
  – Obs: Space (~9N) traded for I/O
• Turns O(N) grid accesses into O(N) priority queue operations
• Use I/O-efficient priority queue [A95,BK97]
  • Buffered B-tree with lazy updates
[ATV00]
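The priority-queue idea can be sketched in memory, with Python's `heapq` standing in for the I/O-efficient priority queue. The `rank` dictionary and the direct grid lookups stand in for augmenting each cell with its neighbors' priorities. Both are simplifications, and the terrain is assumed to have no flat areas.

```python
import heapq

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_accumulation_pq(grid):
    """Flow accumulation where flow reaches a cell only through a priority queue."""
    rows, cols = len(grid), len(grid[0])
    cells = sorted(
        ((grid[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    # Topological rank = time at which the cell is processed = its priority.
    rank = {(r, c): i for i, (_, r, c) in enumerate(cells)}
    pq, fa = [], {}
    for i, (height, r, c) in enumerate(cells):
        flow = 1.0  # the cell's own unit of rain
        while pq and pq[0][0] == i:  # DeleteMin every increment sent to this cell
            flow += heapq.heappop(pq)[1]
        fa[(r, c)] = flow
        lower = [
            (height - grid[r + dr][c + dc], (r + dr, c + dc))
            for dr, dc in NEIGHBORS
            if 0 <= r + dr < rows and 0 <= c + dc < cols and grid[r + dr][c + dc] < height
        ]
        total_drop = sum(drop for drop, _ in lower)
        for drop, cell in lower:
            # Push flow "forward in time" to the moment the neighbor is processed.
            heapq.heappush(pq, (rank[cell], flow * drop / total_drop))
    return fa
```

The FA grid is never written at a scattered location: each cell's value is complete when the sweep reaches it, so the output can be produced as a sequential stream.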
![Page 39: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/39.jpg)
![Page 40: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/40.jpg)
TerraFlow
TerraFlow is our suite of programs for flow routing and flow accumulation on massive grids [ATV’00, AC&al’02]
Flow routing and flow accumulation modeled as graph problems and solved in optimal I/O bounds
Efficient
• 2-1000 times faster on very large grids than existing software
Scalable
• 1 billion elements!! (>2GB data)
Flexible
• Allows multiple methods of flow modeling
http://www.cs.duke.edu/geo*/terraflow
![Page 41: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/41.jpg)
TerraFlow
Significant speedup over ArcInfo for large datasets
• East-Coast
  TerraFlow: 8.7 Hours
  ArcInfo: 78 Hours
• Washington state
  TerraFlow: 63 Hours
  ArcInfo: %
GRASS cannot handle the Hawaii dataset (killed after 17 days!)
[Figure: chart of Running Time (Hours), axis 0 to 90, for datasets Hawaii 56M, Cumberlands 80M, Lower NE 256M, East-Coast 491M, Midwest 561M, Washington 2G; series: TerraFlow 512, TerraFlow 128, ArcInfo 512, ArcInfo 128]
500 MHz Alpha, FreeBSD 4.0
![Page 42: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/42.jpg)
I/O-Model Parameters
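N = # elements in problem instance
B = # elements that fit in disk block
M = # elements that fit in main memory
[Figure: disk D connected by block I/O to main memory M and processor P]
Fundamental bounds:
• Sorting: sort(N) = $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$
• $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
In practice block and main memory sizes are big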
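Plugging in numbers shows how close sort(N) is to a plain scan once B and M are large. N and B below are the values used earlier in the talk, while M (64 million elements) is a hypothetical memory size chosen only for illustration. Constant factors hidden by the O-notation are ignored.

```python
import math


def scan_ios(n: int, b: int) -> int:
    """I/Os needed to read n elements sequentially in blocks of b elements."""
    return math.ceil(n / b)


def sort_ios(n: int, m: int, b: int) -> float:
    """The sorting bound (N/B) * log_{M/B}(N/B), without constant factors."""
    return (n / b) * math.log(n / b, m / b)


N, B = 256 * 10**6, 8000
M = 64 * 10**6  # hypothetical main memory size, in elements

if __name__ == "__main__":
    print(f"scan(N) = {scan_ios(N, B):,} I/Os")
    print(f"sort(N) = {sort_ios(N, M, B):,.0f} I/Os")
    print(f"N       = {N:,} I/Os")
```

With these parameters the logarithmic factor stays close to 1, so sorting costs only slightly more than scanning and far less than N individual I/Os.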
![Page 43: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/43.jpg)
![Page 44: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/44.jpg)
I/O-Efficient Graph Algorithms
Graph G=(V,E)
Basic graph (searching) problems
• BFS, DFS, SSSP, topological sorting
..are big open problems in the I/O-model!
• Standard internal memory algorithms: O(E) I/Os
• No I/O-efficient algorithms are known for any of these problems on general graphs!
• Lower bound Ω(sort(V)), best known Ω(V/sqrt(B))
O(sort(E)) algorithms for special classes of graphs
• Trees, grid graphs, bounded-treewidth graphs, outerplanar graphs, planar graphs
• Exploit existence of small separators or geometric structure
![Page 45: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/45.jpg)
SSSP on Grid Graphs [ATV’00]
Grid graph
• O(N) vertices, O(N) edges
Dijkstra’s algorithm: O(N) I/Os
Goal: compute shortest path δ(s,t) in O(sort(N)) I/Os
Lemma: The portion of δ(s,t) between intersection points with boundaries of subgrids is the shortest path within the subgrid
![Page 46: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/46.jpg)
SSSP on Grid Graphs [ATV’00]
Divide grid into subgrids of size B x B (assume M > B^2)
Replace each B x B subgrid with complete graph on boundary nodes
• Edge weight: shortest path between the two boundary vertices within the subgrid
Reduced graph G^R
• O(N/B) vertices, O(N) edges
Idea: Compute shortest paths locally in each subgrid, then compute the shortest way to combine them together
![Page 47: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/47.jpg)
SSSP on Grid Graphs [ATV’00]
Algorithm
1. Compute SSSP on G^R from s to all boundary vertices
2. Find SSSP from s to all interior vertices: for any subgrid σ, for any t in σ
$\delta(s,t) = \min_{v \in Bnd(\sigma)} \{\delta(s,v) + \delta_\sigma(v,t)\}$
Correctness:
• easy to show using Lemma
Analysis: O(sort(N)) I/Os
• Dijkstra’s algorithm using I/O-efficient priority queue and graph blocking
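The two steps can be checked on a small grid with an in-memory sketch. Several simplifications are assumptions made here and do not come from [ATV’00]:

- Subgrids are disjoint b x b blocks.
- A boundary vertex is any vertex with an edge leaving its block; the source is also treated as one.
- Moving between adjacent cells costs the sum of their values.
- Ordinary `heapq`-based Dijkstra replaces the I/O-efficient machinery.

```python
import heapq


def dijkstra(adj, source):
    """Textbook Dijkstra on an adjacency dict {u: [(v, weight), ...]}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def grid_adjacency(cost):
    """4-connected grid graph; an edge costs the sum of its two cell values."""
    rows, cols = len(cost), len(cost[0])
    return {
        (r, c): [
            ((r + dr, c + dc), cost[r][c] + cost[r + dr][c + dc])
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < rows and 0 <= c + dc < cols
        ]
        for r in range(rows)
        for c in range(cols)
    }


def sssp_by_subgrids(cost, source, b):
    """Shortest distances from source, computed through the reduced graph."""
    adj = grid_adjacency(cost)

    def block(v):
        return (v[0] // b, v[1] // b)

    members = {}
    for v in adj:
        members.setdefault(block(v), []).append(v)

    reduced, local, boundary = {}, {}, {}
    for blk, verts in members.items():
        # Edges that stay inside this subgrid.
        inside = {v: [(u, w) for u, w in adj[v] if block(u) == blk] for v in verts}
        boundary[blk] = [
            v for v in verts
            if v == source or any(block(u) != blk for u, _ in adj[v])
        ]
        for v in boundary[blk]:
            local[v] = dijkstra(inside, v)  # shortest paths within the subgrid
            crossing = [(u, w) for u, w in adj[v] if block(u) != blk]
            # Complete graph on the boundary, plus the original crossing edges.
            reduced[v] = [(u, local[v][u]) for u in boundary[blk] if u != v] + crossing

    dist_r = dijkstra(reduced, source)  # step 1: SSSP on the reduced graph
    return {
        t: min(dist_r[v] + local[v][t] for v in boundary[blk])  # step 2
        for blk, verts in members.items()
        for t in verts
    }
```

Comparing the result with `dijkstra(grid_adjacency(cost), source)` on the full grid is a convenient sanity check for the lemma: both should give the same distance for every vertex.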
![Page 48: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/48.jpg)
![Page 49: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/49.jpg)
Results on Planar graphs
Planar graph G with N vertices
Separators can be computed in O(sort(N)) I/Os
I/O-efficient reductions [ABT’00, AMTZ’01]
BFS, DFS, SSSP in O(sort(N)) I/Os
[Figure: reductions among ε-separators, BFS, SSSP and DFS; arrows labeled O(sort(N)) I/Os [AMTZ’01], O(sort(N)) I/Os [ABT’00], O(sort(N)) I/Os [ABT’00]]
![Page 50: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/50.jpg)
SSSP on Planar Graphs
Similar to grid graphs.
Assume M > B^2, bounded degree
Assume graph is separated
• O(N/B^2) subgraphs, O(B^2) vertices each, S = O(N/B) separators
![Page 51: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/51.jpg)
SSSP on Planar Graphs
Reduced graph G^R
• S = O(N/B) vertices
• O(N/B^2) x O(B^2) = O(N) edges
Compute SSSP on G^R
• Dijkstra’s algorithm and I/O-efficient priority queue
  • Each vertex is accessed once by its O(B) adjacent vertices: O(N) I/Os
• Use boundary sets
  • O(N/B^2) boundary sets, each accessed once by its O(B) adjacent vertices: O(N/B) I/Os
![Page 52: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/52.jpg)
On I/O-Efficient DFS
DFS upper bounds
• Internal memory algorithm: O(V+E) time, O(V+E) I/Os
• Best upper bound
  • O(V + E/B log V) I/Os on general graphs
DFS on general graphs is a big open problem
• Note: PRAM DFS is P-complete
DFS on planar graphs uses O(sort(N)) I/Os
• DFS to BFS reduction [AMTZ’01]
![Page 53: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/53.jpg)
DFS to BFS Reduction on Planar Graphs
Idea: Partition the faces of G into levels around a source face containing s and grow DFS level-by-level
Levels can be obtained from BFS in dual graph
Denote
• G_i = union of the boundaries of faces at level <= i
• T_i = DFS tree of G_i
• H_i = G_i \ G_{i-1}
Algorithm: Compute a spanning forest of H_i and attach it onto T_{i-1}
Structure of levels is simple
• The bicomps of the H_i are the boundary cycles of G_i
Gluing onto T_{i-1} is simple
• A spanning tree is a DFS tree if and only if it has no cross edges
![Page 54: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/54.jpg)
DFS to BFS Reduction on Planar Graphs
Idea: Partition the faces of G into levels around a source face containing s and grow DFS level-by-level
![Page 55: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/55.jpg)
Other Graphs Results
Grid graphs [ATV’00]
• MST, SSSP in O(sort(N)) I/Os
• CC in O(scan(N)) I/Os
Planar graphs [ABT’00, AMTZ’01]
• Planar reductions
• DFS
General graphs [ABT’00]
• MST in O(sort(N) log log N) I/Os
Planar directed graphs [submitted]
• Topological sorting and ear decomposition in O(sort(N)) I/Os
![Page 56: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/56.jpg)
..In Conclusion
I have tried to convince you of a few things:
Massive data is available and in order to process it scalable algorithms are necessary
I/O-efficient algorithms have applications “outside” computer science and have big potential for (interdisciplinary) collaboration
I/O-efficient algorithms are theory and practice put together and support educational efforts
Challenging, rewarding, fun!
![Page 57: Efficient Algorithms for Large-Scale GIS Applications](https://reader033.fdocuments.net/reader033/viewer/2022051216/56814e78550346895dbc10bb/html5/thumbnails/57.jpg)
Collaboration
Rewarding, good response
• Duke Nicholas School of the Environment
• NCSU Dept. of Marine, Earth and Atmospheric Sciences
• GRASS, ESRI
TerraFlow
• Incorporated in GRASS [AMT’02]
• Current work with U. Muenster [GE]
  • 2 MS students port TerraFlow to VisualC++ under Windows and make it ArcInfo extension
Extends projects and brings up new problems
• LIDAR data