Geometric Algorithms and Data Structures
© Neeraj Suri, EU-NSF ICT, March 2006
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
Prof. Neeraj Suri, Andreas Johansson, Constantin Sarbu, Abdelmajid Khelil
ICS-II - 2006, Lecture 14: Geometric Algorithms and Data Structures
Outline
Introduction
Geometric Data Structures
  Quadtree
    □ Region quadtree
    □ Point quadtree
  K-d tree
  Strip tree
  K-d trie
  Binary trie
Multidimensional Data
  Z-Order
  Multidimensional data
  Data mining
Geometric Problems (1)
Algorithmic geometry: the study of the algorithmic complexity of elementary geometric problems
Geometric problems are often abstract formulations of practical problems (similar to graph theory)
Some geometric problems and their interpretation:
  Given a set of points in the plane, find all the points within a rectangle
    □ "Clipping" in VR
    □ Find tuples in a database with values within given bounds for attributes A1 and A2
    □ Generalization to searching in a k-dimensional field (all points contained in a k-dimensional field)
Geometric Problems (2)
Given a set of rectangles in the plane, find all pairwise intersecting rectangles
  □ Correctness checking in Very Large Scale Integration (VLSI) design, with chip layers modeled as rectangles
Given a set of 3-dimensional objects (compounds), find all pairwise intersecting objects
  □ Ensuring the design-rule distance or the safety margin in CAD
Given a set of rectangles in the plane, find their intersection area
  □ Geographic Information Systems (GIS): approximation of generic forms through rectangles, determining areas with specific properties on distinct maps (e.g. find regions which are sandy (map 1), wet (map 2), and between 200 and 300 m altitude (elevation map))
Geometric Problems (3)
Given a set of polyhedrons in space, determine the edges or portions of edges that are visible or hidden from a viewpoint
  □ Computation of a realistic view of a 3-dimensional scene
  □ Determining the coverage area of a transmitter and the area with no reception
Given a set of points in a k-dimensional space and a query point P, find the point S closest to P
  □ Voice recognition: a spoken word is characterized by features and compared with the vocabulary (a point set in a k-dimensional space)
Classification of Geometric Problems
2 classes of problems:
  Set problems: compute a property of interest of a set of objects S
    □ E.g. the outline of the area covered by S
  Search problems: given a set of objects S and a query object q, find all objects in S that have a specific relation with q
Set problems are often reducible to search problems
  E.g. plane-sweep algorithms reduce a k-dimensional set problem to a (k-1)-dimensional search problem
Search problems are solved by organizing S with the aid of appropriate data structures and indexing
First Problem
How do we efficiently represent this figure?
Representing Figures (1)
How about a matrix representation?
Black = 1, empty = 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1
0 0 0 0 1 1 1 1
0 0 0 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 0 0
0 0 1 1 1 0 0 0
Not very effective
Representing Figures (2)
Idea: represent areas, not points
Now represent the areas using another structure
Quadtrees do this
[Figure: the image array showing only the cells with value 1]
Overview of Quadtrees
Quadtree is a generic term
Quadtree: a class of hierarchical data structures that are based on recursive decomposition of space
Differentiation is possible based on:
  Data type represented by the quadtree: point data, regions, curves, surfaces, and volumes
  Principle of decomposition: regular vs. input-driven
  Resolution: fixed vs. variable number of decomposition steps
Examples:
  Region quadtree
  Point quadtree
Literature: Samet, H.; "The Quadtree and Related Hierarchical Data Structures", ACM Comp. Surveys, Vol. 16, No. 2, June 1984 (available from ACM DL)
Region Quadtree
Successive subdivision of the image array into 4 equal-sized quadrants.
Basic idea:
  The figure is given as an image array, i.e. every pixel of the figure has a value of 1, all other pixels have a value of 0
  The entire area (image array) is subdivided into 4 equal-sized quadrants (the array usually has 2^k x 2^k pixels)
  Upon each division one has to check whether the image array of a quadrant is homogeneous (i.e. only 1s or only 0s)
    □ homogeneous: no further subdivision
    □ heterogeneous: further subdivisions until homogeneous (possibly single pixels)
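The recursive subdivision can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the lecture: the image is a square list of lists with a side length that is a power of two, leaves are the strings "BLACK" and "WHITE", and a GREY node is a 4-tuple of sons in the order (NW, NE, SW, SE). The function name and the small 4 x 4 test image are hypothetical.

```python
def build_region_quadtree(img, row=0, col=0, size=None):
    """Return "BLACK", "WHITE", or a 4-tuple (NW, NE, SW, SE) for a GREY node."""
    if size is None:
        size = len(img)
    first = img[row][col]
    # Homogeneous quadrant (only 1s or only 0s): no further subdivision.
    if all(img[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return "BLACK" if first == 1 else "WHITE"
    # Heterogeneous quadrant: split into 4 equal-sized quadrants and recurse.
    half = size // 2
    return (
        build_region_quadtree(img, row, col, half),                # NW
        build_region_quadtree(img, row, col + half, half),         # NE
        build_region_quadtree(img, row + half, col, half),         # SW
        build_region_quadtree(img, row + half, col + half, half),  # SE
    )


# Hypothetical 4 x 4 image (row 0 is the top row).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_region_quadtree(image)
print(tree)
```

Only the SE quadrant of this image is heterogeneous, so it is the only one that gets subdivided down to single pixels.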
Region Quadtree: Terminology
[Figure: a block divided into the quadrants NW, NE, SW and SE, with the compass directions N, E, S and W]
Region Quadtree: Terminology
[Figure: example image array (pixel values 0 and 1) and its quadtree, with sons ordered NW, NE, SW, SE and nodes labeled GREY, BLACK and WHITE]
Leaf nodes are said to be either BLACK or WHITE
Non-leaf nodes are said to be GREY
Region Quadtree: Example
[Figure: subdivision of the example image array in three steps (Step 1, Step 2, Step 3)]
Region Quadtree: Set Operations
Quadtrees are especially useful for performing set operations
  Overlap (intersection)
  Overlay (union)
Example: from data provided on forests, grassland, fields, nature reserves and polders, identify which areas are in agricultural use (typical overlay problem)
Overlays with Quadtrees: Example
Overlays with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2
  If the node in QT1 is BLACK, then the corresponding node in the resulting quadtree is also BLACK
  If the node in QT1 is WHITE, then the node in the resulting quadtree is set to the node in QT2
  If the node in QT1 is GREY, then set the node in the resulting quadtree to
    □ GREY if QT2 is GREY
    □ GREY if QT2 is WHITE
    □ BLACK if QT2 is BLACK
  If both nodes are GREY, the algorithm first processes the next level and then returns to this node to merge its sons if necessary
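The overlay rules can be sketched in Python as follows. The node encoding is an assumption made for illustration and not part of the lecture: leaves are the strings "B" (BLACK) and "W" (WHITE), and a GREY node is a 4-tuple of sons (NW, NE, SW, SE). The function name `union` is hypothetical.

```python
BLACK, WHITE = "B", "W"


def union(a, b):
    """Overlay (union) of two region quadtrees covering the same area."""
    # BLACK in either tree makes the result BLACK.
    if a == BLACK or b == BLACK:
        return BLACK
    # A WHITE node contributes nothing: copy the other node (leaf or subtree).
    if a == WHITE:
        return b
    if b == WHITE:
        return a
    # Both nodes are GREY: process the next level first ...
    sons = tuple(union(x, y) for x, y in zip(a, b))
    # ... then check whether all 4 sons are BLACK and merge them if so.
    if all(s == BLACK for s in sons):
        return BLACK
    return sons


qt1 = (BLACK, WHITE, WHITE, WHITE)
qt2 = (WHITE, WHITE, WHITE, (BLACK, WHITE, WHITE, WHITE))
overlay = union(qt1, qt2)
print(overlay)
```

The merge check after the recursive calls corresponds to the consolidation step for two GREY nodes.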
Overlays with Quadtrees: Algorithm (2)
Decision Table:
  QT1     QT2     Result
  BLACK   x       BLACK
  WHITE   x       x
  GREY    GREY    GREY 1)
1) A check for a merger needs to be performed to determine if all 4 sons are BLACK.
Intersection with Quadtrees (Example)
Intersection with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2
  If the node in QT1 is BLACK and the node in QT2 is BLACK, then set the corresponding node in the resulting quadtree to BLACK
  If the node in QT1 or QT2 is WHITE, then the resulting node is WHITE
  If the node in QT1 is GREY, then set the node to
    □ GREY if QT2 is also GREY
    □ WHITE if QT2 is WHITE
    □ GREY if QT2 is BLACK
  If both nodes are GREY, the algorithm first processes the next level and then returns to this node to merge its sons if necessary
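A Python sketch of the intersection rules, mirroring the overlay case. As an assumption for illustration only, leaves are the strings "B" (BLACK) and "W" (WHITE) and a GREY node is a 4-tuple of sons (NW, NE, SW, SE); the function name `intersection` is hypothetical.

```python
BLACK, WHITE = "B", "W"


def intersection(a, b):
    """Intersection of two region quadtrees covering the same area."""
    # WHITE in either tree makes the result WHITE.
    if a == WHITE or b == WHITE:
        return WHITE
    # A BLACK node does not restrict the other tree: copy the other node.
    if a == BLACK:
        return b
    if b == BLACK:
        return a
    # Both nodes are GREY: process the next level first ...
    sons = tuple(intersection(x, y) for x, y in zip(a, b))
    # ... then check whether all 4 sons are WHITE and merge them if so.
    if all(s == WHITE for s in sons):
        return WHITE
    return sons


qt1 = (BLACK, BLACK, WHITE, WHITE)
qt2 = (BLACK, WHITE, WHITE, BLACK)
common = intersection(qt1, qt2)
print(common)
```

Compared with the overlay, the roles of BLACK and WHITE are swapped, including in the final merge check.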
Intersection with Quadtrees: Algorithm (2)
Decision Table:
  QT1     QT2     Result
  WHITE   x       WHITE
  BLACK   x       x
  GREY    GREY    GREY 1)
1) A check for a merger needs to be performed to determine if all 4 sons are WHITE.
Complexity Analysis
Complexity is proportional to the number of nodes in the quadtree
  Best case: whole area unicolored (1 node)
  Worst case: "salt and pepper", i.e. all inner nodes are GREY and one needs to go down to pixel level (depends on the resolution)
Point-Quadtree: Definition
Point data
2-D points can be stored and indexed in a point-quadtree
A point-quadtree splits the space into 4 quadrants at the insertion point
The insertion order is thus important (it determines the structure of the tree)
Point-Quadtree (Example)
Space: (0,0) to (100,100)
  City      (x,y)
  Chicago   (35,40)
  Mobile    (50,10)
  Toronto   (60,75)
  Buffalo   (80,65)
  Denver    (5,45)
  Omaha     (25,35)
  Atlanta   (85,15)
  Miami     (90,5)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
Point-Quadtree (Example)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
[Figure: the cities in the plane and the resulting tree. Chicago is the root with the sons Denver, Toronto, Omaha and Mobile; Buffalo is a son of Toronto; Atlanta and Miami are sons of Mobile]
Point-Quadtree (Search Example)
"Find all points (records) within a given distance from another point (record)"
Find all the cities at most 8 units from the point (83,10)
[Figure: the cities in the plane with the query circle around (83,10), and the point-quadtree]
Point-Quadtree (Search Example)
Find all the cities at most 8 units from the point (83,10)
The root is Chicago (35,40): NW, NE and SW can be ignored
Next is Mobile (50,10): NW and SW can be ignored
Are Atlanta or Miami within 8 units?
  Solutions based on approximation with a rectangle (bounding box) can contain false positives
  Exact solution with a circle
  Atlanta is at distance √29 ≈ 5.4 (inside the circle); Miami is at distance √74 ≈ 8.6 (inside the bounding box, but outside the circle)
Search in Point-Quadtrees
Especially suitable for search problems of the following type: "find all points (records) within a given distance from another point (record)"
Point quadtrees are quite efficient for 2 dimensions. In k > 2 dimensions, however, point quadtrees have a large branching factor and thus contain many NULL pointers
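Insertion and the distance search from the city example can be sketched in Python as follows. Several details are assumptions for illustration and not stated in the lecture: points on a split line go to the north or east side, sons are kept in a dictionary keyed by "NW", "NE", "SW", "SE", and the names `Node`, `insert` and `within` are hypothetical. Quadrants are pruned with the bounding box of the query circle; the exact test uses the Euclidean distance.

```python
import math


class Node:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.sons = {}  # keys: "NW", "NE", "SW", "SE"


def quadrant(node, x, y):
    # Assumed tie rule: points on a split line belong to the N / E side.
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")


def insert(root, name, x, y):
    if root is None:
        return Node(name, x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.sons:
            node.sons[q] = Node(name, x, y)
            return root
        node = node.sons[q]


def within(root, qx, qy, r):
    """Names of all points at most r away from (qx, qy)."""
    found = []

    def visit(node):
        if math.hypot(node.x - qx, node.y - qy) <= r:  # exact test (circle)
            found.append(node.name)
        for q, son in node.sons.items():
            # Skip quadrants that the bounding box of the circle cannot touch.
            if q[0] == "N" and qy + r < node.y:
                continue
            if q[0] == "S" and qy - r >= node.y:
                continue
            if q[1] == "E" and qx + r < node.x:
                continue
            if q[1] == "W" and qx - r >= node.x:
                continue
            visit(son)

    if root is not None:
        visit(root)
    return found


cities = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]
root = None
for name, x, y in cities:
    root = insert(root, name, x, y)

print(within(root, 83, 10, 8))
```

With the insertion order of the example, only the SE subtree of Chicago and the two eastern sons of Mobile are visited; Miami passes the bounding-box pruning but fails the exact distance test.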
K-d Trees
k-dimensional point data
We want to avoid the large fan-out of point quadtrees
  Quadtrees (2^2 = 4-way split)
  Octrees (2^3 = 8-way split)
  In general: 2^k-way split
A k-d tree is a binary search tree with the distinction that at each level a different coordinate (dimension) is tested to determine the direction of the branch
  2-way split
  A node consists of
    □ 2 child pointers
    □ Name
    □ Key
K-d Tree: Basic Idea
Construct a binary tree
At each step, choose one of the coordinates as the basis for dividing the rest of the points
For example, at the root, choose x as the basis
  □ Like in binary search trees, all items to the left of the root will have an x-coordinate less than that of the root
  □ All items to the right of the root will have an x-coordinate greater than (or equal to) that of the root
Choose y as the basis for discrimination for the root's children
Choose x again for the root's grandchildren
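The insertion rule above can be sketched in Python. The class and function names are hypothetical, and the city coordinates are the ones used in the point-quadtree example of this lecture. The discriminator is chosen as depth modulo k, so it alternates x, y, x, ... for 2-D points.

```python
class KdNode:
    def __init__(self, name, point):
        self.name = name
        self.point = point  # the key: a tuple of k coordinates
        self.left = None
        self.right = None


def insert(node, name, point, depth=0):
    """Insert a point and return the (possibly new) root of the subtree."""
    if node is None:
        return KdNode(name, point)
    axis = depth % len(point)  # alternate the discriminator per level
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, name, point, depth + 1)
    else:  # greater than or equal goes to the right
        node.right = insert(node.right, name, point, depth + 1)
    return node


cities = [("Chicago", (35, 40)), ("Mobile", (50, 10)), ("Toronto", (60, 75)),
          ("Buffalo", (80, 65)), ("Denver", (5, 45)), ("Omaha", (25, 35)),
          ("Atlanta", (85, 15)), ("Miami", (90, 5))]
root = None
for name, point in cities:
    root = insert(root, name, point)

print(root.name, root.left.name, root.right.name)
```

As with the point quadtree, the shape of the tree depends on the insertion order, but every node has at most 2 child pointers.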
K-d Tree: Example
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
[Figure: the cities in the plane and the resulting k-d tree. The discriminator alternates from level to level: Chicago (x), Mobile (y), Toronto (x), Buffalo (y), Atlanta (x). Points with x < x_Chicago go to the left of Chicago, points with x ≥ x_Chicago to the right; points with y < y_Mobile go to the left of Mobile, points with y ≥ y_Mobile to the right]
Fewer NULL pointers!
Adaptive k-d Tree
Like the k-d tree, but
  Division is between (not on) data points
  Division is not by alternating the discriminator, but according to the dimension with the maximum spread (max - min)
  Balanced k-d tree
  Internal nodes contain only the split coordinate and its value (e.g. X=30)
  The records are stored at the terminal nodes (leaves)
  Insertion of one record requires rebuilding the tree (static structure)
  Deletion of one record is highly complex
  Search is like in the k-d tree
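A possible way to build such a tree in Python is sketched below. Some choices are assumptions not fixed by the lecture: the split value is the midpoint between the two middle points along the chosen dimension, all coordinates are distinct, and the tree is encoded as nested tuples (split value, discriminator, low subtree, high subtree) with city names as leaves. The function name is hypothetical and the discriminator label works for k = 2 only.

```python
def build_adaptive_kd(points):
    """points: list of (name, (x, y)); returns a leaf name or a split tuple."""
    if len(points) == 1:
        return points[0][0]  # records are stored at the leaves
    k = len(points[0][1])

    def spread(axis):
        values = [p[1][axis] for p in points]
        return max(values) - min(values)

    axis = max(range(k), key=spread)  # dimension with the maximum spread
    ordered = sorted(points, key=lambda p: p[1][axis])
    mid = len(ordered) // 2
    # Split between (not on) the two middle data points.
    split = (ordered[mid - 1][1][axis] + ordered[mid][1][axis]) / 2
    return (split, "xy"[axis],
            build_adaptive_kd(ordered[:mid]),
            build_adaptive_kd(ordered[mid:]))


cities = [("Chicago", (35, 40)), ("Mobile", (50, 10)), ("Toronto", (60, 75)),
          ("Buffalo", (80, 65)), ("Denver", (5, 45)), ("Omaha", (25, 35)),
          ("Atlanta", (85, 15)), ("Miami", (90, 5))]
tree = build_adaptive_kd(cities)
print(tree[0], tree[1])
```

For the eight cities of the running example this reproduces the split values 55 (x), 30 (x), 40 (y), 15 (x), 25 (y), 10 (y) and 70 (x). Because the whole point set is needed to choose each split, inserting one record means rebuilding the tree.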
Example: adaptive k-d tree (k=2)
[Figure: the eight cities in the plane, (0,0) to (100,100), with the split lines]
Tree (split value, discriminator):
55,x
  30,x
    15,x: Denver (5,45), Omaha (25,35)
    25,y: Mobile (50,10), Chicago (35,40)
  40,y
    10,y: Miami (90,5), Atlanta (85,15)
    70,x: Toronto (60,75), Buffalo (80,65)
Comparison
Region quadtree: parallelizable
Point quadtree: parallelizable, dynamic
K-d tree: not easily parallelizable, dynamic, better sequential data structure
Adaptive k-d tree: not easily parallelizable, static, balanced, optimized search
Curvilinear Data: Strip Tree (Example)
Basic idea: Represent the curve by strips enclosing portions of it
[Figure: a curve from P to Q enclosed by the root strip A, with the strip widths Wl and Wr on the two sides of line PQ; the strip tree with root A and the strips B, C, D and E; the splitting points for A and for C are marked on the curve]
The splitting point for A is selected on the side with width Wl, since Wl > Wr
• Strips become successively thinner
• The splitting finishes when all strips are thinner than a predefined value
Strip Tree: Algorithm
Recursive splitting
  Join the endpoints of the curve (i.e. P and Q)
  The root corresponds to a rectangle enclosing the curve and whose sides are parallel to line PQ
  The next split point
    □ Lies on the curve and on one side of the strip rectangle
    □ Has maximum distance to line PQ
Node structure
  The node is an 8-tuple and contains
    □ 2 pairs of X,Y coordinates (the diagonal endpoints)
    □ The strip width on each side of the line connecting the endpoints
    □ Pointers to the 2 sons
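The recursive splitting can be sketched in Python for a curve given as a polyline. This is an illustration under assumptions that go beyond the slide: the curve is a list of at least two (x, y) points whose first and last points differ, a node is a dictionary instead of an 8-tuple, only the strip widths are stored (not the rectangle corners), and the names `build_strip_tree` and `min_width` are hypothetical.

```python
import math


def build_strip_tree(pts, min_width=0.0):
    """pts: points along the curve, pts[0] = P and pts[-1] = Q, with P != Q."""
    (px, py), (qx, qy) = pts[0], pts[-1]
    dx, dy = qx - px, qy - py
    length = math.hypot(dx, dy)
    # Signed distance of every point to line PQ (positive: left of PQ).
    dist = [(dx * (y - py) - dy * (x - px)) / length for x, y in pts]
    node = {
        "p": pts[0], "q": pts[-1],
        "w_left": max(0.0, max(dist)),    # strip width left of PQ
        "w_right": max(0.0, -min(dist)),  # strip width right of PQ
        "sons": None,
    }
    # Split while the strip is still wider than the predefined value.
    if len(pts) > 2 and node["w_left"] + node["w_right"] > min_width:
        # Split point: the curve point with maximum distance to line PQ.
        split = max(range(1, len(pts) - 1), key=lambda i: abs(dist[i]))
        node["sons"] = (build_strip_tree(pts[:split + 1], min_width),
                        build_strip_tree(pts[split:], min_width))
    return node


curve = [(0, 0), (1, 1), (2, 0)]
root = build_strip_tree(curve)
print(root["w_left"], root["w_right"])
```

In this small example the root strip has width 1 on the left of PQ and 0 on the right; both sons are straight segments with zero width, so the splitting stops there.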
Representation of Arbitrary Curves
Curves are well represented by chains, however indexing them is difficult
A strip-tree is a quadtree variant for representing arbitrary curves by hierarchical decomposition
Useful in applications that involve search and set operations
Trees and Tries
We have seen (normal) trees for storing figures
We can also use tries!
  Tries store the key "along the way"
Kd-Tries: Example
[Figure: kd-trie whose levels alternate between the X dimension (L: left, R: right) and the Y dimension (D: down, U: up)]
• The key is stored along the path from the root, e.g. "RDRU"
• The complete keys are located at the leaves
Binary Tries
A binary trie is a binary tree, whereby left sons correspond to a "0" at the corresponding position in the key, and right sons correspond to a "1"
[Figure: binary trie with branches labeled 0 and 1; example key 100101]
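A binary trie can be sketched in Python with nested dictionaries. The encoding is an assumption for illustration: each node is a dictionary with the optional entries "0" (left son) and "1" (right son), and the marker "$" (a hypothetical choice) holds the complete key at the leaf.

```python
def insert(trie, key):
    """Insert a key given as a string of '0' and '1' characters."""
    node = trie
    for bit in key:
        node = node.setdefault(bit, {})  # "0": left son, "1": right son
    node["$"] = key  # the complete key is located at the leaf


def contains(trie, key):
    node = trie
    for bit in key:
        if bit not in node:
            return False
        node = node[bit]
    return "$" in node


keys = ["100101", "000111", "100000"]
trie_a, trie_b = {}, {}
for key in keys:
    insert(trie_a, key)
for key in reversed(keys):
    insert(trie_b, key)

print(contains(trie_a, "100101"), contains(trie_a, "100100"))
print(trie_a == trie_b)
```

Building the trie from the same keys in a different order gives an identical structure, since the position of a key depends only on its bits.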
Geometric Interpretation of the Binary Trie
A trie compresses a 1-dimensional space with 2^d addresses through coding to a string with d characters
  In the previous example: d = 3 + 3 = 6
The root represents the complete space
The left son (first character = 0) represents the lower half of the search space
The right son (first character = 1) represents the upper half of the search space
Binary Tries, Revisited
[Figure: the binary trie next to an 8 x 8 grid whose columns are labeled with the binary x coordinate of the cell, X0X1X2 (000 to 111), and whose rows are labeled with the binary y coordinate of the cell, Y0Y1Y2 (000 to 111); example key 100101]
In 2D each key is a pair of bit sequences (x,y)
The path to the key is composed of bits that are taken from the x and y coordinates on a rotating basis
Observations
A kd-trie splits by rotating through the x and y coordinates
A kd-trie is unique for a given set of keys
  The trie structure does not depend on the insertion order
Geometric kd-tries generate a total order of the search space
  Two points P1 and P2 in the kd-space will always have the same order
Building a Linear Order
Given a 2D grid, how
  (1) to find a linear order for the cells of the grid such that cells close together in space are also (as far as possible) close to each other in the linear order, and
  (2) to define this order recursively for a grid that is obtained by a hierarchical subdivision of space.
The most popular solution is bit interleaving (Z-order)
Z-Order
• Addresses in a 2-dimensional space are identified by pairs (x,y) of values
• Each x and y value is a sequence of d bits
• This results in a grid with 2^d x 2^d cells
• How to build the addresses using bit interleaving?
[Figures: an 8 x 8 grid (d = 3) with columns labeled X0X1X2 = 000 to 111 and rows labeled Y0Y1Y2 = 000 to 111; after each split the cells are labeled with the z-value built so far]
  Start with a vertical split for X0 (Z=X0)
  Horizontal split for Y0 (Z=X0Y0)
  Vertical split for X1 (Z=X0Y0X1)
  Horizontal split for Y1 (Z=X0Y0X1Y1)
  Vertical split for X2 (Z=X0Y0X1Y1X2)
  Horizontal split for Y2 (Z=X0Y0X1Y1X2Y2)
The lowest and the highest z-value (z-low and z-hi) are located in the lower left and upper right corner
If each possible z-value represents a cell in the grid, this yields a space filling curve
[Figure: the space filling curve connecting the cells of the grid in z-order]
Example: Point Data
[Figure: the 8 x 8 grid of z-values with the data point A marked]
Data point: A = (3, 5) = (011, 101)
Bit interleaving: z = 011011
This gives a simple method for translating between x,y coordinates and z-values
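The translation between (x,y) coordinates and z-values can be sketched in Python. The function names are hypothetical; z-values are kept as bit strings in the order X0 Y0 X1 Y1 ... used on the slides.

```python
def interleave(x, y, d):
    """z-value of cell (x, y) in a 2^d x 2^d grid, as a string of 2*d bits."""
    xbits = format(x, f"0{d}b")
    ybits = format(y, f"0{d}b")
    # Take bits alternately from x and y: X0 Y0 X1 Y1 ...
    return "".join(xb + yb for xb, yb in zip(xbits, ybits))


def deinterleave(z):
    """Inverse translation: recover (x, y) from a z-value bit string."""
    return int(z[0::2], 2), int(z[1::2], 2)


z = interleave(3, 5, 3)
print(z)                # the z-value of A = (3, 5)
print(deinterleave(z))  # back to the coordinates
```

For A = (3, 5) with d = 3 this gives 011011, matching the example.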
Example: Region Data
[Figure: the 8 x 8 grid of z-values with regions labeled by the z-value prefixes 001, 10 and 0111]
The object with a z-value of 001 contains all elements with a prefix equal to 001
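The prefix property can be checked with a short Python sketch. The helper names are hypothetical, and the grid size d = 3 is taken from the example. The sketch lists all cells whose z-value starts with a given prefix and shows that their z-values form one contiguous interval in the linear order.

```python
def interleave(x, y, d):
    """z-value of cell (x, y) as a bit string X0 Y0 X1 Y1 ..."""
    xbits, ybits = format(x, f"0{d}b"), format(y, f"0{d}b")
    return "".join(xb + yb for xb, yb in zip(xbits, ybits))


def cells_with_prefix(prefix, d):
    """All cells (x, y) of the 2^d x 2^d grid whose z-value has the prefix."""
    return [(x, y)
            for x in range(2 ** d)
            for y in range(2 ** d)
            if interleave(x, y, d).startswith(prefix)]


region = cells_with_prefix("001", 3)
z_numbers = sorted(int(interleave(x, y, 3), 2) for x, y in region)
print(region)
print(z_numbers)
```

The region with prefix 001 covers the 8 cells with x in {2, 3} and y in {0, 1, 2, 3}, and their z-values are exactly 001000 to 001111 (8 to 15), so a region corresponds to a single range of z-values.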
Bit Interleaving: Recursive Definition
A vertical split differentiates values of X0
A horizontal split differentiates values of Y0
The address is given by the z-value (00, 01, 10, 11)
The z-value represents the path in the kd-trie
We can use the z-values alone, so that we don't need the kd-trie anymore
[Figure: the four quadrants labeled 01 (X0=0, Y0=1), 11 (X0=1, Y0=1), 00 (X0=0, Y0=0) and 10 (X0=1, Y0=0); the corresponding kd-trie with branches L/R and D/U and leaves 00, 01, 10, 11; quadrant 11 subdivided again into 1100, 1101, 1110 and 1111]
Explanation
Z-order encoding preserves the spatial proximity of points
  Homogeneous regions are represented compactly
  The elements are clustered => efficient access to secondary storage
Z-order coded data can be stored in secondary storage using conventional prefix B+ trees
  Efficient "range queries" are possible
  Direct access via z-value
Geometric Data Structures for non-Geometric Data?
Application of geometric data structures for geometric problems is obvious
  Geographic Information Systems (GIS)
  Computer graphics
A further application of geometric data structures: multidimensional databases
  OLAP (Online Analytical Processing)
  Data mining
Multidimensional Data Space
[Figure: data cube with the dimensions Product (Coke, Fanta, Beer, Milk, Juice, Water), Day (1 to 7) and Region (West, East, South, North)]
Each cell corresponds to an observation point, described by the attributes of the individual cell. Each cell contains an observation, e.g. the sales value of Product "Coke" on Day "4" in Region "East".
Multidimensional (MD) Data Space
Each observed fact w can be expressed as a function of the dimensions, which define the multidimensional data space:
  w = f(x, y, z)
  DOM(f) = DOM(x) x DOM(y) x DOM(z)
A fact w_0 is the value of function f for the specific values (x_0, y_0, z_0):
  w_0 = f(x_0, y_0, z_0)
Sparseness in the MD Space
Typically, only a small fragment of the space defined by DOM(a) x … x DOM(z) is actually used
Addressing in the MD space (a multi-dimensional array) is easy and fast
  However, memory usage is inefficient
Need to find mechanisms to compress the MD space
  Linearization of the data space by totally ordering the facts with the aid of space filling curves
  Extraction of all facts into a table, then join this table with descriptive dimension tables
Linearization of the MD Space
Linearization with the aid of space filling curves (e.g. Z-order or Hilbert curve)
The principle is based on a coding, that generates a total order of all points in the data space
The indexing is done by conventional, order preserving indexing methods (e.g. B+-Trees)
The mechanism is well suited for 2-4 dimensions (x,y,z,t) for tracking applications and range queries
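A small Python sketch of the linearization idea for a sparse 3-dimensional data space. Everything concrete here is an assumption for illustration: the dimension order (product, day, region), the 3-bit coordinate codes, the sales values, and the use of a sorted list with binary search as a stand-in for an order-preserving index such as a B+-tree.

```python
import bisect


def z_value(coords, d):
    """Interleave the d-bit codes of all coordinates into one integer."""
    z = 0
    for bit in range(d - 1, -1, -1):  # most significant bit first
        for c in coords:
            z = (z << 1) | ((c >> bit) & 1)
    return z


# Hypothetical facts: (product, day, region) -> sales value.
# Only the cells that are actually used are stored.
facts = {(0, 4, 1): 120.0, (2, 1, 0): 75.5, (5, 6, 3): 12.0}

# Total order of the facts by z-value; the sorted list stands in for a B+-tree.
index = sorted((z_value(c, 3), value) for c, value in facts.items())
keys = [z for z, _ in index]


def lookup(coords):
    """Direct access via the z-value; None if the cell holds no fact."""
    z = z_value(coords, 3)
    i = bisect.bisect_left(keys, z)
    return index[i][1] if i < len(keys) and keys[i] == z else None


print(lookup((0, 4, 1)), lookup((1, 1, 1)))
```

Memory use grows with the number of stored facts rather than with the size of DOM(x) x DOM(y) x DOM(z), while lookups remain a search in an ordered one-dimensional key space.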
Data-Mining
Till now: storage and search of data
Evaluation and interpretation of results is done using data mining
Typical problem: "Where in the supermarket should we put the beer that should be sold as early as possible (close to its expiry date, low sales volume ...)?"
Data-Mining
Overview of basic techniques for data-mining
Data Mining
  Forecast, Prediction
    □ Classification
    □ Numerical Prediction
  Knowledge Discovery
    □ Association
    □ Variance Detection
    □ Clustering
Prediction: Classification
Data entries are classified according to a certain property
  Purchased year   Lending total   Lending last year   To sort out
  1994             1578            5                   yes
  2000             3410            203                 no
  1982             2558            310                 yes
  ...              ...             ...                 ...
A new data entry is automatically assigned
  Purchased year   Lending total   Lending last year   To sort out
  1988             589             39                  ?
Prediction: Numerical Prediction
Numerical prediction is similar to classification; however, a value is predicted instead of a class.
Most important application: weather forecast
  Yesterday            Today                Tomorrow
  Temp.    Pressure    Temp.    Pressure    Temp.
  17.0     990         19.2     1001        20.5
  10.8     1011        12.1     973         8.2
  30.5     1000        30.4     994         29.9
  ...      ...         ...      ...         ...
  14.2     980         17.0     991         ?
Knowledge Discovery: Association
Tries to find common rules between the characteristics of data. Interesting relations are returned.
Example: From the previous weather data one could derive the following rules:
With a probability of 0.89:
  IF "Air pressure today" > "Air pressure yesterday"
  AND "Temperature today" > 12°
  THEN "Temperature tomorrow" > "Temperature today"
With a probability of 0.75:
  IF "Air pressure today" < "Air pressure yesterday"
  AND "Temperature today" > 15°
  THEN "Temperature tomorrow" < "Temperature today"
Knowledge Discovery: Variance Detection
Given a data pool, variance detection tries to distinguish normal data entries from "outlier" entries
Example: A home security system with 100 sensors (temperature, light barrier, sound detector, ...) should detect intruders. Flying birds, shadows in the moonlight or car headlights should not have any impact on the operation of the system.
The system gets a database describing "safe" configurations (where no alarm has to be triggered). The system creates a model of the non-alarm cases. Data for real intrusions are not provided!
Using this model, updates from sensors can be checked: if they do not fit the non-alarm cases, an alarm is triggered.
Knowledge Discovery: Clustering
Find similar data entries and group them into clusters
Example: In an exam, what percentage of the exercises E1 .. E5 was answered correctly?
  Student   E1     E2     E3     E4     E5
  S1        20     84     11     17     74
  S2        62     41     57     81     19
  S3        79     33     60     68     30
  S4        19     93     25     23     87
  S5        28     89     0      26     79
  Ø         41.6   68     30.6   43     57.8
Clustering may divide the students taking the exam into 2 groups:
  G1 = {S1, S4, S5}: good at exercises E2 and E5
  G2 = {S2, S3}: good at exercises E1, E3 and E4
Possibility of individual support!
k-means Clustering: Example
[Figure: scatter plot of the data points F to Q over the axes Data1 and Data2]
k-means Clustering: Algorithm
1. Fix the number of desired clusters: parameter k.
2. Place k random points into the space: the initial group centroids.
3. For all m data objects, determine the Euclidean distance of the object (as a vector) from all centroids and assign the object to the closest centroid.
4. For all k centroids, determine the real center of the assigned cluster (average). These are the new centroids.
5. Repeat steps 3 and 4 until the centroids no longer move (old and new ones are so close to each other that no noticeable improvement remains).
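The steps above can be sketched in Python. This is an illustration under assumptions, not the lecture's implementation: the initial centroids are passed in explicitly instead of being placed at random (so that the run is reproducible), the loop stops when the centroids are exactly unchanged, an empty cluster keeps its old centroid, and the function name and `max_rounds` are hypothetical. The data are the exam results of students S1 to S5 from the clustering example.

```python
import math


def kmeans(points, centroids, max_rounds=100):
    """Return (labels, centroids); labels[i] is the cluster index of points[i]."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_rounds):
        # Step 3: assign every object to the closest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 4: the new centroid is the average of the assigned objects.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster
        # Step 5: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


students = [
    (20, 84, 11, 17, 74),  # S1
    (62, 41, 57, 81, 19),  # S2
    (79, 33, 60, 68, 30),  # S3
    (19, 93, 25, 23, 87),  # S4
    (28, 89, 0, 26, 79),   # S5
]
# k = 2, with S1 and S2 chosen as initial centroids instead of random points.
labels, centers = kmeans(students, [students[0], students[1]])
print(labels)
```

With these starting centroids the students S1, S4 and S5 end up in one cluster and S2 and S3 in the other, which is the grouping G1 and G2 from the clustering example. Other starting points may lead to a different local optimum.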
k-means Algorithm: Properties
Finds a local optimum, but does not necessarily find the best configuration (global optimum)
  Is a heuristic
Significantly sensitive to the initial randomly selected cluster centers
  Optimizations
    □ Randomly modify the results between different rounds
    □ The k-means algorithm can be run multiple times
Operates with linear optimization
Highly stable and frequently used approach
Also works for very large data sets with a controllable complexity
Ian H. Witten, Eibe Frank “Data Mining – Practical Machine Learning Tools and Techniques with Java Implementations” Academic Press, San Diego, CA; 2000; ISBN 1-55860-552-5