Exploring Spatial Datasets Using Discriminative Pattern Mining and Pattern Similarity Measure

Tomasz F. Stepinski, Lunar and Planetary Institute ([email protected])
Wei Ding, Dept. of Computer Science, Univ. of Massachusetts Boston ([email protected])
Josue Salazar, Lunar and Planetary Institute ([email protected])

Motivation
Complex multi-attribute spatial datasets hide knowledge that needs to be discovered by exploring their structure. We propose an association analysis-based strategy for exploring spatial datasets that possess a prior binary classification.

Input data:
Example: Analysis of 2008 presidential election

Innovation
Algorithm
Input: a multi-attribute spatial dataset with prior binary classification (class 1 and class 2). Each spatial element is a transaction containing values of exploratory attributes.
Step 1: mining for discriminative patterns.
Step 2: agglomerative clustering of patterns.
Output: segmentation of class 1 into clusters of similar patterns of exploratory attributes (cluster 1, cluster 2, and cluster 3 in the diagram).
[Figure: workflow diagram with a grid of spatial elements labeled by class (1 or 2).]
[Figure: two example patterns, X and Y, over attributes A, B, C, D ("_" marks an attribute with no value in the pattern). Footprint of pattern X: 4 objects. Footprint of pattern Y: 2 objects.]
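A minimal sketch of how a pattern footprint could be computed, assuming a footprint is the set of spatial objects whose transactions contain every attribute value in the pattern. The toy transactions, the class labels, and the helper names `footprint` and `class_support` are hypothetical and are not taken from the poster. How per-class supports are turned into a discriminative-pattern criterion is not stated here, so the sketch stops at the supports.

```python
from typing import Dict, Set

Transaction = Dict[str, int]   # attribute name -> ordinal value
Pattern = Dict[str, int]       # attributes missing from the dict play the role of "_"

# Hypothetical toy dataset: six spatial objects with attributes A-D.
transactions: Dict[str, Transaction] = {
    "o1": {"A": 1, "B": 2, "C": 1, "D": 1},
    "o2": {"A": 1, "B": 2, "C": 1, "D": 2},
    "o3": {"A": 1, "B": 2, "C": 2, "D": 1},
    "o4": {"A": 1, "B": 2, "C": 2, "D": 2},
    "o5": {"A": 2, "B": 1, "C": 2, "D": 1},
    "o6": {"A": 2, "B": 2, "C": 1, "D": 2},
}
# Hypothetical prior binary classification of the same objects.
labels: Dict[str, int] = {"o1": 1, "o2": 1, "o3": 1, "o4": 1, "o5": 2, "o6": 2}


def footprint(pattern: Pattern, data: Dict[str, Transaction]) -> Set[str]:
    """Objects whose transaction matches every attribute value in the pattern."""
    return {
        oid for oid, t in data.items()
        if all(t.get(attr) == value for attr, value in pattern.items())
    }


def class_support(pattern: Pattern, data: Dict[str, Transaction],
                  classes: Dict[str, int], cls: int) -> float:
    """Fraction of the objects of class `cls` that lie in the pattern's footprint."""
    members = [oid for oid in data if classes[oid] == cls]
    covered = footprint(pattern, data)
    return sum(1 for oid in members if oid in covered) / len(members)


X: Pattern = {"A": 1, "B": 2}   # no value for C and D
Y: Pattern = {"A": 1, "C": 2}   # no value for B and D

print(sorted(footprint(X, transactions)))
print(sorted(footprint(Y, transactions)))
print(class_support(X, transactions, labels, 1), class_support(X, transactions, labels, 2))
```

In this toy data, pattern X covers four objects and pattern Y covers two, mirroring the footprint sizes quoted in the figure.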
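Pattern similarity
The similarity of patterns X and Y is a weighted sum of per-attribute similarities, shown here for the four attributes of the example:

$$S(X, Y) = \sum_{i=1}^{4} w_i \, S_i(X_i, Y_i)$$

Attribute A (both patterns have a value):
$$S_A(X_A, Y_A) = s(x_A, y_A)$$

Attribute B (only X has a value):
$$S_B(X_B, -) = \sum_{k=1}^{2} P_Y(y_k) \, s(x_B, y_k)$$

Attribute C (only Y has a value):
$$S_C(-, Y_C) = \sum_{k=1}^{2} P_X(x_k) \, s(x_k, y_C)$$

Attribute D (neither pattern has a value):
$$S_D(-, -) = \sum_{l=1}^{2} \sum_{k=1}^{2} P_X(x_l) \, P_Y(y_k) \, s(x_l, y_k)$$

$z_1, z_2, \ldots, z_k$ are ordinal values such that $z_1 = x_i + 1$ and $z_k = y_i - 1$.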
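The four attribute cases and the weighted sum can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: $P_X$ and $P_Y$ are taken to be the relative frequencies of an attribute's values inside the footprints of X and Y, the weights are equal, and the value-level similarity `s` is passed in as a function. The equality-based `toy_s` and all helper names are hypothetical stand-ins.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence

ValueSim = Callable[[int, int], float]


def distribution(values: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each value (assumed meaning of P_X and P_Y)."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def attribute_similarity(x: Optional[int], y: Optional[int],
                         fx_values: Sequence[int], fy_values: Sequence[int],
                         s: ValueSim) -> float:
    """Per-attribute similarity; None means the pattern has no value ("_")."""
    if x is not None and y is not None:        # case of attribute A
        return s(x, y)
    px, py = distribution(fx_values), distribution(fy_values)
    if x is not None:                          # case of attribute B: only X has a value
        return sum(p * s(x, yk) for yk, p in py.items())
    if y is not None:                          # case of attribute C: only Y has a value
        return sum(p * s(xk, y) for xk, p in px.items())
    return sum(pl * pk * s(xl, yk)             # case of attribute D: neither has a value
               for xl, pl in px.items() for yk, pk in py.items())


def pattern_similarity(X: Dict[str, int], Y: Dict[str, int],
                       FX: Dict[str, Sequence[int]], FY: Dict[str, Sequence[int]],
                       weights: Dict[str, float], s: ValueSim) -> float:
    """Weighted sum of per-attribute similarities, S(X, Y)."""
    return sum(w * attribute_similarity(X.get(a), Y.get(a), FX[a], FY[a], s)
               for a, w in weights.items())


def toy_s(a: int, b: int) -> float:
    """Hypothetical stand-in for s: 1 for equal values, 0 otherwise."""
    return 1.0 if a == b else 0.0


# Hypothetical example: attribute values observed in each footprint.
X = {"A": 1, "B": 2}
Y = {"A": 1, "C": 2}
FX = {"A": [1, 1, 1, 1], "B": [2, 2, 2, 2], "C": [1, 1, 2, 2], "D": [1, 1, 1, 2]}
FY = {"A": [1, 1], "B": [2, 1], "C": [2, 2], "D": [1, 2]}
weights = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}   # equal weights are an assumption

result = pattern_similarity(X, Y, FX, FY, weights, toy_s)
print(result)
```

Passing `s` as an argument keeps the aggregation independent of the particular value-level measure.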
2008 election results and 13 socio-economic indicators from the US Census Bureau for 3108 counties.
Example 1: McCain voting block (red) and Obama voting block (blue) that are dissimilar in the socio-economic sense and geographically apart.
Example 2: McCain voting block (red) and Obama voting block (green) that are dissimilar in the socio-economic sense but geographically collocated.
Visual analytics: Discriminative patterns are calculated for four groups of counties (A, B, E, and F). In each group, patterns are ordered using agglomerative clustering. The clustering heat map is a distance matrix with rows ordered according to the clustering.
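The ordering step behind the heat map can be sketched in plain Python. The poster does not state the linkage criterion, so average linkage is an assumption here, and the 4 x 4 dissimilarity matrix is hypothetical. The sketch reorders columns together with rows so that the matrix stays symmetric.

```python
from typing import List

Matrix = List[List[float]]


def agglomerative_order(dist: Matrix) -> List[int]:
    """Leaf order produced by average-linkage agglomerative clustering."""
    clusters: List[List[int]] = [[i] for i in range(len(dist))]

    def average_distance(a: List[int], b: List[int]) -> float:
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters[0]


def reorder(dist: Matrix, order: List[int]) -> Matrix:
    """Distance matrix with rows and columns permuted into clustering order."""
    return [[dist[i][j] for j in order] for i in order]


# Hypothetical pattern dissimilarities: patterns 0 and 2 are close, as are 1 and 3.
dist = [
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.85, 0.2],
    [0.1, 0.85, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
]
order = agglomerative_order(dist)
heat_map = reorder(dist, order)
print(order)
for row in heat_map:
    print(row)
```

After reordering, similar patterns sit in adjacent rows, so clusters show up as low-distance blocks along the diagonal.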
$$s(x_i, y_i) = \frac{2 \times \log P(x_i \vee z_1 \vee z_2 \vee \ldots \vee z_k \vee y_i)}{\log P(x_i) + \log P(y_i)}$$
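A minimal sketch of the value-level similarity $s$, assuming that the probability of the disjunction is the summed probability of all ordinal levels from $x_i$ through $y_i$, and that `prob` holds the probability of each level with every level having nonzero probability. The function name and the uniform five-level distribution are hypothetical.

```python
import math
from typing import Dict


def ordinal_similarity(x: int, y: int, prob: Dict[int, float]) -> float:
    """s(x, y) = 2 log P(x or z_1 or ... or z_k or y) / (log P(x) + log P(y))."""
    lo, hi = min(x, y), max(x, y)
    # Levels are mutually exclusive, so the disjunction's probability is a sum.
    p_range = sum(prob[v] for v in range(lo, hi + 1))
    denom = math.log(prob[x]) + math.log(prob[y])
    if denom == 0.0:
        # Both values have probability 1, so they are the same certain value.
        return 1.0
    return 2.0 * math.log(p_range) / denom


# Hypothetical distribution: five ordinal levels, equally likely.
uniform = {level: 0.2 for level in range(1, 6)}

print(ordinal_similarity(3, 3, uniform))
print(ordinal_similarity(1, 2, uniform))
print(ordinal_similarity(1, 5, uniform))
```

Identical values give a similarity of 1, and the similarity falls toward 0 as the interval between the two values covers more of the probability mass.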
[Figure: panels labeled A through H and clustering heat maps for pattern sets A, B (Obama) and E, F (McCain). Legend bins:
pattern size: 3 - 12, 13 - 20, 21 - 27, 28 - 37, 38 - 58, 59 - 100
pattern length: 1 - 2, 3 - 4, 5 - 6, 7 - 8, 9 - 10, 11 - 13
pattern overlap: 0 - 0.25, 0.25 - 0.5, 0.5 - 1, 1 - 2, 2 - 3, 3 - 4, 4 - 13
pattern dissimilarity: 0 - 0.05, 0.05 - 0.18, 0.18 - 0.32, 0.32 - 0.46, 0.46 - 0.62, 0.62 - 0.82, 0.82 - 1]
[Figure: patterns plotted against socio-economic indicators.]
Socio-economic indicators: pop. dens., urban pop. %, female pop. %, foreign born %, per capita income, household income, HS edu., bachelor edu., white pop. %, poverty %, owned house %, soc. sec. recipient %, soc. sec. income.
Indicator levels: lowest (1), low (2), average (3), high (4), highest (5), no value ( _ ).
Row ranges in the figure:
Obama block 1 (1 - 872)
Obama block 2 (928 - 3364)
Voted for Obama but not in discriminative patterns' support (3365 - 3610)
McCain block (3611 - 6680)
Voted for McCain but not in discriminative patterns' support (6681 - 6970)
set | description | # of counties | population | # voted | winning %
won by Obama
A | won by Obama, IN footprint of Obama and NOT in footprint of McCain | 495 | 153,611,411 | 67,040,847 | 62.14
B | won by Obama, NOT in footprint of Obama and NOT in footprint of McCain | 361 | 16,696,346 | 9,568,427 | 56.24
C | won by Obama, NOT in footprint of Obama but IN footprint of McCain | 9 | 199,478 | 88,945 | 51.07
D | won by Obama, IN footprint of Obama and IN footprint of McCain | 1 | 210,554 | 61,494 | 52.90
won by McCain
E | won by McCain, IN footprint of McCain and NOT in footprint of Obama | 1688 | 51,289,510 | 23,224,203 | 62.11
F | won by McCain, NOT in footprint of McCain and NOT in footprint of Obama | 472 | 31,269,880 | 15,772,301 | 59.01
G | won by McCain, NOT in footprint of McCain but IN footprint of Obama | 62 | 23,518,016 | 8,941,422 | 55.91
H | won by McCain, IN footprint of McCain and IN footprint of Obama | 20 | 2,255,368 | 1,024,861 | 60.83
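The eight sets are the combinations of who won a county and whether the county lies in each candidate's pattern footprint. A small sketch of that assignment follows. The function name `county_set` and its boolean parameters are hypothetical, while the county counts are copied from the table. The counts add up to 3108, the number of counties in the input data.

```python
from typing import Dict, Tuple

# (in winner's own footprint, in the other candidate's footprint) -> set letter
_SETS: Dict[str, Dict[Tuple[bool, bool], str]] = {
    "Obama": {(True, False): "A", (False, False): "B",
              (False, True): "C", (True, True): "D"},
    "McCain": {(True, False): "E", (False, False): "F",
               (False, True): "G", (True, True): "H"},
}


def county_set(winner: str, in_obama_fp: bool, in_mccain_fp: bool) -> str:
    """Assign a county to one of the sets A-H from the table."""
    if winner not in _SETS:
        raise ValueError(f"unknown winner: {winner!r}")
    if winner == "Obama":
        own, other = in_obama_fp, in_mccain_fp
    else:
        own, other = in_mccain_fp, in_obama_fp
    return _SETS[winner][(own, other)]


# Number of counties per set, as listed in the table.
COUNTIES = {"A": 495, "B": 361, "C": 9, "D": 1,
            "E": 1688, "F": 472, "G": 62, "H": 20}

print(county_set("Obama", in_obama_fp=True, in_mccain_fp=False))
print(county_set("McCain", in_obama_fp=True, in_mccain_fp=False))
print(sum(COUNTIES.values()))
```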