Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009...
-
Upload
giles-jones -
Category
Documents
-
view
219 -
download
3
Transcript of Dimacs Graph Mining (via Similarity Measures) Ye Zhu Stephanie REU-DIMACS, July 17, 2009...
Dimacs Graph Mining Dimacs Graph Mining (via Similarity Measures)(via Similarity Measures)
Ye Zhu StephanieREU-DIMACS, July 17, [email protected]
Mentor : James Abello
Talk OutlineTalk OutlineI. From Data to Graphs via Similarity
Measure II. Our Research Project Input: REU participants information DIMACS Workshop Data Output: A variety of Graphs
III. Main Questions a. Choose good similarity measures
b. Visualize and detect “interesting” patterns
Original data records for Original data records for building building REU-Participants graphsREU-Participants graphs
General MethodGeneral MethodStep 1: Compute a similarity measure
among the data records shown above.
Since a record can be viewed as a unweighted/weighted set of attributes we use unweighted/weighted version of an standard metric among finite sets that uses the size of the intersection over the size of the union between two sets
ComputationComputation
pet
fat
dog cat
jnm
jnm
nm,attj)),attj),W(w(W(w
,attj)),attj),W(w(W(w),wWJ(w
i)ttj with w(freq of a gw(attj)
) lw(wj,attjgw(attj)W
i)ttj with w(freq of a )lw(wi,attj
max
min
log
log
28.08.079.075.085.07.09.0
06.075.0000
General MethodGeneral MethodStep 1: Compute a similarity
measure among the data records shown above.
Step 2: Deal with different types of data records respectively.
Computing Edge WeightComputing Edge WeightTo deal with different types of
information, we partition the attributes into different classes according to their value types and compute a similarity measure for each class and then combine these values using a convex combination
Eg. Total Weight=0.3*Weighted Coeff+0.7*Unweighted
Coeff
REU participants exampleREU participants exampleHow to calculate the Edge How to calculate the Edge Weight? Weight?
Unweighted
Unweighted
Weighted
REU participants exampleREU participants exampleHow about the Vertices' How about the Vertices' Weight(ball size)Weight(ball size)
We can simply convert these 3 columns to three-digit numbers !!!
General MethodGeneral MethodStep 1: Compute a similarity
measure among the data records shown above.
Step 2: Deal with different types of data records respectively.
Step 3: Build weighted graph where each record is now treated as a vertex and two vertices are joined by an edge with weight equal to their computed similarity
General MethodGeneral MethodStep 1: Compute a similarity measure
among the data records shown above.Step 2: Deal with different types of
data records respectively. Step 3: Build weighted graph where
each record is now treated as a vertex and two vertices are joined by an edge with weight equal to their computed similarity
Step 4: Visualize the graph use GraphView Software and find interesting clusters
Workshop Abstract Workshop Abstract ExampleExampleRead in all workshop abstracts
fileDelete stop words ->
unimportant wordsGet a count of number of
appearances (freqency) of ALL words left in All workshop abstracts
Compute Jaccard Coefficient
ConclusionConclusionWe have shown how data set
records can be transformed into a weighted graph by using a similarity measure among records
This methodology allows us to use powerful graph clustering techniques to analyze and visualize data bases.
ReferencesReferences[1] GraphView system[2] C Gasperin, P Gamallo, A Agustini, G Lopes, V
Lima 2001- Using syntactic contexts for measuring word similarity
[3] Resnik, Philip (1999) Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research