Page 1: Dr. David Dailey david.dailey@sru.edu Dr. Beverly Gocal beverly.gocal@sru.edu Dr. Deborah Whitfield deborah.whitfield@sru.edu.

Page 2:

Introduction
Graph distance
String distance
◦ Definitions
◦ Examples
◦ Implementation
◦ Theoretical Results
◦ String Space Examples

Page 3:

Distance
◦ may be defined for any structure
Overlap of the substructures of two structures
◦ Strings
◦ Graphs
◦ Algebraic structures
◦ Semi-groups
◦ Trees
Web site and web page similarity

Page 4:

Past 15 years
◦ Over 20 papers on graph similarity
◦ Several more on string similarity
Semi-Group: Let T=(S, A) together with the concatenation operation, where A consists of the set of axioms
◦ ∀ x, y ∈ S, xy ∈ S
◦ ∀ x, y, z ∈ S, x(yz) = (xy)z

Page 5:

Graph: Let T=(S, A) together with a relation ~ where A consists of the set of axioms
◦ ∀ x, y ∈ S, x ~ y ⇒ y ~ x
◦ ∀ x ∈ S, ¬(x ~ x)
String: Let T=(S, A) together with an associative operation (expressed by concatenation).
◦ Then let S^n be defined recursively by S^1 = S and S^n = S × S^(n-1), and let S* be defined as the infinite union of ordered tuples: S^1 ∪ S^2 ∪ … ∪ S^n ∪ …
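The recursive definition of the string space can be sketched in Python. This is only an illustration: the function names `power` and `star_up_to` are hypothetical, each ordered tuple is written as a concatenated string, and the infinite union S* is truncated at a chosen maximum length.

```python
def power(alphabet, n):
    """Return S^n, following the recursion S^1 = S and S^n = S x S^(n-1).

    Each ordered n-tuple is represented as a string of length n.
    """
    symbols = sorted(alphabet)
    if n == 1:
        return symbols
    shorter = power(alphabet, n - 1)
    return [s + rest for s in symbols for rest in shorter]


def star_up_to(alphabet, max_len):
    """Finite truncation of S*: the union of S^1, S^2, ..., S^max_len."""
    result = []
    for n in range(1, max_len + 1):
        result.extend(power(alphabet, n))
    return result


if __name__ == "__main__":
    print(power({"a", "b", "c"}, 2))
    print(star_up_to({"a", "b"}, 2))
```

Since |S^n| = |S|^n, the truncated union grows exponentially with the maximum length, so this is practical only for small alphabets and short strings.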

Page 6:

Levenshtein distance calculates the minimum number of transformations
Largest shared substructure
Smallest super structure
All of these approaches are relative
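For comparison with the substructure approach that follows, here is a minimal sketch of Levenshtein distance using the standard dynamic-programming recurrence. It assumes the usual three unit-cost transformations (insert, delete, substitute); the slides do not say which transformation set they have in mind.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("aabc", "abcd"))
```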

Page 7:

Enumerate all substructures within T and U
Union those two sets: Z = T* ∪ U*
Form a |Z|-dimensional vector space
Let z(T) be the number of occurrences of structure z as a substructure of T
Calculate the Minkowski distance d(T,U)
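The steps above can be sketched for strings, taking contiguous substrings as the substructures. The function names are hypothetical, and the Minkowski order `p` is an assumption: the slides do not state it, but p = 1 reproduces the value Distance (aabc, abcd) = 8 given later in the talk.

```python
from collections import Counter


def substring_counts(s):
    """Map each non-empty contiguous substring z of s to z(s),
    the number of times it occurs in s."""
    return Counter(s[i:j]
                   for i in range(len(s))
                   for j in range(i + 1, len(s) + 1))


def substructure_distance(t, u, p=1):
    """Minkowski distance of order p between the substring-count
    vectors of t and u, taken over Z = T* union U*."""
    ct, cu = substring_counts(t), substring_counts(u)
    z_all = set(ct) | set(cu)
    # A Counter returns 0 for a missing key, which is exactly the
    # coordinate of a substring that does not occur in that string.
    return sum(abs(ct[z] - cu[z]) ** p for z in z_all) ** (1 / p)


if __name__ == "__main__":
    print(substructure_distance("aabc", "abcd"))
```

A string of length n has n(n+1)/2 substring occurrences, so exhaustive enumeration is quadratic in the number of occurrences and stays manageable for short strings.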

Page 8:
Page 9:

Alphabet S = {a, b, c}, α = abaac and β = cbaac
α* = {a, b, c, ab, ba, aa, ac, aba, baa, aac, abaa, baac, abaac}
β* = {a, b, c, cb, ba, aa, ac, cba, baa, aac, cbaa, baac, cbaac}
Z = {a, b, c, ab, cb, ba, aa, ac, cba, aba, baa, aac, cbaa, abaa, baac, cbaac, abaac}
◦ Unique to α*: ab, aba, abaa, abaac
◦ Unique to β*: cb, cba, cbaa, cbaac
Equal frequency: I = {b, ba, aa, ac, baa, aac, baac}
Different frequency: D = {a, c} (a occurs 3 times in α and 2 times in β; c occurs once in α and twice in β)
Unique: O = {ab, cb, cba, aba, cbaa, abaa, cbaac, abaac}
|I| = 7, |D| = 2, and |O| = 8

Page 10:

|I| = 7, |D| = 2, and |O| = 8
|I| + |D| + |O| = |Z| = 17
Contribution of O is |O| = 8, since each unique substring occurs exactly once
Contribution of I is 0, since these substrings appear equally often
Contribution of D, in this case, is 2: one each for a and c
d(α, β) = contribution(I) + contribution(D) + contribution(O) = 0 + 2 + 8 = 10

Page 11:

A = aabc
B = abcd
S = {a, a, aa, aab, aabc, ab, abc, b, bc, c}
T = {a, ab, abc, abcd, b, bc, bcd, c, cd, d}
Counts for S and T
◦ S: a:2 aa:1 aab:1 aabc:1 ab:1 abc:1 b:1 bc:1 c:1
◦ T: a:1 ab:1 abc:1 abcd:1 b:1 bc:1 bcd:1 c:1 cd:1 d:1
Differences: a:1 aa:1 aab:1 aabc:1 ab:0 abc:0 abcd:1 b:0 bc:0 bcd:1 c:0 cd:1 d:1
Distance (aabc, abcd) = 8
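The same example can be split into the equal-frequency, different-frequency, and unique sets (I, D, and O) used in the earlier example. The sketch below is illustrative: the function names are hypothetical, and it assumes the distance is the sum of absolute count differences, which agrees with the value 8 above.

```python
from collections import Counter


def substring_counts(s):
    """Count every non-empty contiguous substring of s."""
    return Counter(s[i:j]
                   for i in range(len(s))
                   for j in range(i + 1, len(s) + 1))


def partition(t, u):
    """Return (I, D, O): shared substrings with equal counts, shared
    substrings with different counts, and substrings found in only
    one of the two strings."""
    ct, cu = substring_counts(t), substring_counts(u)
    shared = set(ct) & set(cu)
    equal = {z for z in shared if ct[z] == cu[z]}
    different = shared - equal
    unique = set(ct) ^ set(cu)
    return equal, different, unique


def distance_from_partition(t, u):
    """Sum the contributions of D and O; I contributes nothing."""
    ct, cu = substring_counts(t), substring_counts(u)
    _, different, unique = partition(t, u)
    contribution_d = sum(abs(ct[z] - cu[z]) for z in different)
    # A unique substring contributes its full count, which equals 1
    # whenever it occurs only once.
    contribution_o = sum(ct[z] + cu[z] for z in unique)
    return contribution_d + contribution_o


if __name__ == "__main__":
    equal, different, unique = partition("aabc", "abcd")
    print(sorted(equal), sorted(different), sorted(unique))
    print(distance_from_partition("aabc", "abcd"))
```

For aabc and abcd, D holds only the substring a (count 2 versus 1) and O holds the seven substrings with a nonzero difference other than a, so the contributions are 1 and 7.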

Page 12:

Too tedious by hand
http://srufaculty.sru.edu/david.dailey/javascript/StringDistances.html
Distance (aabc, abcd) = 8

Page 13:

Conjecture: if |α| = |β| = n and α and β share no substrings in common (i.e., |I ∪ D| = 0), then d(α, β) = n(n+1)

Lemma: if α = a^n then d() = n^2 + n(n+1)/2

Conjecture: if |α| = |β| = n, then d() = d() = d() = d() = n^2 + n(n+1)/2
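The first conjecture, that two length-n strings with no substrings in common are at distance n(n+1), can be spot-checked numerically. This is a sketch under stated assumptions: the distance is taken as the sum of absolute count differences, the two strings are drawn from the disjoint alphabets abc and xyz so that they cannot share a substring, and all function names are hypothetical. A finite check of this kind is evidence, not a proof.

```python
import random
from collections import Counter


def substring_counts(s):
    """Count every non-empty contiguous substring of s."""
    return Counter(s[i:j]
                   for i in range(len(s))
                   for j in range(i + 1, len(s) + 1))


def distance(t, u):
    """Sum of absolute differences of substring counts."""
    ct, cu = substring_counts(t), substring_counts(u)
    return sum(abs(ct[z] - cu[z]) for z in set(ct) | set(cu))


def random_string(alphabet, n, rng):
    """Random string of length n over the given alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(n))


def check_disjoint_conjecture(max_n=6, trials=50, seed=0):
    """Return True if d(alpha, beta) == n(n+1) held in every trial."""
    rng = random.Random(seed)
    for n in range(1, max_n + 1):
        for _ in range(trials):
            alpha = random_string("abc", n, rng)
            beta = random_string("xyz", n, rng)
            if distance(alpha, beta) != n * (n + 1):
                return False
    return True


if __name__ == "__main__":
    print(check_disjoint_conjecture())
```

One reason to expect the check to pass: a string of length n always has n(n+1)/2 substring occurrences, and when nothing is shared every occurrence in either string counts toward the distance.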

Page 14:

Pretty pics

Page 15:

Exhaustive substructure vector space
Calculate distance
Interesting observations used to study structure similarity based on size