Distance graph reference


Reference on distance graphs of text

Transcript of Distance graph reference

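  • Distance graphs of text and some applications. KTLab. Hà Quang Thụy, Knowledge Technology Laboratory (KTLab), University of Engineering and Technology, Vietnam National University, Hanoi, 31 May 2014

  • Contents: Distance graphs and applications; Normalized Google distance and applications; Social informatics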


  • Distance graphs: introduction
    Charu C. Aggarwal, Peixiang Zhao (2013). Towards graphical models for text processing. Knowl. Inf. Syst. 36(1): 1-21.
    Charu C. Aggarwal: Research Scientist, IBM T. J. Watson Research Center in Yorktown Heights. BSc IIT Kanpur (1993), PhD MIT (1996). Awards: IBM Corporate (2003), IBM Outstanding Innovation (2008), IBM Research Division (2008), IBM Outstanding Technical Achievement (2009). Associate editor of the journals ACM TKDD, Data Mining and Knowledge Discovery, ACM SIGKDD Explorations, and Knowledge and Information Systems. http://www.informatik.uni-trier.de/~ley/pers/hd/a/Aggarwal:Charu_C=: 60 journal papers, 135 conference papers, 2 books.
    Peixiang Zhao: Assistant Professor, Florida State Univ. at Tallahassee. BSc (2001), MSc (2004), PhD (2007) HK, PhD (2012) UIUC. http://www.informatik.uni-trier.de/~ley/pers/hd/z/Zhao:Peixiang.html: 4 journal papers, 16 conference papers.

  • Distance graphs: definition
    The statement here differs slightly from the paper.
    Let the corpus be C = {documents of the application domain} and V = {meaningful words in C}. For example, V = {words in C} \ {stop words}.
    For a document D, the distance graph of order k of D over C is the graph G(C, D, k) = (N(C), A(D, k)), where N(C) is the node set and A(D, k) is the arc set.
    N(C) = {node v: v ∈ V and v occurs in D}. Each v ∈ V appears at most once in N(C). An element of N(C) is called node i or word i.
    D' is obtained from D by removing every word not in V while keeping the order of the remaining words.
    The arc set A(D, k) contains a directed arc (i, j) from node i to node j if word i precedes word j at a distance of at most k words in D'.
    The arc (i, j) has weight m if word i precedes word j at a distance of at most k exactly m times in D'.

  • Distance graphs: example from the paper
    V = {English words} \ {stop words}
    D is taken from the nursery rhyme "Mary had a little lamb": "Mary had a little lamb, little lamb, little lamb, Mary had a little lamb, its fleece was white as snow."
    D' = Mary little lamb, little lamb, little lamb, Mary little lamb, fleece white snow.
    The distance graphs of order 0, 1, 2:
    Order 0: each word is connected only to itself (self-loops).
    Order k+1: adds arcs and increases weights.
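The construction in the definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the stop-word list `STOP_WORDS`, the punctuation handling, and the function name `distance_graph` are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical stop-word list, just large enough for the rhyme.
STOP_WORDS = {"had", "a", "its", "was", "as"}

def distance_graph(text, k, stop_words=STOP_WORDS):
    """Return {(i, j): weight} for the order-k distance graph of `text`."""
    # Build D': lowercase, strip punctuation, drop stop words, keep word order.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pruned = [w for w in words if w and w not in stop_words]
    arcs = Counter()
    for pos, word in enumerate(pruned):
        # Distance 0 yields the self-loop; distances 1..k look ahead in D'.
        for d in range(k + 1):
            if pos + d < len(pruned):
                arcs[(word, pruned[pos + d])] += 1
    return arcs

RHYME = ("Mary had a little lamb, little lamb, little lamb, "
         "Mary had a little lamb, its fleece was white as snow.")

g0 = distance_graph(RHYME, 0)
g1 = distance_graph(RHYME, 1)
g2 = distance_graph(RHYME, 2)
print(g1[("little", "lamb")], g2[("lamb", "little")])
```

On the rhyme, order 0 has only the six self-loops, order 1 gives the arc little→lamb weight 4, and order 2 raises lamb→little from 2 to 3, which matches "adds arcs and increases weights".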

  • Distance graphs: properties
    Sparsity: let f(D) be the number of meaningful words in D counting repetitions, and n(D) the number of distinct words in D, which is also the number of nodes |N(C)| of the graph. Then n(D)·(k+1) − k·(k−1)/2 ≤ |A(D, k)| ≤ f(D)·(k+1). Proof in the paper.
    Planarity for documents containing only distinct words: distance graphs of order at most 2 of such documents are planar graphs.
    Monotonicity: if D1 is a subsegment of D2, then G(C, D1, k) is a subgraph of G(C, D2, k). Proof in the paper.
    Note: the converse does not always hold. G(C, D1, k) being a subgraph of G(C, D2, k) does not imply that D1 is a subsegment of D2, although it often does because of the structural complexity that the distance graph captures.
    Extremely useful for retrieval by exact text segments. Graph-based information retrieval determines a superset of the documents sought, which is more efficient than a vector-space representation indexed by keywords.
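Monotonicity, and the failure of its converse, can be checked on small inputs. The sketch below assumes documents are already pruned word lists (D'), and reads "subgraph" as "every arc present with at least the same weight"; the helper names are hypothetical.

```python
from collections import Counter

def distance_graph(words, k):
    """Order-k distance graph of an already pruned word list D'."""
    arcs = Counter()
    for pos, w in enumerate(words):
        for d in range(k + 1):
            if pos + d < len(words):
                arcs[(w, words[pos + d])] += 1
    return arcs

def is_subgraph(small, big):
    # Every arc of `small` must occur in `big` with at least the same weight.
    return all(big[arc] >= weight for arc, weight in small.items())

def is_subsegment(small, big):
    # True when `small` occurs as a contiguous run inside `big`.
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

d2 = ("mary little lamb little lamb little lamb "
      "mary little lamb fleece white snow").split()
d1 = d2[7:]  # contiguous subsegment: mary little lamb fleece white snow

monotone = is_subgraph(distance_graph(d1, 2), distance_graph(d2, 2))

# Counterexample to the converse: a repeated word lets the graph of
# "c a b" fit inside the graph of "a b c a" although it is no subsegment.
big, small = "a b c a".split(), "c a b".split()
converse_fails = (is_subgraph(distance_graph(small, 1), distance_graph(big, 1))
                  and not is_subsegment(small, big))
print(monotone, converse_fails)
```

The counterexample relies on the repeated word "a", which is the usual reason the converse breaks.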

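  • Distance graphs: properties
    Preservation of common segments: if D1 and D2 share a common string F, then G(C, D1, k) and G(C, D2, k) share the subgraph G(C, F, k). This follows directly from monotonicity.
    Searching for documents that contain a passage about a topic.
    Assumption: a topic is characterized by a set S of m connected keywords; build a bidirectional directed clique containing these nodes (words).
    Bidirectional directed clique: every pair of nodes has arcs in both directions (a complete graph), and every node of the clique has a single-node cycle (self-loop).
    The combined frequency of the arc-wise intersection of the clique with the graph G(C, D, k) tells how many times the corresponding keywords occur together in D, that is, the local behavior of the topic.
    Occurrence property of the bidirectional clique: let F1 be a bidirectional clique with m nodes and D a document in C. Let E be the arc-wise intersection of the arcs of G(C, D, k) with those contained in F1. Let q be the sum of the frequencies of the arcs in E. Then q is the number of times the keywords at the nodes of F1 occur within a distance of at most k of one another in the document.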
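The quantity q from the clique property can be computed directly. A minimal sketch, assuming a pruned word list as input; the names `clique_arcs` and `clique_frequency` are hypothetical.

```python
from collections import Counter
from itertools import product

def distance_graph(words, k):
    """Order-k distance graph of an already pruned word list D'."""
    arcs = Counter()
    for pos, w in enumerate(words):
        for d in range(k + 1):
            if pos + d < len(words):
                arcs[(w, words[pos + d])] += 1
    return arcs

def clique_arcs(keywords):
    # Bidirectional directed clique: every ordered pair, self-loops included.
    return set(product(keywords, repeat=2))

def clique_frequency(words, keywords, k):
    """q: total frequency of the document's arcs that lie inside the clique."""
    wanted = clique_arcs(keywords)
    graph = distance_graph(words, k)
    return sum(weight for arc, weight in graph.items() if arc in wanted)

doc = ("mary little lamb little lamb little lamb "
       "mary little lamb fleece white snow").split()
q = clique_frequency(doc, {"mary", "lamb"}, k=2)
print(q)
```

For the rhyme with keywords {mary, lamb} and k = 2 the arcs inside the clique are mary→mary (2), lamb→lamb (6), mary→lamb (2) and lamb→mary (1), so q = 11.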


  • Distance graphs: identifying different topics
    Identify passages related to different topics.
    S1, S2: keyword sets corresponding to the different topics.
    F1, F2: the two cliques corresponding to S1 and S2.
    Let F12 be the clique containing the nodes of S1 ∪ S2.
    Let E1(D), E2(D), E12(D) be the arc-wise intersections of G(C, D, k) with F1, F2, F12. E12(D) is a superset of the arcs of E1(D) ∪ E2(D).
    Topics are local when the frequencies of the arcs in E1(D) and E2(D) are large while the frequencies of the arcs in E12(D) − (E1(D) ∪ E2(D)) are small.
    Topic locality problem: find the documents D for which the arc frequency of E1(D) ∪ E2(D) is greater than s1 and the arc frequency in E12(D) − (E1(D) ∪ E2(D)) is less than s2.
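The topic locality test can be sketched as a filter over documents. The two toy documents, the keyword sets, and the thresholds `s1` and `s2` below are invented purely for illustration, and "arc frequency" is read as the sum of arc weights.

```python
from collections import Counter
from itertools import product

def distance_graph(words, k):
    """Order-k distance graph of an already pruned word list D'."""
    arcs = Counter()
    for pos, w in enumerate(words):
        for d in range(k + 1):
            if pos + d < len(words):
                arcs[(w, words[pos + d])] += 1
    return arcs

def locality(words, s1_keys, s2_keys, k):
    """Return (within, cross): weights inside F1 or F2 vs. only inside F12."""
    f1 = set(product(s1_keys, repeat=2))
    f2 = set(product(s2_keys, repeat=2))
    f12 = set(product(s1_keys | s2_keys, repeat=2))
    graph = distance_graph(words, k)
    within = sum(w for arc, w in graph.items() if arc in f1 | f2)
    cross = sum(w for arc, w in graph.items() if arc in f12 - (f1 | f2))
    return within, cross

def topics_are_local(words, s1_keys, s2_keys, k, s1, s2):
    within, cross = locality(words, s1_keys, s2_keys, k)
    return within > s1 and cross < s2

PETS, FINANCE = {"cat", "dog"}, {"stock", "bond"}
separated = "cat dog cat dog stock bond stock bond".split()
interleaved = "cat stock dog bond cat stock dog bond".split()

print(locality(separated, PETS, FINANCE, 1))    # topics in separate passages
print(locality(interleaved, PETS, FINANCE, 1))  # topics mixed together
```

With k = 1 the separated document scores (14, 1) and the interleaved one (8, 7), so thresholds such as s1 = 10 and s2 = 3 accept only the first.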


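  • Distance graphs: the undirected variant
    Definition: the undirected distance graph of order k of a document D over C is the graph G(C, D, k) = (N(D), A(D, k)), where:
    N(D) is the same as in the directed case;
    A(D, k) is the arc set as in the directed case, but taken in both directions (forward and backward).
    Example: the undirected distance graph of order 2 of the document from the previous example.
    The undirected distance graph is obtained by turning directed arcs into undirected ones.
    The undirected graph keeps the distance information and discards the ordering information.
    Applications of the undirected distance graph are not discussed, but (i) it is easy to implement, which is convenient for data mining; (ii)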
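One way to obtain the undirected variant is to merge each pair of opposite arcs into a single edge. The sketch below assumes the merged edge carries the sum of the two directed weights, which is one reading of "taken in both directions"; the function names are hypothetical.

```python
from collections import Counter

def distance_graph(words, k):
    """Directed order-k distance graph of an already pruned word list D'."""
    arcs = Counter()
    for pos, w in enumerate(words):
        for d in range(k + 1):
            if pos + d < len(words):
                arcs[(w, words[pos + d])] += 1
    return arcs

def undirected_distance_graph(words, k):
    """Merge arcs (i, j) and (j, i) into one edge keyed by the sorted pair."""
    edges = Counter()
    for (i, j), weight in distance_graph(words, k).items():
        edges[tuple(sorted((i, j)))] += weight
    return edges

doc = ("mary little lamb little lamb little lamb "
       "mary little lamb fleece white snow").split()
u2 = undirected_distance_graph(doc, 2)
print(u2[("lamb", "little")], u2[("lamb", "mary")])
```

For the rhyme at order 2 the edge between "little" and "lamb" gets weight 4 + 3 = 7: the order of the two words is forgotten, but their closeness is kept.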

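  • Distance graphs: data mining applications
    Two ways to apply them:
    Existing techniques, with the bag-of-words representation replaced by the distance graph representation: easy to implement.
    Use for data mining and structure management: easier interaction with structural mining methods.
    Computational complexity: the distance graph is about 4-5 times the size of the existing representation. This can slow processing down, but not too severely.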

  • Distance graphs: data mining applications
    Clustering: iterative or hierarchical clustering algorithms; seed-based algorithms; the EM algorithm.
    Classification: naive Bayes classification; k-nearest-neighbor or centroid classification; rule-based classification.
    Indexing and retrieval of entire structural fragments: exact search (discussed); approximate search.
    Frequent subgraph search.
    Plagiarism detection: GA, GB are the distance graphs of two documents; MCG(GA, GB) is the maximum common subgraph of the two documents.
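Because every node of a distance graph carries a distinct word as its label, a common subgraph under label-preserving matching reduces to an arc-wise intersection. The sketch below uses that simplification as a stand-in for MCG(GA, GB); the overlap score and the two sample sentences are assumptions for illustration, not a measure taken from the slides.

```python
from collections import Counter

def distance_graph(words, k):
    """Order-k distance graph of an already pruned word list D'."""
    arcs = Counter()
    for pos, w in enumerate(words):
        for d in range(k + 1):
            if pos + d < len(words):
                arcs[(w, words[pos + d])] += 1
    return arcs

def common_subgraph(ga, gb):
    # Counter intersection keeps each shared arc with the smaller weight.
    return ga & gb

def overlap_score(ga, gb):
    """Shared arc weight relative to the smaller graph (hypothetical score)."""
    shared = sum(common_subgraph(ga, gb).values())
    return shared / min(sum(ga.values()), sum(gb.values()))

doc_a = "graph mining finds frequent patterns".split()
doc_b = "graph mining finds rare patterns quickly".split()
ga, gb = distance_graph(doc_a, 1), distance_graph(doc_b, 1)

shared_weight = sum(common_subgraph(ga, gb).values())
score = overlap_score(ga, gb)
print(shared_weight, round(score, 3))
```

A high score flags a pair of documents for closer inspection; any threshold would have to be tuned on real data.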

  • Distance graphs: some discussion
    Distance: why is it computed after removing stop words? Should the distance perhaps be computed with the stop words kept in place?
    Application to pattern search in event logs: actions are the keywords; build the distance graph. Sequential patterns: clustering. Ordered patterns: frequent subgraph discovery.
    Application to text processing problems. Text summarization: represent sentences and documents as distance graphs, compute importance and the similarity of two sentences. Replace indexed nodes with topics.
    Application to multi-label, multi-instance text classification: represent documents as distance graphs; apply the topic locality property.

  • Application to pattern mining from event logs
    Two challenges of process mining:

    C2. Dealing with complex event logs having diverse characteristics
    C4. Dealing with concept drift ~ dealing with very large event logs

    Some research references

    [Aalst13] Wil M. P. van der Aalst (2013). A General Divide and Conquer Approach for Process Mining. FedCSIS 2013: 1-10.
    [BA12a] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst (2012). Process diagnostics using trace alignment: Opportunities, issues, and challenges. Inf. Syst. 37(2): 117-141.
    [BAZP11] R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre Zliobaite and Mykola Pechenizkiy (2011). Handling Concept Drift in Process Mining. CAiSE 2011: 391-405.
    [Bose12] R. P. Jagadeesh Chandra Bose (2012). Process Mining in the Large: Preprocessing, Discovery, and Diagnostics. PhD Thesis, Eindhoven University of Technology, The Netherlands.
    [Manifesto12] Wil van der Aalst et al. (2012). Process Mining Manifesto. BPM 2011 Workshops (Part I, LNBIP 99), pp. 169-194.

  • Pattern mining: event abstraction
    Abstractions of Events: the event data inside process traces is too specific and/or sits at several levels of abstraction. A string of concrete actions is mapped to an action that is closer to the process.

    [Bose12]

  • Pattern mining: trace clustering
    Trace Clustering: group traces that share similar characteristics.

    [Bose12]

  • Pattern mining: process evolution
    Concept Drift: the process changes over time; business processes have different life cycles.

    [Bose12]

  • Business process abstraction

    [Smir11] Sergey Smirnov (2011). Business Process Model Abstraction. PhD Thesis, The University of Potsdam.

  • 2. Normalized Google distance and applications
    Related papers:
    Rudi Cilibrasi, Paul M. B. Vitányi (2004). The Google Similarity Distance Automatic Meaning Discovery Using Google. CoRR abs/cs/0412098.
    Rudi Cilibrasi, Paul M. B. Vitányi (2007). The Google Similarity Distance. IEEE Trans. Knowl. Data Eng. 19(3): 370-383. 1036 citations in Google Scholar.
    Paul M. B. Vitányi (2012). Information Distance: New Developments. CoRR abs/1201.1221.
    Andrew R. Cohen, Paul M. B. Vitányi (2013). Normalized Google Distance of Multisets with Applications. CoRR abs/1308.3177.
    Authors:
    Paul M. B. Vitányi: DBLP lists 76 journal papers, 69 conference papers, 69 reports. http://www.informatik.uni-trier.de/~ley/pers/hd/v/Vit=aacute=nyi:Paul_M=_B=.html
    Rudi Cilibrasi: 4 conference papers, 6 conference papers, 9 reports. http://www.informatik.uni-trier.de/~ley/pers/hd/c/Cilibrasi:Rudi.html


  • Normalized Google distance
    Reasoning:
    Objects can be given literally: the literal ACGT genome of a mouse, or the literal text of War and Peace by Lev Tolstoy.
    Objects can be given by name: "the ACGT genome of a mouse" or "the text of War and Peace by Lev Tolstoy".
    Some objects are known only by name, such as "home" or "red", where the letters themselves say nothing.
    Use domain knowledge to obtain an indirect similarity measure. This is common, for example at TAC: the two tracks of TAC 2014 (http://www.nist.gov/tac/) are Knowledge Base Population (KBP) and Biomedical Summarization (BiomedSumm).
    Normalized information distance
    Information distance E(x, y) of two strings x and y: K(x), K(y), K(x, y) are Kolmogorov complexities, that is, the bit lengths of the shortest programs that produce the strings x, y, and xy. E(x, y) is a true distance: it has the three properties.
    Normalized information distance:
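The formulas of this slide did not survive extraction. In the cited Cilibrasi and Vitányi (2007) paper the two distances take the following form, up to additive logarithmic terms. The abbreviation NID is the one used in that literature and is an assumption here, since the slide names the distance only in words; the "three properties" are presumably the metric axioms (identity, symmetry, triangle inequality).

```latex
E(x, y) = K(x, y) - \min\{K(x), K(y)\}

\mathrm{NID}(x, y) = \frac{K(x, y) - \min\{K(x), K(y)\}}{\max\{K(x), K(y)\}}
```

Dividing by the larger of the two complexities makes the distance relative: two long strings that differ by a few bits come out close, while two short strings that differ by the same number of bits come out far apart.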

  • Normalized Google distance
    Normalized compression distance
    The normalized information distance is not computable (uncomputable). Use an existing data compression program in place of K.
    Given a compressor C: C(x) is the compressed length of x.
    Normalized compression distance

    Normalized Google distance

    G(x), G(x, y) are the Google codes of x and (x, y); x = {web pages containing the string x}; xy = {web pages containing both strings}

    Google code
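The three formulas of this slide were images and are missing from the transcript. As given in the cited Cilibrasi and Vitányi (2007) paper, NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}, the Google code is G(x) = log(1/g(x)) with g(x) = f(x)/N, and NGD(x, y) = (G(x, y) − min{G(x), G(y)}) / max{G(x), G(y)}, which simplifies to (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}). Here f(x) is the number of pages containing x and N is a normalizing page count; both symbols are assumptions with respect to the slide text. A minimal sketch follows, using `zlib` as a stand-in compressor and page counts chosen to match the horse/rider example of that paper; treat the counts as illustrative.

```python
import math
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib standing in for C."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance from page counts f(x), f(y), f(x, y), N."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Illustrative counts (assumed): pages with "horse", with "rider", with both,
# and the total number of indexed pages.
horse_rider = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)

repetitive = b"abc" * 200
varied = bytes(range(256)) * 3
same = ncd(repetitive, repetitive)   # a string against itself
different = ncd(repetitive, varied)  # two unrelated strings
print(round(horse_rider, 3))
```

With these counts the distance between "horse" and "rider" comes out near 0.443. A real compressor only approximates K, so NCD of a string with itself is small but not exactly zero.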


  • THANK YOU. KT-SISLAB