Fatemeh Lashkari UNB University May 7 th 2014. 2 Indexing Semantic Search Semantic Search...
Indexing and Retrieval Semantic Search
Fatemeh Lashkari, UNB University
May 7th 2014
Outline
• Indexing
• Semantic Search
• Semantic Search Architecture
• Index Process
• Index Maintenance
Indexing
• Inverted Index
  • Sort-based inversion
  • Single-pass in-memory inversion
• HYB Index
  • Prefix search
  • Autocompletion search
  • Query expansion and faceted search
  • Fast error-tolerant search
  • Support for database-style "select" and "join"
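As a rough illustration of sort-based inversion and prefix search, here is a minimal Python sketch. It assumes whitespace tokenization and a toy in-memory collection, and it answers prefix queries from a plain sorted vocabulary; it does not reproduce the block layout of the HYB index. The names `sort_based_inversion`, `prefix_search`, and `docs` are made up for this example.

```python
from bisect import bisect_left
from itertools import groupby


def sort_based_inversion(docs):
    """Build an inverted index by sorting (term, doc_id, position) triples."""
    triples = [
        (term, doc_id, pos)
        for doc_id, text in docs.items()
        for pos, term in enumerate(text.lower().split())
    ]
    triples.sort()  # the sorting step that gives the method its name
    # After sorting, all triples of one term are adjacent and can be grouped.
    return {
        term: [(doc_id, pos) for _, doc_id, pos in group]
        for term, group in groupby(triples, key=lambda t: t[0])
    }


def prefix_search(index, prefix):
    """Return the sorted doc ids of all terms that start with `prefix`."""
    vocab = sorted(index)
    start = bisect_left(vocab, prefix)  # first term >= prefix
    doc_ids = set()
    for term in vocab[start:]:
        if not term.startswith(prefix):
            break  # matching terms form one contiguous range
        doc_ids.update(doc_id for doc_id, _ in index[term])
    return sorted(doc_ids)


# Hypothetical toy collection: doc id -> text
docs = {1: "semantic search index", 2: "search engine", 3: "semantic web"}
index = sort_based_inversion(docs)
```

Calling `prefix_search(index, "sem")` on this toy collection walks only the vocabulary range that starts with "sem", which is the same idea an autocompletion search relies on.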
Semantic Search
• Query: "astronauts walk on moon"
• Demo: http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/
Semantic Search Architecture
[Figure: architecture diagram with the components Ontology, Text Collection, Indexing, Query Process, and Answers of the question]
Parsing
• Preprocessing
  • Stemming
  • Lowercasing: General Motors → general motors
  • Removing some stop words, e.g. is, do, a, of, ...
• Text annotation
  • Annotators
  • Machine learning approaches
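The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration: the stop-word set contains just the examples from the slide, and `naive_stem` is a made-up suffix stripper standing in for a real stemming algorithm.

```python
import re

# Only the stop words given as examples on the slide; a real list is longer.
STOP_WORDS = {"is", "do", "a", "of"}


def naive_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

With these assumptions, `preprocess("General Motors is a maker of cars")` lowercases the company name, drops "is", "a", and "of", and stems the plural forms.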
Index Structure
A fast and efficient index does not
• need the whole vocabulary of the indexed collection in main memory
• need to sort postings
• need to merge postings
It should also use the cache efficiently.
Building Index (Tasks to Decide)
• How many indexes do we need?
  • Index for relations
  • Index for text
• What is the structure of the vocabulary?
• What is the structure of a posting?
• What statistical information does a posting contain? e.g. <docId, position, score, entity>
  apple: <6, 10, 0.3, class: fruit> <4, 2, 0.9, class: company>
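One possible way to represent the example posting `<docId, position, score, entity>` is sketched below in Python. The field names, the `Posting` class, and the `lookup` helper are assumptions made for this illustration; the slide only gives the tuple layout and the two "apple" postings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int
    position: int
    score: float
    entity: str  # semantic annotation, e.g. "class: fruit"


# Vocabulary term -> list of postings, filled with the slide's example.
index = defaultdict(list)
index["apple"].append(Posting(6, 10, 0.3, "class: fruit"))
index["apple"].append(Posting(4, 2, 0.9, "class: company"))


def lookup(term, entity=None):
    """Return postings for `term`, best score first, optionally filtered by entity."""
    postings = index.get(term, [])
    if entity is not None:
        postings = [p for p in postings if p.entity == entity]
    return sorted(postings, key=lambda p: p.score, reverse=True)
```

Keeping the entity class inside each posting lets a query such as `lookup("apple", "class: fruit")` separate the fruit from the company without a second index, at the cost of larger postings.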
Building Index (Tasks to Decide)
• How to compute the score to improve the final result?
• How to store the index?
  • Distribute the index
  • Process queries in parallel
• Which compression methods can be used?
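The slide does not name a compression method. As one common option, the sketch below stores the gaps between sorted document ids and encodes them with variable-byte coding. The function names are made up for this example, and the choice of method is an assumption, not the approach of the talk.

```python
def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte; the high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()     # most significant 7-bit group first
        chunk[-1] |= 0x80   # terminator flag on the final byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_doc_ids(doc_ids):
    """Store gaps between sorted doc ids, then variable-byte encode the gaps."""
    gaps = [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]
    return vbyte_encode(gaps)


def decompress_doc_ids(data):
    """Rebuild the doc ids by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

Gaps in a dense posting list are small numbers, so most of them fit in a single byte instead of a fixed four or eight.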
Index Maintenance
Strategies for maintaining the index:
• Merge-based (remerge)
• In-place
• Hybrid index update operation
• Geometric partitioning
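To make the last strategy more concrete, here is a simplified sketch of geometric partitioning in Python. It keeps everything in memory and treats a posting as any sortable value; the class name `GeometricIndex` and the parameters `buffer_size` and `r` are assumptions for this illustration, and a real system would hold the partitions on disk.

```python
class GeometricIndex:
    """Keep index partitions whose capacities grow by a factor `r` per level."""

    def __init__(self, buffer_size=2, r=2):
        self.buffer_size = buffer_size
        self.r = r
        self.buffer = []      # in-memory postings not yet flushed
        self.partitions = []  # partitions[k] holds at most buffer_size * r**k postings

    def add(self, posting):
        self.buffer.append(posting)
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self):
        carry, self.buffer = sorted(self.buffer), []
        level = 0
        while True:
            if level == len(self.partitions):
                self.partitions.append([])
            merged = sorted(self.partitions[level] + carry)
            if len(merged) <= self.buffer_size * self.r ** level:
                self.partitions[level] = merged
                return
            # Overflow: empty this level and push the merged run to the next one.
            self.partitions[level] = []
            carry = merged
            level += 1

    def postings(self):
        """All postings, merged across the partitions and the buffer."""
        runs = self.partitions + [self.buffer]
        return sorted(p for run in runs for p in run)
```

Because capacities grow geometrically, most flushes merge only with the small partitions, so large partitions are rewritten rarely. This sits between a full remerge on every update and purely in-place updates.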
![Page 17: Fatemeh Lashkari UNB University May 7 th 2014. 2 Indexing Semantic Search Semantic Search Architecture Index process Index Maintenance.](https://reader036.fdocuments.net/reader036/viewer/2022082517/56649ddf5503460f94ad909f/html5/thumbnails/17.jpg)
17
Thank You
![Page 18: Fatemeh Lashkari UNB University May 7 th 2014. 2 Indexing Semantic Search Semantic Search Architecture Index process Index Maintenance.](https://reader036.fdocuments.net/reader036/viewer/2022082517/56649ddf5503460f94ad909f/html5/thumbnails/18.jpg)
18