Using Lucene for Search within XIS

XIS Lucene Indexing and Search

description

Allex Lyons, a programmer at Access Innovations, Inc., discusses the company's decision to search XIS docsets with a faster, more reliable, and more efficient Lucene index instead of a random access file.

Transcript of Using Lucene for Search within XIS

Page 1: Using Lucene for Search within XIS

XIS Lucene Indexing and Search

Page 2: Using Lucene for Search within XIS

What is XIS?
XIS is an XML schema-based database system used to store user data
All records are stored in individual XML files
Option to zip XML files available with XIS Project DTD

Page 3: Using Lucene for Search within XIS

How XIS Data Is Stored
Docsets
  Stores records with multiple fields (similar to SQL Table)
  Can also have subfields and lists of field values nested within a record
  Can look up values from other fields in other Docsets or other tables
Tables
  Stores a single list of values
  Can be referenced by other Docsets
  Can be directly accessible for editing or kept hidden from user view

Page 4: Using Lucene for Search within XIS

How to Create a XIS Project
Create DTD file for XIS project
  Specify MAI Thesaurus to link to project
  Create Docset and Tables
  Specify ID lengths for each Docset
  Create fields for Docsets
Save DTD to dhserver/projects/projects/xml folder
Create XIS Project folder under dhserver/data
Create subfolders for each Docset under XIS Project folder as well as Tables directory
XIS Projects can only be created by administrators
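The folder steps on this slide could be scripted roughly as follows. This is only a sketch: the project name, the docset names, and the exact spelling of the "Tables" directory are assumptions made for illustration, and it runs against a temporary directory rather than a real Data Harmony server.

```python
import tempfile
from pathlib import Path

def create_project_folders(dhserver, project, docsets):
    """Create <dhserver>/data/<project> with one subfolder per docset plus a tables folder."""
    project_dir = Path(dhserver) / "data" / project
    for docset in docsets:
        (project_dir / docset).mkdir(parents=True, exist_ok=True)
    # Directory name "Tables" is assumed; the slide only says "Tables directory".
    (project_dir / "Tables").mkdir(parents=True, exist_ok=True)
    return project_dir

# Hypothetical project and docset names, created under a throwaway root.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = create_project_folders(
        Path(tmp) / "dhserver", "DemoProject", ["Articles", "Authors"]
    )
    created = sorted(p.name for p in project_dir.iterdir())

print(created)
```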

Page 5: Using Lucene for Search within XIS

Starting a XIS Project
Start Data Harmony server where project is located
Log in to Admin module
  Start MAI Thesaurus
  Start XIS Project
  Index XIS Project, especially if just created
Run startXis program
  Enter server, port, thesaurus, username, and password to log in

Page 6: Using Lucene for Search within XIS

Indexing a XIS Project

Page 7: Using Lucene for Search within XIS

XIS Login Screen

Page 8: Using Lucene for Search within XIS

XIS Project View

Page 9: Using Lucene for Search within XIS

XIS Docset View

Page 10: Using Lucene for Search within XIS

XIS Table View

Page 11: Using Lucene for Search within XIS

XIS Record Format
Saved in XML file
Starts with tag to represent Docset name along with ID as attribute
Fields are listed within Docset tag along with values
Subfields are nested within their parent fields
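To make the record layout concrete, here is a small sketch of what such a file might look like and how it could be read. The docset name "Article", the ID value, and every field name are invented for illustration; only the overall shape (docset tag with an ID attribute, fields inside it, subfields nested in their parent) follows the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; names and values are not taken from a real XIS project.
RECORD_XML = """
<Article ID="000123">
  <Title>Using Lucene for Search</Title>
  <Author>
    <FirstName>Pat</FirstName>
    <LastName>Example</LastName>
  </Author>
  <Keyword>indexing</Keyword>
  <Keyword>search</Keyword>
</Article>
"""

def flatten(element, prefix=""):
    """Yield (dotted field path, text value) pairs for every leaf field."""
    for child in element:
        path = f"{prefix}{child.tag}"
        if len(child):  # the field has subfields: descend into them
            yield from flatten(child, prefix=path + ".")
        else:
            yield path, (child.text or "").strip()

root = ET.fromstring(RECORD_XML.strip())
docset, record_id = root.tag, root.get("ID")  # docset name and ID attribute
fields = list(flatten(root))

print(docset, record_id)
for path, value in fields:
    print(f"{path} = {value}")
```

Repeated tags such as Keyword stand in for a list of field values; the flattened paths are the kind of field names an index could be keyed on.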

Page 12: Using Lucene for Search within XIS

XIS Search View

Page 13: Using Lucene for Search within XIS

XIS Search Results

Page 14: Using Lucene for Search within XIS

Current XIS Indexing and Search
Uses text-based indexes
Creates large number of index files (one for each field)
Generates temporary files for results
Uses less reliable RandomAccessFile search
Has limited number of search operands
Does not take into account numerical values
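The slide does not say how the current index handles numbers. Assuming values are compared as text, the sketch below shows why that matters: sorting and range checks on strings go character by character, so they disagree with numeric order. The sample values are made up.

```python
# Values as they would look if every field were indexed as plain text.
page_counts = ["9", "10", "100", "25"]

as_text = sorted(page_counts)               # lexicographic order
as_numbers = sorted(page_counts, key=int)   # numeric order

# Range "1 to 30" evaluated on text versus on numbers.
text_hits = [v for v in page_counts if "1" <= v <= "30"]
numeric_hits = [v for v in page_counts if 1 <= int(v) <= 30]

print(as_text)       # ['10', '100', '25', '9']
print(as_numbers)    # ['9', '10', '25', '100']
print(text_hits)     # ['10', '100', '25']
print(numeric_hits)  # ['9', '10', '25']
```

The text comparison wrongly drops 9 and wrongly keeps 100, which is the kind of result a numeric-aware index avoids.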

Page 15: Using Lucene for Search within XIS

Lucene vs. Current XIS Index
Fewer index files needed
Allows for broader searches
  Fuzzy matching
  Start and end wildcard searches
Recognizes numerical and date fields as such
Can be utilized to remove stopwords
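As a rough illustration of the two broader search types named above: fuzzy matching in Lucene is based on edit distance, and wildcards can sit at the start or end of a term. The sketch below is plain Python over a made-up term list, not Lucene's implementation or API.

```python
from fnmatch import fnmatchcase

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

# Hypothetical indexed terms.
terms = ["thesaurus", "taxonomy", "indexing", "reindex"]

# Fuzzy: accept terms within two edits of a misspelled query.
fuzzy = [t for t in terms if edit_distance("thesarus", t) <= 2]

# Wildcards: '*' at the end or at the start of the pattern.
end_wildcard = [t for t in terms if fnmatchcase(t, "index*")]
start_wildcard = [t for t in terms if fnmatchcase(t, "*index")]

print(fuzzy, end_wildcard, start_wildcard)
```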

Page 16: Using Lucene for Search within XIS

New Lucene Search Process
Establish index reader to perform search
Submit query string containing fields and parameters
Return results
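The three steps can be pictured with a toy in-memory inverted index. This is a sketch of the general idea only: it does not use Lucene, the record data and field names are invented, and the tiny `field:term AND field:term` query format is a simplification chosen for the example.

```python
from collections import defaultdict

# Hypothetical records keyed by record ID.
RECORDS = {
    "0001": {"title": "Lucene search basics", "author": "smith"},
    "0002": {"title": "Thesaurus search tips", "author": "smith"},
    "0003": {"title": "Lucene index tuning", "author": "jones"},
}

def build_index(records):
    """Map (field, term) -> set of record IDs: one shared index for all fields."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(rec_id)
    return index

def search(index, query):
    """Evaluate a query such as 'title:lucene AND author:smith'."""
    result = None
    for clause in query.split(" AND "):
        field, term = clause.split(":", 1)
        hits = index.get((field.strip(), term.strip().lower()), set())
        result = hits if result is None else result & hits
    return sorted(result or [])

index = build_index(RECORDS)                             # 1. have an index to read from
hits = search(index, "title:lucene AND author:smith")    # 2. submit the query string
print(hits)                                              # 3. return results
```

Note that all fields share one index structure here, in contrast to one index file per field.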

Page 17: Using Lucene for Search within XIS

Other Lucene Functions
Will be used for adding, updating, and deleting XIS records
Indexes will be housed on Data Harmony server

Page 18: Using Lucene for Search within XIS

Any Questions?