Solr
Uploaded by: peter-svehla
![Page 1: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/1.jpg)
Solr
![Page 2: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/2.jpg)
What is it?
• Text search index (engine)
• Open source
• Not a search product
• A tool that allows you to create a search solution
![Page 3: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/3.jpg)
What is it like?
• Google, Google Appliance
• FAST
• Oracle Secure Enterprise Search
• etc.
![Page 4: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/4.jpg)
Google Appliance:
• Sucks data in
• Can’t really configure
• Stuck with results
• Bonnet is locked
![Page 5: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/5.jpg)
Solr:
• You need to feed data in
• Highly configurable
• Search results can be tuned
• There is no bonnet
![Page 6: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/6.jpg)
Why am I doing a talk?
• Did a course
• LucidWorks content
• Presented by FindWise
• FindWise are a search specialist that use a range of search engines
![Page 7: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/7.jpg)
Caveats
• The course used Solr 4.1.0; we use 3.6.1 for APVMA
• Course focussed on search, not ingestion or presentation
• Java API recommended for ingestion
• ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects
![Page 8: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/8.jpg)
Where does Solr fit?
![Page 9: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/9.jpg)
Application Architecture
![Page 10: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/10.jpg)
Apache Tika
• Data import handler
• Used to be part of Lucene
• XML
• PDF
• Word
• Excel
• etc.
![Page 11: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/11.jpg)
ManifoldCF
• Apache
• Connector framework
• Used to connect to content repositories (source)
• SharePoint
• Documentum
• CMIS
• JDBC
• RSS
![Page 12: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/12.jpg)
Hydra
• FindWise
• Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup.
• Validation failure is inconvenient: the whole job fails.
• Feed in clean data.
• Use Hydra for cleanup.
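The "feed in clean data" advice can be sketched as a pre-flight check that runs before anything is sent to Solr. This is not Hydra and does not use its API; it is a minimal stand-in, assuming documents are plain dictionaries. The field names in `REQUIRED_FIELDS` are hypothetical placeholders for whatever the schema marks as required.

```python
# Hypothetical required fields; a real list would mirror the fields
# marked required="true" in schema.xml.
REQUIRED_FIELDS = ("id", "title")


def is_blank(value):
    """Treat None and whitespace-only values as missing."""
    return value is None or not str(value).strip()


def split_clean_and_rejected(docs, required=REQUIRED_FIELDS):
    """Separate documents that have every required field from those that do not.

    Returns (clean, rejected), where rejected is a list of
    (document, missing_field_names) pairs.
    """
    clean, rejected = [], []
    for doc in docs:
        missing = [name for name in required if is_blank(doc.get(name))]
        if missing:
            rejected.append((doc, missing))
        else:
            clean.append(doc)
    return clean, rejected


# Made-up sample documents for illustration only.
docs = [
    {"id": "1", "title": "Label approval"},
    {"id": "2", "title": "   "},      # blank title
    {"title": "No identifier"},       # missing id
]
clean, rejected = split_clean_and_rejected(docs)
print(len(clean), "clean;", len(rejected), "held back for cleanup")
```

Only the `clean` list would go on to indexing, so one bad record no longer fails the whole job; the `rejected` list records which fields need fixing upstream.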
![Page 13: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/13.jpg)
Apache ZooKeeper
• Used for SolrCloud
• Clustering and sharding
• Solr 4.x only (not available in 3.6.1)
• Side project for Hadoop
• Used to manage Hadoop clusters
![Page 14: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/14.jpg)
Inside
![Page 15: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/15.jpg)
General Approach
• Design schema
• Prototyping
• Integration
![Page 16: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/16.jpg)
Design Schema
• A data modelling exercise
• schema.xml
• Dynamic fields can be useful in the first pass:
<dynamicField name="*" type="string" indexed="true" />
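To show why a catch-all dynamic field helps in a first pass, here is a rough sketch of how a field name might be resolved: explicitly declared fields are checked first, then the dynamic pattern picks up everything else. It only approximates Solr's behaviour by using Python's `fnmatch` for the `*` glob. The explicit `id` field in the fragment is an assumption added for contrast, not part of the slide.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Schema fragment: the dynamicField line is from the slide; the explicit
# "id" field is an assumed addition for illustration.
SCHEMA_FRAGMENT = """
<fields>
  <field name="id" type="string" indexed="true" stored="true" />
  <dynamicField name="*" type="string" indexed="true" />
</fields>
"""


def resolve_field_type(field_name, fields_xml=SCHEMA_FRAGMENT):
    """Return (kind, type) for a field name, or None if nothing matches."""
    root = ET.fromstring(fields_xml)
    # Explicit declarations are checked before any dynamic pattern.
    for field in root.findall("field"):
        if field.get("name") == field_name:
            return ("explicit", field.get("type"))
    # Fall back to glob-style dynamic patterns.
    for dyn in root.findall("dynamicField"):
        if fnmatchcase(field_name, dyn.get("name")):
            return ("dynamic", dyn.get("type"))
    return None


print(resolve_field_type("id"))            # declared explicitly
print(resolve_field_type("product_name"))  # caught by the "*" pattern
```

With the catch-all in place, any field in the incoming data is accepted as a string, which lets you load real data first and tighten the schema afterwards.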
![Page 17: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/17.jpg)
Prototyping
• Get the data in (index)
• CSV, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘browse’ interface allows developer to understand how the search is working
• solrconfig.xml
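The "URL to search and inspect raw results" step can be sketched as below. The base URL is an assumption for a local default install; the host, port and path (for example a core name in the path) will differ between setups. The sample response is hand-written to follow the shape of Solr's JSON response writer and is not real output.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for a local install; adjust host, port and path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks for indented JSON results."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("indent", "true")]
    return base_url + "?" + urlencode(params)


def summarise(raw_json):
    """Pull the hit count and document ids out of a raw JSON response."""
    body = json.loads(raw_json)["response"]
    return body["numFound"], [doc.get("id") for doc in body["docs"]]


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [{"id": "1"}, {"id": "2"}]
  }
}
"""

print(build_search_url("title:solr"))
print(summarise(SAMPLE_RESPONSE))
```

Pasting the generated URL into a browser is often enough while prototyping; the parsing helper only matters once results need to be checked from a script.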
![Page 18: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/18.jpg)
Integration
• Not covered
• Content ingestion
• Presentation of results
• Up to you…
![Page 19: Solr](https://reader036.fdocuments.net/reader036/viewer/2022070321/558cdeb0d8b42a015a8b4626/html5/thumbnails/19.jpg)
Demo