Implementing Local Search with Apache Solr and Lucene
Grant Ingersoll
Lucid Imagination, Inc.
Topics
•Use Cases
•Concepts of Local Search
•Local Search support in Apache Solr
o Indexing
o Filtering
o Searching
o Faceting
o Sorting
•Demo
Use Cases
•Asset Management
•Social Networking
o Find all friends near me
•Targeted, local search results and ads
o “restaurants in Austin Texas”
o “Starbucks, 55313”
•Business Intelligence
o Restrict doc set for analysis by location
Spatial Search Concepts
•Spatial Data Types
o Points (latitude/longitude)
o Lines
o Shapes
•Maps and overlays
o Streets, POI
•Integration with unstructured text
o Metadata, descriptions, user reviews, etc.
http://www.openstreetmap.org/?lat=44.9744&lon=-93.2484&zoom=14&layers=B000FTFT
Application Needs
•Query Parsing
•Efficient distance calculations
o Euclidean, Great Circle (Haversine), Vincenty’s
•Filtering
o Bounding Box
•Sort by Distance
•Relevance Enhancement
•Faceting
•Advanced: shape intersections, routes
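Of the distance calculations listed above, the Great Circle (Haversine) distance can be sketched in a few lines of Python. This is an illustration of the formula only, not Solr's implementation; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin() out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Distance between two nearby points (coordinates chosen for illustration).
print(haversine_km(45.15, -93.85, 45.17614, -93.87341))
```

Euclidean distance on raw degrees is cheaper but ignores that a degree of longitude shrinks with latitude, which is why a great-circle measure is usually preferred for latitude/longitude points.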
State of Solr Spatial
•Native Field Types for Latitude/Longitude as well as n-dimensional Point
•Native support for:
o Filtering by distance
o Boosting by distance
o Sorting by distance
o Faceting by distance (sort of)
•Still needed:
o Pseudo Fields
o Query Parser support for geocoding
o Shapes
Configuration
•Schema
o <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
o <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
o <fieldtype name="geohash" class="solr.GeoHashField"/>
•Solrconfig:
o None!
Indexing
•Just like always:
<doc>
<field name="id">6H500F0</field>
<field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>
…
<field name="store">45.17614,-93.87341</field>
</doc>
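A document like the one above can also be generated programmatically. The sketch below builds the XML with Python's standard library; the helper names `build_add_xml` and `latlon` are hypothetical, and wrapping the `<doc>` in an `<add>` element (Solr's XML update message) goes beyond what the slide shows. Sending the result to an update handler is left out.

```python
import xml.etree.ElementTree as ET


def latlon(lat, lon):
    """Format a point as the "lat,lon" string used for the store field."""
    return f"{lat},{lon}"


def build_add_xml(fields):
    """Return an <add><doc>...</doc></add> message for one document.

    fields is a list of (name, value) pairs; ElementTree escapes the values.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_body = build_add_xml([
    ("id", "6H500F0"),
    ("store", latlon(45.17614, -93.87341)),
])
print(xml_body)
```

The location is just another field value, so nothing about the indexing call itself is spatial-specific.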
Distance Functions
•Most spatial operations (sorting, boosting, filtering, faceting) stem from the use of Solr’s built-in Function Query capability
o http://wiki.apache.org/solr/FunctionQuery
o dist(Power, pointA, pointB) – n-dimensional distance calculation
o sqedist(pointA, pointB) – Squared Euclidean
o hsin, ghhsin – Haversine (great circle) distance
o geodist – Hides the details of other distance measures
•Most people should just use geodist(), but others may want more control
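To make the n-dimensional distance and squared Euclidean measures concrete, here is a minimal Python sketch of the underlying math. It is not Solr's code: the function names are hypothetical, and the handling of power 0 (count of differing coordinates) and infinite power (largest coordinate difference) follows the usual norm conventions, which is an assumption here rather than something the slides state.

```python
import math


def p_norm_distance(power, a, b):
    """Distance between equal-length points a and b under the given power (p-norm)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if power == 0:
        # Assumed convention: number of coordinates that differ.
        return float(sum(1 for d in diffs if d != 0))
    if math.isinf(power):
        # Assumed convention: largest single-coordinate difference.
        return float(max(diffs, default=0.0))
    return sum(d ** power for d in diffs) ** (1.0 / power)


def squared_euclidean(a, b):
    """Euclidean distance without the final square root (cheaper, same ordering)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


print(p_norm_distance(1, (0, 0), (3, 4)))   # Manhattan distance
print(p_norm_distance(2, (0, 0), (3, 4)))   # Euclidean distance
print(squared_euclidean((0, 0), (3, 4)))    # squared Euclidean distance
```

Because squaring preserves ordering for non-negative values, the squared form is enough when distances are only compared or sorted.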
Filtering
•Accuracy matters!
•geofilt – Radius based filter
o &q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
o ...&q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5
•bbox – Bounding Box (less accurate)
o &q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=5
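Why a bounding box is less accurate than a radius filter can be shown with a small geometric sketch. It assumes distances in kilometers, a spherical Earth of radius 6371 km, and a simple box that ignores the poles and the date line; none of the helper names come from Solr. A point near a corner of the box passes the box test even though it lies farther from the center than the radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, d_km):
    """Approximate (min_lat, max_lat, min_lon, max_lon) around a circle of radius d_km."""
    dlat = math.degrees(d_km / EARTH_RADIUS_KM)
    dlon = math.degrees(d_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def in_box(box, lat, lon):
    """True if the point falls inside the latitude/longitude box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


center = (45.15, -93.85)
d_km = 5.0
box = bounding_box(*center, d_km)

# Hypothetical point 90% of the way toward the north-east corner of the box.
corner = (center[0] + 0.9 * (box[1] - center[0]),
          center[1] + 0.9 * (box[3] - center[1]))

print("inside box:", in_box(box, *corner))
print("inside radius:", haversine_km(*center, *corner) <= d_km)
```

The box test needs only range comparisons, which is the trade-off: cheaper to evaluate, but it admits points in the corners that a true radius filter would reject.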
Boosting and Sorting
•Increase the score of a document based on the distance:
o &q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score asc
•Sort based on Distance
o &q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist() asc
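Parameters such as `{!geofilt}` and `geodist() asc` contain characters that need URL encoding when sent from client code. The sketch below assembles the sort-by-distance request with Python's standard library; the function name `distance_sorted_url` is hypothetical, and the base URL assumes a local Solr instance at `http://localhost:8983/solr/select`.

```python
from urllib.parse import urlencode, parse_qs


def distance_sorted_url(sfield, lat, lon, d,
                        base="http://localhost:8983/solr/select"):
    """Build a URL that filters by distance and sorts the nearest documents first."""
    params = [
        ("q", "*:*"),
        ("fq", "{!geofilt}"),
        ("sfield", sfield),
        ("pt", f"{lat},{lon}"),
        ("d", str(d)),
        ("sort", "geodist() asc"),
    ]
    # urlencode percent-encodes braces, parentheses, and spaces for us.
    return base + "?" + urlencode(params)


url = distance_sorted_url("store", 45.15, -93.85, 50)
print(url)

# Decoding the query string recovers the original parameter values.
print(parse_qs(url.split("?", 1)[1])["sort"])
```

Keeping `sfield`, `pt`, and `d` as top-level parameters lets the filter and the sort function share one definition of the point and field.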
Faceting
•Use the FRange Functionality
o Not ideal, but works
o http://localhost:8983/solr/select?&q=*:*&sfield=store&pt=45.15,-93.85&facet.query={!frange l=0 u=5}geodist()&facet.query={!frange l=5.001 u=3000}geodist()&facet=true
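Writing one `facet.query` per distance bucket by hand gets tedious, so client code would typically generate them. The following is a minimal sketch under that assumption; the function name `distance_facet_params` is hypothetical, and the buckets are passed as explicit (lower, upper) pairs so the slide's `5.001` workaround for overlapping boundaries stays visible.

```python
from urllib.parse import urlencode


def distance_facet_params(ranges, sfield, lat, lon):
    """Return request parameters with one frange facet.query per (lower, upper) pair."""
    params = [
        ("q", "*:*"),
        ("sfield", sfield),
        ("pt", f"{lat},{lon}"),
        ("facet", "true"),
    ]
    for lower, upper in ranges:
        # Doubled braces emit literal { and } around the frange local params.
        params.append(("facet.query", f"{{!frange l={lower} u={upper}}}geodist()"))
    return params


params = distance_facet_params([(0, 5), (5.001, 3000)], "store", 45.15, -93.85)
for name, value in params:
    print(name, "=", value)

# Encoded form, ready to append after "select?".
print(urlencode(params))
```

Each bucket becomes its own query whose count is reported separately, which is what makes this approach workable but, as the slide says, not ideal.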
Resources
•http://wiki.apache.org/solr/SpatialSearch
•http://www.lucidimagination.com/search/?q=spatial
•https://www.ibm.com/developerworks/java/library/j-spatial/
o Outdated, but covers the concepts
•@gsingers