Rapid prototyping with solr - By Erik Hatcher
![Page 1: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/1.jpg)
Rapid Prototyping with Solr
Erik Hatcher, Lucid Imagination erik.hatcher @ lucidimagination.com, May 25, 2011
![Page 2: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/2.jpg)
Abstract
§ Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips on adjusting Solr's schema to better match your needs, and finally discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
![Page 3: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/3.jpg)
My Background
§ Erik Hatcher
  • Lucid Imagination
    § Technical Staff
  • Co-author
    § Java Development with Ant / Ant in Action (Manning)
    § Lucene in Action (Manning)
  • Apache Software Foundation
    § Committer – Lucene / Solr
    § PMC – Lucene TLP
    § Member
![Page 4: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/4.jpg)
Why prototype?
§ Demonstrate Solr can handle your data and searching needs; mitigate risk, learn the unknown
§ It’s quick and easy, with very little time investment
§ Immediate functional user interface impresses decision makers and target users; get buy-in
  • The user interface IS the app
![Page 5: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/5.jpg)
Prior Art
§ Hoss’ amazing ISFDB work
  • http://www.lucidimagination.com/blog/tag/isfdb/
§ Previous “Rapid Prototyping with Solr” presentations
  • Data.gov Catalog on Solr: http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/
  • Rich text files on Solr: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Rapid-Prototyping-Search-Applications-Solr
  • CSV (conference attendee data) on Solr: http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681
![Page 6: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/6.jpg)
Rapid Prototyping using CSV
§ Fired up Solr’s example configuration
§ /update/csv
  • http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
§ Tweak configuration
  • schema: domain-centric field names
  • solrconfig: /browse request handler
  • Template adjustments
§ Instant classic search results view, tree map visualization of facet data, and random selection of contest winners
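The /update/csv request above can also be assembled programmatically, which avoids hand-encoding spaces and colons in values such as the country mapping. The following is a minimal sketch, not the presenter's code: the helper name `csv_update_url` is hypothetical, the parameters are copied from the slide, and it assumes remote streaming (`stream.file`) is enabled in the target Solr configuration.

```ruby
require 'uri'

# Hypothetical helper (not part of Solr or solr-ruby): builds an
# /update/csv URL with form-encoded query parameters.
def csv_update_url(base, params)
  uri = URI.join(base, 'update/csv')   # base must end with a slash
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Parameters taken from the slide; the CSV file is the presenter's data set.
params = {
  'commit'        => 'true',
  'stream.file'   => 'EuroCon2010.csv',
  'fieldnames'    => 'first,last,company,title,country',
  'header'        => 'true',
  'f.country.map' => 'Great Britain:United Kingdom'
}

puts csv_update_url('http://localhost:8983/solr/', params)
# Sending the request needs a running Solr instance, for example:
#   require 'net/http'; Net::HTTP.get(URI(url))
```

The encoded form escapes the commas and the colon, which Solr decodes back to the same values as the hand-written URL.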
![Page 7: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/7.jpg)
CSV results
![Page 8: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/8.jpg)
… using rich text files
§ curl "http://localhost:8983/solr/update/extract?stream.file=/docs/file.pdf&literal.id=/docs/file.pdf"
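For more than one file, the same extract request can be generated per path. This is a rough sketch under stated assumptions: `extract_urls` and the two sample paths are hypothetical, each file's path is reused as its `literal.id` as on the slide, and a `commit=true` parameter is added only to the last request so the index is committed once.

```ruby
require 'uri'

# Hypothetical helper: one /update/extract URL per file path.
# The path doubles as the document id, mirroring the slide's curl call.
def extract_urls(base, paths)
  paths.each_with_index.map do |path, index|
    params = { 'stream.file' => path, 'literal.id' => path }
    # Commit once, on the final file, rather than after every document.
    params['commit'] = 'true' if index == paths.size - 1
    uri = URI.join(base, 'update/extract')
    uri.query = URI.encode_www_form(params)
    uri.to_s
  end
end

# Sample paths are placeholders, not files from the presentation.
extract_urls('http://localhost:8983/solr/', ['/docs/a.pdf', '/docs/b.pdf']).each do |url|
  puts url
end
```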
![Page 9: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/9.jpg)
… using Data.Gov catalog data
§ /update/csv – again!
![Page 10: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/10.jpg)
Explaining
![Page 11: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/11.jpg)
Suggest
![Page 12: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/12.jpg)
Venn Viz
![Page 13: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/13.jpg)
E-commerce data
§ http://bbyopen.com/
§ Product data, via easy HTTP JSON API
![Page 14: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/14.jpg)
Ingesting the data

    require 'solr'
    require 'json'
    #...
    1.upto(max_pages) do |page|
      puts "Processing page #{page}"
      json = fetch_page(page)

      response = JSON.parse(json, :symbolize_names=>true)
      puts "Total products: #{response[:total]}" if page == 1

      # Solr field name => source attribute, or a Proc that computes the value
      mapping = {
        :id => :sku,
        :name_t => :name,
        :thumbnail_s => :thumbnailImage,
        :url_s => :url,
        :type_s => :type,
        :category_s => Proc.new {|prod|
          prod[:categoryPath].collect {|cat| cat[:name]}.join(' >> ')},
        :department_s => :department,
        :class_s => :class,
        :subclass_s => :subclass,
        :sale_price_f => :salePrice
      }

      # Index this page of products, buffering 500 documents at a time
      Solr::Indexer.new(response[:products], mapping,
        {:debug => debug, :buffer_docs => 500}).index
    end
![Page 15: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/15.jpg)
solr-ruby’s secret power
§ Solr::Indexer.new(source, mapping, options).index
§ “Quacks like a duck”
§ source simply #each’s
§ mapping simply #[]’s
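The duck-typing idea can be illustrated without the library. The sketch below is not solr-ruby's implementation; `map_document` and `map_all` are hypothetical names, and it assumes the mapping is a Hash whose values are either a key to look up on the record or a callable that computes the value. The source only needs to respond to `each`, and each record only needs `[]`.

```ruby
# Illustrative only: NOT solr-ruby's code, just the duck-typing idea.
# A mapping value is either a key to look up on the record (via #[])
# or anything callable (such as a Proc) that derives the value.
def map_document(record, mapping)
  mapping.each_with_object({}) do |(field, rule), doc|
    value = rule.respond_to?(:call) ? rule.call(record) : record[rule]
    doc[field] = value unless value.nil?   # skip attributes the record lacks
  end
end

# The source can be an Array, a database cursor, a file reader:
# anything that responds to #each.
def map_all(source, mapping)
  docs = []
  source.each { |record| docs << map_document(record, mapping) }
  docs
end

# Sample record is made up for illustration.
products = [
  { :sku => 42, :name => 'Widget',
    :categoryPath => [{ :name => 'Home' }, { :name => 'Tools' }] }
]
mapping = {
  :id         => :sku,
  :name_t     => :name,
  :category_s => Proc.new { |prod| prod[:categoryPath].collect { |cat| cat[:name] }.join(' >> ') }
}

p map_all(products, mapping)
```

Because nothing here checks classes, the same mapping works whether records come from parsed JSON, CSV rows, or database results.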
![Page 16: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/16.jpg)
… on Prism
![Page 17: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/17.jpg)
What is Prism?
§ Yet another opinionated brainstorm from Erik
§ https://github.com/lucidimagination/Prism
§ Under the covers
  • Ruby
    § because it’s beautiful
  • Sinatra
    § to be lightweight and have elegant flexible routing
  • Velocity
    § because it is easy to learn and use, and has powerful features, facilitates edit/refresh work
§ Separate from Solr, Rack-savvy, allows easy coding of new routes and capabilities
§ Designed to work with any arbitrary Solr instance, and already has some basic LucidWorks Enterprise capability
§ Totally a proof-of-concept at this point – just a quick hack
![Page 18: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/18.jpg)
… on Solritas
![Page 19: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/19.jpg)
Solritas?
§ Pronounced: so-LAIR-uh-toss
§ Celeritas is a Latin word, translated as "swiftness" or "speed". It is often given as the origin of the symbol c, the universal notation for the speed of light - http://en.wikipedia.org/wiki/Celeritas
§ Technically it’s the VelocityResponseWriter (wt=velocity)
  • simply passes the Solr response through the Apache Velocity templating engine
§ http://wiki.apache.org/solr/VelocityResponseWriter
§ Built into Solr, available instantly out of the box at: http://localhost:8983/solr/browse
![Page 20: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/20.jpg)
… on Blacklight
![Page 21: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/21.jpg)
Blacklight?
§ http://projectblacklight.org/
§ Blacklight is a free and open source Ruby on Rails based discovery interface (a.k.a. “next-generation catalog”) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed.
§ Production sites:
  • http://search.lib.virginia.edu/
  • http://searchworks.stanford.edu/
§ Features:
  • Authentication
  • Saved searches
  • Bookmarks – saved result items
  • Selected items – for exporting to 3rd party systems
  • Customizable / extensible UI
![Page 22: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/22.jpg)
Prototyping Tips and Tools
§ Get data into Solr in the simplest possible way
  • CSV – if it fits, it’s really nice
§ Schema adjusting
  • <dynamicField name="*" type="string" multiValued="true"/>
  • <copyField source="*" dest="text"/>
§ Data analysis
  • Understand what Solr is doing with your fields
  • Solr’s Schema Browser and /admin/luke request handler
§ UI
  • /browse – easy tweaking of <solr-home>/conf/velocity/*.vm templates
![Page 23: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/23.jpg)
Now what?
§ Script the indexing process: full and incremental/delta
§ Work with real users on real needs
§ Integrate into production systems
§ Iterate on schema enhancements and configuration tweaks
§ Deploy to staging/production environments and work at scale: collection size, real queries and volume, hardware and JVM settings
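One way to script the full versus incremental/delta distinction is to select records by modification time. This is only a sketch: the record shape (a Hash with `:id` and `:updated_at`) and the helper name `records_to_index` are assumptions for illustration, and a real script would persist the last-run timestamp between runs.

```ruby
# Hypothetical record shape: { :id => ..., :updated_at => Time }.
# With no last_run, everything is selected (full index); otherwise
# only records changed since the previous run are selected (delta).
def records_to_index(records, last_run: nil)
  return records.to_a if last_run.nil?
  records.select { |record| record[:updated_at] > last_run }
end

# Made-up sample data.
records = [
  { :id => 1, :updated_at => Time.utc(2011, 4, 1) },
  { :id => 2, :updated_at => Time.utc(2011, 5, 20) }
]

full  = records_to_index(records)
delta = records_to_index(records, last_run: Time.utc(2011, 5, 1))
puts "full: #{full.size} records, delta: #{delta.size} records"
```

Either selection can then be handed to whatever indexing step the prototype already uses.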
![Page 24: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/24.jpg)
Test
§ Performance
§ Scalability
§ Relevance
§ Automate all of the above, start baselines, avoid regressions
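A relevance baseline can be as simple as a stored precision score per query, compared after every schema or configuration change. The sketch below swaps in precision-at-k as one possible measure; the slides do not prescribe a metric, and the helper names, document ids, and judgments are all hypothetical.

```ruby
# Fraction of the top k result ids that are judged relevant.
def precision_at_k(result_ids, relevant_ids, k)
  return 0.0 if k <= 0
  result_ids.first(k).count { |id| relevant_ids.include?(id) } / k.to_f
end

# Queries whose current score fell below the recorded baseline
# by more than the allowed tolerance.
def regressions(baseline, current, tolerance: 0.0)
  baseline.select { |query, score| current.fetch(query, 0.0) < score - tolerance }.keys
end

# Made-up judgments and result lists for two queries.
judged   = { 'solr' => %w[d1 d3], 'ant' => %w[d7] }
results  = { 'solr' => %w[d1 d2 d3 d4], 'ant' => %w[d8 d9 d7 d6] }
baseline = { 'solr' => 0.5, 'ant' => 0.5 }

current = results.each_with_object({}) do |(query, ids), scores|
  scores[query] = precision_at_k(ids, judged[query], 2)
end

puts "Regressed queries: #{regressions(baseline, current).inspect}"
```

Running a check like this in an automated build is one way to notice a relevance regression before users do.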
![Page 25: Rapid prototyping with solr - By Erik Hatcher](https://reader034.fdocuments.net/reader034/viewer/2022051323/547b54c2b479595e098b4dbf/html5/thumbnails/25.jpg)
Thanks!