Vagif Jalilov Rivet Logic
![Page 1: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/1.jpg)
Integrating Apache Solr with Alfresco WCM for Faceted Search and Navigation of Next-Generation Web Sites
Vagif Jalilov, Rivet Logic
![Page 2: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/2.jpg)
About Rivet Logic
• Award-winning professional services focused on:
  – Enterprise Content Management
  – Web Content Management
  – Collaboration and Social Communities
• Using Leading Open Source Software
![Page 3: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/3.jpg)
Business Case for Alfresco & Solr
• Large scale sites
• Need for real-time updates
• Full-text search
• Faceted search
![Page 4: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/4.jpg)
Technical Challenges for Search
• Accurately index each page
  – Solution: Assembly of relevant content to index
• Targeted, real-time indexing
  – Solution: Trigger indexing from publishing mechanism
![Page 5: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/5.jpg)
Possible Index Solutions
• Spidering/Crawling
  – Follow navigational & cross-links
  – Parse HTML and fetch relevant content
  – Spider full (or partial) site each time
• Real-time Indexing
  – Triggered by FSR deployment
  – Process only change-set (incremental updates)
  – Assemble relevant page content
![Page 6: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/6.jpg)
Typical Web Application
Source Control
• Source code & libs
• View templates
• Site navigation
• Web content
CMS (Alfresco)
• Binary Content
![Page 7: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/7.jpg)
“Managed” (Riveted) Web Application
Source Control
• Source code & libs
• (View templates)
CMS (Alfresco)
• Binary Content
• Web Content
• Site Navigation
• (View templates)
![Page 8: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/8.jpg)
Page Composition
Section-html.xml
Related-links.xml
Supporting-items.xml
Meta-content.xml
Page-metadata.xml
dynamic
dynamic
![Page 9: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/9.jpg)
Content Delivery
(http://crafterrivet.org)
![Page 10: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/10.jpg)
Alfresco WCM Lifecycle
![Page 11: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/11.jpg)
Indexing Architecture
![Page 12: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/12.jpg)
Solr Customizations
• Custom Solr
  – Schema.xml
    • Fields (Type, Indexed/Stored)
    • Unique key
  – Solrconfig.xml
    • “dismax” type request handler to define queried fields
    • ExtractingRequestHandler (indexing rich text docs)
![Page 13: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/13.jpg)
Custom Solr Schema
<fields>
  <field name="page_url" type="string" indexed="true" stored="true" required="true"/>
  <field name="page_title" type="text" indexed="true" stored="true"/>
  <field name="page_category" type="string" indexed="true" stored="true"/>
  <field name="page_type" type="string" indexed="true" stored="true"/>
  <field name="page_last_modified" type="date" indexed="true" stored="true"/>
  <field name="page_text" type="text" indexed="true" stored="true"/>
  <field name="page_file_size" type="int" indexed="false" stored="true"/>
</fields>
<uniqueKey>page_url</uniqueKey>
![Page 14: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/14.jpg)
ExtractingRequestHandler
<!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="fmap.content">page_text</str>
    <str name="fmap.title">page_title</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<dynamicField name="ignored_*" type="ignored"/>

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// Stream the rich text file to the extract handler, then commit.
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File(filePath));
SolrServer solrServer = new CommonsHttpSolrServer(solrServerUrl);
solrServer.request(up);
solrServer.commit();
![Page 15: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/15.jpg)
Custom RequestHandler
<!-- DisMaxRequestHandler allows easy searching across multiple fields
     for simple user-entered phrases. Its implementation is now just the
     standard SearchHandler with a default query type of "dismax".
     See http://wiki.apache.org/solr/DisMaxRequestHandler -->
<requestHandler name="solrDemoDismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">page_title^5.0 page_text^1.0</str>
  </lst>
</requestHandler>
![Page 16: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/16.jpg)
Compilation
• Compiler Engine processes all instructions
• Dispatches to appropriate Page Type Compiler
![Page 17: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/17.jpg)
Content Deployment & Solr Update
![Page 18: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/18.jpg)
Compiler Instructions
<updates deploy-root="/path/to/content/root">
  ...
  <update>/solutions/security/article.xml</update>
  <delete>/products/widget/top-section.xml</delete>
  ...
</updates>
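The change-set above could be read with the JDK's built-in DOM parser before being handed to the compiler engine. The following is only a sketch under assumptions: the class and method names (`ChangeSetReader`, `Instruction`, `read`, `deployRoot`) are hypothetical and are not taken from the presentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the <updates> change-set; not the presenter's code. */
final class ChangeSetReader {

    /** One instruction: the action ("update" or "delete") and the content path. */
    static final class Instruction {
        final String action;
        final String path;
        Instruction(String action, String path) { this.action = action; this.path = path; }
        @Override public String toString() { return action + " " + path; }
    }

    /** Returns the deploy-root attribute of the root element. */
    static String deployRoot(String xml) {
        return parse(xml).getDocumentElement().getAttribute("deploy-root");
    }

    /** Returns the update/delete instructions in document order. */
    static List<Instruction> read(String xml) {
        List<Instruction> out = new ArrayList<>();
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            String name = n.getNodeName();
            if (name.equals("update") || name.equals("delete")) {
                out.add(new Instruction(name, n.getTextContent().trim()));
            }
        }
        return out;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed change-set", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<updates deploy-root=\"/path/to/content/root\">"
                + "<update>/solutions/security/article.xml</update>"
                + "<delete>/products/widget/top-section.xml</delete>"
                + "</updates>";
        System.out.println(deployRoot(xml));
        for (Instruction i : read(xml)) System.out.println(i);
    }
}
```

Keeping the instructions in document order means a delete followed by an update of the same path would be processed in the order the deployment produced them.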
![Page 19: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/19.jpg)
Compilation Types
1. Web Pages (HTML)
2. Rich Text (PDF)
![Page 20: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/20.jpg)
Web Page Compilation & Indexing
Indexer Instructions
![Page 21: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/21.jpg)
HTML Indexer Instruction
<?xml version="1.0" encoding="ISO-8859-1"?>
<add>
  <doc>
    <field name="page_url">/solutions/content-mgmt/overview.html</field>
    <field name="page_title">Increase productivity and streamline workflow throughout the enterprise</field>
    <field name="page_description">Commercial enterprises and government agencies face significant challenges as they strive to meet a rapidly growing need to manage thousands ...</field>
    <field name="page_category">Solutions</field>
    <field name="page_type">Web Page</field>
    <field name="page_last_modified">2009-12-18T15:03:57Z</field>
    <field name="page_text">Rivet Logic addresses many of today's workplace challenges with Enterprise Content Management (ECM) solutions that enable organizations to transform traditional content repositories and static intranets into dynamic, collaborative work environments through open source functionality. Through ...</field>
  </doc>
</add>
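An indexer instruction of this shape could be generated from the compiled page fields with plain string building, as long as field values are XML-escaped. This is a minimal sketch, assuming the compiled page is available as an ordered map of field name to value; `SolrAddDocBuilder` and its methods are hypothetical names, not part of the presented system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for a Solr <add><doc> message; illustration only. */
final class SolrAddDocBuilder {

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Serializes the fields, in map iteration order, as a single-document add message. */
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("page_url", "/solutions/content-mgmt/overview.html");
        page.put("page_category", "Solutions");
        page.put("page_type", "Web Page");
        System.out.println(toAddXml(page));
    }
}
```

Escaping `&` first matters: doing it later would re-escape the ampersands introduced by the other replacements.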
![Page 22: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/22.jpg)
Rich Text Compilation & Indexing
![Page 23: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/23.jpg)
Rich Text Indexer Instruction
<?xml version="1.0" encoding="ISO-8859-1"?>
<add>
  <doc>
    <field name="page_file">/docroot/static/about-us/press-releases/2010/rl_crafter_studio.pdf</field>
    <field name="page_url">/about-us/press-releases/2010/rl_crafter_studio.pdf</field>
    <field name="page_title">Rivet Logic launches Crafter Studio for user friendly Web content authoring and publishing.</field>
    <field name="page_category">News</field>
    <field name="page_type">Press Release</field>
    <field name="page_last_modified">2007-12-19T08:00:00Z</field>
    <field name="page_file_size">135</field>
  </doc>
</add>
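One plausible way to attach these fields to a document sent to the extract handler is Solr Cell's `literal.<fieldname>` request parameters, which set field values alongside the extracted text. The sketch below only builds the parameter string. It assumes that `page_file` names the local file to stream and is therefore not sent as a field; that reading of the instruction, and the `ExtractParams` class itself, are assumptions rather than something the slides state.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that turns instruction fields into literal.* parameters. */
final class ExtractParams {

    /** Assumption: this field carries the path of the file to stream, not index data. */
    static final String FILE_FIELD = "page_file";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds "literal.a=1&literal.b=2" from the fields, skipping the file path. */
    static String literalParams(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (FILE_FIELD.equals(e.getKey())) continue;
            if (sb.length() > 0) sb.append('&');
            sb.append("literal.").append(enc(e.getKey()))
              .append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("page_file", "/docroot/static/about-us/press-releases/2010/rl_crafter_studio.pdf");
        doc.put("page_url", "/about-us/press-releases/2010/rl_crafter_studio.pdf");
        doc.put("page_category", "News");
        doc.put("page_type", "Press Release");
        System.out.println(literalParams(doc));
    }
}
```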
![Page 24: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/24.jpg)
Compiler Configuration
![Page 25: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/25.jpg)
Compiler Configuration
<compiler-config>
  <page-types>
    <page-type name="Solution Page"
               compiler="com.rivetlogic.index.compile.ArticleCompiler">
      <uri-pattern pattern=".*/page-content/solutions/.*(article|page-metadata|meta-content).xml$" />
      <properties>
        <property field="page_type" value="Web Page"/>
        <property field="page_category" value="Solutions"/>
      </properties>
    </page-type>
    <page-type name="Press Release Page"
               compiler="com.paetec.index.model.compile.PressReleaseCompiler">
      <uri-pattern pattern=".*/press-releases/.*/(press-release|meta-content).xml$" />
      <properties>
        <property field="page_type" value="Press Release"/>
        <property field="page_category" value="News"/>
      </properties>
    </page-type>
  </page-types>
</compiler-config>
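The dispatch step implied by this configuration can be pictured as matching each changed path against the `uri-pattern` of every page type, in order, and handing the path to the first match. The sketch below uses `java.util.regex` and assumes the patterns must match the whole path; the `PageTypeDispatcher` class and the sample paths are hypothetical, and the real compiler engine may resolve matches differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical first-match dispatcher from content path to page type name. */
final class PageTypeDispatcher {

    // LinkedHashMap keeps registration order, so earlier page types win.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    void register(String pageTypeName, String uriRegex) {
        patterns.put(pageTypeName, Pattern.compile(uriRegex));
    }

    /** Returns the first page type whose pattern matches the whole URI, if any. */
    Optional<String> pageTypeFor(String uri) {
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(uri).matches()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PageTypeDispatcher d = new PageTypeDispatcher();
        d.register("Solution Page",
                ".*/page-content/solutions/.*(article|page-metadata|meta-content).xml$");
        d.register("Press Release Page",
                ".*/press-releases/.*/(press-release|meta-content).xml$");
        // Sample paths are made up for illustration.
        System.out.println(d.pageTypeFor("/site/page-content/solutions/security/article.xml"));
        System.out.println(d.pageTypeFor("/site/press-releases/2010/press-release.xml"));
        System.out.println(d.pageTypeFor("/site/products/widget/top-section.xml"));
    }
}
```

A path that matches no page type yields an empty result, which a caller could treat as "nothing to compile" for that change.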
![Page 26: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/26.jpg)
Search UI
• Full text search
• Faceted search on category & type
• Pagination or search result clustering
• Keyword highlighting in search results
• Track user queries
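Several of these features map onto standard Solr request parameters: `facet` and `facet.field` for facet counts, `fq` for narrowing to a selected facet value, `start` and `rows` for pagination, and `hl` and `hl.fl` for highlighting. The sketch below assembles such a query string for the `solrDemoDismax` handler and the `page_category`, `page_type`, and `page_text` fields shown earlier. Selecting the handler via `qt`, and the `SearchQueryBuilder` class itself, are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical builder for the search UI's Solr query string. */
final class SearchQueryBuilder {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /**
     * @param userQuery text typed by the user
     * @param pageIndex zero-based result page
     * @param rows      results per page
     * @param category  selected page_category facet value, or null for none
     */
    static String build(String userQuery, int pageIndex, int rows, String category) {
        StringBuilder sb = new StringBuilder();
        sb.append("q=").append(enc(userQuery));
        sb.append("&qt=solrDemoDismax");                 // assumption: handler chosen via qt
        sb.append("&start=").append(pageIndex * rows);   // pagination offset
        sb.append("&rows=").append(rows);
        sb.append("&facet=true&facet.field=page_category&facet.field=page_type");
        sb.append("&hl=true&hl.fl=page_text");           // keyword highlighting
        if (category != null) {
            // Note: quotes inside the category value are not escaped in this sketch.
            sb.append("&fq=").append(enc("page_category:\"" + category + "\""));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("content management", 1, 10, "Solutions"));
    }
}
```

Using `fq` for the facet selection keeps the user's text in `q` untouched, so relevance scoring from the dismax `qf` boosts is unaffected by the chosen category.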
![Page 27: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/27.jpg)
Search Results Page
![Page 28: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/28.jpg)
Clustered Results
![Page 29: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/29.jpg)
Summary
• Requirements:
  – Real time updates
  – Full editorial control
  – Faceted search
• Solution
  – Alfresco CMS
  – Alfresco plugin for Solr indexing
  – Compile updates & index
  – Serve in UI (full-text search + facets)
![Page 30: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/30.jpg)
Q & A
• Thank you for attending :-)
• Questions, comments…
![Page 31: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/31.jpg)
Appendix
![Page 32: Vagif Jalilov Rivet Logic](https://reader035.fdocuments.net/reader035/viewer/2022081421/56816507550346895dd779ee/html5/thumbnails/32.jpg)
Search Model/API