Enabling Search in your Cassandra Application with DataStax Enterprise


Transcript of Enabling Search in your Cassandra Application with DataStax Enterprise

Page 1: Enabling Search in your Cassandra Application with DataStax Enterprise

Solutions Engineer @MarcSelwan

Marc Selwan

Enabling Search in your Cassandra Application with DataStax Enterprise


Page 2: Enabling Search in your Cassandra Application with DataStax Enterprise

Why Search?

Confidential

Page 3: Enabling Search in your Cassandra Application with DataStax Enterprise


Page 4: Enabling Search in your Cassandra Application with DataStax Enterprise


The bright blue butterfly hangs on the breeze.

[the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]

Terms
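A minimal sketch of the slide above, assuming a simple lowercase-and-split analyzer (real Lucene/Solr analyzers are configurable and do more). It breaks the sentence into terms and then builds an inverted index, the term-to-document lookup that search engines of this kind query. The function names and the second document are hypothetical, added only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word terms, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "The bright blue butterfly hangs on the breeze.",
    2: "A blue breeze.",  # hypothetical second document, not from the slides
}

terms = tokenize(docs[1])
index = build_inverted_index(docs)

print(terms)          # the terms shown on the slide
print(index["blue"])  # ids of the documents containing "blue"
```

Looking up a term is then a dictionary access rather than a scan over every document, which is the point of indexing terms up front.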

Page 5: Enabling Search in your Cassandra Application with DataStax Enterprise

Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html

Page 6: Enabling Search in your Cassandra Application with DataStax Enterprise

What is Solr Missing?

Not a Database

Doesn’t Cluster

Not transparently sharded

Requires ETL to ingest application data

Doesn’t Reindex

Page 7: Enabling Search in your Cassandra Application with DataStax Enterprise


Open Source Search Reference Architecture

[Diagram labels: Your Application; DB API; Search API; OLTP DB (Transactional Workloads); Search Cluster (Search Workloads); Your ETL]

Page 8: Enabling Search in your Cassandra Application with DataStax Enterprise


Page 9: Enabling Search in your Cassandra Application with DataStax Enterprise


DSE Search Reference Architecture

[Diagram labels: Your Application; CQL; ring of Search + Cassandra nodes]

CQL
• Easy CQL API
• All the goodness of DataStax driver

Distributed, Replicated, Always On

Data locality and shared memory
• Automatic indexing on db insert
• Higher ingestion throughput
• Distributed query optimization

Compared to open source search
• No separate search cluster to manage
• Probably less total hardware required
• No “Split Brain” data inconsistencies
• No ETL or sync to build and maintain
• No app level data management code

Page 10: Enabling Search in your Cassandra Application with DataStax Enterprise

Data stored in Cassandra

Indexes stored in Solr/Lucene

Page 11: Enabling Search in your Cassandra Application with DataStax Enterprise

[Diagram labels: Solr; Cassandra; Memory; Disk]

Page 12: Enabling Search in your Cassandra Application with DataStax Enterprise

[Diagram labels: Coordinator; Shard Router; Memory; Mem-Table; Ram Buffer; Disk; Commit Log; SSTables; Index Segments]

UPDATE videos
SET tags = {'cat tubes', 'Al Gore''s Internet', 'NoSQL Fairytales'}
WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401;

Page 13: Enabling Search in your Cassandra Application with DataStax Enterprise

OSS Solr

[Diagram labels: Memory; Ram Buffer; Disk; Index Segments; regions marked Not Searchable and Searchable]

Page 14: Enabling Search in your Cassandra Application with DataStax Enterprise

DSE Search

[Diagram labels: Memory; Ram Buffer; Disk; Index Segments; a single region marked Searchable]

Page 15: Enabling Search in your Cassandra Application with DataStax Enterprise


Let’s see this in action!

Page 16: Enabling Search in your Cassandra Application with DataStax Enterprise

Search in Retail

Page 17: Enabling Search in your Cassandra Application with DataStax Enterprise

Filter queries: These are awesome because the result set gets cached in memory.

SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "fq":"categories:Books", "sort":"title asc"}' limit 10;

Faceting: Get counts of matching rows for each value of a field

SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "facet":{"field":"categories"}}' limit 10;

Geospatial Searches: Supports box and radius

SELECT * FROM amazon.clicks WHERE solr_query='{"q":"asin:*", "fq":"+{!geofilt pt=\"37.7484,-122.4156\" sfield=location d=1}"}' limit 10;

Joins: Not your relational joins. These queries 'borrow' indexes from other tables to add filter logic. These are fast!

SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;

Fun all in one.

SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "facet":{"field":"categories"}, "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
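Hand-writing JSON inside a CQL string gets fiddly once the filter itself contains quotes, as in the geospatial example. Below is a minimal sketch of building the same kind of statement programmatically. The helper names `solr_query_literal` and `select_statement` are hypothetical (they are not part of any DataStax API), and the table name is interpolated without validation, so treat this as an illustration rather than production code.

```python
import json


def solr_query_literal(q, fq=None, sort=None, facet_field=None):
    """Return a CQL string literal holding a solr_query JSON document."""
    body = {"q": q}
    if fq is not None:
        body["fq"] = fq
    if sort is not None:
        body["sort"] = sort
    if facet_field is not None:
        body["facet"] = {"field": facet_field}
    text = json.dumps(body)  # handles the \" escaping inside the JSON
    # CQL escapes a single quote inside a string literal by doubling it.
    return "'" + text.replace("'", "''") + "'"


def select_statement(table, literal, limit):
    """Assemble the SELECT; `table` is trusted input in this sketch."""
    return f"SELECT * FROM {table} WHERE solr_query={literal} LIMIT {limit};"


# Rebuild the geospatial query from the slide.
geo = solr_query_literal(
    q="asin:*",
    fq='+{!geofilt pt="37.7484,-122.4156" sfield=location d=1}',
)
stmt = select_statement("amazon.clicks", geo, 10)
print(stmt)
```

Letting `json.dumps` produce the escapes avoids the easy mistake of a stray or missing backslash in the `pt=\"...\"` part.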

Page 18: Enabling Search in your Cassandra Application with DataStax Enterprise

How do you get started??

Page 19: Enabling Search in your Cassandra Application with DataStax Enterprise


1) Spin up a new C* cluster with search enabled using the DSE installer.

$ dse cassandra -s

(Tarball installs. For package installs, set SOLR_ENABLED=1 in /etc/default/dse, then run: $ sudo service dse start)

2) Run your schema DDL to create the C* keyspace and tables.

3) Run dsetool on the videos table*

$ dsetool create_core keyspace.table generateResources=true reindex=true

4) Write a CQL query with a Solr search in it.

SELECT * FROM keyspace.table WHERE solr_query='column:*';

*This will create Lucene indexes on ALL the columns in your table.

Page 20: Enabling Search in your Cassandra Application with DataStax Enterprise

Behind the scenes… dsetool

$ dsetool create_core killrvideo.videos generateResources=true

schema.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>…
<fields>
<field indexed="true" multiValued="false" name="added_date" stored="true" type="TrieDateField"/>
<field indexed="true" multiValued="false" name="location" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="preview_image_location" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/>
<field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="userid" stored="true" type="UUIDField"/>
<field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/>
<field indexed="true" multiValued="false" name="location_type" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="description" termVectors="true" stored="true" type="TextField"/>
</fields>
<uniqueKey>videoid</uniqueKey>
</schema>

solrconfig.xml

<!--======= Copyright DataStax, Inc. Please see the included license file for details. -->
<!-- For more details about configurations options that may appear in this file, see http://wiki.apache.org/solr/SolrConfigXml. -->
<config>
<!-- In all configuration below, a prefix of "solr." for class names is an alias that causes solr to search appropriate packages, including org.apache.solr.(search|update|request|core|analysis)
     You may also specify a fully qualified Java classname if you have your own custom plugins. -->
…

CQL Query

SELECT * FROM killrvideo.videos WHERE solr_query='name:*';
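To make the generated schema.xml less magical, here is a rough sketch of the kind of column-to-field mapping it reflects. This is not how dsetool is implemented. The type mapping is read off the field list above, and the CQL column types given for killrvideo.videos are assumptions, since the slides show only the Solr side.

```python
import xml.etree.ElementTree as ET

# Assumed mapping, inferred from the generated schema.xml on this slide.
# The real generator covers many more types and options.
CQL_TO_SOLR = {
    "timestamp": "TrieDateField",
    "text": "TextField",
    "uuid": "UUIDField",
    "int": "TrieIntField",
}


def field_element(name, cql_type):
    """Build one <field> element for a CQL column."""
    multi = cql_type.startswith("set<") or cql_type.startswith("list<")
    # For a collection such as set<text>, map the element type instead.
    base = cql_type[cql_type.index("<") + 1:-1] if multi else cql_type
    return ET.Element("field", {
        "indexed": "true",
        "multiValued": "true" if multi else "false",
        "name": name,
        "stored": "true",
        "type": CQL_TO_SOLR[base],
    })


def schema_fields(columns):
    """Build a <fields> element with one <field> per column."""
    fields = ET.Element("fields")
    for name, cql_type in columns.items():
        fields.append(field_element(name, cql_type))
    return fields


# Hypothetical column types for a few killrvideo.videos columns.
columns = {
    "videoid": "uuid",
    "added_date": "timestamp",
    "tags": "set<text>",
    "location_type": "int",
}
xml_text = ET.tostring(schema_fields(columns), encoding="unicode")
print(xml_text)
```

Note how the collection column `tags` comes out as `multiValued="true"`, matching the generated schema above.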

Page 21: Enabling Search in your Cassandra Application with DataStax Enterprise

Thank you!
