Riak Search - Erlang Factory London 2010

Erlang Factory · London · June 2010 · Basho Technologies · Rusty Klophaus (@rklophaus) · Riak Search: A Full-Text Search and Indexing Engine based on Riak

description

Riak Search is a distributed data indexing and search platform built on top of Riak. The talk will introduce Riak Search, covering overall goals, architecture, and core functionality, with specific focus on how Erlang is used to manage and execute an ever-changing population of ad hoc query processes.

Transcript of Riak Search - Erlang Factory London 2010

Page 1: Riak Search - Erlang Factory London 2010

Erlang Factory · London · June 2010

Basho Technologies

Rusty Klophaus - @rklophaus

Riak Search

A Full-Text Search and Indexing Engine based on Riak

Page 2: Riak Search - Erlang Factory London 2010

Why did we build it?

What are the major goals?

How does it work?

2

Page 3: Riak Search - Erlang Factory London 2010

Part One

Why did we build Riak Search?

3

Page 4: Riak Search - Erlang Factory London 2010

Riak is a scalable, highly-available, networked, open-source key/value store.

4

Page 5: Riak Search - Erlang Factory London 2010

Key/Value

CLIENT RIAK

5

Writing to a Key/Value Store

Page 6: Riak Search - Erlang Factory London 2010

Object

CLIENT RIAK

6

Writing to a Key/Value Store

Page 7: Riak Search - Erlang Factory London 2010

Key

Object

CLIENT RIAK

Querying a Key/Value Store

7

Page 8: Riak Search - Erlang Factory London 2010

Key + Instructions

Object(s)

CLIENT RIAK

Walk to Related Keys

Querying Riak via Link Walking

8

Page 9: Riak Search - Erlang Factory London 2010

Key(s) + JS Functions

Computed Value(s)

CLIENT RIAK

Map

Reduce

Map

Querying Riak via Map/Reduce

9

Page 10: Riak Search - Erlang Factory London 2010

Key/Value Stores like Key-Based Queries

10

Page 11: Riak Search - Erlang Factory London 2010

where Category == "Shoes"

CLIENT RIAK

WTF!? I'm a KV store!

Query by Secondary Index

11

Page 12: Riak Search - Erlang Factory London 2010

"Converse AND Shoes"

CLIENT RIAK

This is getting old.

Full-Text Query

12

Page 13: Riak Search - Erlang Factory London 2010

These kinds of queries need an Index.

*Market Opportunity!*

13

Page 14: Riak Search - Erlang Factory London 2010

Part Two

What are the major goals of Riak Search?

14

Page 15: Riak Search - Erlang Factory London 2010

Your Application

Riak

An application built on Riak.

15

Page 16: Riak Search - Erlang Factory London 2010

Your Application

Riak Index

Object

Hrm... I need an index.

16

Page 17: Riak Search - Erlang Factory London 2010

Your Application

Riak ???

Hrm... I need an index with more features.

17

Page 18: Riak Search - Erlang Factory London 2010

Your Application

Riak Lucene

Lucene should do the trick...

18

Page 19: Riak Search - Erlang Factory London 2010

Your Application

Lucene Lucene Lucene Riak

...shard to add more storage capacity...

19

Page 20: Riak Search - Erlang Factory London 2010

Your Application

Lucene Lucene Lucene

Lucene Lucene Lucene

Lucene Lucene Lucene

Riak

...replicate to add more throughput.

20

Page 21: Riak Search - Erlang Factory London 2010

Your Application

Lucene Lucene Lucene

Lucene Lucene Lucene

Lucene Lucene Lucene

Riak

...replicate to add more throughput.

21

Operations nightmare!

Page 22: Riak Search - Erlang Factory London 2010

Your Application

Riak-ified Lucene

Riak

What do we really want?

22

Page 23: Riak Search - Erlang Factory London 2010

Your Application

Riak Search

Riak

What do we really want?

23

Page 24: Riak Search - Erlang Factory London 2010

Functionality? Be like Lucene (and more).

• Lucene Syntax

• Leverages Java Lucene Analyzers

• Solr Endpoints

• Integration via Riak Post-Commit Hook (Index)

• Integration via Riak Map/Reduce (Query)

• Near-Realtime

• Schema-less

24

Page 25: Riak Search - Erlang Factory London 2010

Operations? Be like Riak.

• No special nodes

• Add nodes, get more compute and storage

• Automatically load balance

• Replicas for durability and performance

• Index and query in parallel

• Swappable storage backends

25

Page 26: Riak Search - Erlang Factory London 2010

Part Three

How do we do it?

26

Page 27: Riak Search - Erlang Factory London 2010

A Gentle Introduction to

Document Indexing

27

Page 28: Riak Search - Erlang Factory London 2010

Document

#1: Every dog has his day.

Inverted Index

day, 1
dog, 1
every, 1
has, 1
his, 1

The Inverted Index

28

Page 29: Riak Search - Erlang Factory London 2010

Documents

#1: Every dog has his day.
#2: The dog's bark is worse than his bite.
#3: Let the cat out of the bag.
#4: It's raining cats and dogs.

Combined Inverted Index

and, 4
bag, 3
bark, 2
bite, 2
cat, 3
cat, 4
day, 1
dog, 1
dog, 2
dog, 4
every, 1
has, 1
...

The Inverted Index

29
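The combined index on this slide can be reproduced with a few lines of Python. This is only a sketch: the `analyze` function and the tiny `SINGULAR` table are hypothetical stand-ins for a real analyzer (the deck says Riak Search leverages the Java Lucene analyzers), chosen so that "dogs" and "dog's" both index as "dog".

```python
import re

# The four example documents from the slide, keyed by document ID.
DOCS = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
    4: "It's raining cats and dogs.",
}

# Hypothetical stand-in for a real stemmer: just enough for this example.
SINGULAR = {"cats": "cat", "dogs": "dog"}


def analyze(text):
    """Lowercase, split into words, strip possessives, singularize."""
    terms = []
    for token in re.findall(r"[a-z']+", text.lower()):
        if token.endswith("'s"):
            token = token[:-2]
        terms.append(SINGULAR.get(token, token))
    return terms


def build_index(docs):
    """Return the inverted index as a sorted list of (term, doc_id) postings."""
    postings = set()
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings.add((term, doc_id))
    return sorted(postings)


if __name__ == "__main__":
    for term, doc_id in build_index(DOCS):
        print(f"{term}, {doc_id}")
```

Keeping the postings sorted by term and then by document ID is what makes the merge operations at query time cheap.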

Page 30: Riak Search - Erlang Factory London 2010

"dog AND cat"

AND

dog cat

At Query Time...

30

Page 31: Riak Search - Erlang Factory London 2010

AND

dog cat

dog, 1

dog, 2

dog, 4

cat, 3

cat, 4

At Query Time...

31

Page 32: Riak Search - Erlang Factory London 2010

AND (Merge Intersection)

dog: 1, 2, 4

cat: 3, 4

Result: 4

At Query Time...

32

Page 33: Riak Search - Erlang Factory London 2010

OR (Merge Union)

dog: 1, 2, 4

cat: 3, 4

Result: 1, 2, 3, 4

At Query Time...

33
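Both operators reduce to a single pass over two sorted posting lists. The sketch below shows one conventional way to write the two merges in Python; the function names are illustrative and are not taken from Riak Search.

```python
def merge_intersection(left, right):
    """AND: keep document IDs present in both sorted lists."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def merge_union(left, right):
    """OR: keep document IDs present in either sorted list, without duplicates."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


if __name__ == "__main__":
    dog = [1, 2, 4]
    cat = [3, 4]
    print("dog AND cat:", merge_intersection(dog, cat))
    print("dog OR cat:", merge_union(dog, cat))
```

Because each merge only ever looks at the head of each list, the same logic works on streams of results as well as on in-memory lists.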

Page 34: Riak Search - Erlang Factory London 2010

Complex Behavior from Simple Structures

34

Page 35: Riak Search - Erlang Factory London 2010

Storage Approaches...

35

Page 36: Riak Search - Erlang Factory London 2010

Riak Search uses Consistent Hashing to store data on Partitions

36

Page 37: Riak Search - Erlang Factory London 2010

Partitions = 10

Number of Nodes = 5

Partitions per Node = 2

Replicas (NVal) = 2

Introduction to Consistent Hashing and Partitions

37

Page 38: Riak Search - Erlang Factory London 2010

Object

Introduction to Consistent Hashing and Partitions

38
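The ring from the previous slide (10 partitions, 5 nodes, NVal = 2) can be modeled in a few lines. This is a simplified sketch, not Riak's implementation: the node names, the round-robin ownership, and the helper names are assumptions made for illustration.

```python
import hashlib

RING_SIZE = 2 ** 160          # SHA-1 produces 160-bit hashes
NUM_PARTITIONS = 10           # from the slide
NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
N_VAL = 2                     # replicas, from the slide

# Assumption: partitions are handed out round-robin, so adjacent
# partitions always belong to different nodes.
OWNERS = {p: NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)}


def partition_for(key):
    """Hash the key onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def preference_list(key):
    """Return the N_VAL consecutive (partition, node) pairs that store the key."""
    first = partition_for(key)
    result = []
    for offset in range(N_VAL):
        partition = (first + offset) % NUM_PARTITIONS
        result.append((partition, OWNERS[partition]))
    return result


def partitions_per_node():
    """Count how many partitions each node owns."""
    counts = {node: 0 for node in NODES}
    for node in OWNERS.values():
        counts[node] += 1
    return counts


if __name__ == "__main__":
    print(partitions_per_node())
    print(preference_list("some-object-key"))
```

Any node can compute the preference list for a key locally, which is why no special coordinator node is needed to locate data.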

Page 39: Riak Search - Erlang Factory London 2010

Document Partitioning

vs.

Term Partitioning

39

Page 40: Riak Search - Erlang Factory London 2010

...and the

Resulting Tradeoffs

40

Page 41: Riak Search - Erlang Factory London 2010

#1: Every dog has his day.

Document Partitioning @ Index Time

41

Page 42: Riak Search - Erlang Factory London 2010

"dog OR cat"

Document Partitioning @ Query Time

42

Page 43: Riak Search - Erlang Factory London 2010

#1: Every dog has his day.

day, 1

dog, 1

every, 1

has, 1

his, 1

Term Partitioning @ Index Time

43

Page 44: Riak Search - Erlang Factory London 2010

day, 1    has, 1

every, 1    his, 1

dog, 1

Term Partitioning @ Index Time

44

Page 45: Riak Search - Erlang Factory London 2010

"dog OR cat"

Term Partitioning @ Query Time

45

Page 46: Riak Search - Erlang Factory London 2010

Document Partitioning

+ Lower Latency Queries

- Lower Throughput

- Lots of Disk Seeks

Term Partitioning

- Higher Latency Queries

+ Higher Throughput

- Hotspots in Ring (the "Obama" problem)

Tradeoffs...

46
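A small Python model makes the difference between the two schemes concrete. It is a sketch under simplifying assumptions: a plain modulo hash stands in for the ring, and the `doc:` and `term:` key prefixes and all function names are invented for this example.

```python
import hashlib

NUM_PARTITIONS = 10

# Analyzed terms for the four example documents used earlier in the deck.
DOCS = {
    1: ["every", "dog", "has", "his", "day"],
    2: ["the", "dog", "bark", "is", "worse", "than", "his", "bite"],
    3: ["let", "the", "cat", "out", "of", "bag"],
    4: ["it", "raining", "cat", "and", "dog"],
}


def partition_of(value):
    """Map a string to a partition (simplified stand-in for the ring)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def index_by_document(docs):
    """Document partitioning: all postings of a document share one partition."""
    parts = {}
    for doc_id, terms in docs.items():
        partition = partition_of(f"doc:{doc_id}")
        for term in terms:
            parts.setdefault(partition, {}).setdefault(term, []).append(doc_id)
    return parts


def index_by_term(docs):
    """Term partitioning: all postings of a term share one partition."""
    parts = {}
    for doc_id, terms in docs.items():
        for term in terms:
            partition = partition_of(f"term:{term}")
            parts.setdefault(partition, {}).setdefault(term, []).append(doc_id)
    return parts


def partitions_to_query(scheme, terms):
    """Which partitions must a query touch under each scheme?"""
    if scheme == "document":
        # Any partition may hold a matching document, so ask all of them.
        return set(range(NUM_PARTITIONS))
    return {partition_of(f"term:{term}") for term in terms}


if __name__ == "__main__":
    print("document:", sorted(partitions_to_query("document", ["dog", "cat"])))
    print("term:", sorted(partitions_to_query("term", ["dog", "cat"])))
```

Under term partitioning a popular term concentrates all of its postings on one partition, which is the hotspot problem listed above.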

Page 47: Riak Search - Erlang Factory London 2010

Riak Search: Term Partitioning

47

Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.

Optimizations:

• Term splitting to reduce hot spots

• Bloom filters & caching to save query-time bandwidth

• Batching to save query-time & index-time bandwidth

Support for either approach eventually.
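The slide does not say how Bloom filters are applied, so the following is only one plausible reading, sketched in Python: before streaming a posting list across the network for an AND, the receiving side sends a compact Bloom filter of its own document IDs, and the sender drops IDs that cannot possibly match. The class and variable names are assumptions.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical posting lists held on two different nodes.
meeting_docs = [3, 8, 15, 42]
phone_docs = [8, 9, 15, 77, 101]

# The node holding "meeting" ships a small filter instead of its full list.
meeting_filter = BloomFilter()
for doc_id in meeting_docs:
    meeting_filter.add(doc_id)

# The node holding "phone" only sends IDs that might intersect.
candidates = [d for d in phone_docs if meeting_filter.might_contain(d)]

if __name__ == "__main__":
    print("candidates to send:", candidates)
```

A Bloom filter never produces false negatives, so the pruned list still contains every true match; occasional false positives are removed by the normal merge intersection.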

Page 48: Riak Search - Erlang Factory London 2010

Diving Deeper:

The Lifecycle of a Query

48

Page 49: Riak Search - Erlang Factory London 2010

Parse the Query

49

Page 50: Riak Search - Erlang Factory London 2010

meeting AND (face OR phone)

The Query

50

Page 51: Riak Search - Erlang Factory London 2010

[{land, [
    {term, "meeting", []},
    {lor, [
        {term, "face", []},
        {term, "phone", []}
    ]}
]}]

The Query as an Erlang Term (Parse w/ Leex and Yecc)

51
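The deck builds its parser with Leex and Yecc. As a rough illustration of the same transformation, here is a hand-written recursive-descent parser in Python that turns the query string into nested tuples shaped like the Erlang term above. The grammar (AND binding tighter than OR, uppercase operators only) is an assumption made for this sketch.

```python
import re

OPERATORS = {"AND", "OR"}


def tokenize(query):
    """Split a query into parentheses and bare words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def parse(query):
    """Parse a query into [("land" | "lor", [children])] / ("term", word, [])."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        operands = [parse_and()]
        while peek() == "OR":
            advance()
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else ("lor", operands)

    def parse_and():
        operands = [parse_atom()]
        while peek() == "AND":
            advance()
            operands.append(parse_atom())
        return operands[0] if len(operands) == 1 else ("land", operands)

    def parse_atom():
        token = advance()
        if token == "(":
            node = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if token == ")" or token in OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return ("term", token, [])

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return [tree]


if __name__ == "__main__":
    print(parse("meeting AND (face OR phone)"))
```

The empty list in each term tuple mirrors the empty property list in the Erlang term, which later stages fill in with planning information.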

Page 52: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

The Query as a Graph

52

Page 53: Riak Search - Erlang Factory London 2010

Plan the Query

53

Page 54: Riak Search - Erlang Factory London 2010

Catalog Catalog Catalog Catalog

Catalog Catalog Catalog Catalog

System Catalog

54

Page 55: Riak Search - Erlang Factory London 2010

Term TermID

Term Weight & File Offset

System Catalog

55

Page 56: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

23 @ node B

17 @ node A    13 @ node C

Consult the System Catalog for Term/Node Weights

56

Page 57: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

23 @ node B

17 @ node A    13 @ node C

Use Term Weights to Plan the Query

57

Page 58: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

23 @ node B

17 @ node A    13 @ node C

Use Term Weights to Plan the Query

58

Page 59: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

23 @ node B

17 @ node A    13 @ node C

Use Term Weights to Plan the Query

59

Page 60: Riak Search - Erlang Factory London 2010

[{node,
  {land, [
    {node,
      {lor, [
        {term, {"email", "body", "face"}, [
          {node_weight, '[email protected]', 13}
        ]},
        {term, {"email", "body", "phone"}, [
          {node_weight, '[email protected]', 17}
        ]}
      ]},
      '[email protected]'},
    {term, {"email", "body", "meeting"}, [
      {node_weight, '[email protected]', 23}
    ]}
  ]},
  '[email protected]'}]

The Node-Assigned Query as an Erlang Term

60
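The planning step can be approximated as a bottom-up walk over the query tree. The Python sketch below is a guess at the idea, not the actual planner: it assumes each operator runs on the node that holds its heaviest child, and the pairing of weights to nodes in `TERM_WEIGHTS` is an assumption pieced together from the diagram labels and the term above.

```python
# Assumed catalog lookup results: term -> (node, weight).
TERM_WEIGHTS = {
    "meeting": ("B", 23),
    "phone": ("A", 17),
    "face": ("C", 13),
}

QUERY = ("land", [
    ("term", "meeting"),
    ("lor", [("term", "face"), ("term", "phone")]),
])


def plan(node):
    """Return (planned_tree, chosen_node, weight) for a query subtree."""
    kind = node[0]
    if kind == "term":
        name, weight = TERM_WEIGHTS[node[1]]
        planned = ("term", node[1], [("node_weight", name, weight)])
        return planned, name, weight

    planned_children = [plan(child) for child in node[1]]
    # Assumed heuristic: run the operator where the heaviest input lives,
    # so the largest posting list never crosses the network.
    _, best_node, best_weight = max(planned_children, key=lambda p: p[2])
    children = [child for child, _, _ in planned_children]
    return ("node", (kind, children), best_node), best_node, best_weight


if __name__ == "__main__":
    planned, chosen, _ = plan(QUERY)
    print("root operator runs on node", chosen)
    print(planned)
```

With these weights the OR is placed on node A and the AND on node B, matching the `#node@A` and `#node@B` wrappers shown in the diagrams.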

Page 61: Riak Search - Erlang Factory London 2010

Execute the Query

61

Page 62: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes

62

Page 63: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes

63

Page 64: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes

64

Page 65: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes

65

Page 66: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes

66

Page 67: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

67

Page 68: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

68

Page 69: Riak Search - Erlang Factory London 2010

#land
    #term "meeting"
    #lor
        #term "face"
        #term "phone"

#node@A

#node@B

Spawn the Query Processes & Stream the Results

69

Page 70: Riak Search - Erlang Factory London 2010

#land
    #term 'disconnect'
    #lor
        #term 'disconnect'
        #term 'disconnect'

#node@A

#node@B

Terminate When Finished

70

Page 71: Riak Search - Erlang Factory London 2010

Message Format

71

Page 72: Riak Search - Erlang Factory London 2010

Message :: {results, [Result]} | {results, disconnect}

Result :: {DocID, Properties}

DocID :: term()

Properties :: proplist()

The Message Format

72

Page 73: Riak Search - Erlang Factory London 2010

{results, [
    {375, []},
    {961, [{color, "red"}]},
    {155, [{pos, [1,2,5]}]}
]}

The Message Format

73
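To show how the message format drives execution, the sketch below models each query process as a Python thread with a `queue.Queue` as its mailbox. It is an analogy rather than a port: the thread-per-operator layout, the sample `INDEX`, and the function names are assumptions, operators buffer their children's streams before merging for brevity, and Erlang's process linking is not modeled.

```python
import queue
import threading

DISCONNECT = "disconnect"

# Hypothetical postings: term -> sorted list of (DocID, Properties).
INDEX = {
    "meeting": [(1, []), (2, []), (4, [])],
    "face": [(2, [])],
    "phone": [(4, [("pos", [3])]), (7, [])],
}

QUERY = ("land", [
    ("term", "meeting"),
    ("lor", [("term", "face"), ("term", "phone")]),
])


def collect(inbox):
    """Read {results, [Result]} messages until {results, disconnect}."""
    results = []
    while True:
        _tag, payload = inbox.get()
        if payload == DISCONNECT:
            return results
        results.extend(payload)


def term_process(term, out, batch_size=2):
    """Stream a term's postings to the parent in batches, then disconnect."""
    postings = INDEX.get(term, [])
    for start in range(0, len(postings), batch_size):
        out.put(("results", postings[start:start + batch_size]))
    out.put(("results", DISCONNECT))


def operator_process(op, children, out):
    """Spawn one process per child, merge their results, send them upward."""
    inboxes = []
    for child in children:
        inbox = queue.Queue()
        threading.Thread(target=run, args=(child, inbox), daemon=True).start()
        inboxes.append(inbox)

    streams = [dict(collect(inbox)) for inbox in inboxes]  # DocID -> Properties
    doc_ids = set(streams[0])
    for stream in streams[1:]:
        doc_ids = doc_ids & set(stream) if op == "land" else doc_ids | set(stream)

    merged = []
    for doc_id in sorted(doc_ids):
        props = []
        for stream in streams:
            props.extend(stream.get(doc_id, []))
        merged.append((doc_id, props))

    out.put(("results", merged))
    out.put(("results", DISCONNECT))


def run(node, out):
    """Dispatch on the node type, as a spawned query process would."""
    if node[0] == "term":
        term_process(node[1], out)
    else:
        operator_process(node[0], node[1], out)


if __name__ == "__main__":
    top = queue.Queue()
    run(QUERY, top)
    print(collect(top))
```

Every process speaks the same two-message protocol, so a parent does not need to know whether its child is a term reader or another operator.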

Page 74: Riak Search - Erlang Factory London 2010

Yay for Erlang!

74

• Clean lines between load balancing and logic, single- and multi-node look the same

• Easy to create new operators, rapid development of experimental features

• Linked processes make cleanup a breeze

• Significant code reduction over early Java prototypes

Page 75: Riak Search - Erlang Factory London 2010

Part Four

Review

75

Page 76: Riak Search - Erlang Factory London 2010

"Converse AND Shoes"

CLIENT RIAK

WTF!? I'm a KV store!

Riak Search turns this...

76

Page 77: Riak Search - Erlang Factory London 2010

"Converse AND Shoes"

CLIENT RIAK

Gladly!

...into this...

77

Page 78: Riak Search - Erlang Factory London 2010

"Converse AND Shoes"

CLIENT RIAK

Keys or Objects

...into this...

78

Page 79: Riak Search - Erlang Factory London 2010

Your Application

Riak Search

Riak

...while keeping operations easy.

79

Page 80: Riak Search - Erlang Factory London 2010

Thanks! Questions?

Search Team:

John Muellerleile - @jrecursive

Rusty Klophaus - @rklophaus

Kevin Smith - @kevsmith

Currently working with a small set of Beta users.

Open-source release planned for Q3.

www.basho.com