Introducing the App Engine datastore

Post on 01-Nov-2014


Describes the App Engine datastore. Explains how indexing and queries work.


Thursday, May 26, 2011

Hands on with the App Engine Datastore

Ikai Lan
May 9th, 2011

About the speaker

• Ikai Lan - Developer Programs Engineer, Developer Relations
• Twitter: @ikai
• Google Profile: http://profiles.google.com/ikai.lan

Goals of this talk

• Understand a bit of how the datastore works underneath the hood

• Have a conceptual background for the persistence codelab

Understanding the datastore

• The underlying Bigtable
• Indexing and queries
• Complex queries
• Entity groups
• Underlying infrastructure

Datastore layers

                              Datastore   Megastore   Bigtable
Complex queries                   ✓
Entity Group Transactions         ✓           ✓
Queries on properties             ✓           ✓
Key range scan                    ✓           ✓           ✓
Get and set by key                ✓           ✓           ✓

What does a Bigtable row look like?


Source: http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/bigtable-osdi06.pdf

Bigtable API

• “Give me the column ‘name’ at key 123”
• “Set the column ‘name’ at key 123 to ‘ikai’”
• “Give me all columns where the key is greater than 100 and less than 200”
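
The three requests quoted for the Bigtable API (get by key, set by key, key range scan) can be sketched with a sorted map. This is only a toy model: the class and method names are hypothetical, integer keys are an assumption made to match the slide's example, and none of this is the real Bigtable client API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a Bigtable-style table: rows kept sorted by key,
// each row holding a map of column name -> value.
// All names here are hypothetical, not a real Bigtable API.
class BigtableSketch {
    private final TreeMap<Integer, TreeMap<String, String>> rows = new TreeMap<>();

    // "Set the column 'name' at key 123 to 'ikai'"
    void set(int key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    // "Give me the column 'name' at key 123"
    String get(int key, String column) {
        TreeMap<String, String> row = rows.get(key);
        return row == null ? null : row.get(column);
    }

    // "Give me all columns where the key is greater than 100 and less than 200"
    SortedMap<Integer, TreeMap<String, String>> scan(int lowExclusive, int highExclusive) {
        return rows.subMap(lowExclusive, false, highExclusive, false);
    }

    public static void main(String[] args) {
        BigtableSketch table = new BigtableSketch();
        table.set(123, "name", "ikai");
        table.set(150, "name", "max");
        table.set(250, "name", "zed");
        System.out.println(table.get(123, "name"));
        System.out.println(table.scan(100, 200).keySet());
    }
}
```

Because the rows are sorted by key, the range scan is a single contiguous read. Everything the higher layers offer is built on these operations.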


Megastore API

• “Give me all rows where the column ‘name’ equals ‘ikai’”
• “Transactionally write an update to this group of entities”
• “Do a cross datacenter write of this data such that reads will be strongly consistent” (High Replication Datastore)
• Megastore paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


App Engine Datastore API

• “Give me all Users for my app where the name equals ‘ikai’, company equals ‘Google’, and sort them by the ‘awesome’ column, descending”


Queries


Let’s save an Entity with the low-level Java API

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;

    // Get an instance of the DatastoreService (fetch a client instance)
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Instantiate a new Entity: the first argument sets the Entity Kind,
    // the second sets a unique key
    Entity ikai = new Entity("User", "ikai@google.com");

    // Set indexed properties: the first argument is the property name,
    // the second argument is the property value
    ikai.setProperty("firstName", "ikai");
    ikai.setProperty("company", "google");

    // Set unindexed properties: this property will be saved,
    // but we will not run queries against it
    ikai.setUnindexedProperty("biography", "Ikai is a great man, a great, great man.");

    // Commit the entity to the datastore. Save the thing!
    datastore.put(ikai);

What happens when we save?

1. Make the write RPC
2. Write the entity
3. Write the indexes
4. Success!

What actually gets written?

Entities table

Bigtable key                                 Value
AppId:User:ikai@google.com                   ( Protobuf serialized entity - includes firstName, company and biography values )

Indexes table

Bigtable key                                 Value
AppId:User:firstName:ikai:ikai@google.com    ( Empty )
AppId:User:company:google:ikai@google.com    ( Empty )

Read more: http://code.google.com/appengine/articles/storage_breakdown.html
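
The rows written for one saved entity can be reproduced with a small in-memory sketch: one entities row plus one empty-valued index row per indexed property. The class, its fields, and the use of plain maps in place of a serialized protobuf are assumptions for illustration only; the key layout follows the slide.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of a datastore write. Names are hypothetical; the real system
// stores a protobuf-serialized entity as the row value.
class WriteSketch {
    final TreeMap<String, Map<String, String>> entitiesTable = new TreeMap<>();
    final TreeMap<String, String> indexesTable = new TreeMap<>();

    void put(String appId, String kind, String keyName,
             Map<String, String> properties, Set<String> unindexed) {
        // Entities table: a single row holding the whole entity.
        entitiesTable.put(appId + ":" + kind + ":" + keyName, properties);
        // Indexes table: one row per indexed property, with an empty value.
        for (Map.Entry<String, String> p : properties.entrySet()) {
            if (unindexed.contains(p.getKey())) {
                continue; // saved with the entity, but never indexed
            }
            indexesTable.put(appId + ":" + kind + ":" + p.getKey() + ":"
                    + p.getValue() + ":" + keyName, "");
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("firstName", "ikai");
        props.put("company", "google");
        props.put("biography", "Ikai is a great man, a great, great man.");
        WriteSketch store = new WriteSketch();
        store.put("AppId", "User", "ikai@google.com", props,
                Collections.singleton("biography"));
        for (String row : store.indexesTable.keySet()) {
            System.out.println(row);
        }
    }
}
```

The unindexed biography property appears only inside the entity row, which is why it costs no index writes and cannot be queried.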

Now let’s run a query

• If we have the key, we can fetch it right away by key
• What if we don’t? We need indexes.

Let’s run a query

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Query the "User" kind
    Query queryByName = new Query("User");

    // FilterOperator is Query.FilterOperator
    queryByName.addFilter("firstName", FilterOperator.EQUAL, "ikai");

    List<Entity> results = datastore.prepare(queryByName)
        .asList(FetchOptions.Builder.withDefaults());

    // Roughly equivalent to:
    // SELECT * FROM User WHERE firstName = 'ikai';

Step 1: Query the indexes table

Scan the indexes table for values >= AppId:User:firstName:

Step 2: Start extracting keys

    AppId:User:firstName:ikai:ikai@google.com    ( Empty )

That gets us this row - extract the key ikai@google.com

Step 3: Batch get the entities themselves

    AppId:User:ikai@google.com    ( Protobuf serialized entity - includes firstName, company and biography values )

Now let’s go back to the entities table and fetch that key. Success!

Read more: http://code.google.com/appengine/articles/storage_breakdown.html
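
The three query steps (scan the indexes table, extract keys, batch get from the entities table) can be sketched over sorted in-memory collections. This is a simplified model, not the datastore's implementation: the class name, the fixed "AppId:User:" prefix, and the string entity values are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of resolving "WHERE property = value" from an index table.
class IndexQuerySketch {
    static List<String> query(TreeSet<String> indexesTable,
                              TreeMap<String, String> entitiesTable,
                              String property, String value) {
        String prefix = "AppId:User:" + property + ":" + value + ":";
        // Step 1: scan the indexes table starting at the first row >= prefix.
        // Step 2: extract the entity key from the tail of each matching row.
        List<String> keys = new ArrayList<>();
        for (String row : indexesTable.tailSet(prefix)) {
            if (!row.startsWith(prefix)) {
                break; // sorted order: we are past the matching range
            }
            keys.add(row.substring(prefix.length()));
        }
        // Step 3: batch get the entities themselves by key.
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            String entity = entitiesTable.get("AppId:User:" + key);
            if (entity != null) {
                results.add(entity);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TreeSet<String> indexes = new TreeSet<>(Arrays.asList(
                "AppId:User:firstName:ikai:ikai@google.com",
                "AppId:User:company:google:ikai@google.com"));
        TreeMap<String, String> entities = new TreeMap<>();
        entities.put("AppId:User:ikai@google.com", "(serialized User ikai)");
        System.out.println(query(indexes, entities, "firstName", "ikai"));
    }
}
```

The query never touches entity rows that do not match, which is why it needs an index row for every property it filters on.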

Key takeaways

• This isn’t a relational database
  – There are no full table scans
  – Indexes MUST exist for every property we want to query
  – Natively, we can only query on matches or startsWith queries
  – Don’t index what we never need to query on
• Get by key = one step. Query on property value = 2 steps
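
A startsWith query is just another contiguous range in a sorted index, which is why it comes "for free" alongside exact matches. A minimal sketch, assuming a hypothetical helper over a sorted set of property values (one edge case: a value containing the character U+FFFF right after the prefix would be missed):

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of a startsWith query as a range scan over sorted values.
class PrefixScanSketch {
    static SortedSet<String> startsWith(TreeSet<String> sortedValues, String prefix) {
        // Every string beginning with prefix sorts between prefix (inclusive)
        // and prefix followed by the largest char value (exclusive).
        return sortedValues.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeSet<String> firstNames = new TreeSet<>(
                Arrays.asList("alfred", "igor", "ikai", "ike", "zed"));
        System.out.println(startsWith(firstNames, "ik"));
    }
}
```

An endsWith or contains query has no such contiguous range, so it cannot be answered by a single scan of this index.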

Let’s run a more complex query!

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    Query queryByName = new Query("User");

    // Two equality filters on different properties
    queryByName.addFilter("firstName", FilterOperator.EQUAL, "ikai");
    queryByName.addFilter("company", FilterOperator.EQUAL, "google");

    List<Entity> results = datastore.prepare(queryByName)
        .asList(FetchOptions.Builder.withDefaults());

    // Roughly equivalent to:
    // SELECT * FROM User WHERE firstName = 'ikai'
    //   AND company = 'google';

Query resolution strategies

• This query can be resolved using built in indexes
  – Zig zag merge join - we’ll cover this example
• Can be optimized using composite indexes

Zig zag across multiple indexes

Bigtable key (firstName index)

    AppId:User:firstName:alfred:alfred@acme.com
    AppId:User:firstName:ikai:ikai@acme.com
    AppId:User:firstName:ikai:ikai@google.com
    AppId:User:firstName:ikai:ikai@megacorp.com
    AppId:User:firstName:zed:zed@megacorp.com

Bigtable key (company index)

    AppId:User:company:acme:alfred@acme.com
    AppId:User:company:google:david@google.com
    AppId:User:company:google:ikai@google.com
    AppId:User:company:google:max@google.com
    AppId:User:company:megacorp:zed@megacorp.com

Read more: http://code.google.com/appengine/articles/storage_breakdown.html

1. Begin by scanning indexes >= AppId:User:company:google
2. There’s at least a partial match, so we “jump” to the next index
3. Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:david@google.com
4. Okay, so that’s a twist. The first value that matches has key ikai@google.com! Does this value exist in the first index?
5. Let’s advance the original cursor to >= AppId:User:company:google:ikai@google.com
6. Alright! We found a match. Let’s add the key to our in memory list and go back to the first index
7. Let’s move on to see if there are any more matches. Let’s start at max@google.com
8. Are there any keys >= AppId:User:firstName:ikai:max@google.com?
9. No. We’re at the end of our index scans. Let’s do a batch get of our list of keys: [ ‘ikai@google.com’ ]

Batch get the entities themselves

Entities table

Bigtable key                    Value
AppId:User:ikai@google.com      ( Protobuf serialized entity - includes firstName, company and biography values )

Now let’s go back to the entities table and fetch that key. Success!
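
The zig zag walkthrough can be sketched as a merge join over two sorted index sets: take a candidate key from one index, seek to it in the other, and jump forward whenever the other index lands on a larger key. This is a minimal sketch under stated assumptions (in-memory TreeSets standing in for Bigtable scans, hypothetical class and method names), not the datastore's actual planner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy zig-zag merge join over two single-property indexes.
class ZigZagSketch {
    // First entity key >= fromKey among rows starting with prefix, or null.
    static String seek(TreeSet<String> index, String prefix, String fromKey) {
        String row = index.ceiling(prefix + fromKey);
        return (row != null && row.startsWith(prefix))
                ? row.substring(prefix.length()) : null;
    }

    static List<String> join(TreeSet<String> indexA, String prefixA,
                             TreeSet<String> indexB, String prefixB) {
        List<String> matches = new ArrayList<>();
        String candidate = seek(indexA, prefixA, "");
        while (candidate != null) {
            String other = seek(indexB, prefixB, candidate);
            if (other == null) {
                break; // index B is exhausted, no further matches possible
            }
            if (other.equals(candidate)) {
                matches.add(candidate);
                // Move just past the match ('\0' makes the smallest larger key).
                candidate = seek(indexA, prefixA, candidate + '\0');
            } else {
                // B skipped ahead, so jump A forward to B's position.
                candidate = seek(indexA, prefixA, other);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> firstName = new TreeSet<>(Arrays.asList(
                "AppId:User:firstName:alfred:alfred@acme.com",
                "AppId:User:firstName:ikai:ikai@acme.com",
                "AppId:User:firstName:ikai:ikai@google.com",
                "AppId:User:firstName:ikai:ikai@megacorp.com",
                "AppId:User:firstName:zed:zed@megacorp.com"));
        TreeSet<String> company = new TreeSet<>(Arrays.asList(
                "AppId:User:company:acme:alfred@acme.com",
                "AppId:User:company:google:david@google.com",
                "AppId:User:company:google:ikai@google.com",
                "AppId:User:company:google:max@google.com",
                "AppId:User:company:megacorp:zed@megacorp.com"));
        System.out.println(join(company, "AppId:User:company:google:",
                firstName, "AppId:User:firstName:ikai:"));
    }
}
```

Each pass through the loop strictly advances the candidate key, so the join finishes after at most one seek pair per row in the smaller matching range.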

Let’s change the shape of the data

• Zig zag performance is HIGHLY dependent on the shape of the data

• Let’s go ahead and muck with the data a bit


Same query, sparsely distributed matches

Bigtable key (firstName index)

    AppId:User:firstName:alfred:alfred@acme.com
    AppId:User:firstName:ikai:ikai@acme.com
    AppId:User:firstName:igor:ikai@google.com
    AppId:User:firstName:ikai:ikai@megacorp.com
    AppId:User:firstName:zed:zed@megacorp.com

Bigtable key (company index)

    AppId:User:company:acme:alfred@acme.com
    AppId:User:company:google:david@google.com
    AppId:User:company:google:ikai@google.com
    AppId:User:company:google:max@google.com
    AppId:User:company:megacorp:zed@megacorp.com

Read more: http://code.google.com/appengine/articles/storage_breakdown.html

1. Begin by scanning indexes >= AppId:User:company:google
2. Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:david@google.com
3. Oh ... no matches. Let’s move back to the first index and move the cursor down
4. Okay, we’ve got another Googler
5. Move to the next index. Start a scan for keys >= AppId:User:firstName:ikai:ikai@google.com
6. Oh ... no matches here either. Let’s go back to the first index.
7. Oh ... no matches here either. Let’s go back to the first index.

... if these indexes were huge, we could be here for a while!
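
How much the shape of the data matters can be made concrete by counting index seeks. The sketch below is a hypothetical model with integer keys standing in for entity keys. It runs the same zig zag loop over two layouts that both produce zero matches: one where the two key sets are interleaved, and one where they sit in separate ranges.

```java
import java.util.TreeSet;

// Toy model: count the seeks a zig-zag join performs over two sorted key sets.
class SparseZigZagSketch {
    static int countSeeks(TreeSet<Integer> a, TreeSet<Integer> b) {
        int seeks = 1; // initial seek into index a
        Integer candidate = a.isEmpty() ? null : a.first();
        while (candidate != null) {
            Integer other = b.ceiling(candidate);
            seeks++;
            if (other == null) {
                break;
            }
            // On a match step past it, otherwise jump a forward to b's position.
            candidate = other.equals(candidate) ? a.higher(candidate) : a.ceiling(other);
            seeks++;
        }
        return seeks;
    }

    public static void main(String[] args) {
        TreeSet<Integer> evens = new TreeSet<>();
        TreeSet<Integer> odds = new TreeSet<>();
        TreeSet<Integer> low = new TreeSet<>();
        TreeSet<Integer> high = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            evens.add(2 * i);
            odds.add(2 * i + 1);
            low.add(i);
            high.add(1000 + i);
        }
        System.out.println("interleaved: " + countSeeks(evens, odds));
        System.out.println("separated:   " + countSeeks(low, high));
    }
}
```

Both layouts hold 1000 keys per index and return no results, yet the interleaved one forces a seek pair for every key while the separated one finishes after a handful of seeks. A composite index avoids the problem by making the matching rows contiguous.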

What happens in this case?

• If we traverse too many indexes, the datastore throws a DatastoreNeedIndexException (NeedIndexError in Python)

• We’ll want to build a composite index

Composite index

Bigtable key

    AppId:User:company:acme:firstName:alfred:alfred@acme.com
    AppId:User:company:google:firstName:david:david@google.com
    AppId:User:company:google:firstName:ikai:ikai@google.com
    AppId:User:company:google:firstName:max:max@google.com
    AppId:User:company:megacorp:firstName:zed:zed@megacorp.com

Read more: http://code.google.com/appengine/articles/storage_breakdown.html

1. Search for all keys >= AppId:User:company:google:firstName:ikai
2. Well, that was much faster, wasn’t it?
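
With a composite index the two-filter query collapses into one prefix scan, because both property values are part of the same sorted key. A minimal sketch, assuming an in-memory sorted set and a hypothetical lookup helper that follows the key layout on the slide:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model of a lookup against a (company, firstName) composite index.
class CompositeIndexSketch {
    static List<String> lookup(TreeSet<String> compositeIndex,
                               String company, String firstName) {
        String prefix = "AppId:User:company:" + company
                + ":firstName:" + firstName + ":";
        List<String> keys = new ArrayList<>();
        // One seek, then read contiguous rows until the prefix stops matching.
        for (String row : compositeIndex.tailSet(prefix)) {
            if (!row.startsWith(prefix)) {
                break;
            }
            keys.add(row.substring(prefix.length()));
        }
        return keys;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(Arrays.asList(
                "AppId:User:company:acme:firstName:alfred:alfred@acme.com",
                "AppId:User:company:google:firstName:david:david@google.com",
                "AppId:User:company:google:firstName:ikai:ikai@google.com",
                "AppId:User:company:google:firstName:max:max@google.com",
                "AppId:User:company:megacorp:firstName:zed:zed@megacorp.com"));
        System.out.println(lookup(index, "google", "ikai"));
    }
}
```

The cost is paid at write time instead: every saved User now needs this extra row kept up to date.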

Composite index tradeoffs

• Created at entity save time - incurs additional datastore CPU and storage quota
• You can only create 200 composite indexes
• You need to know the possible queries ahead of time!

Complex Queries takeaways

• This isn’t a relational database
  – There are no full table scans
  – Indexes MUST exist for every property we want to query
• Performance depends on the shape of the data
• Worst-case scenario: if your query matches are highly sparse
• Build composite indexes when you need them

Entity Groups


Why entity groups?

• We can perform transactions within this group - but not outside
• Data locality - data are stored “near” each other
• Strongly consistent queries when using High Replication datastore within this entity group

Entity groups and transactions

• A hierarchical structuring of your data into Megastore’s unit of atomicity

• Allows for transactional behavior - but only within a single entity group

• Key unit of consistency when using High Replication datastore


Example: Data for a blog hosting service

    User --(has many)--> Blog --(has many)--> Entry --(has many)--> Comment

This can be structured as an entity group (tree structure)!

Structure this data as an entity group

[Figure: tree diagram. User is the entity group root; Blog entities are its children, Entry entities are children of a Blog, and Comment entities are children of an Entry.]

How are entity groups stored?

Entities table

Bigtable key                                                 Value
AppId:User:ikai@google.com                                   ( Protobuf serialized User )
AppId:User:ikai@google.com/Blog:123                          ( Protobuf serialized Blog )
AppId:User:ikai@google.com/Blog:123/Entry:456                ( Protobuf serialized Entry )
AppId:User:ikai@google.com/Blog:123/Entry:789                ( Protobuf serialized Entry )
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:111    ( Protobuf serialized Comment )
AppId:User:ikai@google.com/Blog:123/Entry:456/Comment:222    ( Protobuf serialized Comment )
AppId:User:ikai@google.com/Blog:123/Entry:789/Comment:333    ( Protobuf serialized Comment )

Read more: http://code.google.com/appengine/docs/python/datastore/entities.html

• Entity groups have a single root entity
• Child entities embed the entire ancestry in their keys
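
Because every child key embeds its full ancestry, an entity group is simply the set of rows that share the root key as a prefix. The sketch below models that with plain strings; the class, the helper names, and the string key format copied from the slide are assumptions, not the real Key encoding.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of ancestor paths in keys and of an entity group as a key range.
class EntityGroupKeySketch {
    // A child key is the parent's key plus "/Kind:id".
    static String childKey(String parentKey, String kind, String id) {
        return parentKey + "/" + kind + ":" + id;
    }

    // All rows in the group: the root row plus every key under "root/".
    static SortedSet<String> group(TreeSet<String> entitiesTable, String rootKey) {
        TreeSet<String> rows = new TreeSet<>(entitiesTable.subSet(
                rootKey + "/", rootKey + "/" + Character.MAX_VALUE));
        if (entitiesTable.contains(rootKey)) {
            rows.add(rootKey);
        }
        return rows;
    }

    public static void main(String[] args) {
        String user = "AppId:User:ikai@google.com";
        String blog = childKey(user, "Blog", "123");
        String entry = childKey(blog, "Entry", "456");
        String comment = childKey(entry, "Comment", "111");

        TreeSet<String> table = new TreeSet<>();
        table.add(user);
        table.add(blog);
        table.add(entry);
        table.add(comment);
        table.add("AppId:User:max@google.com"); // a different entity group

        for (String row : group(table, user)) {
            System.out.println(row);
        }
    }
}
```

The shared prefix is also what gives the group its data locality: in a table sorted by key, the whole tree is stored as neighbouring rows.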

Let’s write an entity group transactionally

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // Create the root entity
    Entity ikai = new Entity("User", "ikai@google.com");

    // This is the first child entity - notice the third argument,
    // which specifies the parent entity key
    Entity blog = new Entity("Blog", "ikaisays.com", ikai.getKey());

    // The next deeper entity sets the blog as the parent
    Entity entry = new Entity("Entry", "datastore-intro", blog.getKey());

    // We can also opt to not provide a key name and just use a parent key
    // for a new entity (auto assign an ID)
    Entity comment = new Entity("Comment", entry.getKey());

    // Start a new transaction
    Transaction tx = datastore.beginTransaction();

    // Put the entities in parallel (Arrays.asList is a helper function for clarity)
    datastore.put(Arrays.asList(ikai, blog, entry, comment));

    // Actually commit the changes
    tx.commit();

Timeline: Commit, then changes to entities visible, then changes to entities and indexes visible

Step 1: Commit

Roll the timestamp forward on the root entity

Step 2: Entity visible

On read, check for the most recent timestamp on the root entity

This is the version we want since it represents a complete write

Step 3: Indexes updated

Indexes are written - now we can query for this entity with the new properties
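
The role of the root entity's timestamp can be illustrated with a toy model: new versions are staged under the next timestamp, the commit is the single step that rolls the root timestamp forward, and reads only see versions at or below it. This is a sketch of the idea on the slides only. The class and method names are hypothetical and it does not reflect how Megastore is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of commit visibility within one entity group.
class CommitTimestampSketch {
    // Latest committed timestamp, conceptually stored on the root entity.
    private long rootTimestamp = 0;
    // Entity key -> (timestamp -> value written at that timestamp).
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    // Write a new version stamped with the next, not yet committed, timestamp.
    void stage(String key, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>())
                .put(rootTimestamp + 1, value);
    }

    // Commit: roll the timestamp forward on the root entity.
    void commit() {
        rootTimestamp++;
    }

    // Read the newest version that belongs to a complete (committed) write.
    String read(String key) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> visible = history.floorEntry(rootTimestamp);
        return visible == null ? null : visible.getValue();
    }

    public static void main(String[] args) {
        CommitTimestampSketch group = new CommitTimestampSketch();
        group.stage("Blog:123", "draft title");
        System.out.println(group.read("Blog:123")); // staged but not committed
        group.commit();
        System.out.println(group.read("Blog:123"));
    }
}
```

Because visibility hinges on one value at the root, either every staged write in the group becomes readable or none does.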

Entity group and transactions takeaways

• Structure data into hierarchical trees
  – Large enough to be useful, small enough to maximize transactional throughput
• Transactions need an entity group root - roughly 1 transaction/second per entity group
  – If you write N entities that are all part of 1 entity group, it counts as 1 write
• Optimistic locking used - can be expensive with a lot of contention
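
Optimistic locking means a transaction does not block other writers: it records the version it read, and its commit is rejected if another commit got in first, so the caller has to retry. The following is a minimal in-memory sketch of that pattern. The class, its version counter, and the retry helper are hypothetical and not part of the App Engine API.

```java
import java.util.function.IntUnaryOperator;

// Toy model of optimistic locking on a single entity group.
class OptimisticLockSketch {
    private int version = 0; // bumped on every successful commit
    private int value = 0;   // the data guarded by the group

    synchronized int readVersion() {
        return version;
    }

    synchronized int readValue() {
        return value;
    }

    // The commit succeeds only if nobody else committed since we read.
    synchronized boolean tryCommit(int expectedVersion, int newValue) {
        if (version != expectedVersion) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }

    // Read, compute, try to commit; on conflict start over.
    // Returns the number of attempts that were needed.
    int update(IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int seenVersion = readVersion();
            int newValue = change.applyAsInt(readValue());
            if (tryCommit(seenVersion, newValue)) {
                return attempt;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        OptimisticLockSketch group = new OptimisticLockSketch();
        int attempts = group.update(v -> v + 1, 3);
        System.out.println("value=" + group.readValue() + " attempts=" + attempts);
    }
}
```

Every failed attempt repeats the read and the computation, which is why heavy contention on one entity group gets expensive.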

General datastore tips

• Denormalize as much as possible
  – As much as possible, treat datastore as a key-value store (Dictionary or Map like structure)
  – Move large reporting to offline processing. This lets you avoid unnecessary indexes
• Use entity groups for your data
• Build composite indexes where you need them - “need” depends on shape of your data

Questions?
