Indexing in Distributed Actor Systems -...

21
Indexing in Distributed Actor Systems Philip Bernstein Microsoft Mohammad Dashti EPFL Tim Kiefer TU Dresden David Maier Portland State Univ 8 th CIDR January 9, 2017

Transcript of Indexing in Distributed Actor Systems -...

Page 1: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Indexing in Distributed Actor Systems

Philip BernsteinMicrosoft

Mohammad DashtiEPFL

Tim KieferTU Dresden

David MaierPortland State Univ

8th CIDRJanuary 9, 2017

Page 2: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Stateful Object-Oriented Applications

Today’s interactive apps are built around a stateful, object-oriented middle tier

Multi-player games, IoT, social networking, mobile, telemetry

They comprise a large fraction of new app development

Naturally object-oriented, modeling real-world objects

Examples of objects

Gaming: players, games, grid positions, lobbies, player profiles, leaderboards, in-game money, and weapon caches

Social: chat rooms, messages, photos, and news items

IoT: sensors, virtual sensors (flood, break-in), buildings, vehicles, locations

2

Page 3: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Application Properties

Properties of these apps

Objects are active for minutes to days, sometimes forever

App manages a lot of state: millions of objects, knowledge graphs, images, videos

App does heavy computation: complex actions, render images, compute over graphs, …

Properties of the system

Scale out to large number of servers

Compute servers must scale out independently of storage servers

Geo-distributed for worldwide low-latency access

3

Page 4: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Middle-tier Objects Comprise a Distributed DB

Many objects outlive the processes that created them

Many (but not all) objects are persistent

Latest state is in main memory. Storage might be stale

Active objects are in-memory for fast response

4

Page 5: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Actor Systems

Many of these apps are implemented using actor systems

Simplifies distributed programming

Actors are objects that …

Communicate only via asynchronous message-passing

Messages are queued in the recipient's mailbox

No shared-memory state between actors

Process one message at a time

No multi-threaded execution inside an actor

5

Page 6: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Orleans Actor Programming Framework

Orleans is an open-source actor framework built on C#

Ensures apps are fault tolerant and scalable

https://dotnet.github.io/orleans/

Virtual actor model

Each actor has a unique location-independent ID, always valid

Actors are transparently activated on invocation

On activation, actor invokes its constructor to initialize its state (e.g., read from storage)

Actor can save state at any time (e.g., to storage)

Runtime automates fault-tolerance, load balancing, actor lifecycle, …

6

Page 7: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Actor-Oriented Database System (AODB)

Current distributed actor systems lack DB functionality

But users frequently ask for it (and hack it)

Vision: Actor-Oriented DB System

Indexes, queries, streams, transactions, replication, geo-distribution, views, triggers

AODB’s main distinguishing features

Compatible with actor framework’s programming model (developer friendly)

In-memory and elastically scales out to hundreds of servers

Agnostic to the storage system, e.g., cloud storage services7

FrontendClients

Transactions

Persistence

Geo-distribution

Indexing

Actor Middle-Tier

AODB Plug-ins

Cloud Storage

Page 8: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Scalable and Storage-Agnostic

Elastic scalability implies

Limited ability to co-locate functionality

Functionality must be parallelizable

Scale-out is more important than a fast path

Storage agnostic implies each DB feature

Must work for persisted and non-persisted objects

Must not require the storage system to support it

Should benefit from a storage system that does support it

Must cope with storage latency of cloud storage

8

Page 9: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Requirements for AODB Indexes

Statically choose indexed fields

Optional uniqueness constraints (e.g., ensure Player.Email is unique)

Index is eventually-consistent with actor and fault tolerant

Can index active actors only (e.g., offer a tournament to certain on-line players)

Can index persistent and non-persistent actors

Leverage actor storage that supports indexing

Support actor storage that does not support indexing

9

Page 10: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Challenges

Lookup should avoid activating actors

No type extents

No multi-actor transactions

10

Page 11: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Fault Tolerance

12

HashIndex on Player.Location in Storage

PlayerA

HashIndex on Player.Location

Player Storage

2. Update index

1. Update storage 3. Update index storage

Index is comprised of actors, to gain benefits of Orleans

Suppose we have an index on Player.Location

Ensure recoverability after each write to storage

Page 12: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Our solution: Multi-step Fault-tolerant Workflow

13

PlayerAHashIndex on

Player.Location

PlayerStorage

HashIndex on Player.Location in Storage

Localworkflow queue

Workflow queueStorage

Page 13: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Our solution: Multi-step Fault-tolerant Workflow

14

PlayerAHashIndex on

Player.Location

PlayerStorage

6. Removeworkflowrecord ID

4.2. Update

1. Add update to queue

Localworkflow queue

Workflow queueStorage

4. Batch update the index

4.1. Check if Player has the workflow record, too

Batch writeto Storage

2. 5.

Cont.

3. Update storageincludingworkflow record ID

HashIndex on Player.Location in Storage

Page 14: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Index Physical Representation

15

HashIndex on Player.Location

PlayerA

PlayerC

PlayerE

PlayerD

PlayerB

PlayerF

PlayerA

PlayerC

PlayerE

PlayerD

PlayerB

PlayerF

HashIndex on Player.Location for actors on Server 1

HashIndex on Player.Location for actors on Server 2

PlayerAPlayerC

PlayerE

PlayerD

PlayerB

PlayerF

HashIndex on Player in Redmond

HashIndex on Player in Bellevue

Entire index in one actor One index-actor per index bucket One index-actor per server

Page 15: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

public class PlayerProperties{

public int Rank { get; set; }

[Index]public string Location { get; set; }

}

public class Player : IndexableGrain<PlayerState, PlayerProperties>, IPlayer

{public Task Move(Direction d){

State.Location =d.GetDestination(State.Location);

return WriteStateAsync();}

public Task<string> GetLocation(){

return Task.FromResult(State.Location);}

}

public interface IPlayer : IIndexableGrain<PlayerProperties>{

Task Move(Direction d);

Task<string> GetLocation();}

Programming Interface: Index Definition

16

public class PlayerState{

public string Name { get; set; }public int Rank { get; set; } public string Location { get; set; }

}

Page 16: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Programming Interface: Index Lookup

17

IOrleansQueryable<IPlayer> activePlayersInRedmond = from player in GrainFactory.GetActiveGrains<IPlayer, PlayerProperties>()where player.Location == "Redmond"select player;

//IOrleansQueryable extends IQueryable interfaceforeach(IPlayer player in activePlayersInRedmond){

Console.WriteLine(player.GetPrimaryKeyLong());}

Use LINQ to access the index

Page 17: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Performance

18

20

30

40

50

60

70

80

90

5 10 15 20

Thro

ugh

pu

t(k

ilo r

eq

ue

sts/

seco

nd

)

Number of middle-tier servers

none one-bucket perkey persilo

5

10

15

20

25

30

0 1 2 3 4

Thro

ugh

pu

t(k

ilo r

eq

ue

sts/

seco

nd

)

Number of Indexes

1

2

3

4

5

6

7

notindexed

A-index NFTI-index

FTI-index

SMindex

Thro

ugh

pu

t(k

ilo r

equ

ests

/sec

on

d)

Index Type

Page 18: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Future Work on Indexing

Transactionally update actor and index

Range indexes

Richer materialized views

Offer indexing with other AODB features, e.g., transactions, queries, geo-dist’n

19

Page 19: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Status of Orleans’ AODB Features

Stream processing (January 2015)

Geo-distribution and multi-master replication (January 2016)

Distributed transactions (preview, this month) [MSR Technical Report]

Indexing (prototype, August 2016)

20

Page 20: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

Acknowledgments

Sebastian Burckhardt, Sergey Bykov, Julian Dominguez, Tova Milo, Jorgen Thelin, Microsoft Studios and the Orleans community.

More at https://dotnet.github.io/orleans/

21

Page 21: Indexing in Distributed Actor Systems - cidrdb.orgcidrdb.org/cidr2017/slides/p29-bernstein-cidr17-slides.pdf · On activation, actor invokes its constructor to initialize its state

22

Thank you!