Scaling to Millions of Concurrent SPARQL Queries on the Cloud

25
Sep 2010 Scaling to Millions of Concurrent SPARQL Queries on the Cloud OWLIM Replication Cluster @ Amazon EC2

description

OWLIM Replication Cluster running on 100 AWS EC2 instances (old presentation from 2010)

Transcript of Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Page 1: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Sep 2010

Scaling to Millions of Concurrent SPARQL Queries on the Cloud

OWLIM Replication Cluster @ Amazon EC2

Page 2: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Goals

• Test the scalability of OWLIM RC on a really large cluster

• Can we break the million queries per hour barrier?

#2 OWLIM Replication Cluster @ AWS Sep 2010

Page 3: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

INTRODUCTION

Page 5: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Benchmarking AWS

• Extensive performance tests of EC2 instances

– I/O, CPU, Network

– BSBM (SPARQL), RDF materialisation

• High-Memory EC2 instances offer (surprisingly) good performance for RDF-related processing

– Comparable to local non-virtualised hardware

Page 6: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Benchmarking AWS – testbeds

Testbed        CPU cores      RAM (GB)   Virtualisation
Local-L        2×2.4 GHz      8          ESX
Local-XL       4×2.9 GHz      12         No
Local-3XL      8×3.3 GHz      48         No
L              2×2 ECU*       7.5        Xen
XL             4×2 ECU*       15         Xen
High-Mem XL    2×3.25 ECU*    17         Xen
High-Mem 2XL   4×3.25 ECU*    34         Xen
High-Mem 4XL   8×3.25 ECU*    68         Xen
High-CPU XL    8×2.5 ECU*     7          Xen

* 1 ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
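
To compare the EC2 rows with the local machines, the ECU figures can be translated into a rough per-core clock equivalent using the footnote's 1.0-1.2 GHz range. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and a clock-speed equivalence for 2007-era processors is a coarse guide rather than a performance prediction.

```python
# Conversion range taken from the slide footnote: 1 ECU ~ 1.0-1.2 GHz (2007 Opteron/Xeon).
GHZ_PER_ECU = (1.0, 1.2)


def per_core_ghz_range(ecu_per_core):
    """Rough GHz-equivalent range for one virtual core (illustrative only)."""
    low, high = GHZ_PER_ECU
    return (ecu_per_core * low, ecu_per_core * high)


def total_ecu(cores, ecu_per_core):
    """Total ECU of an instance: number of cores times ECU per core."""
    return cores * ecu_per_core


# High-Mem XL row of the table: 2 cores at 3.25 ECU each.
hm_xl_total = total_ecu(2, 3.25)
hm_xl_core = per_core_ghz_range(3.25)
```

Under this reading, a High-Mem XL core falls in the same nominal range as the cores of the Local-3XL machine, which is in line with the slide's remark that High-Memory instances were comparable to local hardware.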

Page 7: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Benchmarking AWS – BSBM 100M results

[Chart: query mixes per hour (y-axis, 0 to 5000) against concurrent clients (x-axis: 1, 4, 16, 32, 64). Series: Local-L, L-ub, Local-XL, XL-ub, HM-XL-ub, HM-2XL-ub, Local-3XL, Local-3XL-SSD, HM-4XL-ub, HC-XL-ub.]

Page 8: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Benchmarking AWS – RDF materialisation

[Chart: materialisation time in seconds (y-axis, 0 to 6000). Series: UMBEL, DBP-SKOS.]

Page 9: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

OWLIM Replication Cluster

• Improves scalability with respect to concurrent user requests

• How does it work?

– Each write request is multiplexed to all repository instances

– Each read request is dispatched to one instance only

– To ensure load-balancing, read requests are sent to the instance with the shortest execution queue
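
The dispatch rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual OWLIM master implementation: the `Replica` and `Master` classes, their fields, and the node names are all hypothetical, and queues are never drained because no query is really executed.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one repository instance."""
    name: str
    queue: list = field(default_factory=list)  # pending read requests
    log: list = field(default_factory=list)    # writes applied so far


class Master:
    """Illustrative dispatcher following the three rules on the slide."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, update):
        # Every write is multiplexed to all replicas so they stay identical.
        for replica in self.replicas:
            replica.log.append(update)

    def read(self, query):
        # A read goes to exactly one replica: the one with the shortest queue.
        target = min(self.replicas, key=lambda r: len(r.queue))
        target.queue.append(query)
        return target.name


replicas = [Replica("slave-1"), Replica("slave-2"), Replica("slave-3")]
master = Master(replicas)
master.write("INSERT DATA { <urn:a> <urn:b> <urn:c> }")
assigned = [master.read(f"query-{i}") for i in range(6)]
```

Because reads touch only one replica while writes touch all of them, read throughput can grow with the number of replicas, whereas write cost does not shrink as nodes are added.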

Page 10: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

OWLIM CLUSTER ON EC2 – BENCHMARKS

Page 11: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

AWS testbed setup

• OWLIM Replication Cluster

– One Master node, 10-100 Slave nodes

– 100 million triples / 16GB database size

• BSBM 100M dataset

– Each cluster node has a replica of the database

– 1000 concurrent BSBM clients

• Amazon EC2

– Master node – HM-2XL (34GB RAM, 4x3.25 ECU)

– Slave nodes – HM-XL (17 GB RAM, 2x3.25 ECU)

– Ubuntu (x64)
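
For readers unfamiliar with the metric used in the following slides, here is a minimal sketch of how query mixes per hour (QMpH) could be measured with concurrent clients. It is not the BSBM test driver: `run_query_mix` is a hypothetical placeholder that only sleeps, and the client count is far smaller than the 1000 clients used in the benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def qmph(completed_mixes, elapsed_seconds):
    """Query mixes per hour, extrapolated from a measured interval."""
    return completed_mixes * 3600.0 / elapsed_seconds


def run_query_mix(client_id):
    # Placeholder: a real client would send the queries of one BSBM mix
    # to the SPARQL endpoint here. The sleep only simulates latency.
    time.sleep(0.01)
    return 1


def measure(clients, mixes_per_client):
    """Run all mixes on a pool of `clients` threads and report the rate."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        jobs = [pool.submit(run_query_mix, c)
                for c in range(clients) for _ in range(mixes_per_client)]
        completed = sum(job.result() for job in jobs)
    elapsed = time.perf_counter() - start
    return completed, qmph(completed, elapsed)


completed, rate = measure(clients=8, mixes_per_client=5)
```

The rate is an extrapolation from a short interval, so a real measurement would run long enough for caches and queues to reach a steady state.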

Page 12: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Total QMpH (Query Mix per Hour)

[Chart: total QMpH (y-axis, 0 to 250000) against cluster size in HM-XL nodes (x-axis, 10 to 100). BSBM-100M, 1000 concurrent clients.]

Page 13: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Total QMpH – summary

• Almost linear scalability of the cluster

• 20 nodes handle more than 1 million SPARQL queries per hour (40,000 QMpH)

– 1 Query Mix = 25 SPARQL queries

• 100 nodes handle 5 million SPARQL queries per hour (200,000 QMpH)
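
The conversion behind these figures is simple enough to check directly. The snippet below uses only numbers quoted on this slide; the function name is illustrative.

```python
QUERIES_PER_MIX = 25  # from the slide: 1 Query Mix = 25 SPARQL queries


def queries_per_hour(qmph):
    """Convert query mixes per hour into individual SPARQL queries per hour."""
    return qmph * QUERIES_PER_MIX


twenty_nodes = queries_per_hour(40_000)     # 20-node cluster
hundred_nodes = queries_per_hour(200_000)   # 100-node cluster

# Throughput ratio compared with the node-count ratio between the two clusters.
throughput_ratio = 200_000 / 40_000
node_ratio = 100 / 20
```

With these rounded figures the throughput ratio equals the node-count ratio, which is what the "almost linear" claim amounts to between 20 and 100 nodes.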

Page 14: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

QMpH per cluster node

[Chart: QMpH per node (y-axis, 1800 to 2400) against cluster size in HM-XL nodes (x-axis, 10 to 100), with a power trendline. BSBM-100M, 1000 concurrent clients.]

Page 15: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

QMpH per cluster node – summary

• Low parallelisation overhead

– Only 10% deterioration in QMpH per cluster node when the cluster grows 10 times (from 10 to 100 nodes)

– Cluster nodes handle 2,000-2,300 QMpH (a standalone HM-XL node on EC2 handles ~2,500 QMpH)
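
One way to read these numbers is as a per-node efficiency relative to the standalone node. The ratio below is a derived illustration using the rounded figures from this slide (the standalone value is approximate), not a result reported in the presentation.

```latex
\[
\text{efficiency} = \frac{\text{QMpH per cluster node}}{\text{QMpH of a standalone node}},
\qquad
\frac{2000}{2500} = 0.80,
\qquad
\frac{2300}{2500} = 0.92
\]
```

So a node inside the cluster delivers roughly 80% to 92% of what the same instance type delivers on its own, with the remainder attributable to coordination through the master.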

Page 16: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

What about the cost?

• 100,000 SPARQL queries per $1 on AWS

– ~4,000 Query Mixes / $

  • 1 Query Mix = 25 SPARQL queries

– EC2 pricing

  • Master node (on-demand HM-2XL) – $1.00/hour

  • Slave node (on-demand HM-XL) – $0.50/hour
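
The cost figure can be reproduced approximately from the prices on this slide. The sketch assumes that only instance hours are billed (no storage or data-transfer charges) and takes the 200,000 QMpH of the 100-node cluster from the Total QMpH summary slide; the function names are illustrative.

```python
MASTER_USD_PER_HOUR = 1.00  # on-demand HM-2XL, from the slide
SLAVE_USD_PER_HOUR = 0.50   # on-demand HM-XL, from the slide
QUERIES_PER_MIX = 25        # 1 Query Mix = 25 SPARQL queries


def cluster_cost_per_hour(slaves):
    """Hourly instance cost of one master plus `slaves` slave nodes."""
    return MASTER_USD_PER_HOUR + slaves * SLAVE_USD_PER_HOUR


def mixes_per_dollar(total_qmph, slaves):
    """Query mixes obtained per USD of instance cost."""
    return total_qmph / cluster_cost_per_hour(slaves)


cost_100 = cluster_cost_per_hour(100)
mixes_100 = mixes_per_dollar(200_000, 100)
queries_100 = mixes_100 * QUERIES_PER_MIX
```

For 100 slaves this gives a little over 3,900 query mixes, or about 98,000 queries, per dollar, which is close to the rounded ~4,000 mixes and 100,000 queries quoted above.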

Page 17: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

What about the cost (2)

[Chart: Query Mixes per 1 USD (y-axis, 3400 to 4600) against cluster size (x-axis, 10 to 100).]

Page 18: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

DETAILED CLUSTER METRICS

Page 19: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Cluster monitoring

• Amazon CloudWatch provides instance level monitoring for EC2

– CPU load, Bandwidth utilisation, I/O, …

– Minimum granularity of monitoring periods – 1 minute

• OWLIM Cluster metrics

– Monitor Master and a random Slave for ~180 min

– Many test runs (a single run takes a few minutes)

– Idle CPU, I/O and network periods on the diagrams correspond to the time between test runs
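
As an illustration of the kind of request involved, the sketch below builds the parameters for a CloudWatch `GetMetricStatistics` call covering a 180-minute window at 1-minute granularity. It is not how the original measurements were collected: it uses the present-day `boto3` client, the instance ID and end time are made-up placeholders, and `fetch_cpu` needs AWS credentials and is therefore defined but not called here.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_request(instance_id, end, minutes=180, period_seconds=60):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,  # 60 s is the 1-minute granularity from the slide
        "Statistics": ["Average"],
    }


def fetch_cpu(instance_id, end):
    # Requires the third-party boto3 package and configured AWS credentials.
    import boto3
    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**cpu_metric_request(instance_id, end))
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])


# Hypothetical instance ID and end time, for illustration only.
request = cpu_metric_request(
    "i-0123456789abcdef0",
    end=datetime(2010, 9, 1, 12, 0, tzinfo=timezone.utc),
)
```

The same request shape applies to the network and disk metrics by changing `MetricName`, for example to `NetworkIn`, `NetworkOut`, `DiskReadBytes` or `DiskWriteBytes`.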

Page 20: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

CPU load (Master)

[Chart: CPU load (Master) in % (y-axis, 0 to 80) over time in minutes (x-axis, 0 to 185).]

Page 21: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

CPU load (Slave)

[Chart: CPU load (random Slave) in % (y-axis, 0 to 120) over time in minutes (x-axis, 0 to 155).]

Page 22: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Network traffic (Master)

[Chart: Network traffic (Master) in MB/s (y-axis, 0 to 35) over time in minutes (x-axis, 0 to 185). Series: inbound (MB/s), outbound (MB/s).]

Page 23: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Network traffic (Slave)

[Chart: Network traffic (random Slave) in MB/s (y-axis, 0.00 to 0.12) over time in minutes (x-axis, 0 to 156). Series: inbound (MB/s), outbound (MB/s).]

Page 24: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

I/O (Slave)

[Chart: I/O (random Slave) in MB/s (y-axis, 0.00 to 3.50) over time in minutes (x-axis, 0 to 170). Series: Disk Read (MB/s), Disk Write (MB/s).]

Page 25: Scaling to Millions of Concurrent SPARQL Queries on the Cloud

Q & A

Questions? @ontotext
