Adventures in RDS Load Testing
Mike Harnish, KSM Technology Partners LLC
Objectives
• Empirical basis for evaluation
  • Of RDS as a platform for future development
  • Of performance of different configurations
• Platform for future load testing
  • Of different configurations, schemas, and load profiles
• Not strictly scientific: did not try to isolate all possible sources of variability
• Not benchmarking
• Not exhaustive: some configurations not tested
Why RDS? Why Oracle?
• Why not DynamoDB/NoSQL? Nothing at all against them; the testing platform design does not exclude them
• Why not MySQL/SQL Server? Ran out of time
• Why not PostgreSQL? Ran out of time, but it would be my next choice
• RDBMS migration path
How We Tested
• Provision RDS servers
• Generate test data
• Introduce distributed load
  • Persistent and relentless
  • Rough-grained “batches” of work
  • For a finite number of transactions
• Monitor servers with CloudWatch
• Analyze per-batch statistics
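The per-batch analysis boils down to transactions-per-second figures derived from each batch's size and elapsed time. A minimal sketch, assuming stat records shaped like the deck's JSON (the field names `size` and `elapsedTimeMillis` are taken from the batch specs and chart labels; the cumulative definition here sums elapsed time, which is only one reasonable choice when consumers run in parallel):

```python
def batch_tps(stats):
    """Per-batch TPS: transactions in the batch over its elapsed seconds."""
    return [s["size"] / (s["elapsedTimeMillis"] / 1000.0) for s in stats]

def cumulative_tps(stats):
    """One simple cumulative figure: total tx over total elapsed time."""
    total_tx = sum(s["size"] for s in stats)
    total_ms = sum(s["elapsedTimeMillis"] for s in stats)
    return total_tx / (total_ms / 1000.0)

# A 500-tx batch finishing in 250 ms works out to 2000 TPS for that batch.
stats = [{"size": 500, "elapsedTimeMillis": 250},
         {"size": 500, "elapsedTimeMillis": 750}]
per_batch = batch_tps(stats)
overall = cumulative_tps(stats)
```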
RDS Server Configurations

db.m2.4xlarge (High-Memory Quadruple Extra Large DB Instance): 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, high I/O capacity, Provisioned IOPS optimized: 1000 Mbps
• Tested at 3000 and 1000 PIOPS
• $3.14 base/hour, Oracle license included
• The largest instance type supported for Oracle

db.m1.xlarge (Extra Large DB Instance): 15 GB of memory, 8 ECUs (4 virtual cores with 2 ECUs each), 64-bit platform, high I/O capacity, Provisioned IOPS optimized: 1000 Mbps
• No PIOPS
• $1.13 base/hour, license included, on-demand
Test Schema

CREATE TABLE loadgen.account(
  account_id   NUMBER(9)
               CONSTRAINT pk_account PRIMARY KEY,
  balance      NUMBER(6,2) DEFAULT 0 NOT NULL);

CREATE TABLE loadgen.tx(
  tx_id        NUMBER(9) CONSTRAINT pk_tx PRIMARY KEY,
  account_id   NUMBER(9) CONSTRAINT fk_tx_account
               REFERENCES loadgen.account(account_id),
  amount       NUMBER(6,2) NOT NULL,
  description  VARCHAR2(100),
  tx_timestamp TIMESTAMP DEFAULT SYSDATE);

CREATE INDEX loadgen.idx_tx_lookup ON loadgen.tx(account_id, tx_timestamp)
…
CREATE SEQUENCE loadgen.seq_tx_id
…
Baseline Test Data
• 5,037,003 accounts
• 353,225,005 transactions (roughly 70 initial transactions per account)
• 300 GB provisioned storage, mostly to get higher PIOPS
• Using ~67 GB of it, according to CloudWatch
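The "roughly 70" figure checks out directly from the two counts above:

```python
# Baseline data set: counts taken from the slide above.
accounts = 5_037_003
transactions = 353_225_005

# Average initial transactions per account; comes out just over 70.
tx_per_account = transactions / accounts
```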
Test Environment
• Producer: t1.micro
• Consumers: c1.xlarge (8 vCPU, 20 ECU, 7 GB memory, high network performance)
• RDS instances, driven via SQLPlus and JDBC

Processing View
[Diagram: a Producer puts batch specs on a Tx Queue; 12-24 Consumers pull them, drive the RDS instances (the “victims”), and publish per-batch results to a Stats Queue read by a Stats Collector]
Lightweight Batch Specs (2,000 batches of 500 tx each)
{"targetReadRatio":3,"targetWriteRatio":1,"size":500,"run":"run01","id":13,"accountRange":{"start":10001,"count":5040800}}
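A batch spec generator in this shape is small enough to sketch. The field names below are exactly those in the example spec; the function name `make_batch_specs` and the idea that the Producer serializes each spec to JSON before enqueuing it are assumptions:

```python
import json

def make_batch_specs(run_id, num_batches=2000, batch_size=500,
                     account_start=10001, account_count=5040800):
    """Generate lightweight batch specs like the one shown in the deck.

    2000 batches x 500 tx each = the 1M-tx runs described later.
    """
    return [
        {
            "targetReadRatio": 3,
            "targetWriteRatio": 1,
            "size": batch_size,
            "run": run_id,
            "id": i,
            "accountRange": {"start": account_start, "count": account_count},
        }
        for i in range(num_batches)
    ]

specs = make_batch_specs("run01")
payloads = [json.dumps(s) for s in specs]  # what would go on the Tx Queue
```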
Batch Performance Stats (also JSON formatted – tl;dr; exported to .csv)
• 1M JDBC tx/run
• 3 read : 1 write ratio
• Randomized over the known set of pre-loaded accounts
• Commit per tx (not per batch)
Transaction Specifications

Read Transaction
• Query random ACCOUNT for balance
• Query TX for last 10 tx by TIMESTAMP DESC
• Scan the returned cursor

Write Transaction
• Insert a random (+/-) amount into the TX table for a random account
• Update the ACCOUNT table by applying that amount to the current balance
• Commit (or rollback on failure)
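The two transaction shapes above can be sketched against an in-memory SQLite stand-in for the Oracle schema (types simplified; table and column names follow the deck, everything else, including the function names, is an assumption, and the real harness issued these over JDBC):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE account(
  account_id INTEGER PRIMARY KEY,
  balance    NUMERIC NOT NULL DEFAULT 0);
CREATE TABLE tx(
  tx_id        INTEGER PRIMARY KEY,
  account_id   INTEGER REFERENCES account(account_id),
  amount       NUMERIC NOT NULL,
  description  TEXT,
  tx_timestamp TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE INDEX idx_tx_lookup ON tx(account_id, tx_timestamp);
""")

def read_tx(conn, account_id):
    # Query the account for its balance, then scan its last 10 tx.
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE account_id = ?",
        (account_id,)).fetchone()
    rows = conn.execute(
        "SELECT tx_id, amount FROM tx WHERE account_id = ? "
        "ORDER BY tx_timestamp DESC LIMIT 10", (account_id,)).fetchall()
    return balance, rows

def write_tx(conn, account_id, amount):
    # Insert a signed amount into TX, apply it to the balance, commit per tx.
    try:
        conn.execute("INSERT INTO tx(account_id, amount) VALUES (?, ?)",
                     (account_id, amount))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_id = ?",
            (amount, account_id))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise

# Seed one account, apply two write transactions, then read it back.
db.execute("INSERT INTO account(account_id, balance) VALUES (1, 0)")
db.commit()
write_tx(db, 1, 25.50)
write_tx(db, 1, -10.00)
balance, recent = read_tx(db, 1)
```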
[1] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea)
Cumulative: 5765 tps
[Chart: Run 01, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and NetTPS (right axis: TPS), per batch received by Stats Collector]
[1] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea)
Run 01 Monitoring Results
• Peaked @ 2200 Write IOPS
• Disk Queue Depth > 100
• What’s up with Read IOPS?
[Chart: Run 02, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and TotalTxPerSecond (right axis: TPS), per batch received by Stats Collector]
[2] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea) … again
Cumulative: 4804 tps???
[2] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea) … again
Run 02 Monitoring Results
• Peaked @ 2500+ Write IOPS
• Disk Queue Depth tracks Write IOPS (or vice versa)
[3] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea) … third run
Cumulative: 4842 tps
[Chart: Run 03, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and TotalTxPerSecond (right axis: TPS), per batch received by Stats Collector]
[3] db.m2.4xlarge, 3000 PIOPS (4 consumers @ 6 threads ea) … third run
Run 03 Monitoring Results
• Peaked @ 2500+ Write IOPS
• Disk Queue Depth tracks Write IOPS (or vice versa)
• Very curious what’s going on in the interval from the peak to the end of the run
[Chart: Run 04, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and TotalTxPerSecond (right axis: TPS), per batch received by Stats Collector]
[4] db.m2.4xlarge, 1000 PIOPS (2 consumers @ 6 threads ea)
Cumulative: 2854 tps
Dialed back concurrency, on the hunch that Oracle was resetting too many connections
[4] db.m2.4xlarge, 1000 PIOPS (2 consumers @ 6 threads ea)
Run 04 Monitoring Results
[Chart: Run 05, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and TotalTxPerSecond (right axis: TPS), per batch received by Stats Collector]
[5] db.m2.4xlarge, 1000 PIOPS (4 consumers @ 6 threads ea)
Cumulative: 2187 tps
Dialing concurrency back up made it worse
[5] db.m2.4xlarge, 1000 PIOPS (4 consumers @ 6 threads ea)
Run 05 Monitoring Results
[Chart: Run 06, ElapsedTimeMillis (left axis: milliseconds elapsed per batch) and TotalTxPerSecond (right axis: TPS), per batch received by Stats Collector]
[6] db.m1.xlarge, No PIOPS (2 consumers @ 6 threads ea)
Cumulative: 1061 tps. Some early flutter, but not much
[6] db.m1.xlarge, No PIOPS (2 consumers @ 6 threads ea)
Run 06 Monitoring Results
(Different colors than on previous slides)
Latency: Run 1 (3000 PIOPS)
[Chart: Run 01 Batch Latencies (all milliseconds): MedianWriteLatency, AvgTxLatencyMs, and HighWriteLatency, per batch received by Stats Collector]
Latency: Run 6 (No PIOPS)
[Chart: Run 06 Batch Latencies (all milliseconds): AvgTxLatencyMs, MedianWriteLatency, and HighWriteLatency, per batch received by Stats Collector]
Pricing

                                                 --------- Single AZ ----------   --------- Multi-AZ -----------
            Instance Type  PIOPS  Storage (GB)   Hourly  PIOPS/  Storage/  Cost/   Hourly  PIOPS/  Storage/  Cost/
                                                 O/D**   Month   GB-mo.*   Month   O/D**   Month   GB-mo.*   Month
Runs 1,2,3  db.m2.4xlarge  3000   300            $3.14   $0.10   $0.13   $2,598.30  $6.28   $0.20   $0.25   $5,196.60
Runs 4,5    db.m2.4xlarge  1000   300            $3.14   $0.10   $0.13   $2,398.30  $6.28   $0.20   $0.25   $4,796.60
Run 6       db.m1.xlarge   0      300            $1.13   $0.10   $0.10   $843.60    $2.26   $0.20   $0.20   $1,687.20

*Non-PIOPS storage also incurs I/O requests at $0.10/million requests
**Oracle “license-included” pricing. Significant savings for reserved instances.
(Does not include cost of backup storage.)
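The Cost/Month figures reconstruct as instance hours plus PIOPS plus storage. A sketch, assuming a 720-hour month; note the Single-AZ PIOPS storage rate works out to $0.125/GB-month, which the table appears to round to $0.13 for display:

```python
def monthly_cost(hourly, piops, storage_gb, piops_rate, storage_rate,
                 hours=720):
    """Reconstruct Cost/Month: instance hours + PIOPS charge + storage charge.

    Assumes a 720-hour month; rates per PIOPS-month and per GB-month
    are read off the table (with $0.13 taken as rounded $0.125).
    """
    return hourly * hours + piops * piops_rate + storage_gb * storage_rate

runs_123 = monthly_cost(3.14, 3000, 300, 0.10, 0.125)  # Single AZ, 3000 PIOPS
runs_45  = monthly_cost(3.14, 1000, 300, 0.10, 0.125)  # Single AZ, 1000 PIOPS
run_6    = monthly_cost(1.13, 0,    300, 0.10, 0.10)   # Single AZ, no PIOPS
multi_az = monthly_cost(6.28, 3000, 300, 0.20, 0.25)   # Multi-AZ, 3000 PIOPS
```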
Conclusions and Takeaways
• PIOPS matters, for both throughput and latency
• Need larger sampling periods, to mitigate the effect of warm-up of instruments and subject
• Need to try different R/W ratios, and to gauge how they impact realized PIOPS
• Backup and restore takes time
  • Consider promotable read replicas, for platforms that support them
  • Otherwise I might have had more samples
Questions?