Scaling tokopedia-past-present-future
Uploaded by rhein-mahatma
CBN
apache server oracle server
Network Topology
Apps Topology
Internet
apache server
oracle server
http req http resp
sql
* We didn't have dedicated storage
* Uploaded pictures were stored on the same machine
* Web pages and static content were served by a single Apache
* We didn't use a CDN
* We didn't even know what a CDN was
WHY??
Apps Topology
Internet
apache app server
oracle server
http req http resp
sql
Internet
apache upload / static server
oracle server
http upload http resp
sql
Internet
apache upload / static server
http req http resp
access web page
upload pictures
read static pictures, css + js
* Oracle Express Edition reached its limit
* No partitioning
* No replication
* Poor indexing
* Reads/writes and queries all hit the same master DB
WHY??
Apps Topology
Internet
apache app server
PostgreSQL Master
http req http resp
sql insert sql update sql delete
PostgreSQL Slave
sql query
WAL streaming Replication
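The split in this diagram (inserts, updates, and deletes go to the master; queries go to the slave) can be sketched as a small routing function. This is only an illustration: the DSN strings and the `choose_dsn` helper are hypothetical, and the slides do not show how the routing was actually done in the Perl code.

```python
# Hypothetical connection targets; the slides do not give host names.
MASTER_DSN = "host=pg-master dbname=app"
SLAVE_DSN = "host=pg-slave dbname=app"

# Statement types the diagram sends to the master.
WRITE_KEYWORDS = ("insert", "update", "delete")


def choose_dsn(sql: str) -> str:
    """Return the DSN that should receive this SQL statement."""
    words = sql.split(None, 1)
    first_word = words[0].lower() if words else ""
    if first_word in WRITE_KEYWORDS:
        return MASTER_DSN
    # Everything else is treated as a read and goes to the slave.
    return SLAVE_DSN
```

Note that PostgreSQL streaming replication is asynchronous by default, so a read sent to the slave right after a write may briefly return stale data.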
* We have a lot of new products every second
* We have to show search results in real time
* But the sorting keeps changing every second
* The PostgreSQL load is just too much!!!
WHY??
* We were using Apache + mod_perl
* Apache consumes a lot of resources
* Our code had a lot of memory leaks
WHY??
* We found out that NginX is very light and fast
* We use nginx as the load balancer
* We replaced Apache mod_perl with nginx-perl
* We have 1 nginx load balancer with several nginx-perl servers
* For the load balancing method, we mix round robin and clustering
SOLUTION
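As a rough illustration of the round-robin part of that load balancing, here is a minimal sketch. The backend names are placeholders modeled on the "nginx-perl #n" labels, the `RoundRobinBalancer` class is hypothetical, and the "clustering" part of the mix is not modeled because the slides do not describe it.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends one after another, wrapping around at the end."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(backends)

    def next_backend(self):
        return next(self._backends)


# Placeholder backend names, not real hosts.
balancer = RoundRobinBalancer(["nginx-perl-1", "nginx-perl-2", "nginx-perl-3"])
picks = [balancer.next_backend() for _ in range(5)]
```

In nginx itself, round robin is the default method for an upstream group, so no custom code is needed for this part.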
siege -c100 -t5s -i -b -q 'http://www.tokopedia.com/ebenhaezer'
siege: invalid option -- 'q'
siege: invalid option -- 'q'
** SIEGE 2.72
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege... done.

Transactions: 14788 hits
Availability: 100.00 %
Elapsed time: 4.59 secs
Data transferred: 63.50 MB
Response time: 0.03 secs
Transaction rate: 3221.79 trans/sec
Throughput: 13.83 MB/sec
Concurrency: 87.52
Successful transactions: 7481
Failed transactions: 0
Longest transaction: 0.43
Shortest transaction: 0.00
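The rate figures in this siege report follow from the raw counts: transaction rate is hits divided by elapsed time, and throughput is data transferred divided by elapsed time. A quick check using only the numbers reported above:

```python
# Figures copied from the siege report above.
transactions = 14788   # hits
elapsed_s = 4.59       # seconds
data_mb = 63.50        # megabytes transferred

transaction_rate = transactions / elapsed_s   # trans/sec
throughput = data_mb / elapsed_s              # MB/sec

print(f"{transaction_rate:.2f} trans/sec, {throughput:.2f} MB/sec")
```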
Apps Topology
PostgreSQL Master
sql insert sql update sql delete
PostgreSQL Slave
sql query
WAL streaming Replication
Internet
http req http resp
NginX Load Balancer
nginx-perl #1 nginx-perl #2 nginx-perl #3 nginx-perl #n
proxy_pass
SOLR
Import
SOLR query
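To make the "SOLR query" arrow concrete, here is a minimal sketch that builds a search URL with a sort order, so the sorting work lands on Solr instead of PostgreSQL. The base URL, the core name `products`, the field name `create_time`, and the `build_solr_query` helper are all assumptions for illustration; the slides do not show the real schema or endpoint, and older Solr setups may use a different path.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, keyword, sort_field, rows=20):
    """Build a URL for Solr's select handler, sorted descending."""
    params = {
        "q": keyword,                  # search keyword
        "sort": f"{sort_field} desc",  # let Solr do the sorting
        "rows": rows,                  # page size
        "wt": "json",                  # response format
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical values, for illustration only.
url = build_solr_query("http://localhost:8983", "products", "batik", "create_time")
```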
* Hardware limitations
* We used SATA HDDs, not SSDs
* Disk utilization at 100%
* No backup, no failover
* Capacity is critical
* Users keep uploading pictures
WHY??
SOLUTION
Internet
nginx-perl #1
PostgreSQL Master
http req http resp
nginx-perl #2 nginx-perl #3 nginx-perl #n
NginX Load Balancer
proxy_pass
PostgreSQL Slave
replication
MongoDB primary
MongoDB secondary
replication
SOLR
Redis
query & update
3rd Party APIs such as Logistics, Banks,
Payment Gateway, etc.
Internet
We started to learn about NginX, NoSQL,
in-memory storage, GlusterFS storage,
scaling out (not scaling up), and many more...
Lessons Learned??
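One common way to use an in-memory store in front of a database is the cache-aside pattern: look in the cache first, and only hit the database on a miss. The slides do not say how Redis was actually used, so the sketch below is an assumption. It uses a small in-process `TTLCache` class as a stand-in for Redis, and `get_product` and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for an in-memory store with expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def get_product(product_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    value = load_from_db(product_id)
    cache.set(product_id, value)
    return value
```

With this shape, repeated reads of the same product within the TTL never reach the database, which is the kind of load the earlier slides describe as the problem.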
* One of our GlusterFS servers was broken. Image read/write was super slow.
* We were using a version of PostgreSQL that had some indexing bugs.
WHY??
* Mobile First Company
* Zero Downtime
* Fully in the cloud
* Re-architect to SOA
* Open API to the public
* Deploy new tech, such as replacing Perl with Go
* Advanced alerting & monitoring
* Redundancy and failover
* Multiple 3rd parties
* Data warehouse, such as Cubes, Pentaho, etc.
* Machine learning, business intelligence
* Build things that can be shared with others
* Really pay attention to security
* and many more...
* Users cannot access Tokopedia
* Pictures are not showing
* css and js are not loaded
* Sometimes it just shows a blank page
* Some ISPs do ad injection
* ALL WITHOUT REASON
FACTS
Works well
* Using the NginX Geo Module
* All HTTPS since Q4 2014
* Try CDN load balancing
Doesn't work at all
* Talked to ISPs
* "Fight" in idEA
What we’ve done
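The NginX geo module sets a variable from the client IP address by matching it against configured address ranges, with the most specific range winning. The sketch below imitates that lookup to show the idea. The CIDR ranges are documentation addresses, the labels such as `isp_a` are made up, and `build_geo_map` is a hypothetical helper; the slides do not say which ranges or rules were actually configured.

```python
import ipaddress


def build_geo_map(rules, default):
    """Return a lookup function mapping a client IP to a label.

    rules maps CIDR strings to labels; the longest prefix wins.
    """
    networks = [(ipaddress.ip_network(cidr), label) for cidr, label in rules.items()]
    # Check the most specific (longest prefix) networks first.
    networks.sort(key=lambda item: item[0].prefixlen, reverse=True)

    def lookup(client_ip):
        address = ipaddress.ip_address(client_ip)
        for network, label in networks:
            if address in network:
                return label
        return default

    return lookup


# Made-up ranges and labels, for illustration only.
lookup = build_geo_map(
    {"203.0.113.0/24": "isp_a", "203.0.113.128/25": "isp_a_sub"},
    default="default",
)
```

A label chosen this way could then decide, for example, which upstream or static host a given ISP's users are sent to.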