Kanthaka - High Volume CDR Analyzer
![Page 1: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/1.jpg)
Big Data CDR Analyzer
080201N – M.K.P.R. Jayawardhana
080254D – P.K.A.M. Kumara
080331L – W.D.A.I. Paranawithana
080357V – T.D.K. Perera
Project Supervisors: Mr. Thilina Anjitha (hSenid), Dr. Shahani Markus Weerawarana
![Page 2: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/2.jpg)
Overview
• Background
• Current Situation
• Scope and Assumptions
• Kanthaka – Big Data CDR Analyzer System
• Technology Comparison
▫ Map Reduce
▫ NoSQL Databases
• Architecture
• Project Plan
• Risks and Possible Remedies
• References
![Page 3: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/3.jpg)
Background
Mobile Promotions
![Page 4: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/4.jpg)
Current Situation
• Promotions are based only on subscribers' network usage
• Only the active call switch is used to trigger promotions
• No way of analyzing and processing high-volume CDRs
• No efficient method of CDR analysis
• No access to historical data
• Complex rules are not supported
![Page 5: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/5.jpg)
Kanthaka to the rescue
• Selecting eligible users for both commercial-organization-based and network-usage-based promotions.
E.g., giving a 20% discount to pizza lovers in the 16-40 age group who have called Pizza Hut more than 5 times a month
• High-volume CDR analysis.
• Near-real-time selection of eligible users for promotions.
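The pizza example above can be read as a rule with three conditions: a destination number, a minimum call count per month, and an age range. The following is a minimal sketch of such a rule check, not the Kanthaka implementation. The `Cdr` fields, the `eligible_subscribers` function, the `ages` lookup, and the phone number are all hypothetical names invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cdr:
    """One call detail record; these field names are hypothetical."""
    caller: str       # subscriber who made the call
    destination: str  # number that was called
    month: str        # month of the call, e.g. "2012-06"


def eligible_subscribers(cdrs, ages, destination, month,
                         min_calls=6, min_age=16, max_age=40):
    """Return subscribers in the age range who called `destination`
    at least `min_calls` times during `month`."""
    calls = Counter(
        r.caller for r in cdrs
        if r.destination == destination and r.month == month
    )
    return sorted(
        s for s, n in calls.items()
        if n >= min_calls and min_age <= ages.get(s, -1) <= max_age
    )


PIZZA_HUT = "0112729729"  # made-up number for illustration
cdrs = (
    [Cdr("alice", PIZZA_HUT, "2012-06")] * 6    # 6 calls, age 25
    + [Cdr("bob", PIZZA_HUT, "2012-06")] * 5    # only 5 calls
    + [Cdr("carol", PIZZA_HUT, "2012-06")] * 7  # 7 calls, but age 52
)
ages = {"alice": 25, "bob": 30, "carol": 52}
winners = eligible_subscribers(cdrs, ages, PIZZA_HUT, "2012-06")
print(winners)
```

"More than 5 times" is expressed as `min_calls=6`, so a subscriber with exactly 5 calls is excluded.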
![Page 6: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/6.jpg)
• A CDR analyzer system which
▫ can process 30 million records per day
▫ can produce results within 10-15 seconds
▫ provides a GUI to define dynamic rules
▫ can be used to offer real-time sales promotions for mobile subscribers
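To get a feel for these targets, the daily volume can be converted into an average arrival rate. The short calculation below assumes the 30 million records are spread evenly over 24 hours, which is a simplification not made in the slides; real traffic has peaks, so the sustained rate needed would be higher.

```python
RECORDS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average arrival rate under the even-spread assumption.
avg_rate = RECORDS_PER_DAY / SECONDS_PER_DAY
print(f"average ingest rate: {avg_rate:.0f} records/s")

# Records arriving during one 10-15 second result window, at the average rate.
for window_s in (10, 15):
    print(f"{window_s} s window: about {avg_rate * window_s:.0f} records")
```

Under that assumption the system has to absorb roughly 347 records per second on average while still answering rule queries within the 10-15 second window.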
![Page 7: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/7.jpg)
Scope and Assumptions
Scope
• Real system operation: 30 M records, multiple rules, offers the promotion
• Operation expected of Kanthaka: 30 M records, single rule, selects subscribers eligible for the promotion only
![Page 8: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/8.jpg)
Assumptions
• CDRs arrive only in .CSV format.
• Events can be of different types, such as SMS, voice call, MMS, USSD, top-up, GPRS, and LBS.
• CDRs can be received by the system asynchronously, in batches.
• Only 6 of the many CDR attributes will be considered during processing.
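Taken together, these assumptions describe a simple ingestion step: read a CSV batch, keep a fixed subset of columns, and accept only known event types. Below is a minimal sketch using Python's standard `csv` module. The slides do not name the 6 attributes, so the column names in `KEPT`, the event-type spellings, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical column names: the slides do not list the 6 attributes.
KEPT = ("caller", "callee", "event_type", "timestamp", "duration", "cell_id")
EVENT_TYPES = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}


def parse_batch(csv_text):
    """Yield one dict per valid CDR row, keeping only the KEPT columns."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("event_type") not in EVENT_TYPES:
            continue  # skip rows with an unknown event type
        yield {name: row[name] for name in KEPT}


# A made-up batch with two extra columns (imei, charge) that get dropped.
batch = (
    "caller,callee,event_type,timestamp,duration,cell_id,imei,charge\n"
    "0771000001,0112000002,VOICE,2012-06-01T10:00:00,62,C17,3567,4.50\n"
    "0771000003,0771000001,SMS,2012-06-01T10:00:05,0,C09,3568,0.20\n"
    "0771000004,0771000001,FAX,2012-06-01T10:00:09,0,C09,3569,0.00\n"
)
records = list(parse_batch(batch))
print(records)
```

Because `parse_batch` is a generator, a large batch can be streamed row by row into the data store instead of being held in memory at once.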
![Page 9: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/9.jpg)
Technology Comparison
![Page 10: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/10.jpg)
Lots of data + higher speed
--> Scale-out system
![Page 11: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/11.jpg)
Map Reduce
Hadoop map-reduce
• Can handle a lot of data
• Latency is too high for cases where results are expected in near real time
Counting the words in a 100 KB file: start time = 01.04.44, end time = 01.05.12, total time = 28 sec
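For readers unfamiliar with the model, the word-count job used in the timing above can be sketched as three phases: map, shuffle, and reduce. The version below runs in a single Python process and only illustrates the programming model; it is not Hadoop code, and the function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big CDR data", "data"])
print(counts)
```

In a real cluster each phase is distributed across machines, which adds coordination work on top of the counting itself; that overhead is presumably a large part of the 28 seconds measured for a file of only 100 KB.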
![Page 12: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/12.jpg)
DB Technology Comparison
• RDBMS
▫ Provides ACID properties
▫ Uses sharding to scale out
▫ Management overhead is huge when scaling out
▫ Performance degrades under higher data load
▫ Less partition tolerant
![Page 13: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/13.jpg)
DB Technology Comparison Ctd.
• NoSQL
▫ Many available options (Cassandra, HBase, MongoDB, Hive)
▫ Promises easy scale-out (many big users, e.g. Facebook, Twitter)
▫ Provides BASE properties under the CAP theorem
▫ Hard to fit the system into the limited data model
▫ Partition tolerant
▫ More memory --> higher performance
![Page 14: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/14.jpg)
DB Technology Comparison Ctd.
• NewSQL
▫ Provide ACID properties
▫ Familiar relational data model
▫ Options available (ScaleDB, VoltDB)
▫ Runs entirely in memory, hence needs a lot of memory
▫ Promises speed
▫ Persistence achieved by replaying logs
![Page 15: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/15.jpg)
Given its persistence, less restrictive hardware requirements, and proven performance, NoSQL is the best option to try out.
• Cassandra – a key-value column-family store (used at Facebook, Twitter, eBay)
• HBase – a key-value column-family store (used at Facebook)
• MongoDB – a document store (used at Adobe)
• Hive – an HDFS-based database
![Page 16: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/16.jpg)
YCSB Benchmarks
• With more big users, active mailing lists, and the most promising features (secondary indexes, counters), Cassandra is the best option to try out.
![Page 17: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/17.jpg)
Technology selection
Technologies left behind
• Complex Event Processing (CEP) engines
▫ No persistence
• Rules engine
▫ More layers, more latency
• Hadoop
• NoSQL DB: HBase, MongoDB, Hive
Technologies selected
• NoSQL DB: Cassandra
![Page 18: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/18.jpg)
Architecture
![Page 19: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/19.jpg)
Project Plan
| Milestone | Target date | Status |
| --- | --- | --- |
| First chapters of final report | - | Done |
| ERU abstracts | - | Accepted |
| ERU paper | 31/07/2012 | Due |
| Architecture | 06/06/2012 | Done |
| Setting up the Cassandra cluster | 06/06/2012 | Done |
| GUI for rule definition | 15/06/2012 | Ongoing |
| Bulk data load to Cassandra | 15/06/2012 | Ongoing |
| System Requirement Specification | 20/06/2012 | Due |
| Query data from database periodically | 26/06/2012 | Due |
| Initial Design Document | 27/06/2012 | Due |
| Algorithm for pre-processing | 10/07/2012 | Due |
| Testing | 10/07/2012 | Due |
| Final report | 10/08/2012 | Due |
![Page 20: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/20.jpg)
Risks and Possible Remedies
• NoSQL databases
▫ Risk: high performance requires more memory
▫ Remedy: use an external cluster with decent memory
• In the long run
▫ Risk: performance degrades as data accumulates
▫ Remedy: archiving
![Page 21: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/21.jpg)
• Handling concurrency issues
▫ Risk: locking the database lowers speed
▫ Remedy: use a shadow copy
• NoSQL fails to meet the requirements
▫ Option: NewSQL – VoltDB (runs entirely in memory)
▫ Option: CEP (needs extra measures to provide persistence)
• Handling sudden peaks
▫ Should have an auto-balancing mechanism ready
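The shadow copy remedy listed above can be illustrated with a small in-memory store: readers always see a complete snapshot, while a writer applies a batch to a private copy and then publishes it in one step. This is a sketch of the general technique under stated assumptions, not the project's design. The `ShadowCopyStore` class and its methods are hypothetical, and the lock-free read relies on CPython making a single attribute assignment atomic.

```python
import threading


class ShadowCopyStore:
    """Readers see an immutable snapshot; writers build a copy and swap it in."""

    def __init__(self, initial=None):
        self._snapshot = dict(initial or {})
        self._write_lock = threading.Lock()  # serializes writers only

    def read(self, key, default=None):
        # No lock: readers use whichever snapshot is current.
        return self._snapshot.get(key, default)

    def apply_batch(self, updates):
        """Add the per-key deltas in `updates` to a shadow copy, then publish it."""
        with self._write_lock:
            shadow = dict(self._snapshot)      # copy; readers keep the old one
            for key, delta in updates.items():
                shadow[key] = shadow.get(key, 0) + delta
            self._snapshot = shadow            # single reference assignment


store = ShadowCopyStore({"alice": 5})
before = store.read("alice")
store.apply_batch({"alice": 1, "bob": 2})
after = store.read("alice")
print(before, after, store.read("bob"))
```

The trade-off is memory: every batch copies the whole snapshot, so this approach suits small, hot data such as per-subscriber counters rather than the full CDR history.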
![Page 22: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/22.jpg)
Final Deliverables
• Big Data CDR Analyzer system
• Research Paper
• Final Report
![Page 23: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/23.jpg)
References
• http://www.slideshare.net/gvdinesh/cap-and-base-8169489
• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” 2010, pp. 143–154.
Visit us at Kanthaka
![Page 24: Kanthaka - High Volume CDR Analyzer](https://reader034.fdocuments.net/reader034/viewer/2022051818/54b6cf774a795967678b4588/html5/thumbnails/24.jpg)
Thank You!