IndicThreads-Pune12-NoSQL Now and Path Ahead
7/30/2019 IndicThreads-Pune12-NoSQL Now and Path Ahead
NoSQL: Now and Path Ahead
Shubham Kumar Srivastava, MakeMyTrip
Who am I
Abstract
What and Why : NoSql
Fundamentals
Use Case
Challenges
Path Ahead
What is NoSql
A database that does not adhere to the traditional relational database management system (RDBMS) structure.
Why NoSql
Scalability and Performance
Cost
Data Modeling
Why NoSql : Motives and Drivers
Scalability and Performance
Horizontal scalability is better than vertical scalability
Hardware is getting cheaper and processing power is increasing
Less operational complexity compared with RDBMS solutions
Most solutions provide automatic sharding and similar features by default
Why NoSql : Motives and Drivers contd..
Cost
Scale (as with NoSQL) comes at a hefty cost:
Commodity hardware, software versions, upgrades, maintenance.
This led organizations to look for alternatives and created the need for a cost-effective scale-out option.
Why NoSql : Motives and Drivers contd..
Data Modeling
SQL has been designed for:
Concurrency, consistency, integrity
Summations, aggregations, groupings
Schema says: what can I answer?
Why NoSql : Motives and Drivers contd..
Data Modeling
A plain key-value store is very powerful and fits most use cases for a NoSQL solution
Hierarchical or graph-like data modelling and processing.
Values like maps of maps of maps.
Document Databases which even store arbitrary complex objects.
Document based indexing data stores are a huge success.
Why NoSql : Motives and Drivers contd..
At times software applications are not limited to these constraints. This led to data models like:
Key/Value Store:
Redis, MemcacheDb, Voldemort, etc.
Wide Column Store / Column Families:
Cassandra, Hadoop (HBase), Hypertable, Cloudera, etc.
Document Based Stores:
Solr, Lucene, MongoDb, CouchDb, TerraStore, etc.
Graph Data Store:
Neo4J, GraphBase, FlockDb, etc.
Why NoSql : Motives and Drivers contd..
Schema says: what are the questions?
Data modeling is based on the set of queries
Exploit denormalization and duplication
Use aggregates
Manage joins in the application, with aggregation, denormalization, etc.
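A minimal Java sketch of query-driven modeling, assuming a hypothetical booking example: each question the application asks gets its own pre-built structure, the hotel name is duplicated into it, and the per-city count is maintained as an aggregate at write time. The class and method names are illustrative and not from the talk; plain in-memory maps stand in for a NoSQL store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: in-memory maps stand in for column families or buckets.
final class QueryDrivenModel {
    // Query 1: "which hotels were booked in city X?" (hotel name duplicated per booking)
    private final Map<String, List<String>> bookedHotelsByCity = new HashMap<>();
    // Query 2: "how many bookings did city X get?" (aggregate maintained on write)
    private final Map<String, Integer> bookingCountByCity = new HashMap<>();

    // One logical write updates every structure that answers a query.
    void recordBooking(String city, String hotelName) {
        bookedHotelsByCity.computeIfAbsent(city, k -> new ArrayList<>()).add(hotelName);
        bookingCountByCity.merge(city, 1, Integer::sum);
    }

    // Each read is a single key lookup; no join is needed.
    List<String> bookedHotels(String city) {
        return bookedHotelsByCity.getOrDefault(city, Collections.emptyList());
    }

    int bookingCount(String city) {
        return bookingCountByCity.getOrDefault(city, 0);
    }

    public static void main(String[] args) {
        QueryDrivenModel model = new QueryDrivenModel();
        model.recordBooking("PNQ", "Hotel A");
        model.recordBooking("PNQ", "Hotel B");
        model.recordBooking("GOI", "Hotel C");
        System.out.println(model.bookedHotels("PNQ"));
        System.out.println(model.bookingCount("PNQ"));
    }
}
```

The trade-off is that writes do more work and data is stored more than once, in exchange for reads that never need a join.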
Some Fundamentals
CAP Theorem
At most two of the following three properties can be satisfied in a shared/distributed system:
Consistency
Availability
Tolerance to Network Partitions
CAP : Pictorially
Explanation
Use case:
Scaling Web Apps
Critical facts : Network outages are common
Customer shopping carts, email search and social network queries can tolerate stale data
How: compromise on consistency in order to remain available, rather than disrupt user service during outages.
Explanation
Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state.
Brewer's CAP theorem says you have no choice if you want to scale up.
Explanation contd..
Sharp Contrast : High Speed Financial Application
Highly Transactional
Consistent
Automated
Can't live with eventual consistency
ACID vs BASE
ACID
Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
Consistent: A transaction cannot leave the database in an inconsistent state.
Isolated: Transactions cannot interfere with each other.
Durable: Completed transactions persist, even when servers restart.
Some Fundamentals contd..
BASE
Basically Available
Soft state
Eventual consistency
Consistent Hashing
A common way to load balance.
The machine chosen to cache object o will be:
hash(o) mod n
n: total number of machines
Consistent Hashing contd..
Adding a machine to the cache means
hash(o) mod (n + 1)
Removing a machine from the cache means
hash(o) mod (n - 1)
Result in either case: disaster
Almost every object maps to a different machine, so machines are swamped by the redistribution
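The effect of changing n under plain modulo placement can be measured with a small Java sketch. It assumes CRC32 as the hash and made-up object names ("object-0", "object-1", ...); neither comes from the talk, and any well-mixed hash should behave similarly.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: counts how many objects change machine when n changes.
final class ModuloPlacement {
    // Assumption: CRC32 as the hash function; it yields a non-negative 32-bit value.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // hash(o) mod n
    static int machineFor(String key, int n) {
        return (int) (hash(key) % n);
    }

    // Number of objects whose machine differs between the two cluster sizes.
    static int countMoved(int keys, int machinesBefore, int machinesAfter) {
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "object-" + i;
            if (machineFor(key, machinesBefore) != machineFor(key, machinesAfter)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int keys = 10_000;
        System.out.printf("4 -> 5 machines: %d of %d objects moved%n",
                countMoved(keys, 4, 5), keys);
        System.out.printf("4 -> 3 machines: %d of %d objects moved%n",
                countMoved(keys, 4, 3), keys);
    }
}
```

For a well-mixed hash, going from n to n + 1 machines leaves only about 1 in n + 1 objects in place, so a change from 4 to 5 machines moves roughly 80% of them.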
Consistent Hashing contd..
Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key in the range 0 to 2^128 - 1 (or even a 32-bit key, as shown next).
Consistent Hashing contd..
Both the key and the machine are hashed with the same function
Consistent Hashing contd..
Adding a Node
Consistent Hashing contd..
Removing a Node
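The slides on adding and removing a node are diagrams, so here is a hedged Java sketch of the usual ring construction. It assumes the common convention that a key belongs to the first machine at or after its position on the ring, wrapping around at the end. Positions are the first 32 bits of an MD5 digest; the class name, machine names and single point per machine are illustrative choices, not details from the talk.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative consistent-hash ring with one point per machine (no virtual nodes).
final class HashRing {
    // Ring position -> machine name, sorted so the clockwise successor is one lookup.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Same hash for keys and machines: first 4 bytes of MD5 as an unsigned 32-bit value.
    static long position(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long pos = 0;
            for (int i = 0; i < 4; i++) {
                pos = (pos << 8) | (digest[i] & 0xFF);
            }
            return pos;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    void addNode(String machine) {
        ring.put(position(machine), machine);
    }

    void removeNode(String machine) {
        ring.remove(position(machine));
    }

    // First machine at or after the key's position, wrapping to the start of the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no machines");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(position(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("machine-A");
        ring.addNode("machine-B");
        ring.addNode("machine-C");
        int keys = 10_000;
        String[] before = new String[keys];
        for (int i = 0; i < keys; i++) {
            before[i] = ring.nodeFor("object-" + i);
        }
        ring.addNode("machine-D");
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            if (!ring.nodeFor("object-" + i).equals(before[i])) {
                moved++;
            }
        }
        System.out.printf("Adding machine-D moved %d of %d objects%n", moved, keys);
    }
}
```

Only keys that fall between the new machine and its predecessor on the ring change owner, and removing that machine hands exactly those keys back. Production systems usually also place several virtual points per machine to even out the load, which this sketch omits.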
Use Case and NoSQL Solution
Problem:
Need to store bookings per day for all hotels. Queries are centered around cities and regions.
Hotel count: 1 million
Date range: now to the next 365 * 2 days
NoSQL: Path Ahead
ACID equivalence (Neo4J, CouchDb, etc.)
Transaction Support
Atomicity
MVCC
NoSQL: Path Ahead contd..
Possible Solution
Use a SQL database for creation, updates, etc.
Archive the data in NoSQL for querying, analysis, etc.
NoSQL: Path Ahead contd..
Enterprise Adoption and Challenges
NoSQL looks good largely for unstructured data
SQL is the best choice for a broad range of traditional workloads.
NoSQL: Path Ahead contd..
Shout out loud
Hybrid
ACID + BASE
They are not alternatives but supplements
NoSQL: Path Ahead contd..
Maturity
Support
Skillset and Administration/Operation
Analytics and BI support
Q & A
References
Nancy Lynch and Seth Gilbert, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", ACM SIGACT News, Volume 33 Issue 2 (2002), pp. 51-59.
"Brewer's CAP Theorem", julianbrowne.com, retrieved 02-Mar-2010.
"Brewer's CAP theorem on distributed systems", royans.net.
"CAP Twelve Years Later: How the 'Rules' Have Changed"; on-line resource.
E. Brewer, "Towards Robust Distributed Systems", Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource.
D. Abadi, "Problems with CAP, and Yahoo's Little Known NoSQL System", DBMS Musings, blog, 23 Apr. 2010; on-line resource.
C. Hale, "You Can't Sacrifice Partition Tolerance", 7 Oct. 2010; on-line resource.
Facebook: Scaling Out; on-line resource.
Gemstone: The Hardest Problems in Data Management; on-line resource.
The Log-Structured Merge-Tree (research paper).
CodeProject: Consistent Hashing; on-line resource.
HighlyScalable: NoSQL Data Modeling Techniques; on-line resource.
eBay Tech Blog: Cassandra Data Modeling Best Practices; on-line resource.
John D Cook: ACID vs BASE; on-line resource.
Merkle Trees.
Phi Accrual Failure Detection (research paper).
Backup Slides
Better than the Original 1
Document Based DataStore
{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2011-09-19T02:10:11.3Z"),
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post. It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
      comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
      comment : "You must be joking. etc etc ..." }
  ]
}
User and Items : Option 1
User and Items : Option 2
User and Items : Option 3
User and Items : Option 4
Cassandra CF
Cassandra SuperCF
Use Case 1
Ecommerce Site
Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic and multiple other dynamic values
Solution: In a CF-based structure, keep it simple
UserId_Key : Pref1_Name:Value1, Pref2_Name:Value2, ..., PrefN_Name:ValueN
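The row layout above can be pictured as a map of sorted maps. The following Java sketch is only an in-memory stand-in for a column family, assuming one row per user id and one column per preference name; the class, method and sample values are hypothetical and it does not talk to Cassandra.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative stand-in for a column family: row key -> (column name -> value).
final class PreferenceRows {
    // Columns within a row are kept sorted by name.
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // Adding a new preference needs no schema change: it is just one more column.
    void save(String userId, String prefName, String value) {
        rows.computeIfAbsent(userId, k -> new TreeMap<>()).put(prefName, value);
    }

    // Fetch only the requested columns of one row, in the order they were asked for.
    Map<String, String> get(String userId, String... prefNames) {
        Map<String, String> result = new LinkedHashMap<>();
        SortedMap<String, String> row = rows.get(userId);
        if (row == null) {
            return result;
        }
        for (String name : prefNames) {
            String value = row.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PreferenceRows prefs = new PreferenceRows();
        prefs.save("user-1", "currency", "INR");
        prefs.save("user-1", "location", "Pune");
        System.out.println(prefs.get("user-1", "location", "currency"));
    }
}
```

Because each user is one row, every read and write touches a single key, which is what lets a column-family store spread users across machines.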
Use Case 1
RowKey: 1350136093705_6501082438199894
=> (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
=> (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
=> (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
=> (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
=> (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)
-------------------
RowKey: 1354499614310_10861558002828044
=> (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
-------------------
RowKey: 1349760150553_6114662943774777
=> (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
-------------------
RowKey: 1349805109805_6167423558533191
=> (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
-------------------
RowKey: 1354435656227_7908056941568359
=> (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
-------------------
RowKey: 1347648097261_15570089270962881
=> (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
Use Case 1
Get
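// Reads the named preference columns from one user's row (Hector client API).
private Map<String, String> getPreferences(Keyspace keySpace, String userId, String...
        preferenceNames) throws IOException, CharacterCodingException {
    // Row key, column name and column value are all serialized as strings.
    SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    rsq.setColumnFamily(USER_PREFERENCE);
    rsq.setKey(userId);
    rsq.setColumnNames(preferenceNames);
    QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
    // Preserve the order in which the columns are returned.
    Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
    for (HColumn<String, String> column : orows.get().getColumns()) {
        preferenceMap.put(column.getName(), column.getValue());
    }
    return preferenceMap;
}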
Use Case 1
Save
// Writes one preference column with a TTL so that it expires automatically.
Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());
HColumn<String, String> userPrefrences = HFactory.createColumn(colkey, colvalue,
        StringSerializer.get(), StringSerializer.get());
userPrefrences.setTtl(ttlUserPrefrences);
m.addInsertion(rowkey, USER_PREFERENCE, userPrefrences);
m.execute();
Use Case 2
Online Travel Site
Problem: Need to know different metrics for a city's hotels, e.g.:
Hotels booked in last X Time
Hotels Last viewed in Y Time
Hotels Left with Z Inventory
Use Case 2
RowKey: 2d323436353731
=> (super_column=911167901297486,
(column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000)
(column=6c6173747669657765646d657373616762, value=Inventory#20, timestamp=1354962852610000)
(column=6c6173747669657765646d657373616769, value=Bookings#8, timestamp=135496282610000))
-------------------
RowKey: 58524f
=> (super_column=200903041759196196,
(column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago.,timestamp=1347781187842000)
(column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago.,timestamp=1347707080147000))
=> (super_column=200903041848352230,
(column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago.,timestamp=1347266107708000))
Use Case 2
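// Reads all super columns (one per hotel) of the row for one city (Hector client API).
SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
        StringSerializer.get(), StringSerializer.get(),
        StringSerializer.get(), StringSerializer.get());
superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);
QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();
if (superColumns != null) {
    for (HSuperColumn<String, String, String> superColumn : superColumns) {
        // Collect the sub-columns of this super column as name -> value pairs.
        Map<String, String> messages = new HashMap<String, String>();
        List<HColumn<String, String>> columns = superColumn.getColumns();
        if (columns != null) {
            for (HColumn<String, String> column : columns) {
                messages.put(column.getName(), column.getValue());
            }
        }
        /* The equivalent doc; document and documents are declared outside this excerpt */
        document.addField(superColumn.getName(), messages);
        documents.add(document);
    }
}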
Pig Script : MR
Delete All Messages
Last Viewed start 15 minutes to 30 days ago
GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(
TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.')));
};]]>
Last Booked 1 to 8 days ago
GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(
TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.')));
};]]>
Criteria to Evaluate NoSQL Solutions
Internal partitioning
Automated flexible data distribution
Hot swappable nodes
Replication-style
Automated failover strategy