IndicThreads-Pune12-NoSQL Now and Path Ahead

Transcript of IndicThreads-Pune12-NoSQL Now and Path Ahead

  • 7/30/2019 IndicThreads-Pune12-NoSQL Now and Path Ahead

    NoSQL: Now and Path Ahead

    Shubham Kumar Srivastava, MakeMyTrip

    Who am I

    Abstract

    What and Why : NoSql

    Fundamentals

    Use Case

    Challenges

    Path Ahead

    What is NoSql

    A database that does not adhere to the traditional relational database management system (RDBMS) structure.

    Why NoSql

    Scalability and Performance

    Cost

    Data Modeling

    Why NoSql : Motives and Drivers

    Scalability and Performance

    Horizontal scalability better than Vertical

    Hardware getting cheaper and processing power increasing

    Less operational complexity compared with RDBMS solutions.

    Most solutions provide automatic sharding and similar features by default.

    Why NoSql : Motives and Drivers contd..

    Cost

    Scaling an RDBMS (as NoSQL does) comes at a hefty cost

    Commodity hardware, software versions, upgrades, maintenance.

    This led organizations to look for alternatives and created the need for a cost-effective scale-out option.

    Why NoSql : Motives and Drivers contd..

    Data Modeling

    SQL has been built for:

    Concurrency, consistency, integrity

    Summations, aggregations, groupings

    Schema says: What all can I answer?

    Why NoSql : Motives and Drivers contd..

    Data Modeling

    A plain key-value store is very powerful and fits most use cases for a NoSQL solution.

    Hierarchical or graph-like data modelling and processing.

    Values like maps of maps of maps.

    Document Databases which even store arbitrary complex objects.

    Document based indexing data stores are a huge success.

    Why NoSql : Motives and Drivers contd..

    At times software applications are not limited to these constraints. This led to data models such as:

    Key/Value Store:

    Redis, MemcacheDB, Voldemort, etc.

    Wide Column Store / Column Families:

    Cassandra, Hadoop (HBase), Hypertable, Cloudera, etc.

    Document Based Stores:

    Solr/Lucene, MongoDB, CouchDB, Terrastore, etc.

    Graph Data Store:

    Neo4j, GraphBase, FlockDB, etc.

    Schema Says: What are the questions

    Data modeling is based on the set of Queries

    Exploit denormalization and duplication

    Use aggregates

    Manage joins in the application, with aggregation, denormalization, etc.

    Some Fundamentals

    CAP Theorem

    At most two of the following three properties can be satisfied at the same time in a shared/distributed system:

    Consistency

    Availability

    Tolerance to Network Partitions

    CAP : Pictorially

    Explanation

    Use case:

    Scaling Web Apps

    Critical fact: Network outages are common

    Customer shopping carts, email search and social network queries can tolerate stale data

    How: Compromise on consistency in order to remain available, rather than disrupt user service during outages.

    Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state.

    Brewer's CAP theorem says you have no choice if you want to scale up.

    Explanation contd..

    Sharp Contrast : High Speed Financial Application

    Highly Transactional

    Consistent

    Automated

    Can't live with eventual consistency

    ACID vs BASE

    ACID

    Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.

    Consistent: A transaction cannot leave the database in an inconsistent state.

    Isolated: Transactions cannot interfere with each other.

    Durable: Completed transactions persist, even when servers restart etc.

    Some Fundamentals contd..

    BASE

    Basically Available

    Soft-state

    Eventual consistency

    Consistent Hashing

    A common way to load balance:

    The machine chosen to cache object o will be:

    hash(o) mod n

    n: total number of machines

    Consistent Hashing contd..

    Adding a machine to the cache means

    hash(o) mod (n + 1)

    Removing a machine from the cache means

    hash(o) mod (n - 1)

    Result of either change: disaster

    Machines are swamped by redistribution, because most objects now map to a different machine
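
The redistribution problem can be made concrete with a small experiment. The following is a minimal sketch, not taken from the slides: the class name `ModuloPlacement`, the key names `object-0 ...` and the choice to fold the first four bytes of an MD5 digest into a 32-bit hash are all illustrative assumptions. It counts how many keys land on a different machine when the machine count changes under `hash(o) mod n`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ModuloPlacement {
    // Illustrative hash: first 4 bytes of MD5 folded into an unsigned 32-bit value.
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16)
                    | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Naive placement from the slide: hash(o) mod n.
    static int machineFor(String key, int n) {
        return (int) (hash(key) % n);
    }

    // Counts keys "object-0" .. "object-(count-1)" whose machine changes
    // when the cluster size goes from oldN to newN.
    static int movedKeys(int count, int oldN, int newN) {
        int moved = 0;
        for (int i = 0; i < count; i++) {
            String key = "object-" + i;
            if (machineFor(key, oldN) != machineFor(key, newN)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int total = 10_000;
        System.out.println("4 -> 5 machines: " + movedKeys(total, 4, 5) + " of " + total + " keys moved");
        System.out.println("4 -> 3 machines: " + movedKeys(total, 4, 3) + " of " + total + " keys moved");
    }
}
```

With a well-distributed hash, a key keeps its machine only when `hash mod 4` happens to equal `hash mod 5` (or `hash mod 3`), so the large majority of keys should be expected to move in both cases.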

    Consistent Hashing contd..

    Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key in the range 0 to 2^128 - 1 (or even a 32-bit key, as shown next).

    Consistent Hashing contd..

    Both key and machine are hashed with the same function

    Consistent Hashing contd..

    Adding a Node

    Consistent Hashing contd..

    Removing a Node
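
The ring described on these slides (keys and machines hashed with the same function, nodes added and removed) can be sketched as below. This is an illustrative sketch under stated assumptions, not the speaker's implementation: the class `HashRing`, the 32-bit hash folded from MD5, and the use of several virtual points per machine (`replicas`) are choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Ring position -> machine name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Virtual points per machine (an assumption, to even out the load).
    private final int replicas;

    HashRing(int replicas) {
        this.replicas = replicas;
    }

    // Same hash for keys and machines: first 4 bytes of MD5 as unsigned 32 bits.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16)
                    | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addMachine(String machine) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(machine + "#" + i), machine);
        }
    }

    void removeMachine(String machine) {
        for (int i = 0; i < replicas; i++) {
            // Two-argument remove only deletes points still owned by this machine.
            ring.remove(hash(machine + "#" + i), machine);
        }
    }

    // Walk clockwise from the key's position to the first machine point,
    // wrapping around to the start of the ring if needed.
    String machineFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no machines on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing r = new HashRing(100);
        r.addMachine("A");
        r.addMachine("B");
        r.addMachine("C");
        String before = r.machineFor("hotel-123");
        r.addMachine("D");
        String after = r.machineFor("hotel-123");
        System.out.println("hotel-123: " + before + " -> " + after);
    }
}
```

In this sketch, adding a machine only takes keys away from existing machines and hands them to the new one, and removing it gives them back, so the bulk of the keys stay where they were. That is the contrast with `hash(o) mod n`.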

    Use Case and NoSQL Solution

    Problem:

    Need to store bookings per day for all hotels. Queries are centered around cities and regions.

    Hotel count: 1 million

    Date range: Now to the next 365 * 2 days

    NoSQL: Path Ahead

    ACID equivalence (Neo4j, CouchDB, etc.)

    Transaction Support

    Atomicity

    MVCC

    Possible Solution

    Use the SQL database for creation, updates, etc.

    Archive the data in NoSQL for query/analysis etc.

    Enterprise Adoption and Challenges

    NoSQL largely looks good for unstructured data

    SQL is the best choice for a broad range of traditional workloads.

    Shout out loud

    Hybrid

    ACID + BASE

    They are not alternatives but supplements

    Maturity

    Support

    Skillset and Administration/Operation

    Analytics and BI support

    Q & A

    References

    Nancy Lynch and Seth Gilbert, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," ACM SIGACT News, Volume 33, Issue 2 (2002), pp. 51-59.

    "Brewer's CAP Theorem," julianbrowne.com, retrieved 02-Mar-2010.

    "Brewers CAP theorem on distributed systems," royans.net.

    "CAP Twelve Years Later: How the 'Rules' Have Changed"; online resource.

    E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; online resource.

    D. Abadi, "Problems with CAP, and Yahoo's Little Known NoSQL System," DBMS Musings, blog, 23 Apr. 2010; online resource.

    C. Hale, "You Can't Sacrifice Partition Tolerance," 7 Oct. 2010; online resource.

    Facebook: Scaling Out; online resource.

    Gemstone: The Hardest Problems in Data Management; online resource.

    The Log-Structured Merge-Tree (Research Paper)

    CodeProject: Consistent Hashing; online resource.

    HighlyScalable: NoSQL Data Modeling Techniques; online resource.

    eBay Tech Blog: Cassandra Data Modeling Best Practices; online resource.

    John D Cook: ACID vs BASE; online resource.

    Merkle Trees

    The Phi Accrual Failure Detector (Research Paper)

    Backup Slides

    Better than the Original 1

    Document Based DataStore

    {
      _id : ObjectId("4e77bb3b8a3e000000004f7a"),
      when : Date("2011-09-19T02:10:11.3Z"),
      author : "alex",
      title : "No Free Lunch",
      text : "This is the text of the post. It could be very long.",
      tags : [ "business", "ramblings" ],
      votes : 5,
      voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
      comments : [
        { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
          comment : "I agree." },
        { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
          comment : "You must be joking. etc etc ..." }
      ]
    }

    User and Items : Option 1

    User and Items : Option 2

    User and Items : Option 3

    User and Items : Option 4

    Cassandra CF

    Cassandra SuperCF

    Use Case 1

    Ecommerce Site

    Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic, and multiple other dynamic values

    Solution: In a CF-based structure, keep it simple

    UserId_Key: Pref1_Name:Value1, Pref2_Name:Value2, ..., PrefN_Name:ValueN
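
One way to picture this row layout is as a map from row key to a sorted map of columns. The sketch below is an in-memory illustration only, not Cassandra code: the class `PreferenceRows`, its methods and the sample values are hypothetical and merely mirror the "one row per user, one column per preference" idea.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class PreferenceRows {
    // Row key (user id) -> sorted columns (preference name -> value).
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Adds or overwrites one preference column in the user's row.
    void put(String userId, String name, String value) {
        rows.computeIfAbsent(userId, k -> new TreeMap<>()).put(name, value);
    }

    // Returns only the requested columns that exist in the row,
    // similar in spirit to slicing a row by column names.
    SortedMap<String, String> slice(String userId, String... names) {
        SortedMap<String, String> out = new TreeMap<>();
        SortedMap<String, String> row = rows.get(userId);
        if (row == null) {
            return out;
        }
        for (String n : names) {
            String v = row.get(n);
            if (v != null) {
                out.put(n, v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PreferenceRows cf = new PreferenceRows();
        cf.put("user-42", "currency", "INR");
        cf.put("user-42", "location", "Pune");
        cf.put("user-42", "trafficSource", "email");
        // prints {currency=INR, location=Pune}
        System.out.println(cf.slice("user-42", "currency", "location"));
    }
}
```

Because each row holds its own set of columns, new preference names can be added per user without any schema change, which is what makes this layout a fit for "multiple other dynamic values".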

    RowKey: 1350136093705_6501082438199894
    => (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
    => (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
    => (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
    => (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
    => (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)
    -------------------
    RowKey: 1354499614310_10861558002828044
    => (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
    -------------------
    RowKey: 1349760150553_6114662943774777
    => (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
    -------------------
    RowKey: 1349805109805_6167423558533191
    => (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
    -------------------
    RowKey: 1354435656227_7908056941568359
    => (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
    -------------------
    RowKey: 1347648097261_15570089270962881
    => (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)

    Get

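    import java.util.LinkedHashMap;
    import java.util.Map;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceQuery;

    // Reads the named preference columns from one user's row.
    private Map<String, String> getPreferences(Keyspace keySpace, String userId, String... preferenceNames) {
        // Row key, column name and column value are all strings.
        SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
                StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        rsq.setColumnFamily(USER_PREFERENCE);
        rsq.setKey(userId);
        rsq.setColumnNames(preferenceNames);
        QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
        // LinkedHashMap keeps the columns in the order Cassandra returned them.
        Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
        for (HColumn<String, String> column : orows.get().getColumns()) {
            preferenceMap.put(column.getName(), column.getValue());
        }
        return preferenceMap;
    }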

    Save

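    // Writes one preference column into the user's row, expiring after the TTL.
    Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());
    HColumn<String, String> userPreferences = HFactory.createColumn(colkey, colvalue,
            StringSerializer.get(), StringSerializer.get());
    // TTL is in seconds; the rows shown earlier carry ttl=1728000.
    userPreferences.setTtl(ttlUserPreferences);
    m.addInsertion(rowkey, USER_PREFERENCE, userPreferences);
    m.execute();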

    Online Travel Site

    Problem: Need to know different metrics for a city's hotels, e.g.:

    Hotels booked in last X Time

    Hotels Last viewed in Y Time

    Hotels Left with Z Inventory

    RowKey: 2d323436353731
    => (super_column=911167901297486,
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000)
         (column=6c6173747669657765646d657373616762, value=Inventory#20, timestamp=1354962852610000)
         (column=6c6173747669657765646d657373616769, value=Bookings#8, timestamp=135496282610000))
    -------------------
    RowKey: 58524f
    => (super_column=200903041759196196,
         (column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago., timestamp=1347781187842000)
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago., timestamp=1347707080147000))
    => (super_column=200903041848352230,
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago., timestamp=1347266107708000))

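    // Reads all super columns (one per hotel) stored under a city's row.
    // Row key, super column name, column name and value are all strings.
    SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
            StringSerializer.get(), StringSerializer.get(),
            StringSerializer.get(), StringSerializer.get());
    superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);
    QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
    List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();
    if (superColumns != null) {
        for (HSuperColumn<String, String, String> superColumn : superColumns) {
            // Collect the hotel's messages as column name -> value.
            Map<String, String> messages = new HashMap<String, String>();
            List<HColumn<String, String>> columns = superColumn.getColumns();
            if (columns != null) {
                for (HColumn<String, String> column : columns) {
                    messages.put(column.getName(), column.getValue());
                }
            }
            /* The equivalent doc; document and documents are defined outside this fragment. */
            document.addField(superColumn.getName(), messages);
            documents.add(document);
        }
    }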

    Pig Script : MR

    Delete All Messages

    Last Viewed start 15 minutes to 30 days ago

    GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(

    TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.')));

    };]]>

    Last Booked 1 to 8 days ago

    GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(

    TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.')));

    };]]>

    Criteria to Evaluate NoSQL Solutions

    Internal partitioning

    Automated flexible data distribution

    Hot swappable nodes

    Replication-style

    Automated failover strategy