Introducing-MongoDB1

  • 8/7/2019 Introducing-MongoDB1

    1/57

    Introducing: MongoDB

    David J. C. Beach

    Sunday, August 1, 2010


    David Beach

    Software Consultant (past 6 years)

    Python since v1.4 (late 90s)

    Design, Algorithms, Data Structures

    Sometimes Database stuff

    not a frameworks guy

    Organizer: Front Range Pythoneers


    Outline

    Part I: Trends in Databases

    Part II: Mongo Basic Usage

    Part III: Advanced Features


    Part I: Trends in Databases


    Database Trends

    Past: Relational (RDBMS)

    Data stored in Tables, Rows, Columns

    Relationships designated by Primary, Foreign keys

    Data is controlled & queried via SQL


    Trends: Criticisms of RDBMS

    Rigid data model

    Hard to scale / distribute

    Slow (transactions, disk seeks)

    SQL not well standardized

    Awkward for modern/dynamic languages


    Trends: Fragmentation

    Relational with ORM (Hibernate, SQLAlchemy)

    ODBMS / ORDBMS (push OO-concepts into database)

    Key-Value Stores (MemcacheDB, Redis, Cassandra)

    Graph (neo4j)

    Document Oriented (Mongo, Couch, etc...)


    Where Mongo Fits

    The Best Features of Document Databases,

    Key-Value Stores,

    and RDBMSes.


    What is Mongo

    Document-Oriented Database

    Produced by 10gen / Implemented in C++

    Source Code Available

    Runs on Linux, Mac, Windows, Solaris

    Database: GNU AGPL v3.0 License

    Drivers: Apache License v2.0


    Mongo

    Advantages

    json-style documents (dynamic schemas)

    flexible indexing (B-Tree)

    replication and high-availability (HA)

    automatic sharding support (v1.6)*

    easy-to-use API

    fast queries (auto-tuning planner)

    fast inserts & deletes (sometimes trade-offs)


    Mongo

    Language Bindings

    C, C++, Java

    Python, Ruby, Perl

    PHP, JavaScript

    (many more community supported ones)


    Mongo

    Disadvantages

    No Relational Model / SQL

    No Explicit Transactions / ACID

    Limited Query API


    When to use Mongo

    Rich semistructured records (Documents)

    Transaction isolation not essential

    Humongous amounts of data

    Need for extreme speed

    You hate schema migrations


    Part II: Mongo Basic Usage


    Installing Mongo

    Use a 64-bit OS (Linux, Mac, Windows)

    Get Binaries: www.mongodb.org

    Run mongod process


    Installing PyMongo

    Download: http://pypi.python.org/pypi/pymongo/1.7

    Build with setuptools

    (includes C extension for speed)

    # python setup.py install

    # python setup.py --no-ext install


    Mongo Anatomy

    Mongo Server > Database > Collection > Document

    (a server holds databases, a database holds collections, a collection holds documents)


    Getting a Connection

    Connection required for using Mongo

    >>> import pymongo
    >>> connection = pymongo.Connection("localhost")


    Finding a Database

    Databases = logically separate stores

    Navigation using properties

    Will create DB if not found

    >>> db = connection.mydatabase


    Using a Collection

    Collection is analogous to Table

    Contains documents

    Will create collection if not found

    >>> blog = db.blog


    Inserting

    collection.insert(document) => document_id

    >>> entry1 = {"title": "Mongo Tutorial",
    ...           "body": "Here's a document to insert."}
    >>> blog.insert(entry1)
    ObjectId('4c3a12eb1d41c82762000001')


    Inserting (contd.)

    Documents must have _id field

    Automatically generated unless assigned

    12-byte unique binary value

    >>> entry1
    {'_id': ObjectId('4c3a12eb1d41c82762000001'),
     'body': "Here's a document to insert.",
     'title': 'Mongo Tutorial'}
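The 12-byte _id is not opaque: its first four bytes hold the creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the hex string shown on the slide using only the standard library. The helper name objectid_timestamp is hypothetical (it is not part of PyMongo), and the layout of the remaining eight bytes is deliberately left alone.

```python
import datetime


def objectid_timestamp(hex_id):
    """Return the UTC creation time stored in an ObjectId's first 4 bytes.

    Hypothetical helper for illustration; not a PyMongo function.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    # Big-endian seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)


created = objectid_timestamp("4c3a12eb1d41c82762000001")
print(created.isoformat())
```

Because the timestamp leads the value, ids generated later sort after ids generated earlier, at one-second granularity.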


    Inserting (contd.)

    Documents may have different properties

    Properties may be atomic, lists, dictionaries

    >>> entry2 = {"title": "Another Post",      # another document
    ...           "body": "Mongo is powerful",
    ...           "author": "David",
    ...           "tags": ["Mongo", "Power"]}
    >>> blog.insert(entry2)
    ObjectId('4c3a1a501d41c82762000002')


    Indexing

    May create index on any field

    If field is list => index associates all values

    >>> blog.ensure_index("author")   # index by single value
    >>> blog.ensure_index("tags")     # by multiple values
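To make "index associates all values" concrete, here is a toy model of an index over a list field. It uses a plain dictionary rather than the B-Tree that Mongo actually maintains, and build_index is a hypothetical helper. It only shows the idea that a list field contributes one index entry per element.

```python
from collections import defaultdict


def build_index(documents, field):
    """Map each indexed value to the ids of the documents containing it.

    Toy illustration only: a dict of sets, not a B-Tree.
    """
    index = defaultdict(set)
    for doc in documents:
        value = doc.get(field)
        # A list field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(doc["_id"])
    return index


posts = [
    {"_id": 1, "author": "David", "tags": ["Mongo", "Power"]},
    {"_id": 2, "author": "Robot", "tags": ["bulk", "Mongo"]},
]
by_author = build_index(posts, "author")
by_tag = build_index(posts, "tags")
print(sorted(by_tag["Mongo"]))
```

Looking up a single tag then finds every post whose list contains it, which is why a query on one tag value can use the index.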


    Bulk Insert

    Let's produce 100,000 fake posts

    import random

    bulk_entries = []
    for i in range(100000):
        entry = {"title": "Bulk Entry #%i" % (i + 1),
                 "body": "What Content!",
                 "author": random.choice(["David", "Robot"]),
                 "tags": ["bulk",
                          random.choice(["Red", "Blue", "Green"])]}
        bulk_entries.append(entry)


    Bulk Insert (contd.)

    collection.insert(list_of_documents)

    Inserts 100,000 entries into blog

    Returns in 2.11 seconds

    >>> blog.insert(bulk_entries)
    [ObjectId(...), ObjectId(...), ...]


    Bulk Insert (contd.)

    returns in 7.90 seconds (vs. 2.11 seconds)

    driver returns early; DB is still working...

    unless you specify safe=True

    >>> blog.remove()   # clear everything
    >>> blog.insert(bulk_entries, safe=True)


    Querying

    collection.find_one(spec) => document

    spec = document of query parameters

    >>> blog.find_one({"title": "Bulk Entry #12253"})
    {u'_id': ObjectId('4c3a1e411d41c82762018a89'),
     u'author': u'Robot',
     u'body': u'What Content!',
     u'tags': [u'bulk', u'Green'],
     u'title': u'Bulk Entry #99999'}


    Querying (Specs)

    Multiple conditions on document => AND

    Value for tags is an ANY match

    >>> blog.find_one({"title": "Bulk Entry #12253",
    ...                "tags": "Green"})
    {u'_id': ObjectId('4c3a1e411d41c82762018a89'),
     u'author': u'Robot',
     u'body': u'What Content!',
     u'tags': [u'bulk', u'Green'],
     u'title': u'Bulk Entry #99999'}
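The two rules on this slide (conditions are ANDed, and a scalar condition on a list field matches if any element matches) can be modeled in a few lines of plain Python. This is only a sketch of the semantics for simple equality specs. The matches function is hypothetical and ignores operators such as $gt or $in.

```python
def matches(document, spec):
    """Return True when the document satisfies every condition in the spec.

    Hypothetical model of equality-only query specs.
    """
    for field, wanted in spec.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list) and not isinstance(wanted, list):
            # Scalar condition against a list field: ANY element may match.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    # Every condition held, so the spec as a whole (AND) is satisfied.
    return True


post = {"title": "Bulk Entry #99999", "author": "Robot",
        "tags": ["bulk", "Green"]}
print(matches(post, {"author": "Robot", "tags": "Green"}))
print(matches(post, {"author": "David", "tags": "Green"}))
```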


    Querying (Multiple)

    collection.find(spec) => cursor

    new items are fetched in bulk (behind the scenes)

    >>> green_items = []
    >>> for item in blog.find({"tags": "Green"}):
    ...     green_items.append(item)

    - or -

    >>> green_items = list(blog.find({"tags": "Green"}))


    Querying (Counting)

    Use the find() method + count()

    Returns number of matches found

    >>> blog.find({"tags": "Green"}).count()
    16646


    Updating

    collection.update(spec, document)

    updates single document matching spec

    multi=True => updates all matching docs

    >>> item = blog.find_one({"title": "Bulk Entry #12253"})
    >>> item["tags"].append("New")
    >>> blog.update({"_id": item["_id"]}, item)


    Deleting

    use remove(...)

    it works like find(...)

    >>> blog.remove({"author": "Robot"}, safe=True)


    Part III: Advanced Features


    Advanced Querying

    Regular Expressions

    {"tags": re.compile(r"^Green|Blue$")}

    Nested Values {"foo.bar.x": 3}

    $where Clause (JavaScript)
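One detail of the regular expression above is worth checking locally before sending it to the server: alternation binds more loosely than the anchors, so ^Green|Blue$ means "starts with Green" or "ends with Blue". The sketch below compares it with a grouped variant using Python's re module. The tag values "Greenish" and "Navy Blue" are made up for the comparison, and for the Red/Blue/Green tags used in this talk both patterns behave the same.

```python
import re

loose = re.compile(r"^Green|Blue$")      # "starts with Green" OR "ends with Blue"
strict = re.compile(r"^(Green|Blue)$")   # exactly "Green" or exactly "Blue"

for tag in ["Green", "Blue", "Greenish", "Navy Blue"]:
    # search() scans the whole string, like an unanchored regex query
    print(tag, bool(loose.search(tag)), bool(strict.search(tag)))
```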


    Advanced Querying

    $lt, $gt, $lte, $gte, $ne

    $in, $nin, $mod, $all, $size, $exists, $type

    $or, $not

    $elemMatch

    >>> blog.find({"$or": [{"tags": "Green"}, {"tags": "Blue"}]})


    Advanced Querying

    collection.find(...)

    sort("name") - sorting

    limit(...) & skip(...) [like LIMIT & OFFSET]

    distinct(...) [like SQL's DISTINCT]

    collection.group(...) - like SQL's GROUP BY

    >>> blog.find().limit(50)                  # find 50 articles
    >>> blog.find().sort("title").limit(30)    # 30 titles
    >>> blog.find().distinct("author")         # unique author names
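For readers who think in terms of in-memory lists, the cursor methods above correspond roughly to the plain-Python operations below. This is an analogy over a small made-up list of posts, not how the server evaluates queries. In particular, the server applies sort, skip and limit before shipping documents to the client.

```python
posts = [
    {"title": "C", "author": "David"},
    {"title": "A", "author": "Robot"},
    {"title": "B", "author": "David"},
    {"title": "D", "author": "Robot"},
]

# sort("title"), then skip(1), then limit(2)
ordered = sorted(posts, key=lambda doc: doc["title"])
page = ordered[1:1 + 2]

# distinct("author")
authors = sorted({doc["author"] for doc in posts})

# group(...) with a count per author, like GROUP BY author
counts = {}
for doc in posts:
    counts[doc["author"]] = counts.get(doc["author"], 0) + 1

print([doc["title"] for doc in page], authors, counts)
```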


    Map/Reduce

    collection.map_reduce(mapper, reducer)

    ultimate in querying power

    distribute across multiple nodes


    Map/Reduce

    Visualized

    Diagram Credit:

    by Tom White; O'Reilly Books

    Chapter 2, page 20

    also see: Map/Reduce: A Visual Explanation

    http://ayende.com/Blog/archive/2010/03/14/map-reduce-ndash-a-visual-explanation.aspx

    db.runCommand({
      mapreduce: "DenormAggCollection",
      query: {
        filter1: { '$in': [ 'A', 'B' ] },
        filter2: 'C',
        filter3: { '$gt': 123 }
      },
      map: function() {
        emit(
          { d1: this.Dim1, d2: this.Dim2 },
          { msum: this.measure1, recs: 1, mmin: this.measure1,
            mmax: this.measure2 < 100 ? this.measure2 : 0 }
        );
      },
      reduce: function(key, vals) {
        // Seed the minimum from the first value; starting from 0 would
        // hide every positive minimum.
        var ret = { msum: 0, recs: 0, mmin: vals[0].mmin, mmax: 0 };
        for (var i = 0; i < vals.length; i++) {
          ret.msum += vals[i].msum;
          ret.recs += vals[i].recs;
          if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
          if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
            ret.mmax = vals[i].mmax;
        }
        return ret;
      },
      finalize: function(key, val) {
        val.mavg = val.msum / val.recs;
        return val;
      },
      out: 'result1',
      verbose: true
    });

    db.result1.
      find({ mmin: { '$gt': 0 } }).
      sort({ recs: -1 }).
      skip(4).
      limit(8);

    SELECT
      Dim1, Dim2,
      SUM(Measure1) AS MSum,
      COUNT(*) AS RecordCount,
      AVG(Measure2) AS MAvg,
      MIN(Measure1) AS MMin,
      MAX(CASE
        WHEN Measure2 < 100
        THEN Measure2
      END) AS MMax
    FROM DenormAggTable
    WHERE (Filter1 IN ('A','B'))
      AND (Filter2 = 'C')
      AND (Filter3 > 123)
    GROUP BY Dim1, Dim2
    HAVING (MMin > 0)
    ORDER BY RecordCount DESC
    LIMIT 4, 8

    Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.

    Measures must be manually aggregated.

    Aggregates depending on record counts must wait until finalization.

    Measures can use procedural logic.

    Filters have an ORM/ActiveRecord-looking style.

    Aggregate filtering must be applied to the result set


    Map/Reduce Examples


    Health Clinic Example

    Person registers with the Clinic

    Weighs in on the scale

    1 year => comes in 100 times


    Health Clinic Example

    person = {"name": "Bob",
              "weighings": [
                  {"date": date(2009, 1, 15), "weight": 165.0},
                  {"date": date(2009, 2, 12), "weight": 163.2},
                  ... ]}


    Map/Reduce Insert Script

    import datetime
    import random

    all_people = []
    for i in range(N):    # N = number of people to generate
        person = {'name': 'person%04i' % i}
        weighings = person['weighings'] = []
        std_weight = random.uniform(100, 200)
        for w in range(100):
            date = (datetime.datetime(2009, 1, 1) +
                    datetime.timedelta(days=random.randint(0, 365)))
            weight = random.normalvariate(std_weight, 5.0)
            weighings.append({'date': date, 'weight': weight})
        weighings.sort(key=lambda x: x['date'])
        all_people.append(person)


    Insert Data Performance

    [Chart: insert time in seconds (log scale) vs. N]

    N       Insert
    1k      3.14s
    10k     29.5s
    100k    292s


    Map/Reduce: Total Weight by Day

    map_fn = Code("""function () {
        this.weighings.forEach(function(z) {
            emit(z.date, z.weight);
        });
    }""")

    reduce_fn = Code("""function (key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }""")

    result = people.map_reduce(map_fn, reduce_fn)
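To see what the server does with those two functions, here is a single-process imitation in plain Python. It groups every emitted (key, value) pair by key and hands each group to the reducer. The map_reduce helper and the sample people (including "Ann") are made up for illustration. A real server may also call the reducer more than once per key, which this toy version does not model.

```python
import datetime
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Toy map/reduce: group emitted values by key, then reduce each group."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


def map_weighings(person):
    # Emit one (date, weight) pair per weighing, like the JavaScript mapper.
    for w in person["weighings"]:
        yield w["date"], w["weight"]


def total_weight(key, values):
    # Sum all weights recorded for one date.
    return sum(values)


day1 = datetime.datetime(2009, 1, 1)
day2 = datetime.datetime(2009, 1, 2)
people = [
    {"name": "Bob", "weighings": [{"date": day1, "weight": 165.0},
                                  {"date": day2, "weight": 163.0}]},
    {"name": "Ann", "weighings": [{"date": day1, "weight": 120.0}]},
]
totals = map_reduce(people, map_weighings, total_weight)
print(totals[day1], totals[day2])
```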


    Map/Reduce: Total Weight by Day

    >>> for doc in result.find():
    ...     print doc
    {u'_id': datetime.datetime(2009, 1, 1, 0, 0), u'value': 39136.600753163315}
    {u'_id': datetime.datetime(2009, 1, 2, 0, 0), u'value': 41685.341024046182}
    {u'_id': datetime.datetime(2009, 1, 3, 0, 0), u'value': 38232.326554504165}
    ... lots more ...


    Total Weight by Day Performance

    [Chart: run time in seconds (log scale) vs. N]

    N       MapReduce
    1k      4.29s
    10k     38.8s
    100k    384s


    Map/Reduce: Weight on Day

    map_fn = Code("""function () {
        // JavaScript months are 0-based, so this is October 5, 2009
        var target_date = new Date(2009, 9, 5);
        var pos = bsearch(this.weighings, "date", target_date);
        var recent = this.weighings[pos];
        emit(this._id, { name: this.name,
                         date: recent.date,
                         weight: recent.weight });
    };""")

    reduce_fn = Code("""function (key, values) {
        return values[0];
    };""")

    result = people.map_reduce(map_fn, reduce_fn,
                               scope={"bsearch": bsearch})


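    Map/Reduce: bsearch() function

    bsearch = Code("""function(array, prop, value) {
        var min, max, mid, midval;
        // Binary search over array (sorted by prop) for the last
        // element whose prop is <= value.
        for (min = 0, max = array.length - 1; min <= max; ) {
            mid = Math.floor((min + max) / 2);
            midval = array[mid][prop];
            if (value >= midval) {
                min = mid + 1;
            } else {
                max = mid - 1;
            }
        }
        return (midval > value) ? mid - 1 : mid;
    };""")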


    Weight on Day Performance

    [Chart: run time in seconds (log scale) vs. N]

    N       MapReduce
    1k      1.23s
    10k     10s
    100k    108s


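    Weight on Day (Python Version)

    import bisect
    import datetime

    target_date = datetime.datetime(2009, 10, 5)
    for person in people.find():
        dates = [w['date'] for w in person['weighings']]
        # bisect_right gives the insertion point after target_date,
        # so the latest weighing on or before it sits at pos - 1
        pos = bisect.bisect_right(dates, target_date)
        val = person['weighings'][pos - 1] if pos > 0 else None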
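The client-side lookup can be tried without a database. The sketch below applies bisect.bisect_right to one person's weighings, assuming they are already sorted by date as in the insert script. The helper weighing_on_or_before and the sample weights are made up for illustration. Since bisect_right returns the position just past any entries equal to the target, the most recent weighing on or before the target date is the element before that position.

```python
import bisect
import datetime


def weighing_on_or_before(weighings, target_date):
    """Return the latest weighing dated on or before target_date, or None.

    Hypothetical helper; assumes weighings are sorted by date ascending.
    """
    dates = [w["date"] for w in weighings]
    pos = bisect.bisect_right(dates, target_date)
    return weighings[pos - 1] if pos > 0 else None


weighings = [
    {"date": datetime.datetime(2009, 9, 20), "weight": 164.1},
    {"date": datetime.datetime(2009, 10, 5), "weight": 163.2},
    {"date": datetime.datetime(2009, 10, 19), "weight": 162.8},
]
target = datetime.datetime(2009, 10, 5)
recent = weighing_on_or_before(weighings, target)
print(recent)
```

A target date earlier than every weighing yields None rather than an index error, which is the case a bare index lookup would need to guard against.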


    Map/Reduce Performance

    [Chart: run time in seconds (log scale) vs. N, MapReduce compared with Python]

    N       MapReduce   Python
    1k      1.23s       0.37s
    10k     10s         2.2s
    100k    108s        26s


    Summary


    Resources

    www.10gen.com

    www.mongodb.org

    MongoDB: The Definitive Guide (O'Reilly)

    PyMongo: api.mongodb.org/python


    END OF SLIDES


    Chalkboard is not Comic Sans

    This is Chalkboard, not Comic Sans.

    This isn't Chalkboard, it's Comic Sans.

    does it matter, anyway?