MongoDB, Node.js and You: PART I


This is the first of a three-part series I gave at jsDay 2014 in Verona, Italy.

Transcript of MongoDB, Node.js and You: PART I

Node.js, MongoDB and You: Part I

Mitch Pirtle - jsDay 2014, Verona, Italy - @jsdayit

First, tell me about yourselves.

New to Node?

Come on, be honest!

New to Node?

New to MongoDB?

Does your JavaScript totally suck?

OK, my JavaScript totally sucks.

Now about me.

Mitch Pirtle

• Recovering Joomla! founder

• Mongo Master

• Starting companies since 1995

• Musician, skate punk, football coach

• American idiot living in Turin

Important Mitch Facts

• I am not cool. However I have been called perky.

• I am not Rich. My name is Mitch. Such is life.

• I am internet famous. Just to be clear: Internet Famous + $1.50 = $1.50

About this talk.

OK, technically there are three talks today.

• Session 1: All about MongoDB (this one)!

• Session 2: All about Node.js (that’s next)

• Session 3: The coolness of both together

That’s a lotta lotta stuff to cover.

STAY ALERT

All About MongoDB

• Brief introduction to MongoDB

• CONSOLE!

• Really cool discoveries and surprises

• Shameful admissions and painful stories

In The Beginning

• We had relational databases. Back then they were called “databases” and that’s where you stored your data.

• Primary focus: atomicity, consistency, reliability.

• Was normal to spend 6 hours. ON ONE QUERY.

• I love vacuum tubes; they keep you warm in winter.

• Life was good.

What Happened

• Hello, Internet!

• Databases became an immediate source of pain for scale and performance

• Traffic grew, along with it came bigger expectations, infinitely more complexity, a slew of new platforms, and Big Data™

That sure looks like a nail to me.

Troubled Relations

• Web languages gravitated toward objects, not 3NF entities/relations

• Size of data needed to live on more than one physical machine

• Performance needed to be far better

Along came sharding

• Can split your data across multiple machines

• Also splits your query load across multiple machines

• Like RAID for your data, right?

What sharding brought along for the ride

• How do you back this stuff up?

• How do you spread a group query across N machines again?

• How do you run a join query that spans a sharded table?

All those hours, spent mastering 3NF and procedural programming.

IMPORTANT LESSON:

It is REALLY hard to scale a relational database engine.

The common approach pushed logic out of the database back into the application tier.

Then why use a relational database in the first place?

Then there was…

The Promises of MongoDB

• Speed - crazy whack-daddy fast

• Simplicity - JSON documents FTW

• Embedded documents

• 16MB limit

• Scale - sharding, multimaster out of the box

• Yes, I said whack-daddy.

ENOUGH TALK. BRING ON THE CONSOLE.
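The live console demo isn’t captured in the transcript, but here is a minimal mongo shell sketch of that kind of session. The collection and field names are invented for illustration, not from the talk.

    // insert a JSON document - no schema to declare first
    db.products.insert({
      name: "Vespa decal",
      price: 4.50,
      tags: ["sticker", "scooter"],
      dimensions: { width_cm: 10, height_cm: 6 }   // embedded document, no join table
    })

    // query straight into the embedded document with dot notation
    db.products.find({ "dimensions.width_cm": { $gte: 8 } }).pretty()

    // each document tops out at 16MB, so embed details, not your whole archive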

Wait, there’s more

• Fulltext: Allows for compound indexes, supports many languages (see the console sketch after this list)

• Sharding: You can scale collections across N machines

• GridFS: Simple interface to store files in your database (CONSOLE!)

• Multimaster: Replica Sets make read slaves, failover, and redundancy possible
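Roughly what those console moments look like, as a sketch against a hypothetical “shop” database. The sharding commands assume you are connected to a mongos, and the replica set commands assume the members are already configured.

    // fulltext: a compound text index over two fields, then a text search
    db.products.ensureIndex({ name: "text", description: "text" })
    db.products.find({ $text: { $search: "scooter" } })

    // sharding: scale one collection across N machines
    sh.enableSharding("shop")
    sh.shardCollection("shop.products", { _id: "hashed" })

    // multimaster: replica sets for read slaves, failover, redundancy
    rs.initiate()
    rs.status()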

Now some cool stories

Mini Case Study: Totsy

• First ecommerce site to rely on MongoDB for all data. Everything. Even product images and associated media.

• I suspected it would be fast.

• I suspected we could develop quickly. (This was important, as they only let me hire one guy.)

So how fast was it?

Launch story

• Went live with MongoDB on a quad-core consumer-grade el-cheapo machine, only 2GB RAM.

• I was terrified.

• Over a million moms waiting for the launch.

• Upon launch, load was 0.05. Highest it ever got was around 0.5.

Was development quicker?

Development impact

• Simple models make for less code. There were no sixteen-table joins, no ORM; one result had all the data needed from a single query.

• Less code makes for fewer bugs. No more six-hour query debugging marathons. No more learning why UNION was faster than JOIN…

• Fewer bugs leave time for more code. Did I mention they only let me hire one guy?

Even moar impact

• Used GridFS for all media storage (see the console sketch after this list).

• Allowed free MD5 checking for duplicates.

• Allowed storage of metadata per file (views, comments, rates, whatever else we wanted).

• No need for NFS, clumsy rsync cronjobs, high costs of NAS or iSCSI.
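What that buys you at the console, as a sketch: GridFS keeps one document per file in fs.files, with an md5 computed at upload time and room for arbitrary metadata, so duplicate detection and per-file counters are plain queries. The filenames and metadata fields below are invented.

    // find duplicate uploads by grouping the file documents on their md5
    db.fs.files.aggregate([
      { $group: { _id: "$md5", count: { $sum: 1 }, names: { $push: "$filename" } } },
      { $match: { count: { $gt: 1 } } }
    ])

    // per-file metadata lives right on the file document
    db.fs.files.update(
      { filename: "vespa-banner.jpg" },
      { $inc: { "metadata.views": 1 } }
    )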

Now some sad stories.

The perils of schemaless

• Started prototyping quickly enough

• Made a couple changes to user model

• Made some more changes…

• WHUPS WHY FIFTEEN KINDS OF USER?!?!

Remember: Always update existing data when changing models.
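A minimal sketch of that kind of migration in the mongo shell, with an invented field standing in for whatever the model change was:

    // backfill every old user document so there is only ONE kind of user again
    db.users.update(
      { notification_prefs: { $exists: false } },
      { $set: { notification_prefs: { email: true, sms: false } } },
      { multi: true }
    )

    // verify nothing was missed
    db.users.count({ notification_prefs: { $exists: false } })   // should be 0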

Everything in the database!

• Backups were brutal

• Forgot to separate GridFS data from main database

• Totally unprepared for the operational impact

Remember: Operational impact BEFORE you launch.
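One way to get there, as a sketch: keep the GridFS collections in their own database, so the small hot product data and the huge pile of media can be backed up on separate schedules. The database names here are hypothetical.

    // GridFS media in its own database, separate from the main "shop" data
    var media = db.getSiblingDB("media")
    media.fs.chunks.stats().size        // how big is the blob pile, really?

    // then dump them independently, e.g.
    //   mongodump --db shop    (small, every hour)
    //   mongodump --db media   (huge, overnight)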

Stump the Geek™

Thanks!

• AboutMe

• @mitchitized - Twitter

• spacemonkey - GitHub

• LinkedIn - I’M AVAILABLE!