Googol Data

Googol records (with MySQL)

IPC | October 2008 | Alex Aulbach

© MAYFLOWER GmbH 2008

“Googol records”

Definition: Googol

10^100

or

10 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

000 000 000 000 000 000 000 000 000

or

“An imaginably big number”


Overview

What will the future bring for databases?

Is the main way we access data really the best one?

Patterns (or suggestions), showing how they could work with MySQL.

Discuss!


The (performance) future of the web

Only 10-20 % of the world population is “on the Internet”.

What will it be like with 80 %?


The world population is growing, and people are getting older.


More specialized databases

More ways to access them

Much easier to access

Sharing knowledge vs. closed knowledge: Who wins?

Services become more dependent on each other

The web grows faster than Moore's Law! (Moore's Law: “only” a factor of 1000 in 20 years.)
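The “factor 1000 in 20 years” figure can be checked with a few lines of arithmetic. This is only a sketch: the doubling period of two years is an assumption (a common reading of Moore's Law), and `moore_factor` is a name made up for this illustration.

```python
# Assumption: capacity doubles every 2 years (a common reading of Moore's Law).
DOUBLING_PERIOD_YEARS = 2


def moore_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Return the growth factor after `years` of repeated doubling."""
    return 2 ** (years / doubling_period)


# Ten doublings in 20 years give roughly the factor 1000 named on the slide.
factor_20_years = moore_factor(20)
```

Under that assumption, 20 years contain ten doublings, so the factor is 2^10.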


What does this mean for us?

We will surely run into problems

But we cannot say when, where, or why

No Boss. The data belongs to everyone

It’s like new roads

It’s “real life”!


Consequences of growth

New hardware will no longer solve speed problems

Even a new database will not

Even a rewrite of the application won’t

Need to rethink the problems from scratch!


Of course...

… need for splitting, sharding, partitioning, clustering etc.

… need to plan for growth from the beginning of the project.

… hardware resources can no longer be planned.

… need to distinguish data by importance.

… need to estimate instead of being exact.


But ...

Is this enough?


Patterns (or better: suggestions)

Brain storage engine.

Reading differs from writing.

Redundancy and specialization.

The storage itself can keep the information.

Time (and sleep).

The journey is the reward.


1 :: Brain storage engine :: 1

Short term memory (working memory)

Unsorted, unfiltered, any data
Fast reads
Very fast updates/changes
Remembers which data has changed or is invalid
Limited


1 :: Brain storage engine :: 2

Long-term memory

Presorted, well-filtered data
Unlimited (well, more or less)
Extremely fast read access (sometimes)
Updates/inserts by repetition in working memory
Sleep helps to store things better


How does that model fit into real life?

Nobody expects to find old things fast

Telephone books

90/10 problems


Show

Searching in long-term memory.

Scaling of working/long-term memory vs. one table with inserts/updates/deletes.
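The live demo is not part of this transcript. As a stand-in, here is a minimal in-memory sketch of the working/long-term split described on the “Brain storage engine” slides. Everything in it is an assumption made for illustration: the class name `BrainStore`, the `working_limit` parameter, and the choice to consolidate (“sleep”) as soon as working memory overflows. It does not use MySQL.

```python
import bisect


class BrainStore:
    """Hypothetical two-tier store: small working memory, presorted long-term memory."""

    def __init__(self, working_limit=3):
        self.working_limit = working_limit
        self.working = {}       # unsorted, fast updates, limited in size
        self.long_keys = []     # long-term keys, kept sorted
        self.long_values = {}   # long-term key -> value

    def write(self, key, value):
        """All writes go to working memory first."""
        self.working[key] = value
        if len(self.working) > self.working_limit:
            self.sleep()

    def read(self, key):
        """Working memory holds the newest data, so it is checked first."""
        if key in self.working:
            return self.working[key]
        return self.long_values.get(key)

    def sleep(self):
        """Consolidate: move everything from working memory into long-term memory."""
        for key, value in self.working.items():
            if key not in self.long_values:
                bisect.insort(self.long_keys, key)
            self.long_values[key] = value
        self.working.clear()

    def range(self, low, high):
        """Range scan over the presorted long-term keys (low <= key < high)."""
        start = bisect.bisect_left(self.long_keys, low)
        end = bisect.bisect_left(self.long_keys, high)
        return [(k, self.long_values[k]) for k in self.long_keys[start:end]]
```

The point of the split is that the small tier absorbs frequent changes, while the large tier only changes during consolidation and can therefore stay presorted.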


2 :: Reading differs from writing

Look at the physical processes

Reading with the fingertips: no reading and writing at the same time

Handling reading and writing as different aspects of the same thing is a compromise

Only specialization enables good optimization


Reader/Writer: Simplest layout
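The diagram for this slide is not in the transcript. One possible reading of a “simplest” reader/writer layout is a single writer node plus several read-only copies, with a router in front. The sketch below assumes exactly that; `Node`, `ReaderWriterRouter`, and the explicit `replicate` step are hypothetical names, and plain dictionaries stand in for database servers.

```python
import itertools


class Node:
    """Stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReaderWriterRouter:
    """Send every write to one writer and spread reads over the readers."""

    def __init__(self, writer, readers):
        self.writer = writer
        self.readers = readers
        self._next_reader = itertools.cycle(readers)

    def write(self, key, value):
        # Writes never touch a reader directly.
        self.writer.rows[key] = value

    def replicate(self):
        # Assumption: replication is an explicit, delayed copy step.
        for reader in self.readers:
            reader.rows = dict(self.writer.rows)

    def read(self, key):
        # Round-robin over the readers; returns (reader name, value or None).
        reader = next(self._next_reader)
        return reader.name, reader.rows.get(key)
```

Because readers only change during replication, they can be optimized purely for reading, at the price of serving slightly stale data.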


The web as storage?


Web can work like this


Recursive definition of the catalog


Scaling, setup as “black box”


Share everything


Comments

How does this scale?

What doesn’t work with this?


3 :: Redundancy and specialization :: 1

We cannot back up a googol

Nobody needs a backup, but everybody needs a restore


Redundancy:
Store the information in many places
Store more important information in more places

Specialization:
“Materialized views”
EAV modeling and pivoting
Take ideas from data warehouses and repositories
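“More important information in more places” can be sketched as a placement rule. The importance score between 0 and 1, the function names `replica_count` and `place`, and the hash-based choice of the first node are all assumptions made for this illustration; the slides do not prescribe a formula.

```python
import hashlib
import math


def replica_count(importance, max_replicas):
    """Map an importance score in [0, 1] to a number of copies (at least one)."""
    importance = min(max(importance, 0.0), 1.0)
    return max(1, math.ceil(importance * max_replicas))


def place(key, importance, nodes):
    """Pick the nodes that should hold a copy of `key`."""
    copies = replica_count(importance, len(nodes))
    # A stable hash decides where the first copy goes; the rest follow in order.
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]
```

With four nodes, a record with importance 1.0 lands on all four, while a record with importance 0.0 still gets one copy, so nothing is stored zero times.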


The wheel comes full circle:

More important: more access.
More access: more need for redundancy.
More redundancy: more speed and reliability.
More speed and reliability: more important.



Implementation with Reader/Writer


4 :: The storage itself can keep the information.

“A storage always has physical limitations. The logical information of data that belongs together does not have any physical limitations.”

Alex Aulbach, Sept. 2008


The index is the problem!

The googol universe is limited.
The index can take “half of the galaxies”.
Only the “rest” can be used for the data.

Less index means:
Faster search in the “needed” index.
Less time to write data and index.
Less time to warm up.
More space for the records.


Show

Access full table or split data into several parts.
Index size
Write
Presorted tables
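The demo itself is not in the transcript. The idea of letting the storage carry the information, instead of a large index, can be sketched like this: rows are split into presorted parts, and the only “index” left is the first key of each part. `PresortedPartitions` and `partition_size` are hypothetical names, and lists of tuples stand in for tables.

```python
import bisect


class PresortedPartitions:
    """Rows split into sorted parts; only the first key of each part is indexed."""

    def __init__(self, rows, partition_size):
        rows = sorted(rows)  # rows are (key, value) tuples
        self.parts = [
            rows[i:i + partition_size]
            for i in range(0, len(rows), partition_size)
        ]
        # The whole "index": one entry per part instead of one per row.
        self.first_keys = [part[0][0] for part in self.parts]

    def lookup(self, key):
        # Step 1: find the part whose key range could contain `key`.
        part_number = bisect.bisect_right(self.first_keys, key) - 1
        if part_number < 0:
            return None
        part = self.parts[part_number]
        # Step 2: binary search inside the part; the sort order is the index.
        position = bisect.bisect_left(part, (key,))
        if position < len(part) and part[position][0] == key:
            return part[position][1]
        return None
```

For ten rows in parts of three, the index shrinks from ten entries to four; the trade-off is that the parts have to be kept sorted when data is written.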


5 :: Time (and sleep) :: 1

Human brain: Only three bits per second!

We all have been babies.

Trust! Just wait and see.

Developers (and customers) need to think in decades, not in days until the end of the project.


5 :: Sleep (and time) :: 2

The human brain again: it learns while sleeping!
Why not apply this to databases?

Premise: redundancy!
Dolphins sleep with only one hemisphere at a time.

The wheel comes full circle:
Redundancy.
Distinct read and write.
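The dolphin picture translates into a rolling maintenance loop: one copy “sleeps” (gets reorganized) while the others stay awake for reads. This is only a sketch under assumptions: replicas are plain dictionaries, “sleeping” means sorting the rows, and `rolling_sleep` and `sort_rows` are names invented here.

```python
def rolling_sleep(replicas, optimize):
    """Optimize one replica at a time so the others keep serving reads."""
    if len(replicas) < 2:
        raise ValueError("redundancy is the premise: need at least two replicas")
    log = []
    for sleeper in replicas:
        # Everyone except the sleeper stays available during this step.
        awake = [r["name"] for r in replicas if r is not sleeper]
        optimize(sleeper)
        log.append((sleeper["name"], awake))
    return log


def sort_rows(replica):
    """Example 'sleep' task: presort the rows of one replica in place."""
    replica["rows"].sort()
```

With a single copy the function refuses to run, which mirrors the premise on the slide: without redundancy, nothing can go to sleep.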


Show

Well, I can’t show this, because it takes … time.


6 :: The journey is the reward

Future: Not so important how to search, but where.

Store step by step where to find the result, not the result.

You can find faster ways only by trying a shortcut.

It comes full circle:
Search many different ways and take the fastest.
While sleeping, try out new things (dreaming).
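“Store where to find the result, not the result” can be sketched as a small route memory: try every known way once, remember which one was fastest for that query, and reuse it later. The two routes, the step counter used as a stand-in for real timing, and the class name `RouteMemory` are assumptions made for this illustration.

```python
DATA = list(range(0, 1000, 2))  # sorted even numbers, standing in for a table


def linear_route(x):
    """Scan from the start; returns (position or None, steps taken)."""
    steps = 0
    for position, value in enumerate(DATA):
        steps += 1
        if value == x:
            return position, steps
    return None, steps


def binary_route(x):
    """Binary search; returns (position or None, steps taken)."""
    low, high, steps = 0, len(DATA), 0
    while low < high:
        steps += 1
        mid = (low + high) // 2
        if DATA[mid] == x:
            return mid, steps
        if DATA[mid] < x:
            low = mid + 1
        else:
            high = mid
    return None, steps


class RouteMemory:
    """Remember per query which route was fastest, not the result itself."""

    def __init__(self, routes):
        self.routes = routes  # route name -> function
        self.best = {}        # query -> name of the fastest route so far

    def explore(self, query):
        # "Dreaming": try every route and keep the one with the fewest steps.
        outcomes = {name: route(query) for name, route in self.routes.items()}
        fastest = min(outcomes, key=lambda name: outcomes[name][1])
        self.best[query] = fastest
        return outcomes[fastest][0]

    def search(self, query):
        if query not in self.best:
            return self.explore(query)
        return self.routes[self.best[query]](query)[0]
```

In this toy setup the linear scan wins for the very first element and the binary search wins near the end, so the remembered route differs per query, which is the reason to store the route rather than assume one way is always fastest.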


Conclusion

Dreams may come true while sleeping.

We must invent the tools now
to solve the problems of the future.

Speed is not a matter of hardware
but of how things are done.

Never take speed as a given:
in a googol universe, wormholes exist!

Moore's Law may help, but do not trust it.

Thank you!

Alex Aulbach
Mayflower GmbH

Pleichertorstr. 2
97070 Würzburg, Germany
+49 (931) 35 9 65 - [email protected]
