Googol Data

Googol records (with MySQL) IPC | October 2008 | Alex Aulbach

Transcript of Googol Data

Page 1: Googol Data

Googol records (with MySQL)

IPC | October 2008 | Alex Aulbach

Page 2: Googol Data

© MAYFLOWER GmbH 2008

2

„Googol records“

Definition: Googol

10^100

or

10 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

or

“Unimaginably big number”

Page 3: Googol Data

Overview

What will the future bring for databases?

Is the principal way we access data really the best one?

Patterns (or suggestions), and a demonstration of how they could work with MySQL.

Discuss!

Page 4: Googol Data

The (performance) future of the web

Only 10-20 % of the world's population is “on the Internet”.

What will it be like with 80 %?

Page 5: Googol Data

The (performance) future of the web

The world population is growing, and people are getting older.

Page 6: Googol Data

The (performance) future of the web

More specialized databases

More ways to access them

Much easier to access

Sharing knowledge vs. closed knowledge: Who wins?

Services become more dependent on each other

The web grows faster than Moore's Law! (Moore's Law: “only” a factor of 1000 in 20 years.)
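The “factor of 1000 in 20 years” figure can be checked quickly. The sketch below assumes the common reading of Moore's Law as a doubling roughly every two years; the slide states only the result, so the doubling period is an assumption.

```python
# Assumption: capacity doubles every 2 years (a common reading of Moore's Law).
DOUBLING_PERIOD_YEARS = 2


def growth_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Return the growth factor after `years` of repeated doubling."""
    return 2 ** (years / doubling_period)


factor_20_years = growth_factor(20)
print(factor_20_years)  # 1024.0, i.e. roughly a factor of 1000
```

Ten doublings in 20 years give 2^10 = 1024, which is where the rounded “1000” comes from.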

Page 7: Googol Data

What does this mean for us?

We will surely run into problems

But we cannot say when, where, or why

No Boss. The data belongs to everyone

It’s like new roads

It’s “real life”!

Page 8: Googol Data

Consequences of growth

New hardware will no longer solve speed problems

Not even a new database will

Not even a rewrite of the application will

We need to rethink the problems from scratch!

Page 9: Googol Data

Of course...

… need for splitting, sharding, partitioning, clustering, etc.

… need to plan for growth from the beginning of the project.

… hardware resources can no longer be planned.

… need to distinguish data by importance.

… estimate instead of being exact.

Page 10: Googol Data

But ...

Is this enough?

Page 11: Googol Data

Patterns (or better: suggestions)

Brain storage engine.

Reading differs from writing.

Redundancy and specialization.

The storage itself can keep the information.

Time (and sleep).

The journey is the reward.

Page 12: Googol Data

1 :: Brain storage engine :: 1

Short term memory (working memory)

Unsorted, unfiltered, any data

Fast reads

Very many fast updates/changes

Remembers which data is changed/invalid

Limited

Page 13: Googol Data

1 :: Brain storage engine :: 2

Long-term memory

Presorted, well-filtered data

Unlimited (well, more or less)

Extremely fast read access (sometimes)

Updates/inserts by repetition in working memory

Sleep helps to store things better
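The two memory types can be sketched as a two-tier store. This is only an illustration under stated assumptions: the class `BrainStore`, its `working_limit` parameter, and the rule “consolidate when working memory is full” are hypothetical and not taken from the talk, and plain Python containers stand in for MySQL tables.

```python
import bisect


class BrainStore:
    """Hypothetical two-tier store: a small unsorted working memory
    plus a presorted long-term store."""

    def __init__(self, working_limit=3):
        self.working_limit = working_limit  # working memory is limited
        self.working = {}                   # unsorted, fast reads and updates
        self.long_keys = []                 # presorted keys
        self.long_values = {}               # key -> value

    def write(self, key, value):
        # All writes land in working memory first.
        self.working[key] = value
        if len(self.working) > self.working_limit:
            self.sleep()

    def read(self, key):
        # Working memory holds the newest version, so check it first.
        if key in self.working:
            return self.working[key]
        # Binary search in the presorted long-term keys.
        i = bisect.bisect_left(self.long_keys, key)
        if i < len(self.long_keys) and self.long_keys[i] == key:
            return self.long_values[key]
        return None

    def sleep(self):
        # Consolidate: move everything into the sorted long-term store.
        for key, value in self.working.items():
            if key not in self.long_values:
                bisect.insort(self.long_keys, key)
            self.long_values[key] = value
        self.working.clear()


store = BrainStore(working_limit=2)
store.write("b", 1)
store.write("a", 2)
store.write("c", 3)    # exceeds the limit, triggers consolidation
store.write("a", 20)   # newer value stays in working memory
```

Reads prefer working memory, so an update is visible immediately even though the long-term store still holds the older value until the next consolidation.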

Page 14: Googol Data

How does that model fit into real life?

Nobody expects to find old things fast

Telephone books

90/10 problems

Page 15: Googol Data

Show

Searching in long-term memory.

Scaling of working/long-term memory

vs. one table with inserts/updates/deletes.

Page 16: Googol Data

2 :: Reading differs from writing

Look at the physical processes

Reading with the fingertips: no reading and writing at the same time

Handling reading and writing as different aspects of the same thing is a compromise

Only specialization enables good optimization
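One way to picture the separation of reading and writing is an append-only writer and a read-optimized copy that catches up later. The names `Writer`, `Reader`, and `catch_up` are hypothetical, and in-memory Python structures stand in for database tables; this is a sketch of the idea, not the layout shown in the talk.

```python
class Writer:
    """Append-only log: optimized for writing, never queried by clients."""

    def __init__(self):
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))


class Reader:
    """Read-optimized copy, brought up to date from the writer's log."""

    def __init__(self):
        self.index = {}
        self.position = 0  # how much of the log has been applied so far

    def catch_up(self, writer):
        # Apply only the log entries that arrived since the last catch-up.
        for key, value in writer.log[self.position:]:
            self.index[key] = value
        self.position = len(writer.log)

    def read(self, key):
        return self.index.get(key)


writer = Writer()
reader = Reader()
writer.write("user:1", "alice")
before = reader.read("user:1")   # the reader has not caught up yet
reader.catch_up(writer)
after = reader.read("user:1")
```

The price of the specialization is that reads can be briefly stale, which fits the earlier point about estimating instead of being exact.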

Page 17: Googol Data

Reader/Writer: Simplest layout

Page 18: Googol Data

The web as storage?

Page 19: Googol Data

Web can work like this

Page 20: Googol Data

Recursive definition of the catalog

Page 21: Googol Data

Scaling, setup as “black box”

Page 22: Googol Data

Share everything

Page 23: Googol Data

Comments

How does this scale?

What doesn’t work with this?

Page 24: Googol Data

3 :: Redundancy and specialization :: 1

We cannot backup a googol

Nobody needs backup, but everybody needs to restore

Page 25: Googol Data

3 :: Redundancy and specialization :: 2

Redundancy:

Store the information in many places

Store more important information in more places

Specialization:

“Materialized views”

EAV modeling and pivoting

Take ideas from data warehouses and repositories
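As an illustration of “EAV modeling and pivoting”, the sketch below pivots entity-attribute-value rows into one record per entity, which is the kind of precomputed, read-optimized copy a materialized view would hold. The sample rows and the function name `pivot` are invented for this example.

```python
from collections import defaultdict

# EAV rows: (entity, attribute, value). The sample data is invented.
eav_rows = [
    (1, "name", "Ada"),
    (1, "city", "London"),
    (2, "name", "Linus"),
]


def pivot(rows):
    """Pivot EAV rows into one dict per entity (a precomputed view)."""
    view = defaultdict(dict)
    for entity, attribute, value in rows:
        view[entity][attribute] = value
    return dict(view)


materialized = pivot(eav_rows)
print(materialized[1])  # {'name': 'Ada', 'city': 'London'}
```

The EAV rows stay flexible for writing, while the pivoted copy is redundant data kept purely to make reads cheap.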

Page 26: Googol Data

3 :: Redundancy and specialization :: 3

The wheel comes full circle:

More important: more access.

More access: more need for redundancy.

More redundancy: more speed and reliability.

More speed and reliability: more important.
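The link between access and redundancy could be expressed as a simple placement policy. The rule used here (one extra copy per tenfold increase in accesses, capped at a maximum) is purely a hypothetical example; the talk does not specify any formula.

```python
def replica_count(access_count, minimum=1, maximum=5):
    """Hypothetical policy: one extra replica per tenfold increase
    in accesses, never fewer than `minimum` or more than `maximum`."""
    extra = 0
    remaining = access_count
    while remaining >= 10:
        remaining //= 10
        extra += 1
    return min(maximum, minimum + extra)


for accesses in (5, 100, 1_000_000):
    print(accesses, replica_count(accesses))
```

Rarely read records keep a single copy, while heavily read ones are spread over more places, which in turn makes them faster and more reliable to reach.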

Page 27: Googol Data

Implementation with Reader/Writer

Page 28: Googol Data

4 :: The storage itself can keep the information.

“Storage always has physical limitations. The logical information in data that belongs together has no physical limitations.”

Alex Aulbach, Sept. 2008

Page 29: Googol Data

The index is the problem!

The googol universe is limited.

The index can take “half of the galaxies”.

Only the “rest” can be used for the data.

Less index means:

Faster search in the “needed” index.

Less time to write data and index.

Less time to warm up.

More space for the records.
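A sketch of how the storage layout itself can carry information: if records are split into ranges, the choice of part already answers part of the query, and each part needs only a small index. The class `PartitionedTable` and its boundary values are hypothetical, and sorted Python lists stand in for per-part indexes.

```python
import bisect


class PartitionedTable:
    """Hypothetical range-partitioned table: every part keeps its own
    small sorted index instead of one big index over everything."""

    def __init__(self, boundaries):
        # boundaries [100, 200] -> parts for keys <100, 100..199, >=200
        self.boundaries = boundaries
        self.partitions = [[] for _ in range(len(boundaries) + 1)]

    def _partition_for(self, key):
        # The part number is derived from the key alone, without any index.
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key):
        bisect.insort(self.partitions[self._partition_for(key)], key)

    def contains(self, key):
        # Only the one "needed" index is searched.
        part = self.partitions[self._partition_for(key)]
        i = bisect.bisect_left(part, key)
        return i < len(part) and part[i] == key


table = PartitionedTable(boundaries=[100, 200])
for key in (5, 150, 250, 120):
    table.insert(key)
print(table.partitions)  # [[5], [120, 150], [250]]
```

Writes touch only one small index, and parts that are never queried never need to be warmed up.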

Page 30: Googol Data

Show

Access full table or split data into several parts.

Index size

Write

Presorted tables

Page 31: Googol Data

5 :: Time (and sleep) :: 1

Human brain: Only three bits per second!

We all have been babies.

Trust! Just wait and see.

Developers (and customers) need to think in decades, not in days until the end of the project.

Page 32: Googol Data

5 :: Sleep (and time) :: 2

Again the human brain: it learns while sleeping!

Why not apply this to databases?

Premise: redundancy!

Dolphins sleep with only one hemisphere at a time.

The wheel comes full circle:

Redundancy.

Distinct read and write.
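The dolphin idea can be sketched as rolling maintenance: with at least two redundant copies, one copy at a time goes offline to reorganize itself while the others keep serving reads. The classes `Replica` and `Cluster` are hypothetical, and sorting a list stands in for whatever expensive reorganization a real database would do while “asleep”.

```python
class Replica:
    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)  # unsorted while awake
        self.asleep = False

    def reorganize(self):
        # Stand-in for expensive maintenance (sorting, rebuilding indexes).
        self.rows.sort()


class Cluster:
    """Premise: redundancy. This sketch needs at least two replicas."""

    def __init__(self, replicas):
        self.replicas = replicas

    def read_from(self):
        # Reads go to the first replica that is awake.
        return [r for r in self.replicas if not r.asleep][0]

    def sleep_one_at_a_time(self):
        served_by = []
        for replica in self.replicas:
            replica.asleep = True
            served_by.append(self.read_from().name)  # others keep serving
            replica.reorganize()
            replica.asleep = False
        return served_by


left = Replica("left", [3, 1, 2])
right = Replica("right", [3, 1, 2])
cluster = Cluster([left, right])
served_by = cluster.sleep_one_at_a_time()
print(served_by)  # ['right', 'left']
```

After one full round every copy has been reorganized, and at no point was the data unavailable for reading.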

Page 33: Googol Data

Show

Well, I can’t show this, because it takes … time.

Page 34: Googol Data

6 :: The journey is the reward

In the future, how to search will matter less than where to search.

Store step by step where to find the result, not the result itself.

You can only find faster ways by trying a shortcut.

It comes full circle:

Search in many different ways and take the fastest.

While sleeping, try out new things (dreaming).
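“Store where to find the result, not the result” could look like the sketch below: several sources are tried, and only the name of the fastest one is remembered for each key. The class `RouteMemory`, the source names, and the simulated cost numbers are all hypothetical; a real system would measure latency rather than read it from a tuple.

```python
class RouteMemory:
    """Hypothetical: remember which source answered fastest,
    not the answer itself."""

    def __init__(self, sources):
        # sources: name -> function(key) returning (value, cost),
        # where cost is a simulated latency.
        self.sources = sources
        self.best_route = {}  # key -> name of the fastest source seen

    def explore(self, key):
        # "Dreaming": try every way and keep the fastest one that works.
        best = None
        for name, lookup in self.sources.items():
            value, cost = lookup(key)
            if value is not None and (best is None or cost < best[1]):
                best = (name, cost)
        if best is not None:
            self.best_route[key] = best[0]

    def find(self, key):
        if key not in self.best_route:
            self.explore(key)
        name = self.best_route.get(key)
        if name is None:
            return None
        value, _cost = self.sources[name](key)
        return value


data = {"x": 1}
memory = RouteMemory({
    "archive": lambda key: (data.get(key), 50),  # slow source
    "cache": lambda key: (data.get(key), 5),     # fast source
})
result = memory.find("x")
print(result, memory.best_route)  # 1 {'x': 'cache'}
```

Because only the route is stored, the remembered entry stays small and remains valid when the value behind it changes.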

Page 35: Googol Data

Conclusion

Dreams may come true while sleeping.

We must invent the tools now to solve the problems of the future.

Speed is not a matter of hardware but of how things are done.

Never take speed as a given: in a googol universe, wormholes exist!

Moore's Law may help, but do not trust it.

Page 36: Googol Data

Thank you!

Alex Aulbach

Mayflower GmbH

Pleichertorstr. 2, 97070 Würzburg, Germany

+49 (931) 35 9 65 - [email protected]