Large Systems: Design + Implementation: Google Search

Image (c) Facebook

2

Case Study: Google Evolution

Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer Scientist Lecture, November 2010

Jeff Dean, “Evolution and future directions of large-scale storage and computation systems at Google”, SoCC '10: Proceedings of the 1st ACM Symposium on Cloud Computing, ACM, New York, NY, USA (2010), pp. 1-1

https://research.google.com/pubs/jeff.html

George Coulouris et al., “Distributed Systems: Concepts and Design”, 5th Ed., Addison-Wesley, Ch. 21.

11

Leaf servers handle both index & doc requests from in-memory data structures

12

Leaf servers handle both index & doc requests from in-memory data structures

Coordinates index switching as new shards become available
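As a rough illustration of the leaf-server idea above, the sketch below keeps one index shard and its documents entirely in memory and answers both kinds of request from those structures. The class name `LeafServer`, its method names, and the whitespace tokenisation are assumptions made for this example; they are not taken from the talk or from any Google system.

```python
class LeafServer:
    """Hypothetical leaf server: one index shard plus its documents, all in RAM."""

    def __init__(self):
        self.postings = {}  # term -> set of doc ids (the index shard)
        self.docs = {}      # doc id -> document text (the doc shard)

    def add_document(self, doc_id, text):
        """Store the document and add its terms to the in-memory index."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def index_request(self, query):
        """Return the sorted ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)

    def doc_request(self, doc_id):
        """Return the stored text for one document id."""
        return self.docs[doc_id]

    def switch_index(self, new_shard):
        """Replace the served shard with a freshly built one (assumed behaviour)."""
        self.postings, self.docs = new_shard.postings, new_shard.docs
```

A query is answered in two steps: `index_request` finds the matching document ids, and `doc_request` fetches the text for the hits that will be shown. In this sketch `switch_index` stands in for the index switching mentioned above; how the real system coordinates it is not described in the slides.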

13

New Problems

More collections to search besides Web
More structured, e.g. Maps
Need more real-time results

14

More Real-Time

Creating the index was a batch process via MapReduce:
Store all documents in GFS (comparable to HDFS)
Run several MapReduce jobs to create the index
Upload the index to the leaf servers

New documents would not show up in search results for 2-3 days [Peng and Dabek, 2010]
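The batch pipeline above can be pictured with a toy, single-process version of a MapReduce indexing job. This is only a sketch: the function names `map_phase`, `reduce_phase`, and `build_index` are invented for illustration, and the real jobs ran distributed over documents stored in GFS.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in the document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    """Reduce step: turn all doc ids seen for a term into a sorted posting list."""
    return term, sorted(doc_ids)


def build_index(documents):
    """Run map, shuffle, and reduce over the whole collection in one batch."""
    grouped = defaultdict(list)  # shuffle: group emitted doc ids by term
    for doc_id, text in documents.items():
        for term, emitted_id in map_phase(doc_id, text):
            grouped[term].append(emitted_id)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())
```

Because `build_index` always processes the complete collection, a newly crawled page becomes searchable only after the next full run finishes and the result is uploaded, which is the source of the delay described above.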

Needed lower “time from crawl-to-search-hit”

Solution:
New data storage system: Colossus / BigTable
Event-driven, incremental processing: Caffeine / Percolator

15

BigTable:

16

BigTable:

17

Caffeine / Percolator

Crawler uploads new version of page to BigTable
Updates to BigTable can trigger code
E.g. code to create index
Push index update to leaf servers
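The event-driven flow above can be sketched as a table that calls registered observers on every write, with an indexer that updates only the affected posting lists. `ObservedTable` and `IncrementalIndexer` are hypothetical names for this example; this is not Percolator's or BigTable's API, and it leaves out transactions and distribution entirely.

```python
class ObservedTable:
    """Hypothetical table that runs observer callbacks whenever a row is written."""

    def __init__(self):
        self.rows = {}       # url -> latest page content
        self.observers = []  # callbacks taking (url, old_content, new_content)

    def register_observer(self, callback):
        self.observers.append(callback)

    def write(self, url, content):
        """Store the new page version, then notify every observer."""
        old = self.rows.get(url)
        self.rows[url] = content
        for callback in self.observers:
            callback(url, old, content)


class IncrementalIndexer:
    """Keeps term -> set of urls up to date, touching only changed terms."""

    def __init__(self):
        self.postings = {}

    def on_page_update(self, url, old, new):
        old_terms = set(old.lower().split()) if old else set()
        new_terms = set(new.lower().split())
        # Terms that disappeared from the page: remove the url from them.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]
        # Terms that are new on the page: add the url to them.
        for term in new_terms - old_terms:
            self.postings.setdefault(term, set()).add(url)
```

In use, the crawler side would call `table.write(url, content)` after `table.register_observer(indexer.on_page_update)`; each write then updates the index for that one page instead of waiting for a full batch rebuild. Pushing the changed postings to the leaf servers is left out of the sketch.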