Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at...

17
Large Systems: Large Systems: Design + Design + Implementation: Implementation: Google Search Google Search Image (c) Facebook

Transcript of Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at...

Page 1: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

Large Systems:Large Systems:Design + Design + Implementation:Implementation:

➢ Google SearchGoogle Search

Image (c) Facebook

Page 2: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

2

Case Study: Google Evolution

Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer Scientist Lecture lecture, November, 2010

Jeff Dean, “Evolution and future directions of large-scale storage and computation systems at Google”, SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing, ACM, New York, NY, USA (2010), pp. 1-1

https://research.google.com/pubs/jeff.html

Page 3: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

3

Page 4: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

4

Page 5: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

5

Page 6: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

6

Page 7: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

7

Page 8: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

8

Page 9: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

9

Page 10: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

10

Page 11: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

11

Leaf servers handle both index & doc requests from in-memory data structures

Page 12: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

12

Leaf servers handle both index & doc requests from in-memory data structures

Coordinates index switching as new shards become available

Page 13: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

13

New Problems

More collections to search besides Web More structured: Maps

Need more real-time results

Page 14: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

14

More Real-Time

Creating Index was batch process via MapReduce Store all documents in GFS (==HDFS) Run several MapReduce jobs to create index Upload index to Leaf servers

New documents would not show up in search results for 2-3 days [Peng and Dadek, 2010]

Needed lower “time from crawl-to-search-hit” Solution:

New data storage system: Colossus / BigTable Event-driven, incremental processing: Caffeine /

Percolator

Page 15: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

15

BigTable:

Page 16: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

16

BigTable:

Page 17: Large Systems€¦ · Case Study: Google Evolution Jeff Dean, “Building Software Systems at Google and Lessons Learned”, Stanford Computer Science Department Distinguished Computer

17

Caffeine / Percolator

Crawler uploads new version of page in BigTable Updates to BigTable can trigger code E.g. code to create index Push index update to Leafs