The search engine you can see
Connects people to information and services
The search engine you cannot see
Total data: ~1 EB
Data processed: ~100 PB/day
Total web pages: ~1000 billion
Web pages updated: ~10 billion/day
Requests: ~10 billion/day
Total logs: ~100 PB
Logs updated: ~1 PB/day
The search engine you don’t see
• Large-scale distributed computing
• Large-scale distributed storage
• Speech, image, recommendation systems, intelligent HCI
• Other cutting-edge technologies
The history of Moore’s law
• Moore’s law is coming to an end
[Figure: performance of processor, log scale from 1 to 100,000]
The history of data center
[Figure: performance of processor (log scale, 1 to 100,000) across data center generations: mainframe, PC cluster, SDDC]
History of data center
• PC cluster: 2000~now, scalability
• SDDC: 2013~, efficiency
Outline
• What is the PC cluster
• What is the SDDC
• Baidu’s practice
• Conclusion
PC cluster
• Background
  – Web-scale applications
  – The performance and cost limitations of mainframes
• Scale out PC servers over Ethernet
  – Up to 10K servers per cluster
• Typical configurations
  – Commodity hardware: x86 CPU, INSPUR, HUAWEI…
  – Software: MR/HDFS/Spark…
PC cluster
• Typical stack
  – Each layer is independent
  – The interfaces are highly abstract
  – Follows the technology paradigms of the PC
• Limitations
  – Multiple highly abstract layers make it hard to exploit the hardware’s performance potential
  – Commodity hardware cannot support emerging applications, such as AI and big data
    • The end of Moore’s law
[Figure: PC cluster stack with three layers: commodity hardware, distributed software, applications]
Software-Defined Data Center - SDDC
• What is SDDC
  – Application-driven hardware and software
  – Whole-stack co-design
• How
  – Algorithm
    • Customized for new hardware and architecture
  – System and software
    • Separate data path and control path
  – Hardware
    • Expose low-level API, fully controlled by software
    • Customized for applications
[Figure: the SDDC stack (hardware, software, applications) shown next to the PC cluster stack (commodity hardware, distributed software, applications)]
Software-Defined Data Center - SDDC
• Why SDDC
  – Exploit performance potential across multiple layers
  – Customize hardware to extend Moore’s law for emerging applications
    • AI and big data
  – Achieve extreme efficiency
• The FPGA in SDDC
  – Enables whole-stack co-design
SDDC – Baidu’s visions and practice
• Vision
  – Shift from PC cluster to SDDC in the next 3 years
  – Define and design the SDDC, collaborating with partners
• Practice
  – SDF: software-defined flash
  – SDA: software-defined accelerator
• 2011: SDF; 2013: SDA; 2015: design the distributed SD system
Software-defined flash – background
• Traditional SSD limitations
  – Low bandwidth utilization
    • 40% or less under real workloads
  – Limited capacity utilization
    • Only 50%~70% available to applications
  – Less predictable performance
• Large scale
  – 10,000+ SSDs deployed per year (10PB+ capacity)
• Challenges
  – Acquisition of extra devices
  – Higher cost
Software-defined flash – designs
• Software defined
  – Expose low-level hardware interface to software
  – Software can control hardware completely
• New hardware architecture
  – Expose hardware channels to software
  – Individual FTL controller for each channel
• New HW/SW interface
  – Write in units of the erase block size
  – Leverage global resources for data persistency
    • Removes across-channel parity coding
[Figure: conventional SSD vs. SDF. Conventional SSD: flash channels CH_0 to CH_N behind one SSD controller, exposed as /dev/sda. SDF: flash channels CH_0 to CH_N with per-channel SSD Ctrl, exposed as /dev/sda0 ~ /dev/sdaN.]
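As a rough illustration of what exposing channels to software makes possible, the sketch below models each channel as an independent device that accepts only whole erase-block writes, with placement and erasing decided in user space. The names `Channel` and `SdfLayout`, the round-robin placement policy, and the tiny block size used in testing are assumptions made for illustration; they are not taken from the slides and are not Baidu's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One flash channel exposed as its own device (hypothetical model)."""
    index: int
    num_blocks: int
    blocks: dict = field(default_factory=dict)  # block id -> stored bytes

    def free_block(self):
        """Return the lowest-numbered free erase block, or None if full."""
        for block_id in range(self.num_blocks):
            if block_id not in self.blocks:
                return block_id
        return None

    def write_block(self, block_id, data):
        # Flash cannot be overwritten in place: erase first.
        if block_id in self.blocks:
            raise ValueError("block must be erased before rewrite")
        self.blocks[block_id] = data

    def erase_block(self, block_id):
        # Erase is an explicit call, so software decides when it happens.
        self.blocks.pop(block_id, None)


class SdfLayout:
    """User-space placement: whole erase blocks, round-robin over channels."""

    def __init__(self, num_channels, blocks_per_channel, erase_block_size):
        self.erase_block_size = erase_block_size
        self.channels = [Channel(i, blocks_per_channel)
                         for i in range(num_channels)]
        self.next_channel = 0

    def write(self, data):
        """Write exactly one erase block; return its (channel, block) location."""
        if len(data) != self.erase_block_size:
            raise ValueError("writes must be exactly one erase block")
        # Try each channel at most once, starting from the round-robin cursor.
        for _ in range(len(self.channels)):
            channel = self.channels[self.next_channel]
            self.next_channel = (self.next_channel + 1) % len(self.channels)
            block_id = channel.free_block()
            if block_id is not None:
                channel.write_block(block_id, data)
                return channel.index, block_id
        raise RuntimeError("no free erase block on any channel")

    def read(self, location):
        channel_index, block_id = location
        return self.channels[channel_index].blocks[block_id]

    def erase(self, location):
        channel_index, block_id = location
        self.channels[channel_index].erase_block(block_id)
```

Because the layout and the erase calls live in the application, a different policy (for example, pinning one data set to one channel) would only require changing `write`, not the device.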
Software-defined flash – designs
• Removing unnecessary software layers
  – To reduce latency and CPU cycles
  – To remove complexity of kernel configurations
• User-defined scheduler
  – Data layout
  – Erase scheduling
[Figure: I/O stacks. (a) Conventional SSD: user space (buffered IO, direct IO) over kernel space components including VFS, file system, block device, page cache, generic block layer, IO scheduler, SCSI mid-layer, SATA and SAS translation, low level device driver. (b) SDF: user space issues IOCTRL to the PCIE driver in kernel space.]
Software-defined flash – designs
• Hardware
  – 25nm MLC NAND, 44 channels, ONFI 1.x asynchronous 40 MHz
  – 5 FPGAs: 4 Spartan-6 for FTL, 1 Virtex-5 for PCIE
[Figure: board layout. PCIE x8 connects to the Virtex-5, which fronts four Spartan-6 FPGAs with 11 channels each.]
Software-defined flash – conclusions
• Key ideas
  – Exposes flash channels to software
  – SW/HW co-design
• Results
  – 95% write and 99% read bandwidth utilization
  – 99% capacity utilization
  – 50% cost reduction per GB compared with SSD for workloads on the production systems
• 3000+ deployed in Baidu’s web page storage system
  – 3x the performance of commodity SSD
  – 50% cost reduction
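To make the link between capacity utilization and device count concrete, here is a back-of-the-envelope calculation. The 10 PB usable-capacity target and the 1 TB device size are assumptions chosen for illustration (the slides only give "10,000+ SSD, 10PB+" per year); the utilization figures are the ones quoted in the slides.

```python
import math


def devices_needed(usable_tb, device_tb, utilization):
    """Devices required so that usable_tb is available to applications.

    utilization is the fraction of raw capacity that applications can use.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return math.ceil(usable_tb / (device_tb * utilization))


# Hypothetical inputs: 10 PB usable target, 1 TB of raw capacity per device.
USABLE_TB = 10_000
DEVICE_TB = 1

for label, utilization in [("conventional SSD at 50%", 0.50),
                           ("conventional SSD at 70%", 0.70),
                           ("SDF at 99%", 0.99)]:
    print(label, devices_needed(USABLE_TB, DEVICE_TB, utilization))
```

Under these assumptions the counts come out to 20,000, 14,286, and 10,102 devices, which is one way to read the "acquisition of extra devices" challenge listed for conventional SSDs.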
Software-defined accelerator – background
• AI is the core technology
  – Speech, image, page ranking, and ads
• Extremely high computing density
  – GPU
    • High cost
    • High power and space consumption
    • Higher demand on data center cooling, power supply, and space utilization
  – CPU
    • Medium cost and power consumption
    • Low speed
  – FPGA
    • Most potential
    • Needs faster development iteration
Software-defined accelerator – design
• Xilinx K7 FPGA
  – Best performance/cost/power consumption
• Evaluations
  – Batch size=8, layer=8
• Workload1
  – Weight matrix size=512
  – FPGA throughput is 4.1x that of GPU
  – FPGA throughput is 3x that of CPU
• Workload2
  – Weight matrix size=2048
  – FPGA throughput is 2.5x that of GPU
  – FPGA throughput is 3.5x that of CPU
• Conclusions
  – FPGA can merge small requests to improve performance
  – FPGA throughput in Req/s scales better
[Figure: throughput (Req/s) vs. thread # for CPU, GPU, and FPGA. Fig a: workload1 (annotated 4.1x and 3x). Fig b: workload2 (annotated 2.5x and 3.5x).]
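The "merge small requests" conclusion can be sketched in software. Instead of one matrix-vector product per request, pending requests are grouped into a batch so that each row of the weight matrix is fetched once and reused for every request in the batch. The function names below are hypothetical, the default batch size of 8 simply mirrors the evaluation setting, and plain Python lists stand in for accelerator memory; this is not the SDA design itself.

```python
def merge_requests(pending, batch_size=8):
    """Group pending input vectors into batches of at most batch_size."""
    return [pending[i:i + batch_size]
            for i in range(0, len(pending), batch_size)]


def forward_batch(weights, batch):
    """Apply one weight matrix to every vector in the batch.

    The outer loop walks the weight rows, so each row is read once and
    reused for all requests in the batch; that reuse is what merging buys.
    """
    outputs = [[0.0] * len(weights) for _ in batch]
    for row_index, row in enumerate(weights):
        for req_index, vector in enumerate(batch):
            outputs[req_index][row_index] = sum(
                w * x for w, x in zip(row, vector))
    return outputs


def serve(weights, pending, batch_size=8):
    """Process all pending requests batch by batch, preserving their order."""
    results = []
    for batch in merge_requests(pending, batch_size):
        results.extend(forward_batch(weights, batch))
    return results
```

Calling `serve(weights, pending)` returns one output vector per request in arrival order, so callers do not need to know that their requests were merged.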
Conclusion
• Paradigm shift
  – From PC cluster to SDDC
• What is SDDC
  – Application driven
  – Whole-stack co-design and tuning
• The FPGA in SDDC
  – Enables SDDC
• Baidu’s vision and practice
  – Shift from PC cluster to SDDC
  – SDF, SDA and more