Architectural Musings - IBM
1
Architectural Musings: Rethinking Computer Systems Architecture & Evaluation
Christopher Vick
March 23, 2014
2
Introduction
§ Vision Talk
  § How should we analyze, reason about and evaluate Computer System Architecture in the 21st century?
  § What can history tell us about these questions?
  § What does this mean for the research community?
§ Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture
§ Vast new opportunities for research of great interest to and great relevance for industry
3
Outline
§ Computer System Architecture
§ Then (Circa 1970)
  § Scarce Resources & Bottlenecks
  § Optimizations
  § Evaluation
§ Now (Mobile Computing Platforms)
  § Scarce Resources & Bottlenecks
  § Optimizations?
  § Evaluation?
§ Questions?
4
COMPUTER SYSTEM ARCHITECTURE
5
Computer System Architecture
§ Hardware
  § The 5 classic components (Patterson & Hennessy)
  § Input, Output, Memory, Datapath, Control
§ Software
  § System Virtual Machine (Hypervisor, VM, or VMM)
  § Operating System
  § Compilers & Tools
§ Definitions
  § The way components fit together
  § The arrangement of the various devices in a complete computer system or network
  § The instruction set plus a model of the execution of the instruction set (Amdahl et al.)
§ Computer System Architecture
  § The selection and combination of hardware and software components to assemble an effective computer system
6
[Diagram: computer system stack. Labeled components: Application Programs, Virtual Machine, Libraries, Multicore Execution Unit, Operating System, Interconnect, Drivers, Memory Manager, Scheduler, IO Devices, Memory, Hypercall Interface. Legend: Software, Hardware, Combination]
7
Effective
§ An optimization problem
  § Many variables
    § Selection of hardware/software components
    § Selection of interfaces/interconnects
  § Many constraints
    § Physical, sociological, technical & cost constraints
§ Scarce Resources and Bottlenecks
  § Maximize utilization of scarce resources
  § Minimize impact of bottlenecks
§ Evaluation
  § How do you measure effectiveness?
  § What effect does the evaluation have on the optimization?
8
THEN (CIRCA 1970)
Photo 1
Photo 2
9
Scarce Resources
§ CPU Cycles
  § CPUs expensive
  § Slow clock rates
§ Memory Locations
  § Random Access Memory expensive
  § Address/Data paths into CPU expensive
§ Skilled Programmers
  § Relatively new discipline
  § Poor language and tools support
Photo 3
10
Bottlenecks
§ Programmer Productivity
  § Software development slow and expensive
  § Low level programming paradigms
§ Memory Latency
  § RAM latency gated overall speed (~2-3 MHz)
  § Small RAM backed by vastly slower storage
§ I/O Bandwidth
  § Limited CPU connectivity
  § Crude communication mechanisms
Photo 4
11
Optimizations
§ Time Sharing
  § Effective sharing of limited resource
§ Virtual Memory
  § Effective sharing, and backing with cheaper alternative
§ Hardware Improvements
  § Smaller features provide more resource and faster clock
  § Large Scale Integration
  § Better signaling to improve bandwidth
§ High Level Programming Languages
  § Broadens productive programmer community
  § Abstracts away some hardware complexity
12
Evaluation
§ Started with primitive measures
  § MIPS
  § SLOC
§ Worked towards more sophisticated evaluation tools
  § Hennessy & Patterson very influential
  § SPEC CPU
  § TPM
  § Defect rate
§ Cost is always a factor
13
Examples
§ Digital PDP-11
  § 16-bit address space
  § Orthogonal instruction set
  § Memory mapped I/O
  § Unix, DOS, many others
§ IBM System/370
  § 24-bit address space
  § Virtual Memory
  § MVS, VM/370, DOS/VS
  § Backward compatibility with System/360
Photo 5
Photo 6
14
NOW (MOBILE COMPUTING)
15
Scarce Resources
§ Energy
  § Fixed Energy Budget for mobile devices
  § Thermal issues at all scales
  § Tradeoff between performance and energy
  § Shrinks no longer significantly improving consumption
§ Memory Bandwidth
  § Providing bandwidth is expensive
  § Memory interconnect consumes significant energy
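The tradeoff between performance and energy noted above is often reasoned about with the standard first-order CMOS dynamic power model, P = a * C * V^2 * f. The sketch below applies that model to two operating points of the same core. The activity factor, capacitance, voltages and frequencies are illustrative assumptions, not figures from the talk, and leakage (static) power is ignored.

```python
def dynamic_power_watts(activity, capacitance_farads, voltage, frequency_hz):
    """First-order CMOS dynamic power: P = a * C * V^2 * f."""
    return activity * capacitance_farads * voltage ** 2 * frequency_hz


def energy_for_work_joules(cycles, activity, capacitance_farads, voltage, frequency_hz):
    """Dynamic energy needed to execute a fixed number of cycles."""
    seconds = cycles / frequency_hz
    power = dynamic_power_watts(activity, capacitance_farads, voltage, frequency_hz)
    return power * seconds


# Hypothetical operating points (assumed values, for illustration only).
FAST = {"voltage": 1.0, "frequency_hz": 2.0e9}
SLOW = {"voltage": 0.8, "frequency_hz": 1.0e9}

if __name__ == "__main__":
    work_cycles = 1_000_000_000
    for name, point in (("fast", FAST), ("slow", SLOW)):
        joules = energy_for_work_joules(work_cycles, 0.2, 1.0e-9, **point)
        seconds = work_cycles / point["frequency_hz"]
        print(f"{name}: {seconds:.2f} s, {joules:.3f} J")
```

Under this model the dynamic energy for a fixed amount of work depends on voltage but not on frequency, so the slower point only saves energy because it also runs at a lower voltage. Once leakage is included, finishing sooner and powering down can win instead.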
16
Bottlenecks
§ Memory Latency
  § Increasing gap between CPU speed and DRAM latency
  § Physical distance to DRAM devices a factor
§ Concurrency
  § Shortage of programmers who can handle this
  § Inadequate language/tools support
§ I/O Bandwidth/Latency
  § Wireless bandwidth lower than wired
  § Consumes large amounts of energy
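To make the CPU/DRAM latency gap concrete, the sketch below converts a DRAM latency into core clock cycles and applies the textbook average memory access time formula (hit time + miss rate x miss penalty). The 100 ns DRAM latency, 4-cycle hit time and 2% miss rate are assumed values chosen for illustration. The 2.5 GHz clock matches the example device in this talk.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Number of core cycles that elapse during a latency given in nanoseconds."""
    return latency_ns * clock_ghz


def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Textbook AMAT: every access pays hit_time, misses also pay miss_penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    penalty = latency_in_cycles(100.0, 2.5)  # assumed 100 ns DRAM latency
    amat = average_memory_access_time(4.0, 0.02, penalty)  # assumed cache behaviour
    print(f"miss penalty: {penalty:.0f} cycles, AMAT: {amat:.1f} cycles")
```

Even a small miss rate dominates the average once a miss costs hundreds of cycles, which is why the gap is listed as a bottleneck.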
Photo 7
17
Example
§ Samsung Galaxy S5
  § Processor: 2.5 GHz Qualcomm® Snapdragon™ 801 (Quad Core)
  § GPU: Qualcomm® Adreno 330
  § OS: Android™ 4.4.2
  § Memory RAM: 2 GB DDR2
  § Memory Storage: 16/32/64 GB onboard storage
  § Display: 5” AMOLED 1920 x 1080 HD
  § Network: LTE Cat 4, CDMA, UMTS/HSPA, GSM/GPRS/EDGE
  § Battery: 2600 mAh
  § Camera (Main): 16.5 megapixel, Ultra HD
  § Dimensions: 142 x 73 x 8.1 mm
§ This is a General Purpose Computer!
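The battery figure above fixes the device's total energy budget. The sketch below converts a capacity in mAh to joules and derives the average power the whole device may draw to last a target number of hours. The 3.8 V nominal cell voltage and the 10-hour target are assumptions, not values from the slide.

```python
def battery_energy_joules(capacity_mah, nominal_voltage):
    """Convert battery capacity to energy: mAh -> Ah -> Wh -> J."""
    watt_hours = capacity_mah / 1000.0 * nominal_voltage
    return watt_hours * 3600.0


def average_power_budget_watts(capacity_mah, nominal_voltage, hours):
    """Average power draw that exactly drains the battery in `hours`."""
    return battery_energy_joules(capacity_mah, nominal_voltage) / (hours * 3600.0)


if __name__ == "__main__":
    joules = battery_energy_joules(2600, 3.8)          # 3.8 V is an assumed nominal voltage
    budget = average_power_budget_watts(2600, 3.8, 10)  # 10 h is an assumed target
    print(f"energy: {joules:.0f} J, budget for 10 h: {budget:.3f} W")
```

With these assumptions the display, radios, memory and all four cores together must average roughly one watt.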
18
Optimizations?
§ Multi-core
  § Aggressive addition of cores and threads
  § Hardware concurrency outstripping software
  § New Concurrent Programming Models/Tools?
§ Memory Subsystem
  § Significant contributor to total energy consumption
  § Adding bandwidth is expensive
  § New technologies addressing some energy issues
§ Wireless bandwidth enhancements (LTE Advanced, etc.)
§ Solutions from desktop/server or embedded worlds may not directly apply in mobile space!
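One way to see why hardware concurrency can outstrip software is Amdahl's law, which bounds the speedup from adding cores by the fraction of the program that can run in parallel. The law is not named on the slide. It is used here as a standard illustration, and the parallel fractions below are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99):  # hypothetical workloads
        print(f"parallel fraction {fraction}: {amdahl_speedup(fraction, 4):.2f}x on 4 cores")
```

A quad-core part only approaches a 4x speedup when nearly all of the software is parallel, which is the gap the "New Concurrent Programming Models/Tools?" bullet points at.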
19
Memory System Energy
§ Retaining data (one second)
  § DRAM: ~1-10 pJ/bit self-refresh
  § SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]
    § 4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC ’08]
  § Flash, PCM, STT RAM…: Zero!
§ Moving Data
  § 32-bit value:
    § Recompute: 60 pJ (Razor)
    § Send 1 mm: 10 pJ
    § Retain in cache for 1 ms: 38 pJ
    § Retain in DRAM for 1 second: 32+ pJ
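The retention figures for a 32-bit value follow from the per-bit numbers on the same slide: 32 bits x 1200 pJ/bit/s x 1 ms is about 38 pJ in SRAM, and 32 bits x 1 pJ/bit/s x 1 s is 32 pJ in DRAM. The sketch below reproduces that arithmetic and computes how long a value can sit in cache before recomputing it would have been cheaper. It assumes retention energy scales linearly with time and bit count, and uses the low end of the slide's ranges.

```python
SRAM_RETAIN_PJ_PER_BIT_S = 1200.0  # slide: "1200+ pJ/bit" for one second
DRAM_RETAIN_PJ_PER_BIT_S = 1.0     # slide: low end of "~1-10 pJ/bit"
RECOMPUTE_PJ = 60.0                # slide: recompute a 32-bit value
SEND_1MM_PJ = 10.0                 # slide: send a 32-bit value 1 mm


def retain_energy_pj(bits, pj_per_bit_second, seconds):
    """Energy to hold `bits` for `seconds`, assuming linear scaling."""
    return bits * pj_per_bit_second * seconds


def cache_break_even_seconds(bits, alternative_pj):
    """Holding time after which SRAM retention costs more than `alternative_pj`."""
    return alternative_pj / (bits * SRAM_RETAIN_PJ_PER_BIT_S)


if __name__ == "__main__":
    print(retain_energy_pj(32, SRAM_RETAIN_PJ_PER_BIT_S, 1e-3), "pJ in cache for 1 ms")
    print(retain_energy_pj(32, DRAM_RETAIN_PJ_PER_BIT_S, 1.0), "pJ in DRAM for 1 s")
    print(cache_break_even_seconds(32, RECOMPUTE_PJ) * 1e3, "ms until recompute is cheaper")
```

Under these assumptions, a value left untouched in cache for more than about 1.5 ms has cost more energy than recomputing it.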
Photo 8
20
Reducing Memory System Energy
§ Move less!
  § Caches physically close to CPU
  § Locality, locality, locality (the first rule of chip real estate)
§ Retain less!
  § Power off unused cache lines [Kaxiras et al., ISCA ’01]
  § “Drowsy” caches [Flautner et al., ISCA ’02]
    § … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
  § Don’t refresh unused DRAM
    § … e.g. with garbage collection [Chen et al., CODES+ISSS ’03]
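The "retain less" idea of powering off unused cache lines can be sketched with a toy model: each line counts ticks since its last access and is switched off once it has been idle for a fixed interval. This is a simplified illustration, not the mechanism from the cited papers. The class name, the direct-mapped organization, the decay interval and the use of powered line-ticks as a leakage proxy are all assumptions.

```python
class DecayingCache:
    """Toy direct-mapped cache whose lines power off after a fixed idle interval."""

    def __init__(self, num_lines, decay_interval):
        self.num_lines = num_lines
        self.decay_interval = decay_interval
        self.tags = [None] * num_lines   # None means the line is powered off
        self.idle = [0] * num_lines      # ticks since each line was last accessed
        self.powered_line_ticks = 0      # rough proxy for leakage energy spent

    def access(self, block_address):
        """Return True on a hit. A miss fills the line and powers it on."""
        index = block_address % self.num_lines
        tag = block_address // self.num_lines
        hit = self.tags[index] == tag
        self.tags[index] = tag
        self.idle[index] = 0
        return hit

    def tick(self):
        """Advance time by one tick and power off lines that have been idle too long."""
        for i in range(self.num_lines):
            if self.tags[i] is None:
                continue
            self.powered_line_ticks += 1
            self.idle[i] += 1
            if self.idle[i] >= self.decay_interval:
                self.tags[i] = None  # contents are lost, a later access will miss


if __name__ == "__main__":
    cache = DecayingCache(num_lines=4, decay_interval=3)
    cache.access(0)
    for _ in range(5):
        cache.tick()
    print("hit after decay:", cache.access(0))
    print("powered line-ticks:", cache.powered_line_ticks)
```

The model shows the tradeoff: a shorter decay interval saves more leakage but turns more later accesses into misses, each of which costs energy to move data again.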
21
Extending the Memory Model
§ Maintaining the illusion of a single flat memory address space is too expensive
  § On-chip caches can be major consumers of area and energy
  § Coherence protocols are expensive and difficult to scale
§ Alternative: software-managed memory hierarchies
  § Tightly-coupled memory (TCM), scratchpads
  § Do not require tag memory, address comparison logic
  § More area- and energy-efficient
  § Help bridge gap between bandwidth and throughput
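To put a number on the tag memory that a scratchpad avoids, the sketch below counts the tag and valid bits a conventional cache must store alongside its data. The cache geometry (32 KiB, 64-byte lines, 32-bit addresses) is an assumed example, the model counts one valid bit per line and ignores dirty, coherence and replacement state, and sizes are assumed to be powers of two.

```python
def tag_overhead_bits(cache_bytes, line_bytes, address_bits, ways=1):
    """Tag plus valid bits stored by a set-associative cache (power-of-two sizes assumed)."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = (line_bytes - 1).bit_length()  # log2(line_bytes)
    index_bits = (sets - 1).bit_length()         # log2(sets)
    tag_bits = address_bits - index_bits - offset_bits
    return lines * (tag_bits + 1)                # +1 valid bit per line


if __name__ == "__main__":
    data_bits = 32 * 1024 * 8
    for ways in (1, 4):
        overhead = tag_overhead_bits(32 * 1024, 64, 32, ways)
        print(f"{ways}-way: {overhead} bits, {100 * overhead / data_bits:.2f}% of data array")
```

A scratchpad of the same capacity stores none of these bits and performs no tag comparison on each access. The storage overhead shown here is only part of the area and energy cost the slide refers to.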
22
New Challenges and Opportunities
§ Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas
§ Major implications on memory management
  § Scratchpad allocation strategies
  § Data partitioning strategies
  § Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics
§ Opportunities for compile-time and runtime optimization
§ Challenges in both Hardware and Software!
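One common way to frame a static scratchpad allocation strategy is as a 0/1 knapsack problem: choose the set of data objects that fits in the scratchpad and serves the most accesses. The sketch below is one such formulation, assumed here for illustration and not taken from the talk. The object names, sizes and access counts are hypothetical profile data.

```python
def allocate_scratchpad(objects, capacity_bytes):
    """Pick objects to place in the scratchpad, maximizing accesses served.

    objects: iterable of (name, size_bytes, access_count).
    Returns (total_accesses_served, tuple_of_chosen_names).
    """
    # best[c] holds the best (accesses, names) achievable with at most c bytes.
    best = [(0, ())] * (capacity_bytes + 1)
    for name, size, accesses in objects:
        # Walk capacities downward so each object is placed at most once.
        for c in range(capacity_bytes, size - 1, -1):
            candidate = best[c - size][0] + accesses
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[capacity_bytes]


if __name__ == "__main__":
    profile = [  # hypothetical (name, size in bytes, profiled access count)
        ("coeffs", 256, 9000),
        ("frame", 4096, 12000),
        ("lut", 512, 7000),
        ("stack", 1024, 5000),
    ]
    served, chosen = allocate_scratchpad(profile, 2048)
    print(served, chosen)
```

A static choice like this cannot follow changes in locality over time, which is where the dynamic relocation bullet and the runtime optimization opportunities come in.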
23
Evaluation
§ Energy/Power
  § Both matter
  § MIPS/Watt
  § Battery life
  § Hard to measure and lacking in precision
§ Performance
  § Currently rather primitive
    § Linpack, CaffeineMark, CoreMark, Quadrant
    § SPEC CPU
  § Following similar track to early PC evaluation, so should get more sophisticated
§ Need to more accurately measure/reflect the utility of the device
  § Balancing peak performance, throughput, battery life, etc.
§ Cost
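As a small illustration of the MIPS/Watt metric listed above, the sketch below derives it from an instruction count, a run time and a measured energy, and also reports energy per instruction. The measurement values are hypothetical. In practice, as the slide notes, the energy figure is the hard part to obtain precisely.

```python
def mips(instructions, seconds):
    """Millions of instructions executed per second."""
    return instructions / seconds / 1e6


def mips_per_watt(instructions, seconds, joules):
    """MIPS divided by average power. Equivalent to millions of instructions per joule."""
    watts = joules / seconds
    return mips(instructions, seconds) / watts


def nanojoules_per_instruction(instructions, joules):
    """Average energy cost of one instruction, in nanojoules."""
    return joules / instructions * 1e9


if __name__ == "__main__":
    # Hypothetical run: 5 billion instructions in 2 s using 4 J.
    print(mips(5e9, 2.0), "MIPS")
    print(mips_per_watt(5e9, 2.0, 4.0), "MIPS/W")
    print(nanojoules_per_instruction(5e9, 4.0), "nJ/instruction")
```

Because run time cancels out of MIPS/Watt, the metric captures energy efficiency but says nothing about speed or battery life on its own, which is why the slide lists them separately.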
Thank You
25
Photo Copyright Notices