Shimin Chen (LBA Reading Group Presentation)

Introspective 3D Chips
S. Mysore, B. Agrawal, N. Srivastava, S. Lin, K. Banerjee, T. Sherwood (UCSB), ASPLOS 2006


Motivation

Focus: run-time monitoring for development
Tool overhead limits the amount of analysis possible at test time
Previous research: specialized on-chip h/w modules
  At odds with the economics of consumer microprocessors
  May require a significant amount of area
  Often introduce interconnect congestion
  Replicated on every processor, whether used or not

Challenge: enabling these techniques with a minimum of impact on typical end-user systems

Solution: Add-On using 3D

Optionally adding a layer to a processor specifically for analysis

Developers: processors with this layer

End users: processors without this layer

Outline

Introduction
Benefits of Introspection in 3D
Quantifying the Technology (Methodology)
Architectural Ramifications (Evaluation)
Conclusion

Benefits of Introspection in 3D

Cutting interconnect impact
Reducing cost for commodity parts
Enabling more powerful software analysis

Cutting Interconnect Impact

Previous: gathering data from all over the chip for centralized analysis

Global interconnect:
  Crosses almost every design block
  Consumes a significant share of the top metal layer
  Runs at high speed
  Requires wire buffering and even pipeline latches
  Reserves silicon area for buffers

Cutting Interconnect Impact

Previous: global interconnect
3D: area for inter-layer vias is localized to the positions of the taps
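To make the contrast concrete, here is a toy model that is not from the paper: profile taps at hypothetical (x, y) positions on a 10 mm x 10 mm die are either routed with Manhattan wires to one centralized analyzer (the 2D approach) or each connected straight up through its own post. The 35 µm post height is assumed as a midpoint of the 30 to 40 µm range quoted for the 3D cross-section; all coordinates and the analyzer location are hypothetical.

```python
# Toy comparison of 2D vs. 3D wiring for profile taps.
# Assumptions (hypothetical, not from the paper): tap coordinates in mm,
# a single centralized analyzer, Manhattan routing, and 35 um posts.

POST_HEIGHT_MM = 0.035  # assumed midpoint of the 30-40 um post height, in mm


def manhattan(a, b):
    """Manhattan (rectilinear) distance between two (x, y) points in mm."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def wire_length_2d(taps, analyzer):
    """Total wire length if every tap is routed to one on-die analyzer."""
    return sum(manhattan(t, analyzer) for t in taps)


def wire_length_3d(taps, post_height=POST_HEIGHT_MM):
    """Total vertical length if each tap gets its own post to the upper layer."""
    return len(taps) * post_height


if __name__ == "__main__":
    # Hypothetical taps spread over a 10 mm x 10 mm die, analyzer at the center.
    taps = [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0), (9.0, 9.0), (5.0, 2.0)]
    analyzer = (5.0, 5.0)
    print(f"2D: {wire_length_2d(taps, analyzer):.3f} mm")
    print(f"3D: {wire_length_3d(taps):.3f} mm")
```

The model deliberately ignores wiring on the stacked layer itself, since that wiring does not consume the processor die's metal. The point is the scaling rather than the numbers: 2D length grows with each tap's distance from the analyzer, while 3D length depends only on the number of taps.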

Reducing Cost for Commodity Parts

225 million PCs in use vs. 0.7 million programmers

Need to consider two costs:
  Cost of a consumer system: the circuit that drives each post, plus the vertical column of vias
  Cost of a developer system: adding an extra layer
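The two costs can be written as a simple amortization model. This is a sketch under stated assumptions, not the paper's formulation: N_c consumer systems and N_d developer systems (approximated here by the 225 million PCs and 0.7 million programmers), c_hw the per-chip cost of integrated profiling hardware, c_stub the per-chip cost of the stubs (post drivers and via column), and c_layer the extra per-chip cost of the stacked analysis layer. All cost symbols are hypothetical.

```latex
\begin{align}
C_{\text{integrated}} &= (N_c + N_d)\, c_{\text{hw}} \\
C_{\text{stacked}}    &= N_c\, c_{\text{stub}} + N_d\,(c_{\text{stub}} + c_{\text{layer}}) \\
C_{\text{stacked}} < C_{\text{integrated}}
  &\iff N_d\,(c_{\text{stub}} + c_{\text{layer}} - c_{\text{hw}}) < N_c\,(c_{\text{hw}} - c_{\text{stub}}) \\
\frac{N_c}{N_d} &\approx \frac{225 \times 10^6}{0.7 \times 10^6} \approx 321
\end{align}
```

With roughly 321 consumer systems per developer system, even a small per-chip saving on consumer parts can outweigh a large per-chip premium on developer parts.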

Enabling More Powerful SW Analysis

More h/w resources (area, power) can be allocated to analysis

Outline

Introduction
Introspection in 3D
Quantifying the Technology
Architectural Ramifications
Conclusion

Cross Section of 3D Chip

Posts: 5 µm × 5 µm cross-section, 30-40 µm high
(compare a normal metal wire: 1 µm × 1 µm)
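A quick footprint estimate from these dimensions: each post's cross-section is 5 µm × 5 µm = 25 µm², which is 25 times that of a 1 µm × 1 µm wire. The sketch below multiplies that by a post count (1024, matching the one-post-per-bit count used later in the talk, is assumed here). It ignores pitch, keep-out zones, and driver circuitry, so it gives only the raw cross-section the posts occupy, not the paper's area figure.

```python
# Raw cross-sectional footprint of vertical posts, from the slide's dimensions.
# Assumption: area = count * side^2, ignoring pitch, keep-out, and drivers.

POST_SIDE_UM = 5.0  # post cross-section is 5 um x 5 um
WIRE_SIDE_UM = 1.0  # normal metal wire is about 1 um x 1 um


def post_footprint_mm2(num_posts, side_um=POST_SIDE_UM):
    """Total raw post cross-section in mm^2 (1 mm^2 = 1e6 um^2)."""
    return num_posts * side_um ** 2 / 1e6


if __name__ == "__main__":
    ratio = POST_SIDE_UM ** 2 / WIRE_SIDE_UM ** 2
    print(f"post vs. wire cross-section: {ratio:.0f}x")
    print(f"1024 posts: {post_footprint_mm2(1024):.4f} mm^2")
```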

Estimating Interconnect Overhead

Optimal buffer size and inter-buffer separation
2D interconnect overhead
3D interconnect overhead
Metallization area
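The slide does not give the formulas used for buffer sizing and separation. A common textbook choice is Bakoglu's repeater-insertion model, sketched below as an assumption: for a wire with total resistance R_int and capacitance C_int, driven by repeaters whose minimum-sized version has output resistance R_0 and input capacitance C_0, the delay-optimal repeater count is k = sqrt(0.4 R_int C_int / (0.7 R_0 C_0)) and the optimal repeater size (relative to minimum) is h = sqrt(R_0 C_int / (R_int C_0)). The function name and all parameter values in the example are hypothetical.

```python
import math

# Delay-optimal repeater insertion (Bakoglu's classic RC model).
# Assumption: this textbook model stands in for whatever model the paper used.


def optimal_repeaters(r_per_mm, c_per_mm, length_mm, r0, c0):
    """Return (num_repeaters, repeater_size, spacing_mm) for one wire.

    r_per_mm, c_per_mm: wire resistance (ohm/mm) and capacitance (F/mm)
    r0, c0: output resistance (ohm) and input capacitance (F) of a
            minimum-sized repeater
    """
    r_int = r_per_mm * length_mm
    c_int = c_per_mm * length_mm
    k = math.sqrt(0.4 * r_int * c_int / (0.7 * r0 * c0))  # optimal count
    h = math.sqrt(r0 * c_int / (r_int * c0))  # size relative to minimum
    k = max(1, round(k))  # need a whole, nonzero number of repeaters
    return k, h, length_mm / k  # spacing = inter-buffer separation


if __name__ == "__main__":
    # Hypothetical global-wire and repeater parameters (not from the paper).
    k, h, spacing = optimal_repeaters(
        r_per_mm=100.0, c_per_mm=2e-13, length_mm=10.0, r0=1e4, c0=1e-15
    )
    print(f"{k} repeaters, {h:.0f}x minimum size, every {spacing:.2f} mm")
```

Note that the optimal size h does not depend on wire length, while the count k grows linearly with it; multiplying k by h and a minimum-repeater area would give a silicon-area estimate of the kind the 2D overhead comparison needs.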

Number of Vertical Posts

Estimate that 1024 bits of profile data will be generated per cycle (?)
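If each of the 1024 bits gets its own post (consistent with the later routability figure of one buffer per post, 1024 in total), the raw profile bandwidth across the posts scales with clock frequency. The 3 GHz clock below is a hypothetical value for illustration, and the sketch assumes one transfer per post per cycle.

```python
# Profile-data bandwidth through the vertical posts.
# Assumptions: one post per profile bit (1024 posts), one transfer per post
# per cycle, and a hypothetical clock frequency.

BITS_PER_CYCLE = 1024  # estimated profile data per cycle (from the slide)


def profile_bandwidth_gbytes(clock_hz, bits_per_cycle=BITS_PER_CYCLE):
    """Bandwidth in GB/s (1 GB = 1e9 bytes) across all posts."""
    return bits_per_cycle / 8 * clock_hz / 1e9


if __name__ == "__main__":
    print(f"{BITS_PER_CYCLE} posts, "
          f"{profile_bandwidth_gbytes(3e9):.0f} GB/s at a hypothetical 3 GHz")
```

At 128 bytes per cycle, this is far more data than a conventional off-chip debug port carries, which is why the analysis logic has to sit right above the taps.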

Gathering Profile Data on Pentium 4

Example HW Monitor

Figure: block diagram of the example monitor (an ARM RISC core with blocks labeled 16KB, 16KB, 32KB, 32KB, 32KB)

130 nm technology, area: 16 mm², power: 2.7 W
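From these numbers, the example monitor's average power density is 2.7 W / 16 mm² ≈ 0.17 W/mm², or about 16.9 W/cm². The helper below just performs that division; it assumes power is spread uniformly over the area, which a real thermal analysis would not.

```python
# Average power density of the example monitor (130 nm, 16 mm^2, 2.7 W).
# Assumption: power is spread uniformly over the monitor's area.


def power_density_w_per_cm2(power_w, area_mm2):
    """Average power density in W/cm^2 (1 cm^2 = 100 mm^2)."""
    return power_w / (area_mm2 / 100.0)


if __name__ == "__main__":
    print(f"{power_density_w_per_cm2(2.7, 16.0):.1f} W/cm^2")
```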

Outline

Introduction
Introspection in 3D
Quantifying the Technology
Architectural Ramifications
Conclusion

Four Types of Systems to Compare

Basic system (Sbase)
System with integrated profiling HW (Sintegrated)
System with profiling HW stacked (Sstacked)
System with profiling stubs (Sstubs)
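One way to keep the four configurations straight is to model them as data. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, Sstubs is read as the consumer part (stubs but no analysis layer), and Sstacked as the developer part (stubs plus the stacked layer).

```python
from dataclasses import dataclass

# The four configurations compared in the evaluation, modeled as data.
# Assumption: field names are illustrative; Sstubs is read as the consumer
# part (stubs, no analysis layer) and Sstacked as the developer part.


@dataclass(frozen=True)
class SystemConfig:
    name: str
    profiling_hw_on_die: bool  # analysis logic on the processor die itself
    stubs: bool                # post drivers and via columns on the die
    stacked_layer: bool        # separate analysis layer bonded on top


CONFIGS = [
    SystemConfig("Sbase", False, False, False),
    SystemConfig("Sintegrated", True, False, False),
    SystemConfig("Sstacked", False, True, True),
    SystemConfig("Sstubs", False, True, False),
]


def can_profile(cfg):
    """Profiling is available only if analysis hardware exists somewhere."""
    return cfg.profiling_hw_on_die or cfg.stacked_layer


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name:12} profiling={can_profile(cfg)}")
```

Laid out this way, Sstacked and Sstubs share the same processor die and differ only in whether the analysis layer is attached, which is the decoupling the economic argument relies on.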

Routability

Based on Pentium 4 analysis:
  Sintegrated: total wire length = 5682.3 mm; total buffers ≈ 20,000
  Sstacked: total buffers = 1024 (one per post)
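The routability numbers imply roughly a 19.5x reduction in buffer count (about 20,000 vs. 1024) and, for Sintegrated, an average of one buffer every ~0.28 mm of wire. The sketch below reproduces that arithmetic; it assumes the buffers are spread evenly over the total wire length, which is only an average, and the constant names are hypothetical.

```python
# Buffer counts and average spacing from the routability slide.
# Assumption: average spacing = total wire length / buffer count, treating
# all Sintegrated buffers as evenly spread over the total wire length.

SINTEGRATED_WIRE_MM = 5682.3
SINTEGRATED_BUFFERS = 20_000  # "~20,000" on the slide
SSTACKED_BUFFERS = 1024       # one per post


def avg_spacing_mm(wire_mm, buffers):
    """Average wire length between consecutive buffers, in mm."""
    return wire_mm / buffers


if __name__ == "__main__":
    spacing = avg_spacing_mm(SINTEGRATED_WIRE_MM, SINTEGRATED_BUFFERS)
    print(f"Sintegrated: one buffer every {spacing:.2f} mm")
    print(f"buffer reduction: {SINTEGRATED_BUFFERS / SSTACKED_BUFFERS:.1f}x")
```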

Area for Wires and Buffers

Power

Thermal

Thermal

Conclusion

Economic argument: the cost of specialized H/W is decoupled from the consumer market

H/W stubs add only 0.021 mm² of area and 0.9% power

Thank you!