computer architecture slides 1


  • 8/13/2019 computer architecture slides 1

    1/27

    LECTURE 1: INTRODUCTION TO COMPUTER ORGANISATION

    Turbo Majumder

    [email protected]

  • 8/13/2019 computer architecture slides 1

    2/27

    About Instructor and Course

    Instructor: Dr. Turbo Majumder

    Department of Electrical Engineering

    Office: III-335

    Email: [email protected]

    Phone: 1073

    Course webpage: http://web.iitd.ac.in/~turbo/EEL308_1302.htm

    TAs: See course page for full list.

    Textbook:

    Computer Organization and Design: The Hardware/Software Interface, ARM Edition, David A. Patterson and John L. Hennessy, Morgan Kaufmann (source of most material and figures in the lecture slides)

    Class hours: Slot F: Tue, Thu, Fri, 11:00-11:50 am

    Tutorial hours: 1:00-1:50 pm

  • 8/13/2019 computer architecture slides 1

    3/27

    Grading policy

    Minor1: 20

    Minor2: 20

    Major: 30

    Class participation and term paper: 5

    Quizzes: 15

    Tutorial: 10

    Attendance policy: As per Institute rules

    Collaboration is good when it is open, honest and given due credit. Clandestine collaboration invites an F.

  • 8/13/2019 computer architecture slides 1

    4/27

    Why learn computer architecture?

    It is a core course, duh.

    Computers (or, if you will, microprocessors) are everywhere.

    You will probably be designing or using one in whatever job you do.

    To design well, of course.

    To use it well (e.g. programming), you need to know what is inside.

    Plus, knowing this stuff gives you a geeky edge!

  • 8/13/2019 computer architecture slides 1

    5/27

    Where are computers used?

  • 8/13/2019 computer architecture slides 1

    6/27

    What can you hope to learn?

    A. How does my computer understand the C program I have written?

    B. Where do software and hardware interface in a microprocessor? What does the interface look like?

    C. What is performance? How can I characterise it? How can I improve upon it?

    D. Briefly, why do we need multicore processors and

    parallel processing?

  • 8/13/2019 computer architecture slides 1

    7/27

    What impacts program performance?

    Algorithm

    Programming language, compiler and architecture

    Processor and memory design

    I/O interface design

    We will look at all of these in terms of A, B, C and D (previous slide).

  • 8/13/2019 computer architecture slides 1

    8/27

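    How does my computer understand my program?

    [Figure: layers of abstraction. Applications software (browser, word processor, media player) sits on systems software (OS), which sits on the computer hardware (microprocessor). The hardware says: "Only binary language, please!" The instruction set architecture is the interface between software and hardware.]

    Translation steps shown in the figure:

    High-level language: c = a + b;

    Compiler

    Assembly language: ADD RC, RA, RB

    Assembler

    Machine language: 0x40af8020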

  • 8/13/2019 computer architecture slides 1

    9/27

    Basic components of a computer

    Processor
      Datapath
      Control

    Memory
      Volatile
      Non-volatile

    I/O
      Input
      Output

    Networking
      LAN, WAN, WLAN

  • 8/13/2019 computer architecture slides 1

    10/27

    Moore's Law

    Source: Wikimedia Commons

  • 8/13/2019 computer architecture slides 1

    11/27

    Moore's Law again

    Rachel Courtland, "The Status of Moore's Law: It's Complicated," IEEE Spectrum, 28 Oct 2013 (based on data from GlobalFoundries)

  • 8/13/2019 computer architecture slides 1

    12/27

    Moore's Law: New dimensions?

    AMD's Barcelona Architecture: quad-core, 65 nm process, 2007 (Courtesy: AnandTech)

    Effect on number of cores in a microprocessor

    Multicores

    More on this later

  • 8/13/2019 computer architecture slides 1

    13/27

    Performance

    Mostly concerned with time performance

    Execution time
      Performance = 1/(Execution time)
      Important for individual applications/tasks
      Improves (decreases) with faster processors
      What is faster?
        Higher clock speed?
        Greater parallelism?

    Computation throughput
      Performance = No. of tasks/operations performed per second
      Usually from different applications
      Measured typically in GFLOPS, TFLOPS, ExaFLOPS
      Important for server/cloud applications

    Parallelism is key to getting these benefits.

  • 8/13/2019 computer architecture slides 1

    14/27

    Performance: Deep Dive

    Relative performance: Perf(X)/Perf(Y) = ExTime(Y)/ExTime(X)

    Total execution time: wall clock time, response time or elapsed time

    CPU time
      User CPU time
      System CPU time
      Difficult to separate these components

    Use top command in Linux shell or Task Manager in Windows.
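The distinction between elapsed time and CPU time can also be seen from inside a program. The following is a minimal sketch, assuming Python's standard `time` and `os` modules; the `workload` and `busy_work` functions are hypothetical stand-ins for a real task and are not part of the lecture.

```python
import os
import time


def busy_work(n=200_000):
    """CPU-bound loop, so some user CPU time is consumed."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical task: some computation, then waiting."""
    busy_work()
    time.sleep(0.2)  # waiting adds elapsed time but almost no CPU time


def measure(func):
    """Return (wall, user, system) seconds spent while func() runs."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


wall, user, system = measure(workload)
print(f"wall clock: {wall:.3f} s")
print(f"user CPU:   {user:.3f} s")
print(f"system CPU: {system:.3f} s")
```

Because the task spends most of its time sleeping, the wall clock time should come out well above the sum of user and system CPU time.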

  • 8/13/2019 computer architecture slides 1

    15/27

    CPU Performance

    CPU execution time
      = CPU clock cycles per program × clock cycle time (clock period)
      = CPU clock cycles per program / clock frequency

    Program: set of (assembly/machine language) instructions

    CPU clock cycles per program
      = Instructions per program × average clock cycles per instruction
      = Instruction count (IC) × cycles per instruction (CPI)

    CPU execution time
      = IC × CPI × T_clk
      = IC × CPI / f_clk

  • 8/13/2019 computer architecture slides 1

    16/27

    CPU Performance: An example

    Two programs: foo and faa

    Instruction types:
      Instr_0: 1 cycle
      Instr_1: 2 cycles
      Instr_2: 5 cycles

    foo: total 10 instructions
      Instr_0: 7
      Instr_1: 2
      Instr_2: 1
      Total clock cycles = 7*1 + 2*2 + 1*5 = 16

    faa: total 8 instructions
      Instr_0: 4
      Instr_1: 2
      Instr_2: 2
      Total clock cycles = 4*1 + 2*2 + 2*5 = 18

  • 8/13/2019 computer architecture slides 1

    17/27

    CPI details

    Different instructions have different individual CPIs

    Overall CPI is given by using a weighted average

    foo: CPI = 1.6

    faa: CPI = 2.25 (higher relative frequency of Instr_2)

    $$\mathrm{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}$$

    where Instruction Count_i / Instruction Count is the relative frequency of instruction i.
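The foo/faa numbers can be reproduced with a few lines of code. This is a minimal sketch: the instruction mixes and per-instruction cycle counts come from the example, while the 1 GHz clock frequency `F_CLK_HZ` is an assumption added only to turn cycles into seconds.

```python
# Cycles per instruction type, from the foo/faa example.
CYCLES = {"Instr_0": 1, "Instr_1": 2, "Instr_2": 5}

# Instruction mix of each program, from the example.
PROGRAMS = {
    "foo": {"Instr_0": 7, "Instr_1": 2, "Instr_2": 1},
    "faa": {"Instr_0": 4, "Instr_1": 2, "Instr_2": 2},
}

F_CLK_HZ = 1e9  # assumed clock frequency (not given in the slides)


def total_cycles(mix):
    """Sum of (instruction count x cycles) over all instruction types."""
    return sum(count * CYCLES[instr] for instr, count in mix.items())


def cpi(mix):
    """Weighted average: sum of CPI_i x relative frequency of instruction i."""
    ic = sum(mix.values())
    return sum(CYCLES[instr] * (count / ic) for instr, count in mix.items())


def cpu_time(mix, f_clk_hz=F_CLK_HZ):
    """CPU execution time = IC x CPI / f_clk."""
    ic = sum(mix.values())
    return ic * cpi(mix) / f_clk_hz


for name, mix in PROGRAMS.items():
    print(f"{name}: cycles={total_cycles(mix)}, "
          f"CPI={cpi(mix):.2f}, time={cpu_time(mix) * 1e9:.0f} ns")
```

Under the assumed clock, faa takes longer than foo even though it has fewer instructions, because its CPI is higher.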

  • 8/13/2019 computer architecture slides 1

    18/27

    Power: Problem with Moore's Law

    [Figure: power (Watts, log scale from 0.1 to 10,000) against year (1971 to 2008) for Intel processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium processors. Annotation: "Power Projections Too High!", with reference levels labelled Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface. Source: Intel]

  • 8/13/2019 computer architecture slides 1

    19/27

    Circumventing the power wall

    P = C V² f

    V: 5 V → 1 V; f: 30 MHz → 3 GHz

    We can reduce voltage and capacitive load

    by only so much.
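To see how much the voltage reduction helped, the two operating points above can be plugged into P = C V² f. This is a rough sketch: the capacitive load `C_LOAD` is an arbitrary assumed value held constant, so only the ratio of the two results is meaningful.

```python
def dynamic_power(c_load, voltage, freq_hz):
    """Dynamic power P = C * V^2 * f (watts for SI inputs)."""
    return c_load * voltage ** 2 * freq_hz


C_LOAD = 1e-9  # assumed capacitive load in farads; it cancels in the ratio

p_old = dynamic_power(C_LOAD, 5.0, 30e6)  # 5 V at 30 MHz
p_new = dynamic_power(C_LOAD, 1.0, 3e9)   # 1 V at 3 GHz

ratio = p_new / p_old
print(f"power ratio (new/old): {ratio:.1f}")
```

With the same capacitive load, a 100× higher frequency costs only about 4× the power, because the V² term shrinks by a factor of 25. Without the voltage drop the increase would have been the full 100×, which is why running out of room to lower the voltage matters so much.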

  • 8/13/2019 computer architecture slides 1

    20/27

    Other limitations in uniprocessors

    Constrained by power, instruction-level parallelism (ILP) and memory latency

  • 8/13/2019 computer architecture slides 1

    21/27

    Moore's Law: New approach

    AMD's Barcelona Architecture: quad-core, 65 nm process, 2007 (Courtesy: AnandTech)

    Increasing number of cores in a processor to be better prepared for the power wall.

    More processing done in parallel at the same clock frequency.

    Age of multicore processors

  • 8/13/2019 computer architecture slides 1

    22/27

    Multiprocessor trends

    Larger number of cores: better performance (speed, energy)

    Greater complexity in design and application porting

    [Figure: Single-core, Dual-core, 8-core, GPU, NoC]

  • 8/13/2019 computer architecture slides 1

    23/27

    Benchmarking for performance

    Standard Performance Evaluation Corporation (SPEC)

    Integer (CINT2006) or Floating point (CFP2006)

    Reference: Sun UltraSparc II system at 296 MHz

    SPEC CINT2006 Result (Copyright 2006-2013 Standard Performance Evaluation Corporation)

    Cisco Systems: Cisco UCS C220 M3 (Intel Xeon E5-2667 v2 @ 3.30 GHz)

    SPECint2006 = 68.1
    SPECint_base2006 = 63.0

    CPU2006 license: 9019          Test date: Sep-2013
    Test sponsor: Cisco Systems    Hardware Availability: Sep-2013
    Tested by: Cisco Systems       Software Availability: Aug-2013

    Results Table

                    ------------------- Base ------------------   ------------------- Peak ------------------
    Benchmark       Seconds Ratio  Seconds Ratio  Seconds Ratio   Seconds Ratio  Seconds Ratio  Seconds Ratio
    400.perlbench       263  37.2      263  37.2      264  37.1       210  46.5      210  46.5      210  46.5
    401.bzip2           349  27.6      349  27.6      349  27.6       346  27.9      346  27.9      346  27.9
    403.gcc             214  37.6      215  37.4      216  37.3       210  38.4      210  38.4      210  38.4
    429.mcf             119  76.7      119  76.7      119  76.5       119  76.7      119  76.7      119  76.5
    445.gobmk           367  28.6      367  28.6      366  28.7       331  31.7      331  31.6      331  31.7
    456.hmmer           133  70.1      133  70.1      133  70.1       133  70.0      133  70.0      135  69.1
    458.sjeng           359  33.7      359  33.7      388  31.2       352  34.4      352  34.4      352  34.4
    462.libquantum     5.48  3780     5.88  3520     5.48  3780      5.48  3780     5.88  3520     5.48  3780
    464.h264ref         400  55.4      399  55.4      398  55.6       327  67.6      327  67.7      327  67.7
    471.omnetpp         165  37.8      174  35.9      168  37.2       116  53.7      115  54.3      116  53.8
    473.astar           188  37.4      188  37.3      188  37.4       188  37.4      188  37.3      188  37.4
    483.xalancbmk       103  67.1      103  67.2      103  67.3       104  66.3      104  66.5      103  66.7

    Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

    Source:

    www.spec.org
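Each SPEC ratio is the reference machine's run time divided by the measured run time, and the overall score is the geometric mean of the per-benchmark ratios. The sketch below recomputes SPECint_base2006 from the table; it assumes the median of the three base runs is the middle value of the three ratios, since the bold/underline marking did not survive in this transcript.

```python
import math

# Median base ratio per benchmark, taken as the middle of the three
# base-run ratios in the results table (an assumption, see lead-in).
BASE_RATIOS = {
    "400.perlbench": 37.2,
    "401.bzip2": 27.6,
    "403.gcc": 37.4,
    "429.mcf": 76.7,
    "445.gobmk": 28.6,
    "456.hmmer": 70.1,
    "458.sjeng": 33.7,
    "462.libquantum": 3780.0,
    "464.h264ref": 55.4,
    "471.omnetpp": 37.2,
    "473.astar": 37.4,
    "483.xalancbmk": 67.2,
}


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical safety."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


spec_score = geometric_mean(BASE_RATIOS.values())
print(f"SPECint_base2006 (recomputed): {spec_score:.1f}")
```

The geometric mean keeps a single outlier such as 462.libquantum from dominating the score the way it would with an arithmetic mean.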

  • 8/13/2019 computer architecture slides 1

    24/27

    Benchmarking for power

    SPECpower_ssj2008 (Copyright 2007-2013 Standard Performance Evaluation Corporation)

    Dell Inc. PowerEdge M620 (Intel Xeon E5-2660 v2, 2.20 GHz)

    SPECpower_ssj2008 = 7,525 overall ssj_ops/watt

    Test Sponsor: Dell Inc.           SPEC License #: 55                  Test Method: Multi Node
    Tested By: Dell Inc.              Test Location: Round Rock, TX, USA  Test Date: Sep 12, 2013
    Hardware Availability: Sep-2013   Software Availability: Sep-2012     Publication: Oct 16, 2013
    System Source: Single Supplier    System Designation: Server          Power Provisioning: Line-powered

    Benchmark Results Summary

    Target Load   Actual Load      ssj_ops   Average Active Power (W)   Performance to Power Ratio
    100%          99.2%         28,593,082                      3,239                        8,828
    90%           89.9%         25,917,132                      2,765                        9,372
    80%           80.0%         23,041,812                      2,499                        9,221
    70%           70.0%         20,156,576                      2,289                        8,805
    60%           60.0%         17,296,778                      2,061                        8,393
    50%           49.9%         14,392,213                      1,848                        7,787
    40%           40.0%         11,531,023                      1,671                        6,902
    30%           30.0%          8,645,852                      1,494                        5,788
    20%           20.0%          5,766,436                      1,322                        4,363
    10%           10.0%          2,879,100                      1,151                        2,501
    Active Idle                          0                        688                            0

    Σ ssj_ops / Σ power = 7,525

    Source:

    www.spec.org
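The overall figure is the sum of ssj_ops over all load levels divided by the sum of the average power over all levels, including active idle. A minimal sketch that recomputes it from the table values (the variable names are illustrative only):

```python
# (target load, ssj_ops, average active power in watts), from the results table
LEVELS = [
    ("100%", 28_593_082, 3_239),
    ("90%", 25_917_132, 2_765),
    ("80%", 23_041_812, 2_499),
    ("70%", 20_156_576, 2_289),
    ("60%", 17_296_778, 2_061),
    ("50%", 14_392_213, 1_848),
    ("40%", 11_531_023, 1_671),
    ("30%", 8_645_852, 1_494),
    ("20%", 5_766_436, 1_322),
    ("10%", 2_879_100, 1_151),
    ("Active Idle", 0, 688),
]

total_ops = sum(ops for _, ops, _ in LEVELS)
total_power = sum(watts for _, _, watts in LEVELS)
overall = total_ops / total_power

print(f"overall ssj_ops/watt: {overall:,.0f}")

# Per-level performance-to-power ratio, as in the last table column.
for load, ops, watts in LEVELS:
    print(f"{load:>11}: {ops / watts:,.0f}")
```

Because idle power counts in the denominator while contributing no operations, a machine that draws less power when idle scores better overall.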

  • 8/13/2019 computer architecture slides 1

    25/27

    Improving performance

    Make certain parts faster: acceleration

    e.g. graphics acceleration using GPU while playing computer games

    How much can we improve?

    Amdahl's Law:

    Let To = original execution time = Ta (time that is subject to acceleration) + Tu (time unaffected by acceleration)

    Acceleration = f; Improved total time = Ti; Overall speedup = S

    Ti = Ta/f + Tu

    S = To/Ti = (Ta + Tu)/(Ta/f + Tu)

    If f → ∞, S → 1 + Ta/Tu = 1/(fraction of total runtime unaffected by acceleration)

    Corollary: Make the common case faster.
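Amdahl's Law is easy to try out numerically. The sketch below uses the symbols Ta, Tu and f from the slide; the 80/20 split between accelerated and unaffected time is a made-up example, not a figure from the lecture.

```python
def amdahl_speedup(t_a, t_u, f):
    """Overall speedup S = (Ta + Tu) / (Ta/f + Tu)."""
    t_improved = t_a / f + t_u
    return (t_a + t_u) / t_improved


def speedup_limit(t_a, t_u):
    """Upper bound on S as f grows without limit: 1 + Ta/Tu."""
    return 1 + t_a / t_u


# Hypothetical program: 80 s can be accelerated, 20 s cannot.
T_A, T_U = 80.0, 20.0

for f in (2, 10, 100, 1000):
    print(f"f = {f:>4}: S = {amdahl_speedup(T_A, T_U, f):.3f}")

print(f"limit as f grows: {speedup_limit(T_A, T_U):.1f}")
```

Even with the accelerated part made 1000 times faster, the speedup stays below 5, because the unaffected 20% of the run time sets the ceiling.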

  • 8/13/2019 computer architecture slides 1

    26/27

    What about idle power?

    SPECPower results: 10% workload consumes more than

    25% of peak power

    Leakage power is a major concern with smaller

    technologies.

    Green data centres to have energy-proportional

    computing

    Barroso, L.A.; Hölzle, U., "The Case for Energy-Proportional Computing," Computer, vol. 40, no. 12, pp. 33-37, Dec. 2007
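The idle-power claim can be checked against the SPECpower_ssj2008 figures quoted on the "Benchmarking for power" slide. This is a small sketch using three power readings from that table; the variable names are illustrative.

```python
# Average active power in watts, from the SPECpower_ssj2008 results table.
POWER_AT_100_PERCENT = 3239
POWER_AT_10_PERCENT = 1151
POWER_ACTIVE_IDLE = 688

fraction_at_10 = POWER_AT_10_PERCENT / POWER_AT_100_PERCENT
fraction_idle = POWER_ACTIVE_IDLE / POWER_AT_100_PERCENT

print(f"10% load draws {fraction_at_10:.1%} of peak power")
print(f"active idle draws {fraction_idle:.1%} of peak power")
```

An energy-proportional system would draw roughly 10% of peak power at 10% load and close to nothing when idle; this server draws over a third of peak power at 10% load.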

  • 8/13/2019 computer architecture slides 1

    27/27

    Conclusion

    Cost-performance-power tradeoff: architects and designers are slowly winning the game.

    Hierarchical layers of abstraction, both in software and hardware

    Most important example of such abstraction: the hardware-software interface, i.e. the instruction set architecture

    Performance: execution time (seconds/program)
      = (instructions/program) × (clock cycles/instruction) × (seconds/clock cycle)

    Most critical resource: energy, no longer area

    Paradigm shift to multicores

    Other parameters: reliability, scalability