Lecture 1: Introduction

Course Outline

The aim of this course:

Introduction to the methods and techniques of performance analysis of computer systems.

Solve computer performance analysis problems related to:
• measuring the performance of computer systems
• comparing computer systems
• predicting future performance under different configurations
• designing new applications that meet performance requirements
• capacity planning

Hands-on experiments on modern hardware/software systems

Course Outline

1. Introduction

2. Hardware and software aspects of computer systems

3. Performance metrics

4. Performance measurement tools and techniques

5. Benchmarking

6. Statistical analysis of performance experiments

7. Design of experiments

8. Processor Performance
• ALU
• Pipelining
• Optimizing program performance

9. Memory Hierarchy
• Cache performance
• Optimizing program performance

10. Performance of multiprocessor systems

11. Simulation

12. Queueing Theory

Course Outline

Textbook:
• D. Lilja, “Measuring Computer Performance: A Practitioner's Guide”, Cambridge University Press

Reference Books:
• R. Jain, “The Art of Computer Systems Performance Analysis”, John Wiley
• P.J. Fortier, H.E. Michel, “Computer Systems Performance Evaluation and Prediction”, Digital Press
• K.R. Wadleigh, I.L. Crawford, “Software Optimization for High Performance Computing”, Prentice-Hall
• R.E. Bryant, D.R. O’Hallaron, “Computer Systems: A Programmer’s Perspective”, Pearson
• J.L. Hennessy, D.A. Patterson, “Computer Architecture”, Morgan Kaufmann
• K.R. Wadleigh, I.L. Crawford, “High Performance Computing”, Prentice Hall

Course Outline

Grading:

Assignments   30%
Midterm       30%
Final Exam    40%

Performance Evaluation of Computer Systems

Computer systems consist of:
• Processor
• Memory
• Input/Output
• Operating system
• Network

[Figure: a processor connected to memory (instructions and data), an input unit and an output unit; several processors (P) connected by a network]

Performance Evaluation of Computer Systems

Performance depends on:
• Technology

Technology

In recent years, microprocessors have become smaller and denser.

                         ENIAC (1945)    Laptop (2010)
Devices                  18 000          17 000 000 000
Weight (kg)              27 200          2.8
Size (m^3)               68              0.0018
Power (watts)            20 000          5.5
Cost ($)                 4 630 000       1 000
Memory (bytes)           200             2 147 483 648
Performance (Flop/s)     800             2 000 000 000

Moore’s Law

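Gordon Moore observed in 1965 that the number of transistors on a chip was doubling roughly every year, and in 1975 he revised the rate to a doubling every two years. The figure of roughly 18 months, used in the rest of this lecture, is a widely quoted variant.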

Moore’s Law

• Number of transistors

• Performance

Both double every 1.5 years.
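
As a rough sketch of what a fixed doubling period implies: after t years the growth factor is 2^(t/T), where T is the doubling period. The function name and the sample intervals below are illustrative assumptions, not part of the slides; the default of 1.5 years is the period quoted above.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Return the multiplicative growth after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period_years)


# Sample intervals chosen only for illustration.
for years in (1.5, 3, 15):
    print(f"after {years} years: x{growth_factor(years):,.0f}")
```

Ten doubling periods (15 years at T = 1.5) already give a factor of about a thousand, which is why such a trend cannot continue indefinitely.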

Top500 List, June 2013

Rank  Computer    Country  Vendor   Processor + accelerator + interconnect           Cores      Rmax (Pflop/s)  Rpeak (Pflop/s)
1     Tianhe-2    China    NUDT     Xeon 2.2 GHz + Xeon Phi + custom                 3 120 000  33.9            54.9
2     Titan       USA      Cray     Opteron 2.2 GHz + Nvidia GPU + Cray Gemini       560 640    17.6            27.1
3     Sequoia     USA      IBM      BlueGene 1.6 GHz + custom                        1 572 864  17.2            20.1
4     K computer  Japan    Fujitsu  Sparc64 2.0 GHz + Tofu                           705 024    10.5            11.3
5     Mira        USA      IBM      BlueGene 1.6 GHz + custom                        786 432    8.6             10.1
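
One quantity that can be derived from the list is the ratio Rmax/Rpeak, the fraction of the theoretical peak (Rpeak) that was actually achieved on the benchmark run (Rmax). The sketch below computes it from the figures listed above; the dictionary layout and the name `efficiency` are illustrative choices, not something the slides define.

```python
# (Rmax, Rpeak) in Pflop/s, copied from the June 2013 list above.
systems = {
    "Tianhe-2": (33.9, 54.9),
    "Titan": (17.6, 27.1),
    "Sequoia": (17.2, 20.1),
    "K computer": (10.5, 11.3),
    "Mira": (8.6, 10.1),
}


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of the theoretical peak achieved on the benchmark."""
    return rmax / rpeak


for name, (rmax, rpeak) in systems.items():
    print(f"{name:<12} {efficiency(rmax, rpeak):6.1%}")
```

The ranking by Rmax and the ranking by this ratio need not agree, which is one reason a single number rarely describes a system's performance.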

Performance Units

Speed
1 Mflop/s    1 Megaflop/s    10^6 Flop/second
1 Gflop/s    1 Gigaflop/s    10^9 Flop/second
1 Tflop/s    1 Teraflop/s    10^12 Flop/second
1 Pflop/s    1 Petaflop/s    10^15 Flop/second
1 Eflop/s    1 Exaflop/s     10^18 Flop/second

Storage
1 MB    1 Megabyte    10^6 Bytes
1 GB    1 Gigabyte    10^9 Bytes
1 TB    1 Terabyte    10^12 Bytes
1 PB    1 Petabyte    10^15 Bytes
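
A small helper can make these prefixes concrete by converting a raw rate in Flop/second to the largest fitting unit. This is only a sketch; `format_rate` and `PREFIXES` are hypothetical names introduced for illustration.

```python
# Decimal prefixes from the speed list above, largest first.
PREFIXES = [(1e18, "E"), (1e15, "P"), (1e12, "T"), (1e9, "G"), (1e6, "M")]


def format_rate(flops_per_second: float) -> str:
    """Format a rate in Flop/second using the largest prefix that fits."""
    for scale, prefix in PREFIXES:
        if flops_per_second >= scale:
            return f"{flops_per_second / scale:g} {prefix}flop/s"
    return f"{flops_per_second:g} flop/s"


print(format_rate(800))            # ENIAC figure quoted earlier
print(format_rate(2_000_000_000))  # laptop figure quoted earlier
```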

Moore’s Law

Limits of Moore’s Law:

Moore’s Law is exponential. Exponentials cannot last forever.

Heat is a problem in today’s CPUs

The size of atoms is the fundamental barrier

Moore’s Law Reinterpreted

Number of cores per chip doubles every 2 years, while clock speed decreases

• Multicore architectures

Performance Evaluation of Computer Systems

Performance depends on:
• Technology

• Instruction Set Architecture

Instruction Set Architecture (ISA)

Instruction Set Design:

RISC / CISC
• Code density

Number of operands
• Stack machines (0-operand)
• Accumulator machines (1-operand)
• Register machines (2-operand, 3-operand)
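
To make the operand-count distinction concrete, the sketch below evaluates a = b + c on a toy stack machine (arithmetic instructions take no operands) and on a toy accumulator machine (each instruction names one memory operand). The instruction names (PUSH, POP, ADD, LOAD, STORE) and both interpreters are hypothetical and do not model any real ISA. A 3-operand register machine could express the same statement in a single instruction of the form ADD a, b, c.

```python
def run_stack(program, memory):
    """Toy stack machine: ADD pops two values and pushes their sum."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


def run_accumulator(program, memory):
    """Toy accumulator machine: one implicit register, one memory operand."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


# a = b + c on each machine.
stack_program = [("PUSH", "b"), ("PUSH", "c"), ("ADD",), ("POP", "a")]
acc_program = [("LOAD", "b"), ("ADD", "c"), ("STORE", "a")]

print(run_stack(stack_program, {"b": 2, "c": 3})["a"])
print(run_accumulator(acc_program, {"b": 2, "c": 3})["a"])
```

Fewer operands per instruction generally means shorter instructions but more of them for the same statement, which is the trade-off behind the code density bullet.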

Performance Evaluation of Computer Systems

Performance depends on:
• Technology

• Instruction Set Architecture

• Organization

Organization

Memory Hierarchy

Hierarchy                                               Speed     Size
Within the processor (CPU registers, on-chip cache)     1 ns      Byte
L2 cache (SRAM)                                         10 ns     KByte
Main memory (DRAM)                                      100 ns    MByte
Secondary storage (disk)                                10 ms     GByte
Tertiary storage (tape/disk)                            10 s      TByte
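
The access times above span ten orders of magnitude. As a quick sketch, the ratios relative to the fastest level can be computed directly; the dictionary keys and the function name `slowdown` are illustrative, and the values are the order-of-magnitude figures from the table, not measurements.

```python
# Access times in seconds, taken from the table above (orders of magnitude).
latency_s = {
    "registers / on-chip cache": 1e-9,
    "L2 cache (SRAM)": 10e-9,
    "main memory (DRAM)": 100e-9,
    "disk": 10e-3,
    "tape": 10.0,
}


def slowdown(level: str, baseline: str = "registers / on-chip cache") -> float:
    """How many times slower `level` is than `baseline`."""
    return latency_s[level] / latency_s[baseline]


for level in latency_s:
    print(f"{level:<28} x{slowdown(level):,.0f}")
```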

[Figure: the memory hierarchy from fastest to slowest: CPU registers, L1 cache, L2 cache, main memory, disk, tape]

Organization

Manycore Chips

[Figure: single-core and dual-core organization. Single-core: one CPU with registers, L1 cache, L2 cache and main memory. Dual-core: two CPUs, each with its own registers and L1 cache, with one L2 cache and one main memory]

Performance Evaluation of Computer Systems

Performance depends on:
• Technology

• Instruction Set Architecture

• Organization

• Software

Software

The primary duty of software developers is to create functionally correct programs

Performance evaluation is a part of software development for well-performing programs

Performance Analysis Cycle

Have an optimization phase, just like the testing and debugging phases.

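[Figure: the performance analysis cycle. Code Development produces a functionally complete and correct program, which goes through a Measure, Analyze, Modify / Tune loop; the resulting complete, correct and well-performing program then goes to Usage]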
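
A minimal sketch of one pass through the cycle, using Python's standard `timeit` module: measure a correct baseline, tune it, confirm it still gives the same result, and measure again. The two `sum_of_squares_*` functions and the `measure` helper are hypothetical examples chosen for illustration.

```python
import timeit


def sum_of_squares_baseline(n: int) -> int:
    """Functionally correct starting point: explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_tuned(n: int) -> int:
    """Tuned version using 0^2 + ... + (n-1)^2 = (n-1) n (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def measure(func, n: int, repeats: int = 5) -> float:
    """Best wall-clock time in seconds over `repeats` single calls of func(n)."""
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeats))


n = 100_000
# Tuning must not break correctness.
assert sum_of_squares_baseline(n) == sum_of_squares_tuned(n)
print(f"baseline: {measure(sum_of_squares_baseline, n):.6f} s")
print(f"tuned:    {measure(sum_of_squares_tuned, n):.6f} s")
```

Taking the minimum over several repeats is one common way to reduce the influence of other activity on the machine; it is a choice made for this sketch, not a rule from the slides.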

Systematic Approach to Performance Evaluation

1. Define the system

2. List services offered by the system

3. Select performance metrics

4. List system and workload parameters

5. Select factors and their values

6. Select evaluation technique

7. Select the workload

8. Design the experiment

9. Analyze the data

10. Present the results

An Example:

1. Define the system

[Figure: a client and a server connected by a network]

2. List services offered by the system

Service:

Remote procedure call

3. Select performance metrics

Metrics:

• Time taken for the service
  • Elapsed time
  • Local CPU time
  • Remote CPU time

• The rate at which the service can be performed
  • Calls per second
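
The sketch below shows how elapsed time, local CPU time and calls per second could be collected on the client side with Python's `time` module. `fake_remote_call` is a hypothetical stand-in for a real remote procedure call, with a short sleep in place of network and server time; remote CPU time would have to be measured on the server and is not covered here.

```python
import time


def fake_remote_call() -> None:
    """Hypothetical stand-in for an RPC: local work plus waiting."""
    sum(i * i for i in range(10_000))  # local CPU work
    time.sleep(0.001)                  # stands in for network and remote time


def measure_calls(call, count: int):
    """Return (elapsed seconds, local CPU seconds, calls per second)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(count):
        call()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return elapsed, cpu, count / elapsed


elapsed, cpu, rate = measure_calls(fake_remote_call, 20)
print(f"elapsed: {elapsed:.4f} s, local CPU: {cpu:.4f} s, rate: {rate:.1f} calls/s")
```

Elapsed time includes the waiting, while local CPU time counts only the processor time used by the calling process, so the two can differ widely for a remote service.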

4. List system and workload parameters

• System Parameters
  • Speed of the network
  • Speed of the local CPU
  • Speed of the remote CPU
  • Operating system overhead

• Workload Parameters
  • Time between successive calls
  • Number and sizes of the call parameters

5. Select factors and their values

Factors are the parameters to be varied and their values are called levels.

For example:

• Factor: speed of the network; 2 levels: short distance (on campus), long distance (across the country)

• Factor: sizes of the call parameters; 2 levels: small, large

• Factor: number of consecutive calls; 11 levels: 1, 2, 4, 8, …, 1024

6. Select evaluation technique

Three techniques:

• Analytical modeling
• Simulation
• Measuring the real system

7. Select the workload

Depending on the evaluation technique, the workload may be expressed in different forms.

• Analytical modeling
  • Probability of various requests

• Simulation
  • A trace of requests measured on a real system

• Measurement
  • User programs

8. Design the experiment

• In the example: 2 × 2 × 11 = 44 experiments

• Phase 1
  • Number of factors is large but number of levels is small

• Phase 2
  • Reduce the number of factors and increase the number of levels
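
The count of 44 comes from running every combination of levels of the three factors in the example, i.e. a full factorial design. A minimal sketch with `itertools.product` follows; the dictionary keys and the helper name `full_factorial` are illustrative.

```python
from itertools import product

# Factors and levels from the remote procedure call example.
factors = {
    "network": ["short distance", "long distance"],
    "parameter_size": ["small", "large"],
    "consecutive_calls": [2 ** k for k in range(11)],  # 1, 2, 4, ..., 1024
}


def full_factorial(factors: dict) -> list:
    """Return one dict per experiment, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


experiments = full_factorial(factors)
print(len(experiments))
print(experiments[0])
```

Because the number of experiments is the product of the level counts, adding factors or levels grows the design quickly, which motivates the two-phase approach above.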

9. Analyze the data

• Analysis of Variance
• Regression
• etc.
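
As an illustration of the regression item, the sketch below fits a straight line by ordinary least squares to a handful of points. The data are synthetic values made up for this example (they lie exactly on time = 2.0 + 0.5 × calls), not measurements from any real system, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic illustration only: number of calls vs. total time (arbitrary units).
calls = [1, 2, 4, 8, 16]
times = [2.5, 3.0, 4.0, 6.0, 10.0]

intercept, slope = fit_line(calls, times)
print(f"fixed overhead ~ {intercept:.2f}, cost per call ~ {slope:.2f}")
```

In a model of this form the intercept can be read as a fixed overhead and the slope as the incremental cost per call; real measurements would scatter around the line rather than fall on it.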

10. Present the results

• Use graphical form to represent the data rather than statistical results