ECEN/CSCI 5593: Advanced Computer …ecee.colorado.edu/~ecen5593/ECEN5593_Syllabus.pdfECEN/CSCI...

ECEN/CSCI 5593: Advanced Computer Architecture (ACA) Course Syllabus

Instructor: Dan Connors
E-Mail: [email protected]
Website: Desire2Learn: https://learn.colorado.edu

I. Course Overview

Advanced Computer Architecture (ACA) covers advanced topics in computer architecture, focusing on multicore, graphics processing unit (GPU), and heterogeneous SoC multiprocessor architectures and their implementation issues from the architect's perspective. The course explores a range of levels: deep-submicron CMOS characteristics, microarchitecture, compiler optimization, parallel programming, run-time optimization, performance analysis and tuning, fault tolerance, and power-aware computing techniques. The objective of the course is to provide in-depth coverage of current and emerging trends in computer architecture, focusing on performance and the hardware/software interface. The course emphasizes analyzing fundamental issues in architecture design and their impact on application performance. To build a better understanding of the concepts, hands-on assignments explore issues in multicore and GPU architecture systems. Students have the option to explore their own interests in custom projects and assignments.

New recorded video lectures in Spring 2017

New projects in Spring 2017: Students work in groups of up to two people on projects related to the acceleration and performance tuning of machine learning, computer vision, and deep learning. Students taking the course can investigate projects with access to NVIDIA, Xilinx, and Raspberry Pi resources:

NVIDIA Jetson TX1 (http://www.nvidia.com/object/jetson-tx1-module.html) is the world's leading AI computing platform for GPU-accelerated parallel processing in the mobile embedded systems market. Its high-performance, low-energy computing for deep learning and computer vision makes Jetson the ideal solution for compute-intensive embedded projects. Jetson TX1 is a supercomputer on a module that's the size of a credit card. It features the new NVIDIA Maxwell™ architecture:

• GPU: 1 TFLOP/s, 256 cores
• CPU: 64-bit ARM® A57 CPUs
• Memory: 4 GB LPDDR4 | 25.6 GB/s

• Project potential: Drones & Unmanned Aerial Vehicles (UAVs), Autonomous Robotic Systems, Mobile Medical Imaging, Intelligent Video Analytics (IVA)

PYNQ – Python Productivity for Xilinx Zynq Programmable Hardware
http://www.pynq.io/

PYNQ is an open-source project from Xilinx that makes it easy to design embedded systems with Zynq All Programmable Systems on Chips (APSoCs). Using the Python language and libraries, designers can exploit the benefits of programmable logic and microprocessors in Zynq to build more capable and exciting embedded systems.

PYNQ users can now create high-performance embedded applications with:
• parallel hardware execution
• hardware-accelerated algorithms
• real-time signal processing
• high-bandwidth I/O and low-latency control

Zynq-7000 All Programmable SoC Features
• Dual ARM® Cortex™-A9 MPCore™ with CoreSight™
• 32 KB instruction and 32 KB data L1 cache per processor
• 512 KB unified L2 cache, 256 KB on-chip memory, 630 KB of fast block RAM
• 85K logic cells (13,300 logic slices, each with four 6-input LUTs and 8 flip-flops)

Raspberry Pi – ARM Linux-based Embedded System (https://www.raspberrypi.org)
• Low transistor count
• Low power consumption/heat production
• Used in most mobile devices: phones and small digital devices
• Raspberry Pi has similar requirements to mobile devices

II. Course Prerequisites

This course requires an understanding of processor design, specifically computer organization and the instruction set architecture (ISA): ECEN 4593 (Computer Organization) or an equivalent first course in computer organization and design. Students should already understand at least one computer instruction set and know how to design a control unit, an arithmetic unit, memory (cache and virtual), and various input/output interfaces.

III. Course Outline

1. Introduction to Computer Design and Quantitative Principles of Architecture Performance Analysis
• Technology and computer trends
• Measuring computer system performance
• Benchmarks and metrics

2. Instruction Set Principles and Examples
• Classification of Instruction Set Architectures (ISA) – RISC, CISC, VLIW, EPIC
• Predicated execution and compiler-controlled speculation

3. Advanced Microarchitecture and Instruction-Level Parallelism
• Superscalar and pipeline operation
• Instruction-Level Parallelism (ILP)
• Dynamic instruction scheduling (Tomasulo, scoreboarding, reservation station design)
• Overcoming control hazards – branch prediction (2-bit, two-level)
• Compiler optimization and analysis

4. Memory-Hierarchy Design
• Multi-level cache design issues
• Performance evaluation
• Memory prefetching techniques

5. Thread-Level Parallelism
• Multicore systems
• Thread control models (fine-grained, coarse-grained, hyper-threading)

6. Data-Level Parallelism
• Vector processing
• Graphics Processing Units (GPU)
• NVIDIA architecture models – Fermi, Tesla, Kepler, Maxwell, Pascal
• CUDA/OpenCL programming

7. Performance Tuning and Analysis of Modern Applications
• Run-time optimization
• Binary instrumentation
• Hardware performance monitoring
• Performance tuning

8. Architecture Implementation Issues and Analysis
• Power – Dynamic Voltage and Frequency Scaling (DVFS), Energy-Delay Product (EDP)
• Architecture physical layer concepts, including device and layout, manufacturing constraints, architectures, defect tolerance, and design variability
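The DVFS and EDP trade-off in topic 8 can be illustrated with a small calculation. The sketch below is a simplified model, not course material: it assumes dynamic power only (P = C·V²·f, with the activity factor folded into a hypothetical effective capacitance `C_EFF`), a fixed cycle count, and two made-up operating points. Leakage power is ignored.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic switching power P = C_eff * V^2 * f (activity factor folded into C_eff)."""
    return c_eff * voltage ** 2 * freq_hz


def energy_delay_product(c_eff, voltage, freq_hz, cycles):
    """Return (EDP, energy, delay) for a workload of `cycles` at one DVFS operating point."""
    delay = cycles / freq_hz                                  # execution time, seconds
    energy = dynamic_power(c_eff, voltage, freq_hz) * delay   # joules
    return energy * delay, energy, delay


C_EFF = 1.0e-9    # hypothetical effective switched capacitance (F)
CYCLES = 2.0e9    # hypothetical workload length (cycles)
# Hypothetical (voltage in V, frequency in Hz) operating points.
OPERATING_POINTS = {"high": (1.0, 2.0e9), "low": (0.8, 1.2e9)}

for name, (v, f) in OPERATING_POINTS.items():
    edp, energy, delay = energy_delay_product(C_EFF, v, f, CYCLES)
    print(f"{name:>4}: delay={delay:.3f} s  energy={energy:.3f} J  EDP={edp:.3f} J*s")
```

Under these assumptions the low-voltage point uses less energy (1.28 J vs. 2.0 J), but because it runs longer its EDP (about 2.13 J·s) is slightly worse than the high point's 2.0 J·s. This is the kind of energy-versus-performance trade-off that EDP is meant to expose.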

Course Schedule

WEEK 1 – Introduction, Instruction Set Architecture, and Pipelines
Topics:
• Description of architecture, microarchitecture, and instruction set architectures
• Pipelining Review – basic concept of a pipeline and two different types of hazards
• Pipeline CPI
• Processor Pipeline Hazards
• Computer Architecture & Tech Trends
• Processor Speed, Cost, Power
• Measuring Performance
• Benchmark Standards
• Iron Law of Performance
• Moore's Law
• Amdahl's Law
• Lhadma's Law
• Gustafson's Law
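Several of the Week 1 laws reduce to one-line formulas. The following Python sketch uses the standard textbook forms: the Iron Law (CPU time = instructions × CPI / clock rate), Amdahl's Law (S = 1 / ((1 − p) + p/n) for a fixed workload), and Gustafson's Law (S = (1 − p) + p·n for a workload that grows with n). The function names and example numbers are hypothetical.

```python
def iron_law_time(instructions, cpi, clock_hz):
    """Iron Law: CPU time = instruction count x CPI x cycle time (cycle time = 1 / clock rate)."""
    return instructions * cpi / clock_hz


def amdahl_speedup(p, n):
    """Amdahl's Law: fixed problem size; p is the parallelizable fraction of the
    original run, so the serial part (1 - p) bounds speedup at 1 / (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p, n):
    """Gustafson's Law: problem size grows with n; p is the parallel fraction
    of the time measured on the n-processor run."""
    return (1.0 - p) + p * n


# Hypothetical program: 1e9 instructions, CPI 1.5, 2 GHz clock.
print(f"CPU time = {iron_law_time(1e9, 1.5, 2e9):.2f} s")
for n in (4, 16, 64):
    print(f"n={n:3d}  Amdahl={amdahl_speedup(0.9, n):6.2f}  "
          f"Gustafson={gustafson_speedup(0.9, n):6.2f}")
```

With p = 0.9 and n = 16, Amdahl's fixed-size speedup is 6.4 and can never exceed 1/(1 − p) = 10, while Gustafson's scaled speedup is 14.5. The difference illustrates why the two laws lead to different conclusions about large core counts.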

WEEK 2 – Control Hazards
Topics:
• Misprediction Penalties
• Branch Prediction Techniques
• Two-level Correlation Predictors: PAg, GAg
• Hybrid Predictors
• Return Address Stack
• Loop Prediction
• Understanding Code Execution and Coding Practices for Branch Prediction
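The 2-bit scheme named in the course outline can be modeled in a few lines. This is a minimal sketch of a bimodal predictor (a table of 2-bit saturating counters indexed by low-order PC bits), not the Week 2 course code. The table size, the initial counter value, and the branch trace are assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Bimodal predictor: a table of 2-bit saturating counters indexed by low PC bits.
    Counter values 0-1 predict not-taken; 2-3 predict taken."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)   # assumption: start weakly not-taken

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)   # saturate at strongly taken
        else:
            self.table[i] = max(0, self.table[i] - 1)   # saturate at strongly not-taken


def accuracy(predictor, trace):
    """Predict-then-update over a trace of (pc, taken) pairs; return the fraction correct."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: one loop branch at PC 0x400, taken 9 times then not taken, run 10 times.
trace = [(0x400, i % 10 != 9) for i in range(100)]
print(f"accuracy = {accuracy(TwoBitPredictor(), trace):.2f}")
```

On this trace the predictor reaches 0.89 accuracy. After warm-up it mispredicts only the loop exit, because a single not-taken outcome moves the counter from 3 to 2, which still predicts taken on the next loop entry.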

WEEK 3 and WEEK 4 – Base Cache Memory, Dynamic Execution, and Superscalar Model
Topics:
• Cache memory characteristics
• Instruction Level Parallelism (ILP)
• Out-of-order execution – common methods used to improve the performance of out-of-order processors, including register renaming and memory disambiguation
• Common issues for superscalar architectures
• Kinds of architectures for out-of-order processors

WEEK 5 and WEEK 6 – VLIW, EPIC, and ILP Compiler Optimizations for Architectures
Topics:
• Traditional Compiler Optimization: Peephole, Loop Unrolling, Inter-procedural, and Inlining
• Compiler Optimization for Instruction Level Parallelism (ILP) and Profile-Directed Techniques
• Out-of-order execution – common methods used to improve the performance of out-of-order processors, including register renaming and memory disambiguation

WEEK 7 – Multicore Architectures and Vector/Multimedia Instruction Sets
Topics:
• Simultaneous multithreaded (SMT) architectures
• SMT Architecture Alternatives
• SMT architecture: OS impact and adaptive architectures
• Multi-core Architectures
• Single Instruction Multiple Data (SIMD)
• Intel Architecture Development: MMX, SSE
• Inline Assembly and Assembly Intrinsics

WEEK 8 through WEEK 13 – Graphics Processing Unit (GPU) Architecture
Topics:
• NVIDIA CUDA/GPU Programming Model
• GPU Hardware and Parallel Communication
• GPU Fundamental Parallel Algorithms
• Optimizing GPU Programs
• The Frontiers and Future of GPU Computing
• OpenCL – Open Computing Language
• Mobile GPU System Architecture Exploration: NVIDIA TX1

WEEK 14 – Runtime Optimization and Compilation
Topics:
• Dynamic compilation and code translation

IV. Learning Outcomes

A student who has successfully completed this course should be able to:
1. Analyze various performance characteristics of a computer system.
2. Apply digital design techniques to the microarchitecture construction of a processor.
3. Translate assembly language programs to and from high-level language code and algorithms.
4. Analyze hardware and software trade-offs to design the instruction set architecture (ISA) interface.
5. Understand advanced issues in the design of computer processors, caches, and memory.
6. Analyze performance trade-offs in computer design.
7. Apply knowledge of processor design to improve performance in algorithms and software systems.
8. Acquire experience with tools for statistical analysis of instruction set trade-offs.
9. Gain the ability to develop parallel GPGPU solutions in CUDA and OpenCL.

V. Required Text and Materials

Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th or later edition (4th edition: ISBN-13: 978-0123704900, ISBN-10: 0123704901). This is the main textbook for the class.

VI. Assessment & Assignments

Assignments: The following programming assignments are scheduled:
• Pin – Binary instrumentation tool to analyze program behaviors
  o Choice of branch prediction or cache design simulation
• CUDA programming – Vector addition
• CUDA programming – Histogram generation
• CUDA programming – Image filtering
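For the cache-design option of the Pin assignment, the core of a simulator is the mapping from an address to a cache line. The sketch below is a minimal, hypothetical direct-mapped cache model driven by a synthetic address list. In the assignment the addresses would come from a Pin tool; the cache size, block size, and trace here are assumptions, not assignment parameters.

```python
class DirectMappedCache:
    """Direct-mapped cache model: each block maps to exactly one line (index = block mod lines)."""

    def __init__(self, size_bytes=1024, block_bytes=16):
        self.block_bytes = block_bytes
        self.num_lines = size_bytes // block_bytes
        self.tags = [None] * self.num_lines   # None marks an invalid (empty) line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up one byte address; return True on a hit, False on a miss."""
        block = addr // self.block_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag   # miss: fill the line, evicting whatever block was there
        self.misses += 1
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical trace: sequential 4-byte reads over a 512-byte array, swept twice.
cache = DirectMappedCache(size_bytes=1024, block_bytes=16)
for _ in range(2):
    for addr in range(0, 512, 4):
        cache.access(addr)
print(f"hits={cache.hits} misses={cache.misses} hit rate={cache.hit_rate():.3f}")
```

The 512-byte array fits in the 1 KB cache, so only the 32 compulsory misses occur (hit rate 0.875). Addresses that differ by a multiple of the cache size, such as 0 and 1024, map to the same line and keep evicting each other. That conflict-miss behavior is what set associativity is designed to reduce.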

Reading Assignments: Several technical papers (conference proceedings, journal articles, and technical reports) are assigned throughout the semester. Reading technical papers in the field of computer architecture is imperative to understanding future directions in the field. Assignments will require students to write brief overviews or answer technical questions about the assigned papers. Subject matter from the reading assignments is likely to be covered in exams.

Final Exam: There will be a take-home final exam that covers the concepts of the course. The exam problems are closely related to the lectures, homework assignments, and assigned readings. The final exam will be cumulative, covering all subject topics.

Final Project: There will be a project for you to work on as an individual or in a group of two people. The project will count as 15% of your grade and will be a significant amount of work. The assignment is to extend the semester project or to analyze some interesting data or a new architecture feature. As a second option, students may write a survey paper instead of the project. The project will be divided into several milestones, one of which is a presentation of the work. Details about the project and schedule will be announced later in the semester.

Basis for Final Grade

Students' grades will be assessed based on their completed homework, quizzes, project, in-class exams, and the final exam. Homework assignments are designed to provide active learning by exercising the various topics covered in the course. Exams will be designed to assess each student's mastery of the different topic areas and their aptitude in each of the learning outcomes. The percentage given to each assessment method is shown in Table 1.

Table 1. Grade Assessment

Assessment                   % of Final Grade
Reading Assignments          10%
Assignments & Checkpoints    40%
Project                      20%
Final Exam                   30%
Total                        100%
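The final grade is a weighted sum of the Table 1 categories. The short sketch below shows that arithmetic. The category scores are hypothetical, and the quizzes and in-class exams mentioned above are not modeled separately because Table 1 does not list them.

```python
# Weights from Table 1, as fractions of the final grade.
WEIGHTS = {
    "Reading Assignments": 0.10,
    "Assignments & Checkpoints": 0.40,
    "Project": 0.20,
    "Final Exam": 0.30,
}


def final_grade(scores):
    """Weighted average of per-category scores (each on a 0-100 scale) using the Table 1 weights."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9   # weights must total 100%
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)


# Hypothetical category scores for one student.
example = {
    "Reading Assignments": 90,
    "Assignments & Checkpoints": 85,
    "Project": 80,
    "Final Exam": 75,
}
print(f"final grade = {final_grade(example):.1f}")
```

For the example scores, 0.10·90 + 0.40·85 + 0.20·80 + 0.30·75 = 81.5.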

Course Policies

Late Work Policy: Homework assignments must be turned in at the beginning of class; otherwise, they will be considered late. Work submitted from one second to 24 hours late will have its score reduced by a 20% penalty.

Student Honor Code: Students should be familiar with the College of Engineering and Applied Sciences student honor code. All honor code rules will be adhered to in this class.

Appointments: Students are encouraged to make at least one appointment with the professor during the semester. Appointments can be made by email. Exploring research opportunities, expressing concerns, offering suggestions, and seeking advice are among the welcome topics.