My parallel universe

1 Mitt Parallella Universum Latinoware-2014 Andreas Olofsson [email protected] Twitter: @adapteva

Transcript of My parallel universe

1

Mitt Parallella Universum

Latinoware-2014

Andreas Olofsson [email protected]

Twitter: @adapteva

2

The Prologue...

In 2008 I quit my job to launch a chip startup with the goal of boosting processor energy efficiency by 25X.

3

25X: Equivalent to driving from Patagonia to Alaska on one tank of gas!

4

Why?

5

Peak CPU Performance is Stalling!

6

Real CPU performance is stalling!

7

...and it gets worse!

8

So What?

9

Communication

Robotics

IoT

Datacenters/HPC

Life without Moore is boring!

10

The “Epiphany”

11

The Epiphany Manycore Architecture

RISC SRAM

Router DMA

No Caches, No Standards, No MMU, No Legacy... No Power

12

Epiphany-III Power-up (2011): Success!!

Happy but tired Pappa!

13

The market reception....

13

• Ambric • Asocs • Aspex • Axis Semi • BOPS • Boston Circuits • Brightscale • Chameleon • Clearspeed

• PACT • Picochip • Plurality • Quicksilver • Rapport • Recore • Sandbridge • SiByte • TILERA

• SiCortex • Silicon Hive • Spiral Gateway • Stream Processors • Stretch • Venray • Xelerated • XMOS • ZiiLabs

How the $@%# will we program this thing??

14

There is no “C” of parallel programming

Erlang SystemC Intel TBB Co-Fortran Lisp Janus

Scala Haskell Pragmas Fortress Hadoop Linda

Smalltalk CUDA Clojure UPC PVM Rust

Julia OpenCL Go X10 Posix XC

Occam OpenHMPP ParaSail APL Simulink Charm++

Occam-pi OpenMP Ada Labview Ptolemy StreamIt

Verilog OpenACC C++Amp Rust Sisal Star-P

VHDL Cilk Chapel MPI MCAPI Java

15

The Problem(s)!

● Parallel programming is HARD!

● Productivity matters. Time is money

● <1% of developers know parallel programming

Technology doesn't move backwards!

16

The Obvious Answer:

Open Source

Collaboration!

17

Presenting “Parallella”

● Launched in September 2012 at $99 (now starting at $119)

● Open source SW/HW!

● Runs Linux (Ubuntu)

● Dual-core ARM A9 processor

● A sizable FPGA

● 1GB RAM, USB, HDMI, GigE

● 16 or 64 Epiphany coprocessor cores

● 50 Gbit/s IO, 25 or 100 GFLOPS

18

Parallella Mission and Principles

● Mission: To help make parallel computing ubiquitous

● Principles:

  ● Complete and open documentation

  ● Low cost

  ● Open source software

  ● Open standards

  ● Open source hardware (schematics, layout)

  ● Open collaboration:

http://github.com/parallella

http://forums.parallella.org

19

Some Perspective...

1993 CM-5

● 1024 processors

● 136 GFLOPS / 100 kW

● #1 in 1993 Top500 List

● Price: >$30M

2014 Parallella-64

● 66 processors

● 100 GFLOPS* / 5 W

● #1 in energy efficiency

● Price: $199*
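The comparison above can be reduced to a single energy-efficiency number. The sketch below only divides the figures quoted on the slide; the function name `gflops_per_watt` is made up for illustration, and the Parallella-64 values carry the slide's asterisks, so treat the result as a rough ratio rather than a benchmark.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return energy efficiency in GFLOPS per watt."""
    return gflops / watts

# Figures as quoted on the slide (hypothetical variable names).
cm5 = gflops_per_watt(136.0, 100_000.0)    # 1993 CM-5: 136 GFLOPS at 100 kW
parallella = gflops_per_watt(100.0, 5.0)   # 2014 Parallella-64: 100 GFLOPS at 5 W

print(f"CM-5:          {cm5 * 1000:.2f} MFLOPS/W")
print(f"Parallella-64: {parallella:.1f} GFLOPS/W")
print(f"Efficiency ratio: about {parallella / cm5:,.0f}x")
```

On these numbers the gap is roughly four orders of magnitude in GFLOPS per watt over about twenty years.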

20

Yes, but does it work?

21

25X: Size does matter...

Tianhe-2

● 33 PFLOPS

● $390M USD

● 24 MW

● Insanity!!!!

“There is STILL plenty of room at the bottom”

33 PFLOPS=~16 28nm Epiphany Wafers**
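To put the Tianhe-2 figures in per-watt terms, here is a small sketch that uses only the numbers on the slide. Applying the talk's 25X efficiency goal to this machine is an assumption made for illustration, not a claim from the slide, and all variable names are hypothetical.

```python
# Tianhe-2 figures as quoted on the slide.
tianhe2_gflops = 33e6   # 33 PFLOPS expressed in GFLOPS
tianhe2_watts = 24e6    # 24 MW expressed in watts
tianhe2_cost = 390e6    # USD

tianhe2_eff = tianhe2_gflops / tianhe2_watts            # GFLOPS per watt
tianhe2_usd_per_gflops = tianhe2_cost / tianhe2_gflops  # USD per GFLOPS

# Assumption: same performance at 25X better energy efficiency.
target_eff = 25 * tianhe2_eff
target_power_mw = tianhe2_gflops / target_eff / 1e6

print(f"Tianhe-2: {tianhe2_eff:.3f} GFLOPS/W, ${tianhe2_usd_per_gflops:.2f} per GFLOPS")
print(f"At 25X efficiency: {target_eff:.3f} GFLOPS/W, {target_power_mw:.2f} MW")
```

Under that assumption the same 33 PFLOPS would fit in just under 1 MW instead of 24 MW.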

22

Now What?

23

Parallella Research in 2014

● >10,000 Parallella boards shipped

● 200+ university collaborations

● $10K in hardware donated

● Active Research Areas:

  ● Computer science education

  ● Robotics/drones

  ● Software defined radio

  ● HPC

24

Parallella Universities in South America

Brazil:

● São Paulo State University

● CELTAB

● Federal University of Uberlândia

Argentina:

● Universidad Austral

● Universidad de Buenos Aires

● Universidad Nacional de La Plata

● Universidad Tecnológica Nacional

● Universidad Nacional de Córdoba

Chile:

● Universidad Mayor

Colombia:

● Universidad Industrial de Santander

● Pontificia Universidad Javeriana

25

Some Parallella Lessons

● Openness is more important than cost

● You CAN build hardware at a profit outside China. We did it!

● Collaboration is VERY hard work

● Time is our devs' most precious resource

● Ease of use wins over performance every time (simplicity + docs + support)

26

How we benefited from open source

● As consumers:

  ● Linux, U-boot, Ubuntu, Beaglebone, Verilator

● As recipients:

  ● Eclipse Multicore IDE ($1M)

  ● OpenCL ($1M)

  ● Multicore Epiphany simulator ($50K)

  ● Demos ($50K)

27

It is “your” responsibility to make pervasive parallel computing a reality!

Explorers

1. Create the tools to make parallel programming easier

2. Create algorithms that scale (Amdahl)

3. Create a universal parallel software stack

Teachers

1. Rewrite the computer science curriculum

2. Retrain 20M programmers
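Item 2 under Explorers points at Amdahl's law, which caps the speedup of a program by its serial fraction: speedup = 1 / ((1 - p) + p / n) for a parallel fraction p on n cores. A minimal sketch follows; the parallel fractions and core counts are values chosen for illustration, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Illustrative parallel fractions and core counts (assumptions).
for p in (0.90, 0.99, 0.999):
    for n in (16, 64, 1000):
        print(f"p={p:<5} n={n:<4} speedup={amdahl_speedup(p, n):7.1f}")
```

Even with 99% of the work parallelized, 1000 cores give only about a 91x speedup, which is why algorithms that shrink the serial part matter more than adding cores.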

28

The Future of HW: A Brief Summary

Constraint --> Result

Performance limits --> Massive parallelism

Thermal density --> Slow clocks (1MHz-1GHz)

Failure rate --> Distributed systems

Bandwidth --> No shared resources

Density --> 3D chip stacking

Efficiency --> Heterogeneous HW

Productivity --> Heterogeneous SW

Amdahl's law --> New algorithms

Development cost --> Open collaboration

Latency --> Open collaboration

29

Get ready now!!

● Critical code must scale in performance to 1000 threads

● You (or a tool) will manage memory in software

● Know where in the universe your bits are stored!

● The hardware will fail often. Can your SW handle it?

● The minimum number of languages is 2.
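"Managing memory in software" means the program, not a hardware cache, decides what sits in the small local memory next to each core. The sketch below mimics that pattern in plain Python by staging fixed-size tiles into a local buffer before computing on them. The 32 KB scratchpad size, the half-for-data split, and every name here are assumptions for illustration; this is not Epiphany SDK code.

```python
from array import array

LOCAL_BYTES = 32 * 1024                       # assumed per-core scratchpad size
ITEM_BYTES = array("f").itemsize              # bytes per single-precision float
TILE_ITEMS = LOCAL_BYTES // ITEM_BYTES // 2   # assumption: half the scratchpad holds data

def tiled_sum_of_squares(data: array) -> float:
    """Sum x*x over data, staging one tile at a time into a small local buffer."""
    total = 0.0
    for start in range(0, len(data), TILE_ITEMS):
        # Explicit copy into the "local" buffer; stands in for a DMA transfer.
        local = data[start:start + TILE_ITEMS]
        assert len(local) * ITEM_BYTES <= LOCAL_BYTES
        total += sum(x * x for x in local)
    return total

big = array("f", (float(i % 10) for i in range(100_000)))
print(tiled_sum_of_squares(big))
```

The point of the pattern is that the tile size is chosen by the programmer (or a tool) to fit the local store, so the code always knows where its working set lives.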

30

The Future is Heterogeneous

FPGA

● Irregular math

● IO

● Customization

CPU

● Legacy code

● 90% of LOC

● <100 GFLOPS

ASIC

● Makes comeback at end of Moore's Law

● Another 100X boost

Accelerators

● Math crunching

● Scalable

● >100 GFLOPS

31

16K-64K CPUs, 1MB/core (3D), ~20 TFLOPS, 0.2W-20W

64 CPUs, 32KB/core, 100 GFLOPS, 0.1W-2W

64 CPUs, 128KB/core, 80 GFLOPS (DPF), 0.1W-3W

1K CPUs, 128KB/core, ~1.2 TFLOPS, 0.4W-40W

By 2018 there WILL be 64K-core chips!

This is a new world. Without legacy, a great opportunity to do software right!

2013 2015 2015 2018

32

Getting your hands dirty

● Tomorrow: LAB2 from 10am-2pm

● Email: [email protected]

● Twitter: @adapteva