Accelerating Progress in Bioinformatics: Education, Technology, Infrastructure, and Application

Accelerating Progress in Bioinformatics: Education, Technology, Infrastructure, and Application OCCBIO 2007 Panel Discussion

OCCBIO 2007 Panel Discussion. Accelerating Progress in Bioinformatics: Education, Technology, Infrastructure, and Application. Hardware Accelerators for Bioinformatics. Anthony D. Johnson, The University of Toledo.

Transcript of Accelerating Progress in Bioinformatics: Education, Technology, Infrastructure, and Application

Page 1

Accelerating Progress in Bioinformatics: Education, Technology, Infrastructure, and Application

OCCBIO 2007

Panel Discussion

Page 2

Hardware Accelerators for Bioinformatics

Anthony D. Johnson

The University of Toledo

Page 3

Hardware Acceleration Processors

Accelerating Bioinformatics beyond microprocessors,

Inherent limitations of microprocessors,

already at the upper limit on clock speed,

limited degree of parallelism in hardware,

fixed hardware architecture not adapted to: FASTA, BLAST.

Acceleration Processors already developed using: high parallelism in hardware, mature technologies.

Page 4

Computational tasks in Bioinformatics

At the top abstraction level, most Bioinformatics tasks can be described as partial matching of character strings, or string patterns.

Bioinformatics research tasks are characterized by:

large data sets,

minimal dependency between data elements,

huge numbers of simple processing steps.
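The "partial matching" task can be illustrated with a minimal sketch. It assumes the simplest possible scoring, a count of mismatched characters per window (Hamming distance); FASTA and BLAST use richer heuristics, so this is not their algorithm. The function name `approximate_matches` and the sample strings are hypothetical.

```python
def approximate_matches(query, database, max_mismatches):
    """Return (offset, mismatches) for each database window within the mismatch budget."""
    hits = []
    m = len(query)
    for offset in range(len(database) - m + 1):
        # Each window is scored independently of all others, so the
        # iterations of this loop could run in parallel in hardware.
        window = database[offset:offset + m]
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


# Hypothetical toy data: one exact hit and one hit with a single mismatch.
hits = approximate_matches("ACGT", "TTACGAACGTTT", max_mismatches=1)
print(hits)
```

Each step is a simple character comparison, and no window depends on another, which matches the three characteristics listed on the slide.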

Page 5

Multiple platform computing systems

offer hardware parallelism in the form of: computer clusters, multi-core processors;

slow data transfer between: processors and shared memory, processors and distributed memory;

powering supercomputers is costly (up to 50MW).

Page 6

Field Programmable Gate Arrays

Very Large Scale Integration (VLSI) electronic components,

manufactured using cutting-edge VLSI technologies,

architectures completely application-independent,

configurable functionality:

after the manufacturing of the FPGA has been completed,

on the lowest functional module level,

"on the fly", while running an application,

vast configurable interconnect resources,

massive hardware parallelism,

power consumption significantly below that of microprocessors.

Page 7

FPGA Architectures

Arrays of identical configurable Logic Circuit Modules (LCMs)

vendors refer to proprietary LCMs by different names:

Logic Module (Actel),

Logic Array Block (Altera),

Configurable Logic Block (Xilinx);

Architectures loosely classified by the size of LCMs:

fine-grain,

coarse-grain,

mixed-grain.

Page 8

Virtex-2 LCM

Includes:

four slices,

one switch matrix.

Page 9

Virtex-2 Half-Slice Architecture

Page 10

Virtex-2 slice Configurations

Look Up Table,

Shift Register,

Memory.
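The Look Up Table configuration can be modeled in software as a small truth table indexed by the input bits. The sketch below assumes a 4-input table, i.e. a 16-bit configuration word; `make_lut4` and the two configuration words are illustrative names and values, not vendor tooling. The shift register and memory configurations are not modeled.

```python
def make_lut4(truth_bits):
    """Build a 4-input lookup table from a 16-bit configuration word."""
    if not 0 <= truth_bits < (1 << 16):
        raise ValueError("configuration word must fit in 16 bits")

    def lut(a, b, c, d):
        # The four input bits select one of the 16 stored output bits.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (truth_bits >> index) & 1

    return lut


# Same "hardware", two different functions, chosen only by configuration:
and4 = make_lut4(0x8000)  # output is 1 only when all four inputs are 1
xor4 = make_lut4(0x6996)  # output is 1 when an odd number of inputs are 1

print(and4(1, 1, 1, 1), xor4(1, 1, 1, 0))
```

Changing the configuration word changes the logic function without changing the circuit, which is the sense in which the functionality is configurable after manufacturing.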

Page 11

FPGA Architectures - continued

modern FPGAs feature a variety of specialized modules:

adder circuitry,

multiplier arrays,

memory blocks,

whole microprocessors,

blocks of fast I/O ports,

Digital Delay Loops for clock skew and frequency management.

Page 12

Virtex-2 Memory and Multipliers

Close coupling between:

memory and multiplier blocks.

Page 13

FPGA Characteristics

variable number of LCMs - 4k to 200k,

variable number of I/O connection pads (in excess of 1400),

speed grade,

package type.

Page 14

Other Candidates for Acceleration Processors

General Purpose Graphics Processing Units (GPGPUs)

stream processors (stream = set of records which require similar computation)

GPUs enhanced for supporting some other FP applications,

extreme parallelism of the GPU pipeline makes them suitable for applications with: large data sets, high parallelism, minimal dependency between data elements,

may lack reliability needed in scientific HPC.
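The stream model named above can be sketched as one kernel applied to every record of a stream. This is a plain-Python illustration, not GPU code; the kernel `gc_fraction`, the helper `run_kernel`, and the sample records are hypothetical.

```python
def gc_fraction(sequence):
    """Kernel: fraction of G or C characters in one record."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base in "GC") / len(sequence)


def run_kernel(kernel, stream):
    # No record depends on another, so a stream processor is free to
    # process the records in any order, or all at once.
    return [kernel(record) for record in stream]


stream = ["GGCC", "ATAT", "ACGT"]
results = run_kernel(gc_fraction, stream)
print(results)
```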

Page 15

Other Candidates for Acceleration Processors

Cell Processor by IBM, Toshiba & Sony

designed for PlayStation’s enhanced graphics processing,

contains one scalar and eight vector processors.

Vector processors:

pipeline both the instruction execution and the data I/O,

have a degree of superscalar implementation, i.e. parallelism in hardware.

Page 16

Other Candidates for Acceleration Processors

Array Processor by ClearSpeed

Multithreaded Array Processor for floating point operations: 64 processing elements in an 8x8 array, FP unit, local memory, 384KB SRAM, I/O ports;

programmable only in C language.

Page 17

High-Level Programming Languages (HLLs)

Every researcher is fluent in one HLL,

one HLL is included in all Bachelor Degree curricula,

HLLs’ characteristics:

semantics and syntax are oriented to the application for which the language was originally developed,

hundreds have been developed with some goal in mind; only a small percentage is still in use.

Page 18

HLLs - continued

algorithms described by consecutive operations,

require translation to a lower level language,

translation is platform/compiler dependent,

source code is nominally portable.

Page 19

Hardware Description Languages (HDLs)

Only a few engineering students learn an HDL,

HDLs’ characteristics:

designed to describe hardware,

intended originally for hardware simulation,

describe algorithms by concurrent operations on electrical signals.
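The difference between consecutive and concurrent operations can be shown with a small software model. The sketch below loosely mirrors how signals in a clocked hardware description all update together from their previous values; it is a Python illustration, not HDL, and the names `sequential_step` and `concurrent_step` are hypothetical.

```python
def sequential_step(state):
    """HLL-style: statements execute one after another."""
    s = dict(state)
    s["a"] = s["b"]  # executes first
    s["b"] = s["a"]  # sees the already updated value of a
    return s


def concurrent_step(state):
    """HDL-style: every new value is computed from the old state,
    then all signals change at the same time."""
    return {"a": state["b"], "b": state["a"]}


start = {"a": 1, "b": 0}
print(sequential_step(start))  # both end up equal, the swap is lost
print(concurrent_step(start))  # values are exchanged
```

The same two assignments give different results under the two models, which is one reason the transition from HLLs to HDLs requires re-education.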

Page 20

HDLs - continued

there are two standardized HDLs:

VHDL, developed under a grant from the DOD,

VHDL = VHSIC HDL,

VHSIC = Very High Speed Integrated Circuits,

Verilog, which originated as a product of a single company.

Page 21

Configuring the FPGAs

After an HDL code has been extensively simulated:

1. synthesis tool converts the code into a net-list form; because HDLs were originally intended for simulation, NOT all HDL constructs are synthesizable!

2. vendor tools map the net list onto the FPGA's hardware,

3. designer specifies mapping constraints to meet the timing requirements,

4. extensive verification after each step in the process.

Page 22

Hybrid HLL/HDL Languages

Incorporate the hardware component into an HLL,

tend to bear resemblance to the C-language,

their names predominantly contain the character C,

aim to imply that the power of acceleration processors is at the fingertips of researchers,

re-education still needed to transition from sequential to parallel programming.

Page 23

Hybrid HLL/HDL - continued

development of libraries of common functions crucial for adoption,

some vendors provide tools for conversion from Fortran and C to Hybrid-C;

After the Hybrid-C code has been obtained:

all steps listed for HDLs are necessary,

vendors are working on automating the steps,

automation implies artificial intelligence solutions.

Page 24

A sample of Hybrids and tools for converting C code to HDL

C2H ChiMPS Catapult-C

Handel-C HARWEST Carte

Mitrion-C SystemC Chapel

ROCCC Impulse-C DIME-C

Page 25

Computing platforms with FPGA acceleration processors

General trend: platforms with hybrid processing resources.

Means for connecting FPGAs to microprocessors: main processor bus, memory slot, I/O slot, HTX expansion bus slot.

Page 26

One Shared Architecture

Two Opteron processor sockets on a board,

sockets linked by a high-speed HyperTransport (HTX) bus,

one socket contains an Opteron – for control tasks,

the other holds an FPGA card – for intensive data-processing,

examples: XtremeData’s XD1000 FPGA Coprocessor Module, DRC’s Reconfigurable Processing Unit, RPU110-L200.

Page 27

Other Interesting Architectures

Systems using two processors and one HTX slot: IBM x3455, HP DL145 Server.

Other architectures:

Cray XD1 cluster supercomputer with proprietary extension modules, first couple: Opteron - FPGA - for processing, second couple: Opteron - FPGA - for communication;

Celoxica’s RCHTX high-performance computing board plugs into an HTX slot,

SRC Computers plug MAP processor into a memory slot.

Page 28

Other significant platforms

Mitrion Virtual Processor: runs on Virtex-4 FPGAs on the SGI RASC RC100 compute blades, runs commercially available BLAST application;

SGI Altix family servers are equipped with SGI RASC RC100 computation blades,

Nallatech System uses: BenONE FPGA-based computing card on H100 Series platform,

Cray XT Super Computers: will replace the XD1 line, will use a DRC’s extension module with Virtex-4 FPGAs.

Page 29

Benchmarks of Bioinformatics Algorithms

FASTA running on the Cray XD1 System,

benchmarking results show unprecedented FPGA speedups [2].

Demonstrating substantial benefits to the users generates the traction for a breakthrough into mainstream HPC markets.

Page 30

Micro-RNA Comparison benchmark

query sequences: 3685 sequences (~20 characters),

database file: the first of all 24 human genome chromosomes,

Platform 1: Cray XD1, using one Virtex2 Pro 50 FPGA,

Speedup vs. Opteron: 10X

Page 31

Bacillus_anthracis DNA Comparison

query sequences: AE017024 through AE017041 (300K characters per sequence),

database file: AE016879 (more than 5M characters),

Platform 1: Cray XD1, using one Virtex2 Pro 50 FPGA, speedup vs. Opteron: 50X

Platform 2: Cray XD1, using one Virtex4 LX160 FPGA, speedup vs. Opteron: 100X (from 8 hours down to 5 minutes)

Platform 3: Cray XD1, using five Virtex4 LX160 FPGAs, speedup vs. Opteron: 500X [3]

Speedup scales linearly with the number of FPGAs!
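The quoted figures can be checked with simple arithmetic. The sketch below uses only numbers from the slide; the function names are hypothetical, and linear scaling is taken as the slide's claim, not derived here.

```python
def speedup(baseline_minutes, accelerated_minutes):
    """Ratio of baseline run time to accelerated run time."""
    return baseline_minutes / accelerated_minutes


def linearly_scaled(single_fpga_speedup, fpga_count):
    """Speedup expected if it grows in proportion to the number of FPGAs."""
    return single_fpga_speedup * fpga_count


# 8 hours on the Opteron versus 5 minutes on one Virtex4 LX160.
one_fpga = speedup(8 * 60, 5)
print(one_fpga)                 # 96.0, which the slide rounds to 100X
print(linearly_scaled(100, 5))  # 500, the figure quoted for five FPGAs
```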

Page 32

Amino Acid Search

query sequences (OpenFPGA): ras (60 characters), myc (189 characters), sec (351 characters),

database file: 24 human genome chromosomes translated into amino acids,

Platform 1: Cray XD1, using one Virtex-II Pro 50,

Speedup vs. Opteron: 20X to 50X

Speedup increases with the sequence length.
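Translating a DNA database into amino acids can be sketched with the standard genetic code. This is a minimal single-frame illustration and makes no claim about how the benchmark prepared its database (for example, how many reading frames it used); the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one reading frame of a DNA string into amino-acid letters."""
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        # Codons with unexpected characters (such as N) become "X".
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Every codon is translated independently, so this step has the same minimal data dependency as the matching itself.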

Page 33

Other Benchmark Outcomes

After the 100X and 500X speedups by FPGA-acceleration processors on XD1:

all other reported speedups are less significant,

few researchers are likely to settle for 10X to 20X,

all vendors will scramble to catch up with Cray,

Bioinformatics should hope for a much better HPC landscape in the near future.

Page 34

Recent References

[1] Tripp, J.L., Gokhale, M.B., Peterson, K.D.: Trident: From High-Level Language to Hardware Circuitry, IEEE Computer, March 2007, pp. 28-37.

[2] Storaasli, O., Yu, W., Strenski, D., Maltby, J.: Evaluation of FPGA-Based Biological Applications, CUG 2007.

[3] Storaasli, O., ORNL’s Future Technologies Group - personal communication.

[4] Lazou, C.: FPGAs in HPC Landscape, EnterTheGrid - PrimeurMonthly, 2007.

Page 35

Acknowledgment: Results courtesy of:

On research supported by the LDRD Program of ORNL for the U.S. Department of Energy under Contract DE-AC05-00OR22725

Page 36

Computer Architectures

Jargon is living, changing, confusing:

Von Neumann architecture is defined by basic hardware subsystems and their interaction,

Page 37

Computer Architectures - continued

in the current jargon, architecture is defined by the instruction set of the processor:

x86 = IA-32: Pentium, Athlon,

IA-64: Itanium, 2001,

PowerPC: PPC620, IBM,

SPARC: Sun,

PA-RISC: PA8700, Hewlett-Packard;

microarchitecture determines how the processor executes instructions.

Page 38

Microprocessor Architectures: Pipelines

Still basic Von Neumann architecture,

architectures of subsystems have undergone extensive development,

first to achieve one instruction per clock cycle, exploiting instruction-level parallelism (ILP),

Pipelining:

executes in parallel different parts of multiple instructions on different parts of the pipelined processor's hardware.
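The benefit of pipelining can be estimated with an idealized cycle count. The sketch assumes every stage takes exactly one clock cycle and nothing ever stalls, which real processors do not achieve; the function names and the 5-stage, 1000-instruction example are hypothetical.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """The first instruction fills the pipeline; after that one completes per cycle."""
    return stages + instructions - 1


n, k = 1000, 5
print(unpipelined_cycles(n, k))  # 5000
print(pipelined_cycles(n, k))    # 1004
print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))
```

For long instruction sequences the ratio approaches the number of stages, which is the sense in which a pipeline approaches one instruction per clock cycle.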

Page 39

Microprocessor Architectures: Threading

Execution of multiple instructions per clock cycle, exploiting parallelism at the level of independent code sequences by:

threading, which uses multiple pipelines to execute multiple instruction sequences concurrently.

Neither pipelining nor threading is always a clear winner,

pipelining speedup is eroded by hazards which stall execution of an instruction:

structural hazards: consequence of lack of hardware parallelism,

data hazards: consequence of instruction dependencies,

control hazards: consequence of branches which change the program counter (PC).
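How hazards erode the speedup can be sketched with the common textbook approximation: speedup equals pipeline depth divided by one plus the average stall cycles per instruction. The sketch assumes perfectly balanced stages and an ideal rate of one instruction per cycle; `pipeline_speedup` and the sample numbers are hypothetical.

```python
def pipeline_speedup(depth, stall_cycles_per_instruction):
    """Approximate speedup over an unpipelined processor when hazards
    add stall cycles on top of the ideal one cycle per instruction."""
    if stall_cycles_per_instruction < 0:
        raise ValueError("stall cycles cannot be negative")
    return depth / (1 + stall_cycles_per_instruction)


print(pipeline_speedup(5, 0.0))   # 5.0, no hazards
print(pipeline_speedup(5, 0.25))  # 4.0, one stall every four instructions
print(pipeline_speedup(5, 1.0))   # 2.5, one stall per instruction
```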

Page 40

Microprocessor Architectures - Summary

In the end:

fixed architectures,

compromise between demands of different applications,

programming using HLLs,

compilers adapt processing to the instruction set,

speed is limited, and paid for, by power dissipation.

powering supercomputers takes up to 50MW.