Using the Starbridge Systems FPGA-based Hypercomputer for Cancer Research
Experiences of a computational chemist/biologist
Jack Collins, Ph.D.
Advanced Biomedical Computing Center
SAIC/National Cancer Institute
Frederick, MD
Collins E195/MAPLD2004
Motivation
• New technologies in proteomics, genomics, and imaging are providing more data and challenging the conventional wisdom of biologists. Computational biologists must develop more realistic and precise models of biological systems at the cellular and network levels to help make sense of this new data.
  – Biomedical research needs to measure performance in “Heartbeats to Solution”
Biological Applications
• Systems Biology
  – Correlated networks of cells and biological processes
  – Reaction pathways/cascades
• Properties of cell/bacterial/viral populations (Biodefense)
  – Bacterial virulence factors
    • Generating diversity by changing immune signature
  – Environmental adaptation of cells/pathogens
  – Drug resistance (Cancer, HIV, Bacteria)
• Nano-systems/nano-technology
  – Statistical fluctuations must be included in models
  – Single cells: essentially nano-systems
    • Machinery within cells/nucleus
    • Non-equilibrium dynamics
Cellular Processes (Examples)
• DNA Replication
  – Interactions with proteins and small molecules
• Transcription Factors
  – Gene Regulation
• RNA
  – Editing, interference, protein synthesis
• Regulatory feedback
  – Kinase Pathways/Cascades
Modeling Reactions/Pathways
Discrete Processes
• Must study populations of cells/molecules, but the mean behavior is dependent on the states of the individual entities.
  – Low copy number of cells/molecules
  – Variation in copy number
  – Relatively slow reaction rates
  – Varied conditions/environments
  – “Activation potential” to reaction
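The point about low copy numbers can be illustrated with a small stochastic simulation. The sketch below is not from the talk: it assumes a single first-order reaction A -> B, uses a Gillespie-style event loop, and picks the copy number, rate constant, and end time arbitrarily. It compares the spread across many simulated "cells" with the deterministic mean.

```python
import math
import random

def simulate_decay(n_a, rate, t_end, rng):
    """Gillespie-style simulation of A -> B; returns copies of A left at t_end."""
    t = 0.0
    while n_a > 0:
        propensity = rate * n_a           # total rate of the next A -> B event
        t += rng.expovariate(propensity)  # waiting time until that event
        if t > t_end:
            break
        n_a -= 1                          # one molecule of A converts to B
    return n_a

rng = random.Random(0)                    # fixed seed so runs are repeatable
n0, k, t_end = 10, 1.0, 1.0               # hypothetical values, not from the talk
cells = [simulate_decay(n0, k, t_end, rng) for _ in range(2000)]

mean = sum(cells) / len(cells)
deterministic = n0 * math.exp(-k * t_end)  # solution of dA/dt = -k*A
print(f"deterministic mean: {deterministic:.2f}")
print(f"stochastic mean:    {mean:.2f}")
print(f"range across cells: {min(cells)} to {max(cells)}")
```

The population mean tracks the deterministic value, but individual cells end up with quite different copy numbers, which is the behavior a mean-only model cannot show.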
Simulation Methods
• Stochastic Simulations
  – Deterministic modeling
    • Describes the mean behavior of large numbers, but biological components are often present in small numbers
  – Fluctuations are important
• Boolean Networks
  – Lack of experimental rate constants
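A Boolean network needs only on/off update rules, which is why it remains usable when rate constants are unavailable. The following is a minimal sketch under assumed rules: the three-gene circuit, its names, and its wiring are hypothetical and do not come from the talk.

```python
from itertools import product

# Hypothetical three-gene circuit (names and rules are illustrative only):
#   A is switched on by an external signal and shut off by C
#   B follows A
#   C follows B, closing a negative feedback loop onto A
def step(state, signal=True):
    """Synchronously update all three genes from the previous state."""
    a, b, c = state
    return (signal and not c, a, b)

def find_cycle(state):
    """Iterate until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

for start in product([False, True], repeat=3):
    cycle = find_cycle(start)
    print(start, "-> cycle of length", len(cycle))
```

All genes are updated from the same previous state, so every node could in principle be evaluated in parallel on each clock tick.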
Why use FPGAs?
• Current Computational Limitations
  – Can only model relatively modest systems
• Computational Efficiency
  – Inherent parallelism in molecular reactions
• Scalability
  – Use multiple FPGAs to simultaneously model hundreds of reactions
• Looking to Future
  – Computational power rapidly growing
  – Price/Performance
Smith-Waterman Update
(Proof of Concept)
• Total # Operations / Second
  – 1 Smith-Waterman step includes:
    • 25 Logic Operations (adds, compares; mostly 26-27 bit ops, some single-bit ops)
    • 13 Data Reorder Operations (Move, Combine…)
    • 11 Data Store Operations (Assignment)
  – Logic Operations Only:
    • 25 Ops * 25 MHz * 448 Smith-Waterman kernels = 280 billion operations/second
  – Logic & Data Operations:
    • 49 Ops * 25 MHz * 448 Smith-Waterman kernels ≈ 550 billion operations/second
• Total Aggregate Communications Bandwidth of Systolic Array
  – 12 * 88 * 25 MHz = 26.4 Gb/s, plus 7 * 22 * 50 MHz = 7.7 Gb/s, for a total of 34.1 Gb/s
• Resources Consumed / Resources Available
  – PE2 to PE7: 60% to 70% consumed
  – PE1: 20% consumed; XPE: 5%; XPR: 0.1%
• DMA transfer between host PC and FPGAs
  – Initial results: 210 Mb/sec (FPGA -> x86)
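The throughput and bandwidth figures on this slide can be re-derived directly from the stated factors. The short check below uses only the numbers given above; the variable names are mine, and the slide does not say what the individual bandwidth factors (12, 88, 7, 22) stand for.

```python
CLOCK_HZ = 25_000_000   # 25 MHz clock used in the operation counts
KERNELS = 448           # Smith-Waterman kernels running in parallel

logic_ops = 25          # logic operations per Smith-Waterman step
reorder_ops = 13        # data reorder operations per step
store_ops = 11          # data store operations per step

logic_rate = logic_ops * CLOCK_HZ * KERNELS
total_rate = (logic_ops + reorder_ops + store_ops) * CLOCK_HZ * KERNELS

# Bandwidth terms exactly as written on the slide (factor meanings not given)
bw_first = 12 * 88 * 25_000_000
bw_second = 7 * 22 * 50_000_000
bw_total = bw_first + bw_second

print(f"logic only:   {logic_rate / 1e9:.1f} billion ops/s")
print(f"logic + data: {total_rate / 1e9:.1f} billion ops/s")
print(f"bandwidth:    {bw_first / 1e9:.1f} + {bw_second / 1e9:.1f} "
      f"= {bw_total / 1e9:.1f} Gb/s")
```

The combined logic-and-data figure comes to 548.8 billion operations per second, which the slide rounds to 550 billion, and the second bandwidth term works out to 7.7 Gb/s, which is what the 34.1 Gb/s total requires.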
Smith-Waterman (cont.)
See Poster by Jim Yardley, SBS
• Opportunities to further optimize the algorithm include:
  – Increasing the number of SW_Iterations that can be done in parallel (up to 100 Billion Smith-Waterman steps/second)
  – Increasing the clock speed of the hardware (up to 1 Trillion Smith-Waterman steps/second)
  – Friendlier User Interface
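For readers unfamiliar with the algorithm, a "Smith-Waterman step" is the update of one cell of the alignment matrix from its three neighbors. The sketch below is the standard textbook recurrence with a linear gap penalty, written in software; the slides do not say which scoring scheme or gap model the Viva implementation uses, so the scores here are assumptions.

```python
def sw_cell(diag, up, left, a, b, match=2, mismatch=-1, gap=-1):
    """One Smith-Waterman step: score of a cell from its three neighbors."""
    substitution = match if a == b else mismatch
    return max(0, diag + substitution, up + gap, left + gap)

def smith_waterman(seq1, seq2):
    """Best local alignment score, filling the matrix row by row."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    h = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = sw_cell(h[i - 1][j - 1], h[i - 1][j], h[i][j - 1],
                              seq1[i - 1], seq2[j - 1])
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. That independence is what allows many kernels in a systolic array to perform steps in the same clock tick.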
Viva Environment
• VIVA GRAPHICAL LANGUAGE
  – Capture natively parallel code
  – Accommodate data of any type, size, or precision
  – Tune algorithms for speed of execution or conservation of hardware resources
• VIVA EDITOR
  – Call Viva algorithms from legacy code such as C, C++, or Fortran
  – Interactively debug code
  – Import/Export EDIF files
• VIVA COMPILER/SYNTHESIZER
  – Program multi-million gate designs
  – Compile hardware designs quickly for efficient development
• VIVA LIBRARIES
  – Reuse flexible Viva objects which accept any data type or size
  – Target any hardware platform with a ‘System Description’
  – Prototype Viva on any x86-based Windows machine
Viva as a Modeling Language?
• Programming FPGAs has generally been the domain of engineers.
• Viva
  – “Pseudo-graphical language”
  – Map Model to Viva
  – Inherent parallelism of Model can map to FPGAs
  – Recursion of model
  – Document Code
  – Use the underlying elements of Viva™ to create an environment that the bio-informatician/computational biologist can use to program the FPGA hardware
  – Build Library Elements/Modules specific to Model
Libraries for Biology/Biochemistry
• Known Reaction Processes
• Conditional Elements to relate the reactions to each other
• Outputs to visualize the reactions
• Built-in Infrastructure for handling I/O
• Minimize Learning Curve for Modeling Biological Processes
Ease of Programming?
Library Creation
• Examples of simple reactions programmed in Viva by a relatively novice user over a few days.
  – A B
  – A B
  – AB
  – A+BC
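To give a feel for what a reusable reaction library element might look like, here is a software sketch in Python rather than Viva. The reaction arrows on the slide did not survive in this transcript, so the three reactions below (a one-way conversion, its reverse, and a bimolecular association) are assumed stand-ins, and the `Reaction` class, rate constants, and starting concentrations are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reaction:
    reactants: tuple   # species consumed, e.g. ("A", "B")
    products: tuple    # species produced, e.g. ("C",)
    rate: float        # hypothetical rate constant

    def flux(self, conc):
        """Mass-action rate: rate constant times reactant concentrations."""
        v = self.rate
        for s in self.reactants:
            v *= conc[s]
        return v

def euler_step(conc, reactions, dt):
    """Advance all concentrations by one explicit Euler step."""
    new = dict(conc)
    for r in reactions:
        v = r.flux(conc) * dt      # every flux is read from the old state
        for s in r.reactants:
            new[s] -= v
        for s in r.products:
            new[s] += v
    return new

# Library elements composed into a small model (all values are made up)
conversion = Reaction(("A",), ("B",), rate=1.0)        # A -> B
reverse = Reaction(("B",), ("A",), rate=0.5)           # B -> A
association = Reaction(("A", "B"), ("C",), rate=0.2)   # A + B -> C

conc = {"A": 1.0, "B": 0.0, "C": 0.0}
for _ in range(1000):
    conc = euler_step(conc, [conversion, reverse, association], dt=0.01)
print({s: round(c, 3) for s, c in conc.items()})
```

Because every flux is computed from the previous state before any concentration is updated, the reactions in one step are independent of each other, which mirrors the kind of parallelism the talk aims to exploit on FPGAs.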
Programming Style
Program Design
• Multiple ways to package logic
  – No Unique Solution
• Which is “best” depends on “user”
  – Simplicity vs. Functionality
  – Ease of Debugging
  – Ease of Documenting
Output Interfaces
• Efficient Computation of the Model is Useless if you can’t see the results
• Interface into COM objects
• Integrate Data Analysis and Visualization
Lessons Learned
• Timing is Everything!
  – Complexity of building large systems of reactions means that both efficiency (minimize clock ticks) and stability of computation (consistent results by keeping latency and synchronization in check) must be considered in a general system.
• Many ways to package the logic
  – Not all are equal!
  – Simplicity vs. Functionality
  – Document your Code!
  – Bugs are often subtle
• Potential is Enormous
Obvious Extensions
• Timing & Synchronization
• Finite State Machine
  – State of system may depend on several variables or conditions
    • Not all conditions need to be completely known
    • Some may be “black boxes” that produce a signal
• Go-Done-Busy-Wait
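The finite-state-machine and handshake ideas above can be sketched as a small clocked module. The slide does not spell out the semantics of the Go, Done, Busy, and Wait signals, so this toy model assumes one plausible reading: Go starts a computation, Busy is high while it runs, Done is held once the result is ready, and Wait from downstream keeps the module in the Done state. The class and its states are hypothetical.

```python
class HandshakeModule:
    """Toy clocked module with an assumed Go/Busy/Done/Wait protocol."""

    def __init__(self, latency):
        self.latency = latency    # clock ticks needed per computation
        self.remaining = 0        # ticks left in the current computation
        self.state = "IDLE"       # IDLE -> BUSY -> DONE -> IDLE

    def tick(self, go=False, wait=False):
        """Advance one clock tick and return the (busy, done) outputs."""
        if self.state == "IDLE" and go:
            self.state, self.remaining = "BUSY", self.latency
        elif self.state == "BUSY":
            self.remaining -= 1
            if self.remaining == 0:
                self.state = "DONE"
        elif self.state == "DONE" and not wait:
            self.state = "IDLE"   # result accepted downstream
        return self.state == "BUSY", self.state == "DONE"

module = HandshakeModule(latency=3)
inputs = [dict(go=True), {}, {}, {}, dict(wait=True), {}, {}]
for t, signals in enumerate(inputs):
    busy, done = module.tick(**signals)
    print(f"tick {t}: busy={busy} done={done} state={module.state}")
```

A downstream module only needs to watch the Done output, not the internals, which is how a "black box" that merely produces a signal can still be composed into a larger synchronized system.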
Future Directions
• User-Friendly Interfaces to Applications
• Expand Application Areas
  – Imaging
  – Pattern Recognition/Clustering/Data Mining
• Expand Libraries for Reactions/Pathways
  – “Tinker-Toy” Modeling
• Work with Vendor to bring FPGA solutions to wider community of computational biologists
  – Faster Application Development Time
  – Debugging and Documentation
Acknowledgements
• Starbridge Systems
  – Kent Gilson
  – Jim Yardley
  – Fred Geiger
• NCI for Support
  – Stan Burt, Director ABCC