Fast Primer Search with DUP · 2018. 12. 16. · Case Study: Fast Primer Search1 { Biological...


Christian Grothoff

Fast Primer Search with DUP

Technische Universität München


Overview

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search¹

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results

¹ “An algorithm for the comprehensive search of oligonucleotide signatures based on phylogenetic trees”, joint work with K. Bader, W. Ludwig and H. Meier


Problem Domains

• Systems

• Programming Languages

• Software Engineering

• Secure Networking

• Privacy

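[Figure: UML class diagram. Class Runabout (map: HashMap<Class,Code>; visitAppropriate(Object):void); class RunaboutSum (sum:int; visit(A0):void, visit(A1):void, visit(A2):void); class Code (accept(Object):void); class GenCodeXX; association multiplicities 0..* and 1]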


The Problem: Developing Parallel Stream Applications

• Most developers (only) know how to write sequential code

• Parallel programming is error-prone (data races, deadlocks)

• High-performance parallel programming is really hard

• With GPUs for $4,000, we could have 2,600 cores...

⇒ Developers more expensive than hardware


X10 vs. the DUP System²

X10

10x faster, 10x as productive in 10 years for BlueGene

DUP

1/2 the speed, 10x as productive in 10 months for POSIX

² Available at http://dupsystem.org/


Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results


The DUP System


DUP Applications

• Fast primer search

• High-throughput, customizable video-conferencing

• Parallel and distributed discrete event simulation framework


Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results


Biological Questions

• Which are the most specific OSSs for a species?

• Which are the most specific OSSs for a subtree in the phylogenetic tree?


Input: Phylogenetic Tree

[Figure: phylogenetic tree with leaves 1, 2, 3, 4, 5, 6 and inner nodes I, IIa, IIb, IIIa, IIIb]


Parallel Mapping of Primers to Species

s@opt1:88[0<in.txt,1|p1:0,3|p2:0] $ fanout;

p1@opt1:88[1|pe:0] $ arb_probe_dup;

p2@opt2:88[1|pe:3] $ arb_probe_dup;

pe@opt2:88[1>out.txt] $ gather;
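The DUP script on this slide wires four stages: s reads in.txt and runs fanout, p1 and p2 each run arb_probe_dup on one of its output streams, and pe runs gather on both result streams and writes out.txt. The following in-process Python sketch mimics that dataflow. It assumes that fanout copies every input line to each of its outputs and that gather merges its inputs into one stream; the `make_probe` helper and its toy sequence partitions are hypothetical stand-ins for arb_probe_dup, whose behavior is not described on the slide.

```python
from typing import Callable, Iterable, List


def fanout(lines: Iterable[str], n: int) -> List[List[str]]:
    """Assumed semantics: replicate the input stream onto n outputs."""
    data = list(lines)
    return [list(data) for _ in range(n)]


def make_probe(database: List[str]) -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in for arb_probe_dup: exact substring search."""
    def probe(queries: List[str]) -> List[str]:
        return [f"{q}\t{seq}" for q in queries for seq in database if q in seq]
    return probe


def gather(*streams: List[str]) -> List[str]:
    """Assumed semantics: merge all input streams into one output stream."""
    merged: List[str] = []
    for stream in streams:
        merged.extend(stream)
    return merged


def run_pipeline(queries: List[str],
                 partition_1: List[str],
                 partition_2: List[str]) -> List[str]:
    """Mirror the script: s (fanout) -> p1, p2 (probe) -> pe (gather)."""
    to_p1, to_p2 = fanout(queries, 2)
    return gather(make_probe(partition_1)(to_p1),
                  make_probe(partition_2)(to_p2))
```

In this sketch each probe stage holds its own part of the sequence data while every stage sees all queries, which is one possible reading of the "partitioned" configuration named in the plot legend. A real DUP deployment runs the stages as separate processes connected by streams, which the sequential calls here do not model.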

[Plot: speed-up (y-axis, 0 to 6) versus number of nodes (x-axis, 0 to 16) for four configurations: replicated, exact; replicated, approximate; partitioned, exact; partitioned, approximate]


Intermediate Result: Species and OSS

[Figure: graph linking species 1, 2, 3, 4, 5, 6 to the OSSs A, B, C, D, E, F, G, H]


Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results


Biological Questions

• Which are the most specific OSSs for a species?

• Which are the most specific OSSs for a subtree in the phylogenetic tree?

• Sequence information may contain errors; allow up to k out-group hits!

• Perfect OSS may not exist even with out-group hits; maximize number of in-group hits
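The selection criterion in these bullets can be sketched in a few lines of Python: among all signatures with at most k out-group hits, keep those with the largest number of in-group hits. This is a brute-force illustration, not the algorithm from the talk. The function name, the data layout (a signature name mapped to the set of species it matches) and the example group are assumptions; the species sets for A to D are taken from the labels on the BGRT creation slides.

```python
from typing import Dict, List, Set, Tuple


def best_signatures(hits: Dict[str, Set[int]],
                    group: Set[int],
                    k: int) -> Tuple[int, List[str]]:
    """Return (max in-group hits, signatures reaching it), considering
    only signatures with at most k out-group hits."""
    best, winners = 0, []
    for name in sorted(hits):
        species = hits[name]
        in_hits = len(species & group)    # matched species inside the group
        out_hits = len(species - group)   # matched species outside the group
        if out_hits > k or in_hits == 0:
            continue
        if in_hits > best:
            best, winners = in_hits, [name]
        elif in_hits == best:
            winners.append(name)
    return best, winners


# Species sets as labeled on the BGRT creation slides.
HITS = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}, "D": {1, 2, 3}}

# Hypothetical target group consisting of species 1 and 2.
print(best_signatures(HITS, {1, 2}, k=0))
print(best_signatures(HITS, {1, 2}, k=1))
```

Scanning every signature for every group is exactly what becomes expensive on large trees, which is the motivation for a shared index structure such as the BGRT.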


BGRT Creation (1/5)

[Figure: inserting 1,2 : A; the resulting tree has the single node 1,2 : A]


BGRT Creation (2/5)

[Figure: inserting 2,3 : B into the tree with node 1,2 : A; the resulting tree has the nodes 1,2 : A and 2,3 : B]


BGRT Creation (3/5)

[Figure: inserting 1,3 : C into the tree with nodes 1,2 : A and 2,3 : B; the resulting tree has the nodes 1, 2,3 : B, 3 : C and 2 : A]


BGRT Creation (4/5)

[Figure: inserting 1,2,3 : D into the tree with nodes 1, 2,3 : B, 3 : C and 2 : A; the resulting tree has the nodes 1, 2,3 : B, 3 : C, 2 : A and 3 : D]


BGRT Creation (5/5)

[Figure: final BGRT with the nodes 1 : E, 2 : A, 3 : D, 3 : C, 2,3 : B, 5 : G, 4 : F and 6 : H]
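The node labels on the BGRT creation slides are consistent with a prefix tree keyed by sorted species numbers: after 1,2 : A and 1,3 : C are inserted, a shared node 1 appears with children 2 : A and 3 : C, and 1,2,3 : D ends up below 2 : A as 3 : D. The Python sketch below follows that reading. It is an interpretation of the figures, not the authors' implementation; the `Node`, `insert` and `lookup` names are hypothetical, and the sketch keeps one species per node instead of merging chains such as 2,3 : B into a single node as the slides do.

```python
from typing import Dict, Iterable, List


class Node:
    """One tree node: children keyed by species number, plus the
    signatures whose sorted species set ends at this node."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.signatures: List[str] = []


def insert(root: Node, species: Iterable[int], signature: str) -> None:
    """Walk or create the path spelled by the sorted species numbers
    and attach the signature at its end."""
    node = root
    for s in sorted(species):
        node = node.children.setdefault(s, Node())
    node.signatures.append(signature)


def lookup(root: Node, species: Iterable[int]) -> List[str]:
    """Return the signatures stored for exactly this species set."""
    node = root
    for s in sorted(species):
        if s not in node.children:
            return []
        node = node.children[s]
    return node.signatures


root = Node()
for members, name in [({1, 2}, "A"), ({2, 3}, "B"), ({1, 3}, "C"), ({1, 2, 3}, "D")]:
    insert(root, members, name)
```

After these four insertions the root has the children 1 and 2, and node 1 has the children 2 and 3, matching the shape shown on slide 4/5 up to the merged 2,3 node.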


Iterate-And-Bound

[Figure: the phylogenetic tree (leaves 1 to 6, inner nodes I, IIa, IIb, IIIa, IIIb) shown together with the BGRT (nodes 1 : E, 2 : A, 3 : D, 3 : C, 2,3 : B, 5 : G, 4 : F, 6 : H) and eight tables of count pairs, one pair per inner node of the phylogenetic tree]

Count pairs from the figure, in the order they appear:

I    IIa  IIb  IIIa IIIb
3:0  2:1  1:2  1:2  0:3
2:0  2:0  0:2  0:2  0:2
1:0  1:0  0:1  0:1  0:1
2:0  1:1  1:1  1:1  0:2
2:0  1:1  1:1  1:1  0:2
3:0  1:2  2:1  1:2  1:2
1:0  0:1  1:0  1:0  0:1
2:0  0:2  2:0  1:1  1:1
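The count pairs on this slide can be read as "in-group hits : out-group hits" of a species set with respect to each inner node of the phylogenetic tree. The Python sketch below computes such pairs. The group memberships in `GROUPS` are an assumption inferred from the column order of the final result table (I, IIa, 1, 2, IIb, IIIa, 3, 4, IIIb, 5, 6) and are not stated on the slides; the function names are hypothetical. The sketch covers only the counting, not the bounding step that gives the slide its name.

```python
from typing import Dict, Set, Tuple

# Assumed group memberships (inferred, not given on the slides).
GROUPS: Dict[str, Set[int]] = {
    "I": {1, 2, 3, 4, 5, 6},
    "IIa": {1, 2},
    "IIb": {3, 4, 5, 6},
    "IIIa": {3, 4},
    "IIIb": {5, 6},
}


def count_pairs(species: Set[int],
                groups: Dict[str, Set[int]]) -> Dict[str, Tuple[int, int]]:
    """For each group, count matched species inside and outside it."""
    return {
        name: (len(species & members), len(species - members))
        for name, members in groups.items()
    }


def format_row(pairs: Dict[str, Tuple[int, int]]) -> str:
    """Render the pairs in the in:out notation used on the slide."""
    return " ".join(f"{i}:{o}" for i, o in pairs.values())


print(format_row(count_pairs({1, 2, 3}, GROUPS)))
```

Under the assumed grouping, the species set 1,2,3 yields 3:0 2:1 1:2 1:2 0:3, which is the first row of pairs shown on the slide.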


Final Result

Mismatches \ Phase   I       IIa     1        2        IIb    IIIa       3       4      IIIb    5     6
0                    3:D,G   2:A     1:E      %        2:H    1:F        %       1:F    %       %     %
1                    %       2:D+    1:A,C+   1:A,B    2:G+   1:B,C,H+   1:B,C   1:H+   1:H     %     1:H
2                    %       1:G∗    1:D+     1:D,G+   1:D∗   1:D,G+     1:D+    %      1:G+    1:G   %
3                    %       %       %        %        %      %          %       %      0:D∗    %     %

“+” Should probably not be computed (mismatches >, matches =)

“∗” Even more useless (mismatches >, matches <)
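The two markers can be derived mechanically: an entry is redundant when some entry with fewer mismatches already reaches the same or a higher number of matches. The Python sketch below applies that rule to one column of the table, given the best match count per mismatch level. The function name and input layout are assumptions, and the ASCII character `*` stands in for the slide's "∗".

```python
from typing import Dict


def mark_dominated(best_by_k: Dict[int, int]) -> Dict[int, str]:
    """Map each mismatch level k to "", "+" or "*".

    "+": a lower mismatch level reaches the same number of matches.
    "*": a lower mismatch level reaches more matches.
    """
    marks: Dict[int, str] = {}
    for k, matches in best_by_k.items():
        earlier = [m for j, m in best_by_k.items() if j < k]
        if any(m > matches for m in earlier):
            marks[k] = "*"
        elif any(m == matches for m in earlier):
            marks[k] = "+"
        else:
            marks[k] = ""
    return marks


# Column IIa of the table: 2 matches at 0 and 1 mismatches, 1 match at 2.
print(mark_dominated({0: 2, 1: 2, 2: 1}))
```

For column IIa this gives no mark at 0 mismatches, "+" at 1 and "*" at 2, in line with the entries 2:A, 2:D+ and 1:G∗. Skipping such entries during the search, rather than filtering them afterwards, is what the slide's "should probably not be computed" suggests.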


Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results


Other Current Work

• Successful attacks on various “secure” P2P networks (Tor, Freenet, Tahoe-LAFS, ...)

• Randomized Resilient Routing in Restricted Route Topologies

• Autonomous NAT traversal

• Automatic Restart Management: RAS for GNU/Linux

• Parallelizing Protein Analysis (with Rost Lab)


An Attack on Tor (USENIX Security 2009)

[Figure: attack setup with the labels Client; Tor Node 1 - Unknown Node; Tor Node 2 - Known; Tor Node 3 - Our Exit Node; Server; Malicious Client; High BW Tor Node 1; High BW Tor Node 2; Malicious Server]


Future Work

• Resource allocation for DUP

• Parallelize more applications with DUP

• Aspect-oriented coordination language for DUP

• Migrating to IPv6 using secure P2P VPN over GNUnet

• Memory fragmentation analysis (Mallice)


Questions

?


The GNUnet Framework

[Figure: GNUnet architecture diagram with the components ARM, Transport (TCP, UDP, HTTP, ...), Testing, Routing, Authoring, Datastore, Encryption, DV, DHT, GAP, MySQL, SQLite, Postgres]


GNUnet’s Automatic Restart Manager

[Figure: interaction diagram with the elements Bug Tracking System, Developer, Configuration, User, Bug, ARM (Dependency Learning and Management) and Process, connected by arrows labeled: requires services; changes configuration; monitors configuration changes; starts required services; crashes; restarts impacted processes; reports crash; investigates bugs; analyzes crash, adds instrumentation, restarts service; uses dependency]


Secure Routing (1/3)

[Figure: network graph with the nodes A, B, C, D, E, F, G, H]

Base Restricted Route Topology


Secure Routing (2/3)

[Figure: network graph with the nodes A, B, C, D, E, F, G, H]

Distance Vector Added Links


Secure Routing (3/3)

[Figure: network graph with the node identifiers 000, 001, 010, 011, 100, 101, 110, 111]

Resulting DHT Topology


Teaching

• Programming Languages

• Mainframe Administration

• Computer Networking

• Peer-to-Peer Networking


Group Infrastructure

Software:

• Drupal / Mantis

• Doxygen / Subversion

• Buildbot / lcov

• clang (LLVM)

• Coverity Prevent

Hardware:

• Server (i7)

• Sheeva Plug (ARM)

• iMac (PowerPC)

• GNU/Linux VMs (AMD)

• PMP-capable NAT router

Furthermore, we have access to a wide range of networking equipment (via Lehrstuhl) and HPC facilities at the LRZ.
