Analyzing Massive Data using Heterogeneous Computing
David A. Bader
OPPORTUNITIES
Opportunities
• Application-oriented Opportunities:
– High performance computing for massive graphs
– Streaming analytics
– Information visualization techniques for massive graphs
– Heterogeneous systems: methodologies for combining the use of the Cloud and manycore for high-performance computing
– Energy-efficient high-performance computing
Opportunity 1: High performance computing for massive graphs
• Traditional HPC has focused primarily on solving large problems from chemistry, physics, and mechanics, using dense linear algebra.
• HPC faces new challenges to deal with:
– time-varying interactions among entities, and
– massive-scale graph abstractions where the vertices represent the nouns or entities and the edges represent their observed interactions.
• Few parallel computers run well on these problems because
– they often lack the locality required to get high performance from distributed-memory, cache-based supercomputers.
• Case study: Massively threaded architectures are shown to run several orders of magnitude faster than the fastest supercomputers on these types of problems!
A focused research agenda is needed to design algorithms that scale on these new platforms.
Opportunity 2: Streaming analytics
• While our high performance computers have delivered a sustained petaflop, they have done so using the same antiquated batch-processing style, in which a program and a static data set are scheduled to compute in the next available slot.
• Today, data is overwhelming in volume and rate, and we struggle to keep up with these streams.
Fundamental computer science research is needed in the design of streaming architectures, and in data structures and algorithms that can compute important analytics while sitting in the middle of these torrential flows.
Opportunity 3: Information Visualization techniques for massive graphs
• Information Visualization today
– addresses traditional scientific computing (fluid flow, molecular dynamics), or
– when handling discrete data, scales to only hundreds of vertices at best.
• However, there is a strong need for visualization in the data sciences, so that analysts can gain understanding from data sets of millions to billions of interacting non-planar discrete entities.
– Applications include: data mining, intelligence, situational awareness
David A. Bader 6
NNDB Mapper of George Washington
Twitter social network using Large Graph Layout
Source: Akshay Java, from ebiquity group
Opportunity 4: Heterogeneous systems: methodologies for combining the use of the Cloud and Manycore for high-performance computing
• Today, there is a dichotomy between using clouds (e.g. Hadoop, MapReduce) for massive data storage, filtering, and summarization, and using massively parallel/multithreaded systems for data-intensive computation.
We must develop methodologies for employing these complementary systems for solving grand challenges in data analysis.
David A. Bader 7
Steve Mills, SVP of IBM Software (left), and Dr. John Kelly, SVP of IBM Research, view Stream Computing technology
Opportunity 5: Energy-efficient high-performance computing
• The main constraint on our ability to compute has changed
– from the availability of compute resources
– to the ability to power and cool our systems within budget.
Holistic research is needed that permeates from the architecture and systems up to the applications and data centers, whereby energy use is a first-class object that can be optimized at all levels.
Microsoft’s Chicago Million-Server Data Center
MOTIVATION
Exascale Streaming Data Analytics: Real-world challenges
All involve analyzing massive streaming complex networks:
• Health care: disease spread; detection and prevention of epidemics/pandemics (e.g. SARS, avian flu, H1N1 “swine” flu)
• Massive social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation
• Intelligence: business analytics, anomaly detection, security, knowledge discovery from massive data sets
• Systems biology: understanding complex life systems, drug design, microbial research, unraveling the mysteries of the HIV virus; understanding life and disease
• Electric power grid: communication, transportation, energy, water, and food supply
• Modeling and simulation: performing full-scale economic-social-political simulations
[Chart: Facebook active users in millions, Dec 2004 – Dec 2009 (y-axis 0–450)]
Exponential growth: more than 900 million active users
Sample queries:
• Allegiance switching: identify entities that switch communities.
• Community structure: identify the genesis and dissipation of communities.
• Phase change: identify significant changes in the network structure.
These queries require predicting and influencing change in real time, at scale.
Example: we discovered minimal changes in a complex network of O(billions) size that could hide or reveal top influencers in the community.
Ubiquitous High Performance Computing (UHPC)
Goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Program Objectives:
• One PFLOPS in a single cabinet, including self-contained cooling
• 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
• Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
• Security embedded at all system levels
• Parallel, efficient execution models
• Highly programmable parallel systems
• Scalable systems – from terascale to petascale
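As a consistency check (my arithmetic, using only the figures above), the two energy targets are the same number in different units, and together with the PFLOPS goal they imply the cabinet budget is attainable:

\[
\frac{1\,\mathrm{W}}{50\,\mathrm{GFLOPS}} = \frac{1\,\mathrm{J/s}}{5\times 10^{10}\,\mathrm{FLOP/s}} = 20\,\mathrm{pJ/FLOP},
\qquad
10^{15}\,\mathrm{FLOP/s} \times 20\,\mathrm{pJ/FLOP} = 20\,\mathrm{kW},
\]

so one PFLOPS of compute at 50 GFLOPS/W draws 20 kW, leaving the remainder of the 57 kW budget for storage, network, and cooling.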
Architectural Drivers:
• Energy efficiency
• Security and dependability
• Programmability
Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” -MarketWatch
David A. Bader (CSE), Echelon Leadership Team
Example: Mining Twitter for Social Good
ICPP 2010
Image credit: bioethicsinstitute.org
• CDC / nation-scale surveillance of public health
• Cancer genomics and drug design
– computed betweenness centrality of the Human Proteome
[Plot: Human Genome core protein interactions, degree (1–100, log scale) vs. betweenness centrality (1e-7–1e+0, log scale)]
Massive Data Analytics: Protecting our Nation
US High Voltage Transmission Grid (>150,000 miles of line)
Public Health
ENSG00000145332.2: Kelch-like protein implicated in breast cancer
Graphs are pervasive in large-scale data analysis
• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
• Astrophysics. Problem: outlier detection. Challenges: massive datasets, temporal variations. Graph problems: clustering, matching.
• Bioinformatics. Problem: identifying drug target proteins. Challenges: data heterogeneity, quality. Graph problems: centrality, clustering.
• Social informatics. Problem: discover emergent communities, model the spread of information. Challenges: new analytics routines, uncertainty in data. Graph problems: clustering, shortest paths, flows.
Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg (2,3) www.visualComplexity.com
Flywheel has driven HPC into a corner
For decades, HPC has been on a vicious cycle of enabling applications that run well on HPC systems.
Real-World Applications (with natural parallelism)
Data-intensive computing has natural concurrency; yet HPC architectures are designed to achieve high floating-point rates by exploiting spatial and temporal localities.
• For the first time in decades, manycore and multithreaded systems let us rethink architecture.
These are today’s MPI applications!
Graph500 Benchmark, www.graph500.org
Defining a new set of benchmarks to guide the design of hardware architectures and software systems intended to support such applications, and to help procurements. Graph algorithms are a core part of many analytics workloads. Executive Committee: D.A. Bader, R. Murphy, M. Snir, A. Lumsdaine.
• Five Business Area Data Sets:
– Cybersecurity: 15 billion log entries/day (for large enterprises); full data scan with end-to-end join required
– Medical informatics: 50M patient records, 20–200 records/patient, billions of individuals; entity resolution is important
– Social networks: e.g. Facebook, Twitter; nearly unbounded dataset size
– Data enrichment: easily PB of data; example: Maritime Domain Awareness (hundreds of millions of transponders, tens of thousands of cargo ships, tens of millions of pieces of bulk cargo; may involve additional data such as images)
– Symbolic networks: e.g. the human brain; 25B neurons, 7,000+ connections/neuron
Breadth-first search comparison: Graph500
• Graph500 benchmark: BFS on a synthetic graph of a social network
• 2^Size vertices and 16 × 2^Size edges
• Processing rate: traversed edges per second (TEPS)
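To make the sizing and the metric concrete, here is a minimal sketch in C (my illustration, not the official Graph500 reference code); the scale value matches the mirasol entry below, while the BFS time and traversed-edge count are placeholders:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int scale = 28;                            /* problem scale ("Size")    */
    int64_t nvertices = INT64_C(1) << scale;   /* 2^Size vertices           */
    int64_t nedges    = INT64_C(16) << scale;  /* 16 * 2^Size edges         */

    double bfs_seconds = 0.85;      /* placeholder: measured BFS time       */
    int64_t traversed  = nedges;    /* placeholder: edges traversed by BFS  */

    double teps = (double)traversed / bfs_seconds;  /* the reported metric */
    printf("scale %d: %lld vertices, %lld edges, %.3g TEPS\n",
           scale, (long long)nvertices, (long long)nedges, teps);
    return 0;
}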
Rank | Machine | Owner | Size | TEPS
1 | NNSA/SC Blue Gene/Q Prototype II (4096 nodes / 65,536 cores) | NNSA and IBM Research, T.J. Watson | 32 | 254.3 B
2 | Hopper XE6 (1800 nodes / 43,200 cores) | LBNL | 37 | 113.3 B
3 | Lomonosov (4096 nodes / 32,768 cores) | Moscow State University | 37 | 103.2 B
17 | mirasol (quad-socket Intel E7-8870, 2.4 GHz, 10 cores / 20 threads per socket) | Intel/Georgia Tech (UCB implementation) | 28 | 5.1 B
20 | Dingus (Convey HC1ex, 4 FPGA) | SNL | 28 | 1.7 B
35 | Matterhorn (64-proc XMT2) | CSCS | 31 | 884.6 M
Massive Streaming Graph Analytics
Example edge stream flowing from streaming graphs to analysts:
(A, B, t1, poke), (A, C, t2, msg), (A, D, t3, view wall), (A, D, t4, post),
(B, A, t2, poke), (B, A, t3, view wall), (B, A, t4, msg)
STINGER: A Data Structure for Graphs with Streaming Updates
Light-weight data structure that supports efficient iteration and efficient updates.
Experiments with Streaming Updates to Clustering Coefficients
Working with bulk updates, it can handle almost 200,000 updates per second.
Tracking Connected Components; Tracking Betweenness Centrality
STING Extensible Representation (STINGER)
An enhanced representation for dynamic graphs, developed in consultation with David A. Bader, Jonathan Berry, Adam Amos-Binks, Daniel Chavarría-Miranda, Charles Hastings, Kamesh Madduri, and Steven C. Poulos.
Design goals:
• Be useful for the entire “large graph” community: portable semantics and high-level optimizations across multiple platforms & frameworks (XMT C, MTGL, etc.)
• Permit good performance: no single structure is optimal for all
• Assume globally addressable memory access
• Support multiple, parallel readers and a single writer
Operations:
• Insert/update & delete both vertices & edges
• Aging-off: remove old edges (by timestamp)
• Serialization to support checkpointing, etc.
STING Extensible Representation:
• Semi-dense edge list blocks with free space
• Compactly stores timestamps, types, weights
• Maps from application IDs to storage IDs
• Deletion by negating IDs, separate compaction
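A minimal sketch in C of how such an edge-block layout might be declared; the field names and the block size are my illustrative assumptions, not STINGER's actual definitions:

#include <stdint.h>

#define EDGES_PER_BLOCK 16   /* illustrative: sized so a block fits cache lines */

/* One edge: the neighbor ID is negated to mark deletion, and a separate
 * compaction pass reclaims the slot later. */
struct edge {
    int64_t dest;          /* storage ID of the neighbor; < 0 means deleted */
    int64_t weight;
    int64_t time_first;    /* timestamps enable aging-off old edges */
    int64_t time_recent;
};

/* Semi-dense block in a per-vertex linked list; the free space lets a
 * writer insert without reallocating while readers iterate in parallel. */
struct edge_block {
    struct edge_block *next;
    int64_t etype;         /* edge type shared by this block */
    int64_t high;          /* slots in use (including negated entries) */
    struct edge edges[EDGES_PER_BLOCK];
};

A separate table (not shown) would map application vertex IDs to the storage IDs used in dest.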
Goal: Streaming Graph Analytics
• O(Billion) vertices, O(Trillion) edges, 1M updates / sec
• Challenge: maintain analytics, update quickly
– Connected components
– Agglomerative clustering
• Problem:
– Irregular execution
– Bandwidth- and latency-bound
– Irregular memory access
– Low-to-no reuse
Case Study 1: Clustering Coefficients in a Streaming Graph
• Roughly, the ratio of actual triangles to possible triangles around a vertex.
• Defined in terms of triplets:
– i-j-v is a closed triplet (triangle).
– m-v-n is an open triplet.
• Clustering coefficient: # closed triplets / # all triplets.
– Locally, count those around v.
– Globally, count across the entire graph; multiple counting cancels (3/3 = 1).
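In symbols (a standard formulation consistent with the counts above; the notation is mine):

\[
C_v = \frac{\#\,\text{closed triplets centered at } v}{\#\,\text{triplets centered at } v} = \frac{2\,T(v)}{d_v (d_v - 1)},
\qquad
C = \frac{3 \times \#\,\text{triangles}}{\#\,\text{triplets}},
\]

where T(v) is the number of triangles containing v and d_v is the degree of v. The factor of 3 in the global coefficient is exactly the “multiple counting cancels (3/3 = 1)” remark: each triangle is counted once per corner.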
Updating clustering coefficients in a streaming graph
• Monitoring clustering coefficients could identify anomalies, find forming communities, etc.
• Computations stay local: a change to edge <u, v> affects only vertices u, v, and their neighbors.
• Need a fast method for updating the triangle counts and degrees when an edge is inserted or deleted:
– Dynamic data structure for edges & degrees: STINGER
– Rapid triangle-count update algorithms, exact and approximate (see the sketch below)
“Massive Streaming Data Analytics: A Case Study with Clustering Coefficients.” Ediger, David, Karl Jiang, E. Jason Riedy, and David A. Bader. MTAAP 2010, Atlanta, GA, April 2010.
[Diagram: inserting edge <u, v> increments the triangle counts of u and v by the number of their common neighbors (+2 each in the example) and the count of each common neighbor by +1]
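A minimal sketch of the exact update in C, assuming sorted adjacency lists (my illustration of the update rule, not the paper's implementation):

#include <stdio.h>
#include <stddef.h>

/* Merge two sorted adjacency lists to find common neighbors of u and v.
 * On inserting <u,v>, each common neighbor w gains one triangle (+1), and
 * u and v each gain the total count; deletions use sign = -1 instead. */
static size_t update_triangles(const int *adj_u, size_t nu,
                               const int *adj_v, size_t nv,
                               long *tri, int sign) {
    size_t i = 0, j = 0, common = 0;
    while (i < nu && j < nv) {
        if (adj_u[i] < adj_v[j]) i++;
        else if (adj_u[i] > adj_v[j]) j++;
        else { tri[adj_u[i]] += sign; common++; i++; j++; }
    }
    return common;   /* caller adds sign * common to tri[u] and tri[v] */
}

int main(void) {
    /* u = 0 with neighbors {1,2,5}; v = 4 with neighbors {2,3,5}. */
    int adj_u[] = {1, 2, 5}, adj_v[] = {2, 3, 5};
    long tri[6] = {0};
    long c = (long)update_triangles(adj_u, 3, adj_v, 3, tri, +1);
    tri[0] += c; tri[4] += c;
    printf("common neighbors: %ld\n", c);   /* prints 2, as in the diagram */
    return 0;
}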
Updating clustering coefficients in a streaming graph
• Using R-MAT as a graph and edge-stream generator
– Mix of insertions and deletions
• Result summary for single actions:
– Exact: from 8 to 618 actions/second
– Approximate: from 11 to 640 actions/second
• Alternative: batch changes
– Loses some temporal resolution within the batch
– Median update rates for batches of size B are shown below
• STINGER overhead is minimal; most time is spent in the metric computation.
Algorithm | B = 1 | B = 1000 | B = 4000
Exact | 90 | 25,100 | 50,100
Approx. | 60 | 83,700 | 193,300
(updates per second)
Case Study 2: Tracking Connected Components in a Streaming Graph
• Why? Interested in:
– small components over long periods of time
– communities that split off from the main group
• Global property:
– track the <vertex, component> mapping
– undirected, unweighted graphs
• Scale-free network: most changes are within one large component
• Inserting a new edge (see the sketch below):
– if the endpoints are differently labeled, re-label the smaller component
• Deleting an edge:
– must prove that the deletion does not cleave a component
– otherwise, recompute!
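A minimal sketch in C of the insertion rule, assuming a component-label array and per-label sizes (illustrative; a real implementation relabels only the smaller component's vertex list rather than scanning all vertices):

#include <stdio.h>

#define NV 8
static int  comp[NV];     /* comp[v]: component label of vertex v   */
static long csize[NV];    /* csize[c]: number of vertices labeled c */

static void insert_edge(int u, int v) {
    int cu = comp[u], cv = comp[v];
    if (cu == cv) return;                          /* already connected */
    int keep = (csize[cu] >= csize[cv]) ? cu : cv; /* keep the larger   */
    int drop = (keep == cu) ? cv : cu;
    for (int w = 0; w < NV; w++)                   /* re-label smaller  */
        if (comp[w] == drop) { comp[w] = keep; csize[keep]++; csize[drop]--; }
}

int main(void) {
    for (int v = 0; v < NV; v++) { comp[v] = v; csize[v] = 1; }
    insert_edge(0, 1);
    insert_edge(1, 2);
    printf("comp[2] = %d\n", comp[2]);   /* prints 0: all three merged */
    return 0;
}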
Tracking Connected Components: Performance on 32 of 128 Processors of a Cray XMT
[Plots: update rates for |V| = 1M, |E| = 8M and for |V| = 16M, |E| = 135M]
• Insertions-only improvement: 10x–20x over recomputation
• Greater parallelism within a batch yields higher performance
• Trade-off between time resolution and update speed
“Tracking Structure of Streaming Social Networks,” David Ediger, Jason Riedy, David A. Bader, and Henning Meyerhenke. MTAAP 2011, Anchorage, AK, May 2011.
Tracking Connected Components: Faster Deletions
• Finding triangles rules out 90% of deletions as “safe”
• Introduce a spanning tree to easily track deletions
• A deletion is “safe” (cannot cleave a component) if any of the following holds:
– the edge is not in the spanning tree
– the edge endpoints have a common neighbor
– a neighbor can still reach the root of the tree
[Diagram: a spanning tree over a small graph, distinguishing spanning tree edges from non-tree edges; the root vertex is shown in black and X marks the deleted edge]
• This method rules out 99.7% of deletions in an R-MAT graph with 2M vertices and 32M edges, and is often faster than triangle finding. A sketch of the safety test follows.
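A minimal sketch in C of the first two safety tests over a tiny hard-coded graph (the third test, re-routing a neighbor to the root, is omitted for brevity; all names and data here are illustrative):

#include <stdbool.h>
#include <stdio.h>

#define NV 5
static bool adj[NV][NV];   /* adjacency matrix of a tiny example graph */
static int  parent[NV];    /* spanning tree: parent[root] == root      */

static bool in_spanning_tree(int u, int v) {
    return parent[u] == v || parent[v] == u;
}
static bool have_common_neighbor(int u, int v) {
    for (int w = 0; w < NV; w++)
        if (w != u && w != v && adj[u][w] && adj[v][w]) return true;
    return false;
}
/* True when deleting <u,v> provably cannot split a component. */
static bool deletion_is_safe(int u, int v) {
    if (!in_spanning_tree(u, v)) return true;    /* tree still spans      */
    if (have_common_neighbor(u, v)) return true; /* a triangle reconnects */
    return false;                                /* must recompute        */
}

int main(void) {
    /* Triangle 0-1-2 plus pendant edge 2-3; tree edges: 0-1, 0-2, 2-3. */
    adj[0][1] = adj[1][0] = adj[0][2] = adj[2][0] = true;
    adj[1][2] = adj[2][1] = adj[2][3] = adj[3][2] = true;
    parent[0] = 0; parent[1] = 0; parent[2] = 0; parent[3] = 2;
    printf("delete 1-2 safe? %d\n", deletion_is_safe(1, 2)); /* 1: non-tree edge      */
    printf("delete 0-1 safe? %d\n", deletion_is_safe(0, 1)); /* 1: common neighbor 2  */
    printf("delete 2-3 safe? %d\n", deletion_is_safe(2, 3)); /* 0: must recompute     */
    return 0;
}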
Case Study 3: Tracking Betweenness Centrality (BC) in a Streaming Graph
• Key metric in social network analysis
[Freeman ’77, Goh ’02, Newman ’03, Brandes ’03]
• σ_st: number of shortest paths between vertices s and t
• σ_st(v): number of shortest paths between s and t passing through v

\[
BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}
\]
• Streaming updates:
– reuse previously computed traversals to reduce computational requirements
– avoid redundant computations
Updating BC for Edge Insertions in a Streaming Graph
• Currently, memory requirements limit problem size
• Speedup measured as

\[
\text{Speedup} = \frac{T_{\text{static}}}{T_{\text{insertion}}}
\]
Collaboration network | Vertices | Edges | Speedup
Astrophysics | 18,772 | 198,080 | 148x
Condensed matter | 23,133 | 93,468 | 91x
General relativity | 5,242 | 14,490 | 40x
High energy physics | 12,008 | 118,505 | 108x
High energy physics theory | 9,877 | 51,970 | 36x
Updating BC in a Streaming Graph: Experimental Results for Insertions-only
• 3.4 GHz Intel Core i7-2600K (16 GB RAM)
[Plot: R-MAT graph speedup (0–160x) vs. edge factor (average edge degree, 8–32), for scales 10 through 14]
STREAMING ON HYBRID SYSTEMS
Sandy Bridge: Intel Core i7-2600K
CPU:
• Clock 3.4 GHz (3.8 GHz peak)
• 4 cores with 2 threads each
• 4 x 1 MB L2, 8 MB L3
• 21 GB/s memory bandwidth
• 32 nm technology
• AVX (256-bit vectors)
• ~83 GFLOPS
• 95 W Thermal Design Power (TDP) for CPU + GPU
GPU:
• Clock 860 MHz (1350 MHz peak)
• Can use the L3 cache
• 12 execution units
Ivy Bridge: Intel Core i7-3770K
CPU:
• Clock 3.5 GHz (3.9 GHz peak)
• 4 cores with 2 threads each
• 4 x 1 MB L2, 8 MB L3
• 25.6 GB/s memory bandwidth
• 22 nm technology
• AVX (256-bit vectors)
• ~93 GFLOPS
• 77 W TDP for CPU + GPU
GPU:
• Clock 650 MHz (1150 MHz peak)
• Can use the L3 cache
• 16 execution units
Llano: AMD A8-3870K
CPU:
• Clock 3.0 GHz
• 4 cores
• 4 x 1 MB L2, no L3
• 21 GB/s memory bandwidth
• 32 nm technology
• ~60 GFLOPS
• 100 W TDP for CPU + GPU
GPU:
• Clock 600 MHz
• 400 Radeon cores
• 480 GFLOPS peak
CPU Cores vs. GPU Cores
CPU:
• Independent threads with vector units
• Up to 20 threads per socket
• Up to 10 cores per socket
• ~3 GHz; high power per core
• Branch prediction, out-of-order execution, register renaming
• ~100s of GFLOPS
GPU:
• SIMT threads with multiple execution units
• 1,000s of threads per socket
• Up to 1,000s of cores
• ~1 GHz; low power per core
• Simple execution pipeline
• Higher aggregate memory bandwidth (5x over CPU)
• ~1,000s of GFLOPS
NVIDIA Echelon Chip (DARPA UHPC)
Throughput-optimized cores:
• Clock 2 GHz (2.5 GHz peak)
• 256 TOCs per chip
• 8 lanes per TOC
• 64 x 4 MB L2
• 1600 GB/s local DRAM
• 400 GB/s remote DRAM
• 16 TFLOPS
• 10 nm technology
• 270 W per chip
Latency-optimized cores:
• Clock 3 GHz
• 8 per chip
• 8 x 256 KB L2
DARPA UHPC Challenge Problem (CP) 2
• Streaming graph analytics problem
• Designed to simulate real-time tracking and analysis of a dynamic social network graph
• Primary metric is updates per second
• Constraint: analytics must be updated every 100,000 updates
Deployment Scenario
• Single-cabinet system in a small server room
– multi-terabyte global memory
– multiple connected Echelon chips
– millions of threads
• Tracking a graph of billions of vertices and edges
– multiple edge and vertex types
– relevant temporal data per edge and vertex
– concurrent analytics kernels and graph updates
– millions of updates per second
• Applications
– tracking network connections for cyber security
– assembling / processing a semantic graph of input sensor data
– social networks: insider threat detection, intelligence applications
Problem Overview
• Initialization
– Kernel 0: generate / load a synthetic social network (R-MAT generated graph of scale 32, edge factor 8)
– Kernel 1: build the dynamic graph data structure (create and maintain the STINGER dynamic temporal graph structure)
– Kernel 2: connected components (track connected components within the graph)
– Kernel 3: clustering (track communities via agglomerative clustering, optimizing modularity)
• Streaming (K1, K2, K3)
– Update the analytics & data structure in batches of 100,000 actions
– Measure the rate at which batches can be applied (updates per second); a sketch of this measurement loop follows
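A minimal sketch in C of the measurement loop; the batch size is from the slide, while the kernel stubs and everything else are my illustrative assumptions:

#include <stdio.h>
#include <time.h>

#define BATCH 100000   /* analytics must be refreshed every 100,000 actions */

/* Stubs standing in for the real kernels. */
static void apply_batch_to_stinger(int b) { (void)b; } /* K1: ingest actions */
static void update_components(void) {}                 /* K2 */
static void update_clustering(void) {}                 /* K3 */

int main(void) {
    int nbatches = 100;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int b = 0; b < nbatches; b++) {
        apply_batch_to_stinger(b);
        update_components();    /* refresh analytics after every batch */
        update_clustering();
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("%.3g updates/sec\n", nbatches * (double)BATCH / secs);
    return 0;
}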
General Observations
• Memory
– Off-chip and DRAM bandwidth and latency will be the performance-limiting factors
– Low compute-to-memory ratio
– Random memory access to a large global data structure
– Edge blocks can be made to fit memory lines, but accessing single random words cannot be avoided
• Compute
– Irregular computation and workload
– Minimal floating point; mostly integer computation
– Large sorts and set intersections
– Pointer jumping
• Parallelism
– Fine grain and coarse grain
– Tasks, loops, scans, and reductions
– In the data structure (per edge, per vertex, within edge blocks)
– Need low thread overhead and fast fine-grain synchronization
Future Architectures
• Highly multithreaded
• High bandwidth (network and memory)
• Complex but flexible memory hierarchy
• Heterogeneous design in core capability and ISA
Heterogeneous Design
• Different cores to accomplish different tasks
• Latency-optimized cores (LOCs)
– for branchy, serial code and the OS / runtime
– fast clock (3 GHz+), higher energy (½ CPU core)
– out-of-order execution, branch prediction
• Throughput-optimized cores (TOCs)
– for highly parallel code, the majority of the workload
– slower clock (≤ 2 GHz), lower energy (1/10 GPU core)
– simple cores with many hardware thread contexts
Case Study: Clustering
• Compute on the community graph (initially the graph itself)
• Reduce edge and vertex weights into the community-graph volume
• Compute edge scores, then reduce to find their mean and variance:
– score(u,v) = weight(u,v) / (2 · volume) − deg(u) · deg(v) / (2 · volume)^2
– Variance = Variance + (score(u,v) − mean)^2
• Parallel for over edges to filter and match:
– if score(u,v) > mean − 1.5 · √Variance AND score(u,v) > best[u], then set match[u] = v and best[u] = score(u,v)
• Refine the matching
• Contract the community graph around the matching
• Repeat from the second step until no edge scores pass the filter
(A sketch of the score-and-match step appears below.)
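A minimal serial sketch in C of the score-and-match step over an edge list (the parallel-for becomes an ordinary loop here; the graph data and array names are my illustrative assumptions):

#include <math.h>
#include <stdio.h>

#define NV 4
#define NE 4

/* Tiny made-up (community) graph as an edge list with weights. */
static int    eu[NE] = {0, 0, 1, 2}, ev[NE] = {1, 2, 2, 3};
static double w[NE]  = {3, 1, 2, 1};
static double deg[NV] = {4, 5, 4, 1};   /* weighted vertex degrees */

int main(void) {
    double volume = 0;
    for (int e = 0; e < NE; e++) volume += w[e];

    /* Edge scores from the slide's formula, plus their mean and variance. */
    double score[NE], mean = 0, var = 0;
    for (int e = 0; e < NE; e++) {
        score[e] = w[e] / (2 * volume)
                 - deg[eu[e]] * deg[ev[e]] / ((2 * volume) * (2 * volume));
        mean += score[e] / NE;
    }
    for (int e = 0; e < NE; e++) var += (score[e] - mean) * (score[e] - mean);

    /* Filter and match: keep the best passing edge per vertex. */
    int match[NV]; double best[NV];
    for (int v = 0; v < NV; v++) { match[v] = -1; best[v] = -INFINITY; }
    double cutoff = mean - 1.5 * sqrt(var);
    for (int e = 0; e < NE; e++)
        if (score[e] > cutoff && score[e] > best[eu[e]]) {
            match[eu[e]] = ev[e];
            best[eu[e]]  = score[e];
        }
    for (int v = 0; v < NV; v++) printf("match[%d] = %d\n", v, match[v]);
    return 0;   /* a full clustering would refine, contract, and repeat */
}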
Case Study: Clustering
• Goal: find best score per vertex
• CPU strategy:
– OpenMP parallel for over vertices
– serially search adjacency lists for the best score
• GPU strategy:
– blocks: parallel for over vertices
– threads in a block: reduce the max score over the adjacency list
Case Study: Clustering
• Goal: find best score per vertex
• Hybrid strategy (see the sketch below):
– parallel per vertex on the CPU
– if low degree, serially determine the max score on the CPU
– if medium degree, use GPU thread blocks
– if large degree, the CPU divides edge blocks among GPU thread blocks
• Overcomes workload imbalance
• Provides parallelism
• Tunable to reduce the impact of overhead
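A minimal sketch in C of the host-side, degree-based dispatch; the thresholds and the three handlers are assumptions standing in for the serial CPU scan and the two GPU kernel launches:

#include <stdio.h>

#define LOW_DEGREE    32     /* illustrative cutoffs between strategies */
#define MEDIUM_DEGREE 1024

/* Stubs for: serial CPU scan, one GPU thread block per vertex, and
 * CPU-split edge blocks across many GPU thread blocks. */
static void cpu_best_score(int v)       { printf("v%d: CPU serial\n", v); }
static void gpu_block_best_score(int v) { printf("v%d: GPU block\n", v); }
static void gpu_split_best_score(int v) { printf("v%d: GPU split\n", v); }

static void dispatch(int v, long degree) {
    if (degree <= LOW_DEGREE)         cpu_best_score(v);
    else if (degree <= MEDIUM_DEGREE) gpu_block_best_score(v);
    else                              gpu_split_best_score(v);
}

int main(void) {
    long degree[] = {8, 500, 50000};   /* made-up vertex degrees */
    for (int v = 0; v < 3; v++) dispatch(v, degree[v]);
    return 0;
}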
Heterogeneous Exploration
• Analysis and updates run on TOCs
• Management, load balancing, and serial bottlenecks run on LOCs
• Traversal uses lightweight threading and shared memories
– the graph is stored in a blocked, linked adjacency structure split across system global memory
– TOC threads cooperatively copy blocks from global to shared memory and explore them in parallel
[Diagram: edge blocks of (vertex | weight) entries with next pointers, copied from system global memory into a local L2 scratchpad and explored in parallel by multiple TOCs]
CHALLENGES
Graph Analytics for Social Networks
• Are there new graph techniques? Do they parallelize? How do we exploit heterogeneous platforms? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc.?
• Communities may overlap, exhibit different properties and sizes, and be driven by different models
– Detect communities (static or emerging)
– Identify important individuals
– Detect anomalous behavior
– Given a community, find a representative member of the community
– Given a set of individuals, find the best community that includes them
Open Questions for Massive Data Analytic Apps
• How do we diagnose the health of streaming systems?
• Are there new analytics for massive spatio-temporal interaction networks and graphs (STING)?
• Do current methods scale up from thousands to millions and billions?
• How do I model massive streaming data?
• Are algorithms resilient to noisy data?
• How do I visualize a STING with O(1M) entities? O(1B)? O(100B)? With a scale-free, power-law distribution of vertex degrees and diameter ≈ 6?
• Can accelerators aid in processing streaming graph data?
• How do we leverage the benefits of multiple architectures (e.g. map-reduce clouds and massively multithreaded architectures) in a single platform?
New opportunities for Big Data
• Don’t let programming paradigms limit your imagination of what’s possible!
• In the HPC community, Scientific Computing ≡ Message-Passing Programming (MPI)
– some grand challenge applications do very well and have led to numerous scientific insights
– other applications are impossible to solve efficiently
• For the Big Data community, Hadoop / MapReduce
– is great for some statistical analyses
– is problematic for deep insights
New opportunities using heterogeneous systems for Big Data problems
Collaborators and Acknowledgments
• Jason Riedy, Research Scientist (Georgia Tech)
• Graduate students (Georgia Tech):
– David Ediger
– Oded Green
– Rob McColl
– Pushkar Pande
– Anita Zakrzewska
– Karl Jiang
• Bader PhD graduates:
– Seunghwa Kang (Pacific Northwest National Lab)
– Kamesh Madduri (Penn State)
– Guojing Cong (IBM T.J. Watson Research Center)
• John Feo and Daniel Chavarría-Miranda (Pacific Northwest National Lab)
Bader, Related Recent Publications (2005-2008)
• D.A. Bader, G. Cong, and J. Feo, “On the Architectural Requirements for Efficient Execution of Graph Algorithms,” The 34th International Conference on Parallel Processing (ICPP 2005), pp. 547-556, Georg Sverdrups House, University of Oslo, Norway, June 14-17, 2005.
• D.A. Bader and K. Madduri, “Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors,” The 12th International Conference on High Performance Computing (HiPC 2005), D.A. Bader et al., (eds.), Springer-Verlag LNCS 3769, 465-476, Goa, India, December 2005.
• D.A. Bader and K. Madduri, “Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2,” The 35th International Conference on Parallel Processing (ICPP 2006), Columbus, OH, August 14-18, 2006.
• D.A. Bader and K. Madduri, “Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks,” The 35th International Conference on Parallel Processing (ICPP 2006), Columbus, OH, August 14-18, 2006.
• K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, “Parallel Shortest Path Algorithms for Solving Large-Scale Instances,” 9th DIMACS Implementation Challenge -- The Shortest Path Problem, DIMACS Center, Rutgers University, Piscataway, NJ, November 13-14, 2006.
• K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, “An Experimental Study of A Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances,” Workshop on Algorithm Engineering and Experiments (ALENEX), New Orleans, LA, January 6, 2007.
• J.R. Crobak, J.W. Berry, K. Madduri, and D.A. Bader, “Advanced Shortest Path Algorithms on a Massively-Multithreaded Architecture,” First Workshop on Multithreaded Architectures and Applications (MTAAP), Long Beach, CA, March 30, 2007.
• D.A. Bader and K. Madduri, “High-Performance Combinatorial Techniques for Analyzing Massive Dynamic Interaction Networks,” DIMACS Workshop on Computational Methods for Dynamic Interaction Networks, DIMACS Center, Rutgers University, Piscataway, NJ, September 24-25, 2007.
• D.A. Bader, S. Kintali, K. Madduri, and M. Mihail, “Approximating Betweenness Centrality,” The 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007), San Diego, CA, December 11-12, 2007.
• David A. Bader, Kamesh Madduri, Guojing Cong, and John Feo, “Design of Multithreaded Algorithms for Combinatorial Problems,” in S. Rajasekaran and J. Reif, editors, Handbook of Parallel Computing: Models, Algorithms, and Applications, CRC Press, Chapter 31, 2007.
• Kamesh Madduri, David A. Bader, Jonathan W. Berry, Joseph R. Crobak, and Bruce A. Hendrickson, “Multithreaded Algorithms for Processing Massive Graphs,” in D.A. Bader, editor, Petascale Computing: Algorithms and Applications, Chapman & Hall / CRC Press, Chapter 12, 2007.
• D.A. Bader and K. Madduri, “SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks,” 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Miami, FL, April 14-18, 2008.
Bader, Related Recent Publications (2009-2010)• S. Kang, D.A. Bader, “An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest
of Sparse Graphs,” 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Raleigh, NC, February 2009.
• Karl Jiang, David Ediger, and David A. Bader. “Generalizing k-Betweenness Centrality Using Short Paths and a Parallel Multithreaded Implementation.” The 38th International Conference on Parallel Processing (ICPP), Vienna, Austria, September 2009.
• Kamesh Madduri, David Ediger, Karl Jiang, David A. Bader, Daniel Chavarría-Miranda. “A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets.” 3rd Workshop on Multithreaded Architectures and Applications (MTAAP), Rome, Italy, May 2009.
• David A. Bader, et al. “STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation.” 2009.
• David Ediger, Karl Jiang, E. Jason Riedy, and David A. Bader. “Massive Streaming Data Analytics: A Case Study with Clustering Coefficients,” Fourth Workshop in Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 2010.
• Seunghwa Kang, David A. Bader. “Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System,” Fourth Workshop in Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 2010.
• David Ediger, Karl Jiang, Jason Riedy, David A. Bader, Courtney Corley, Rob Farber and William N. Reynolds. “Massive Social Network Analysis: Mining Twitter for Social Good,” The 39th International Conference on Parallel Processing (ICPP 2010), San Diego, CA, September 2010.
• Virat Agarwal, Fabrizio Petrini, Davide Pasetto and David A. Bader. “Scalable Graph Exploration on Multicore Processors,” The 22nd IEEE and ACM Supercomputing Conference (SC10), New Orleans, LA, November 2010.
Acknowledgment of Support