
DiFX Performance Testing

Chris Phillips

eVLBI Project Scientist

25 June 2009


DiFX history

• Developed by Adam Deller at Swinburne University of Technology (now at NRAO) to replace the LBA S2 correlator and enable disk-based correlation

• Production correlator of the LBA (Australia) since 2007

• Verified against LBA, VLBA and Bonn hardware correlators


DiFX overview

• FX-style correlator implemented in C++

• ~95% of the processing is done in optimised C vector function calls (heavy reliance on the Intel IPP libraries); see the sketch after this list

• Non-clocked system, unlike hardware correlators (HWCs)

• Maximum performance without compromising generality or ease of maintenance

• Modular design to support generality and enable “3rd party” contributors and local system optimisation
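To illustrate what moving work into optimised vector calls looks like in practice, here is a minimal sketch (not DiFX's own vector wrapper layer, whose function names differ) of replacing a scalar cross-multiplication loop with standard Intel IPP calls:

```cpp
// Minimal sketch: scalar loop vs. IPP vector calls for s1 * conj(s2).
// Assumes Intel IPP is installed; ippsConj_32fc and ippsMul_32fc are
// standard IPP signal-processing functions.
#include <ipps.h>

// Scalar reference: out[i] = s1[i] * conj(s2[i])
void crossMultiplyScalar(const Ipp32fc* s1, const Ipp32fc* s2,
                         Ipp32fc* out, int n)
{
    for (int i = 0; i < n; i++) {
        out[i].re = s1[i].re * s2[i].re + s1[i].im * s2[i].im;
        out[i].im = s1[i].im * s2[i].re - s1[i].re * s2[i].im;
    }
}

// Vectorised version: two IPP calls over the whole array.
void crossMultiplyIPP(const Ipp32fc* s1, const Ipp32fc* s2,
                      Ipp32fc* out, Ipp32fc* scratch, int n)
{
    ippsConj_32fc(s2, scratch, n);      // scratch = conj(s2)
    ippsMul_32fc(s1, scratch, out, n);  // out = s1 * scratch
}
```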


Capabilities

• Near-arbitrary time and frequency resolution

• Advanced pulsar gating

• eVLBI (the LBA has done 1 Gbps eVLBI)

• Correlate anything it can unpack (1/2/4/X Gbps)

• Most new formats are easy to implement; a generic unpacking sketch follows
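As an illustration of why new formats are comparatively easy to add, here is a generic sketch of table-driven 2-bit sample unpacking. The bit ordering and reconstruction levels are assumptions for the example, not those of any particular format or of the DiFX unpacker itself:

```cpp
// Generic sketch of unpacking 2-bit packed baseband samples with a
// lookup table (not the actual DiFX unpacker; sample ordering and
// reconstruction levels vary between formats).
#include <cstdint>

static float lut[256][4];   // byte value -> four decoded samples

void buildLut()
{
    // Illustrative 2-bit reconstruction levels; real formats define
    // their own levels and bit ordering.
    const float levels[4] = {-3.0f, -1.0f, 1.0f, 3.0f};
    for (int b = 0; b < 256; b++)
        for (int s = 0; s < 4; s++)
            lut[b][s] = levels[(b >> (2 * s)) & 0x3];
}

// Decode n bytes (4*n samples) into floats ready for the FFT stage.
void unpack(const uint8_t* packed, float* out, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        for (int s = 0; s < 4; s++)
            out[4 * i + s] = lut[packed[i]][s];
}
```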


Supported formats

• Input: LBA, Mk5A (Mk4/VLBA), K5 (via translation), Mk5B, VDIF (end 2009)

• Output: RPFITS, FITS-IDI


Current users

• Long Baseline Array (Australia)

• VLBA (USA)

• MPIfR (Bonn, Germany)

• AuScope geodetic array (Australia/NZ, 2009)

• E-LOFAR (EU)


Future/Imminent Capabilities

• Single pass, multiple phase centres

• Improved (faster) fringe rotation (a sketch follows this list)

• Band matching, e.g. 2 × 64 MHz with 1 × 128 MHz

• Baseband pulsar "folder"

• Native geodetic output format

• Phase cal extraction

• Frequency division multiplexing of VDIF

• Polyphase filterbank
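Fringe rotation multiplies each station's data by a complex phasor that tracks the changing geometric delay. As an illustration of the per-sample work involved (not DiFX's actual implementation of the improved fringe rotation), the sketch below assumes the fringe rate is constant over a short segment, so the phasor can be advanced by a fixed rotation instead of calling sin/cos for every sample:

```cpp
// Minimal sketch of fringe rotation over a short data segment,
// assuming a constant fringe rate across the segment. Illustrative
// only; DiFX's actual fringe-rotation code differs.
#include <complex>
#include <cmath>

void fringeRotate(std::complex<float>* data, int n,
                  double startPhase,        // radians at first sample
                  double phaseIncrement)    // radians per sample
{
    std::complex<double> phasor(std::cos(startPhase), std::sin(startPhase));
    const std::complex<double> step(std::cos(phaseIncrement),
                                    std::sin(phaseIncrement));
    for (int i = 0; i < n; i++) {
        data[i] *= std::complex<float>(phasor);
        phasor *= step;   // advance the phase; renormalise periodically in practice
    }
}
```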


DiFX architecture

[Architecture diagram] A Master Node directs N DataStream processes and M Core processes: the Master sends each Core a time range and destination, the DataStreams read source data and send baseband data segments to the Cores, and the Cores return visibilities to the Master.

• MPI is used for inter-process communication

• Each data transfer is double buffered

• Each DataStream holds a large, segmented ring buffer (up to 100s of MB, a few or more seconds of data)

• Cores hold processing buffers; the Master maintains visibility buffers
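A minimal sketch of this layout, using the MPI rank to assign the master, DataStream, and Core roles. The message protocol, the role names, and the number of DataStreams are placeholders, not the real DiFX classes or tags:

```cpp
// Sketch of a DiFX-style process layout: rank 0 is the master, the
// next N ranks act as DataStreams, the remainder as Cores. The actual
// DiFX message exchange is far more elaborate.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int numDataStreams = 2;                    // assumed for the sketch
    const int numCores = size - 1 - numDataStreams;  // remaining ranks

    if (size < numDataStreams + 2) {
        if (rank == 0)
            std::fprintf(stderr, "Need at least %d ranks\n", numDataStreams + 2);
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        std::printf("Master: directing %d datastreams, %d cores\n",
                    numDataStreams, numCores);
        // Would send (time range, destination core) messages and collect
        // visibilities here.
    } else if (rank <= numDataStreams) {
        std::printf("DataStream %d: would buffer baseband data and send "
                    "requested segments to cores\n", rank);
    } else {
        std::printf("Core %d: would receive baseband segments, correlate, "
                    "and return visibilities to the master\n", rank);
    }

    MPI_Finalize();
    return 0;
}
```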


Computational Distribution

• Currently only time division multiplexing (sketched below)

• VDIF will allow frequency division multiplexing: implementation style?

• As currently implemented, all baselines must still be correlated on a single Core
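A toy sketch of the current time-division scheme, assuming simple round-robin hand-out of subintegration time slices (the actual scheduling in DiFX may differ):

```cpp
// Sketch of time-division multiplexing of the correlation load:
// consecutive subintegrations are handed to compute cores round-robin,
// so each core still correlates all baselines for its own time slices.
// Core count and subintegration length are illustrative.
#include <cstdio>

int main()
{
    const int numCores = 4;            // assumed number of Cores
    const double subintSeconds = 0.5;  // assumed subintegration length
    const int numSubints = 12;

    for (int i = 0; i < numSubints; i++) {
        int core = i % numCores;       // round-robin assignment of time slices
        std::printf("subint %2d  [%5.2f - %5.2f s] -> Core %d (all baselines)\n",
                    i, i * subintSeconds, (i + 1) * subintSeconds, core);
    }
    return 0;
}
```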


Benchmarking

• Need to eliminate disk I/O to get a clear indication of the potential speed of a specific setup

• eVLBI!

• Live eVLBI is not suitable because of its fixed data rate

• The VLBIFAKE program generates an eVLBI data stream

• LBADR, Mark5B and VDIF formats

• TCP and UDP; only TCP is usable for benchmarking

• A shell script runs the correlator and saves the logs

• The quoted rate is the median transfer rate reported by VLBIFAKE (see the sketch below)
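A small sketch of that final step: computing the median of the per-interval transfer rates once they have been parsed from the VLBIFAKE logs. The log parsing is omitted and the numbers are placeholders:

```cpp
// Sketch: the benchmark figure is the median of per-interval transfer
// rates (assumed already extracted from the VLBIFAKE logs, in Mbps).
#include <algorithm>
#include <vector>
#include <cstdio>

double medianRate(std::vector<double> rates)
{
    if (rates.empty()) return 0.0;
    const size_t mid = rates.size() / 2;
    std::nth_element(rates.begin(), rates.begin() + mid, rates.end());
    double m = rates[mid];
    if (rates.size() % 2 == 0) {
        // Even count: average the two middle values.
        double lower = *std::max_element(rates.begin(), rates.begin() + mid);
        m = 0.5 * (m + lower);
    }
    return m;
}

int main()
{
    std::vector<double> rates = {912.0, 934.5, 921.3, 940.1, 930.8};  // example values
    std::printf("median rate: %.1f Mbps\n", medianRate(rates));
    return 0;
}
```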



Cuppa

• 20 nodes, dual-CPU quad-core

• 6 stations

• Up to 12 processing nodes

• Testing the number of threads and processing cores



Scaling with Cores (plot)


Data Rate Per Compute Node (plot)


Scaling with Threads (plot)


Scaling with Threads (second plot)


Scaling with Spectral Points (plot)


Scaling with Stations (plot)


APSR

• 18 compute nodes, dual-CPU quad-core

• 5 I/O nodes, dual-CPU dual-core

• 4 stations

• Up to 18 processing nodes




Data Rate Per Compute Node (plot)


Code collaboration status

• The entire codebase is now organised in an SVN repository (hosted by ATNF)

• DiFX wiki (hosted by Curtin): http://cira.ivec.org/dokuwiki/doku.php/difx/index

• Mailing list: [email protected]

• To join the difx-users list, search for difx-users on Google Groups and request access, or email me


Contact Us

Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au

Thank you

ATNF
Chris Phillips
eVLBI Project Scientist
Phone: +61 2 9372 4608
Email: [email protected]
Web: www.atnf.csiro.au/vlbi


Benchmarks

• Non-clocked system, unlike HWCs

• Indicative number of CPU cores required to correlate in real time:

  • LBA @ 1 Gbps (256 MHz agg. bandwidth, 2-bit): 100

  • VLBA @ 4 Gbps (1 GHz agg. bandwidth, 2-bit): 800

• Weak dependence on e.g. the number of channels

• A 160-CPU-core system (exceeding VLBA HWC capacity) costs <$100k including networking; annual electricity ~$10k