DAQ architectures; Main Hardware components


Babak Abi

DUNE DAQ Simulations Meeting

19 Oct 2017

Review:

• Assumptions (requirements!)
• DAQ GPU based
• DAQ FPGA based
• DAQ Hybrid scheme

DAQ initial notions

• Assumptions for the architecture:

1. Buffer depth of at least 10 s, up to tens of seconds.

2. Primitive triggers plus a global trigger, formed before full event construction.

3. 1 µs timing accuracy. This does not change the architecture significantly, but if it is confirmed, the timing hardware could be much simpler.

4. The focus is on TPC upstream data. Downstream data, commands, low-voltage regulators, monitoring and so on are almost independent of the architecture.

5. Estimates are based on technology realistically available in 2022 and later (particularly for the GPU-based DAQ).

• Rough estimate of the resources needed, in terms of GPUs and FPGAs.

DaqD1: Minimum In-detector architecture

[Diagram, two APAs shown. Detector side: each APA (2560 channels) drives 80 x 1.25 Gb/s serial links (100 Gb/s) into an FPGA/ASIC MUX, which outputs 10 x 10 Gb/s serial links; a 10 x 10 Gb/s 20:1 CWDM MUX sends the data over 1 x fiber pair, 2 km to the surface; 50 MHz clock distribution. Off-detector side, at surface: processor units (PCI + GPU; buffer + tagging + compression) and the PC back-end DAQ network; global time PTP network and global trigger over CAT6.]

An alternative: it is possible to send the data as TCP/UDP packets over optical fibre, but this puts constraints on the NIC and the FPGA.
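The link counts above fix the per-APA data rate, and together with the 10 s minimum buffer assumption they fix the memory needed. The following is a minimal sketch of that arithmetic, assuming decimal units (1 Gb = 1e9 bits, 1 GB = 1e9 bytes) and no compression or protocol overhead; the function names are illustrative only.

```python
# Per-APA link rate and ring-buffer size, using the numbers quoted on the slides.
# Assumptions: decimal units, no compression, no protocol overhead.

LINKS_PER_APA = 80       # serial links leaving one APA
LINK_RATE_GBPS = 1.25    # Gb/s per link


def apa_rate_gbps(links=LINKS_PER_APA, link_rate=LINK_RATE_GBPS):
    """Aggregate data rate of one APA in Gb/s."""
    return links * link_rate


def buffer_size_gbytes(seconds, rate_gbps):
    """Memory in GB needed to hold `seconds` of data arriving at `rate_gbps`."""
    return seconds * rate_gbps / 8.0


def buffer_depth_seconds(size_gbytes, rate_gbps):
    """Seconds of data that fit in `size_gbytes` at `rate_gbps`."""
    return size_gbytes * 8.0 / rate_gbps


rate = apa_rate_gbps()                      # 100 Gb/s, i.e. 12.5 GB/s
print(rate, buffer_size_gbytes(10, rate))   # rate and the memory for 10 s of data
```

Under these assumptions a 10 s buffer needs 125 GB per APA, while 200 GB holds 16 s and 256 GB about 20 s of uncompressed data.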

DaqD1: Minimum In-detector architecture

1 APA requires:

1 server, 1 PCI card, 2/4 GPUs, 2 x 10GE NIC

200 GB DDR4 RAM as ring buffer

• PCI card
  • PCIe 5.0 ready in 2019, 63 GB/s (x16)
  • Commercially available, BUT an RCE with PCI extension can do the job!

• Minimum firmware R&D

• Ring buffer time stamp: PTP time

• All CPU/NIC clock accuracy 100-500 ns

• All server NICs are PTP enabled and have at least two ports: one dedicated to data transport, the other for PTP, trigger, monitoring ...

• GPU: easy to program, almost zero R&D
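To make the idea of a PTP-time-stamped ring buffer concrete, here is a minimal sketch in Python. It is not the DAQ implementation: the class name, the methods, and the use of integer nanosecond timestamps are all assumptions made for illustration. It only shows the two operations the scheme relies on, overwriting the oldest data and reading back a time window when a trigger arrives.

```python
from collections import deque


class TimestampedRingBuffer:
    """Fixed-capacity buffer of (timestamp_ns, payload) pairs.

    Hypothetical sketch: once full, each push drops the oldest entry.
    """

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)

    def push(self, timestamp_ns, payload):
        # Timestamps are assumed to come from a PTP-disciplined clock
        # and to arrive in increasing order.
        self._entries.append((timestamp_ns, payload))

    def read_window(self, start_ns, end_ns):
        """Payloads with start_ns <= timestamp < end_ns, e.g. for a trigger."""
        return [p for t, p in self._entries if start_ns <= t < end_ns]

    def oldest_timestamp(self):
        """Timestamp of the oldest entry still held, or None if empty."""
        return self._entries[0][0] if self._entries else None


buf = TimestampedRingBuffer(capacity=4)
for t in range(6):                      # six pushes into four slots
    buf.push(t * 1000, f"frame{t}")
print(buf.read_window(2000, 4000))      # frames stamped 2000 ns and 3000 ns
```

A trigger that refers to a time older than `oldest_timestamp()` can no longer be served, which is why the buffer depth in seconds sets the allowed trigger latency.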

[Diagram. Off-detector side, at surface: server with PCI card, GPU and NIC; ring buffer stamped with PTP time; global time PTP network and global trigger over CAT6; PC back-end DAQ network.]

1. Most of the components are off-the-shelf: minimum R&D and maximum flexibility.

2. It is cost effective and low maintenance, with rapid R&D and minimal use of FPGAs and related custom-built components.

3. Commercial availability of optical CWDM/DWDM transceivers and Ethernet optical interfaces at 100 Gb/s (up to 400 Gb/s in the near future).

4. Very low power consumption (in the detector cavern).

5. Extremely modular and scalable.

DAQ GPU based

• Proof of concept and study were done on a GPU computation farm with Titan X GPUs (slightly outdated), but the results should be extrapolated to 2020 technology. (Phil's talk)

• Volta GPUs are already on the market with higher specifications.

• We know it is possible. The question is: how many GPUs do we need to process all the data from 1 APA?

• Titan X specifications:
  • Core config: 3584:224:96
  • Base core clock (MHz): 1500
  • Processing power (GFLOPS): 10157, single precision
  • Memory size (GiB): 12, GDDR5X
  • Bandwidth (GB/s): 480
  • SLI HB (High Bandwidth) bridge between two GPUs
  • Peak power consumption: 250 W
  • Launch price (USD): 1200
  • Data transfer links:
    • PCIe 3.0 x16: 32 GB/s
    • NVLink 2.0: 25 GT/s or 300 GByte/s (in/out)

Time needed to process:

1. Move data from PCIe to DRAM: X
2. Move data (16 GB/s) to GRAM: X
3. Process and create trigger primitives: Y
4. Send the hit info to the CPU: Z

• On a SINGLE PCIe bus, 2X + Y + Z < 1 (servers with )

• Z is negligible compared to X, and Y < X, so 3X < 1 is a fair assumption. So if X = 16 GB/s, then 3X = 48 GB/s.

• Servers with 2+ separate PCIe 4.0 buses: data transport is 64 GB/s.

• 2019: PCIe 5.0 is 63 GB/s.

• If we need more bandwidth, NVLink is 300 GByte/s.

• Servers with multiple CPUs and PCIe buses, or the use of GPUDirect RDMA, would reduce the above numbers.

These rates are for throughput only. PCIe is duplex, so the rates are underestimates; see the back-up slide.
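The budget argument above can be written out as a short calculation. This is a sketch under one reading of the slide: the budget 2X + Y + Z is expressed in units of X, with Y bounded by X and Z taken as zero, and the result is compared with the link throughputs quoted on the slide. The function and dictionary names are illustrative.

```python
# Link throughputs quoted on the slide, in GB/s.
BUS_GBYTES_PER_S = {
    "PCIe 3.0 x16": 32,
    "PCIe 5.0": 63,
    "2+ PCIe 4.0 buses": 64,
    "NVLink 2.0": 300,
}


def required_throughput(x, y_over_x=1.0, z_over_x=0.0):
    """Budget 2X + Y + Z in units of X.

    The defaults (Y = X, Z = 0) reproduce the slide's upper bound of 3X.
    """
    return x * (2.0 + y_over_x + z_over_x)


def buses_meeting(budget):
    """Sorted names of the quoted links whose throughput covers `budget` GB/s."""
    return sorted(name for name, rate in BUS_GBYTES_PER_S.items() if rate >= budget)


budget = required_throughput(16)    # 3 * 16 = 48 GB/s
print(budget, buses_meeting(budget))
```

With X = 16 GB/s the 48 GB/s budget exceeds a single PCIe 3.0 x16 link but fits within the other quoted options, which matches the slide's conclusion that newer buses or NVLink are needed.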

DAQ GPU based, major components

Per one APA (should be updated to 2020 or 2022):

• 1 server (multi CPU) with 2 x 10GE NIC, or 1 x 10GE and 1 x 1GE (with PCIe 5)

• 256 GB DRAM (20 s buffer)

• 2 GPUs (4 GPUs in case of high noise or “reserved capacity”)
  • NVIDIA Titan X, 1000$!

• ½ CWDM MUX, 350$ (retail price)

• 1-2 PCIe interface cards with 2 QSFP+ or 2x5 SFP+ ??
  • Can be acquired commercially or designed/built in house
  • RCE DPM with PCIe extension (SLAC?)

• 10 x FPGA serializers (1.25 Gb/s to 10 Gb/s) at the flange
  • Low-end Kintex XC7K70T (8 x GTX 12.5 Gb/s), 100$
  • Or 5 x XC7K325 (24 x GTX 12.5 Gb/s), 400$

• PTP-enabled 1 Gb switch for time distribution to the 1G NIC port (cost covered by the timing system)

3K (server) + 2K (GPU) + 0.4K (MUX) + 2K (flange MUX) + 3-5K (PCIe card) ≈ 10.4K-12.4K per APA
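The per-APA total can be checked, and scaled to several APAs, with a few lines of Python. The item costs are the ones on the slide; the `n_apa` parameter is a hypothetical input, since the slide does not state how many APAs are to be read out.

```python
# Per-APA cost items from the slide, in thousands of USD, as (low, high) ranges.
COST_KUSD = {
    "server": (3.0, 3.0),
    "GPU": (2.0, 2.0),
    "CWDM MUX": (0.4, 0.4),
    "flange MUX": (2.0, 2.0),
    "PCIe card": (3.0, 5.0),
}


def total_cost(items=COST_KUSD):
    """Sum the low and high ends of every range, rounded to 0.1 kUSD."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return round(low, 1), round(high, 1)


def scaled_cost(n_apa, items=COST_KUSD):
    """Total for `n_apa` APAs (n_apa is a hypothetical input, not from the slide)."""
    low, high = total_cost(items)
    return round(low * n_apa, 1), round(high * n_apa, 1)


print(total_cost())      # low and high per-APA totals in kUSD
print(scaled_cost(10))   # the same range scaled to ten APAs
```

The spread of the total comes entirely from the PCIe card, so firming up that one item would narrow the estimate the most.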

DAQ FPGA based

• We need an accurate estimate, agreed by everybody, of the resources/gates and reserved capacity required per APA.

1. The assumption is based on Giles's suggestion of a low/high noise model that could be easily adapted to high noise.

2. The architecture is based on the modularity and scalability of a single FPGA board in a rack, with one SFP+ 10GE output to transport trigger data and a reserved 1/10GE port for trigger primitives and timing distribution.

3. The recommended number of boards per APA is 10-12, so that the full raw data (100 Gb/s) can also be sent through 10 x 10GE in case of high noise or low buffer.
   1. Signal processing for 256 channels per board constrains the selection of the FPGA.
   2. 8 x 1.25 Gb/s in and 10GE out scheme, with 16 GB DDR4 DRAM.

4. The low-noise FPGA-based scheme can be combined with the GPU-based architecture as a hybrid scheme.
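The board count follows directly from the channel and link numbers. The sketch below reproduces that arithmetic; the last function additionally assumes, purely for illustration, that all 16 GB of on-board DRAM is used to buffer raw input, which the slides do not state.

```python
import math

CHANNELS_PER_APA = 2560
CHANNELS_PER_BOARD = 256
LINKS_IN_PER_BOARD = 8
LINK_RATE_GBPS = 1.25
BOARD_DRAM_GBYTES = 16


def boards_per_apa(channels=CHANNELS_PER_APA, per_board=CHANNELS_PER_BOARD):
    """Minimum number of boards needed to cover every channel of one APA."""
    return math.ceil(channels / per_board)


def board_input_gbps(links=LINKS_IN_PER_BOARD, link_rate=LINK_RATE_GBPS):
    """Raw input rate of one board in Gb/s; this is what a 10GE output must carry."""
    return links * link_rate


def board_buffer_seconds(dram_gbytes=BOARD_DRAM_GBYTES):
    """Seconds of raw input one board could hold.

    Assumption for illustration: the whole DRAM is used as a raw-data buffer.
    """
    return dram_gbytes * 8.0 / board_input_gbps()


print(boards_per_apa(), board_input_gbps(), board_buffer_seconds())
```

Ten boards is the minimum that covers 2560 channels at 256 channels per board, and each board's 8 x 1.25 Gb/s input matches its single 10GE output, which is consistent with the 10-12 boards recommended above.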

DAQ FPGA based; General concept

[Diagram. Detector side: each APA (2560 channels) drives 80 x 1.25 Gb/s serial links (100 Gb/s) into FPGA boards, each with 8 x 1.25G in, 1 x 10GE and 1 x 1GE; 10 x 10 Gb/s into an 18:1 CWDM MUX; 1 x fiber pair, 2 km to the surface; 50 MHz clock distribution. Off-detector side, at surface: PC back-end DAQ network; global time PTP network and global trigger over CAT6.]

1. Interconnection links at lower cost, using off-the-shelf components.

2. High data-rate throughput.

3. Highly flexible and scalable architecture.

DAQ FPGA based; FPGA single board

• It is aligned with SLAC's DPM design but needs modifications to add:

1. Additional DDR4 RAM

2. 8 x SFP for the cold cables

3. 2 x SFP+, or 1 x SFP+ and 1 x 1GE (connected to the PS)

4. The ZYNQ has a 1GE port (PS) that is already PTP enabled

• It is not a COB or lite-COB scheme, to avoid redundant components.

• Cost estimation: FPGA + DDR4 + PCB + components

Depending on the resources needed, the FPGA can be anything from a ZU4CG to a ZU9EG. The price range is significant, so no comment on cost until the required resources/gates are clearly specified.

DAQ Hybrid scheme

• The low-noise FPGA-based scheme can be combined with the GPU-based architecture as a hybrid scheme.

[Diagram. Detector side: each APA (2560 channels) drives 80 x 1.25 Gb/s serial links (100 Gb/s) into FPGA boards, each with 8 x 1.25G in, 1 x 10GE and 1 x 1GE; 10 x 10 Gb/s into a 10 x 10GE 20:1 CWDM MUX; 1 x fiber pair, 2 km to the surface; 50 MHz clock distribution. Off-detector side, at surface: processor units (PCI + GPU; buffer + tagging + compression) with 10GE NICs and the PC back-end DAQ network; global time PTP network and global trigger over CAT6.]

• The FPGA boards use low-end FPGAs for the low-noise scheme, doing simple signal processing rather than multiplexing cold data.

• Instead of a PCIe data transport card, we can use multiple 10GE ports at the server.

• Signal processing and trigger primitive generation are divided between FPGA and GPU.

• GPUs or network can easily be scaled up if needed.

Back up slides

PCIe transport rates

• PCIe is duplex: data moves to DRAM and from DRAM to GRAM.

Data flow; Ring Buffer & Trigger Primitives

1. Lower bandwidth for triggered data

2. Relaxed trigger latency -> send through TCP packets back to the buffer holder

3. Global time

4. Separate clock: can the event builder handle triggered data?

[Diagram blocks: TPC front-end electronics; TPC trigger primitive generator with timing tag and ring buffer; photon detector; SSP trigger primitive generator with timing tag and ring buffer; trigger processor farm (SN, proton, cosmics); global trigger (beam, random, calibration); triggered data; board-reader / event-builder in the back-end DAQ.]