DAQ architectures; Main Hardware components
Babak Abi
DUNE DAQ Simulations Meeting
19 Oct 2017
Review:
• Assumptions (requirements!)
• DAQ GPU based
• DAQ FPGA based
• DAQ Hybrid scheme
DAQ initial notions
• Assumptions for the architecture:
1. Buffer depth of at least 10 s, up to tens of seconds.
2. Primitive triggers plus a global trigger, before full event construction.
3. 1 µs timing accuracy. This does not change the architecture significantly, but if it is confirmed it shows how much simpler the timing hardware could be.
4. Focus is on TPC upstream data. Downstream data (commands, low-voltage regulators, monitoring, ...) is almost independent of the architecture.
5. Estimates are based on realistic technology available in 2022 and later (particularly for the GPU-based DAQ).
• Rough estimate of the resources needed, in terms of GPUs and FPGAs.
DaqD1: Minimum In-detector architecture
[Diagram. Detector side: each APA (2560 channels) drives 80 x 1.25 Gb/s serial links (100 Gb/s) into an FPGA/ASIC MUX with 10 x 10 Gb/s serial output links; a 10 x 10 Gb/s, 20:1 CWDM MUX carries the data over 1 fiber pair, 2 km to the surface. Off-detector side, at surface: processor units (PCI + GPU; buffer + tagging + compression), PC back-end DAQ, network. Also shown: 50 MHz clock distribution, global time PTP network (CAT6), global trigger (CAT6).]
Alternative: the data could also be sent as TCP/UDP packets over optical fibre, but this puts constraints on the NIC and the FPGA!
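The link counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and not part of the original talk: the function and constant names are hypothetical, and reading "20:1" as 20 wavelengths per fiber pair (so that one CWDM MUX serves two APAs) is an assumption.

```python
import math

# Figures taken from the slide; names are illustrative only.
LINKS_PER_APA = 80        # cold serial links per APA
LINK_RATE_GBPS = 1.25     # Gb/s per cold link
UPLINK_RATE_GBPS = 10.0   # Gb/s per multiplexed serial link
CWDM_CHANNELS = 20        # ASSUMPTION: "20:1" = 20 wavelengths per fiber pair


def apa_rate_gbps(links=LINKS_PER_APA, rate=LINK_RATE_GBPS):
    """Aggregate raw data rate of one APA in Gb/s."""
    return links * rate


def uplinks_needed(total_gbps, uplink_gbps=UPLINK_RATE_GBPS):
    """Number of uplinks needed to carry total_gbps, rounded up."""
    return math.ceil(total_gbps / uplink_gbps)


def apas_per_mux(channels=CWDM_CHANNELS):
    """How many APAs share one CWDM MUX under the assumption above."""
    return channels // uplinks_needed(apa_rate_gbps())


if __name__ == "__main__":
    total = apa_rate_gbps()
    print("APA rate [Gb/s]:", total)
    print("10 Gb/s uplinks per APA:", uplinks_needed(total))
    print("cold links per uplink:", LINKS_PER_APA // uplinks_needed(total))
    print("APAs per CWDM MUX:", apas_per_mux())
```

Under these assumptions the 80 cold links map onto 10 uplinks of 8 links each, which matches the "10 x 10Gb serial links" label in the diagram.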
DaqD1: Minimum In-detector architecture
1 APA requires:
1 server, 1 PCI card, 2 or 4 GPUs, 2 x 10GE NIC
200 GB DDR4 RAM as ring buffer
• PCI card
  • PCIe 5 ready in 2019, 63 GB/s (x16)
  • Commercially available, BUT an RCE with a PCI extension can do the job!
  • Minimum firmware R&D
• Ring buffer time stamp: PTP time
  • All CPU/NIC clock accuracy 100-500 ns
  • All server NICs are PTP enabled and have at least two ports: one dedicated to data transport, the other for PTP, trigger, monitoring, ...
• GPU: easy to program, almost zero R&D
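How long a given amount of RAM can hold the raw stream follows directly from the 100 Gb/s per-APA rate. The sketch below is a rough estimate under stated assumptions, not a figure from the talk: it assumes no compression, decimal gigabytes, and that all of the RAM is usable as ring buffer. The function names are hypothetical.

```python
APA_RATE_GBPS = 100.0  # raw rate of one APA in Gb/s (80 x 1.25 Gb/s)


def buffer_seconds(ram_gb, rate_gbps=APA_RATE_GBPS):
    """Seconds of uncompressed data that fit in ram_gb gigabytes."""
    rate_gbytes = rate_gbps / 8.0  # Gb/s -> GB/s
    return ram_gb / rate_gbytes


def ram_needed_gb(seconds, rate_gbps=APA_RATE_GBPS):
    """Gigabytes of RAM needed to hold `seconds` of uncompressed data."""
    return seconds * rate_gbps / 8.0


if __name__ == "__main__":
    for ram in (200, 256):
        print(ram, "GB holds", buffer_seconds(ram), "s")
    print("10 s needs", ram_needed_gb(10), "GB")
```

With these assumptions, 200 GB corresponds to about 16 s of raw data and a 10 s minimum buffer needs about 125 GB, so the quoted RAM size leaves some headroom above the 10 s requirement.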
[Diagram. Off-detector side, at surface: server with PCI card, GPU and NIC, and a ring buffer time-stamped with PTP time; connections to the PC back-end DAQ and network, the global time PTP network (CAT6) and the global trigger (CAT6).]
1. Most of the components are off-the-shelf: minimum R&D and maximum flexibility.
2. It is cost-effective and low-maintenance, with rapid R&D and minimal use of FPGAs and related custom-built components.
3. Commercial availability of optical CWDM/DWDM transceivers and Ethernet optical interfaces at 100 Gb/s (up to 400 Gb/s in the near future).
4. Very low power consumption (in the detector cavern).
5. Extremely modular and scalable.
DAQ GPU based
• The proof of concept and study were done on a GPU computation farm with Titan X GPUs (slightly outdated), but the results should be extrapolated to 2020 technology. (Phil's talk)
• Volta GPUs are already on the market, with higher specifications.
• We know it is possible; the question is how many GPUs we need to process all the data from 1 APA.
• Titan X specifications:
  • Core config: 3584:224:96
  • Base core clock: 1500 MHz
  • Processing power: 10157 GFLOPS (single precision)
  • Memory size: 12 GiB GDDR5X
  • Bandwidth: 480 GB/s
  • SLI HB (High Bandwidth) bridge between two GPUs
  • Peak power consumption: 250 W
  • Launch price: 1200 USD
• Data transfer links:
  • PCIe 3.0 x16: 32 GB/s
  • NVLink 2.0: 25 GT/s or 300 GB/s (in/out)
Time needed to process:
1- Move data from PCIe to DRAM: X
2- Move data (16 GB/s) to GRAM: X
3- Process and create trigger primitives: Y
4- Send the hit info to the CPU: Z
• On a SINGLE PCIe bus: 2X+Y+Z < 1 (servers with ...)
• Z is negligible compared to X, and Y < X, so 3X < 1 is a fair assumption. So if X = 16 GB/s -> 3X = 48 GB/s
• Servers with 2+ separate PCIe 4.0 buses: data transport is 64 GB/s
• 2019: PCIe 5.0 is 63 GB/s
• If we need more bandwidth, NVLink is 300 GB/s
• Servers with multiple CPUs and PCIe buses, or GPUDirect RDMA, would reduce the above numbers
This rate is for throughput only. PCIe is duplex, so the rates are underestimates; see the back-up slide.
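The bandwidth argument above can be written out as a small check. This is a minimal sketch of one reading of the slide, not the author's calculation: it treats X as an equivalent bus load of 16 GB/s, counts three such loads (two transfers plus processing bounded by X), and compares the result with the link rates quoted on the slides. All names are hypothetical.

```python
# Link rates in GB/s as quoted on the slides.
BUS_OPTIONS_GBYTES = {
    "PCIe 3.0 x16": 32.0,
    "2+ PCIe 4.0 buses": 64.0,
    "PCIe 5.0 x16": 63.0,
    "NVLink 2.0": 300.0,
}


def required_bandwidth(x_gbytes=16.0, loads=3):
    """Worst-case load: two transfers (2X) plus processing bounded by X."""
    return loads * x_gbytes


def sufficient_options(required, options=BUS_OPTIONS_GBYTES):
    """Names of the links whose quoted rate covers the required load."""
    return [name for name, rate in options.items() if rate >= required]


if __name__ == "__main__":
    need = required_bandwidth()
    print("required [GB/s]:", need)
    for name, rate in BUS_OPTIONS_GBYTES.items():
        print(f"{name}: {rate} GB/s, headroom x{rate / need:.2f}")
    print("sufficient:", sufficient_options(need))
```

Note that the raw rate of one APA is 100 Gb/s, or 12.5 GB/s, so taking X = 16 GB/s already includes some margin.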
DAQ GPU based, major components per APA (should be updated to 2020 or 2022):
• 1 server (multi-CPU) with 2 x 10GE NIC, or 1 x 10GE and 1 x 1GE (with PCIe 5)
• 256 GB DRAM (20 s buffer)
• 2 GPUs (4 GPUs in case of high noise or "reserved capacity")
  • NVIDIA Titan X: 1000 USD!
• ½ CWDM MUX: 350 USD (retail price)
• 1-2 PCIe interface cards with 2 x QSFP+ or 2 x 5 SFP+ ??
  • Can be acquired commercially or designed/built in house
  • RCE DPM with PCIe extension (SLAC?)
• 10 x FPGA serializers (1.25 Gb/s to 10 Gb/s) at the flange
  • Low-end Kintex XC7K70T (8 x GTX 12.5 Gb/s): 100 USD
  • Or 5 x XC7K325 (24 x GTX 12.5 Gb/s): 400 USD
• PTP-enabled 1 Gb switch for time distribution to the 1G NIC port (cost charged to the timing system)
3k (server) + 2k (GPU) + 0.4k (MUX) + 2k (flange MUX) + 3-5k (PCIe card) ~ 10.4k-12.4k per APA
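The per-APA total can be reproduced by summing the low and high ends of each line item. The sketch below only restates the arithmetic of the slide; the dictionary keys, the function names and the `n_apas` scaling parameter are illustrative assumptions, and the unit "k" is taken to mean thousands of USD because the component prices are quoted in dollars.

```python
# (low, high) cost per APA in thousands, as quoted on the slide.
COST_K = {
    "server": (3.0, 3.0),
    "GPU": (2.0, 2.0),
    "CWDM MUX share": (0.4, 0.4),
    "flange MUX": (2.0, 2.0),
    "PCIe card": (3.0, 5.0),
}


def total_cost_k(costs=COST_K):
    """Return (low, high) total cost per APA, rounded to 0.1k."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 1), round(high, 1)


def scaled_cost_k(n_apas, costs=COST_K):
    """Total for n_apas APAs, assuming costs scale linearly (no bulk discount)."""
    low, high = total_cost_k(costs)
    return round(low * n_apas, 1), round(high * n_apas, 1)


if __name__ == "__main__":
    print("per APA [k]:", total_cost_k())
    print("for 10 APAs [k]:", scaled_cost_k(10))
```

Keeping the range as a pair makes it easy to update a single item, for example the PCIe card, once a firmer price is known.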
DAQ FPGA based
• We need an accurate estimate, agreed by everybody, of the resources/gates and reserved capacity required per APA.
1. The assumption is based on Giles's suggested low/high-noise model, which could easily be adapted to high noise.
2. The architecture is based on the modularity and scalability of a single FPGA board in a rack, with one SFP+ 10GE output to transport trigger data and a reserved 1/10GE port for trigger primitives and timing distribution.
3. The recommended number of boards per APA is 10-12, so that the full raw data (100 Gb/s) can also be sent through 10 x 10GE in case of high noise or a small buffer.
   1. Signal processing for 256 channels per board constrains the selection of FPGA.
   2. 8 x 1.25 Gb/s in and 10GE out scheme, with 16 GB DDR4 DRAM.
4. The low-noise FPGA-based scheme can be combined with the GPU-based architecture as a hybrid scheme.
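The board count and per-board figures in the list above follow from the channel and link numbers. The following sketch is an illustrative cross-check under stated assumptions, not a design calculation from the talk: it assumes the 16 GB of DDR4 is entirely available as buffer, uses decimal gigabytes, and ignores compression. Names are hypothetical.

```python
import math

CHANNELS_PER_APA = 2560
CHANNELS_PER_BOARD = 256
LINKS_PER_BOARD = 8          # cold links into each FPGA board
LINK_RATE_GBPS = 1.25        # Gb/s per cold link
BOARD_DRAM_GB = 16.0         # DDR4 per board


def boards_per_apa():
    """Minimum number of boards needed to cover all channels of one APA."""
    return math.ceil(CHANNELS_PER_APA / CHANNELS_PER_BOARD)


def board_input_gbps():
    """Raw input rate of one board in Gb/s."""
    return LINKS_PER_BOARD * LINK_RATE_GBPS


def board_buffer_seconds(dram_gb=BOARD_DRAM_GB):
    """Seconds of uncompressed input that fit in the on-board DRAM."""
    return dram_gb / (board_input_gbps() / 8.0)


if __name__ == "__main__":
    print("boards per APA:", boards_per_apa())
    print("input per board [Gb/s]:", board_input_gbps())
    print("links covered:", boards_per_apa() * LINKS_PER_BOARD)
    print("on-board buffer [s]:", board_buffer_seconds())
```

Under these assumptions 10 boards cover the 80 cold links exactly, each board's 10 Gb/s input matches its 10GE output, and 16 GB holds roughly 12.8 s of raw data; the 11th and 12th boards in the recommendation would then be spare capacity.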
DAQ FPGA based; General concept
[Diagram. Detector side: the APA (2560 channels, 80 x 1.25 Gb/s serial links, 100 Gb/s) feeds FPGA boards, each with 8 x 1.25G inputs, 1 x 10GE and 1 x 1GE; a 10 x 10 Gb/s, 18:1 CWDM MUX carries the data over 1 fiber pair, 2 km to the surface. Off-detector side, at surface: PC back-end DAQ and network. Also shown: 50 MHz clock distribution, global time PTP network (CAT6), global trigger (CAT6).]
1. Lower-cost interconnection links using off-the-shelf components.
2. High data-rate throughput capability.
3. Highly flexible and scalable architecture.
DAQ FPGA based; FPGA single board
• It is aligned with SLAC's DPM design but needs modifications to add:
1. Additional DDR4 RAM
2. 8 x SFP for the cold cables
3. 2 x SFP+, or 1 x SFP+ and 1 x 1GE (connected to PS)
4. ZYNQ has a 1GE port (PS) that is already PTP enabled
• It is not a COB or lite-COB scheme, to avoid redundant components.
• Cost estimation: FPGA + DDR4 + PCB + components
Depending on the resources needed, the FPGA can be anything from ZU4CG to ZU9EG. The price range is significant.
So no comment on cost until a clear figure for the needed resources/gates is provided.
DAQ Hybrid scheme
• The low-noise FPGA-based scheme can be combined with the GPU-based architecture as a hybrid scheme.
[Diagram. Detector side: the APA (2560 channels, 80 x 1.25 Gb/s serial links, 100 Gb/s) feeds FPGA boards, each with 8 x 1.25G inputs, 1 x 10GE and 1 x 1GE; a 10 x 10GE, 20:1 CWDM MUX carries the data over 1 fiber pair, 2 km to the surface. Off-detector side, at surface: processor units (PCI + GPU; buffer + tagging + compression) with 10GE NICs, PC back-end DAQ, network. Also shown: 50 MHz clock distribution, global time PTP network (CAT6), global trigger (CAT6).]
• The FPGA boards use low-end FPGAs for the low-noise scheme, doing simple signal processing rather than multiplexing cold data.
• Instead of a PCIe data transport card, we can use multiple 10GE ports at the server.
• Signal processing and trigger primitive generation are divided between FPGA and GPU.
• GPUs or network can easily be scaled up if needed.
Back up slides
PCIe transport rates
• PCIe is duplex: move data to DRAM, and from DRAM to GRAM
Data flow; Ring Buffer & Trigger Primitives
1- Lower bandwidth for triggered data
2- Relaxed trigger latency -> send through TCP packets back to the buffer holder
3- Global time
4- Separate clock: can the event builder handle triggered data?
[Diagram. Data flow blocks: TPC front-end electronics -> TPC trigger primitive generator and ring buffer (timing tag); photon detector -> SSP trigger primitive generator and ring buffer (timing tag); trigger processor farm (SN, proton, cosmics); global trigger (beam, random, calibration); triggered data -> board-reader / event-builder in the back-end DAQ.]
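The ring buffer with timing tags and trigger-driven readout can be illustrated with a small in-memory model. This is a toy sketch under stated assumptions, not the DAQ implementation: the class and function names are hypothetical, timestamps are plain integers standing in for PTP time, and payloads are arbitrary objects standing in for data blocks.

```python
from collections import deque


class TimestampedRingBuffer:
    """Fixed-depth buffer of (timestamp, payload); oldest entries drop first."""

    def __init__(self, capacity):
        # deque with maxlen discards the oldest item when full.
        self._data = deque(maxlen=capacity)

    def push(self, timestamp, payload):
        """Store one time-tagged block; timestamps are assumed to increase."""
        self._data.append((timestamp, payload))

    def oldest_timestamp(self):
        """Timestamp of the oldest block still buffered, or None if empty."""
        return self._data[0][0] if self._data else None

    def read_window(self, t_start, t_end):
        """Payloads with t_start <= timestamp < t_end that are still buffered."""
        return [p for (t, p) in self._data if t_start <= t < t_end]


def handle_trigger(buffer, trigger_time, pre, post):
    """Serve a global trigger: return data around trigger_time."""
    return buffer.read_window(trigger_time - pre, trigger_time + post)


if __name__ == "__main__":
    buf = TimestampedRingBuffer(capacity=5)
    for t in range(10):
        buf.push(t, f"block-{t}")
    print("oldest buffered timestamp:", buf.oldest_timestamp())
    print("trigger at t=7:", handle_trigger(buf, 7, pre=1, post=2))
    print("trigger at t=3 (already overwritten):", handle_trigger(buf, 3, 1, 2))
```

The model shows why the buffer depth bounds the allowed trigger latency: a trigger request that arrives after its data has been overwritten returns nothing.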