IPbus: A flexible Ethernet-based control system for xTCA hardware

Tom Williams

Rutherford Appleton Laboratory

On behalf of the IPbus team (Bristol, Imperial, RAL, CERN)

24/09/2014 IPbus -- Tom Williams -- TWEPP 2014 1

Content

• What is IPbus?

• IPbus firmware and software suite:

  • Firmware core

  • uHAL library

  • ControlHub

• Control system topology

• Reliability testing

• Performance measurements

• Lessons learnt & future directions

Control system requirements

• Reliable

  • Main link for configuring, monitoring & debugging hardware

  • Control system must have reliable and predictable behaviour under all conditions

• Scalable

  • 100s of new xTCA electronics boards in CMS Phase-1 upgrades

• Simple

  • Ideally, same ease of setup and use from single ‘board on benchtop’ scenario to final system

• Long maintainable lifetime

  • Industry-standard technologies

• Complexity in software rather than firmware

  • Software on commercial PC hardware easier to debug than firmware

• Preferably low latency and high bandwidth

What is IPbus?

• Previously: used VME standard

  • Dedicated signalling, arbitration and hardware access protocols

• xTCA standards (uTCA & ATCA)

  • Include industry-standard communication technologies – GbE & PCIe

  • Used in e.g. LHC experiment upgrades

• Ethernet & IP:

  • Highly flexible technology, ubiquitous (incl. Gigabit Ethernet)

  • IP-based networks: easily, cheaply scalable

• IPbus: A simple IP-based control protocol for xTCA

  • Designed for controlling xTCA hardware – i.e. read/write registers, etc.

  • Originally created by Jeremy Mans et al. in 2009/2010

  • Now main development from UK collaboration (CMS upgrades)

  • Recent focus on production-level reliability, performance, and scalability

The IPbus protocol

• A simple IP-based control protocol

  • Read & write (single register, block RAM, FIFOs)

  • Atomic read-modify-write

  • A32/D32 (32-bit addresses, 32-bit data)

• Lies in application layer of networking model

  • Transport protocol agnostic

  • Contains recovery mechanism for dropped/reordered/duplicated UDP packets

  • Extensively-tested SoC implementation in default IPbus firmware core

• Current version: 2.0

  • Released early 2013, focus on reliability and bandwidth
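The transaction types listed above can be illustrated with a small software model. This is only a sketch under stated assumptions: `MockTarget` and its methods are hypothetical names invented here, not part of the IPbus firmware or uHAL, and the read-modify-write rule `(x & andTerm) | orTerm`, together with returning the pre-update value, is an assumed convention that should be checked against the IPbus 2.0 protocol document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory stand-in for an IPbus target (A32/D32:
// 32-bit addresses, 32-bit data words). Not part of any IPbus library.
class MockTarget {
public:
    // Single-register read; registers never written read back as 0.
    std::uint32_t read(std::uint32_t addr) const {
        auto it = regs_.find(addr);
        return it == regs_.end() ? 0u : it->second;
    }

    // Single-register write.
    void write(std::uint32_t addr, std::uint32_t value) { regs_[addr] = value; }

    // Block read from consecutive addresses (e.g. a block RAM).
    std::vector<std::uint32_t> readBlock(std::uint32_t addr, std::size_t nWords) const {
        std::vector<std::uint32_t> out;
        out.reserve(nWords);
        for (std::size_t i = 0; i < nWords; ++i)
            out.push_back(read(addr + static_cast<std::uint32_t>(i)));
        return out;
    }

    // Read-modify-write done as one step on the target, so no other client
    // can change the register between the read and the write.
    // Returns the value held before the update (assumed convention).
    std::uint32_t rmwBits(std::uint32_t addr, std::uint32_t andTerm, std::uint32_t orTerm) {
        const std::uint32_t before = read(addr);
        write(addr, (before & andTerm) | orTerm);
        return before;
    }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> regs_;
};
```

For example, calling `rmwBits(0x10, 0xFF00FFFF, 0x000000AA)` on a register holding `0xFFFF0000` leaves `0xFF0000AA` in that register while the other bits of the top byte are untouched.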

The IPbus suite

• Defining a protocol is useful, but really need implementations …

1. IPbus firmware

  • Reference VHDL implementation of IPbus 2.0 UDP server

  • Complete system-on-chip implementation

  • Interprets and implements IPbus transactions (read, write, …) on FPGA

2. uHAL library

  • C++ and Python end-user Hardware Access Library

  • Design mimics recursive modularity of firmware blocks

3. ControlHub

  • Software application analogous to VME crate controller

  • Mediates/arbitrates simultaneous hardware access from multiple clients

  • Implements IPbus reliability mechanism

• Documentation, installation instructions, etc. – http://cactus.web.cern.ch

IPbus firmware core

• Reference VHDL SoC implementation of IPbus 2.0 UDP server

  • Currently Xilinx-specific, but successfully adapted for Altera devices & custom ASICs

  • Interprets IPbus transactions (read, write, …) on FPGA

• Transport protocol: UDP vs TCP

  • Main processing logic firmware (e.g. trigger algos) must fit on same FPGA

  • TCP: complex algorithm

    • Implementing full protocol in FPGA => high resource usage

  • UDP: much simpler algorithm

    • Can implement in firmware with low resource usage

  • Use UDP, correct for packet loss with IPbus-level reliability mechanism

• Also ICMP (unix ping command), ARP, and RARP (IP address assignment)

Resource usage    FFs     Slices   BRAMs

Fully-featured    3500    2900     17

Minimal config    2000    1000     5

25% of smallest Spartan-6 chip

uHAL

• C++ library providing end-user API for reads, writes, etc.

  • Also has Python bindings

• Register layout specified in XML files

  • Reflects hierarchical and modular nature of firmware

  • Promotes reuse and modularity of address table files

• Includes example GUI for hardware development

• Fast and scalable in conjunction with ControlHub
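The idea of a hierarchical, reusable register layout can be sketched in a few lines. The code below does not parse XML and does not use the uHAL API; `Node`, `resolve` and `makeChannelBlock` are hypothetical names, and combining addresses by adding each node's offset to its parent's address is an assumption made for this illustration, not a statement of uHAL's exact rule.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node of a hierarchical address table: an identifier, an
// offset relative to the parent, and any number of child nodes.
struct Node {
    std::string id;
    std::uint32_t offset;
    std::vector<Node> children;
};

// Resolve a dot-separated path such as "ch0.status" to an absolute address
// by summing the offsets along the path (assumed combination rule).
inline std::uint32_t resolve(const Node& root, const std::string& path) {
    const Node* node = &root;
    std::uint32_t addr = root.offset;
    std::istringstream parts(path);
    std::string id;
    while (std::getline(parts, id, '.')) {
        const Node* next = nullptr;
        for (const Node& child : node->children) {
            if (child.id == id) { next = &child; break; }
        }
        if (next == nullptr) throw std::out_of_range("no such node: " + id);
        node = next;
        addr += node->offset;
    }
    return addr;
}

// One block description, reused at several base addresses, mirroring a
// firmware block that is instantiated more than once.
inline Node makeChannelBlock(const std::string& id, std::uint32_t base) {
    return Node{id, base, {Node{"ctrl", 0x0, {}}, Node{"status", 0x1, {}}}};
}
```

Mounting the same block at `0x1000` as `ch0` and at `0x2000` as `ch1` makes `ch0.ctrl` resolve to `0x1000` and `ch1.status` to `0x2001`, without describing the block twice.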

ControlHub

• Software application analogous to VME crate controller

• Purpose:

  • Route uHAL IPbus traffic from multiple control applications to single board

  • Implement packet-loss recovery over UDP

  • Also, implementation must allow multiple clients to communicate with multiple targets reliably, efficiently & independently

• Implemented in Erlang:

  • Concurrent programming language developed by Ericsson (J Armstrong et al.)

  • Scales transparently across multiple CPU cores

  • Standard libraries for creating high-availability, fault-tolerant applications

  • Efficient, mature network protocol implementations
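Packet-loss recovery over UDP can be illustrated with a toy model of retransmission. The sketch below is not the ControlHub implementation (which is written in Erlang) and does not reproduce the actual IPbus reliability mechanism; `Fate`, `LossyTarget` and `sendReliably` are hypothetical names, the link is simulated in memory, and a missing reply stands in for a socket timeout. It shows one general point: if the target remembers the ID of the last packet it executed, a retransmitted request can be answered without being executed twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// What happens to one request/reply exchange on the simulated link.
enum class Fate { Deliver, DropRequest, DropReply };

// Hypothetical target behind a lossy link. It remembers the last packet ID
// it executed, so a retransmitted packet is answered but not run again.
class LossyTarget {
public:
    explicit LossyTarget(std::vector<Fate> fates) : fates_(std::move(fates)) {}

    // Returns true and fills 'reply' only if a reply got through.
    bool exchange(std::uint16_t packetId, std::uint32_t increment, std::uint32_t& reply) {
        const Fate fate = next_ < fates_.size() ? fates_[next_++] : Fate::Deliver;
        if (fate == Fate::DropRequest) return false;  // target never saw it
        if (!seenAny_ || packetId != lastId_) {       // new packet: execute once
            counter_ += increment;
            lastId_ = packetId;
            seenAny_ = true;
        }
        if (fate == Fate::DropReply) return false;    // executed, reply lost
        reply = counter_;
        return true;
    }

    std::uint32_t counter() const { return counter_; }

private:
    std::vector<Fate> fates_;
    std::size_t next_ = 0;
    std::uint32_t counter_ = 0;
    std::uint16_t lastId_ = 0;
    bool seenAny_ = false;
};

// Sender side: resend the same packet ID until a reply arrives or the
// attempts run out.
inline bool sendReliably(LossyTarget& target, std::uint16_t packetId,
                         std::uint32_t increment, int maxAttempts,
                         std::uint32_t& reply) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (target.exchange(packetId, increment, reply)) return true;
    }
    return false;
}
```

The increment operation is deliberately not idempotent: if the duplicate check were removed, a lost reply followed by a resend would increment the counter twice, which is the kind of error a reliability layer has to prevent.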

Example system topologies (1)

• Simplest scenario: 1 board, 1 computer

  • Simple network topology; may not need ControlHub

• Several IPbus targets

  • E.g. integration tests, test beam, …

  • Multiple control/monitoring applications

  • ControlHub arbitrates hardware access
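Arbitration between several applications talking to the same board can be pictured as one queue per target. The sketch below is a deliberately simplified, single-threaded model with hypothetical names (`Request`, `Arbiter`); it is not how the ControlHub is implemented, and it only illustrates that requests for one target are executed one at a time in arrival order while other targets remain independent.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical request record: which client wants to write which value
// to which target.
struct Request {
    std::string client;
    std::string target;
    std::uint32_t value;
};

// Toy arbiter: requests from any number of clients are queued per target
// and executed strictly one at a time, in arrival order.
class Arbiter {
public:
    void submit(const Request& r) { queues_[r.target].push_back(r); }

    // Execute everything queued for one target; returns the clients served,
    // in the order in which they were served.
    std::vector<std::string> drain(const std::string& target) {
        std::vector<std::string> served;
        std::deque<Request>& q = queues_[target];
        while (!q.empty()) {
            lastValue_[target] = q.front().value;  // stands in for the hardware access
            served.push_back(q.front().client);
            q.pop_front();
        }
        return served;
    }

    // Last value written to a target, or 0 if nothing has been executed yet.
    std::uint32_t lastValue(const std::string& target) const {
        auto it = lastValue_.find(target);
        return it == lastValue_.end() ? 0u : it->second;
    }

private:
    std::map<std::string, std::deque<Request>> queues_;
    std::map<std::string, std::uint32_t> lastValue_;
};
```

Draining the queue of one board does not touch requests waiting for another board, which is the independence between targets that the topology above relies on.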

Example system topologies (2)

• Full-scale system, large experiment

Testing with realistic layout

• Extensive program of reliability testing & performance measurements carried out with uTCA hardware for CMS upgrades

• One uTCA crate, 2 rack PCs:

  • 12 AMCs (GLIBs & Mini-T5s)

  • Both Vadatech & NAT MCHs tested

    • MCH contains Ethernet switch

  • Same network components as planned for final system

System reliability

• Tested full chain: uHAL – ControlHub – firmware

  • Including recovery from software-induced packet loss

• Continuous testing over many hours

  • Random sequences of read/write & read-modify-write

  • Continually verifying registers have correct values

  • O(10^10) transactions to various boards (GLIB, MP7, Mini-T5)

  • No errors!

• Software also tested nightly

  • Wide range of unit test executables
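The "random sequences, continually verified" approach can be sketched as a loop that mirrors every operation in a software copy of the expected register contents. Everything here is hypothetical and simplified: `DeviceUnderTest` is a plain array standing in for real hardware reached through the full chain, and `countMismatches` is a name invented for this illustration rather than part of the actual test suite.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical device under test: here just an array of registers. In a
// real test these accesses would go through uHAL, ControlHub and firmware.
struct DeviceUnderTest {
    std::array<std::uint32_t, 16> regs{};
    void write(std::size_t i, std::uint32_t v) { regs[i] = v; }
    std::uint32_t read(std::size_t i) const { return regs[i]; }
    void rmwBits(std::size_t i, std::uint32_t andTerm, std::uint32_t orTerm) {
        regs[i] = (regs[i] & andTerm) | orTerm;
    }
};

// Apply nTransactions random operations and compare every read-back
// against a software copy of the expected register contents.
inline std::size_t countMismatches(DeviceUnderTest& dut, std::size_t nTransactions,
                                   unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pickReg(0, dut.regs.size() - 1);
    std::uniform_int_distribution<int> pickOp(0, 2);
    std::array<std::uint32_t, 16> expected = dut.regs;
    std::size_t mismatches = 0;
    for (std::size_t n = 0; n < nTransactions; ++n) {
        const std::size_t i = pickReg(rng);
        const std::uint32_t a = static_cast<std::uint32_t>(rng());
        const std::uint32_t b = static_cast<std::uint32_t>(rng());
        switch (pickOp(rng)) {
            case 0:  // plain write
                dut.write(i, a);
                expected[i] = a;
                break;
            case 1:  // read-modify-write
                dut.rmwBits(i, a, b);
                expected[i] = (expected[i] & a) | b;
                break;
            default:  // read-only step
                break;
        }
        if (dut.read(i) != expected[i]) ++mismatches;
    }
    return mismatches;
}
```

Using a fixed seed makes a failing sequence reproducible, which helps when chasing rare errors that only appear after many transactions.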

Performance (1)

• Definitions:

  • Latency = Time taken for uHAL client to perform IPbus transaction

  • Throughput = Data transferred / Latency

• 1 client, 1 target:

  • Larger single-word latency wrt VME/PCIe compensated by multiple reads/writes per packet, and multiple packets in flight

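// E.g. for read ...
timer.start();
// Queue a block read of 'size' words from 'addr'
client->readBlock(addr, size, uhal::NON_INCREMENTAL);
// Send the queued transaction and wait for the reply
client->dispatch();
timer.stop();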
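The latency and throughput definitions given above translate directly into a small calculation. The helper names `throughputBytesPerSecond` and `measureLatency` are hypothetical and are not part of uHAL; the sketch assumes 32-bit (4-byte) words and times an arbitrary callable with a monotonic clock, so the callable stands in for the block read plus dispatch shown in the snippet.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Throughput as defined above: data transferred divided by latency.
// Each word is assumed to be 32 bits, i.e. 4 bytes.
inline double throughputBytesPerSecond(std::size_t nWords,
                                       std::chrono::duration<double> latency) {
    assert(latency.count() > 0.0);
    return static_cast<double>(nWords * sizeof(std::uint32_t)) / latency.count();
}

// Time an arbitrary callable with a monotonic clock and return the elapsed
// time in seconds.
template <typename Transaction>
std::chrono::duration<double> measureLatency(Transaction&& transaction) {
    const auto start = std::chrono::steady_clock::now();
    transaction();  // e.g. queue a block read and dispatch it
    return std::chrono::steady_clock::now() - start;
}
```

As a worked example, 250000 words are 1000000 bytes, so a transfer that takes 0.5 s corresponds to 2000000 bytes per second under this definition.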

[Figure: two plots with x-axis "Number of words"; annotations at 10kB, 100kB and 1MB]

Performance (2)

• Polling register in targets

  • Each client continuously polls 1 target

  • 1 to 12 targets; 1, 2 or 4 clients per target …

Performance (3)

• Block writes/reads, multiple targets

  • For reads, could get congestion-induced packet loss at MCH switch

Performance (4)

• Block writes/reads, multiple targets

  • 1 client per target; 600 MB read from / written to crate

  • Default IPbus software setup

[Figure panels: NAT MCH; Vadatech MCH]

Performance (5)

• Block writes/reads, multiple targets

  • 1 client per target; 600 MB read from / written to crate

  • Reducing number of IPbus packets in flight (edit ControlHub config file)

    • Lowers 1 client <–> 1 target throughput by 12%
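Why the number of packets in flight affects single-client throughput can be seen from a simple, idealised model. The function below is a generic window-limited estimate, not a description of how the ControlHub schedules packets and not a reproduction of the measurements; `estimateThroughput` and all of its parameters are hypothetical names, and the model ignores packet loss, processing time and protocol overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Idealised estimate: with 'packetsInFlight' packets of 'payloadBytes' each
// outstanding per round trip, throughput grows with the window until the
// link rate caps it. All parameters are illustrative inputs.
inline double estimateThroughput(std::size_t packetsInFlight, std::size_t payloadBytes,
                                 double roundTripSeconds, double linkBytesPerSecond) {
    assert(roundTripSeconds > 0.0);
    const double windowLimited =
        static_cast<double>(packetsInFlight * payloadBytes) / roundTripSeconds;
    return std::min(windowLimited, linkBytesPerSecond);
}
```

In this model a smaller window lowers the estimate only while the window, rather than the link, is the limiting factor; in exchange, fewer packets arrive at the switch at once, which is relevant when many targets reply simultaneously.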

[Figure panels: NAT MCH; Vadatech MCH]

Lessons learnt

• You can (almost) never have too much testing

  • Attack the problem (system reliability) from as many angles as possible

  • Software unit tests, full chain tests with real hardware, …

  • If possible, test early with planned hardware

• Hardware vendors may have different interpretations of standards

  • Affecting seemingly trivial day-to-day operation tasks

  • E.g. MAC & IP address management in CMS by IPMI

  • Need a robust, failsafe system that works through all scenarios

  • … have found subtly differing behaviours of MCHs from different vendors

Conclusions

• IPbus protocol and suite

  • Being integrated into CMS for LHC Run 2

  • Also in ATLAS & ALICE upgrades; FNAL g-2, SOLID

• Past few years: Significant progress on reliability and performance

  • Solving subtle, rare bugs

  • Improving system scalability

  • Utilising bandwidth of Gigabit Ethernet

  • In (pre-)production environment

• Future plans:

  • Code consolidation; improve debugging of error cases in complex systems

  • IPbus locking mechanism for exclusive access to hardware from single client

  • Update IPbus suite for 10 Gigabit Ethernet

More information

• IPbus protocol and suite

  • Extensively-tested, tightly-integrated suite with Gigabit performance

  • Easily scalable control system

  • Applicable to any hardware with Ethernet interface

• For more information …

  • Main page: http://cactus.web.cern.ch/

  • Firmware: https://svnweb.cern.ch/trac/cactus/wiki/IPbusFirmware

  • Software (uHAL + ControlHub): Quick start tutorial (Easy installation on SL(C) 5/6) https://svnweb.cern.ch/trac/cactus/wiki/uhalQuickTutorial

  • Bug reports, feature requests, clarifications: https://svnweb.cern.ch/trac/cactus/newticket

Backup: IPbus firmware design

• Bus topology
