Post on 27-Jun-2015
Running Monitoring Applications on Accelerated Capture Engines
Nicola Bonelli
N. Bonelli, R.G. Garroppo, L. Gazzarrini, S. Giordano, G. Procissi, F. Russo, G. Volpi
Agenda
• Capture engines overview
• What’s new in PFQ (2.0)
• Accelerated pcap library
  – PF_RING, PF_RING+DNA, NETMAP, PFQ
• Pcap-perf: a tool for benchmarking pcap apps
• Experimental results
Speed matters…
Accelerated Capture Engine
• Linux is provided with a default capture engine
  – the PF_PACKET socket
• For the sake of speed, other capture engines have emerged:
  – 2004: PF_RING
    • designed for a single core, better performance than the PF_PACKET of the time
  – 2011: PFQ
    • first to address multi-core architectures and multi-queue NICs (Best Paper Award @PAM2012)
  – 2012: PF_RING-DNA
    • accelerated drivers (Intel)
  – 2012: NetMap
    • accelerated drivers (Intel, Broadcom) (Best Paper Award @Usenix ATC’12)
… but what happens on these tracks?
What’s new in PFQ 2.0
• From capture engine to monitoring framework…
• Improved performance
  – ~14.8 Mpps with a single user-space thread
• Improved features:
  – compliant with a plethora of NICs: pfq-omatic
  – monitoring groups and classes
  – in-kernel extensible engine for packet steering: dispatching, copying, cloning, filtering
  – native bindings: C, C++11, Haskell (more to come)
  – per-group filtering: BPF, vlan (un-tagging)
  – pcap library
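To put the ~14.8 Mpps figure in context, the arithmetic below is a minimal sketch of the theoretical packet rate of an Ethernet link. It assumes the usual 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap); the function names are illustrative and do not come from PFQ.

```cpp
#include <cassert>

// Per-frame overhead on the wire that is not part of the frame itself:
// 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
constexpr double kWireOverheadBytes = 20.0;

// Maximum packets per second for a given link speed (bits/s) and frame size
// (bytes, including the 4-byte FCS; 64 is the minimum Ethernet frame).
inline double line_rate_pps(double link_bps, double frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return link_bps / ((frame_bytes + kWireOverheadBytes) * 8.0);
}

// Time available to process one packet at that rate, in nanoseconds.
inline double ns_per_packet(double link_bps, double frame_bytes)
{
    return 1e9 / line_rate_pps(link_bps, frame_bytes);
}
```

For 64-byte frames on a 10 Gb/s link this gives about 14.88 Mpps, i.e. roughly 67 ns per packet, or about 180 clock cycles on a 2.67 GHz core. With 128-byte frames the ceiling drops to about 8.45 Mpps.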
Feature comparison

Feature         | PF_PACKET  | PF_RING 5.x                    | PF_RING-DNA     | NETMAP - 0813          | PFQ 2.0
NIC             | *          | *, PF-AWARE (Intel, Broadcom)  | only Intel 1/10G | Intel 1/10G, forcedeth | * accelerated
Driver compat.  | *          | yes, non accel.                | no              | no                     | yes, dynamic
multi-core      | -          | Hardware (RSS)                 | Hardware (RSS)  | Hardware (RSS)         | Hw RSS + soft
multi-queue     | yes (poor) | yes                            | yes             | yes                    | yes
native binding  | C          | C                              | C               | C                      | C, C++11, Haskell, Java, Python
groups          | -          | -                              | -               | -                      | yes
class           | -          | -                              | -               | -                      | yes
concurrent mon. | yes        | yes                            | commercial ?    | -                      | yes
clustering      | -          | yes                            | -               | -                      | yes (MT, group)
steering        | -          | -                              | commercial      | -                      | yes (MT, group)
STM state       | -          | -                              | -               | -                      | work in progress
Feature comparison (continued)

Feature          | PF_PACKET | PF_RING 5.x | PF_RING-DNA      | NETMAP - 0813    | PFQ 2.0
Pcap library     | yes       | yes         | yes              | buggy/incomplete | yes
BPF (filters)    | yes (MT)  | yes (MT)    | yes (user-space) | -                | yes (MT, group)
vlan filters     | -         | yes         | yes (hw Intel)   | -                | yes (MT, group)
vlan untagging   | -         | -           | -                | -                | yes (MT, soft.)
Intel hw filters | -         | yes         | yes              | -                | No
bloom filters    | -         | -           | -                | -                | work in progress
Accelerated PCAP library
• The pcap library is the de facto standard interface for packet capture
• Accelerated capture engines provide their own pcap library:
  – Both PF_RING and PF_RING-DNA provide a complete accelerated version
  – NetMap provides experimental and incomplete pcap support
    • BPF is missing
• PFQ provides a complete implementation
  – PFQ C-API mapped over the pcap interface wherever possible, implemented as environment variables otherwise
  – Clustering is enabled by specifying multiple NICs in a colon-separated fashion, steering by means of the PFQ_STEER variable

PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
PFQ_GROUP=10 PFQ_STEER=ipv4-addr tcpdump -n -i eth2:eth3
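As an illustration of what steering by IPv4 address could look like, the sketch below maps a packet to one of several consumers in a group using only its source and destination addresses. This is not PFQ's in-kernel code: the function name `steer_ipv4_addr`, the symmetric XOR of the two addresses and the choice of bit mixer are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the PFQ API: choose which of
// `n_consumers` sockets in a group receives a packet, looking only at
// its IPv4 source and destination addresses.
inline std::size_t steer_ipv4_addr(std::uint32_t src, std::uint32_t dst,
                                   std::size_t n_consumers)
{
    assert(n_consumers > 0);

    // XOR is commutative, so (src, dst) and (dst, src) produce the same
    // value: both directions of a conversation reach the same consumer.
    std::uint32_t h = src ^ dst;

    // Spread the bits before the modulo (32-bit finalizer of MurmurHash3).
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;

    return h % n_consumers;
}
```

With this kind of function, every packet exchanged between the same pair of hosts lands on the same consumer, so per-host or per-flow state does not need to be shared between the processes in the group.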
Pcap-perf
• Pcap-perf is a C++11 application designed for benchmarking capture engines through pcap interfaces
• Support for multiple threads, BPF filters and plug-ins:

plug-in                 | kind
Null                    | packet counter
IP checksum             | light CPU computation
MD5                     | CPU computation
SHA256                  | heavy CPU computation
Bloom Filter            | memory (linear)
Protocol Classification | memory tree
TCP/UDP flow counter    | memory (std::unordered_set)
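To make the difference between a CPU-bound and a memory-bound plug-in concrete, here is a minimal sketch of two of the workloads listed above: an IP header checksum (RFC 1071) and a TCP/UDP flow counter backed by std::unordered_set. It is not the pcap-perf source; the names `ip_checksum`, `FlowKey` and `FlowCounter` are hypothetical, and the buffer is assumed to start at the IPv4 header, with the link-layer header already skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// RFC 1071 checksum over an IPv4 header. Returns 0 when the header,
// including its checksum field, is intact.
inline std::uint16_t ip_checksum(const std::uint8_t* hdr, std::size_t len)
{
    assert(len % 2 == 0);  // IPv4 headers are a multiple of 4 bytes
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; i += 2)
        sum += (std::uint32_t(hdr[i]) << 8) | hdr[i + 1];
    while (sum >> 16)                       // fold the carries back in
        sum = (sum & 0xffffu) + (sum >> 16);
    return static_cast<std::uint16_t>(~sum);
}

// 5-tuple identifying a TCP or UDP flow.
struct FlowKey {
    std::uint32_t src, dst;
    std::uint16_t sport, dport;
    std::uint8_t  proto;
    bool operator==(const FlowKey& o) const {
        return src == o.src && dst == o.dst && sport == o.sport &&
               dport == o.dport && proto == o.proto;
    }
};

struct FlowKeyHash {
    std::size_t operator()(const FlowKey& k) const {
        std::uint64_t h = (std::uint64_t(k.src) << 32) | k.dst;
        h ^= (std::uint64_t(k.sport) << 24) | (std::uint64_t(k.dport) << 8) | k.proto;
        return std::hash<std::uint64_t>{}(h);
    }
};

inline std::uint16_t be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
inline std::uint32_t be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | p[3];
}

// Counts distinct TCP/UDP flows; every packet costs one hash-set lookup.
class FlowCounter {
public:
    // `ip` points at the first byte of the IPv4 header.
    void process(const std::uint8_t* ip, std::size_t len) {
        if (len < 20 || (ip[0] >> 4) != 4) return;       // not IPv4
        const std::size_t ihl = (ip[0] & 0x0f) * 4u;      // header length
        const std::uint8_t proto = ip[9];
        if ((proto != 6 && proto != 17) || ihl < 20 || len < ihl + 4) return;
        flows_.insert(FlowKey{be32(ip + 12), be32(ip + 16),
                              be16(ip + ihl), be16(ip + ihl + 2), proto});
    }
    std::size_t flows() const { return flows_.size(); }
private:
    std::unordered_set<FlowKey, FlowKeyHash> flows_;
};
```

The checksum touches only the bytes of the packet itself, while the flow counter performs a lookup in a table that grows with the number of flows, so its cost is dominated by memory accesses rather than arithmetic.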
Test-bed and measurements
• Intel Xeon X5650 (6 cores) @ 2.67 GHz, 16 GB RAM + Intel 82599 10G (Debian Wheezy)
• Accelerated drivers
  – PF_RING: ixgbe 3.11.33 PF_RING-aware
  – PF_RING-DNA: ixgbe 3.10.16-DNA driver
  – Netmap: ixgbe driver shipped with the netmap package
  – PFQ: Intel ixgbe 3.11.33 vanilla, recompiled through pfq-omatic
• Best interrupt affinity (MSI-X)
  – 4 or 5 kernel threads (NAPI) bound to fixed cores (RSS), 1 or 2 user-space threads bound to other core(s)
• Traffic is generated with randomized IP addresses, 64/128-byte UDP packets
  – using both PF_DIRECT and PF_RING-DNA
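Binding a user-space capture thread to a core of its own, as in the set-up above, can be sketched on Linux with sched_setaffinity. This covers only the user-space side (the binding of the NAPI kernel threads is configured separately through interrupt affinity), and the helper names `pin_current_thread` and `first_allowed_core` are made up for the example.

```cpp
#include <cassert>
#include <sched.h>

// Pin the calling thread to a single core (Linux-specific).
// Returns true on success.
inline bool pin_current_thread(int core)
{
    assert(core >= 0 && core < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// First core the calling thread is currently allowed to run on,
// or -1 if the affinity mask cannot be read.
inline int first_allowed_core()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}
```

A capture thread would call `pin_current_thread` once at start-up, with a core that is not among those serving the NIC interrupts, before entering its packet loop.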
[Figure: test-bed diagram; labels: 10 Gb link, mascara, monsters]
Counting packets is useless
(native speed)
    uint64_t counter = 0;
    for(;;)
    {
        counter++;
    }
1 thread user-space (Intel 10G)
pcap library
Pcap library, 1 thread counter
Pcap, 1 thread counter, BPF=udp
Pcap, 1 thread counter, BPF=http || udp
pcap-perf
pcap-perf with BPF = udp
pcap-perf (2 threads)
tcpdump
tcpdump -s 64 -i dev -w /ramdisk/dump.pcap (300M@14.8Mpps)
tcpdump -s 138 -i dev -w /ramdisk/dump.pcap (100M@~8Mpps)
tcpdump -i dev -w /ramdisk/dump.pcap vlan (5 Gbps)
tcpdump -i dev -w /ramdisk/dump.pcap ip host 192.168.0.10 (voip call)
Thanks for the attention!
nicola.bonelli@cnit.it