Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © Freescale Semiconductor, Inc. 2009.
Migrating Unicore Network Packet Processing Applications to Multicore: Challenges and Techniques (1.0)
Wilson Lo, Architect, Network Software Division, NMG
August 2009
Outline
► Characteristics of Unicore Network Packet Processing Applications
  • Potential areas to be considered for multicore migration
► Multicore Migration Strategies (SoC with homogeneous cores)
  • SMP/AMP/Hybrid/Partitioning
  • Advantages/Disadvantages
► Deep Dive into SMP Strategies for Multicore-based Real Applications
  • Stateless packet processing applications, e.g. routers/bridges
  • Stateful packet processing applications, e.g. stateful inspection firewall
  • Proxy-based applications, e.g. POP3 and IMAP proxies with anti-virus & anti-spam, web proxy with URL filtering
► QorIQ™ P4080 features for SMP/AMP/Hybrid
► Examples for Migration Models
  • Example for SMP model
  • Example for hybrid model
Unicore Network Packet Processing Applications - Characteristics
Stateless Packet Processing Applications - Characteristics
Examples: switches, routers, etc.
Characteristics:
► No special state maintained across packets
► Lookup tables, e.g. Routing DB/Forwarding DB
  - Looked up on a per-packet basis (frequent reads)
  - DB updates are infrequent (infrequent writes)
  - Potentially a large number of routes/FDB entries
► Statistics
  - Updated on a per-packet basis (frequent writes)
  - Read by management applications (infrequent reads)
► Packet ordering
  - Typically packets are not re-ordered, with exceptions such as QoS
Stateful Packet Processing Applications - Characteristics
Examples: stateful inspection firewall, IPS, IPSec-VPN, etc.
Characteristics:
► Session data structure to track state across packets belonging to a connection
  - Session lookup and update on a per-packet basis (frequent reads and frequent writes)
► Typically millions of sessions maintained in a session table
► Sessions fairly independent of each other, with a few exceptions
► Simultaneous access of a session across multiple processing contexts is infrequent
► Deletion based on inactivity/lifetime and/or special packets
► Packet ordering
  - Packet re-ordering across sessions is not expected
  - Packets should not be re-ordered within a flow
► Offload accelerators: pattern matching engine, security engine
► Configuration data
  - Frequent access upon session creation (frequent reads)
  - Infrequent updates by management agents
► Statistics
  - Updates within a session: frequent, on a per-packet basis
  - Updates across sessions: relatively infrequent, but not necessarily small
  - Infrequent reads by management applications
Proxy Applications - Characteristics
Examples: SMTP, IMAP and POP3 proxies with anti-virus/anti-spam, HTTP proxy with URL filtering, etc.
Characteristics:
► Socket-based applications, typically multithreaded
  - Master and worker threads or processes
  - Each thread/process handles multiple sessions
► Sessions fairly independent of each other, with some exceptions
► Dedicated computational chores offloaded to other processes
Application Migration Strategy for Multicore
Migration Strategies – Asymmetric Multiprocessing (AMP)
► Single Image Multifunction AMP
  • Map different functions of the application to different cores
  • Packets visit multiple cores for processing by different functions
  • Queues and inherent pipelining
  • Example: Firewall/IPS/IPSec-VPN on three different cores
Figure: Single Image AMP. Image 1 spans Core 1 (Function 1), Core 2 (Function 2) and Core 3 (Function 3).
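The single image AMP pipeline can be sketched in Python, with one thread standing in for each core and a queue between consecutive functions. This is a minimal illustration, not the presenter's implementation: the handlers `firewall`, `ips` and `ipsec` are hypothetical placeholders that only tag the packet, and Python threads model the hand-off structure rather than real per-core parallelism.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down


def stage(func, inbox, outbox):
    """One AMP 'core': apply a single function, then pass the packet downstream."""
    while True:
        pkt = inbox.get()
        if pkt is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(pkt))


# Hypothetical handlers standing in for firewall, IPS and IPSec-VPN processing.
def firewall(pkt):
    return pkt + ["fw"]


def ips(pkt):
    return pkt + ["ips"]


def ipsec(pkt):
    return pkt + ["ipsec"]


def run_pipeline(packets, funcs=(firewall, ips, ipsec)):
    # queues[i] feeds stage i; the last queue collects fully processed packets.
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    cores = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
             for i, f in enumerate(funcs)]
    for core in cores:
        core.start()
    for pkt in packets:
        queues[0].put(pkt)
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for core in cores:
        core.join()
    return results
```

Each stage here is a single thread reading a FIFO queue, so this linear pipeline keeps packet order. The out-of-order risk noted for AMP appears when some packets skip stages or take different paths through the cores.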
Migration Strategies – AMP (contd.)
► Multiple independent applications
  • Independent application images or functions are mapped to different cores
  • Packets are distributed such that a given packet needs to visit only one specific core for application processing
  • Example: based on a VLAN ID to customer mapping, a specific core could be dedicated to a particular customer
► Multi Image AMP
  • Simple variation of single image AMP
  • Different images run on different cores, each offering a specific function
  • Packets visit the images through queues for complete processing
  • Use case: integration with third-party images
Figures (Fig-1, Fig-2):
  Multi Image AMP: Image 1 on Core 1 (Function 1), Image 2 on Core 2 (Function 2), Image 3 on Core 3 (Function 3).
  Multiple Independent Apps: Image 1 on Core 1, Image 2 on Core 2, Image 3 on Core 3, each running Function 1.
Migration Strategies – Symmetric Multiprocessing (SMP)
► All cores perform all functions; no function-to-core mapping is needed
► Packets typically visit only one core for all processing
  • Packet processing runs to completion
► Example use case
  • Security appliance that includes firewall, IPSec-VPN and IPS functions running on the Linux kernel or a bare metal OS
Figure: Single Image SMP. Image 1 spans Cores 1 to 3; each core runs Function 1, Function 2 and Function 3.
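For contrast with the AMP pipeline, a run-to-completion SMP worker can be sketched in Python. This is an illustrative assumption, not the presenter's code. The three functions are hypothetical placeholders. Every worker thread runs the whole chain on each packet it dequeues, and no packet is handed to another core between functions.

```python
import queue
import threading


# Hypothetical stand-ins for firewall, IPS and IPSec-VPN processing.
def firewall(pkt):
    return pkt + ("fw",)


def ips(pkt):
    return pkt + ("ips",)


def ipsec(pkt):
    return pkt + ("ipsec",)


FUNCTIONS = (firewall, ips, ipsec)


def smp_worker(core_id, rx, done, lock):
    """Each 'core' pulls a packet and runs every function on it to completion."""
    while True:
        pkt = rx.get()
        if pkt is None:        # sentinel: no more work for this core
            return
        for fn in FUNCTIONS:   # no hand-off to another core between functions
            pkt = fn(pkt)
        with lock:
            done.append((core_id, pkt))


def run_smp(packets, num_cores=3):
    rx, done, lock = queue.Queue(), [], threading.Lock()
    cores = [threading.Thread(target=smp_worker, args=(c, rx, done, lock))
             for c in range(num_cores)]
    for core in cores:
        core.start()
    for pkt in packets:
        rx.put(pkt)
    for _ in cores:
        rx.put(None)           # one sentinel per core, queued after all packets
    for core in cores:
        core.join()
    return done
```

Packets can complete in any order across workers. SMP designs therefore rely on flow-aware distribution or order restoration, such as ORP on P4080.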
Migration Strategies – Hybrid SMP and AMP
► SMP and AMP in a single image
  • Some functions run in SMP mode on some cores
  • Some functions run in AMP mode on other cores
  • Queues between the functions
  • Example: some cores run Firewall/VPN/IPS in SMP mode, one core runs SSL-VPN and another core runs proxy applications
Figure: Single Image (Image 1). Cores 1 and 2 run Functions 1, 2 and 3 in SMP mode; Core 3 runs Function 4 and Core 4 runs Function 5 in AMP mode.
Migration Strategies – Hybrid SMP and AMP (contd.)
► SMP and AMP in separate images (partitioning)
  • Some cores run an SMP image
  • Some cores run an AMP image
  • Queues between the images
  • E.g. third-party or legacy application integration
► Multiple SMP images used in an AMP model (partitioning)
  • Some cores run one SMP image
  • Some cores run another SMP image
  • Queues for communication between the images
  • E.g. control plane/data plane model
Figures (Fig-1, Fig-2):
  SMP and AMP Image: Image 1 (SMP) on Cores 1 and 2, each running Functions 1, 2 and 3; Image 2 (AMP) on Core 3 running Function 4; Image 3 (AMP) on Core 4 running Function 5.
  SMP and SMP Image: Image 1 (SMP) on two cores, each running Functions 1, 2 and 3; Image 2 (SMP) on two cores, each running Functions 4 and 5.
Comparison – Migration Strategies
AMP (single image or multiple images)
Advantages:
► Minimal, easier code changes; fast development
► Efficient use of core-specific instruction caches (IC) for code
Disadvantages:
► Pipelining issues
  • Latency may increase
  • Out-of-order delivery: each core runs different functions, and because some packets may have to traverse additional cores for required processing while others do not, there is a high chance of out-of-order packet delivery
► Uneven core utilization
  • Traffic behavior is difficult to predict
  • Some functions require more cycles than others
► Scaling issues when there is less functionality than there are cores to map it to
SMP
Advantages:
► Better utilization of processing power: any core can do any work, and cores are not reserved for particular functionality
► Improved latency: no pipelining issues
Disadvantages:
► Code changes may be required: single-threaded user space applications, Linux kernel space applications and bare metal applications do require code changes
► Possible performance impact [*]
  • Cache thrashing if the cache is dedicated to a core
  • Contention for protected shared data
Memory:
► In AMP, the memory required for each function has to be reserved and assigned at design time
► In SMP, memory is accessed by all cores and does not require any reservation
[*] The SMP techniques discussed later in this session help reduce the performance degradation due to SMP.
Implementation of SMP Strategies for Multicore-based Real Applications
Stateless Application - Migration into SMP: Mapping Techniques
► Characteristic: no special state maintained across packets
  • Mapping technique: no special considerations
► Characteristic: lookup tables (potentially a large number of routes/FDB entries; looked up on a per-packet basis, i.e. frequent reads; DB updates infrequent, i.e. infrequent writes)
  • Mapping technique: read/write lock for the routing table and forwarding table
    - Packet forwarding with a read lock
    - Routing table updates with a write lock
► Characteristic: statistics (updated on a per-packet basis, i.e. frequent writes; read by management applications, i.e. infrequent reads)
  • Mapping technique: per-core statistics + consolidation, or decorated storage (P4080)
► Characteristic: packet ordering (typically packets are not re-ordered, with exceptions such as QoS)
  • Mapping technique: special hardware features such as ORP in P4080
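The read/write lock mapping for the routing table can be sketched as follows. Python's standard library has no readers-writer lock, so `RWLock` is a hypothetical minimal one built on `threading.Condition`. `RoutingTable` and its longest-prefix-match `lookup` are likewise illustrative names. In the Linux kernel the corresponding primitive would be `rwlock_t`, which this sketch only imitates.

```python
import ipaddress
import threading


class RWLock:
    """Minimal readers-writer lock: many concurrent readers or one writer.
    No writer preference, so a constant stream of readers can delay updates."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class RoutingTable:
    def __init__(self):
        self._lock = RWLock()
        self._routes = {}  # ip_network -> next hop

    def update(self, prefix, next_hop):
        """Infrequent control-plane write: takes the lock exclusively."""
        self._lock.acquire_write()
        try:
            self._routes[ipaddress.ip_network(prefix)] = next_hop
        finally:
            self._lock.release_write()

    def lookup(self, dst):
        """Per-packet read: many cores may hold the read lock at the same time."""
        addr = ipaddress.ip_address(dst)
        self._lock.acquire_read()
        try:
            matches = [net for net in self._routes if addr in net]
            if not matches:
                return None
            best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
            return self._routes[best]
        finally:
            self._lock.release_read()
```

Forwarding holds only the shared read lock, so route updates are the only point of serialization. This matches the "frequent reads, infrequent writes" profile of the table.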
Stateless Application - Migration into SMP: Other Details
Performance degradation due to SMP: how can it be mitigated?
► Most multicore processors mitigate cache thrashing by having a shared L2/L3 cache.
► Contention for protected shared resources is much lower because updates are infrequent.
Expectations from multicore hardware for packet distribution
► Packet distribution
  - Interface based
  - VLAN header based
  - 3-tuple based
  - Round-robin scheduling of interrupts to cores
► Priority-based scheduling
  - When a core asks for work, the hardware returns the next packet based on QoS priority.
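The distribution policies above can be sketched in software as plain functions. On the P4080 this work is done in hardware (FMAN). The `zlib.crc32` hash, the fields assumed for the 3-tuple (source IP, destination IP, protocol) and all function names here are illustrative assumptions. They do not describe how the hardware computes its hash.

```python
import itertools
import zlib


def tuple_hash_core(src_ip, dst_ip, proto, num_cores, src_port=None, dst_port=None):
    """Pick a core from the 3-tuple or, when ports are given, the 5-tuple.
    All packets of one flow hash to the same core (flow affinity)."""
    key = f"{src_ip}|{dst_ip}|{proto}"
    if src_port is not None and dst_port is not None:
        key += f"|{src_port}|{dst_port}"
    return zlib.crc32(key.encode()) % num_cores


def vlan_core(vlan_id, vlan_to_core, default_core=0):
    """VLAN-based distribution, e.g. one core dedicated to each customer VLAN."""
    return vlan_to_core.get(vlan_id, default_core)


def round_robin(num_cores):
    """Round-robin assignment: spreads load evenly but ignores flow affinity."""
    return itertools.cycle(range(num_cores))
```

Tuple hashing keeps a flow on one core, which later helps stateful processing. Round robin balances load better but lets packets of one flow reach different cores.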
Stateful Application - Migration into SMP: Mapping Techniques
► Characteristic: session to track state across packets belonging to a connection (frequent reads (lookups) and writes (updates) on a per-packet basis)
  • Mapping technique: session parallelization technique; map multiple sessions across multiple cores such that one session is processed by only one core at any point in time
► Characteristic: sessions maintained in a session table
  • Mapping technique: read/write lock; write mode for session creation and deletion, read mode during packet processing
► Characteristic: sessions fairly independent of each other, with a few exceptions
  • Mapping technique: use locks
► Characteristic: simultaneous access of a session across multiple processing contexts
  • Mapping technique: use the reference counting technique
► Characteristic: deletion based on inactivity/lifetime and/or special packets
  • Mapping technique: multi-step delete handling, i.e. mark first, delete later
► Characteristic: packet ordering (packet re-ordering is not expected; packets should not be re-ordered within a flow, because processing of a new packet depends on the state left by the previous packet of the session)
  • Mapping technique: special hardware features such as ORP and FMAN capabilities in P4080; backlog queue in the session parallelization technique
► Characteristic: configuration data (frequent reads upon session creation, infrequent writes upon update)
  • Mapping technique: read/write lock; read mode for packet processing, write mode during update
► Characteristic: statistics (updates within a session: frequent, on a per-packet basis; updates across sessions: relatively infrequent, but not necessarily small; infrequent reads by management applications)
  • Mapping technique: within a session, implicitly handled by the session parallelization technique; across sessions, decorated storage (P4080) or per-core statistics
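The per-core statistics technique can be sketched as one counter slot per core plus a consolidation step for management reads. `PerCoreCounters` and the `demo` driver (with its hypothetical 64-byte packets) are illustrative names. In C, each slot would usually also be aligned to its own cache line to avoid false sharing, which Python cannot express.

```python
import threading


class PerCoreCounters:
    """Per-core statistics: each core increments only its own slot, so the
    per-packet fast path needs no lock; management reads sum all slots."""

    def __init__(self, num_cores, names):
        self._slots = [dict.fromkeys(names, 0) for _ in range(num_cores)]

    def inc(self, core_id, name, amount=1):
        # Fast path: only the owning core ever writes this slot.
        self._slots[core_id][name] += amount

    def consolidate(self):
        # Slow path: sum the per-core copies. While cores are still counting,
        # the total may be slightly stale, which is usually fine for statistics.
        totals = {}
        for slot in self._slots:
            for name, value in slot.items():
                totals[name] = totals.get(name, 0) + value
        return totals


def demo(num_cores=4, packets_per_core=1000):
    """Hypothetical driver: each 'core' counts its own packets and bytes."""
    stats = PerCoreCounters(num_cores, ("rx_packets", "rx_bytes"))

    def core(core_id):
        for _ in range(packets_per_core):
            stats.inc(core_id, "rx_packets")
            stats.inc(core_id, "rx_bytes", 64)

    threads = [threading.Thread(target=core, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats.consolidate()
```

The design trades a little memory, one copy of each counter per core, for a lock-free per-packet update. This is the "marginal increase in memory" listed later under memory implications.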
Stateful Application - Migration into SMP: Session Parallelization Technique
Why it is needed:
► Removes the unnecessary core spinning that would occur if the same session were processed simultaneously by multiple cores
Description of the technique:
► One session is processed by only one core at any point in time
► While core x processes session a, core y can process session b, and so on
► Multiple sessions are mapped to multiple cores
► Variations of the technique
  • Static session-to-core pinning: hardware packet distribution ensures that all packets of a given session are processed by the same core
  • No static session-to-core affinity: packets of a given session can be processed by different cores at different times
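Stateful Application - Migration into SMP: Session Parallelization Technique
Figure: Session parallelization timeline (t0 to t3) for three cores (Core 1, Core 2, Core 3) receiving packets of the same session.
► The core that receives the first packet performs the session lookup, finds session 'IN USE' == NO, sets 'IN USE' and runs the session functions (Session Function 1, Session Function 2).
► Meanwhile, each other core that receives a packet of the same session performs the session lookup, finds 'IN USE' == YES, queues the packet in the backlog queue, then exits to its main loop to process the next packet.
► When it finishes a packet, the owning core checks the backlog queue: while the backlog queue is not empty, it dequeues and processes the queued packets; once the backlog queue is empty, it exits to the main loop and processes the next packet.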
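The 'IN USE' flag and backlog queue can be sketched as follows, assuming a small per-session lock that guards only the flag and the queue. `Session`, `process_packet` and the `demo` driver are hypothetical names. The key detail is that the owning core clears 'IN USE' in the same critical section in which it sees an empty backlog, so a packet queued by another core at that moment cannot be stranded.

```python
import collections
import threading


class Session:
    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()       # guards in_use and backlog only
        self.in_use = False
        self.backlog = collections.deque()
        self.state = []                    # session state, touched by one core at a time


def process_packet(session, pkt, session_functions):
    """Called by whichever core received pkt, after the session lookup.
    Returns True if this core processed the packet, False if it was backlogged."""
    with session.lock:
        if session.in_use:
            session.backlog.append(pkt)    # another core owns it: queue and go back
            return False                   # to the main loop
        session.in_use = True              # this core now owns the session
    while True:
        for fn in session_functions:       # runs without holding session.lock
            fn(session, pkt)
        with session.lock:
            if not session.backlog:
                session.in_use = False     # cleared together with the empty check
                return True
            pkt = session.backlog.popleft()


def demo(num_packets=50, num_cores=4):
    """Hypothetical driver: several 'cores' receive packets of the same session."""
    session = Session("flow-1")
    guard = threading.Lock()
    active = 0
    overlap = False

    def record(sess, pkt):
        # Session function that also detects two cores inside it at once.
        nonlocal active, overlap
        with guard:
            active += 1
            overlap = overlap or active > 1
        sess.state.append(pkt)
        with guard:
            active -= 1

    def core(pkts):
        for pkt in pkts:
            process_packet(session, pkt, (record,))

    cores = [threading.Thread(target=core, args=(range(c, num_packets, num_cores),))
             for c in range(num_cores)]
    for t in cores:
        t.start()
    for t in cores:
        t.join()
    return session, overlap
```

A core that finds the session busy never spins. It parks the packet and returns to its main loop, which is the spinning the technique is meant to remove.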
Stateful Application - Migration into SMP: Other Techniques
► Reference counting
  • Why it is needed: operations on a session such as deletion or update may happen simultaneously from different cores.
  • Description: increment the reference count while the session is being accessed, and decrement it when the session is no longer needed. While a reference is held, the session cannot be freed.
► Deletion handling
  • Why it is needed: because multiple references may be held, immediate deletion is not possible.
  • Description: multi-step deletion, i.e. mark for deletion, then clean up later.
► Critical section
  • Why it is needed: avoid race conditions.
  • Description: use locks; improve performance by using fine-grained locks.
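Reference counting and mark-then-delete can be combined in one small session table. This is a sketch under simplifying assumptions. `SessionTable` and its methods are hypothetical names, and a single table mutex is used for brevity. A real data path would use atomic reference counts (the P4080 feature list maps atomic variables to references) and finer-grained locks.

```python
import threading


class SessionTable:
    """Session table with reference counting and two-step deletion."""

    def __init__(self):
        self._lock = threading.Lock()  # one table lock for brevity
        self._sessions = {}            # key -> {"refs": int, "deleted": bool}

    def create(self, key):
        with self._lock:
            self._sessions[key] = {"refs": 0, "deleted": False}

    def get(self, key):
        """Take a reference; returns None if absent or already marked for deletion."""
        with self._lock:
            entry = self._sessions.get(key)
            if entry is None or entry["deleted"]:
                return None
            entry["refs"] += 1
            return entry

    def put(self, key):
        """Drop a reference; the last holder of a marked session frees it."""
        with self._lock:
            entry = self._sessions[key]
            entry["refs"] -= 1
            if entry["deleted"] and entry["refs"] == 0:
                del self._sessions[key]

    def delete(self, key):
        """Step 1: mark for deletion. Step 2 (the free) happens now only if
        no reference is held; otherwise the final put() performs it."""
        with self._lock:
            entry = self._sessions.get(key)
            if entry is None:
                return
            entry["deleted"] = True
            if entry["refs"] == 0:
                del self._sessions[key]

    def __contains__(self, key):
        with self._lock:
            return key in self._sessions
```

Once a session is marked, new lookups fail immediately. Cores that already hold a reference can still finish safely, and the memory is reclaimed only when the last one calls `put`.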
Stateful Application - Migration into SMP: Other Details
Expectations from multicore hardware for packet distribution
► Packet distribution
  - Interface based
  - VLAN header based
  - 3-tuple or 5-tuple based
  - Round-robin scheduling of interrupts to cores
► Priority-based scheduling
  - When a core asks for work, the hardware returns the next packet based on QoS priority.
Performance degradation due to SMP: how can it be mitigated?
► Most multicore processors mitigate the impact of cache thrashing by having a shared L2/L3 cache.
► The session parallelization technique greatly reduces resource contention.
Proxy Applications- Migration into SMP
► Characteristic: socket-based applications, typically multithreaded (master and worker threads or processes; each thread/process handles multiple sessions)
  • Mapping: assign threads/processes to cores
► Characteristic: sessions fairly independent of each other, with some exceptions
  • Mapping: use mutexes/spin locks to protect critical sections/resources
► Characteristic: dedicated computational chores offloaded to other processes
  • Mapping: assign threads/processes to cores
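The master/worker structure of a proxy can be sketched as follows. The master keeps each session on one worker, and a mutex guards the one shared resource. `ProxyWorkers`, its methods and the per-session byte counter are hypothetical. Actual core assignment is not shown; on Linux it would be done with `sched_setaffinity` or `pthread_setaffinity_np`. Python threads here only model the structure.

```python
import queue
import threading
import zlib


class ProxyWorkers:
    """Master/worker layout: each session is dispatched to one worker thread,
    so a worker owns many sessions; shared state is mutex-protected."""

    def __init__(self, num_workers):
        self.inboxes = [queue.Queue() for _ in range(num_workers)]
        self.lock = threading.Lock()   # protects bytes_per_session
        self.bytes_per_session = {}
        self.workers = [threading.Thread(target=self._worker, args=(q,))
                        for q in self.inboxes]
        for w in self.workers:
            w.start()

    def dispatch(self, session_id, payload):
        """Master: hash the session to a worker so its data stays on one thread."""
        idx = zlib.crc32(session_id.encode()) % len(self.inboxes)
        self.inboxes[idx].put((session_id, payload))

    def _worker(self, inbox):
        while (item := inbox.get()) is not None:
            session_id, payload = item
            # Per-session proxy work (e.g. content scanning) would happen here.
            with self.lock:            # critical section on a shared resource
                self.bytes_per_session[session_id] = (
                    self.bytes_per_session.get(session_id, 0) + len(payload))

    def shutdown(self):
        for q in self.inboxes:
            q.put(None)                # one sentinel per worker
        for w in self.workers:
            w.join()
        return self.bytes_per_session
```

Only the cross-session structure needs the mutex. Per-session data stays on a single worker, which reflects the "sessions fairly independent" characteristic.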
Migration into SMP – Memory and Performance Implications
► Memory
  • Marginal increase in memory requirement:
    - To support locks
    - Per-core statistics variables (duplicate variables across cores)
    - Queues and variables required for the session parallelization technique
► Performance
  • If there is no resource contention, there is no performance degradation.
  • Quantification:
    - In a pure firewall application, performance increases almost linearly with more cores.
    - If the firewall has features such as rate limiting across sessions (which causes some resource contention), there could be around 5 to 10% performance degradation.
QorIQ™ P4080 Processor
QorIQ™ P4080 Processor
Feature: what it enables for multicore applications
► FMAN: packet distribution
► QMAN: inter-function/inter-image queuing in the AMP, SMP or hybrid model
► ODP/ORP: packet order preservation and restoration
► Decorated storage: statistics
► Shared L3 cache: mitigates cache thrashing
► Hypervisor support: partitioning in the AMP/SMP/hybrid model
► Atomic variables: references, critical resources
Lock routines required in SMP are typically available in Linux® or a bare metal OS.
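Order restoration, which ORP provides in P4080 hardware, can be approximated in software by giving each packet a sequence number at ingress and releasing completed packets strictly in sequence. This is a software analogue only, not the P4080 ORP interface. `OrderRestorer` and `complete` are hypothetical names, and restoring order per flow would need one restorer (or sequence space) per flow.

```python
import heapq
import threading


class OrderRestorer:
    """Packets get consecutive sequence numbers at ingress, may finish out of
    order on different cores, and are released in sequence-number order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_seq = 0
        self._pending = []  # min-heap of (seq, pkt)

    def complete(self, seq, pkt):
        """Called when a core finishes a packet; returns the packets that can
        now be transmitted in order (possibly none)."""
        with self._lock:
            heapq.heappush(self._pending, (seq, pkt))
            released = []
            while self._pending and self._pending[0][0] == self._next_seq:
                released.append(heapq.heappop(self._pending)[1])
                self._next_seq += 1
            return released
```

Packets that finish early are held until every earlier sequence number has completed. The extra latency of that wait is the cost of restoring order after SMP processing.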
Migration Models - Examples
Migration Model – Single Image SMP: VortiQa Software with UTM for Enterprises
► Model
  • SMP model: all cores run all functionality in SMP mode
► Functionality
  • Firewall
  • IPSec-VPN
  • IPS
  • Anti-X proxies
Figure: Software architecture (user space, kernel space, hardware).
  Software components: Anti-X; SMTP/S proxy; POP3/S proxy; HTTP proxy; IKEv1/v2; HA infrastructure; authentication; PKI (SCEP, OCSP); XAUTH, EAP; IRAS, IRAC; L2TPoIPSEC; LAN/WAN management; DHCPC, DHCPS; PPPoE, PPTP, L2TP; Dyn DNS, DNSRD; routing (RIP v1/v2); IGMP proxy; VSRP; HA monitor; HA Xport; LDAP; RADIUS; local DB; embedded management (CLI, HTTP, LDSV, SYSLOG, EMAIL); TCP/IP; firewall/NAT; ACLs; ALGs; proxy infrastructure; intrusion detection & prevention engine; P2P/IM detection engine; IPSec engine; high availability; session management; QoS traffic policing; QoS traffic shaping; L3/L4 attack defense; Ethernet, VLAN, bridging, WAN protocols, WAN load balancing.
  Hardware: Frame Manager/Ethernet controller, Security Engine (SEC), Pattern Matching Engine.
Migration Model – Control Plane/Data Plane Hybrid Model: VortiQa Software with UTM for Service Providers
► Model
  • Hybrid model
  • Control plane: SMP
  • Data plane: SMP
  • CP-DP communication: QMAN
► Functionality
  • Firewall
  • IPSec-VPN
  • IPS/P2P
Figure: Control plane/data plane architecture.
  Control plane: IKEv1/v2; authentication; PKI (SCEP, OCSP); XAUTH, EAP; IRAS, IRAC; LAN/WAN; DNSRD; routing protocols; LDAP; RADIUS; local DB; embedded management (CLI, HTTP, LDSV, SYSLOG, EMAIL); CP-DP database sync (route updater, ARP helper, interfaces helper); CP-DP communications.
  Data plane, running on the Light Weight Executive (LWE): CP-DP communications; CP-DP dynamic databases (routing, interfaces, ARP, VSGs); firewall/NAT; ACLs; ALGs; proxy infrastructure; intrusion detection & prevention engine; P2P/IM detection engine; IPSec engine; session management; QoS traffic policing & shaping; L3/L4 attack defense; IP reassembly; DP monitoring; Ethernet interfaces, VLAN, bridging.
  Hardware, Data Path Acceleration Architecture (DPAA): Frame Manager, Security Engine (SEC), Pattern Matching Engine, Queue Manager, Buffer Manager.