SSLShader: Cheap SSL Acceleration with Commodity Processors
Keon Jang+, Sangjin Han+, Seungyeop Han*, Sue Moon+, and KyoungSoo Park+
KAIST+ and University of Washington*
Security of Paper Submission Websites
Network and Distributed System Security Symposium
Security Threats in the Internet
Public WiFi without encryption
• Easy target that requires almost no effort
Deep packet inspection by governments
• Used for censorship
• In the name of national security
NebuAd’s targeted advertisement
• Modify user’s Web traffic in the middle
Secure Sockets Layer (SSL)
A de-facto standard for secure communication
• Authentication, Confidentiality, Content integrity
[Figure: SSL session between client and server: TCP handshake, then server identification and key exchange using a public-key algorithm (e.g., RSA), then encrypted data]
SSL Deployment Status
Most websites are not SSL-protected
• Less than 0.5% [NETCRAFT Survey Jan '09]
Why is SSL not ubiquitous?
• Small sites: lack of recognition, manageability, etc.
• Large sites: cost
  • SSL requires lots of computation power
SSL Computation Overhead
Performance overhead (HTTPS vs. HTTP)
• Connection setup: 22x
• Data transfer: 50x
Good privacy is expensive
• More servers
• H/W SSL accelerators
Our suggestion:
• Offload SSL computation to GPU
SSLShader
SSL accelerator leveraging GPU
• High-performance
• Cost-effective
SSL reverse proxy
• No modification to existing servers
[Figure: SSLShader sits in front of the Web, SMTP, and POP3 servers; SSL-encrypted sessions on the client side, plain TCP to the servers]
Our Contributions
GPU cryptography optimization
• The fastest RSA on GPU
• Superior to high-end hardware accelerators
• Low latency
SSLShader
• Complete system exploiting GPU for SSL processing
• Batch processing
• Pipelining
• Opportunistic offloading
• Scaling with multiple cores and NUMA nodes
CRYPTOGRAPHIC PROCESSING WITH GPU
How Does a GPU Differ from a CPU?
Intel Xeon 5650 CPU: 6 cores, 62×10^9 instructions / sec
NVIDIA GTX580 GPU: 512 cores, 870×10^9 instructions / sec
[Figure: block diagrams of a CPU core (control logic, a few ALUs, cache) and a GPU (many ALUs, cache)]
Single Instruction Multiple Threads (SIMT)
Example code: vector addition (C = A + B)

CPU code:

    void VecAdd(int *A, int *B, int *C, int N)
    {
        // Iterate over N elements
        for (int i = 0; i < N; i++)
            C[i] = A[i] + B[i];
    }

    VecAdd(A, B, C, N);

GPU code:

    __global__ void VecAdd(int *A, int *B, int *C)
    {
        // Each thread adds the one element selected by its thread index
        int i = threadIdx.x;
        C[i] = A[i] + B[i];
    }

    // Launch N threads
    VecAdd<<<1, N>>>(A, B, C);

Parallelism in SSL Processing
1. Independent sessions (Client 1, Client 2, ..., Client N)
2. Independent SSL records
3. Parallelism in cryptographic operations
Our GPU Implementation
Choices of cipher-suite
• Key exchange: RSA
• Encryption: AES
• Message authentication: SHA1
Optimization of GPU algorithms
• Exploiting massive parallel processing
  • Parallelization of algorithms
  • Batch processing
• Data copy overhead is significant
  • Concurrent copy and execution
Basic RSA Operations
M: plain-text, C: cipher-text
(e, n): public key, (d, n): private key
Encryption (client):
C = M^e mod n
• e is a small number: 3, 17, 65537
Decryption (server):
M = C^d mod n
• Operands are 1024/2048-bit integers (300 ~ 600 digits)
• Exponentiation requires many multiplications
Decryption at the server side is the bottleneck
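To see why exponentiation turns into many multiplications, here is a minimal sketch of square-and-multiply in C. It is an illustration only, not the SSLShader implementation: the function name `mod_pow` is made up for this example, and it assumes toy-sized operands (n below 2^32) so that every product fits in a 64-bit word.

```c
#include <stdint.h>

/* Right-to-left square-and-multiply: computes (base^exp) mod n.
 * Toy-sized sketch: n must be nonzero and below 2^32 so that every
 * product fits in a uint64_t. Real RSA operands are 1024/2048-bit
 * integers and need multi-precision arithmetic. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;              /* also correct when n == 1 */
    base %= n;
    while (exp > 0) {
        if (exp & 1)                      /* current exponent bit is set */
            result = (result * base) % n; /* multiply step */
        base = (base * base) % n;         /* square step */
        exp >>= 1;
    }
    return result;
}
```

The loop runs once per exponent bit. With e = 17 that is 5 iterations; with a 1024-bit d it is about 1024 squarings plus a multiplication for every set bit, each on 1024-bit operands.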
Breakdown of Large Integer Multiplication
Schoolbook multiplication

         649
       × 627
     -------
          63
         280
        4200
         180
         800
       12000
        5400
       24000
    + 360000
     -------
      406923

3 × 3 = 9 multiplications, 9 additions of 6-digit integers
Accumulation is difficult to parallelize due to
• "overlapping digits"
• "carry propagation"
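The two phases of schoolbook multiplication can be sketched in C as follows. This is a base-10 illustration with hypothetical names (`schoolbook_mul`, digits stored least significant first), not the GPU code from the talk; a real implementation would use 32- or 64-bit words instead of decimal digits.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication of two base-10 numbers stored as digit
 * arrays, least significant digit first. out must hold na + nb digits.
 * Sketch only: supports na + nb <= 64. */
void schoolbook_mul(const uint8_t *a, size_t na,
                    const uint8_t *b, size_t nb, uint8_t *out)
{
    uint32_t col[64] = {0};   /* per-column sums of digit products */

    /* Phase 1: na * nb independent digit products. Products with the
     * same i + j land in the same column ("overlapping digits"). */
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            col[i + j] += (uint32_t)a[i] * b[j];

    /* Phase 2: carry propagation. Each output digit depends on the
     * carry out of the digit below it, so this loop is sequential. */
    uint32_t carry = 0;
    for (size_t k = 0; k < na + nb; k++) {
        uint32_t v = col[k] + carry;
        out[k] = (uint8_t)(v % 10);
        carry = v / 10;
    }
}
```

Calling it with the digits of 649 and 627 (stored as {9, 4, 6} and {7, 2, 6}) fills `out` with the digits of 406923. Phase 1 is the part that parallelizes easily; phase 2 is where the difficulty named on the slide shows up.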
O(s) Parallel Multiplications
Example of 649 × 627 = 406,923
• Multiplication and accumulation: 2s steps
• Carry propagation: 1 or 2 steps (s – 1 in the worst case)
s = number of words in a large integer (e.g., 1024 bits = 16 × 64-bit words)
More Optimizations on RSA
Common optimizations for RSA
• Chinese Remainder Theorem (CRT)
• Montgomery Multiplication
• Constant Length Non-zero Window (CLNW)
Parallelization of serial algorithms
• Faster Calculation of M×n
• Interleaving of T + M×n
• Mixed-Radix Conversion Offloading
GPU specific optimizations
• Warp Utilization
• Loop Unrolling
• Elimination of Divergence
• Avoiding Bank Conflicts
• Instruction-Level Optimization
[Figure: RSA throughput (operations/s, axis 0 to 80,000) after each successive optimization, steps (1) through (9); labeled steps include (1) Initial, (3) Warp utilization, (6) 64-bit words, (7) Avoiding bank conflicts, (8) Instruction-level optimization, CLNW, and (9) Post-exponentiation offloading]
Read our paper for details
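As an illustration of the first common optimization, here is a toy RSA decryption using the Chinese Remainder Theorem. It is a sketch under stated assumptions, not the SSLShader code: the struct layout and function names are hypothetical, and the operands are small enough to fit in 64-bit words. CRT replaces one full-size exponentiation with two independent exponentiations on half-size moduli and exponents, followed by a recombination step (a mixed-radix conversion in Garner's form).

```c
#include <stdint.h>

/* (base^exp) mod n by square-and-multiply; n < 2^32 (toy sizes only). */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n)
{
    uint64_t result = 1 % n;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical toy key layout; field names are illustrative. */
struct rsa_crt_key {
    uint64_t p, q;     /* secret primes, n = p * q */
    uint64_t dp, dq;   /* d mod (p - 1), d mod (q - 1) */
    uint64_t qinv;     /* q^-1 mod p */
};

/* RSA decryption with CRT (Garner's recombination). */
uint64_t rsa_crt_decrypt(uint64_t c, const struct rsa_crt_key *k)
{
    /* Two independent half-size exponentiations. */
    uint64_t m1 = mod_pow(c % k->p, k->dp, k->p);   /* c^dp mod p */
    uint64_t m2 = mod_pow(c % k->q, k->dq, k->q);   /* c^dq mod q */

    /* h = qinv * (m1 - m2) mod p; p is added first so the
     * subtraction cannot go negative in unsigned arithmetic. */
    uint64_t diff = (m1 + k->p - m2 % k->p) % k->p;
    uint64_t h = (k->qinv * diff) % k->p;

    return m2 + h * k->q;                           /* recombine */
}
```

With the textbook key p = 61, q = 53, e = 17, d = 2753 (so dp = 53, dq = 49, qinv = 38), decrypting the cipher-text 2790 returns 65, the same result as computing 2790^2753 mod 3233 directly.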
Parallelism in SSL Processing
1. Independent sessions (Client 1, Client 2, ..., Client N)
2. Independent SSL records
3. Parallelism in cryptographic operations
Batch Processing
GTX580 Throughput w/o Batching
Throughput relative to a "single CPU core" (Intel Nehalem single core, 2.66 GHz)

    RSA       AES-ENC   AES-DEC   SHA1
    0.08x     0.02x     1.57x     0.02x

GTX580 Throughput w/ Batching
Throughput relative to a "single CPU core"

    RSA       AES-ENC   AES-DEC   SHA1
    22.1x     6.8x      7.7x      9.4x

Difference: ratio of computation to copy
Batch size: 32~4096 depending on the algorithm
Copy Overhead in GPU Cryptography
GPU processing works by
• Data copy: CPU -> GPU
• Execution in GPU
• Data copy: GPU -> CPU

    Throughput (Gbps)    AES-ENC   AES-DEC   HMAC-SHA1
    GTX580 w/ copy       8.8       10        31
    GTX580 no copy       21.8      33        124
    Gain w/o copy        ↑2.4x     ↑3.3x     ↑4x

Hiding Copy Overhead
Synchronous execution
• Processing time: 3t
Pipelining
• Amortized processing time: t
[Figure: timelines of the three stages (data copy: CPU -> GPU, execution in GPU, data copy: GPU -> CPU), each taking t, run back to back in synchronous execution and overlapped across batches with pipelining]
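The 3t versus t figures follow from simple arithmetic, sketched below in C. This is an idealized model, not a measurement: it assumes all three stages take the same time t and that copies in both directions can fully overlap with execution. The function names are made up for this example.

```c
#include <stdint.h>

/* Time to push n batches through three equal stages of length t
 * (copy in, execute, copy out) with no overlap between batches. */
uint64_t sync_time(uint64_t n, uint64_t t)
{
    return 3 * n * t;
}

/* Same work with the stages overlapped across batches: the first
 * batch takes 3t to drain, and every later batch completes t after
 * the previous one. */
uint64_t pipelined_time(uint64_t n, uint64_t t)
{
    return n == 0 ? 0 : (n + 2) * t;
}
```

For 100 batches with t = 1, the synchronous total is 300 and the pipelined total is 102, so the per-batch cost approaches t as the number of batches grows.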
GTX580 Performance w/ Pipelining
Throughput relative to a single core

                            AES-ENC   AES-DEC   SHA1
    Pipelining              9x        9x        14x
    Gain over synchronous   ↑36%      ↑36%      ↑51%

[Figure: bars for w/o copy, synchronous, and pipelining for each algorithm]
Summary of GPU Cryptography
Performance gain from GTX580
• GPU performs as fast as 9 ~ 28 CPU cores
• Superior to high-end hardware accelerators
Lessons
• Batch processing is essential to fully utilize a GPU
• AES and SHA1 are bottlenecked by data copy
  • PCIe 3.0
  • Integrated GPU and CPU

                RSA-1024 (ops/sec)   AES-ENC (Gbps)   AES-DEC (Gbps)   SHA1 (Gbps)
    GTX580      91.9K                11.5             12.5             47.1
    CPU core    3.3K                 1.3              1.3              3.3

BUILDING SSL-PROXY THAT LEVERAGES GPU
SSLShader Design Goals
Use existing application without modification
• SSL reverse proxy
Effectively leverage GPU
• Batching cryptographic operations
• Load balancing between CPU and GPU
Scale performance with architecture evolution
• Multi-core CPUs
• Multiple NUMA nodes
Batching Crypto Operations
Network workloads vary over time
• Waiting for a fixed batch size doesn't work
Batch size is dynamically adjusted to queue length
[Figure: the SSL stack on the CPU places cryptographic operations on an input queue; the GPU processes them in batches and returns results through an output queue]
Balancing Load Between CPU and GPU
For small batches, the CPU is faster than the GPU
• Opportunistic offloading
  • CPU processing by default
  • GPU processing when input queue length > threshold

    Cryptographic operation   Minimum   Maximum
    RSA (1024-bit)            16        512
    AES Decryption            32        2048
    AES Encryption            128       2048
    HMAC-SHA1                 128       2048

[Figure: requests flow from the input queue to the CPU, or through a GPU queue to the GPU when the input queue length exceeds the threshold, and then to the output queue]
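The offloading rule can be sketched in C as below. The threshold values are copied from the table; everything else is an assumption made for illustration: the names (`crypto_op`, `batch_limits`, `gpu_batch_size`) are hypothetical, "Minimum" is read as the queue-length threshold for offloading, and "Maximum" is read as a cap on the batch size.

```c
#include <stddef.h>

enum crypto_op { OP_RSA_1024, OP_AES_DEC, OP_AES_ENC, OP_HMAC_SHA1, OP_COUNT };

struct batch_limits {
    size_t min;   /* assumed: offload only when the queue is longer than this */
    size_t max;   /* assumed: largest batch sent to the GPU at once */
};

/* Values copied from the threshold table on the slide. */
static const struct batch_limits LIMITS[OP_COUNT] = {
    [OP_RSA_1024]  = { 16,  512  },
    [OP_AES_DEC]   = { 32,  2048 },
    [OP_AES_ENC]   = { 128, 2048 },
    [OP_HMAC_SHA1] = { 128, 2048 },
};

/* Returns how many queued requests to send to the GPU in one batch,
 * or 0 when the CPU should process the queue itself. */
size_t gpu_batch_size(enum crypto_op op, size_t queue_len)
{
    const struct batch_limits *l = &LIMITS[op];
    if (queue_len <= l->min)
        return 0;                                    /* short queue: stay on the CPU */
    return queue_len < l->max ? queue_len : l->max;  /* cap the batch size */
}
```

Under light load the queue stays short and every request is handled on the CPU with low latency; as load grows the queue lengthens and the batch size grows with it, up to the cap.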
Scaling with Multiple Cores
Per-core worker threads
• Network I/O, cryptographic operation
Sharing a GPU with multiple cores
• More parallelism with larger batch size
[Figure: Core0, Core1, and Core2 each have their own input and output queues and feed a shared GPU queue served by one GPU]
Scaling with NUMA systems
A process = worker threads + a GPU thread
• Separate process per NUMA node
• Minimizes data sharing across NUMA nodes
[Figure: two NUMA nodes; Node 0 has CPU0, RAM, IOH0, GPU0, and NIC0; Node 1 has CPU1, RAM, IOH1, GPU1, and NIC1]
Evaluation
Experimental configurations
• 8 clients connected to the server over 4x 10GbE
• SSLShader + Lighttpd: clients -> HTTPS -> SSLShader (with GPU) -> HTTP -> Lighttpd
• Baseline: clients -> HTTPS -> Lighttpd with OpenSSL
Server Specification

           Model            Spec                  Qty
    CPU    Intel X5650      2.66 GHz x 6 cores    2
    GPU    NVIDIA GTX580    1.5 GHz x 512 cores   2
    NIC    Intel X520-DA2   10GbE x 2             2

Evaluation Metrics
HTTPS connection handling performance
• Use small content size
• Stress on RSA computation
Latency distribution at different loads
• Test opportunistic offloading
Data transfer rate at various content size
HTTPS Connection Rate
Connections / sec by RSA key size

                 1024 bits   2048 bits
    SSLShader    29K         21K
    lighttpd     11K         3.6K
    Speedup      2.5x        6x

CPU Usage Breakdown (RSA 1024)
• Kernel (including TCP/IP stack): 60.35% (current bottleneck)
• IPP + libcrypto: 12.89%
• Libc: 9.88%
• SSLShader: 5.31%
• lighttpd: 4.9%
• others: 4.35%
• Kernel NIC device driver: 2.32%
Latency at Light Load
Similar latency at light load
[Figure: CDF (%) of latency (ms, log scale from 1 to 1000) for Lighttpd at 1k connections / sec and SSLShader at 1k connections / sec]
Latency at Heavy Load
Lower latency and higher throughput at heavy load
[Figure: CDF (%) of latency (ms, log scale from 1 to 1000) for Lighttpd at 11k connections / sec and SSLShader at 29k connections / sec]
Data Transfer Performance
[Figure: SSLShader performance relative to Lighttpd (1.0x) for content sizes from 4KB to 64MB; annotated values: 2.1x and 0.87x]
Typical web content size is under 100KB
SSLShader: 13 Gbps
CONCLUSIONS
Summary
Cryptographic algorithms in GPU
• Fast RSA, AES, and SHA1
• Superior to high-end hardware accelerators
SSLShader
• Transparent integration
• Effective utilization of GPU for SSL processing
• Up to 6x connections / sec (limited by Linux network stack performance)
• 13 Gbps throughput (limited by copy overhead)
QUESTIONS?
THANK YOU!
For more details
https://shader.kaist.edu/sslshader