Cranking Floating Point Performance Up To 11
Noel Llopis
Snappy Touch
http://twitter.com/[email protected]
http://gamesfromwithin.com
Floating Point Performance
Floating point numbers
• Approximate representation of real numbers
• 1.2345, -0.8374, 2.0000, 14388439.34, etc.
• Follow the IEEE 754 format
• Single precision: 32 bits
• Double precision: 64 bits
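To make the 32-bit layout concrete, here is a small C++ sketch (the `FloatBits` struct and the function names are illustrative, not from the talk) that splits a single-precision float into its 1 sign bit, 8 exponent bits (biased by 127) and 23 fraction bits. It also shows that a value like 14388439.34 from the list above can only be approximated: between 2^23 and 2^24, consecutive floats are exactly 1.0 apart.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 single-precision (32-bit) float.
struct FloatBits {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits (implicit leading 1 for normal numbers)
};

// Reinterpret the float's bytes as an integer; memcpy avoids aliasing problems.
inline std::uint32_t float_to_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline FloatBits decompose(float f) {
    const std::uint32_t bits = float_to_bits(f);
    return FloatBits{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For example, `decompose(1.5f)` gives sign 0, exponent 127 and fraction 0x400000, since 1.5 is 1.1 in binary times 2^0, while `14388439.34f` is stored as exactly 14388439.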
Why floating point performance?
• Most games use floating point numbers for most of their calculations
• Positions, velocities, physics, etc.
• Maybe not so much for regular apps
CPU
• 32-bit RISC ARM11
• 400-535 MHz
• iPhone 2G/3G and iPod Touch 1st and 2nd gen
CPU (iPhone 3GS)
• Cortex-A8, 600 MHz
• More advanced architecture (superscalar, adds NEON SIMD)
CPU
• No floating point support in the ARM CPU!!!
How about integer math?
• No need to do any floating point operations
• Fully supported in the ARM processor
• But...
Integer Divide
There is no integer divide instruction (division is done in software)
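Without a divide instruction, an integer `a / b` on these CPUs is compiled into a call to a runtime helper routine. As a minimal sketch of why that hurts, here is the classic shift-and-subtract (restoring) algorithm such helpers are built on; `soft_udiv` and `DivResult` are hypothetical names, and real runtime libraries use faster, more elaborate variants. It produces one quotient bit per iteration, so up to 32 steps for one 32-bit division.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Restoring (shift-and-subtract) unsigned division, one quotient bit per step.
inline DivResult soft_udiv(std::uint32_t dividend, std::uint32_t divisor) {
    assert(divisor != 0 && "division by zero");
    std::uint32_t quotient = 0;
    std::uint64_t remainder = 0;  // 64-bit so the shift below cannot overflow
    for (int bit = 31; bit >= 0; --bit) {
        // Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (1u << bit);
        }
    }
    return DivResult{quotient, static_cast<std::uint32_t>(remainder)};
}
```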
Fixed-point arithmetic
• Sometimes integer arithmetic doesn’t cut it
• You need to represent rational numbers
• Can use a fixed-point library.
• Performs rational arithmetic with integer values at a reduced range/resolution.
• Not so great...
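A minimal fixed-point sketch, assuming a Q16.16 format (16 integer bits and 16 fractional bits packed in an `int32_t`); the type and function names are illustrative, not taken from any particular fixed-point library. Addition is a plain integer add, multiplication needs a 64-bit intermediate and a shift, and division would need the integer divide the CPU doesn't have, which is part of why this is "not so great".

```cpp
#include <cassert>
#include <cstdint>

// Q16.16: value = raw / 65536. Range roughly +/-32768, resolution 1/65536.
using fix16 = std::int32_t;
constexpr int kFracBits = 16;
constexpr fix16 kOne = fix16(1) << kFracBits;

inline fix16 fx_from_int(std::int32_t i) { return i * kOne; }  // |i| must be < 32768
inline fix16 fx_from_float(float f) { return static_cast<fix16>(f * kOne); }  // truncates
inline float fx_to_float(fix16 x) { return static_cast<float>(x) / kOne; }

inline fix16 fx_add(fix16 a, fix16 b) { return a + b; }  // same as integer add

inline fix16 fx_mul(fix16 a, fix16 b) {
    // The 64-bit product has 32 fractional bits; shift back down to 16.
    return static_cast<fix16>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}
```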
Floating point support
• There’s a floating point unit
• Compiled C/C++/ObjC code uses the VFP unit for any floating point operations.
Sample program

    struct Particle
    {
        float x, y, z;
        float vx, vy, vz;
    };

    for (int i=0; i<MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx*dt;
        p.y += p.vy*dt;
        p.z += p.vz*dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

• 7.2 seconds on an iPod Touch 2nd gen
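The slide shows only the loop. To time it yourself, a self-contained version might look like this sketch; the particle count, the `dt` and `drag` values and the number of steps are left to the caller because the slides don't give them, and `update_particles`/`time_updates` are hypothetical names, so absolute timings will differ from the 7.2 seconds above.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct Particle {
    float x, y, z;
    float vx, vy, vz;
};

// One Euler step with velocity damping, same math as the sample program.
inline void update_particles(std::vector<Particle>& particles, float dt, float drag) {
    for (Particle& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }
}

// Runs `steps` updates and returns the elapsed wall-clock time in seconds.
inline double time_updates(std::vector<Particle>& particles, int steps, float dt, float drag) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < steps; ++i) update_particles(particles, dt, drag);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```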
Floating point support
Trust no one! When in doubt, check the generated assembly
Thumb Mode
• CPU has a special Thumb mode (compact 16-bit instructions)
• Less memory, maybe better performance
• No floating point support
• Every time there’s an fp operation, it switches out of Thumb (via a call to an ARM-mode helper function), does the fp operation, and switches back
Thumb Mode
• It’s on by default!
• Potentially HUGE wins turning it off (Xcode’s "Compile for Thumb" build setting)
Thumb Mode
• Turning off Thumb mode increased performance in Flower Garden by over 2x
• Flower Garden makes heavy use of floating point operations, though
• Most games will probably benefit from turning it off (especially 3D games)
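Since the setting is easy to miss, one option is to have the compiler warn you. A minimal sketch, assuming GCC or Clang, which predefine `__thumb__` when generating Thumb code and additionally `__thumb2__` for Thumb-2; Thumb-2 (ARMv7, as on the 3GS) can encode VFP instructions, so only the original 16-bit Thumb is flagged. The constant name and warning text are made up for illustration.

```cpp
#include <cassert>

// True when this file is compiled as 16-bit Thumb (Thumb-1), where
// floating point operations cannot use VFP instructions directly.
#if defined(__thumb__) && !defined(__thumb2__)
constexpr bool kCompiledAsThumb1 = true;
#warning "Floating-point-heavy file compiled as Thumb-1; consider turning Thumb off for it"
#else
constexpr bool kCompiledAsThumb1 = false;
#endif
```

Dropping this into the floating-point-heavy source files turns a silent slowdown into a build warning.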
Sample program with Thumb mode off: 2.6 seconds! (was 7.2)
ARM assembly
DISCLAIMER: I’m not an ARM assembly expert!!!
Z80!!!
ARM assembly
• Hit the docs
• References included in your USB card
• Or download them from the ARM site
• http://bit.ly/arminfo
ARM assembly
• Reading assembly is a very important skill for high-performance programming
• Writing is more specialized. Most people don’t need to.
VFP unit
A0 + B0 = C0
A1 + B1 = C1
A2 + B2 = C2
A3 + B3 = C3
VFP unit
(A0 A1 A2 A3) + (B0 B1 B2 B3) = (C0 C1 C2 C3)
Sweet! How do we use the VFP?

    "fldmias %2, {s8-s23}   \n\t"
    "fldmias %1!, {s0-s3}   \n\t"
    "fmuls s24, s8, s0      \n\t"
    "fmacs s24, s12, s1     \n\t"
    "fldmias %1!, {s4-s7}   \n\t"
    "fmacs s24, s16, s2     \n\t"
    "fmacs s24, s20, s3     \n\t"
    "fstmias %0!, {s24-s27} \n\t"

Like this!
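Read as a whole, the fragment above appears to multiply a 4×4 matrix by a 4-component vector: with the vector length set to 4, each `fmuls`/`fmacs` works on four registers at once, `s8-s23` hold the matrix one column per four registers, and `s0-s3` hold the vector (used as scalars because they sit in bank 0); the second `fldmias %1!` seems to fetch the next vector early. This plain C++ reference, assuming column-major storage and a hypothetical function name, spells out the same arithmetic lane by lane, which is handy for checking the assembly's output.

```cpp
#include <cassert>

// out = M * v, with M stored column-major (m[0..3] is column 0,
// m[4..7] column 1, ...), matching the s8-s23 layout above.
inline void mat4_mul_vec4(const float m[16], const float v[4], float out[4]) {
    for (int i = 0; i < 4; ++i) {
        float sum = m[i] * v[0];   // fmuls s24, s8, s0
        sum += m[4 + i] * v[1];    // fmacs s24, s12, s1
        sum += m[8 + i] * v[2];    // fmacs s24, s16, s2
        sum += m[12 + i] * v[3];   // fmacs s24, s20, s3
        out[i] = sum;              // fstmias ... {s24-s27}
    }
}
```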
Writing vfp assembly
• There are two parts to it
• How to write any assembly in gcc
• Learning ARM and VFP assembly
vfpmath library
• Already done a lot of work for you
• http://code.google.com/p/vfpmathlibrary
• Vector/matrix math
• Might not be exactly what you need, but it’s a great starting point
Assembly in gcc
• Only use it when targeting the device

    #include <TargetConditionals.h>

    #if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
        #define USE_VFP
    #endif

Assembly in gcc
• The basics

    asm ("cmp r2, r1");

http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Assembly in gcc
• Multiple lines

    asm ("mov r0, #1000\n\t"
         "cmp r2, r1\n\t");

Assembly in gcc
• Accessing C variables

    asm ( /* assembly code */
        : /* output operands */
        : /* input operands */
        : /* clobbered registers */ );

    int src = 19;
    int dest = 0;
    asm volatile ("add %0, %1, #42"
                  : "=r" (dest)
                  : "r" (src)
                  : );

%0, %1, etc. are the operands, numbered in order: outputs first, then inputs
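Assembly in gcc

    int src = 19;
    int dest = 0;
    asm volatile ("add r10, %1, #42\n\t"
                  "add %0, r10, #33\n\t"
                  : "=r" (dest)
                  : "r" (src)
                  : "r10");

The clobber list names the registers the asm block overwrites (other than its outputs), so the compiler won’t keep anything in them
volatile stops the compiler from deleting the block or hoisting it out of loops (“optimizations”), even if its outputs look unused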
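Putting these slides together: a sketch that only emits the ARM inline assembly when compiling for ARM (not Thumb) and otherwise falls back to plain C++, so the same file still builds for the simulator. It tests the compiler-predefined `__arm__` and `__thumb__` macros instead of the `TargetConditionals.h` check, purely so the sketch compiles outside Apple's toolchain; the function name is hypothetical.

```cpp
#include <cassert>

// Adds 42 and then 33 to src, mirroring the clobber example above.
inline int add_42_then_33(int src) {
    int dest;
#if defined(__arm__) && !defined(__thumb__)
    asm volatile("add r10, %1, #42\n\t"
                 "add %0, r10, #33\n\t"
                 : "=r"(dest)   // %0: output
                 : "r"(src)     // %1: input
                 : "r10");      // r10 is overwritten, so declare it clobbered
#else
    dest = src + 42 + 33;       // portable fallback (simulator, other CPUs)
#endif
    return dest;
}
```

Either path returns 94 for `src = 19`, so the fallback doubles as a reference to test the assembly against.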
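VFP asm
Four banks of 8 32-bit registers each (s0-s7, s8-s15, s16-s23, s24-s31)

    // Sets the FPSCR LEN field (bits 16-18) to VEC_LENGTH, giving vectors of
    // VEC_LENGTH+1 registers, and clears STRIDE (bits 20-21, stride 1). Overwrites r0.
    #define VFP_VECTOR_LENGTH(VEC_LENGTH) \
        "fmrx r0, fpscr \n\t" \
        "bic r0, r0, #0x00370000 \n\t" \
        "orr r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \
        "fmxr fpscr, r0 \n\t"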
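What the macro and the banks mean in practice: per the ARM VFP documentation, FPSCR bits 16-18 (LEN) hold the vector length minus one and bits 20-21 hold the stride, which is why the macro clears mask 0x00370000 before OR-ing in the new length, so `VFP_VECTOR_LENGTH(3)` selects 4-wide vectors. The banks then decide what an instruction does: a destination in bank 0 (s0-s7) makes it scalar, an Fm operand in bank 0 is reused as a scalar for every lane, and vectors wrap around inside their own bank. The C++ model below is a simplified sketch of those rules (stride 1 and single-precision add only, and it ignores what happens when source and destination vectors partly overlap); all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// FPSCR value after VFP_VECTOR_LENGTH(length - 1).
inline std::uint32_t fpscr_with_vector_length(std::uint32_t fpscr, unsigned length) {
    assert(length >= 1 && length <= 8);
    return (fpscr & ~0x00370000u) | ((length - 1u) << 16);
}

using SRegs = std::array<float, 32>;  // s0-s31

// Register holding element i of a vector that starts at `reg`, wrapping inside its bank of 8.
inline unsigned vfp_elem(unsigned reg, unsigned i) {
    return (reg & ~7u) | ((reg + i) & 7u);
}

// Model of "fadds sD, sN, sM" with the current vector length.
inline void vfp_fadds(SRegs& s, unsigned d, unsigned n, unsigned m, unsigned length) {
    assert(d < 32 && n < 32 && m < 32 && length >= 1 && length <= 8);
    if (d < 8) length = 1;             // destination in bank 0: scalar operation
    const bool m_is_scalar = (m < 8);  // Fm in bank 0: same value for every lane
    std::array<float, 8> result{};
    for (unsigned i = 0; i < length; ++i) {
        const float b = m_is_scalar ? s[m] : s[vfp_elem(m, i)];
        result[i] = s[vfp_elem(n, i)] + b;
    }
    for (unsigned i = 0; i < length; ++i) s[vfp_elem(d, i)] = result[i];
}
```

For instance, `vfp_fadds(s, 24, 8, 0, 4)` adds s0 to each of s8-s11 and writes s24-s27: the same vector-with-scalar operand pattern the matrix fragment uses with `fmuls`/`fmacs`.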
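VFP asm

for (int i = 0; i < MaxParticles; ++i)
{
    Particle& p = s_particles[i];
    p.x += p.vx*dt;
    p.y += p.vy*dt;
    p.z += p.vz*dt;
    p.vx *= drag;
    p.vy *= drag;
    p.vz *= drag;
}

// Same update with VFP short vectors. Assumes vector length 3 (FPSCR LEN)
// is set before the loop and reset to 1 afterwards, and that dtArray and
// dragArray each hold three copies of dt and drag. Vector operands stay out
// of bank 0 (s0-s7): an instruction whose destination is in bank 0 runs as scalar.
for (int i = 0; i < MaxParticles; ++i)
{
    Particle* p = &s_particles[i];
    asm volatile (
        "fldmias %0, {s8-s13}  \n\t"  // s8-s10 = x,y,z   s11-s13 = vx,vy,vz
        "fldmias %1, {s16-s18} \n\t"  // s16-s18 = dt x3
        "fldmias %2, {s24-s26} \n\t"  // s24-s26 = drag x3
        "fmacs s8, s11, s16    \n\t"  // x,y,z += vx,vy,vz * dt
        "fmuls s11, s11, s24   \n\t"  // vx,vy,vz *= drag
        "fstmias %0, {s8-s13}  \n\t"  // write the particle back
        :                                          // no outputs
        : "r" (p), "r" (dtArray), "r" (dragArray)  // %0, %1, %2
        : "memory", "s8", "s9", "s10", "s11", "s12", "s13",
          "s16", "s17", "s18", "s24", "s25", "s26"
    );
}

Was: 2.6 seconds
Now: 1.4 seconds!!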
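The asm block treats a particle as six consecutive floats and reads dt and drag from small arrays of repeated values. A minimal sketch, assuming `Particle` is exactly `{ x, y, z, vx, vy, vz }` (the struct itself is not shown on the slides); `makeSplat` and `updateReference` are hypothetical helpers. It pins the layout down at compile time and keeps a plain C++ reference to compare the asm output against.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed layout: six packed floats, positions first, then velocities.
struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

static_assert(sizeof(Particle) == 6 * sizeof(float), "Particle must be 6 packed floats");
static_assert(offsetof(Particle, vx) == 3 * sizeof(float), "velocity must follow position");

// Hypothetical helper: one copy of a scalar per vector lane
// (what dtArray and dragArray are assumed to contain).
inline std::array<float, 3> makeSplat(float value)
{
    return {value, value, value};
}

// Plain C++ reference for the same update, used to validate the asm version.
inline void updateReference(Particle* particles, std::size_t count, float dt, float drag)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        Particle& p = particles[i];
        p.x += p.vx * dt;  // position uses the old velocity...
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;      // ...then the velocity is damped
        p.vy *= drag;
        p.vz *= drag;
    }
}
```

Running both versions on a copy of the same particles and comparing fields is a cheap way to catch register or operand mix-ups in the asm.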
VFP asm: let's do 6 operations at once!

struct Particle2
{
    float x0, y0, z0;
    float x1, y1, z1;
    float vx0, vy0, vz0;
    float vx1, vy1, vz1;
};
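// Two particles per iteration, vector length 6 (FPSCR LEN set before the loop).
// Positions go to s8-s13 and velocities to s16-s21 so no 6-wide vector wraps
// around its 8-register bank; drag is applied as a scalar from s0, since a
// bank 0 source operand is treated as a scalar. Assumes iterations is
// MaxParticles / 2 and dtArray holds six copies of dt.
for (int i = 0; i < iterations; ++i)
{
    Particle2* p = &s_particles2[i];
    asm volatile (
        "fldmias %0, {s8-s13}  \n\t"  // x0,y0,z0,x1,y1,z1
        "fldmias %1, {s16-s21} \n\t"  // vx0,vy0,vz0,vx1,vy1,vz1
        "fldmias %2, {s24-s29} \n\t"  // dt x6
        "flds    s0, [%3]      \n\t"  // drag (scalar)
        "fmacs s8, s16, s24    \n\t"  // 6 positions += 6 velocities * dt
        "fmuls s16, s16, s0    \n\t"  // 6 velocities *= drag
        "fstmias %0, {s8-s13}  \n\t"  // write positions back
        "fstmias %1, {s16-s21} \n\t"  // write velocities back
        :                                                          // no outputs
        : "r" (&p->x0), "r" (&p->vx0), "r" (dtArray), "r" (dragArray)  // %0-%3
        : "memory", "s0", "s8", "s9", "s10", "s11", "s12", "s13",
          "s16", "s17", "s18", "s19", "s20", "s21",
          "s24", "s25", "s26", "s27", "s28", "s29"
    );
}

Was: 1.4 seconds
Now: 1.2 seconds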
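`Particle2` only helps if the data is stored two particles at a time. A minimal sketch of packing ordinary particles into that layout plus a scalar reference for the six-wide update, assuming an even particle count; `packPairs`, `updatePairsReference` and `step` are hypothetical names. Repacking every frame would add memory traffic of its own, so the packed form is meant to be the storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

// Two particles per record: six positions, then six velocities (as on the slide).
struct Particle2
{
    float x0, y0, z0;
    float x1, y1, z1;
    float vx0, vy0, vz0;
    float vx1, vy1, vz1;
};

static_assert(sizeof(Particle2) == 12 * sizeof(float), "Particle2 must be 12 packed floats");

// Hypothetical helper: interleave consecutive pairs of particles.
std::vector<Particle2> packPairs(const std::vector<Particle>& in)
{
    assert(in.size() % 2 == 0 && "this sketch assumes an even particle count");
    std::vector<Particle2> out(in.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        const Particle& a = in[2 * i];
        const Particle& b = in[2 * i + 1];
        out[i] = Particle2{a.x, a.y, a.z, b.x, b.y, b.z,
                           a.vx, a.vy, a.vz, b.vx, b.vy, b.vz};
    }
    return out;
}

// One lane of the update: integrate position, then damp velocity.
inline void step(float& pos, float& vel, float dt, float drag)
{
    pos += vel * dt;
    vel *= drag;
}

// Scalar reference for the six-wide update, to check the asm results.
void updatePairsReference(std::vector<Particle2>& pairs, float dt, float drag)
{
    for (Particle2& p : pairs)
    {
        step(p.x0, p.vx0, dt, drag);
        step(p.y0, p.vy0, dt, drag);
        step(p.z0, p.vz0, dt, drag);
        step(p.x1, p.vx1, dt, drag);
        step(p.y1, p.vy1, dt, drag);
        step(p.z1, p.vz1, dt, drag);
    }
}
```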
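VFP asm: what's the loop/cache overhead?

for (int i = 0; i < MaxParticles; ++i)
{
    Particle* p = &s_particles[i];
    p->x = p->vx;
    p->y = p->vy;
    p->z = p->vz;
}

Was: 1.2 seconds
Now: 1.2 seconds!!!!

Just copying fields takes as long as the six-wide VFP update, so on this device the loop is now limited by loop and cache overhead, not by the floating-point math.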
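A minimal sketch of how this kind of baseline can be measured: time the real update and a loop that touches every particle but does almost no math, over the same buffer. It uses `std::chrono::steady_clock`; `updateLoop`, `touchLoop` and `timeSeconds` are hypothetical names, and the compiler may optimize these loops differently from the talk's build, so compare numbers only within one build.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

// Full update: the loop being optimized.
void updateLoop(std::vector<Particle>& ps, float dt, float drag)
{
    for (Particle& p : ps)
    {
        p.x += p.vx * dt;  p.y += p.vy * dt;  p.z += p.vz * dt;
        p.vx *= drag;      p.vy *= drag;      p.vz *= drag;
    }
}

// "Touch" loop from the slide: walks every particle's memory, no arithmetic.
void touchLoop(std::vector<Particle>& ps)
{
    for (Particle& p : ps)
    {
        p.x = p.vx;  p.y = p.vy;  p.z = p.vz;
    }
}

// Hypothetical helper: wall-clock seconds for `reps` calls of `body`.
template <typename F>
double timeSeconds(F body, int reps)
{
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        body();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Typical use is `timeSeconds([&] { updateLoop(ps, dt, drag); }, 100)` against `timeSeconds([&] { touchLoop(ps); }, 100)`; when the two are close, faster arithmetic will not help much.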
Matrix multiply
Straight from vfpmathlib

Touch: 0.037919 s
Normal: 0.096855 s
VFP: 0.042216 s

About 2.3x faster than Normal, and close to the Touch time!
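For reference, a plain C++ 4x4 multiply of the kind a "Normal" build would run. A minimal sketch, assuming column-major storage as in OpenGL (element (row, col) at `m[col * 4 + row]`); `Matrix4`, `multiply` and `identity` are hypothetical names, not vfpmathlib's API.

```cpp
#include <array>
#include <cassert>

// Column-major 4x4 matrix: element (row, col) lives at m[col * 4 + row].
using Matrix4 = std::array<float, 16>;

// result = a * b, computed one output column at a time.
Matrix4 multiply(const Matrix4& a, const Matrix4& b)
{
    Matrix4 result{};
    for (int col = 0; col < 4; ++col)
    {
        for (int row = 0; row < 4; ++row)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];  // a(row,k) * b(k,col)
            result[col * 4 + row] = sum;
        }
    }
    return result;
}

Matrix4 identity()
{
    Matrix4 m{};
    for (int i = 0; i < 4; ++i)
        m[i * 4 + i] = 1.0f;
    return m;
}
```

Each output column is the columns of `a` weighted by one column of `b`, which is the shape a short-vector unit can process several lanes at a time.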
Good uses of VFP
• Matrix operations
• Particle systems
• Skinning
• Physics
• Procedural content generation
• ....
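What about the 3GS?

Particle update time (seconds)

                                        3G     3GS
Thumb                                   7.2    8.0
Normal                                  2.6    2.6
VFP1 (one particle per asm block)       1.4    1.30
VFP2 (two particles per asm block)      1.2    0.64
Touch (field-copy loop, no math)        1.2    0.18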
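Reading the table as speedups over the "Normal" build makes the 3GS difference easier to see. A small sketch that only divides the numbers in the table; the `Row` struct and its field names are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Timings in seconds, copied from the table above.
struct Row
{
    const char* name;
    double secs3G;
    double secs3GS;
};

const Row kRows[] = {
    {"Thumb",  7.2, 8.0},
    {"Normal", 2.6, 2.6},
    {"VFP1",   1.4, 1.30},
    {"VFP2",   1.2, 0.64},
    {"Touch",  1.2, 0.18},
};

// Speedup of a row relative to the "Normal" build on the same device.
double speedup(double normalSeconds, double rowSeconds)
{
    return normalSeconds / rowSeconds;
}

void printSpeedups()
{
    const Row& normal = kRows[1];
    for (const Row& r : kRows)
        std::printf("%-6s 3G: %.2fx  3GS: %.2fx\n", r.name,
                    speedup(normal.secs3G, r.secs3G),
                    speedup(normal.secs3GS, r.secs3GS));
}
```

VFP2 comes out at about 2.2x on the 3G and about 4.1x on the 3GS. On the 3G, VFP2 ties with Touch at 1.2 s; on the 3GS, VFP2 (0.64 s) is about 3.6 times slower than Touch (0.18 s), which suggests the 3GS version is no longer limited by memory.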
More 3GS: NEON
• SIMD coprocessor
• Floating point and integer
• Huge potential
• Very little documentation right now :-(