LLNL-PRES-759606. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.
Alternatives to IEEE: NextGen Number Formats for Scientific Computing
IPAM Workshop II: HPC and Data Science for Scientific Discovery
Peter Lindstrom
Jeff Hittinger, Matt Larsen, Scott Lloyd, Markus Salasoo
October 15, 2018
Data movement will dictate performance and power usage at exascale
[Figure: compute power (FLOPs/s) vs. memory bandwidth (bytes/s), 1995–2020, on a mega/giga/tera/peta/exa scale; compute grew by O(10⁶) while memory bandwidth grew by only O(10³).]
§ IEEE assumes “one exponent size fits all”
— Insufficient precision for some computations, e.g. catastrophic cancellation
— Insufficient dynamic range for others, e.g. under- or overflow in exponentials
— For these reasons, double precision is preferred in scientific computing
§ Many bits of a double are contaminated by error
— Error from round-off, truncation, iteration, model, observation, …
— Pushing error around and computing on it is wasteful
§ To significantly reduce data movement, we must make every bit count!
— Next-generation floating types must be redesigned from the ground up
Uneconomical floating types and excessive precision are contributing to data movement
POSITS [Gustafson 2017] are a new float representation that improves on IEEE
              IEEE 754 floating point        Posits (unum version 3.0)
Bit length    {16, 32, 64, 128}              fixed length but arbitrary
Sign          sign-magnitude (−0 ≠ +0)       two’s complement (−0 = +0)
Exponent      fixed length (biased binary)   variable length (Golomb-Rice)
Fraction map  linear (φ(x) = 1 + x)          linear (φ(x) = 1 + x)
Infinities    {−∞, +∞}                       ±∞ (single point at infinity)
NaNs          many (9 quadrillion)           one
Underflow     gradual (subnormals)           gradual (natural)
Overflow      1 / FLT_TRUE_MIN = ∞ (oops!)   never (exception: 1 / 0 = ±∞)
Base          {2, 10}                        2^(2^m) ∈ {2, 4, 16, 256, …}
§ Both represent a real value as y = (−1)^s 2^e (1 + f)
§ IEEE: fixed-length encoding of the exponent
§ POSITS: variable-length encoding of the exponent
— Exponent, e, encoded as m trailing bits in binary (e mod 2^m) and 1 + ⌊e / 2^m⌋ leading bits in unary
— More fraction bits are available when the exponent is small
In terms of encoding scheme, IEEE and POSITS differ in one important way
IEEE float: 1-bit exponent “sign”, 7-bit exponent magnitude x
  e = D(1 x) = 1 + x

IEEE double                     POSIT(7)
e = D(1 000 x) = 1 + x          e = D(1 0 x) = x
e = D(1 001 x) = 129 + x        e = D(1 10 x) = 128 + x
e = D(1 010 x) = 257 + x        e = D(1 110 x) = 256 + x
…                               …
e = D(1 111 x) = 897 + x        e = D(1 11111110 x) = 896 + x
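The posit column’s variable-length scheme can be sketched in a few lines. This is an illustrative helper (the function name and bit-string output format are mine, not from any posit library), covering only nonnegative exponents; negative exponents use two’s-complement symmetry not shown here:

```python
def posit_exponent_bits(e, m):
    """Encode nonnegative exponent e as in POSIT(m): a unary run of
    1 + e // 2**m one-bits, a terminating zero, then the m-bit binary
    value e mod 2**m."""
    k = e >> m                         # how many multiples of 2**m
    unary = '1' * (1 + k) + '0'        # variable-length leading part
    binary = format(e & ((1 << m) - 1), f'0{m}b') if m else ''
    return unary + binary

# Reproducing the POSIT(7) rows of the table above:
assert posit_exponent_bits(0, 7)   == '10'        + '0000000'  # e = D(1 0 x) = x
assert posit_exponent_bits(128, 7) == '110'       + '0000000'  # e = D(1 10 x)
assert posit_exponent_bits(896, 7) == '111111110' + '0000000'  # e = D(1 11111110 x)
```

Small exponents thus pay only two leading bits, leaving more bits for the fraction.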
§ Decompose integer z ≥ 1 as z = 2^e + r
— e ≥ 0 is the exponent
— r = z mod 2^e is the e-bit residual
§ Elias proposed three “prefix-free” variable-length codes for integers
— Elias γ code is equivalent to POSIT(0) [Gustafson & Yonemoto 2017]
— Elias δ code is equivalent to URR [Hamada 1983]
Related work: Universal coding of positive integers [Elias 1975]
                   Elias γ code    Elias δ code     Elias ω code
Exponent           unary           γ                ω (recursive)
Code(1)            0               0                0
Code(2^e + r)      1^e 0 r         1 γ(e) r         1 ω(e) r
Code(17 = 2^4 + 1) 1111 0 0001     1 11 0 00 0001   1 1 1 0 0 00 0001
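The γ and δ rules in the table are easy to implement. A sketch in Python, following the table’s conventions (note the slide’s δ, Code(2^e + r) = 1 γ(e) r, differs slightly from Elias’s textbook δ, which γ-codes e + 1):

```python
def elias_gamma(z):
    """Gamma code: with z = 2**e + r, emit e ones, a zero, then the
    e-bit residual r; Code(1) = '0'."""
    e = z.bit_length() - 1             # position of the leading one
    r = z - (1 << e)                   # e-bit residual
    return '1' * e + '0' + (format(r, f'0{e}b') if e else '')

def elias_delta(z):
    """Delta code per the table: Code(1) = '0'; otherwise a one bit,
    the gamma code of the exponent e, then the e-bit residual r."""
    if z == 1:
        return '0'
    e = z.bit_length() - 1
    r = z - (1 << e)
    return '1' + elias_gamma(e) + format(r, f'0{e}b')

assert elias_gamma(17) == '1111' + '0' + '0001'        # 17 = 2**4 + 1
assert elias_delta(17) == '1' + '11000' + '0001'       # 1 gamma(4) r
```

Both are prefix-free: every codeword except ‘0’ starts with a self-terminating exponent field, so the fraction bits that follow need no length marker.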
§ From z ∈ ℕ (positive integers) to y ∈ ℝ (reals)
— Rationals: z ≤ y < z + 1 for integer z ≥ 1, encoded by appending fraction bits (as in IEEE, POSITS)
— Subunitaries: 0 < y < 1 encoded via two’s complement exponent (as in POSITS)
— Negatives: y < 0 encoded via two’s complement binary representation (as in POSITS)
— 2-bit prefix tells where we are on the real line (as in POSITS)
We extend Elias codes from positive integers to reals
POSITS and Elias codes recursively refine the intervals (0, ±1), (±1, ±∞). Example: base-2 posits (aka Elias γ)
[Figure: four stages of recursive bisection for base-2 posits, adding one bit per stage; each codeword is annotated with its value (0, ±1, ±2, ±1/2, ±3/2, ±4, ±1/4, ±5/4, ±7/8, …) and its bits are colored by role: sign bit, exponent sign, exponent value, fraction value.]
§ NUMREP is a modular C++ framework for defining number systems
— Exponent coding scheme (unary, binary, Golomb-Rice, gamma, omega, …)
— Fraction map (linear, reciprocal, exponential, rational, quadratic, logarithmic, …)
  • Conjugate fraction maps for sub- & superunitaries enable reciprocal closure
— Handling of under- & overflow
— Rounding rules (to nearest, toward {−∞, 0, +∞})
§ NUMREP unifies IEEE, POSITS, Elias, URR, LNS, …, under a single schema
— Uses an auxiliary arithmetic type to perform numerical computations (e.g. IEEE, MPFR)
— Uses operator overloading to mimic intrinsic floating types
— Supports 8-, 16-, 32-, and 64-bit types
Lindstrom et al., “Universal coding of the reals: Alternatives to IEEE floating point,” Conference for Next Generation Arithmetic, 2018
Beyond POSITS and Elias: NUMREP templated framework
NUMREP is a powerful but complex framework—let’s take a simplified view
Many useful NUMREPS are given by simple interpolation and extrapolation rules (plus closure under negation)
[Figure: the code circle annotated with ±fmax and ±fmin; extrapolation acts on (fmax, ∞), interpolation on (fmin, fmax), reciprocation on (0, fmin), with closure under negation.]
§ Extrapolation rule: 1 ≤ fmax(p) < fmax(p + 1) < ∞
§ Reciprocation rule: fmin(p) = 1 / fmax(p)
Extrapolation gives next fmax between previous fmax and ±∞ when adding one more bit of precision
type             fmax(p + 1)   sequence
Unary            1 + fmax(p)   1, 2, 3, 4, 5, …
Elias γ          2 × fmax(p)   1, 2, 4, 8, 16, …
POSITS (base b)  b × fmax(p)   1, b, b², b³, b⁴, …
Elias δ          fmax(p)²      1, 2, 4, 16, 256, …
Elias ω          2^fmax(p)     1, 2, 4, 16, 65536, …
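The table’s recurrences can be checked directly. A small sketch (the function name is illustrative); the first step of the Elias δ recurrence is special-cased, since squaring alone never leaves fmax = 1:

```python
def fmax_sequence(step, n):
    """Generate fmax(1..n) from the extrapolation recurrence
    fmax(p + 1) = step(fmax(p)), starting from fmax(1) = 1."""
    seq, f = [], 1
    for _ in range(n):
        seq.append(f)
        f = step(f)
    return seq

assert fmax_sequence(lambda f: 1 + f, 5)      == [1, 2, 3, 4, 5]        # unary
assert fmax_sequence(lambda f: 2 * f, 5)      == [1, 2, 4, 8, 16]       # Elias gamma
assert fmax_sequence(lambda f: 4 * f, 5)      == [1, 4, 16, 64, 256]    # posits, base b = 4
assert fmax_sequence(lambda f: max(2, f * f), 5) == [1, 2, 4, 16, 256]  # Elias delta
assert fmax_sequence(lambda f: 2 ** f, 5)     == [1, 2, 4, 16, 65536]   # Elias omega
```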
Interpolation rule generates new number g(a, b) between two finite positive numbers a < b

§ For POSITS, Elias {γ, δ, ω}:
    g(a, b) = (a + b) / 2            if b ≤ 2a
    g(a, b) = 2^g(lg a, lg b)        otherwise
§ Examples
— g(2, 4) = 3 (arithmetic mean)
— g(4, 16) = 2^g(2, 4) = 2³ = 8 (geometric mean)
— g(16, 65536) = 2^g(4, 16) = 2^2^g(2, 4) = 2^2³ = 2⁸ = 256 (“hypergeometric” mean, for ω only)
§ For LNS:
    g(a, b) = √(a b)
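A direct transcription of the two rules, with the worked examples above as checks:

```python
from math import log2, sqrt

def g(a, b):
    """Posit/Elias interpolation: arithmetic mean when b <= 2a,
    otherwise recurse on the exponents: 2**g(lg a, lg b)."""
    if b <= 2 * a:
        return (a + b) / 2
    return 2 ** g(log2(a), log2(b))

def g_lns(a, b):
    """LNS interpolation: the geometric mean."""
    return sqrt(a * b)

assert g(2, 4) == 3            # arithmetic mean
assert g(4, 16) == 8           # geometric mean, via 2**g(2, 4)
assert g(16, 65536) == 256     # "hypergeometric" mean, via 2**g(4, 16)
```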
§ Templated on a generator that implements two rules
— Interpolation function, g(a, b)
— Extrapolation recurrence, a(i+1) = f(a(i)) = g(a(i), ∞)
§ Rules + bisection implement encoding, decoding, rounding, and the fraction map
— Recursively bisect (−∞, +∞) using the applicable interpolation or extrapolation rule
  • g(−∞, +∞) = 0
  • g(0, ±∞) = ±1
— Exploit symmetry of negation and reciprocation
— Encode: bracket the number and recursively test if it is less (0) or greater (1) than the bisection point
— Decode: narrow the interval toward the lower (0) or upper (1) bound; return the final lower bound
— Round: round to the lower bound if less than the bisection point, otherwise to the upper bound
— Map: given by the interpolation rule
§ NOTE: Not very performant, but great for experimentation & validation
FLEX (FLexible EXponent): NUMREP on a shoestring
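To make the bisection idea concrete, here is a toy encoder for y ≥ 1 under the base-2 posit generator. It is a sketch, not FLEX’s implementation: sign handling, reciprocals on (0, 1), and the two’s-complement bit conventions of real posits are omitted, so the emitted bit strings need not match actual posit codewords:

```python
from math import inf, log2

def encode(y, bits, extrapolate=lambda a: 2 * a):
    """Bisection encoder sketch for y >= 1: bracket y in (1, +inf) and
    emit one bit per comparison against the bisection point, which the
    generator supplies: extrapolation g(a, inf) while the upper bound
    is infinite, interpolation g(a, b) otherwise."""
    def g(a, b):  # base-2 posit / Elias gamma interpolation rule
        return (a + b) / 2 if b <= 2 * a else 2 ** g(log2(a), log2(b))
    lo, hi, code = 1.0, inf, ''
    for _ in range(bits):
        mid = extrapolate(lo) if hi == inf else g(lo, hi)
        if y >= mid:
            code += '1'; lo = mid     # go to upper subinterval
        else:
            code += '0'; hi = mid     # go to lower subinterval
    return code, lo                   # bits and the decoded lower bound

code, approx = encode(3.0, 4)
assert (code, approx) == ('1010', 3.0)
```

Each extra bit halves the bracketing interval under the generator’s notion of “middle,” which is exactly how decoding, rounding, and the fraction map fall out of the same two rules.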
FLEX’s POSIT implementation is 8 lines of code
    // posit generator with m-bit exponent and base 2^(2^m)
    template <int m = 0, typename real = flex_real>
    class Posit {
    public:
      // extrapolation rule: f(a) = base * a
      real operator()(real x) const { return real(base) * x; }
      // interpolation rule: arithmetic mean if y <= 2x, else geometric mean
      real operator()(real x, real y) const
      { return real(2) * x >= y ? (x + y) / real(2) : std::sqrt(x * y); }
    protected:
      static const int base = 1 << (1 << m);
    };
Elias delta and the Kessel Run: what is the distance to Proxima Centauri?

1.3 parsecs (Elias δ spends 2 exponent bits):
  IEEE     0 01111111 010011001111011001010110    error = 1.2e−08
  ELIAS δ  10 01001100111101100101010111000       error = 4.8e−10

4.2 lightyears (5 exponent bits):
  IEEE     0 10000001 000011110111111010010000    error = 5.6e−08
  ELIAS δ  11100 00001111011111101001000100       error = 9.0e−11

4.0 × 10^16 meters (13 exponent bits):
  IEEE     0 10110110 000111010010101000101010    error = 1.5e−08
  ELIAS δ  1111111010111 000111010010101001       error = 1.2e−06

2.4 × 10^51 Planck lengths (17 exponent bits):
  IEEE     0 11111111 000000000000000000000000    error = ∞ (overflow)
  ELIAS δ  11111111100101010 10101000110001       error = 1.4e−05
Variable-length exponents lead to tapered precision: 6-9 more bits of precision than IEEE for numbers near one
[Figure: precision (bits, 0–32) vs. exponent (−256 to +256) for float, binary(8), gamma, posit(1), posit(2), delta, omega; annotations mark IEEE’s subnormal region and a 29-bit precision peak.]
Tapered precision (close-up): tension between precision and dynamic range
[Figure, close-up: precision (16–30 bits) vs. exponent (−48 to +48) for float, binary(8), gamma, posit(1), posit(2), delta, omega.]
§ Universality property (POSITS, Elias {γ, δ, ω}, but not IEEE)
1. Each integer is represented by a prefix code
  • No codeword is a prefix of any other codeword
  • Codeword length is self-describing; remaining available bits represent the fraction
2. Integer representation has monotonic length
  • Larger integers have longer codewords
3. Codeword length of integer y is O(lg y)
  • Codeword length is proportional to the number of significant bits
§ Asymptotic optimality property (Elias {δ, ω})
— Relative cost of coding the exponent (leading-one position) vanishes in the limit
Elias delta and omega are asymptotically optimal universal representations
Elias delta and omega are asymptotically optimal: exponent overhead approaches zero in the limit
[Figure: relative coding cost (1.0–4.0) vs. lg(y) for binary(8), gamma, posit(1), posit(2), delta, omega.]
§ So far we have assumed that the fraction (significand), f ∈ [0, 1), is mapped linearly
— y = (−1)^s 2^e (1 + f)
— We may replace (1 + f) with a transfer function φ(f)
§ Fraction map, φ : [0, 1) ⟶ [1, 2), maps fraction bits, f, to value, φ(f)
— Subunitary map φ−: |y| < 1
— Superunitary map φ+: |y| ≥ 1
Fraction maps dictate how fraction bits map to values

map                φ(f)                    notes
Linear             1 + f                   introduces implicit leading-one bit
Linear reciprocal  2 / (2 − f)             conjugate with linear map ⇒ reciprocal closure
Exponential        2^f                     LNS (logarithmic number system): y = 2^e 2^f = 2^(e+f)
Linear rational    (1 + p f) / (1 − q f)   p = √2 − 1, q = (1 − p) / 2
[Figure: value φ(f) over fraction f ∈ [0, 1) for the linear, reciprocal, exponential, and rational maps.]
§ Conjugacy: φ−(f) φ+(1 − f) = 2 for 0 ≤ f ≤ 1
— E.g. 2 / (2 − f) × (1 + (1 − f)) = 2
§ C^n continuity: φ^(n)(1) / φ^(n)(0) = 2
Conjugate pair (φ–, φ+) guarantees reciprocal closure
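The conjugacy identity is easy to verify numerically for the linear / linear-reciprocal pair (the lambda names are mine):

```python
# Conjugate fraction maps: the subunitary linear-reciprocal map and the
# superunitary linear map satisfy phi_minus(f) * phi_plus(1 - f) = 2,
# which is what guarantees reciprocal closure.
phi_plus  = lambda f: 1 + f          # linear map, for |y| >= 1
phi_minus = lambda f: 2 / (2 - f)    # linear reciprocal map, for |y| < 1

for i in range(11):
    f = i / 10
    assert abs(phi_minus(f) * phi_plus(1 - f) - 2) < 1e-12
```

The identity holds exactly here since 2 / (2 − f) × (2 − f) = 2 by construction.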
§ Arithmetic closure and relative error (vs. analytic solution): c = a + b
— Addition and multiplication of 16-bit operands
§ Addition of long floating-point sequences (vs. analytic solution): s = Σᵢ xᵢ
— Using naïve and best-of-breed addition algorithms
§ Matrix inversion (vs. analytic solution): B = A⁻¹
— LU-based inversion of Hilbert and Vandermonde matrices
§ Symmetric matrix eigenvalues (vs. analytic solution): Λ = Q A Qᵀ
— QR-based dense eigenvalue solver
§ Euler2D PDE solver (vs. IEEE quad precision): ∂ₜu + ∇·F(u) = 0
— Complex physics involving interacting shock waves
We present extensive results that span a spectrum of simple to increasingly complex numerical computations
Closure rate (C) and mean relative error (E) for 16-bit addition suggest POSIT(0) (aka. Elias gamma) performs best
[Error heat maps; legend: negative error, no error, positive error, infinite/NaN.]
Optimal multiplicative closure of LNS does not imply superior relative errors
Simple summation algorithms benefit from tapered precision:Sum of Gaussian PDF and its first derivative
[Figure: absolute error (1e−09 to 1e−01, FLT_EPSILON marked) of sequential, increasing, insertion, cascade, and Kahan summation for float, posit(2), and delta.]
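The “best-of-breed” Kahan algorithm referenced above carries a compensation term that recovers the low-order bits lost in each addition. A standard sketch:

```python
def kahan_sum(xs):
    """Kahan (compensated) summation: keep the rounding error of each
    addition in a correction term c so it is not lost."""
    s = c = 0.0
    for x in xs:
        y = x - c          # apply the previously captured correction
        t = s + y          # s is large, y small: low bits of y are lost
        c = (t - s) - y    # algebraically zero; captures the lost bits
        s = t
    return s

xs = [0.1] * 10**6         # 0.1 is not exactly representable in binary
naive, exact = sum(xs), 10**5
assert abs(kahan_sum(xs) - exact) <= abs(naive - exact)
```

With tapered precision near one, even the naïve loop keeps more significant bits, which is why simple algorithms fare better under posit(2) and delta in the figure.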
New types excel in 64-bit Hilbert matrix inversion (Eigen3)
[Figure: matrix-inverse RMS error (1e−18 to 1e−04) vs. matrix rows (2–10) for IEEE, gamma, posit(1), posit(2), delta, omega.]
Elias outperforms IEEE by O(10³) in 64-bit eigenvalue accuracy
[Figure: eigenvalue RMS error (1e−19 to 1e−11) vs. matrix rows (4–1024) for IEEE, gamma, posit(1), posit(2), delta, omega.]
Euler2D shock propagation illustrates benefits of new types
LLNL-PRES-75960629
IEEE is consistently the least accurate floating-point representation in numerical calculations (64-bit types)
[Figure: density RMS error (1e−19 to 1e−11) vs. time (0–2) for IEEE, binary(11), LNS(11), gamma, posit(1), posit(2), posit(3), delta(0), delta(1), omega(3).]
Nonlinear fraction maps sometimes improve accuracy substantially (32-bit types)
[Figure: density RMS error (1e−09 to 1e−03) vs. time (0–2) for IEEE, posit(1), posreclin(1), poslinrec(1), posexp(1).]
§ Many numerical computations involve continuous fields
— E.g. partial differential equations over ℝ², ℝ³
— Nearby scalars on Cartesian grids have similar values
  • In statistical speak, fields exhibit autocorrelation
  • Such redundancy is not exploited by any of the representations discussed so far
Beyond IEEE and posit representations of ℝ:Compressed representations of ℝn
[Figures: a smooth function value over ~384 grid points, illustrating autocorrelation; and storage vs. precision (0–64 bits), comparing entropy to uncompressed size.]
(64-bit) floating-point data does not compress well losslessly
[Figure: lossless compression ratio (uncompressed/compressed size, scale 0–2) for fpzip [Lindstrom ’06], BLOSC [Alted ’09], FPC [Burtscher ’07], gzip, szip, SPDP [Burtscher ’16], bzip2; labeled ratios: 1.11, 1.07, 1.05, 1.04, 1.03, 1.02, 1.01.]
§ Large improvements in compression are possible by allowing even small errors
— Simulation often computes on meaningless bits
  • Round-off, truncation, iteration, and model errors abound
  • The last few floating-point bits are effectively random noise
§ Still, lossy compression often makes scientists nervous
— Even though lossy data reduction is ubiquitous
  • Decimation in space and/or time (e.g. store every 100th time step)
  • Averaging (hourly vs. daily vs. monthly averages)
  • Truncation to single precision (e.g. for history files)
— State-of-the-art compressors support error tolerances
Lossy compression enables greater reduction, but is often met with skepticism by scientists
[Image: 0.27 bits/value, 240× compression.]
Can lossy-compressed in-memory storage of numerical simulation state be tolerated?
[Diagram: the simulation outer loop (update nodes → update elements → compute time constraints). Our approach: instead of decompressing and compressing all state around each update, each element kernel decompresses a block, operates on it, and compresses it back.]
Using lossy FPZIP to store simulation state compressed, we have shown that 4× lossy compression can be tolerated:
§ High-order Eulerian hydrodynamics
— QoI: Rayleigh–Taylor mixing-layer thickness; 10,000 time steps; at 4× compression, relative error < 0.2%
§ Laser–plasma multi-physics
— QoI: backscattered laser energy; at 4× compression, relative error < 0.1%
§ Lagrangian shock hydrodynamics
— QoI: radial shock position; 25 state variables compressed over 2,100 time steps; at 4× compression, relative error < 0.06%
[Images: 20 bits/value uncompressed vs. 16 bits/value compressed.]
Lossy compression of state is viable, but streaming compression increases data movement—need inline compression
§ Inspired by ideas from h/w texture compression
— d-dimensional array divided into blocks of 4^d values
— Each block is independently (de)compressed
  • e.g. to a user-specified number of bits or quality
— Fixed-size blocks ⇒ random read/write access
— (De)compression is done inline, on demand
— Write-back cache of uncompressed blocks limits data loss
§ Compressed arrays via C++ operator overloading
— Can be dropped into existing code by changing type declarations
— double a[n] ⇔ std::vector<double> a(n) ⇔ zfp::array<double> a(n, precision)
Lindstrom, “Fixed-rate compressed floating-point arrays,” IEEE Transactions on Visualization and Computer Graphics, 2014
ZFP is an inline compressor for floating-point arrays
ZFP’s C++ compressed arrays can replace STL vectors and C arrays with minimal code changes
    // example using STL vectors
    std::vector<double> u(nx * ny, 0.0);
    u[x0 + nx*y0] = 1;
    for (double t = 0; t < tfinal; t += dt) {
      std::vector<double> du(nx * ny, 0.0);
      for (int y = 1; y < ny - 1; y++)
        for (int x = 1; x < nx - 1; x++) {
          double uxx = (u[(x-1)+nx*y] - 2*u[x+nx*y] + u[(x+1)+nx*y]) / dxx;
          double uyy = (u[x+nx*(y-1)] - 2*u[x+nx*y] + u[x+nx*(y+1)]) / dyy;
          du[x + nx*y] = k * dt * (uxx + uyy);
        }
      for (int i = 0; i < u.size(); i++)
        u[i] += du[i];
    }

    // example using ZFP arrays
    zfp::array2<double> u(nx, ny, bits_per_value);
    u(x0, y0) = 1;
    for (double t = 0; t < tfinal; t += dt) {
      zfp::array2<double> du(nx, ny, bits_per_value);
      for (int y = 1; y < ny - 1; y++)
        for (int x = 1; x < nx - 1; x++) {
          double uxx = (u(x-1, y) - 2*u(x, y) + u(x+1, y)) / dxx;
          double uyy = (u(x, y-1) - 2*u(x, y) + u(x, y+1)) / dyy;
          du(x, y) = k * dt * (uxx + uyy);
        }
      for (int i = 0; i < u.size(); i++)
        u[i] += du[i];
    }
(Slide highlighting marks the required changes and the optional changes for improved readability.)
ZFP arrays limit data loss via a small write-back cache
[Diagram: the application accesses a virtual array; a software write-back cache of uncompressed blocks (with dirty bits, indexed by block) mediates between the application and the compressed blocks.]
The ZFP compressor is comprised of three distinct components
Raw floating-point array → block floating-point transform → orthogonal block transform → embedded coding → compressed bit stream

§ Block floating-point transform
— Align values in a 4^d block to a common largest exponent
— Transmit the exponent verbatim
§ Orthogonal block transform
— Lifted, separable transform using integer adds and shifts
— Similar to, but faster and more effective than, the JPEG DCT
§ Embedded coding
— Encode one bit plane at a time from the MSB using group testing
— Each bit increases quality; the stream can be truncated anywhere
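The first stage (block floating-point) can be illustrated in a few lines. This is a simplified sketch, not zfp’s implementation (the helper name and precision parameter are mine; zfp operates on 4^d blocks and uses integer conventions not shown here):

```python
from math import frexp

def block_floatingpoint(block, precision=30):
    """Align all values in a block to the block's largest exponent and
    convert them to fixed-point integers; the common exponent is then
    transmitted once for the whole block."""
    emax = max(frexp(abs(v))[1] for v in block)   # largest exponent
    scale = 2.0 ** (precision - emax)             # shared scale factor
    ints = [int(v * scale) for v in block]        # fixed-point values
    return emax, ints

emax, ints = block_floatingpoint([1.0, 0.5, 0.25, -0.75])
assert emax == 1                  # frexp puts 1.0 in [0.5, 1) * 2**1
```

After alignment, the block is pure integer data, which is what lets the subsequent transform use only integer adds and shifts.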
6-bit ZFP gives one more digit of accuracy than 32-bit IEEE in 2nd- and 3rd-derivative computations (Laplacian and highlight lines)
[Images: IEEE half (16 bits), POSIT(1) (16 bits), IEEE float (32 bits); ZFP (6 bits), IEEE double (64 bits), IEEE double (64 bits).]
ZFP improves accuracy in finite difference computations using less precision than IEEE
ZFP variable-rate arrays adapt storage spatially and allocate bits to regions where they are most needed
[Images at 8, 16, 32, and 64 bits/value.]
ZFP adaptive arrays improve accuracy in PDE solution over IEEE by 6 orders of magnitude using less storage
[Figure: RMS error (1e−15 to 1e−02) vs. time (0–190) for IEEE (32-bit), posit1 (32-bit), zfp (32-bit), arc (28-bit).]
§ Emerging alternatives to IEEE floats show promising improvements in accuracy
— Tapered precision allows a better balance between range and precision
— New types have many desirable properties
  • No IEEE warts: subnormals, excess of NaNs, signed zeros, overflow, exponent-size dichotomy, …
  • Features: nesting, monotonicity, reciprocal closure, universality, asymptotic optimality, …
§ NUMREP framework allows experimentation with new number types in apps
— Modular design supports plug-and-play of independent concepts
— Nonlinear maps are expensive but worth investigating
— POSITS are often among the best-performing types; IEEE among the worst
§ NextGen scalar representations still leave considerable room for improvement
— ZFP fixed-rate compressed arrays consistently outperform IEEE and POSITS
— ZFP variable-rate arrays optimize bit distribution and make every bit count
Conclusions: Novel scalar representations are driving research into new vector/matrix/tensor representations
ZFP is available as open source on GitHub: https://github.com/LLNL/zfp
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
ZFP provides >100x compression with imperceptible loss in visual quality
ZFP shows no artifacts in derivative computations (velocity divergence)
Parallel ZFP achieves up to 150 GB/s throughput
[Figure: ZFP CUDA compression throughput (GB/s, up to ~160) on an NVIDIA V100 vs. compressed storage size (1–32 bits/value).]