Alternatives to IEEE: NextGenNumber Formats for Scientific...

LLNL-PRES-759606This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Alternatives to IEEE:NextGen Number Formats for Scientific ComputingIPAM Workshop II: HPC and Data Science for Scientific Discovery

Peter LindstromJeff Hittinger, Matt Larsen, Scott Lloyd, Markus Salasoo

October 15, 2018

LLNL-PRES-7596062

Data movement will dictate performance and power usage at exascale

Mega

Giga

Tera

Peta

Exa

1995 2000 2005 2010 2015 2020

Compute power (FLOPs/s)

Memory bandwidth (bytes/s)

O(106)increase

O(103)increase

LLNL-PRES-7596063

§ IEEE assumes “one exponent size fits all”— Insufficient precision for some computations, e.g. catastrophic cancellation— Insufficient dynamic range for others, e.g. under- or overflow in exponentials— For these reasons, double precision is preferred in scientific computing

§ Many bits of a double are contaminated by error— Error from round-off, truncation, iteration, model, observation, …— Pushing error around and computing on it is wasteful

§ To significantly reduce data movement, we must make every bit count!— Next generation floating types must be re-designed from ground up

Uneconomical floating types and excessive precision are contributing to data movement

LLNL-PRES-7596064

POSITS [Gustafson 2017] are a new floatrepresentation that improves on IEEE

IEEE 754 floating point Posits: UNUM version 3.0

Bit Length {16, 32, 64, 128} fixed length but arbitrary

Sign sign-magnitude (−0 ≠ +0) two’s complement (−0 = +0)

Exponent fixed length (biased binary) variable length (Golomb-Rice)

Fraction Map linear (φ(x) = 1 + x) linear (φ(x) = 1 + x)

Infinities {−∞, +∞} ±∞ (single point at infinity)

NaNs many (9 quadrillion) one

Underflow gradual (subnormals) gradual (natural)

Overflow 1 / FLT_TRUE_MIN = ∞ (oops!) never (exception: 1 / 0 = ±∞)

Base {2, 10} 22m ∈ {2, 4, 16, 256, …}

LLNL-PRES-7596065

§ Both represent real value as y = (-1)s 2e (1 + f)

§ IEEE: fixed-length encoding of exponent

§ POSITS: variable-length encoding of exponent— Exponent, e, encoded as m trailing bits in binary (e mod 2m), 1 + ⎣e / 2m⎦ leading bits in unary— More fraction bits are available when exponent is small

In terms of encoding scheme, IEEE and POSITS differ in one important way

IEEE float IEEE double POSIT(7)

e = D(1 x) = 1 + x

1-bit exponent “sign”7-bit exponent magnitude x

e = D(1 000 x) = 1 + x e = D(1 0 x) = xe = D(1 001 x) = 129 + x e = D(1 10 x) = 128 + xe = D(1 010 x) = 257 + x e = D(1 110 x) = 256 + x

… …e = D(1 111 x) = 897 + x e = D(1 11111110 x) = 896 + x

LLNL-PRES-7596066

§ Decompose integer z ≥ 1 as z = 2e + r— e ≥ 0 is exponent— r = z mod 2e is e-bit residual

§ Elias proposed three “prefix-free” variable-length codes for integers— Elias γ code is equivalent to POSIT(0) [Gustafson & Yonemoto 2017]— Elias δ code is equivalent to URR [Hamada 1983]

Related work: Universal coding of positive integers [Elias 1975]

Elias γ code Elias δ code Elias ω code

Exponent unary γ ω (recursive)

Code(1) 0 0 0

Code(2e + r) 1e 0 r 1 γ(e) r 1 ω(e) r

Code(17 = 24 + 1) 1111 0 0001 1 11 0 00 0001 1 1 1 0 0 00 0001

LLNL-PRES-7596067

§ From z ∈ ℕ (positive integers) to y ∈ ℝ (reals)

— Rationals: 1 ≤ z < y < z + 1 encoded by appending fraction bits (as in IEEE, POSITS)

— Subunitaries: 0 < y < 1 encoded via two’s complement exponent (as in POSITS)

— Negatives: y < 0 encoded via two’s complement binary representation (as in POSITS)

— 2-bit prefix tells where we are on the real line (as in POSITS)

We extend Elias codes from positive integers to reals

+1 +2 +3 +4+1 +2 +3 +40 +1 +2 +3 +4�4 �3 �2 �1 0 +1 +2 +3 +400 011110

�4 �3 �2 �1 0 +1 +2 +3 +4

LLNL-PRES-7596068

POSITS, Elias recursively refine intervals (0, ±1), (±1, ±∞)Example: base-2 posits (aka. Elias γ)

+1 0 1�11 1

00

0±1

10

+1 0 10�11 10

000

0±1

100

0 11

+2

0 01

+ 12

�2

1 01

�12

1 11

+1 0 10 0�11 10 0

0000

0±1

1000

0 110

+2

0 01 0

+ 12

�2

1 01 0

�12

1 110

+32

0 101

011

1

+4

0 01 1

+ 34

0001

+14

�4

1001

� 32

1 01 1

�341 10

1

�1 4

111

1

+1 0 10 00�11 10 00

00000

0±1

10000

0 1100

+2

0 01 00

+ 12

�2

1 01 00

�12

1 1100

+32

0 1010

011

10

+4

0 01 10

+ 34

0001

0

+14

�4

1001

0

� 32

1 01 10

�34

1 1010

�1 4

111

10

+54

0 10 01

�78

1 10 01

00001

+18

�8

10001

01101

+3

0 01 01

+ 58

� 74

1 01 01

�3 8

11101

+74

0 10 1

1

011

11+8

0 01 11+78

0001

1

+ 38

�3

1001

1

� 54

1 01 11

�58

1 10 1

1

�1 8

111

11

sign bitexponent signexponent valuefraction value

LLNL-PRES-7596069

§ NUMREP is a modular C++ framework for defining number systems— Exponent coding scheme (unary, binary, Golomb-Rice, gamma, omega, …)— Fraction map (linear, reciprocal, exponential, rational, quadratic, logarithmic, …)

• Conjugate fraction maps for sub- & superunitaries enable reciprocal closure— Handling of under- & overflow

— Rounding rules (to nearest, toward {−∞, 0, +∞})

§ NUMREP Unifies IEEE, POSITS, Elias, URR, LNS, …, under a single schema— Uses auxiliary arithmetic type to perform numerical computations (e.g. IEEE, MPFR)— Uses operator overloading to mimic intrinsic floating types— Supports 8, 16, 32, 64-bit types

Lindstrom et al., “Universal coding of the reals: Alternatives to IEEE floating point,” Conference for Next Generation Arithmetic, 2018

Beyond POSITS and Elias: NUMREP templated framework

NUMREP is a powerful but complex framework—let’s take a simplified view

LLNL-PRES-75960610

Many useful NUMREPS are given by simple interpolation and extrapolation rules (plus closure under negation)

+1 0 10 0�11 10 0

0000

0

±1

1000

0 110

0 01 0

1 01 0

1 110

0 101

011

1

0 01 1

0001

1001

1 01 1

1 101

111

1

+

f ma

x

�fm

a

x

+

fm

i

n�f m

i

n

extrapolation on (fmax, ∞)interpolation on (fmin, fmax)reciprocation on (0, fmin)

recIpr

negation o negationcatIon

LLNL-PRES-75960611

§ Extrapolation rule: 1 ≤ fmax(p) < fmax(p + 1) < ∞

§ Reciprocation rule: fmin(p) = 1 / fmax(p)

Extrapolation gives next fmax between previous fmax and ±∞ when adding one more bit of precision

type fmax(p + 1) sequence

Unary 1 + fmax(p) 1, 2, 3, 4, 5, …

Elias γ 2 × fmax(p) 1, 2, 4, 8, 16, …

POSITS (base b) b × fmax(p) 1, b, b2, b3, b4, …

Elias δ fmax(p)2 1, 2, 4, 16, 256, …

Elias ω 2fmax(p) 1, 2, 4, 16, 65536, …fmax

(1) 0 1

±1

1

0 11

fma

x

(

2

)

011

1

f ma

x

(

3

)

011

11

f ma

x

(

4

)

LLNL-PRES-75960612

§ For POSITS, Elias {γ, δ, ω}:

§ Examples— g(2, 4) = 3 arithmetic mean— g(4, 16) = 2g(2, 4) = 23 = 8 geometric mean— g(16, 65536) = 2g(4, 16) = 22g(2,4) = 223 = 28 = 256 “hypergeometric” mean (for ω only)

§ For LNS:

Interpolation rule generates new number g(a, b) between two finite positive numbers a < b

! ", $ = &" + $2 if $ ≤ 2"

2,(./ 0, ./ 1) otherwise

! ", $ = "$

LLNL-PRES-75960613

§ Templated on generator that implements two rules— Interpolation function, g(a, b)— Extrapolation recurrence, ai+1 = f(ai) = g(ai, ∞)

§ Rules + bisection implement encoding, decoding, rounding, fraction map— Recursively bisect (−∞, +∞) using applicable interpolation or extrapolation rule

• g(−∞, +∞) = 0• g(0, ±∞) = ±1

— Exploit symmetry of negation, reciprocation— Encode: bracket number and recursively test if less (0) or greater (1) than bisection point— Decode: narrow interval toward lower (0) or upper (1) bound; return final lower bound— Round: round number to lower bound if less than bisection point, otherwise to upper— Map: given by interpolation rule

§ NOTE: Not very performant, but great for experimentation & validation

FLEX (FLexible EXponent): NUMREP on a shoestring

LLNL-PRES-75960614

FLEX’s POSIT implementation is 8 lines of code

// posit generator with m-bit exponent and base 2^(2^m)template <int m = 0, typename real = flex_real>class Posit {public:

real operator()(real x) const { return real(base) * x; }real operator()(real x, real y) const { return real(2) * x >= y ? (x + y) / real(2) : std::sqrt(x * y); }

protected:static const int base = 1 << (1 << m);

};

LLNL-PRES-75960615

Elias delta and the Kessel Run:What is the distance to Proxima Centauri?

0 01111111 010011001111011001010110 10 01001100111101100101010111000

IEEE

ELIAS δ

error = 1.2e−08

error = 4.8e−101.3 parsecs

2 bits

0 10000001 000011110111111010010000 11100 00001111011111101001000100

IEEE

ELIAS δ

error = 5.6e−08

error = 9.0e−114.2 lightyears

5 bits

0 10110110 000111010010101000101010 1111111010111 000111010010101001

IEEE

ELIAS δ

error = 1.5e−08

error = 1.2e−064.0 × 1016 meters

13 bits

0 11111111 000000000000000000000000 11111111100101010 10101000110001

IEEE

ELIAS δ

error = ∞

error = 1.4e−052.4 × 1051 Planck lengths

17 bits

LLNL-PRES-75960616

Variable-length exponents lead to tapered precision: 6-9 more bits of precision than IEEE for numbers near one

0

4

8

12

16

20

24

28

32

-256 -192 -128 -64 0 64 128 192 256

prec

ision

exponent

float binary(8) gamma posit(1) posit(2) delta omega

subnormals

29 bits

LLNL-PRES-75960617

Tapered precision (close-up): tension between precision and dynamic range

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

-48 -44 -40 -36 -32 -28 -24 -20 -16 -12 -8 -4 0 4 8 12 16 20 24 28 32 36 40 44 48

prec

ision

exponent

float binary(8) gamma posit(1) posit(2) delta omega

LLNL-PRES-75960618

§ Universality property (POSITS, Elias {γ, δ, ω}, but not IEEE)1. Each integer is represented by a prefix code

• No codeword is a prefix of any other codeword• Codeword length is self-describing—remaining available bits represent fraction

2. Integer representation has monotonic length• Larger integers have longer codewords

3. Codeword length of integer y is O(lg y)• Codeword length is proportional to the number of significant bits

§ Asymptotic optimality property (Elias {δ, ω})— Relative cost of coding exponent (leading-one position) vanishes in the limit

Elias delta and omega are asymptotically optimal universal representations

LLNL-PRES-75960619

Elias delta and omega are asymptotically optimal: exponent overhead approaches zero in the limit

1.0

1.5

2.0

2.5

3.0

3.5

4.0

0 1 2 4 8 16 32 64 128 256

rela

tive

codi

ng c

ost

lg(y)

binary(8) gamma posit(1) posit(2) delta omega

LLNL-PRES-75960620

§ So far we have assumed that fraction (significand), f ∈ [0, 1), is mapped linearly— y = (-1)s 2e (1 + f)— We may replace (1 + f) with transfer function φ(f)

§ Fraction map, φ : [0, 1) ⟶ [1, 2), maps fraction bits, f, to value, φ(f)— Subunitary map φ–: |y| < 1— Superunitary map φ+: |y| ≥ 1

Fraction maps dictate how fraction bits map to values

map φ(f) notesLinear 1 + f introduces implicit leading-one bitLinear reciprocal 2 / (2 – f) conjugate with linear map ⇒ reciprocal closureExponential 2f LNS (logarithmic number system): y = 2e 2f = 2e+f

Linear rational (1 + p f) / (1 – q f) p = 2 − 1, q = (1 – p) / 2

LLNL-PRES-75960621

1.0

1.5

2.0

0.0 0.5 1.0

valu

e φ

(f)

fraction f

linear reciprocal exponential rational

§ Conjugacy: φ–(f) φ+(1 – f) = 2 0 ≤ f ≤ 1— E.g. 2 / (2 – f) × (1 + (1 – f)) = 2

§ Cn continuity: φ(n)(1) / φ(n)(0) = 2

Conjugate pair (φ–, φ+) guarantees reciprocal closure

LLNL-PRES-75960622

§ Arithmetic closure and relative error (vs. analytic solution) ! = # + %— Addition and multiplication of 16-bit operands

§ Addition of long floating-point sequences (vs. analytic solution) s = ∑( )(— Using naïve and best-of-breed addition algorithms

§ Matrix inversion (vs. analytic solution) * = +,-— LU-based inversion of Hilbert and Vandermonde matrices

§ Symmetric matrix eigenvalues (vs. analytic solution) Λ = /*/0— QR-based dense eigenvalue solver

§ Euler2D PDE solver (vs. IEEE quad precision) 123 + ∇ 5 6 3 = 0— Complex physics involving interacting shock waves

We present extensive results that span a spectrum of simple to increasingly complex numerical computations

LLNL-PRES-75960623

Closure rate (C) and mean relative error (E) for 16-bit addition suggest POSIT(0) (aka. Elias gamma) performs best

negative error no error positive error infinite/NaN

LLNL-PRES-75960624

Optimal multiplicative closure of LNS does not imply superior relative errors

negative error no error positive error infinite/NaN

LLNL-PRES-75960625

Simple summation algorithms benefit from tapered precision:Sum of Gaussian PDF and its first derivative

FLT_EPSILON

1E-09

1E-08

1E-07

1E-06

1E-05

1E-04

1E-03

1E-02

1E-01

sequential increasing insertion cascade Kahan

abso

lute

err

orfloat posit(2) delta

LLNL-PRES-75960626

New types excel in 64-bit Hilbert matrix inversion (Eigen3)

1E-181E-171E-161E-151E-141E-131E-121E-111E-101E-091E-081E-071E-061E-051E-04

2 3 4 5 6 7 8 9 10

mat

rix

inve

rse

RMS

erro

r

matrix rows

IEEE gamma posit(1) posit(2) delta omega

LLNL-PRES-75960627

Elias outperforms IEEE by O(103) in 64-bit eigenvalue accuracy

1E-19

1E-18

1E-17

1E-16

1E-15

1E-14

1E-13

1E-12

1E-11

4 16 64 256 1024

eige

nval

ue R

MS

erro

r

matrix rows

IEEE gamma posit(1) posit(2) delta omega

LLNL-PRES-75960628

Euler2D shock propagation illustrates benefits of new types

LLNL-PRES-75960629

IEEE consistently is the least accurate floating-point representation in numerical calculations (64-bit types)

1E-19

1E-18

1E-17

1E-16

1E-15

1E-14

1E-13

1E-12

1E-11

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

dens

ity R

MS

erro

r

time

IEEE binary(11) LNS(11) gamma posit(1)

posit(2) posit(3) delta(0) delta(1) omega(3)

LLNL-PRES-75960630

Nonlinear fraction maps sometimes improve accuracy substantially (32-bit types)

1E-09

1E-08

1E-07

1E-06

1E-05

1E-04

1E-03

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

dens

ity R

MS

erro

r

time

IEEE posit(1) posreclin(1) poslinrec(1) posexp(1)

LLNL-PRES-75960631

§ Many numerical computations involve continuous fields— E.g. partial differential equations over ℝ2, ℝ3

— Nearby scalars on Cartesian grids have similar values• In statistical speak, fields exhibit autocorrelation• Such redundancy is not exploited by any of the representations discussed so far

Beyond IEEE and posit representations of ℝ:Compressed representations of ℝn

0

1

2

3

0 64 128 192 256 320 384

func

tion

valu

e

grid point

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0 8 16 24 32 40 48 56 64

stor

age

precision

entropy uncompressed

LLNL-PRES-75960632

(64-bit) floating-point data does not compress well losslessly

1.11 1.07 1.05 1.04 1.03 1.02 1.01

[Lin

dstr

om '0

6]

[Alte

d '0

9]

[Bur

tsch

er '0

7]

[Bur

tsch

er '1

6]

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

fpzip BLOSC FPC gzip szip SPDP bzip2

com

pres

sion

ratio

(unc

omp.

/com

p. s

ize)

LLNL-PRES-75960633

§ Large improvements in compression arepossible by allowing even small errors— Simulation often computes on meaningless bits

• Round-off, truncation, iteration, model errors abound• Last few floating-point bits are effectively random noise

§ Still, lossy compression often makes scientists nervous— Even though lossy data reduction is ubiquitous

• Decimation in space and/or time (e.g. store every 100 time steps)• Averaging (hourly vs. daily vs. monthly averages)• Truncation to single precision (e.g. for history files)

— State-of-the-art compressors support error tolerances

Lossy compression enables greater reduction, but is often met with skepticism by scientists

0.27 bits/value 240x compression

LLNL-PRES-75960634

Can lossy-compressed in-memory storage of numerical simulation state be tolerated?

Simulation Outer Loop

Update elementsUpdate nodes

Update elements

Compute time constraints

Decompress block

Element kernel(s)

Compress block

Our Approach

Decompress all state

Update nodes

Update elements

Compute time constraints

Compressall state

LLNL-PRES-75960635

High-order Eulerian hydrodynamics• QoI: Rayleigh-Taylor mixing layer thickness• 10,000 time steps• At 4x compression, relative error < 0.2%

Laser-plasma multi-physics• QoI: backscattered laser energy• At 4x compression, relative error < 0.1%

Lagrangian shock hydrodynamics• QoI: radial shock position• 25 state variables compressed over 2,100 time steps• At 4x compression, relative error < 0.06%

Using lossy FPZIP to store simulation state compressed, we have shown that 4x lossy compression can be tolerated

20 bits/value uncompressed16 bits/value

Lossy compression of state is viable, but streaming compression increases data movement—need inline compression

LLNL-PRES-75960636

§ Inspired by ideas from h/w texture compression— d-dimensional array divided into blocks of 4d values— Each block is independently (de)compressed

• e.g. to a user-specified number of bits or quality— Fixed-size blocks Þ random read/write access— (De)compression is done inline, on demand— Write-back cache of uncompressed blocks limits data loss

§ Compressed arrays via C++ operator overloading— Can be dropped into existing code by changing type declarations— double a[n] Û std::vector<double> a(n) Û zfp::array<double> a(n, precision)

Lindstrom, “Fixed-rate compressed floating-point arrays,” IEEE Transactions on Visualization and Computer Graphics, 2014

ZFP is an inline compressor for floating-point arrays

LLNL-PRES-75960637

ZFP’S C++ compressed arrays can replace STL vectors and C

arrays with minimal code changes

// example using STL vectors

std::vector<double> u(nx * ny, 0.0);

u[x0 + nx*y0] = 1;for (double t = 0; t < tfinal; t += dt) {

std::vector<double> du(nx * ny, 0.0);

for (int y = 1; y < ny - 1; y++)for (int x = 1; x < nx - 1; x++) {

double uxx = (u[(x-1)+nx*y] - 2*u[x+nx*y] + u[(x+1)+nx*y]) / dxx;double uyy = (u[x+nx*(y-1)] - 2*u[x+nx*y] + u[x+nx*(y+1)]) / dyy;du[x + nx*y] = k * dt * (uxx + uyy);

}for (int i = 0; i < u.size(); i++)

u[i] += du[i];}

// example using ZFP arrays

zfp::array2<double> u(nx, ny, bits_per_value);

u(x0, y0) = 1;for (double t = 0; t < tfinal; t += dt) {

zfp::array2<double> du(nx, ny, bits_per_value);

for (int y = 1; y < ny - 1; y++)for (int x = 1; x < nx - 1; x++) {

double uxx = (u(x-1, y) - 2*u(x, y) + u(x+1, y)) / dxx;double uyy = (u(x, y-1) - 2*u(x, y) + u(x, y+1)) / dyy;du(x, y) = k * dt * (uxx + uyy);

}for (int i = 0; i < u.size(); i++)

u[i] += du[i];}

required changes

optional changes for improved readability

LLNL-PRES-75960638

ZFP arrays limit data loss via a small write-back cache

virtual array compressed blocks

dirtybit uncompressed blocks

software cache

blockindex

application

LLNL-PRES-75960639

The ZFP compressor is comprised of three distinct components

39

Raw floating-point array

Block floating-point transform

Orthogonalblock transform

Embeddedcoding

Compressedbit stream

§ Align values in a 4d block to a common largest exponent

§ Transmit exponent verbatim

§ Lifted, separable transform using integer adds and shifts

§ Similar to but faster and more effective than JPEG DCT

§ Encode one bit plane at a time from MSB using group testing

§ Each bit increases quality—can truncate stream anywhere

LLNL-PRES-75960640

6-bit ZFP gives one more digit of accuracy than 32-bit IEEE in2nd, 3rd derivative computations (Laplacian and highlight lines)

IEEE half (16 bits) POSIT(1) (16 bits) IEEE float (32 bits)

ZFP (6 bits) IEEE double (64 bits) IEEE double (64 bits)

LLNL-PRES-75960641

ZFP improves accuracy in finite difference computations using less precision than IEEE

LLNL-PRES-75960642

ZFP variable-rate arrays adapt storage spatially and allocate bits to regions where they are most needed

16 bits/value8 bits/value 32 bits/value 64 bits/value

LLNL-PRES-75960643

ZFP adaptive arrays improve accuracy in PDE solution over IEEEby 6 orders of magnitude using less storage

LLNL-PRES-75960644

ZFP adaptive arrays improve accuracy in PDE solution over IEEEby 6 orders of magnitude using less storage

1E-15

1E-14

1E-13

1E-12

1E-11

1E-10

1E-09

1E-08

1E-07

1E-06

1E-05

1E-04

1E-03

1E-02

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190

RMS

erro

r

time

IEEE (32-bit) posit1 (32-bit) zfp (32-bit) arc (28-bit)

LLNL-PRES-75960645

§ Emerging alternatives to IEEE floats show promising improvements in accuracy— Tapered precision allows for better balance between range and precision— New types have many desirable properties

• No IEEE warts: subnormals, excess of NaNs, signed zeros, overflow, exponent size dichotomy, …• Features: nesting, monotonicity, reciprocal closure, universality, asymptotic optimality, …

§ NUMREP framework allows for experimentation with new number types in apps— Modular design supports plug-and-play of independent concepts— Nonlinear maps are expensive but worth investigating— POSITS are often among the best performing types; IEEE among the worst

§ NextGen scalar representations still leave considerable room for improvement— ZFP fixed-rate compressed arrays consistently outperform IEEE and POSITS— ZFP variable-rate arrays optimize bit distribution and and make very bit count

Conclusions: Novel scalar representations are driving research into new vector/matrix/tensor representations

LLNL-PRES-75960646

ZFP is available as open source on GitHub: https://github.com/LLNL/zfp

https://github.com/LLNL/zfp

DisclaimerThis document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwisedoes not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

LLNL-PRES-75960648

ZFP provides >100x compression with imperceptible loss in visual quality

LLNL-PRES-75960649


LLNL-PRES-75960650


LLNL-PRES-75960651

ZFP shows no artifacts in derivative computations(velocity divergence)

LLNL-PRES-75960652


LLNL-PRES-75960653


LLNL-PRES-75960654

Parallel ZFP achieves up to 150 GB/s throughput

0

20

40

60

80

100

120

140

160

180

1 2 4 8 16 32

thro

ughp

ut (G

B/s)

compressed storage size (bits/value)

ZFP CUDA compression speed on NVIDIA V100

Alternatives to IEEE: NextGenNumber Formats for Scientific...

Documents

Transcript of Alternatives to IEEE: NextGenNumber Formats for Scientific...