Ryo Maezono Japan Advanced Institute of Science and Technology, Kanazawa, Japan. TTI Tuscany, Italy....

Ryo Maezono

Japan Advanced Institute of Science and Technology,

Kanazawa, Japan.

TTI 2008@ Tuscany, Italy.

Some New Random Number Generators

Tested on CASINO


(Multi Recursive Generator)

Collaborators…

Prof. Ken-ichi Miura (National Institute of Informatics)

Dr. Kenta Hongo



(Maezono group)

Summary

“MRG8”


- A new Random Number Generator (RNG) developed by L’Ecuyer and Miura.

- Much simpler (only 33 lines!) than recent fancy RNGs. Theoretically clear as well.

It shows good performance...

- Seems equivalent to Ranlux4, at less cost than Ranlux3.

Random Number Generator (RNG)

- A kernel component of QMC.

- What is the point for us?
  Auto-correlation length and bias, mostly.

- No “serious” RNG is required, such as
  - Physical RNG (based on thermal noise)
  - Cryptographic RNG (based on thermal noise)

... These matter only for cryptography, gambling, etc.

© Ryo Maezono, Kenta Hongo 2008

Points for us

- Auto-correlation length
  A better RNG has less correlation. → Shorter blocking length. → Effectively costless accumulation.

- Bias
  A worse RNG has a less homogeneous distribution (sparse lattice structure) in the sampling space (3N-dim. in our simulation). → Sampling would be biased. → Biased results.
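The link between correlation and blocking length can be illustrated with a small reblocking sketch. This assumes that "blocking length" on these slides refers to the standard reblocking analysis, in which the error bar is re-estimated from block averages until it stops growing. The AR(1) series below is synthetic test data standing in for QMC energy samples, and the names `blocked_error` and `correlated_series` are hypothetical.

```python
import math
import random

def blocked_error(samples, block_len):
    """Standard error of the mean, estimated from averages over blocks of `block_len`."""
    n_blocks = len(samples) // block_len
    means = [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    grand_mean = sum(means) / n_blocks
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(variance / n_blocks)

def correlated_series(n, rho, seed=0):
    """Synthetic AR(1) series x[i] = rho * x[i-1] + noise (an assumption, not QMC data)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

series = correlated_series(n=1 << 14, rho=0.9)
errors = {b: blocked_error(series, b) for b in (1, 4, 16, 64, 256)}
for block_len, err in errors.items():
    # The estimate grows with block length until blocks are longer than the correlation.
    print(block_len, round(err, 4))
```

The block length at which the error estimate levels off plays the role of the blocking length: the more correlated the samples, the longer the blocks must be before the error bar can be trusted.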

Representative RNG

- Linear Congruential Method (pseudo RNG)

  $X_n = a_1 X_{n-1} \bmod p$

  :-) Simple, costless and fast.

  :-( Sparse hyperplanes → biased sampling
      1) Recursive generation is just 1st order.
      2) ... because of the further reason of a bad choice of $a_1 = 16807$
         Famous story about “RANDU” on IBM360/370.
         (N.B., not a genuine drawback of the congruential method in general)

- Feedback Shift Register Method
  ... using ‘⊕’ with more than two $\{X_{n-k}\}$

- Generalization for Higher Order
  ... using ‘+’ with more than two $\{X_{n-k}\}$
  (Topics of the study here)
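The RANDU story can be checked numerically. The sketch below uses RANDU's parameters as they are usually quoted in the general literature (multiplier 65539, modulus 2^31); these values are not given on the slides. It verifies that every three consecutive outputs satisfy one fixed linear relation, which is what confines the points to a few parallel planes.

```python
# RANDU: x_{n+1} = 65539 * x_n mod 2^31
# (parameters taken from the general literature, not from the slides)
A_RANDU = 65539
M_RANDU = 2**31

def randu(seed, count):
    """Return `count` successive RANDU integers generated from `seed` (seed should be odd)."""
    values = []
    x = seed
    for _ in range(count):
        x = (A_RANDU * x) % M_RANDU
        values.append(x)
    return values

xs = randu(seed=1, count=1000)

# Since 65539 = 2^16 + 3, its square is congruent to 6 * 65539 - 9 (mod 2^31),
# so x[n+2] - 6 * x[n+1] + 9 * x[n] = 0 (mod 2^31) for every n.
violations = sum(
    (xs[n + 2] - 6 * xs[n + 1] + 9 * xs[n]) % M_RANDU != 0
    for n in range(len(xs) - 2)
)
print(violations)
```

A count of zero violations means every triple lies on the same family of planes: a first-order recurrence combined with an unlucky multiplier.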

Further developments (1): on the Feedback Shift Register Method

(Not the target of the study here, brief overview)

... using ‘⊕’ with more than two $\{X_{n-k}\}$

- Long-period sequence with practically easy implementation. (Generalized Feedback Shift Register, GFSR)

- In practical situations, however, only part of it is used, and a wrong “period” appears...

- Further improved GFSRs include “Mersenne Twister” (Matsumoto & Nishimura, 1996).

Further developments (2): on higher-order congruential-based methods

... using ‘+’ with more than two $\{X_{n-k}\}$

Include...

- “Fibonacci-based”

- “Subtract-with-Borrow” and its relatives
  Further ‘tuned-up’ → “Ranlux” (implemented in CASINO)

- “Multiple Recursive Generator (MRG)” (Main target of the study here)

  $X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

  has been known as a simple/powerful RNG, but...
  how to choose the coefficients $\{a_k\}$? (Knuth’s criteria)
  This had been the practical obstacle.

Desired properties of RNG

“Multiple Recursive Generator” has several desired properties:

- Costless and fast.

- Easy to accelerate (vector/parallel, discussed later).

- Theoretically simple.

N.B.) “Mersenne Twister” is said to be ‘good’ as well.

N.B.) Non-linear congruential methods have no firm theoretical background.


Choice of coefficients

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

“Knuth’s criteria” ⇔ the characteristic polynomial

$f(z) = z^k - (a_1 z^{k-1} + \cdots + a_{k-1} z + a_k)$

is a primitive polynomial on GF(p) (Galois field).

Choose $\{a_k\}$ so that $f(z)$ can be factorized (mod p) in the proper way.


Choosing coeffs $\{a_k\}$

... so that it has the longest possible period

$P = p^k - 1$

Random search for $\{a_k\}$ (Pierre L’Ecuyer @ Univ. Montreal)

~ Factorization of $r = \dfrac{p^k - 1}{p - 1}$

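The key point is ...

- Numerical computation on a computer is, so to speak, an operation on the elements of a finite field.

- Pseudo random number generation means realizing, among the recurrence relations that generate element-to-element mappings on a finite field, one whose period is enormously long.

- Hence its essence is determined by the algebraic structure of the given finite field.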

MRG8

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

A good choice obtained for k = 8 and $p = 2^{31} - 1$:

a1 = 1089656042   a2 = 1906537547   a3 = 1764115693   a4 = 1304127872
a5 = 189748160    a6 = 1984088114   a7 = 626062218    a8 = 1927846343

(Found/chosen by P. L’Ecuyer @ Univ. Montreal)

Period: $P = (2^{31} - 1)^8 - 1 \approx 4.5 \times 10^{74}$

An RNG named “MRG8” (implemented/tested by Prof. Miura). Only 33 lines!
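A minimal sketch of the recurrence with the coefficients listed above. This is not the 33-line implementation by Prof. Miura; the function name `mrg8_ints`, the state layout (newest value first) and the seed values are assumptions made for illustration only.

```python
P = 2**31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)

def mrg8_ints(state, count):
    """Generate `count` integers in [0, P-1].

    `state` holds the 8 previous values, newest first: [X_{n-1}, ..., X_{n-8}].
    It must not be all zeros, otherwise the sequence stays at zero.
    """
    state = list(state)
    out = []
    for _ in range(count):
        # X_n = (a_1 X_{n-1} + ... + a_8 X_{n-8}) mod p
        x = sum(a * s for a, s in zip(COEFFS, state)) % P
        state = [x] + state[:-1]   # shift the window by one
        out.append(x)
    return out

# Hypothetical seed; any non-zero state of 8 values below P would do.
ints = mrg8_ints([1, 0, 0, 0, 0, 0, 0, 0], 5)
uniforms = [x / P for x in ints]   # map to [0, 1)
print(ints[0], uniforms[0])
```

Python integers do not overflow, so the sum is reduced with a plain `%` here; a fixed-width implementation would instead reduce each product as it goes.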

Point (1)

Quality of an RNG depends critically on the choice of coefficients.

a1 = 1089656042   a2 = 1906537547   a3 = 1764115693   a4 = 1304127872
a5 = 189748160    a6 = 1984088114   a7 = 626062218    a8 = 1927846343

L’Ecuyer’s group carefully chose/tested them.

Again... (sparse lattice structure)

- Recursive generation is just 1st order.

- Bad choice of coefficients.
  Famous horror story about “RANDU” on IBM360/370.

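Point (2)

The choice of $p = 2^{31} - 1$ makes the implementation of

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

quite simple/fast on 64-bit architecture (no dividing operation required):

$z \bmod (2^{31} - 1) = \mathrm{shift}(z, -31) + \mathrm{and}(z, 2^{31} - 1)$

Because... (binary description on 64-bit architecture)

$z = (Z_{63} \cdots Z_{32} Z_{31} \cdots Z_2 Z_1)_2 =: z_1 \cdot 2^{31} + z_2$

$z_1 = (0 \cdots 0\, Z_{63} \cdots Z_{32})_2 = \mathrm{shift}(z, -31)$

$z_2 = (0 \cdots 0\, Z_{31} \cdots Z_2 Z_1)_2 = (Z_{63} \cdots Z_{32} Z_{31} \cdots Z_2 Z_1)_2 \ \mathrm{.and.}\ (0 \cdots 0\, 1 \cdots 1)_2 = \mathrm{and}(z, 2^{31} - 1)$

then,

$z = z_1 \cdot 2^{31} - z_1 + z_1 + z_2 = z_1 (2^{31} - 1) + (z_1 + z_2)$

∴ $z \equiv z_1 + z_2 \pmod{2^{31} - 1}$

(If $z_1 + z_2$ is not yet below $2^{31} - 1$, the same fold is applied again.)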
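The shift-and-mask reduction can be sketched as follows. The function name `mod_mersenne31` is hypothetical, and the second fold plus the final conditional subtraction are additions needed so that the result is fully reduced for any 64-bit input; the slides show only the single fold.

```python
MASK31 = (1 << 31) - 1   # 2^31 - 1, the modulus p

def mod_mersenne31(z):
    """Reduce 0 <= z < 2^64 modulo 2^31 - 1 without a division."""
    # First fold: z1 + z2, with z1 = shift(z, -31) and z2 = and(z, 2^31 - 1).
    z = (z >> 31) + (z & MASK31)
    # Second fold: the sum above can still exceed 31 bits.
    z = (z >> 31) + (z & MASK31)
    # Now z <= 2^31 + 3, so at most one subtraction of p remains.
    if z >= MASK31:
        z -= MASK31
    return z

samples = [0, 1, MASK31 - 1, MASK31, MASK31 + 1, 2**62 + 12345, 2**64 - 1]
print(all(mod_mersenne31(z) == z % MASK31 for z in samples))
```

In a language with fixed 64-bit integers the same three steps apply, as long as the product being reduced fits into 64 bits.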

Statistical Tests


(To be completed)

Done by Miura or L’Ecuyer…

QMC Test (combined with CASINO-v1.8)

[Figure: SO2 (VMC), tests using G1 set. Ground-state energy (a.u.), axis from -548.209 to -548.205, for MRG8 and RANLUX-0 to RANLUX-4, each point labelled with its blocking length in parentheses. # of steps = 300 million.]

Notes on Ranlux

“Tuned-up” version of the SwB algorithm (Martin Lüscher, 1993)

- Plucking of the sequence to reduce auto-correlation:
  “RANLUX-0” = “Subtract with Borrow”, “RANLUX-1”, “RANLUX-2”, “RANLUX-3”, ...
  More plucking, better performance in the spectrum test.

- N.B.) “RANLUX-0” = “Subtract with Borrow”
  ~ “1st-order linear congruential method” (an effective implementation with a very large prime number) (Tezuka & L’Ecuyer, 1992)

- Consistent with D.P. Landau’s work by Monte Carlo.

[Figure repeated: SO2 (VMC), tests using G1 set. Ground-state energy (a.u.) for MRG8 and RANLUX-0 to RANLUX-4, with blocking lengths in parentheses. # of steps = 300 million.]

... can be viewed as a 1st-order linear congruential RNG as well as the ‘Subtract with Borrow’ RNG.

SO2 (DMC)

[Figure: SO2 (DMC). Ground-state energy (a.u.), axis from -548.566 to -548.560, for MRG8 and RANLUX-0 to RANLUX-4, each point labelled with its blocking length in parentheses. # of steps = 40,000, # of config. = 10,000.]

Source of Different Bias in DMC/VMC

1) Generating random walk (VMC/DMC)

2) Metropolis reject/accept (VMC/DMC)

3) Branching (DMC)


Timing Info

SO2, DMC. # of step = 1,000, # of config. = 10,000. Times in seconds.

            User       System     Total CPU   % CPU
MRG8        176.0448   189.3554   365.6764    0.42
RANLUX-0    101.2084   190.2169   290.9649    0.33
RANLUX-1    127.7751   192.9557   321.1352    0.36
RANLUX-2    176.5535   183.0453   359.7706    0.42
RANLUX-3    304.1154   182.4181   486.7478    0.57
RANLUX-4    470.0464   182.1101   652.6842    0.76

He & PH2 (VMC)

[Figure: two panels, PH2 and He, with # of steps = 150 million and 1000 million as listed on the slide. Energy axes from -342.240 to -342.235 (PH2) and from -2.903692 to -2.903686 (He), for MRG8 and RANLUX-0 to RANLUX-4, each point labelled with its blocking length in parentheses.]

He & PH2 (DMC)

[Figure: two panels, PH2 and He. Energy axes from -342.478 to -342.473 (PH2) and from -2.90374 to -2.90371 (He), for MRG8 and RANLUX-0 to RANLUX-4, each point labelled with its blocking length in parentheses. Panel annotations: # of steps = 50,000, # of config. = 10,000; # of steps = 30,000, # of config. = 10,000.]

System dependence

- (Bias of results) ~ (Homogeneity of RNG)
  gets worse in higher dimensions of the sampling space (3N-dim. in our case).
  Systems with a larger # of electrons are interesting.

- Sampling space has nodal structure.
  “All-electron systems” and “pseudopotential systems” differ in character.
  Should examine both.

SiH4 (VMC/DMC, pseudopotential calc.)

[Figure: two panels, VMC (# of steps = 10 million) and DMC (# of steps = 30,000, # of config. = 10,000). Energy axes labelled E(SiH4...), spanning -6.28984 to -6.28928 and -6.3085 to -6.3055, for MRG8 and RANLUX-0 to RANLUX-4, each point labelled with its blocking length in parentheses.]

QMC Test (combined with CASINO-v1.8)

“A Research Background”

- RNG research people are interested in it.

- c.f.) D.P. Landau’s test of RNG by QMC (Ising model):
  a better RNG in spectrum tests does not always give better performance in applications.

- RNG people have started to consider the “harmony of RNG” depending on applications.

Discussions

1) MRG8 always gives the same answer as Ranlux4.

2) MRG8 costs less than Ranlux3.

3) MRG8 gives no significant improvement in blocking length.

4) Difference of bias appeared in DMC/VMC. Possible sources:
   1) Generating random walk (VMC/DMC)
   2) Metropolis reject/accept (VMC/DMC)
   3) Branching (DMC)

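Acceleration of RNG

Generating sequence

$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$

can be written as

$$
\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix}
=
\begin{pmatrix}
a_1 & a_2 & \cdots & a_{k-1} & a_k \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix}
\bmod p
$$

where the $k \times k$ matrix is denoted $A$.

i.e., for k = 8,

$(X_1, \ldots, X_8) \to X_9$, $(X_2, \ldots, X_9) \to X_{10}$, $(X_3, \ldots, X_{10}) \to X_{11}$, ...

(Normal sequence)

$(X_8, X_7, \ldots, X_1)^T \to A \to X_9 \to A \to X_{10} \to \cdots$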

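Acceleration (cont’d)

With $A$ the $k \times k$ matrix of the recurrence (first row $a_1, \ldots, a_k$; ones on the subdiagonal) and all products taken mod p:

$$
\begin{pmatrix} X_{10} \\ X_9 \\ \vdots \\ X_3 \end{pmatrix}
= A \begin{pmatrix} X_9 \\ X_8 \\ \vdots \\ X_2 \end{pmatrix}
= A^2 \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix}
$$

and in general

$$
\begin{pmatrix} X_{8+n} \\ X_{8+n-1} \\ \vdots \\ X_{1+n} \end{pmatrix}
= A^n \begin{pmatrix} X_8 \\ X_7 \\ \vdots \\ X_1 \end{pmatrix}
$$

Having evaluated $A^n$ in advance...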

Acceleration (cont’d)

Evaluation of $A^n$ can be made faster by ...

- Contemporary compiler (vectorization)
- Intra-node parallelization (“thread parallel”)

Then...

(Normal ‘serial’ generation)

$(X_8, X_7, \ldots, X_1)^T \to A \to X_9 \to A \to X_{10} \to A \to X_{11} \to A \to X_{12} \to \cdots$

(Accelerated generation)

$(X_8, X_7, \ldots, X_1)^T \to \{A, A^2, A^3, A^4\} \to X_9, X_{10}, X_{11}, X_{12} \to \{A, A^2, A^3, A^4\} \to X_{13}, X_{14}, X_{15}, X_{16} \to \{A, A^2, A^3, A^4\} \to X_{17}, X_{18}, X_{19}, X_{20} \to \cdots$
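The accelerated scheme can be sketched in plain Python as below. It precomputes A, A^2, A^3, A^4 once and then produces four outputs per step from the same state vector, so the four matrix-vector products are independent and could be vectorized or run in threads. The function names, the block width of 4 and the seed are assumptions; the coefficients are those of MRG8.

```python
P = 2**31 - 1
COEFFS = [1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343]

def companion(coeffs):
    """Matrix A: first row a_1..a_k, ones on the subdiagonal."""
    k = len(coeffs)
    rows = [list(coeffs)]
    for i in range(k - 1):
        rows.append([1 if j == i else 0 for j in range(k)])
    return rows

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) % P
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) % P for row in a]

def serial(state, count):
    """Normal generation: one recurrence step per output. State is newest first."""
    state, out = list(state), []
    for _ in range(count):
        x = sum(a * s for a, s in zip(COEFFS, state)) % P
        state = [x] + state[:-1]
        out.append(x)
    return out

def blocked(state, count, width=4):
    """Accelerated generation: `width` outputs per step via precomputed powers of A."""
    a = companion(COEFFS)
    powers = [a]
    for _ in range(width - 1):
        powers.append(matmul(powers[-1], a))
    state, out = list(state), []
    while len(out) < count:
        # These products do not depend on each other.
        new_states = [matvec(m, state) for m in powers]
        out.extend(s[0] for s in new_states)   # first entry of A^j * state is X_{n+j}
        state = new_states[-1]                 # advance the state by `width` steps
    return out[:count]

seed = [8, 7, 6, 5, 4, 3, 2, 1]   # hypothetical (X_8, ..., X_1)
print(serial(seed, 20) == blocked(seed, 20))
```

Each blocked step costs more arithmetic than a serial step, so the gain depends on how well the independent products map onto vector units or threads.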
