Ryo Maezono
Japan Advanced Institute of Science and Technology,
Kanazawa, Japan.
TTI 2008@ Tuscany, Italy.
Some New Random Number Generators
Tested on CASINO
(Multiple Recursive Generator)
Collaborators…
Prof. Ken-ichi Miura (National Institute of Informatics)
Dr. Kenta Hongo (Maezono group)
Summary
“MRG8”
- A new Random Number Generator (RNG) developed by L’Ecuyer and Miura.
- Much simpler (only 33 lines!) than recent fancy RNGs. Theoretically clear as well.
- It shows good performance: results seem equivalent to Ranlux4, at less cost than Ranlux3.
Random Number Generator (RNG)
- A kernel component of QMC.
- What is the point for us?
  - Auto-correlation length and bias, mostly.
- No “serious” RNG is required, such as
  - Physical RNG (based on thermal noise)
  - Cryptographic RNG
  ... These matter only for cryptography, gambling, etc.
© Ryo Maezono, Kenta Hongo 2008
Points for us
- Auto-correlation length
  A better RNG has less correlation. → Shorter blocking length. → Effectively costless accumulation.
- Bias
  A worse RNG has a less homogeneous distribution (sparse lattice structure) in the sampling space (3N-dim in our simulation). → Sampling would be biased. → Biased results.
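To make “blocking length” concrete, here is a minimal sketch of a blocking (reblocking) error estimate. It is not taken from CASINO: the function names and the AR(1) test series are assumptions made for illustration, and the correlation is put in by hand rather than coming from any particular RNG.

```python
import math
import random

def blocked_error(samples, block_len):
    """Standard error of the mean, estimated from averages over blocks."""
    n_blocks = len(samples) // block_len
    blocks = [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    mean = sum(blocks) / n_blocks
    var = sum((b - mean) ** 2 for b in blocks) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)

def ar1_series(n, phi, seed=0):
    """Hypothetical correlated data: x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

data = ar1_series(2 ** 14, phi=0.9)
errors = {b: blocked_error(data, b) for b in (1, 4, 16, 64, 256)}
for b, err in errors.items():
    print(f"block length {b:4d}: error estimate {err:.4f}")
```

The error estimate grows with the block length until the blocks are longer than the correlation length, then levels off. The block length where it levels off is the “blocking length”; less correlated samples reach the plateau sooner.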
Representative RNG
- Linear Congruential Method (pseudo RNG)
  $X_n = a_1 X_{n-1} \bmod p$
  :-) Simple, costless and fast.
  :-( Sparse hyperplanes → Biased sampling, because
    1) the recursive generation is just 1st order;
    2) ... a further reason: a bad choice of the multiplier, $a_1 = 16807$. Famous story about “RANDU” on IBM360/370 (RANDU itself used the multiplier 65539 with modulus $2^{31}$).
       (N.B., not a genuine drawback of the congruential method in general)
- Feedback Shift Register Method
  ... using ‘⊕’ with more than two $\{X_{n-k}\}$
- Generalization for Higher Order
  ... using ‘+’ with more than two $\{X_{n-k}\}$
  (Topic of the study here)
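The sparse-hyperplane problem can be checked directly for RANDU. The sketch below is an illustration, not code from the talk: it uses RANDU's parameters (multiplier 65539, modulus 2^31), an arbitrary odd seed, and a helper name `lcg` chosen here.

```python
from itertools import islice

def lcg(a, m, seed):
    """Yield X_n = a * X_{n-1} mod m, starting from X_0 = seed."""
    x = seed
    while True:
        x = (a * x) % m
        yield x

# RANDU parameters; the seed is arbitrary (it only has to be odd).
xs = list(islice(lcg(65539, 2 ** 31, seed=1), 1000))

# Since 65539**2 = 6 * 65539 - 9 (mod 2**31), every consecutive triple obeys
# x_{k+2} = 6 x_{k+1} - 9 x_k (mod 2**31): the 3-dim points lie on few planes.
on_planes = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2 ** 31 == 0
    for k in range(len(xs) - 2)
)
print(on_planes)  # prints True
```

One fixed linear relation holds for every triple, which is exactly the sparse lattice structure that biases sampling in higher dimensions.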
Further developments (1)
on the Feedback Shift Register Method
... using ‘⊕’ with more than two $\{X_{n-k}\}$
(Not the target of the study here; brief overview)
- Long-period sequence with practically easy implementation (Generalized Feedback Shift Register, GFSR).
- Further improved GFSRs include “Mersenne Twister” (Matsumoto & Nishimura, 1996).
- In practical situations, however, only part of the sequence is used. Wrong “period” appears...
Further developments (2)
on higher-order congruential-based methods
... using ‘+’ with more than two $\{X_{n-k}\}$
Include...
- “Fibonacci-based”
- “Subtract-with-Borrow” and its relatives
  Further ‘tuned-up’ → “Ranlux” (implemented in CASINO)
- “Multiple Recursive Generator (MRG)” (Main target of the study here)
  $X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$
  has been known as a simple/powerful RNG, but...
  How to choose the coefficients $\{a_k\}$? (Knuth’s criteria) It had been the practical obstacle.
Desired properties of RNG
“Multiple Recursive Generator” has several desired properties:
- Costless and fast.
- Easy to accelerate (vector/parallel, discussed later).
- Theoretically simple.
N.B.) “Mersenne Twister” is said to be ‘good’ as well.
N.B.) Non-linear congruential methods have no firm theoretical background.
Choice of coefficients
$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$
“Knuth’s criteria”
⇔ The characteristic polynomial
$f(z) = z^k - (a_1 z^{k-1} + \cdots + a_{k-1} z + a_k)$
is a primitive polynomial on GF($p$) (Galois field).
Choose $\{a_k\}$ so that $f(z)$ can be factorized (mod $p$) in a proper way.
Choosing coeffs $\{a_k\}$
so that it has the longest possible period
$P = p^k - 1$
Random search for $\{a_k\}$ (Pierre L’Ecuyer @ Univ. Montreal)
~ Factorization of
$r = (p^k - 1)/(p - 1)$
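For toy sizes the longest-period condition can be checked by brute force instead of factorization. The sketch below is only an illustration of $P = p^k - 1$: the sizes $p = 5$, $k = 2$ and the helper name `cycle_length` are assumptions made here, and this is not the search procedure used for MRG8.

```python
from itertools import product

def cycle_length(coeffs, p):
    """Length of the cycle reached by X_n = sum(a_i * X_{n-i}) mod p."""
    k = len(coeffs)
    state = (1,) + (0,) * (k - 1)   # (X_{n-1}, ..., X_{n-k}), a nonzero start
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        x = sum(a * s for a, s in zip(coeffs, state)) % p
        state = (x,) + state[:-1]
        step += 1
    return step - seen[state]

p, k = 5, 2                          # toy sizes chosen for illustration
full = [c for c in product(range(p), repeat=k)
        if cycle_length(c, p) == p ** k - 1]
print(full)
```

Only a few coefficient sets reach the full period $p^k - 1 = 24$; these are the ones whose characteristic polynomial is primitive on GF(5). For $p = 2^{31} - 1$ and $k = 8$ such enumeration is impossible, which is why the factorization-based test matters.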
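The key point is ...
- Numerical computation on a computer is, so to speak, manipulation of elements of a finite field.
- Pseudo-random number generation means realizing, among the recurrence relations that generate maps from elements to elements of a finite field, one with an enormously long period.
- Its essence is therefore determined by the algebraic structure of the given finite field.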
MRG8
$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$
A good choice was obtained for $k = 8$ and $p = 2^{31} - 1$
(Found/Chosen by P. L’Ecuyer @ Univ. Montreal):
a1 = 1089656042   a2 = 1906537547   a3 = 1764115693   a4 = 1304127872
a5 = 189748160    a6 = 1984088114   a7 = 626062218    a8 = 1927846343
An RNG named “MRG8” (implemented/tested by Prof. Miura)
$P = (2^{31} - 1)^8 - 1 \approx 4.5 \times 10^{74}$
Only 33 lines!
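As a minimal sketch of the recurrence with these coefficients (not Prof. Miura's 33-line implementation): the generator name `mrg8`, the state ordering, the seed values and the scaling to [0, 1) are all assumptions made here for illustration.

```python
P = 2 ** 31 - 1
A = (1089656042, 1906537547, 1764115693, 1304127872,
     189748160, 1984088114, 626062218, 1927846343)

def mrg8(state):
    """Yield X_n forever; `state` holds (X_{n-1}, ..., X_{n-8})."""
    state = list(state)
    while True:
        x = sum(a * s for a, s in zip(A, state)) % P
        state = [x] + state[:-1]     # newest value first, oldest dropped
        yield x

# Hypothetical seed: any eight values in [0, P) that are not all zero.
seed = (1, 2, 3, 4, 5, 6, 7, 8)
gen = mrg8(seed)
uniforms = [next(gen) / P for _ in range(5)]
print(uniforms)
```

Python integers do not overflow, so the plain `%` is enough here; a fixed-width implementation has to reduce the products more carefully.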
Point (1)
Quality of an RNG depends critically on the choice of coefficients.
Again... the famous horror about “RANDU” on IBM360/370:
- Recursive generation is just 1st order.
- Bad choice of coefficients.
(Sparse lattice structure)
L’Ecuyer’s group carefully chose and tested the MRG8 coefficients.
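Point (2)
The choice of $p = 2^{31} - 1$ makes the implementation of
$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$
quite simple/fast! (No dividing operation required)
On 64-bit architecture (binary description), write
$z = (Z_{63} \cdots Z_{32} Z_{31} \cdots Z_2 Z_1)_2 =: z_1 \cdot 2^{31} + z_2$
with
$z_1 = (0 \cdots 0\, Z_{63} \cdots Z_{32})_2 = \mathrm{shift}(z, -31)$
$z_2 = (0 \cdots 0\, Z_{31} \cdots Z_2 Z_1)_2 = (Z_{63} \cdots Z_{32} Z_{31} \cdots Z_2 Z_1)_2 \ \mathrm{.and.}\ (0 \cdots 0\, 1 \cdots 1)_2 = \mathrm{and}(z, 2^{31} - 1)$
then,
$z \equiv z_1 + z_2 = \mathrm{shift}(z, -31) + \mathrm{and}(z, 2^{31} - 1) \pmod{2^{31} - 1}$
Because (∵)
$z = z_1 \cdot 2^{31} - z_1 + z_1 + z_2 = z_1 (2^{31} - 1) + (z_1 + z_2)$
If $z_1 + z_2$ is still not below $2^{31} - 1$, the same fold is applied again, followed by at most one subtraction of $p$. No dividing operation is required.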
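The shift-and-mask reduction can be written in a few lines. This is a sketch under the assumption of non-negative integer input; the function name `mod_p` is chosen here, and Python's unbounded integers stand in for the 64-bit words of the slide.

```python
P = (1 << 31) - 1

def mod_p(z):
    """Reduce a non-negative integer modulo 2**31 - 1 without dividing."""
    while z > P:
        z = (z >> 31) + (z & P)   # z1 + z2, congruent to z modulo P
    return 0 if z == P else z

# Compare against the ordinary remainder for a few edge cases.
for z in (0, P - 1, P, P + 1, 2 ** 62, 1089656042 * (P - 1)):
    assert mod_p(z) == z % P
```

Each fold strictly decreases any value above P, so the loop ends after a couple of passes for 64-bit inputs.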
Statistical Tests
(To be completed)
Done by Miura or L’Ecuyer…
QMC Test (combined with CASINO-v1.8)
SO2 (VMC)
Tests using G1 set
[Figure: ground-state energy (a.u.), axis range -548.209 to -548.205, for MRG8 and RANLUX-0 to RANLUX-4; # of steps = 300 million; blocking length given in parentheses for each generator (values between 1 and 16).]
Notes on Ranlux
N.B.) “RANLUX-0” = “Subtract with Borrow” (SwB)
~ “1st order linear congruential method” (an effective implementation with a very large prime number) (Tezuka & L’Ecuyer, 1992)
- “Tuned-up” version of the SwB algorithm (Martin Lüscher, 1993)
- Plucking of the sequence to reduce auto-correlation
  “RANLUX-0” → “RANLUX-1” → “RANLUX-2” → “RANLUX-3”: more plucking, better performance in the Spectrum Test
- Consistent with D.P. Landau’s work by Monte Carlo.
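“Plucking” means passing on only part of the underlying sequence and discarding the rest. A minimal sketch of the idea follows; the wrapper name `pluck` and the keep/discard counts in the example are assumptions for illustration, since the slides do not give RANLUX's actual parameters.

```python
from itertools import islice

def pluck(source, keep, block):
    """From every `block` numbers of `source`, pass on `keep`, discard the rest."""
    it = iter(source)
    while True:
        chunk = list(islice(it, block))
        if not chunk:                # finite source exhausted
            return
        yield from chunk[:keep]

# Toy usage: keep 2 out of every 5 numbers of a stand-in sequence.
print(list(pluck(range(10), keep=2, block=5)))  # prints [0, 1, 5, 6]
```

A larger discarded fraction costs more generator calls per delivered number, which is the trade-off behind the higher RANLUX levels.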
[Figure repeated: SO2 (VMC) ground-state energies for MRG8 and RANLUX-0 to RANLUX-4.]
... can be viewed as a 1st order linear congruential RNG as well as the ‘Subtract with Borrow’ RNG.
SO2 (DMC)
[Figure: ground-state energy (a.u.), axis range -548.566 to -548.560, for MRG8 and RANLUX-0 to RANLUX-4; # of steps = 40,000, # of configs = 10,000; blocking length given in parentheses for each generator (values between 256 and 8192).]
Source of Different Bias in DMC/VMC
1) Generating random walk (VMC/DMC)
2) Metropolis reject/accept (VMC/DMC)
3) Branching (DMC)
Timing Info
SO2, DMC. # of steps = 1,000, # of configs = 10,000. Times in seconds.

            User      System    Total CPU  % CPU
MRG8        176.0448  189.3554  365.6764   0.42
RANLUX-0    101.2084  190.2169  290.9649   0.33
RANLUX-1    127.7751  192.9557  321.1352   0.36
RANLUX-2    176.5535  183.0453  359.7706   0.42
RANLUX-3    304.1154  182.4181  486.7478   0.57
RANLUX-4    470.0464  182.1101  652.6842   0.76
He & PH2 (VMC)
[Figure, two panels (PH2 and He): ground-state energy for MRG8 and RANLUX-0 to RANLUX-4, with the blocking length in parentheses for each generator (values between 1 and 16). PH2 panel: axis range -342.240 to -342.235. He panel: axis range -2.903692 to -2.903686. # of steps = 150 million and 1000 million, in the order listed on the slide.]
He & PH2 (DMC)
[Figure, two panels (PH2 and He): ground-state energy for MRG8 and RANLUX-0 to RANLUX-4, with the blocking length in parentheses for each generator (values between 512 and 4096). PH2 panel: axis range -342.478 to -342.473. He panel: axis range -2.90374 to -2.90371. # of steps = 50,000 and 30,000, in the order listed on the slide; # of configs = 10,000 in both cases.]
System dependence
- (Bias of results) ~ (Homogeneity of RNG)
  gets worse in higher dimensions of the sampling space… (3N-dim in our case)
  Systems with a larger # of electrons are interesting.
- Sampling space has nodal structure.
  “All-electron systems” and “pseudopotential systems” differ in character.
  Should examine both.
SiH4 (VMC/DMC, Pseudo Potential calc.)
[Figure, two panels (VMC and DMC): E(SiH4) (axis label E/H) for MRG8 and RANLUX-0 to RANLUX-4, with the blocking length in parentheses for each generator (values between 16 and 2048). VMC panel: # of steps = 10 million, axis range -6.28984 to -6.28928. DMC panel: # of steps = 30,000, # of configs = 10,000, axis range -6.3085 to -6.3055.]
QMC Test (combined with CASINO-v1.8)
“A Research Background”
- RNG research people are interested in it.
- c.f.) D.P. Landau’s test of RNG by QMC (Ising model)
  An RNG that is better in spectrum tests does not always give better performance in applications.
- RNG people have started to consider the “harmony of RNG”, depending on the application.
Discussions
1) MRG8 always gives the same answer as Ranlux4.
2) MRG8 costs less than Ranlux3.
3) MRG8 gives no significant improvement in blocking length.
4) A difference of bias appears between DMC and VMC. Sources:
   1) Generating random walk (VMC/DMC)
   2) Metropolis reject/accept (VMC/DMC)
   3) Branching (DMC)
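Acceleration of RNG
Generating sequence
$X_n = (a_1 X_{n-1} + a_2 X_{n-2} + \cdots + a_k X_{n-k}) \bmod p$
can be written as
$\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix} = A \begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix} \bmod p, \qquad A := \begin{pmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$
i.e., for $k = 8$,
$(X_1, \ldots, X_8) \to X_9$, $(X_2, \ldots, X_9) \to X_{10}$, $(X_3, \ldots, X_{10}) \to X_{11}$
$(X_8, X_7, \ldots, X_1)^T \to A \to X_9 \to A \to X_{10} \to \cdots$
(Normal sequence)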
Acceleration (cont’d)
With $A$ the $k \times k$ matrix whose first row is $(a_1, a_2, \ldots, a_k)$ and whose subdiagonal entries are 1,
$\begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_{n-k+1} \end{pmatrix} = A \begin{pmatrix} X_{n-1} \\ X_{n-2} \\ \vdots \\ X_{n-k} \end{pmatrix} \bmod p$
so that
$(X_{10}, X_9, \ldots, X_3)^T = A\,(X_9, X_8, \ldots, X_2)^T = A^2\,(X_8, X_7, \ldots, X_1)^T \pmod p$
and in general
$(X_{8+n}, X_{8+n-1}, \ldots, X_{1+n})^T = A^n\,(X_8, X_7, \ldots, X_1)^T \pmod p$
Having evaluated $A^n$ in advance...
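The jump-ahead relation can be checked numerically. The sketch below is an illustration under assumptions: pure-Python lists stand in for an optimized matrix routine, the helper names are chosen here, and the starting state is a hypothetical $(X_8, \ldots, X_1)$. It compares 1000 single steps with one multiplication by $A^{1000}$.

```python
P = 2 ** 31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)
K = len(COEFFS)

def companion():
    """First row holds the coefficients; the subdiagonal shifts the state."""
    rows = [list(COEFFS)]
    for i in range(K - 1):
        rows.append([1 if j == i else 0 for j in range(K)])
    return rows

def matmul(a, b):
    """Matrix product modulo P."""
    return [[sum(a[i][l] * b[l][j] for l in range(K)) % P for j in range(K)]
            for i in range(K)]

def matpow(a, n):
    """A**n modulo P by repeated squaring."""
    result = [[int(i == j) for j in range(K)] for i in range(K)]
    while n:
        if n & 1:
            result = matmul(result, a)
        a = matmul(a, a)
        n >>= 1
    return result

def apply(a, state):
    """Matrix-vector product modulo P."""
    return [sum(r * s for r, s in zip(row, state)) % P for row in a]

state = [8, 7, 6, 5, 4, 3, 2, 1]     # hypothetical (X_8, ..., X_1)
stepped = state
for _ in range(1000):
    stepped = apply(companion(), stepped)
jumped = apply(matpow(companion(), 1000), state)
print(stepped == jumped)  # prints True
```

Repeated squaring needs only about $\log_2 n$ matrix products, so a precomputed $A^n$ lets each thread or vector lane start far along the sequence.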
Acceleration (cont’d)
Evaluation of $A^n$ can be made faster by ...
- Contemporary compilers (vectorization)
- Intra-node parallelization (“thread parallel”)
Then...
(Accelerated generation)
$(X_8, X_7, \ldots, X_1)^T \to \{A, A^2, A^3, A^4\} \to X_9, X_{10}, X_{11}, X_{12} \to \{A, A^2, A^3, A^4\} \to X_{13}, X_{14}, X_{15}, X_{16} \to \{A, A^2, A^3, A^4\} \to X_{17}, X_{18}, X_{19}, X_{20} \to \cdots$
(Normal ‘serial’ generation)
$(X_8, X_7, \ldots, X_1)^T \to A \to X_9 \to A \to X_{10} \to A \to X_{11} \to A \to X_{12} \to \cdots$
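The accelerated scheme can be sketched as follows: the first rows of $A, A^2, A^3, A^4$ are precomputed, and each pass produces four new numbers from the same state by four independent dot products. This is an illustration, not the vectorized implementation from the talk; the helper names, the block width of 4 as a parameter, and the seed are assumptions.

```python
from itertools import islice

P = 2 ** 31 - 1
COEFFS = (1089656042, 1906537547, 1764115693, 1304127872,
          189748160, 1984088114, 626062218, 1927846343)
K = len(COEFFS)

def companion():
    """First row holds the coefficients; the subdiagonal shifts the state."""
    return [list(COEFFS)] + [[int(j == i) for j in range(K)] for i in range(K - 1)]

def matmul(a, b):
    """Matrix product modulo P."""
    return [[sum(a[i][l] * b[l][j] for l in range(K)) % P for j in range(K)]
            for i in range(K)]

def serial(state):
    """Normal generation: one number per step."""
    state = list(state)
    while True:
        x = sum(a * s for a, s in zip(COEFFS, state)) % P
        state = [x] + state[:-1]
        yield x

def blocked(state, width=4):
    """Accelerated generation: `width` numbers per pass (width <= K)."""
    a = companion()
    power, rows = a, []
    for _ in range(width):
        rows.append(power[0])        # first row of A^j yields X_{n+j}
        power = matmul(power, a)
    state = list(state)
    while True:
        # These dot products do not depend on each other.
        new = [sum(r * s for r, s in zip(row, state)) % P for row in rows]
        yield from new
        state = new[::-1] + state[:K - width]

seed = [8, 7, 6, 5, 4, 3, 2, 1]      # hypothetical (X_8, ..., X_1)
same = list(islice(serial(seed), 20)) == list(islice(blocked(seed), 20))
print(same)  # prints True
```

Both generators deliver the same sequence; the blocked form only changes how much independent work is exposed per pass.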