ADMIS_Walker のぶろぐ: 論文の紹介と検証：Marsaglia, G. 2003. Xorshift RNGs.

Abstract of this post

This post will introduce Marsaglia, G. 2003. Xorshift RNGs. [1] which is a paper of Xorshift random number generator. And, in the reproducing process of calculating the hyperparameters reveals that [1] has a few calculation mistakes of hyperparameters. This result means that there is a parameter which enable to generate random number by only 2 calculations of 64 bits xorshifts while [1] said that at least 3 calculations are required.

この投稿では，Xorshift 疑似乱数生成器の論文 Marsaglia, G. 2003. Xorshift RNGs. [1] を紹介する．前提条件として，参考文献 [2] 読了レベルの知識を必要とする．以下，論文の節に沿って適宜翻訳を行いながら，必要な計算の実装と実行結果を示す．また，再現実験および，[追記2] より，[1] のパラメータのいくつかとは異なる計算結果を得た．また，shift 操作 2 回で乱数を生成できるパラメータは存在しないとされていたが[1]，64 bits 時において，存在することを示す計算結果を得た．

Abstract

Description of a class of simple, extremely fast random number generators (RNGs) with periods $2^k-1$ for $k = 32, 64, 96, 128, 160, 192$. These RNGs seem to pass tests of randomness very well.[1]

訳:
非常に高速で簡素な $2^k-1\ (k=32, 64, 96, 128, 160, 192)$ の周期を持つ疑似乱数生成器の説明．これらの疑似乱数生成器は，多くの乱雑性テストを通過しているように伺える．

1 Introduction

省略．([2] を参照せよ．)

2 Theory

省略．([2] を参照せよ．)

2.1 Matrices $T$ that generate all non-null binary vectors
2.1 行列 $T$ はすべての要素が零とはならないバイナリベクトルを生成する

  ここでは，下記の定理について説明を行うが，[1] では [7] の内容を簡略化して記述しているため，解読が困難である．したがって，[7] の同一箇所について説明を行う．
  ところで，そもそも疑似乱数生成器の問題設定として，一様分布して欲しいという前提条件がある．これは，空間全体に分布するとき，一様に分布していると言えるから，必然的に周期は，$2^n-1$ になって欲しいと思う訳である．もう少し具体的には，$1\times n$ ビットあるので，$2^n$ の自由度 (表現力) があり，元の状態に戻るために $1$ つ使うので，位数が $2^n-1$ であれば，偏りなく表現することができるし，周期としても最大となるため，非常に都合がよい．一様に分布する理由としては，全ての位数について，異なる値を取ることが保証されているからである．これは，次の乱数地は今の乱数値を元に決定されるため，仮に同一の値を取ったとすれば，それ以降の乱数値は全て一致しているからである※．
  ※これは，疑似乱数生成器の内部状態と出力される乱数値が全単射 (一対一対応) であることを暗に仮定している．なお，当然ではあるが，シード値が異なれば乱数列は異なるため，現在の値が分かれば直ちに次の値が分かる訳ではなく，シード値の情報も必要となる．

Theorem: 定理

Let $T$ be an $n\times n$ nonsingular binary matrix. In order that the sequence $\beta, \beta T, \beta T^2, \beta T^3,\ldots$ have period $k=2^n-1$ for every nonnull binary vector $p$, it is necessary and sufficient that the matrix $T$ have order $2^n-1$ in the group of nonsingular $n\times n$ binary matrices. [7]

意訳:
$T$ が $n\times n$ 正則バイナリ行列として $2^n-1$ の位数を持つことは，数列 $\beta, \beta T, T^2, \beta T^3, \ldots$ が非負バイナリベクトル $\beta$ 全てについて $k=2^n-1$ の周期を持つことと同値 (必要十分条件) である．

Proof: 証明

Proof of Necessity. Assume $\beta, \beta T, \beta T^2, \ldots$ has period $k=2^n-1$ for some nonzero $\beta$. Then the order of T is at most k, and the set $\{\beta, \beta T, \beta T^2, \ldots, \beta T^{k-1}\}$ includes all nonnull $1\times n$ binary vectors. Thus $\gamma\ T^k=\gamma$ for every $1\times n$ binary vector $\gamma$, so the null space of $T^k-I$ has dimension $n$, which makes $T^k=I$, and the order of $T$ is at least $k$.

Proof of Sufficiency. Assume $T$ has order $k=2^n-1$ and let $\mathscr{S}$ be the set of distinct powers of $T$: \[ \mathscr{S}=\{I,T,T^2,T^3,\ldots,T^{k-1}\}. \] Let $g(x)$ be the minimal polynomial of $T$: the polynomial of least degree for which $g(T)=0$. Let $g$ have degree $t$. Then $t\lt n$, since $T$ satisfies its characteristic equation. Define \[ \mathscr{S}=\{{\rm all\ polynomials\ in}\ T{\rm\ of\ degree}\lt t\}. \] Then $\mathscr{S}=2^t$, and every element in $\mathscr{S}$ may be expressed as an element of $\mathscr{S}$ by means of Euclid’s algorithm: \[ x^s=q(x)g(x)+r(x), {\rm with\ deg}(r) \lt t, \] so that \[ T^s=q(T)g(T)+r(T)=r(T). \] Thus $\mathscr{S}$ is a subset of the set $\mathscr{S}-\{0\}$, $|\mathscr{S}|\leq |\mathscr{S}-\{0\}|$, and hence $2^n-1\leq 2^t-1$, which, taken with $t\lt n$, makes $t=n$ and $\mathscr{S}=\mathscr{S}-\{0\}$. In particular, the elements of $\mathscr{S}-\{0\}$ are nonsingular, since the elements of $\mathscr{S}$ are.
  This proves that $\beta, \beta T, \beta T^2, \ldots$ has period $k=2^n-1$ for every nonnull $\beta$, since $\beta T^j=\beta$ requires that $T^j+I$ be singular. Every $T^j+I$ equals an element of $\mathscr{S}$, through Euclid’s algorithm, but the only nonsingular matrix in $\mathscr{S}$ is $0$, and $T^j+I=0$ for $j<2^n-1$ contradicts the assumption that $T$ has order $2^n-1$. [7]

意訳:
必要条件の証明: $\Leftarrow$
  ここでは，数列 $\beta, \beta T, \beta T^2, \beta T^3, \ldots$ が $k=2^n-1$ の周期を持つならば，$T$ の位数は $k$ であることを示す．
  非零な $\beta$ について，数列 $\beta, \beta T, \beta T^2, \beta T^3, \ldots$ が $k=2^n-1$ の周期を持つとする．このとき $T$ の位数は最大 $k$ であり，数列 $\{\beta, \beta T, T^2, \beta T^3, \ldots, \beta T^{k-1}\}$ は全て非零な $1\times n$ バイナリベクトルである．したがって，任意の $1\times n$ バイナリベクトル $\gamma$ について $\gamma T^k=\gamma$ とすると， \begin{align*} &\gamma T^k=\gamma \\ \Leftrightarrow\ \ &\gamma T^k - \gamma I = 0 \\ \Leftrightarrow\ \ &\gamma(T^k - I) = 0 \\ \Leftrightarrow\ \ &T^k - I = O \\ \Leftrightarrow\ \ &T^k = I \end{align*} より，$T$ の位数は $k$ である．

十分条件の証明: $\Rightarrow$
  ここでは，$T$ の位数が $k=2^n-1$ ならば，数列 $\{\beta, \beta T, T^2, \beta T^3, \ldots, \beta T^{k-1}\}$ の周期は $k$ であることを示す．この命題は，$\{\beta, \beta T, T^2, \beta T^3, \ldots, \beta T^{k-1}\}\Leftrightarrow\beta\{I, T, T^2, T^3, \ldots, T^{k-1}\}\Leftrightarrow\{I, T, T^2, T^3, \ldots, T^{k-1}\}$ より，「$T$ の位数が $k=2^n-1$ ならば，数列 $\{I, T, T^2, T^3, \ldots, T^{k-1}\}$ の周期は $k$ であることを示す．」に書き直せる．
  まず，数列 $\{I, T, T^2, T^3, \ldots, T^{k-1}\}$ を \[ \mathscr{S}=\{I,T,T^2,T^3,\ldots,T^{k-1}\} \] と表す．このとき，$\mathscr{S}$ の次数 $t$ は $t\leq n$ である．$t=n$ とならないのは，$t$ が $n$ の約数のとき，数列の途中で $T^t=I$ となる可能性を排除できていないことに起因する．しかし，$\mathscr{S}$ は互いに異なるため，ユークリッドの互除法より， \[ T^s=q(T)g(T)+r(T)=0+r(T)=r(T) \] となる．したがって，$2^n-1\leq 2^t-1$ より $n\leq t$ となる．ゆえに，$n=t$ である．

3 Application to Xorshift RNGs
3 Xorshift 擬似乱数生成器への応用

The $n\times n$ matrices $I+L^a$ and $I+R^b$ are nonsingular, since, for example, $L^n=0$ and thus the finite series $I+L^a+L^{2a}+L^{3a}+\cdots$ is inverse of $(I+L^a)$.[1]

$(I+L^a+L^{2a}+L^{3a}+\cdots)(I+L^a) = I$ を示す．
Proof: 証明 \begin{eqnarray*} (I+L^a+L^{2a}+L^{3a}+\cdots)(I+L^a) &=& (I+L^a+L^{2a}+L^{3a}+\cdots)I + (I+L^a+L^{2a}+L^{3a}+\cdots)L^a\\ &=& (I+L^a+L^{2a}+L^{3a}+\cdots) + (L^a+L^2a+L^{3a}+L^{4a}+\cdots)\\ &=& I+(L^a+L^a)+(L^{2a}+L^{2a})+(L^{3a}+L^{3a})+\cdots\\ &=& I \end{eqnarray*} このとき，Bit 行列のため，同じ値の和は，桁上がりにより零となるため，$L^x+L^x=O$．また，最後 2 つペアとれない $L^{xa}$ に関しては，$L^m=O\ (m\geq n)$ より，やはり消えることに注意する．

So an obvious candidate for a matrix that has order $2^n−1$ is $T = (I + L^a)(I + R^b)$ or $T = (I + R^b)(I + L^a)$. Unfortunately, when $n$ is $32$ or $64$, no choices for $a$ and $b$ will provide such a $T$ with the required order. (A preliminary test on candidates is to square $T$ $n$ times. If the result is not $T$, then $T$ cannot have order $2^n−1$. By representing the rows of a binary matrix as a computer word, then xoring all of the rows for which a (left) multiplying binary vector has 1’s, very fast binary matrix products can be evaluated with simple C or Fortran programs, and it only takes a few minutes to check every possible $a$, $b$ in $T = (I + L^a)(I + R^b)$.)[1]

訳:
$T = (I + L^a)(I + R^b)$ または，$T = (I + R^b)(I + L^a)$ が位相 $2^n−1$ を取り得る明確な候補だが，残念ながら，$n=32, 64$ について，全ての可能な $a$, $b$ の組み合わせを実際に計算してたところ，残念ながら，どの組み合わせも条件を満たさなかった．簡単な C や Fortran のプログラムを用いて，数分で計算できる．

これを実際に，参考文献 [3] を用いて計算してみる．計算にあたり，$32$ bits 行列の計算は，[3] の通りである．計算には，$2^{32}-1$ の約数全てを求め利用した．この約数を求めるにあたり，$(2^{32}-1)/2$ までの素数をエラトステネスの篩により求め，素因数分解を行った後，素因数の組み合わせから，全ての約数を求めた．しかし，$64$ bits の行列について，そのまま同様の計算を行おうとすると，篩によって求める素数が非常に大きく，そのままのアルゴリズムでは計算できない．しかしながら，$2^{64}-1$ は稀な数であるため，[4] より素因数分解結果を発見した．

$2^{64}-1 = 3\times 5\times 17\times 257\times 641\times 65537\times 6700417$[4]

また，上記の素因数分解結果の正しさに関しては，[5] より全て素数であることを確認した．実際に分解された結果だけを見れば，小さな素数の掛け算であったため，わざわざ巨大な素数を用意せずとも，素因数分解可能であったことが分かる．
上記の素因数分解結果から，実際に $2^{64}-1$ の約数を全て求める．単なる組み合わせの問題ではあるが，一般に，自然数 $N$ の素因数分解を $N=2^{a_1}\times 3^{a_2}\times 5^{a_3}\times 7^{a_4}\times \cdots$ とするとき，約数の組み合わせの数 $d(N)$ は， $d(N)=(a_1+1)(a_2+1)(a_3+1)(a_4+1)\cdots$ である [6]．したがって，$2^{64}-1$ の約数の個数は，$2^7=128$ 個となる．手計算では，間違えかねないため，計算は sstd により行う．下記に実装と結果を示す．

実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>

int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    // struct fact{
    //     uint64 prime;
    //     uint64 num;
    // };
    std::vector<struct sstd::fact> factorList = {{3,1},{5,1},{17,1},{257,1},{641,1},{65537,1},{6700417,1}};
    std::vector<uint64> divs = sstd::divisor(factorList);
    printf("Divisors of 2^64-1 is \n");
    sstd::print(divs);
    
    printf("\n");
    printf("■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe.exe
■ measureTime_start---------------

Divisors of 2^64-1 is
[ 1 3 5 15 17 51 85 255 257 641 771 1285 1923 3205 3855 4369 9615 10897 13107 21845 32691 54485 65535 65537 163455 164737 196611 327685 494211 823685 983055 1114129 2471055 2800529 3342387 5570645 6700417 8401587 14002645 16711935 16843009 20101251 33502085 42007935 42009217 50529027 84215045 100506255 113907089 126027651 210046085 252645135 286331153 341721267 569535445 630138255 714156689 858993459 1431655765 1708606335 1722007169 2142470067 3570783445 4294967295 4294967297 5166021507 8610035845 10712350335 10796368769 12884901891 21474836485 25830107535 29274121873 32389106307 53981843845 64424509455 73014444049 87822365619 146370609365 161945531535 183538269073 219043332147 365072220245 439111828095 439125228929 550614807219 917691345365 1095216660735 1103806595329 1317375686787 2195626144645 2753074036095 3311419785987 5519032976645 6586878433935 7465128891793 16557098929935 18764712120593 22395386675379 37325644458965 56294136361779 93823560602965 111976933376895 112855183834753 281470681808895 281479271743489 338565551504259 564275919173765 844437815230467 1407396358717445 1692827757521295 1918538125190801 4222189076152335 4785147619639313 5755614375572403 9592690625954005 14355442858917939 23925738098196565 28778071877862015 71777214294589695 72340172838076673 217020518514230019 361700864190383365 1085102592571150095 1229782938247303441 3689348814741910323 6148914691236517205 18446744073709551615 ]

■ measureTime_stop----------------
--------------------------------
 Execution time:     0. 000 sec
--------------------------------

これは十分現実的な計算量であるが，やはり，これだけの数を目の当たりにすると，もう少し計算量を減らせないかと思う訳である．そこで，[2] より，

$N$ の約数としては $1, p, q, r, pq, qr, pr$ がありますが，もし $T^p=I$ であったとすると $T^{pq}=(T^p)^q=I^q=IT^{pq}=(T^p)^q=I^q=I$ となりますから，$N'=p, q, r$ に対して $T^{N'}=IT^{N'}=I$ のチェックをしなくても，$N'= pq, qr, pr$ のチェックだけを行えば十分であると分かる．[2]

つまり，$N$ の素因数分解を \[ N=a_1^{p_1}\times a_2^{p_2}\times a_3^{p_3}\times\cdots\times a_m^{p_m} \] とするとき，位数の確認は， \begin{eqnarray*} N'=\ &a_1^{p_1-1}&\times a_2^{p_2} &\times a_3^{p_3} &\times\cdots&\times a_m^{p_m},\\ &a_1^{p_1}\ \ \ &\times a_2^{p_2-1}&\times a_3^{p_3} &\times\cdots&\times a_m^{p_m},\\ &a_1^{p_1}\ \ \ &\times a_2^{p_2} &\times a_3^{p_3-1}&\times\cdots&\times a_m^{p_m},\\ & \ & \ & \ \ \ \ \vdots & \ & \ \\ &a_1^{p_1}\ \ \ &\times a_2^{p_2} &\times a_3^{p_3} &\times\cdots&\times a_m^{p_m-1} \end{eqnarray*} に対してのみ行えばよいことになる．つまり，今回の場合は， \begin{eqnarray*} N' =& 3^0\times 5^1\times 17^1\times 257^1\times 641^1\times 65537^1\times 6700417^1&,\\ & 3^1\times 5^0\times 17^1\times 257^1\times 641^1\times 65537^1\times 6700417^1&,\\ & 3^1\times 5^1\times 17^0\times 257^1\times 641^1\times 65537^1\times 6700417^1&,\\ & 3^1\times 5^1\times 17^1\times 257^0\times 641^1\times 65537^1\times 6700417^1&,\\ & 3^1\times 5^1\times 17^1\times 257^1\times 641^0\times 65537^1\times 6700417^1&,\\ & 3^1\times 5^1\times 17^1\times 257^1\times 641^1\times 65537^0\times 6700417^1&,\\ & 3^1\times 5^1\times 17^1\times 257^1\times 641^1\times 65537^1\times 6700417^0& \end{eqnarray*} の 7 個に \begin{eqnarray*} 2^{64}-1 = 3^1\times 5^1\times 17^1\times 257^1\times 641^1\times 65537^1\times 6700417^1 \end{eqnarray*} を加えた 8 個だけ計算すればよいことになる．(ただし，計算する数は減ったものの，一つ一つの値が大きくなったため，小さな乗算で位数が判明する場合は，逆に計算が遅くなるため，実際に測定する必要がある．計測の結果としては，少なくとも $n=64$ の $T = (I + L^a)(I + R^b)$ について，$58.615 [sec]/ 7.483 [sec] \approx 7.8$ より，7.8 倍程高速になっているが，要素数の減少率 $128/8=16$ 倍程ではない．)
この結果を使用し，$n=32, 64$ について $T = (I + L^a)(I + R^b)$ と $T = (I + R^b)(I + L^a)$ が位相 $2^n−1$ を取り得る $a$, $b$ の組み合わせを総当たりで計算し，存在しないことを確認する．下記に実装と結果を示す．

実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>

bool isFullPeriod_LR(std::vector<uint64>& divs, uint N, uint a, uint b){
    sstd::bmat I  = sstd::eye(N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
//  sstd::bmat T = (I + Rb)*(I + La);
    sstd::bmat T = (I + La)*(I + Rb);
//  sstd::printn(T);
    
    for(uint i=0; i<divs.size()-1; i++){
        sstd::bmat Tp = T^(divs[i]); // XORSHIFT
        if(Tp==I){ return false; }
    }
    sstd::bmat Tp = T^(divs[divs.size()-1]); // XORSHIFT
    if(Tp==I){ return true;
    }  else  { return false; }
}
void T_LR(std::vector<uint64>& divs, uint N){
    uint num=0;
    printf("  a,  b\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            if(isFullPeriod_LR(divs, N, a, b)){
                printf("|%2u, %2u", a, b);
                num++;
                if(num%9==0){ printf("|\n"); }
            }
        }
    }
}
int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    // calculate divisors of 2^32-1.
//  std::vector<uint64> divs32 = sstd::divisor(4294967296-1); // Below line returns the same result of this line, but took a little time to run everytime and takes a little heavy memory (about 2 GByte).
//  std::vector<uint64> divs32 = {1, 3, 5, 15, 17, 51, 85, 255, 257, 771, 1285, 3855, 4369, 13107, 21845, 65535, 65537, 196611, 327685, 983055, 1114129, 3342387, 5570645, 16711935, 16843009, 50529027, 84215045, 252645135, 286331153, 858993459, 1431655765, 4294967295};
    std::vector<uint64> divs32 = {1ull*5ull*17ull*257ull*65537ull,
                                  3ull*1ull*17ull*257ull*65537ull,
                                  3ull*5ull* 1ull*257ull*65537ull,
                                  3ull*5ull*17ull*  1ull*65537ull,
                                  3ull*5ull*17ull*257ull*    1ull,
                                  3ull*5ull*17ull*257ull*65537ull};
    
    // calculate divisors of 2^64-1.
//  std::vector<uint64> divs64 = sstd::divisor(18446744073709551616-1); // impossible in this way // below is calculated by "void cal_divsOf_2_64m1();".
//  std::vector<uint64> divs64 = {1ull, 3ull, 5ull, 15ull, 17ull, 51ull, 85ull, 255ull, 257ull, 641ull, 771ull, 1285ull, 1923ull, 3205ull, 3855ull, 4369ull, 9615ull, 10897ull, 13107ull, 21845ull, 32691ull, 54485ull, 65535ull, 65537ull, 163455ull, 164737ull, 196611ull, 327685ull, 494211ull, 823685ull, 983055ull, 1114129ull, 2471055ull, 2800529ull, 3342387ull, 5570645ull, 6700417ull, 8401587ull, 14002645ull, 16711935ull, 16843009ull, 20101251ull, 33502085ull, 42007935ull, 42009217ull, 50529027ull, 84215045ull, 100506255ull, 113907089ull, 126027651ull, 210046085ull, 252645135ull, 286331153ull, 341721267ull, 569535445ull, 630138255ull, 714156689ull, 858993459ull, 1431655765ull, 1708606335ull, 1722007169ull, 2142470067ull, 3570783445ull, 4294967295ull, 4294967297ull, 5166021507ull, 8610035845ull, 10712350335ull, 10796368769ull, 12884901891ull, 21474836485ull, 25830107535ull, 29274121873ull, 32389106307ull, 53981843845ull, 64424509455ull, 73014444049ull, 87822365619ull, 146370609365ull, 161945531535ull, 183538269073ull, 219043332147ull, 365072220245ull, 439111828095ull, 439125228929ull, 550614807219ull, 917691345365ull, 1095216660735ull, 1103806595329ull, 1317375686787ull, 2195626144645ull, 2753074036095ull, 3311419785987ull, 5519032976645ull, 6586878433935ull, 7465128891793ull, 16557098929935ull, 18764712120593ull, 22395386675379ull, 37325644458965ull, 56294136361779ull, 93823560602965ull, 111976933376895ull, 112855183834753ull, 281470681808895ull, 281479271743489ull, 338565551504259ull, 564275919173765ull, 844437815230467ull, 1407396358717445ull, 1692827757521295ull, 1918538125190801ull, 4222189076152335ull, 4785147619639313ull, 5755614375572403ull, 9592690625954005ull, 14355442858917939ull, 23925738098196565ull, 28778071877862015ull, 71777214294589695ull, 72340172838076673ull, 217020518514230019ull, 361700864190383365ull, 1085102592571150095ull, 1229782938247303441ull, 3689348814741910323ull, 6148914691236517205ull, 18446744073709551615ull}; // Below line is able to get same result.
    std::vector<uint64> divs64 = {1ull*5ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*1ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull* 1ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*  1ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*  1ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*    1ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*      1ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*6700417ull};
    
    printf("□ calculate LR 32bits\n");
    T_LR(divs32, 32); printf("\n");
    
    printf("□ calculate LR 64bits\n");
    T_LR(divs64, 64); printf("\n");

    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe.exe
■ measureTime_start---------------

□ calculate LR 32bits
  a,  b

□ calculate LR 64bits
  a,  b
| 7,  9| 9,  7

■ measureTime_stop----------------
--------------------------------
 Execution time:     7. 841 sec
--------------------------------

計算結果は上記のようになる．また，$T = (I + R^b)(I + L^a)$ は $T = (I + L^a)(I + R^b)$ と対象なので，結果は同じとなる．[1] では $n=32, 64$ について，一つの組み合わせも存在しないと指摘されていたが，実際に計算してみると，$n=64$ については，$7, 9$ の組み合わせで存在していることが分かる．これには，1) 私の実装ミス，2) [1]の間違い，3) 既に訂正や指摘がなされている，4) 実際に使用すると疑似乱数生成器としての性質が悪かったため無いと記載された，など，様々な理由が考えられる．問題の解決には追加の検証を必要とするが，少なくとも，私の行った計算においては，$T = (I + L^a)(I + R^b)$ または，$T = (I + R^b)(I + L^a)$ を疑似乱数生成器とするハイパーパラメータは存在する．

　[追記2] では「論文の 2 ページ目に n が 32 と 64 の場合には T=(I+La)(I+Rb)では (a,b) が見つからなかったと書いてあるが，n が 64 の場合 (7,9) は条件を満たしている．」としている．

However, there are many choices for $a, b, c$ for which the matrices $T = (I + L^a)(I + R^b)(I + L^c)$ have order $2^n-1$, when $n = 32$ or $n = 64$. There are $81$ triples $(a, b, c)$, $a < c$, for which the $32\times 32$ binary matrix $T = (I + L^a)(I + R^b)(I + R^c)$ ※1 has order $2^{32}-1$:

| 1, 3,10| 1, 5,16| 1, 5,19| 1, 9,29| 1,11, 6| 1,11,16| 1,19, 3| 1,21,20| 1,27,27|

| 2, 5,15| 2, 5,21| 2, 7, 7| 2, 7, 9| 2, 7,25| 2, 9,15| 2,15,17| 2,15,25| 2,21, 9|

| 3, 1,14| 3, 3,26| 3, 3,28| 3, 3,29| 3, 5,20| 3, 5,22| 3, 5,25| 3, 7,29| 3,13, 7|

| 3,23,25| 3,25,24| 3,27,11| 4, 3,17| 4, 3,27| 4, 5,15| 5, 3,21| 5, 7,22| 5, 9,7 |

| 5, 9,28| 5, 9,31| 5,13, 6| 5,15,17| 5,17,13| 5,21,12| 5,27, 8| 5,27,21| 5,27,25|

| 5,27,28| 6, 1,11| 6, 3,17| 6,17, 9| 6,21, 7| 6,21,13| 7, 1, 9| 7, 1,18| 7, 1,25|

| 7,13,25| 7,17,21| 7,25,12| 7,25,20| 8, 7,23| 8,9,23 | 9, 5,1 | 9, 5,25| 9,11,19|

| 9,21,16|10, 9,21|10, 9,25|11, 7,12|11, 7,16|11,17,13|11,21,13|12, 9,23|13, 3,17|

|13, 3,27|13, 5,19|13,17,15|14, 1,15|14,13,15|15, 1,29|17,15,20|17,15,23|17,15,26|[1]

訳:
しかしながら，$n=32, 64$ のとき，行列 $T = (I + L^a)(I + R^b)(I + L^c)$ は，位数 $2^n-1$ となる多くの $a, b, c$ の組み合わせが存在する．位数 $2^{32}-1$ となる $32\times 32$ バイナリ行列 $T=(I+L^a)(I+R^b)(I+L^c)$ ※1 には，$a < c$ を満たす 3 要素 $(a,b,c)$ の組み合わせは，$81$ 通りある．

訳注:
※1: $T = (I + L^a)(I + R^b)(I + R^c)$ は $T = (I + L^a)(I + R^b)(I + L^c)$ の誤植．($T = (I + L^a)(I + R^b)(I + R^c)$ も位数 $2^n-1$ を満たす擬似乱数生成器としてのパラメータは持つが，論文中で計算されているパラメータは， $T = (I + L^a)(I + R^b)(I + L^c)$ についてのパラメータである．)

次に，$n=32, 64$ かつ $T = (I + L^a)(I + R^b)(I + L^c)$ について，位数 $2^n-1$ を取る整数 $a, b, c$ の組み合わせを総当りで計算する．まず，$n=32$ について，実際に上記の表と同じ結果を得るかを検証する．これは以前に [3] で行った計算と同じである．

実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>

bool isFullPeriod_LRL(std::vector<uint64>& divs, uint N, uint a, uint b, uint c){
    sstd::bmat I  = sstd::eye(N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
    sstd::bmat Lc = sstd::LxShiftMat(N, c);
    sstd::bmat T = (I + La)*(I + Rb)*(I + Lc);
//  sstd::printn(T);
    
    for(uint i=0; i<divs.size()-1; i++){
        sstd::bmat Tp = T^(divs[i]); // XORSHIFT
        if(Tp==I){ return false; }
    }
    sstd::bmat Tp = T^(divs[divs.size()-1]); // XORSHIFT
    if(Tp==I){ return true;
    }  else  { return false; }
}
void T_LRL(std::vector<uint64>& divs, uint N){
    uint num=0;
    printf("  a,  b,  c\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            for(uint c=0; c<N; c++){
                if(a<c && isFullPeriod_LRL(divs, N, a, b, c)){
                    printf("|%2u, %2u, %2u", a, b, c);
                    num++;
                    if(num%9==0){ printf("|\n"); }
                }
            }
        }
    }
}
int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    // calculate divisors of 2^32-1.
//  std::vector<uint64> divs32 = sstd::divisor(4294967296-1); // Below line returns the same result of this line, but took a little time to run everytime and takes a little heavy memory (about 2 GByte).
//  std::vector<uint64> divs32 = {1, 3, 5, 15, 17, 51, 85, 255, 257, 771, 1285, 3855, 4369, 13107, 21845, 65535, 65537, 196611, 327685, 983055, 1114129, 3342387, 5570645, 16711935, 16843009, 50529027, 84215045, 252645135, 286331153, 858993459, 1431655765, 4294967295};
    std::vector<uint64> divs32 = {1ull*5ull*17ull*257ull*65537ull,
                                  3ull*1ull*17ull*257ull*65537ull,
                                  3ull*5ull* 1ull*257ull*65537ull,
                                  3ull*5ull*17ull*  1ull*65537ull,
                                  3ull*5ull*17ull*257ull*    1ull,
                                  3ull*5ull*17ull*257ull*65537ull};
    
    printf("□ calculate LRL 32bits\n");
    T_LRL(divs32, 32); printf("\n");

    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe.exe
■ measureTime_start---------------

□ calculate LRL 32bits
  a,  b,  c
| 1,  3, 10| 1,  5, 16| 1,  5, 19| 1,  9, 29| 1, 11,  6| 1, 11, 16| 1, 19,  3| 1, 21, 20| 1, 27, 27|
| 2,  5, 15| 2,  5, 21| 2,  7,  7| 2,  7,  9| 2,  7, 25| 2,  9, 15| 2, 15, 17| 2, 15, 25| 2, 21,  9|
| 3,  1, 14| 3,  3, 26| 3,  3, 28| 3,  3, 29| 3,  5, 20| 3,  5, 22| 3,  5, 25| 3,  7, 29| 3, 13,  7|
| 3, 23, 25| 3, 25, 24| 3, 27, 11| 4,  3, 17| 4,  3, 27| 4,  5, 15| 5,  3, 21| 5,  7, 22| 5,  9,  7|
| 5,  9, 28| 5,  9, 31| 5, 13,  6| 5, 15, 17| 5, 17, 13| 5, 21, 12| 5, 27,  8| 5, 27, 21| 5, 27, 25|
| 5, 27, 28| 6,  1, 11| 6,  3, 17| 6, 17,  9| 6, 21,  7| 6, 21, 13| 7,  1,  9| 7,  1, 18| 7,  1, 25|
| 7, 13, 25| 7, 17, 21| 7, 25, 12| 7, 25, 20| 8,  7, 23| 8,  9, 23| 9,  5, 14| 9,  5, 25| 9, 11, 19|
| 9, 21, 16|10,  9, 21|10,  9, 25|11,  7, 12|11,  7, 16|11, 17, 13|11, 21, 13|12,  9, 23|13,  3, 17|
|13,  3, 27|13,  5, 19|13, 17, 15|14,  1, 15|14, 13, 15|15,  1, 29|17, 15, 20|17, 15, 23|17, 15, 26|

■ measureTime_stop----------------
--------------------------------
 Execution time:     3. 428 sec
--------------------------------

結果として上記の結果を得る．計算結果は，7 行 7 列目だけ一致していないが，他の値は全て一致しているため，誤植だと思われる．(仮に誤植でないとすると，条件 $a < c$ を満たさない．)

　[追記2] では「2 ページ目の下にある 81 個の (a,b,c) のうち，|9,5,1| は |9,5,14| の間違いである．」としている．

ここで，上記の計算結果を共有する擬似乱数生成器の組み合わせについて考える．

Of those $81$ triples with $a < c$, the triple $(c, b, a)$ also provides a full period $T$ , making $162$ in all. In addition, another $162$ full-period $T$’s arise from taking transposes: use triples $(a, b, c)$ to form $T = (I+R^c)(I+L^b)(I+R^c)$ ※1, for a total of $324$. Then, finally, for each of the $324$ $T$’s of the form $(I+L^a)(I+R^b)(I+L^c)$ or $(I+R^a)(I+L^b)(I+R^c)$, the corresponding matrices $(I+L^a)(I+L^c)(I+R^b)$, and $(I+R^a)(I+R^c)(I+L^b)$, also turn out to have period $2^{32}−1$, making a total of $648$ choices.[1]

訳:
もちろん，$(a, b, c)$ の組み合わせは 81 通りあるが，$(c, a, b)$ の組み合わせもまた位数 $2^n-1$ の $T$ を与える ※2．したがって，162 通りの組み合わせが作られる．加えて，$(a, b, c)$ を組み合わせた $T=(I+R^a)(I+L^b)(I+R^c)$ から ※1， $T$ は，位数が $2^n-1$ となり，もう $162$ 通りの組み合わせが得られるため，合計で $324$ 通りとなる．そして最終的に，$324$ 通りの組み合わせを持つ $(I+L^a)(I+R^b)(I+L^c)$ と $(I+R^a)(I+L^b)(I+R^c)$ が，それぞれ $(I+L^a)(I+L^c)(I+R^b)$ ※3 と $(I+R^a)(I+R^c)(I+L^b)$ に対応し，いずれもまた $2^{32}-1$ の位数を持つため，合計 $648$ 通りとなる．

訳注:
※1. 恐らく，$T=(I+R^a)(I+L^b)(I+R^c)$ を $T=(I+R^c)(I+L^b)(I+R^c)$ と誤植している．
※2. 計算を $a < c$ の範囲に制限していたのは，$a$ と $c$ を入れ替えた値が，ちょうど対称となるため．
※3. $T=(I+L^a)(I+L^c)(I+R^b)$ を C 言語プログラムとして表現すると，

yˆ=y>>b; yˆ=y<<c; yˆ=y<<a;

のように，逆順となることに注意されたい．これは，[2] で示されているように，C 言語プログラムをベクトル表記に変換するとわかり易く，

[y<<a のベクトル表記] = $R^ay$, [$\alpha$^$\beta$ のベクトル表記] = $\alpha+\beta$

より，

[y^=y>>b のベクトル表記] = $y+R^by=(I+R^b)y$ ・・・(1)
[y^=y<<c のベクトル表記] = $y+L^cy=(I+L^c)y$ ・・・(2)
[y^=y<<a のベクトル表記] = $y+L^ay=(I+L^a)y$ ・・・(3)

となる．したがって，式 (3) に式 (2) を，式 2 に式 (1) を，それぞれ代入すると，求める C 言語プログラムのベクトル表記は，

[yˆ=y>>b; yˆ=y<<c; yˆ=y<<a; のベクトル表記] = $(I+L^a)y''$
= $(I+L^a)(I+L^c)y'$
= $(I+L^a)(I+L^c)(I+L^b)y$

となる．(ただし，$y''=(I+L^c)y'$, $y'=(I+R^b)y$．)

In summary, for each of the above 81 triples a, b, c with a < c, any one of these eight lines of C can serve as the basis for a 32-bit xorshift RNG with period 2 32 −1,:[1]

訳:
まとめとして，上記 $81$ 種類の $a$, $b$, $c$ (ただし，$a < c$) の組み合わせは，下記のいずれか 8 行の C 言語プログラムによって，位数 $2^{32}-1$ の 32-bit xorshift RNG (32-bit Xorshift 擬似乱数生成器) を提供することができる．

yˆ=y<<a; yˆ=y>>b; yˆ=y<<c;
yˆ=y<<c; yˆ=y>>b; yˆ=y<<a;
yˆ=y>>a; yˆ=y<<b; yˆ=y>>c;
yˆ=y>>c; yˆ=y<<b; yˆ=y>>a;
yˆ=y<<a; yˆ=y<<c; yˆ=y>>b;
yˆ=y<<c; yˆ=y<<a; yˆ=y>>b;
yˆ=y>>a; yˆ=y>>c; yˆ=y<<b;
yˆ=y>>c; yˆ=y>>a; yˆ=y<<b;

[1]

訳注:
この項目について，追試は行っていないが，より多くのパラメータが必要とされる場合においては，項目の追試と，言及されていない，$T=(I+L^a)(I+L^b)(I+L^c)$，$T=(I+L^a)(I+R^b)(I+R^c)$，$T=(I+R^a)(I+L^b)(I+L^c)$，$T=(I+R^a)(I+R^b)(I+R^c)$ について，調査することを推奨する．

続いて，$n=64$ の場合について計算を行う．

  For 64-bit integers, the following $275$ triples provide $64\times 64$ matrices $T=(I+L^a)(I+R^b)(I+L^c)$ whose order is $2^{64}−1$:[1]

訳: 

  64-bit 整数について，次の $275$ 個の 3 つの数値セットが，$64\times 64$ の行列 $T=(I+L^a)(I+R^b)(I+L^c)$ を提供する．これらの行列の位数は，$2^{64}-1$ である．

| 1, 1,54| 1, 1,55| 1, 3,45| 1, 7, 9| 1, 7,44| 1, 7,46| 1, 9,50| 1,11,35| 1,11,50|

| 1,13,45| 1,15, 4| 1,15,63| 1,19, 6| 1,19,16| 1,23,14| 1,23,29| 1,29,34| 1,35, 5|

| 1,35,11| 1,35,34| 1,45,37| 1,51,13| 1,53, 3| 1,59,14| 2,13,23| 2,31,51| 2,31,53|

| 2,43,27| 2,47,49| 3, 1,11| 3, 5,21| 3,13,59| 3,21,31| 3,25,20| 3,25,31| 3,25,56|

| 3,29,40| 3,29,47| 3,29,49| 3,35,14| 3,37,17| 3,43, 4| 3,43, 6| 3,43,11| 3,51,16|

| 3,53, 7| 3,61,17| 3,61,26| 4, 7,19| 4, 9,13| 4,15,51| 4,15,53| 4,29,45| 4,29,49|

| 4,31,33| 4,35,15| 4,35,21| 4,37,11| 4,37,21| 4,41,19| 4,41,45| 4,43,21| 4,43,31|

| 4,53, 7| 5, 9,23| 5,11,54| 5,15,27| 5,17,11| 5,23,36| 5,33,29| 5,41,20| 5,45,16|

| 5,47,23| 5,53,20| 5,59,33| 5,59,35| 5,59,63| 6, 1,17| 6, 3,49| 6,17,47| 6,23,27|

| 6,27, 7| 6,43,21| 6,49,29| 6,55,17| 7, 5,41| 7, 5,47| 7, 5,55| 7, 7,20| 7, 9,38|

| 7,11,10| 7,11,35| 7,13,58| 7,19,17| 7,19,54| 7,23, 8| 7,25,58| 7,27,59| 7,33, 8|

| 7,41,40| 7,43,28| 7,51,24| 7,57,12| 8, 5,59| 8, 9,25| 8,13,25| 8,13,61| 8,15,21|

| 8,25,59| 8,29,19| 8,31,17| 8,37,21| 8,51,21| 9, 1,27| 9, 5,36| 9, 5,43| 9, 7,18|

| 9,19,18| 9,21,11| 9,21,20| 9,21,40| 9,23,57| 9,27,10| 9,29,12| 9,29,37| 9,37,31|

| 9,41,45|10, 7,33|10,27,59|10,53,13|11, 5,32|11, 5,34|11, 5,43|11, 5,45|11, 9,14|

|11, 9,34|11,13,40|11,15,37|11,23,42|11,23,56|11,25,48|11,27,26|11,29,14|11,31,18|

|11,53,23|12, 1,31|12, 3,13|12, 3,49|12, 7,13|12,11,47|12,25,27|12,39,49|12,43,19|

|13, 3,40|13, 3,53|13, 7,17|13, 9,15|13, 9,50|13,13,19|13,17,43|13,19,28|13,19,47|

|13,21,18|13,21,49|13,29,35|13,35,30|13,35,38|13,47,23|13,51,21|14,13,17|14,15,19|

|14,23,33|14,31,45|14,47,15|15, 1,19|15, 5,37|15,13,28|15,13,52|15,17,27|15,19,63|

|15,21,46|15,23,23|15,45,17|15,47,16|15,49,26|16, 5,17|16, 7,39|16,11,19|16,11,27|

|16,13,55|16,21,35|16,25,43|16,27,53|16,47,17|17,15,58|17,23,29|17,23,51|17,23,52|

|17,27,22|17,45,22|17,47,28|17,47,29|17,47,54|18, 1,25|18, 3,43|18,19,19|18,25,21|

|18,41,23|19, 7,36|19, 7,55|19,13,37|19,15,46|19,21,52|19,25,20|19,41,21|19,43,27|

|20, 1,31|20, 5,29|21, 1,27|21, 9,29|21,13,52|21,15,28|21,15,29|21,17,24|21,17,30|

|21,17,48|21,21,32|21,21,34|21,21,37|21,21,38|21,21,40|21,21,41|21,21,43|21,41,23|

|22, 3,39|23, 9,38|23, 9,48|23, 9,57|23,13,38|23,13,58|23,13,61|23,17,25|23,17,54|

|23,17,56|23,17,62|23,41,34|23,41,51|24, 9,35|24,11,29|24,25,25|24,31,35|25, 7,46|

|25, 7,49|25, 9,39|25,11,57|25,13,29|25,13,39|25,13,62|25,15,47|25,21,44|25,27,27|

|25,27,53|25,33,36|25,39,54|28, 9,55|28,11,53|29,27,37|31, 1,51|31,25,37|31,27,35|

|33,31,43|33,31,55|43,21,46|49,15,61|55, 9,56|
[1]

ここで，上記のリストを再現するプログラムを次に示す．

実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>

void T_LRL(std::vector<uint64>& divs, uint N){
    // This function is omitted, because of the above one is same.
}
int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    // calculate divisors of 2^64-1.
//  std::vector<uint64> divs64 = sstd::divisor(18446744073709551616-1); // impossible in this way // below is calculated by "void cal_divsOf_2_64m1();".
//  std::vector<uint64> divs64 = {1ull, 3ull, 5ull, 15ull, 17ull, 51ull, 85ull, 255ull, 257ull, 641ull, 771ull, 1285ull, 1923ull, 3205ull, 3855ull, 4369ull, 9615ull, 10897ull, 13107ull, 21845ull, 32691ull, 54485ull, 65535ull, 65537ull, 163455ull, 164737ull, 196611ull, 327685ull, 494211ull, 823685ull, 983055ull, 1114129ull, 2471055ull, 2800529ull, 3342387ull, 5570645ull, 6700417ull, 8401587ull, 14002645ull, 16711935ull, 16843009ull, 20101251ull, 33502085ull, 42007935ull, 42009217ull, 50529027ull, 84215045ull, 100506255ull, 113907089ull, 126027651ull, 210046085ull, 252645135ull, 286331153ull, 341721267ull, 569535445ull, 630138255ull, 714156689ull, 858993459ull, 1431655765ull, 1708606335ull, 1722007169ull, 2142470067ull, 3570783445ull, 4294967295ull, 4294967297ull, 5166021507ull, 8610035845ull, 10712350335ull, 10796368769ull, 12884901891ull, 21474836485ull, 25830107535ull, 29274121873ull, 32389106307ull, 53981843845ull, 64424509455ull, 73014444049ull, 87822365619ull, 146370609365ull, 161945531535ull, 183538269073ull, 219043332147ull, 365072220245ull, 439111828095ull, 439125228929ull, 550614807219ull, 917691345365ull, 1095216660735ull, 1103806595329ull, 1317375686787ull, 2195626144645ull, 2753074036095ull, 3311419785987ull, 5519032976645ull, 6586878433935ull, 7465128891793ull, 16557098929935ull, 18764712120593ull, 22395386675379ull, 37325644458965ull, 56294136361779ull, 93823560602965ull, 111976933376895ull, 112855183834753ull, 281470681808895ull, 281479271743489ull, 338565551504259ull, 564275919173765ull, 844437815230467ull, 1407396358717445ull, 1692827757521295ull, 1918538125190801ull, 4222189076152335ull, 4785147619639313ull, 5755614375572403ull, 9592690625954005ull, 14355442858917939ull, 23925738098196565ull, 28778071877862015ull, 71777214294589695ull, 72340172838076673ull, 217020518514230019ull, 361700864190383365ull, 1085102592571150095ull, 1229782938247303441ull, 3689348814741910323ull, 6148914691236517205ull, 18446744073709551615ull}; // Below line is able to get same result.
    std::vector<uint64> divs64 = {1ull*5ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*1ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull* 1ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*  1ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*  1ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*    1ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*      1ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*6700417ull};

    printf("□ calculate LRL 64bits\n");
    T_LRL(divs64, 64); printf("\n");
    
    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe.exe
■ measureTime_start---------------

□ calculate LRL 64bits
  a,  b,  c
| 0,  7,  9| 0,  9,  7| 1,  1, 54| 1,  1, 55| 1,  3, 45| 1,  7,  9| 1,  7, 44| 1,  7, 46| 1,  9, 50|
| 1, 11, 35| 1, 11, 50| 1, 13, 45| 1, 15,  4| 1, 15, 63| 1, 19,  6| 1, 19, 16| 1, 23, 14| 1, 23, 29|
| 1, 29, 34| 1, 35,  5| 1, 35, 11| 1, 35, 34| 1, 45, 37| 1, 51, 13| 1, 53,  3| 1, 59, 14| 2, 13, 23|
| 2, 31, 51| 2, 31, 53| 2, 43, 27| 2, 47, 49| 3,  1, 11| 3,  5, 21| 3, 13, 59| 3, 21, 31| 3, 25, 20|
| 3, 25, 31| 3, 25, 56| 3, 29, 40| 3, 29, 47| 3, 29, 49| 3, 35, 14| 3, 37, 17| 3, 43,  4| 3, 43,  6|
| 3, 43, 11| 3, 51, 16| 3, 53,  7| 3, 61, 17| 3, 61, 26| 4,  7, 19| 4,  9, 13| 4, 15, 51| 4, 15, 53|
| 4, 29, 45| 4, 29, 49| 4, 31, 33| 4, 35, 15| 4, 35, 21| 4, 37, 11| 4, 37, 21| 4, 41, 19| 4, 41, 45|
| 4, 43, 21| 4, 43, 31| 4, 53,  7| 5,  9, 23| 5, 11, 54| 5, 15, 27| 5, 17, 11| 5, 23, 36| 5, 33, 29|
| 5, 41, 20| 5, 45, 16| 5, 47, 23| 5, 53, 20| 5, 59, 33| 5, 59, 35| 5, 59, 63| 6,  1, 17| 6,  3, 49|
| 6, 17, 47| 6, 23, 27| 6, 27,  7| 6, 43, 21| 6, 49, 29| 6, 55, 17| 7,  5, 41| 7,  5, 47| 7,  5, 55|
| 7,  7, 20| 7,  9, 38| 7, 11, 10| 7, 11, 35| 7, 13, 58| 7, 19, 17| 7, 19, 54| 7, 23,  8| 7, 25, 58|
| 7, 27, 59| 7, 33,  8| 7, 41, 40| 7, 43, 28| 7, 51, 24| 7, 57, 12| 8,  5, 59| 8,  9, 25| 8, 13, 25|
| 8, 13, 61| 8, 15, 21| 8, 25, 59| 8, 29, 19| 8, 31, 17| 8, 37, 21| 8, 51, 21| 9,  1, 27| 9,  5, 36|
| 9,  5, 43| 9,  7, 18| 9, 19, 18| 9, 21, 11| 9, 21, 20| 9, 21, 40| 9, 23, 57| 9, 27, 10| 9, 29, 12|
| 9, 29, 37| 9, 37, 31| 9, 41, 45|10,  7, 33|10, 27, 59|10, 53, 13|11,  5, 32|11,  5, 34|11,  5, 43|
|11,  5, 45|11,  9, 14|11,  9, 34|11, 13, 40|11, 15, 37|11, 23, 42|11, 23, 56|11, 25, 48|11, 27, 26|
|11, 29, 14|11, 31, 18|11, 53, 23|12,  1, 31|12,  3, 13|12,  3, 49|12,  7, 13|12, 11, 47|12, 25, 27|
|12, 39, 49|12, 43, 19|13,  3, 40|13,  3, 53|13,  7, 17|13,  9, 15|13,  9, 50|13, 13, 19|13, 17, 43|
|13, 19, 28|13, 19, 47|13, 21, 18|13, 21, 49|13, 29, 35|13, 35, 30|13, 35, 38|13, 47, 23|13, 51, 21|
|14, 13, 17|14, 15, 19|14, 23, 33|14, 31, 45|14, 47, 15|15,  1, 19|15,  5, 37|15, 13, 28|15, 13, 52|
|15, 17, 27|15, 19, 63|15, 21, 46|15, 23, 23|15, 45, 17|15, 47, 16|15, 49, 26|16,  5, 17|16,  7, 39|
|16, 11, 19|16, 11, 27|16, 13, 55|16, 21, 35|16, 25, 43|16, 27, 53|16, 47, 17|17, 15, 58|17, 23, 29|
|17, 23, 51|17, 23, 52|17, 27, 22|17, 45, 22|17, 47, 28|17, 47, 29|17, 47, 54|18,  1, 25|18,  3, 43|
|18, 19, 19|18, 25, 21|18, 41, 23|19,  7, 36|19,  7, 55|19, 13, 37|19, 15, 46|19, 21, 52|19, 25, 20|
|19, 41, 21|19, 43, 27|20,  1, 31|20,  5, 29|21,  1, 27|21,  9, 29|21, 13, 52|21, 15, 28|21, 15, 29|
|21, 17, 24|21, 17, 30|21, 17, 48|21, 21, 32|21, 21, 34|21, 21, 37|21, 21, 38|21, 21, 40|21, 21, 41|
|21, 21, 43|21, 41, 23|22,  3, 39|23,  9, 38|23,  9, 48|23,  9, 57|23, 13, 38|23, 13, 58|23, 13, 61|
|23, 17, 25|23, 17, 54|23, 17, 56|23, 17, 62|23, 41, 34|23, 41, 51|24,  9, 35|24, 11, 29|24, 25, 25|
|24, 31, 35|25,  7, 46|25,  7, 49|25,  9, 39|25, 11, 57|25, 13, 29|25, 13, 39|25, 13, 62|25, 15, 47|
|25, 21, 44|25, 27, 27|25, 27, 53|25, 33, 36|25, 39, 54|28,  9, 55|28, 11, 53|29, 27, 37|31,  1, 51|
|31, 25, 37|31, 27, 35|33, 31, 43|33, 31, 55|43, 21, 46|49, 15, 61|55,  9, 56

■ measureTime_stop----------------
--------------------------------
 Execution time:   292. 842 sec
--------------------------------

このとき，0 を含む始めの 2 つの値

| 0,  7,  9| 0,  9,  7|

は，シフト数が 0 のため，パラメータとしては不的確である．このため，プログラム注で除去しても問題ない．しかしながら，この計算結果は，先に示した $T = (I + R^b)(I + L^a)$ や $T = (I + L^a)(I + R^b)$ が，$n=64$ において，擬似乱数生成器となりうるパラメータを持つことを示唆する．

As with the $32$-bit case, a selection of any one of the $275$ a, b, c choices for $64$-bit sequences, and any one of the above eight lines of C code, will provide, for $64$-bit words, a xorshift RNG with period $2^{64}−1$, for a total of $8\times 275=2200$ choices.[1]

訳:
$32$-bit の場合と同様に，$64$-bit における $275$ 通りの $a, b, c$ の選び方と，$64$-bits ワードにおける，上記の 8 行の C 言語プログラムコードは，位数 $2^{64}-1$ かつ組み合わせ総数 $8\times 275=2200$ 通りの選択肢を持つ xorshift RNG (xorshift 擬似乱数生成器) を提供する．

Here is a basic 32-bit xorshift C procedure that takes a 32-bit seed value y:[1]

訳:
ここに，基本的な $32$-bit xorshift の C 言語関数がある．これは $32$-bit のシード値を持つ．

unsigned long xor(){
static unsigned long y=2463534242;
yˆ=(y<<13); y=(y>>17); return (yˆ=(y<<5)); }

It uses one of my favorite choices, $[a, b, c] = [13, 17, 5]$, and will pass almost all tests of randomness, except the binary rank test in Diehard [2]. (A long period xorshift RNG necessarily uses a nonsingular matrix transformation, so every successive n vectors must be linearly independent, while truly random binary vectors will be linearly independent only some 30% of the time.) Although I have only tested a few of them, any one of the 648 choices above is likely to provide a very fast, simple, high quality RNG. [1]

訳:
上記の実装は，私 ([1]の著者) のお気に入りの選択肢，$[a, b, c] = [13, 17, 5]$ を使用しており，Diehard [2] による乱雑性試験のうち，バイナリ・ランク試験以外，全て通過している．(長周期の xorshift 擬似乱数生成器は，必然的に正則行列変換を用いる．したがって，真に乱雑なバイナリベクトルが，時間のわずか $30$ % しか線形独立ではないとしても，全ての連続する $n$ 個のベクトルは，線形独立でなければならない．) 私 ([1]の著者) は，それらのいくつかをテストしただけですが，$648$ の選択肢の内どれを選んでも非常に高速で，簡単かつ，高い品質の擬似乱数生成器が提供されると考えられます．

For C compilers that have $64$-bit integers, the following will provide an excellent period $2^{64}−1$ RNG, given a $64$-bit seed x:

訳:
$64$-bit の整数を取り扱うことのできる C コンパイラでは，次の $64$-bit seed 値を $x$ として与えた (C 言語プログラムが) とても優れた $2^{64}-1$ の位数を持った擬似乱数生成器を与えます．

unsigned long long xor64(){
static unsigned long long x=88172645463325252LL;
xˆ=(x<<13); xˆ=(x>>7); return (xˆ=(x<<17));

but any of the above 2200 choices is likely to do as well.[1]

訳:
その他，上記の 2200 の選択肢のいずれの場合も，同様に行う可能性が高い．

3.1 Binary vector spaces of dimension $n=96, 128, 160...$.
3.1 大きさ $n=96, 128, 160, ...$ のバイナリ・ベクトル空間

この項目では，$32$ or $64$-bit 幅の擬似乱数生成器が，より長い周期を持つためのアルゴリズムを示します．多くの場合 CPU は $32$ or $64$-bit より大きな整数をハードウェアで計算できないため，単純に行列 $T$ のサイズを大きくした場合，高速に XOR 演算や Shift 演算を実行できません．また，例えば，$32$-bit 幅・位数 $2^{128}-1$ の擬似乱数生成器が必要となった場合，これまでの手法では $128$-bit 幅・位数 $2^{128}-1$ の擬似乱数生成器を設計することとなり，これは $96$-bit 分の不要な bit 幅を生成します．そこで，32 or 64-bit までの XOR 演算や Shift 演算を組み合わせることで，より位数の長い擬似乱数生成器を設計します．(計算上は，32 or 64-bit 幅の計算により，$n$-bit 幅・$2^n-1$ の擬似乱数生成器となりますが，論文中で示された例では，結局，行列下位 bit の 32 or 64-bit 幅分しか取り出していません．必要であれば，これらの値を返すことで，bit 幅を増やすことができます．ただし，乱雑性試験などは行う必要があるかもしれません．)

While it is convenient to use a $32$-bit computer word to represent an element of a vector space of dimension $32$, or dimension $64$ for compilers that allow $64$-bit integers, to get longer xorshift periods we need methods for representing elements of vector spaces of higher dimensions. A good way to do this is to allow, say, $1\times 96$ vectors made up of $32$-bit components $(x, y, z)$ or $1\times 128$ vectors with $32$-bit components $(x, y, z, w)$, etc. We are then faced with the problem of choosing matrices T that define linear transformations over such vector spaces.[1]

訳:
ベクトル $32$ や $64$ のサイズに代表される大きさの要素は，$32$-bit ワードや $64$-bit ワードを許すコンピュータにおいて便利である一方，より長い位数の xorshift を得るため，我々には高次のベクトル空間における要素を表現する方法が必要である．これを行う良い方法の一つは，例えば，$1\times 96$ のベクトルを $32$-bit 要素 $(x, y, z)$ で，あるいは，$1\times 128$ のベクトルを $32$-bit 要素 $(x, y, z, w)$ で，構成できるようにすることなどだ．このとき我々は，このようなベクトル空間を超えて線形変換を定義する行列 $T$ を，決定する問題に直面する．

A natural choice is to make T a companion matrix in block form—that is, for n = 64, 96, 128,
\[ T = \left( \begin{array}{ccc} 0 & A \\ I & B \end{array} \right),\ \ \ \ \ \ A = \left( \begin{array}{ccc} 0 & 0 & A \\ I & 0 & C \\ 0 & I & B \end{array} \right),\ \ \ \ \ \ A = \left( \begin{array}{ccc} 0 & 0 & 0 & A \\ I & 0 & 0 & C \\ 0 & I & 0 & D \\ 0 & 0 & I & B \end{array} \right). \][1]

訳:
自然な選択は，$n=64,96,128$ について，$T$ の行列形状を，ブロック形状
\[ T = \left( \begin{array}{cc} 0 & A \\ I & B \end{array} \right),\ \ \ \ \ \ A = \left( \begin{array}{ccc} 0 & 0 & A \\ I & 0 & C \\ 0 & I & B \end{array} \right),\ \ \ \ \ \ A = \left( \begin{array}{cccc} 0 & 0 & 0 & A \\ I & 0 & 0 & C \\ 0 & I & 0 & D \\ 0 & 0 & I & B \end{array} \right). \][1]
とすることである．

Then, for example, $(x, y, z)T = (y, z, x A + yC + z B)$, and we can seek $32\times 32$ matrices $A, B, C$ so the $32$-bit operations $xA, yC, zB$ are easy and $T$ has order $2^{96}−1$ in the group of $96\times 96$ nonsingular binary matrices. (I put the last column of blocks as $A, B$ or $A, C, B$ or $A, C, D, B$ because it turns out that there are full period choices, $2^{64}−1, 2^{96}−1, 2^{128}−1$, by just choosing $A=(I+L^a)(I+R^b)$ and $B=(I+R^c)$, the other blocks all the zero matrix.) Thus, for suitable choices of the triple $[a, b, c]$, these matrices all have the required orders, respectively, $2^{64}−1, 2^{96}−1, 2^{128}−1$: \[ \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right),\ \ \left( \begin{array}{ccc} 0 & 0 & (I+L^a)(I+R^b) \\ I & 0 & 0 \\ 0 & I & (I+R^c) \end{array} \right),\ \ \left( \begin{array}{cccc} 0 & 0 & 0 & (I+L^a)(I+R^b) \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & (I+R^c) \end{array} \right). \] [1]

訳:
したがって，例えば，$(x, y, z)T = (y, z, x A + yC + z B)$ について，我々は，$32$-bit の計算 $xA, yC, zB$ が簡単に，そして，$96\times 96$ の正則バイナリ行列 $T$ として $2^{96}-1$ の位数を持つように，$32\times 32$ の行列 $A, B, C$ を探すことができる．(私 ([1]の著者) は，最後の列に $A, B$，$A, C, B$，または，$A, C, D, B$ のブロックを設置した．なぜなら，これらは，単に $A=(I+L^a)(I+R^b)$，$B=(I+R^c)$ と選択し，他の区画を全てゼロ埋めするだけで，位数 $2^{64}−1, 2^{96}−1, 2^{128}−1$ を満たすからである．) したがって，3 つの変数 $[a, b, c]$ の適切な選択によって，これらの行列は全て，$2^{64}−1, 2^{96}−1, 2^{128}−1$ の必要とされた次数をそれぞれ持つ．

Some (four) choices for the triple $[a, b, c]$ are: For $n = 64, [10, 13, 10], [8, 9, 22], [2, 7, 3], [23, 3, 24]$; for $n = 96, [10, 5, 26], [13, 19, 3], [1, 17, 2], [10, 1, 26]$; for $n = 128, [5, 14, 1], [15, 4, 21], [23, 24, 3], [5, 12, 29]$. In each case, the order of the block matrix $T$ is $2^n−1$, and the essences of C procedures for generating sequences of the required length are, with $x,y,z,w$ static unsigned longs and a temporary $t$:

t=(xˆ(x<<a)); x=y; return y=(yˆ(y>>c))ˆ(tˆ(t>>b));     (period 2 64 −1).
t=(xˆ(x<<a)); x=y; y=z; return z=(zˆ(z>>c))ˆ(tˆ(t>>b));     (period $2^96 −1$).
t=(xˆ(x<<a)); x=y; y=z; z=w; return w=(wˆ(w>>c))ˆ(tˆ(t>>b));     (period $2^128−1$).

[1]

訳:
3 つの変数 $[a, b, c]$ について，いくつかの (4つの) 選択肢は，$n = 64$ について $[10, 13, 10], [8, 9, 22], [2, 7, 3], [23, 3, 24]$，$n = 96$ について $[10, 5, 26], [13, 19, 3], [1, 17, 2], [10, 1, 26]$，$n = 128$ について $[5, 14, 1], [15, 4, 21], [23, 24, 3], [5, 12, 29]$ である．それぞれの状況において，ブロック行列 $T$ の位数は，$2^n-1$ である．そして，要求された長さの数値を生成する C 言語関数の要旨は，

t=(xˆ(x<<a)); x=y; return y=(yˆ(y>>c))ˆ(tˆ(t>>b));     (period $2^{64}−1$).
t=(xˆ(x<<a)); x=y; y=z; return z=(zˆ(z>>c))ˆ(tˆ(t>>b));     (period $2^{96}−1$).
t=(xˆ(x<<a)); x=y; y=z; z=w; return w=(wˆ(w>>c))ˆ(tˆ(t>>b));     (period $2^{128}−1$).

[1]
である．このとき，$x, y, z, w$ は static unsigned long で，$t$ は一時変数である．

ここでは，上記で提案された行列のパラメータを計算します．まず，$n=64$ 行列($32\times 32$ 行列を $4$ つ組み合わせたもの) について，
\[ T = \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right) \] は，C 言語表記で，

t=(xˆ(x<<a)); x=y; return y=(yˆ(y>>c))ˆ(tˆ(t>>b)); (period $2^{64}−1$).

とされている．これをベクトル表記に戻すと， \begin{align*} & \ t=(xˆ(x<<a)); \ x=y; \ \ \ \ \ \ {\rm return} \ y=(yˆ(y>>c))ˆ(tˆ(t>>b)); \ \ \ \ (period 2^{64}−1).\\ \Leftrightarrow \ & \ t=(I+L^a)x, \ \ \ \ \ \ \ x=y_{k-1}, \ \ {\rm return} \ y_k=(I+R^c)y_{k-1}+(I+R^b)t \\ \Leftrightarrow \ & \ t=(I+L^a)y_{k-2}, \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\rm return} \ y_k=(I+R^c)y_{k-1}+(I+R^b)t\\ \Leftrightarrow \ & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\rm return} \ y_k=(I+R^c)y_{k-1}+(I+R^b)(I+L^a)y_{k-2} \\ \Leftrightarrow \ & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ {\rm return} \ y_k=(I+R^c)y_{k-1}+\beta_{k-1} \end{align*} ($\because \beta_{k-1}\equiv(I+R^b)(I+L^a)y_{k-2}$)
となる．これは，行列表記で， \[ \left( \begin{array}{c} \beta_k\\ y_k \end{array} \right) = \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right) \left( \begin{array}{c} \beta_{k-1}\\ y_{k-1} \end{array} \right) \] と表現され，$y_{k}$ はそのステップで出力 (return) される擬似乱数を表し，$\beta$ は内部変数を表す．位数が $2^n$ で初期値に戻る条件は，$k=2^n-1$ で行列 $T$ が単位乗列となる，すなわち， \begin{align*} \left( \begin{array}{c} \beta_{2^n-1}\\ y_{2^n-1} \end{array} \right) & = \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right) \cdots \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right) \left( \begin{array}{c} \beta_0\\ y_0 \end{array} \right)\\ & = \left( \begin{array}{cc} 0 & (I+L^a)(I+R^b) \\ I & (I+R^c) \end{array} \right)^{2^n-1} \left( \begin{array}{c} \beta_0\\ y_0 \end{array} \right)\\ & = T^{2^n-1} \left( \begin{array}{c} \beta_0\\ y_0 \end{array} \right)\\ & = I \left( \begin{array}{c} \beta_0\\ y_0 \end{array} \right) \end{align*} となることであるから，これまでと同様に， \[ T^{2^n-1} = I \] を満たすことである．これを満たすパラメータを計算すると，

実装

bool isFullPeriod_ZA_IB_64(std::vector<uint64>& divs64, uint N, uint a, uint b, uint c){
    sstd::bmat Z  = sstd::zeros(N, N);
    sstd::bmat I  = sstd::eye  (N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
    sstd::bmat Rc = sstd::RxShiftMat(N, c);
    sstd::bmat T = Z << (I + La)*(I + Rb) &&
                   I <<      (I + Rc);
//  sstd::printn(T);
    
    for(uint i=0; i<divs64.size()-1; i++){
        sstd::bmat Tp = T^(divs64[i]); // XORSHIFT
        if(Tp==I){ return false; }
    }
    sstd::bmat Tp = T^(divs64[divs64.size()-1]); // XORSHIFT
    if(Tp==I){ return true;
    }  else  { return false; }
}
void T_ZA_IB_64(std::vector<uint64>& divs64){
    uint N=32;
    
    uint num=0;
    printf("  a,  b,  c\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            for(uint c=0; c<N; c++){
                if(a<c && isFullPeriod_ZA_IB_64(divs64, N, a, b, c)){
                    printf("|%2u, %2u, %2u", a, b, c);
                    num++;
                    if(num%9==0){ printf("|\n"); }
                }
            }
        }
    }
}

int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

//  calculate divisors of 2^64-1.
//  std::vector<uint64> divs64 = sstd::divisor(18446744073709551616-1); // impossible in this way // below is calculated by "void cal_divsOf_2_64m1();".
//  std::vector<uint64> divs64 = {1ull, 3ull, 5ull, 15ull, 17ull, 51ull, 85ull, 255ull, 257ull, 641ull, 771ull, 1285ull, 1923ull, 3205ull, 3855ull, 4369ull, 9615ull, 10897ull, 13107ull, 21845ull, 32691ull, 54485ull, 65535ull, 65537ull, 163455ull, 164737ull, 196611ull, 327685ull, 494211ull, 823685ull, 983055ull, 1114129ull, 2471055ull, 2800529ull, 3342387ull, 5570645ull, 6700417ull, 8401587ull, 14002645ull, 16711935ull, 16843009ull, 20101251ull, 33502085ull, 42007935ull, 42009217ull, 50529027ull, 84215045ull, 100506255ull, 113907089ull, 126027651ull, 210046085ull, 252645135ull, 286331153ull, 341721267ull, 569535445ull, 630138255ull, 714156689ull, 858993459ull, 1431655765ull, 1708606335ull, 1722007169ull, 2142470067ull, 3570783445ull, 4294967295ull, 4294967297ull, 5166021507ull, 8610035845ull, 10712350335ull, 10796368769ull, 12884901891ull, 21474836485ull, 25830107535ull, 29274121873ull, 32389106307ull, 53981843845ull, 64424509455ull, 73014444049ull, 87822365619ull, 146370609365ull, 161945531535ull, 183538269073ull, 219043332147ull, 365072220245ull, 439111828095ull, 439125228929ull, 550614807219ull, 917691345365ull, 1095216660735ull, 1103806595329ull, 1317375686787ull, 2195626144645ull, 2753074036095ull, 3311419785987ull, 5519032976645ull, 6586878433935ull, 7465128891793ull, 16557098929935ull, 18764712120593ull, 22395386675379ull, 37325644458965ull, 56294136361779ull, 93823560602965ull, 111976933376895ull, 112855183834753ull, 281470681808895ull, 281479271743489ull, 338565551504259ull, 564275919173765ull, 844437815230467ull, 1407396358717445ull, 1692827757521295ull, 1918538125190801ull, 4222189076152335ull, 4785147619639313ull, 5755614375572403ull, 9592690625954005ull, 14355442858917939ull, 23925738098196565ull, 28778071877862015ull, 71777214294589695ull, 72340172838076673ull, 217020518514230019ull, 361700864190383365ull, 1085102592571150095ull, 1229782938247303441ull, 3689348814741910323ull, 6148914691236517205ull, 18446744073709551615ull}; // Below line is able to get same result.
    std::vector<uint64> divs64 = {1ull*5ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*1ull*17ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull* 1ull*257ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*  1ull*641ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*  1ull*65537ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*    1ull*6700417ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*      1ull,
                                  3ull*5ull*17ull*257ull*641ull*65537ull*6700417ull};
    
    printf("□ calculate T = (0 << A) && (I << B)\n");
    T_ZA_IB_64(divs64); printf("\n");
    
    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe 
■ measureTime_start---------------

□ calculate T = (0 << A) && (I << B)
  a,  b,  c
| 1,  2, 11| 2,  7,  3| 2,  7, 21| 3, 30,  5| 4,  3, 31| 5,  4, 23| 5,  9, 23| 5, 13, 21| 5, 27, 18|
| 5, 27, 21| 6,  1, 31| 6, 11,  8| 7,  3, 14| 7,  5, 20| 7,  6, 19| 7, 21,  8| 8,  9, 22| 8,  9, 23|
| 9, 11, 17|10,  5, 26|10,  5, 27|10,  7, 12|10,  7, 19|11,  3, 27|11, 21, 17|12,  1, 23|12, 21, 19|
|13,  1, 25|13,  1, 27|13,  6, 21|13, 19, 20|14, 13, 19|15,  1, 18|15,  3, 20|17,  1, 18|21,  4, 23|
|23,  3, 24|23,  9, 26

■ measureTime_stop----------------
--------------------------------
 Execution time:    46. 435 sec
--------------------------------

となることが分かる．(論文中の $[10, 13, 10]$ の組み合わせだけ，該当するパラメータが存在しなかった．これはどちらかの計算が間違っていることを示唆する．)

同様に $n=96$ について計算する．($n=64$ 以降は，計算途中で使用する一部の変数のサイズが，64-bit を超えるため，現行の CPU ではハードウェア的に計算することができない．そこで，GMP [8] を利用してこれを計算した．以降の計算を実行するためには，システムに GMP をインストールする必要がある．) 次のようなコードで計算を行うと，計算結果は，
実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>
#include <gmp.h>
#include <gmpxx.h> // for C++
#include "bmat_multi.hpp"

bool isFullPeriod(sstd::bmat& T, const sstd::bmat& I, std::vector<mpz_class>& divs){
    for(uint i=0; i<divs.size()-1; i++){
        sstd::bmat Tp = sstd::pow(T, divs[i]); // XORSHIFT multi
        if(Tp==I){ return false; }
    }
    sstd::bmat Tp = sstd::pow(T, divs[divs.size()-1]); // XORSHIFT multi
    if(Tp==I){ return true;
    }  else  { return false; }
}

bool isFullPeriod_96(std::vector<mpz_class>& divs, uint N, uint a, uint b, uint c){
    sstd::bmat Z  = sstd::zeros(N, N);
    sstd::bmat I  = sstd::eye  (N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
    sstd::bmat Rc = sstd::RxShiftMat(N, c);
    sstd::bmat T = Z << Z << (I + La)*(I + Rb) &&
                   I << Z <<         Z         &&
                   Z << I <<      (I + Rc)     ;
//  sstd::printn(T);
    
    return isFullPeriod(T, sstd::eye(96, 96), divs);
}
void matT_96(std::vector<mpz_class>& divs96){
    uint N=32;
    
    uint num=0;
    printf("  a,  b,  c\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            for(uint c=0; c<N; c++){
                if(a<c && isFullPeriod_96(divs96, N, a, b, c)){
//              if(isFullPeriod_96(divs, N, a, b, c)){
                    printf("|%2u, %2u, %2u", a, b, c);
                    num++;
                    if(num%9==0){ printf("|\n"); }
                }
            }
        }
    }
}

int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    mpz_class forCast = 1;
    std::vector<mpz_class> divs96 = {forCast * 1*1*5*7*13*17*97*193*241*257*673*65537*22253377,
                                     forCast * 3*1*5*7*13*17*97*193*241*257*673*65537*22253377,
                                     forCast * 3*3*1*7*13*17*97*193*241*257*673*65537*22253377,
                                     forCast * 3*3*5*1*13*17*97*193*241*257*673*65537*22253377,
                                     forCast * 3*3*5*7* 1*17*97*193*241*257*673*65537*22253377,
                                     forCast * 3*3*5*7*13* 1*97*193*241*257*673*65537*22253377,
                                     forCast * 3*3*5*7*13*17* 1*193*241*257*673*65537*22253377,
                                     forCast * 3*3*5*7*13*17*97*  1*241*257*673*65537*22253377,
                                     forCast * 3*3*5*7*13*17*97*193*  1*257*673*65537*22253377,
                                     forCast * 3*3*5*7*13*17*97*193*241*  1*673*65537*22253377,
                                     forCast * 3*3*5*7*13*17*97*193*241*257*  1*65537*22253377,
                                     forCast * 3*3*5*7*13*17*97*193*241*257*673*    1*22253377,
                                     forCast * 3*3*5*7*13*17*97*193*241*257*673*65537*       1,
                                     
                                     forCast * 3*3*5*7*13*17*97*193*241*257*673*65537*22253377};
    
    printf("□ calculate T = Z << Z << (I + La)*(I + Rb)  &&  I << Z << Z  &&  Z << I << (I + Rc)\n");
    matT_96(divs96); printf("\n");

    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe 
■ measureTime_start---------------

□ calculate T = Z << Z << (I + La)*(I + Rb)  &&  I << Z << Z  &&  Z << I << (I + Rc)
  a,  b,  c
| 1,  5,  4| 1, 17,  2| 1, 29, 15| 5,  3,  6| 5,  9, 18| 5, 23, 11| 6,  3,  8| 8,  1, 21| 8,  7, 25|
| 9, 23, 29|10,  1, 26|10,  1, 27|10,  5, 26|10,  5, 27|10, 13, 16|11,  5, 18|11,  5, 30|11, 21, 27|
|12, 13, 17|13,  3, 23|13,  5, 25|13, 19, 24|14,  7, 19|14, 13, 27|22,  5, 23|23,  1, 24

■ measureTime_stop----------------
--------------------------------
 Execution time:   376. 272 sec
--------------------------------

となる．(このとき，パラメータ $[13, 19, 3]$ は存在しなかった．)

同様に $n=128$ について計算する．次のようなコードで計算を行うと，計算結果は，
実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>
#include <gmp.h>
#include <gmpxx.h> // for C++
#include "bmat_multi.hpp"

bool isFullPeriod(sstd::bmat& T, const sstd::bmat& I, std::vector<mpz_class>& divs){
    // This function is omitted, because of the above one is same.
}

bool isFullPeriod_128(std::vector<mpz_class>& divs, uint N, uint a, uint b, uint c){
    sstd::bmat Z  = sstd::zeros(N, N);
    sstd::bmat I  = sstd::eye  (N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
    sstd::bmat Rc = sstd::RxShiftMat(N, c);
    sstd::bmat T = Z << Z << Z << (I + La)*(I + Rb) &&
                   I << Z << Z <<         Z         &&
                   Z << I << Z <<         Z         &&
                   Z << Z << I <<      (I + Rc)     ;
//  sstd::printn(T);
    
    return isFullPeriod(T, sstd::eye(128, 128), divs);
}
void matT_128(std::vector<mpz_class>& divs){
    uint N=32;
    
    uint num=0;
    printf("  a,  b,  c\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            for(uint c=0; c<N; c++){
                if(a<c && isFullPeriod_128(divs, N, a, b, c)){
//              if(isFullPeriod_128(divs, N, a, b, c)){
                    printf("|%2u, %2u, %2u", a, b, c);
                    num++;
                    if(num%9==0){ printf("|\n"); }
                }
            }
        }
    }
}

int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    mpz_class forCast = 1;
    std::vector<mpz_class> divs128 = {forCast * 1*5*17*257*641*65537*274177*6700417*67280421310721,
                                      forCast * 3*1*17*257*641*65537*274177*6700417*67280421310721,
                                      forCast * 3*5* 1*257*641*65537*274177*6700417*67280421310721,
                                      forCast * 3*5*17*  1*641*65537*274177*6700417*67280421310721,
                                      forCast * 3*5*17*257*  1*65537*274177*6700417*67280421310721,
                                      forCast * 3*5*17*257*641*    1*274177*6700417*67280421310721,
                                      forCast * 3*5*17*257*641*65537*     1*6700417*67280421310721,
                                      forCast * 3*5*17*257*641*65537*274177*      1*67280421310721,
                                      forCast * 3*5*17*257*641*65537*274177*6700417*             1,
                                      
                                      forCast * 3*5*17*257*641*65537*274177*6700417*67280421310721};
    
    printf("□ calculate T = Z << Z << Z << (I + La)*(I + Rb)  &&  I << Z << Z << Z  &&  Z << I << Z << Z  &&  Z << Z << I << (I + Rc)\n");
    matT_128(divs128); printf("\n");

    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe 
■ measureTime_start---------------

□ calculate T = Z << Z << Z << (I + La)*(I + Rb)  &&  I << Z << Z << Z  &&  Z << I << Z << Z  &&  Z << Z << I << (I + Rc)
  a,  b,  c
| 1,  3, 12| 1,  3, 15| 2,  1, 21| 2, 21,  6| 3,  2, 21| 4,  1,  5| 5, 12, 29| 6,  5, 17| 6, 11, 21|
| 6, 11, 25| 7, 11, 19| 7, 11, 20| 7, 12, 11| 8, 11, 14| 9, 13, 17|10, 11, 12|10, 11, 23|11,  5, 12|
|11,  5, 24|11,  5, 26|11,  8, 19|11, 10, 21|13,  3, 25|14,  3, 23|14, 13, 19|15,  4, 21|17,  7, 21|
|18, 13, 19|21,  2, 23|27,  5, 31|29,  3, 30

■ measureTime_stop----------------
--------------------------------
 Execution time:   874. 070 sec
--------------------------------

となる．(このとき，パラメータ $[5, 14, 1]$ および $[23, 24, 3]$ は存在しなかった．)

同様に $n=160$ について計算する．次のようなコードで計算を行うと，計算結果は，
実装

#define use_sstd_measureTime
#include <sstd/sstd.hpp>
#include <gmp.h>
#include <gmpxx.h> // for C++
#include "bmat_multi.hpp"

bool isFullPeriod_160(std::vector<mpz_class>& divs, uint N, uint a, uint b, uint c){
    sstd::bmat Z  = sstd::zeros(N, N);
    sstd::bmat I  = sstd::eye  (N, N);
    sstd::bmat La = sstd::LxShiftMat(N, a);
    sstd::bmat Rb = sstd::RxShiftMat(N, b);
    sstd::bmat Rc = sstd::RxShiftMat(N, c);
    sstd::bmat T = Z << Z << Z << Z << (I + La)*(I + Rb) &&
                   I << Z << Z << Z <<         Z         &&
                   Z << I << Z << Z <<         Z         &&
                   Z << Z << I << Z <<         Z         &&
                   Z << Z << Z << I <<      (I + Rc)     ;
//  sstd::printn(T);
    
    return isFullPeriod(T, sstd::eye(160, 160), divs);
}
void matT_160(std::vector<mpz_class>& divs){
    uint N=32;
    
    uint num=0;
    printf("  a,  b,  c\n");
    for(uint a=0; a<N; a++){
        for(uint b=0; b<N; b++){
            for(uint c=0; c<N; c++){
                if(a<c && isFullPeriod_160(divs, N, a, b, c)){
//              if(isFullPeriod_160(divs, N, a, b, c)){
                    printf("|%2u, %2u, %2u", a, b, c);
                    num++;
                    if(num%9==0){ printf("|\n"); }
                }
            }
        }
    }
}

int main(){
    printf("■ measureTime_start---------------\n\n"); time_m timem; sstd::measureTime_start(timem);

    mpz_class forCast = 1;
    std::vector<mpz_class> divs160 = {forCast * 1*5*5*11*17*31*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*1*1*11*17*31*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*1*11*17*31*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5* 1*17*31*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11* 1*31*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17* 1*41*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31* 1*257*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*  1*61681*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*257*    1*65537*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*257*61681*    1*414721*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*257*61681*65537*     1*4278255361*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*257*61681*65537*414721*         1*44479210368001,
                                      forCast * 3*5*5*11*17*31*41*257*61681*65537*414721*4278255361*             1,
                                      
                                      forCast * 3*5*5*11*17*31*41*257*61681*65537*414721*4278255361*44479210368001};

    printf("□ calculate n=160\n");
    matT_160(divs160); printf("\n");

    printf("\n■ measureTime_stop----------------\n"); sstd::measureTime_stop(timem);
    return 0;
}

実行結果

$ ./exe 
■ measureTime_start---------------

□ calculate n=160
  a,  b,  c
| 1,  1, 20| 1,  1, 21| 1,  3,  4| 1, 13,  7| 2,  1,  4| 2,  3,  7| 2,  5, 27| 3, 23, 13| 4,  9,  5|
| 5,  3, 27| 5,  3, 29| 6, 11, 20| 6, 11, 31| 7, 11, 20| 8,  9, 14|10,  5, 11|13, 15, 17|16,  5, 18|


■ measureTime_stop----------------
--------------------------------
 Execution time:  2675. 855 sec
--------------------------------

となる．(このとき，パラメータ $[7, 13, 6]$ は存在しなかった．)

以降，省略．

4 Summary

省略．

Summary

実際に Xorshift 擬似乱数生成器のハイパーパラメータを計算すると，論文と一致しないパラメータが多数出現した．一致するパラメータについては，ある程度信頼できると考えられると同時に，複数プログラムにより，妥当性を検証する必要がある．

Future work

■[2] において指摘されているように，

Google Chrome の JavaScript エンジン V8 で実際に採用されたアルゴリズムは xorshift を改良した xorshift+ の128ビット版 (xorshift128+)[2]

であるため，今後 xorshift128+ に関する論文 [9] を確認する．

■乱雑性試験を行う．

Acknowledgments.

本記事を投稿するにあたり，パラメータ計算の一部で $2^{128}-1$ および $2^{160}-1$ の素因数分解を求めることが必要であった．kivantium さんには，この計算に非常に手こずりっていたところを助けて頂いた．改めて感謝申し上げる．

WolframAlpha is your friend! https://t.co/hjJyCvwifl)
— きばん (@kivantium) 2018年5月31日

Source code

本投稿で使用したソースコードを下記に示す．

github.com/admiswalker/~/Xorshift_Ver00.03_00

Appendix

本投稿で使用したソースコードの説明については，上記を直接参照するか，[3] または，bmat の簡単な使い方について，doc/sstd/bmat を参照するとよい．

References

[1] Marsaglia, George. 2003. Xorshift RNGs. Journal of Statistical Software. 8, 14, 1--6.
[2] Google Chromeが採用した、擬似乱数生成アルゴリズム「xorshift」の数理
[3] バイナリ行列演算ライブラリの開発と Xorshift 疑似乱数生成器のパラメータ計算
[4] proofwiki.org - Prime Factors of 2^64 - 1
[5] A Primality Test
[6] 約数 - Wikipedia
[7] Marsaglia, George and Tsay, L. H., 1985, Matrices and the structure of random number sequences, Linear Algebra and its Applications, 67, 147–156.
[8] The GNU Multiple Precision Arithmetic Library - 2018年06月02日閲覧
[9] Vigna, S. (2017). Further scramblings of Marsaglia’s xorshift generators. Journal of Computational and Applied Mathematics, 315, 175-181. - 2018年06月02日閲覧

追記1. 本題からは外れるが，「Xorshift128+の32個の出力列が与えられた場合にその先を予測させる問題」に関する興味深い記事を発見したため，リンクする．- SMT Solver無しでXorshift128+を解く - 2018年06月20日閲覧
追記2. 元論文[1]の誤植及び計算ミスについては，次のリンクにおいても指摘されていることを確認した．良い乱数・悪い乱数 - 2018年09月08日閲覧．(更新履歴によると，少なくとも 2013 年には発見されていた問題．「良い乱数・悪い乱数で Xorshift の間違いを追加しました。(2013/5/1)」 - 2018年09月08日閲覧)

ADMIS_Walker のぶろぐ

2018年6月2日

論文の紹介と検証：Marsaglia, G. 2003. Xorshift RNGs.

0 件のコメント:

コメントを投稿