Yes, thanks for taking the time to look at this material.
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]The opening inference once again from Bancel's latest 2017 paper you mentioned is that these noise-based devices are true random number generators... when it's not exactly clear that they are... see my previous comments.
I think Bancel is just using that phrase to distinguish devices based on unpredictable physical processes from pseudo-random number generators that use deterministic algorithms. I don't think there's any implication that the "true" RNGs behave ideally - in the sense that the probability of producing a 0 is equal to the probability of producing a 1, or that each bit is statistically independent of the preceding bits.
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]I'm rather frustratingly back to square one again, without knowing the precise way they get from noise A), to raw-data B), to some sort of a final number C) which is timestamped every second, and uploaded to the GCP.
If I understand correctly, the process is as follows, in general terms (there's a rough code sketch after the list):
(1) The noise is used to generate a stream of bits.
(2) An XOR mask is applied to remove bias.
(3) For each second, the first 200 bits are extracted and added up (which, if the RNGs were behaving ideally, would produce a binomially distributed random variable with mean np = 100 and variance np(1-p) = 50, for n = 200 and p = 0.5), and that is what initially goes into the database.
(4) Periodically, the values in the database are renormalised based on the long-term measured variance for each device, to try to make the variance equal to the ideal value. (I'm not sure this additional processing is necessarily a good thing. Maybe it would be better to keep the values produced by step (3), and to bear in mind when analysing them that the variance may depart slightly from the ideal value.)
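To make that concrete, here is a minimal sketch in Python of steps (1)-(4) as I understand them. The 200-bit blocks, the alternating XOR mask and the renormalisation against the long-run variance come from the description above; the function names, the simulated 1% bias and the exact form of the renormalisation are my own illustration, not the GCP's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_bits(n, bias=0.01):
    """(1) Stand-in for a physical noise source: bits with a small bias."""
    return (rng.random(n) < 0.5 + bias).astype(np.uint8)

def apply_xor_mask(bits):
    """(2) XOR with an alternating 0,1,0,1,... mask, so that a constant bias
    cancels out of each 200-bit sum (100 flipped and 100 unflipped bits)."""
    mask = (np.arange(bits.size) % 2).astype(np.uint8)
    return bits ^ mask

def second_sums(bits, block=200):
    """(3) Sum 200 bits for each second. Ideally Binomial(200, 0.5):
    mean np = 100, variance np(1-p) = 50."""
    usable = (bits.size // block) * block
    return bits[:usable].reshape(-1, block).sum(axis=1)

def renormalise(sums, ideal_var=50.0):
    """(4) One plausible form of the adjustment: rescale deviations from 100
    so the long-run variance matches the ideal value."""
    return 100.0 + (sums - 100.0) * np.sqrt(ideal_var / sums.var())

raw = noise_bits(200 * 50_000)           # 50,000 simulated "seconds"
sums = second_sums(apply_xor_mask(raw))
print(sums.mean(), sums.var())           # close to 100 and 50
print(renormalise(sums).var())           # forced to the ideal value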
More specifically, the devices are mostly of two kinds, called Orion and Mindsong.
In Orion, the output bitstream is produced by XORing together two bitstreams generated from noise. (If there is a small bias in each of the two bitstreams, the bias in the combined bitstream is reduced to something on the order of the square of the individual biases.) The XOR mask is an alternating string of 0s and 1s (hence the reference you saw to flipping every other bit).
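That quadratic reduction is easy to check numerically. The little sketch below assumes two independent streams, each with a 5% bias towards 1 (an exaggerated figure of my own choosing, just to make the effect easy to see, not a measured value):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000_000
a = (rng.random(n) < 0.55).astype(np.uint8)   # stream A, bias +0.05
b = (rng.random(n) < 0.55).astype(np.uint8)   # stream B, bias +0.05
combined = a ^ b                              # Orion-style XOR of the two streams

print(a.mean() - 0.5)          # about +0.05
print(combined.mean() - 0.5)   # about -0.005 = -2 * 0.05 * 0.05
```

(The exact combined probability of a 1 is 2 * 0.55 * 0.45 = 0.495, so the residual bias is the product of the two individual biases, up to a factor of 2.)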
In Mindsong, the XORing is built into the device, and uses a patented 560-bit mask. As I mentioned before, this doesn't work particularly well when applied to only 200 bits, because it produces a spurious increase in the variance of the sum of the 200 bits.
Anyhow, it's really only the sums of 200 bits for each second that are important for the GCP analysis, not the individual bits. The non-ideal behaviour of the individual bits does affect the sums, because although the mean will be 100 thanks to the XORing, the frequency distribution of the values (including the value of the variance) will depart slightly from the ideal binomial distribution. And in principle values of the sums of 200 bits for successive seconds for the same device could be slightly correlated. But there is no reason for the sums of 200 bits for different devices to be correlated.
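If one wanted to check those last two points against actual data, a quick way would be to look at the lag-1 autocorrelation of a device's per-second sums, and the correlation between two devices over the same seconds; for ideal devices both should be indistinguishable from zero. The sketch below just uses simulated ideal devices as placeholders; the variable names and the simulated day of data are mine, not the GCP's format.

```python
import numpy as np

def lag1_autocorr(sums):
    """Correlation between each second's sum and the next second's sum."""
    s = np.asarray(sums, dtype=float)
    return np.corrcoef(s[:-1], s[1:])[0, 1]

def cross_corr(sums_a, sums_b):
    """Correlation between two devices' sums over the same seconds."""
    return np.corrcoef(sums_a, sums_b)[0, 1]

# Placeholder data: one simulated day (86,400 seconds) of ideal 200-bit sums
rng = np.random.default_rng(2)
dev_a = rng.binomial(200, 0.5, size=86_400)
dev_b = rng.binomial(200, 0.5, size=86_400)

print(lag1_autocorr(dev_a))      # ~0 for an ideal device
print(cross_corr(dev_a, dev_b))  # ~0 for independent devices
```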
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]I don't really know what to make of it... is it really random, or is some signal embedded in the data... or is it just a statistical wet dream, and there isn't anything really there other than masses of data one can find all sorts of stuff in...
That is a danger for the post hoc analyses, and it seems that Peter Bancel eventually concluded his findings of structure in the data were inconclusive for this reason. There was also some post hoc analysis of the data for 9-11-2001 which was similarly criticised. But it shouldn't be a problem for the formal series of pre-registered hypotheses.