Yes, thanks for taking the time to look at this material.
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]The opening inference once again from Bancel's latest 2017 paper you mentioned is that these noise-based devices are true random number generators... when it's not exactly clear that they are... see my previous comments.
I think Bancel is just using that phrase to distinguish devices based on unpredictable physical processes from pseudo-random number generators that use deterministic algorithms. I don't think there's any implication that the "true" RNGs behave ideally - in the sense that the probability of producing a 0 is equal to the probability of producing a 1, or that each bit is statistically independent of the preceding bits.
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]I'm rather frustratingly back to square one again, without knowing the precise way they get from noise A), to raw-data B), to some sort of a final number C) which is timestamped every second, and uploaded to the GCP.
If I understand correctly, the process is as follows, in general terms (there's a rough code sketch after the list):
(1) The noise is used to generate a stream of bits.
(2) An XOR mask is applied to remove bias.
(3) For each second, the first 200 bits are extracted and added up (which, if the RNGs were behaving ideally, would produce a binomially distributed random variable with mean np = 100 and variance np(1-p) = 50, for n = 200 and p = 0.5), and that is what initially goes into the database.
(4) Periodically, the values in the database are renormalised based on the long-term measured variance for each device, to try to make the variance equal to the ideal value. (I'm not sure this additional processing is necessarily a good thing. Maybe it would be better to keep the values produced by step (3), and to bear in mind when analysing them that the variance may depart slightly from the ideal value.)
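To make that concrete, here is a minimal sketch in Python of steps (1)-(4) as I understand them. The 200-bit blocks, the alternating XOR mask and the renormalisation against the long-run variance come from the description above; the function names, the simulated 1% bias and the exact form of the renormalisation are my own illustration, not the GCP's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_bits(n, bias=0.01):
    """(1) Stand-in for a physical noise source: bits with a small bias."""
    return (rng.random(n) < 0.5 + bias).astype(np.uint8)

def apply_xor_mask(bits):
    """(2) XOR with an alternating 0,1,0,1,... mask, so that a constant bias
    cancels out of each 200-bit sum (100 flipped and 100 unflipped bits)."""
    mask = (np.arange(bits.size) % 2).astype(np.uint8)
    return bits ^ mask

def second_sums(bits, block=200):
    """(3) Sum 200 bits for each second. Ideally Binomial(200, 0.5):
    mean np = 100, variance np(1-p) = 50."""
    usable = (bits.size // block) * block
    return bits[:usable].reshape(-1, block).sum(axis=1)

def renormalise(sums, ideal_var=50.0):
    """(4) One plausible form of the adjustment: rescale deviations from 100
    so the long-run variance matches the ideal value."""
    return 100.0 + (sums - 100.0) * np.sqrt(ideal_var / sums.var())

raw = noise_bits(200 * 50_000)           # 50,000 simulated "seconds"
sums = second_sums(apply_xor_mask(raw))
print(sums.mean(), sums.var())           # close to 100 and 50
print(renormalise(sums).var())           # forced to the ideal value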
More specifically, the devices are mostly of two kinds, called Orion and Mindsong.
In Orion, the output bitstream is produced by XORing together two bitstreams generated from noise. (If there is a small bias in each of the two bitstreams, the bias in the combined bitstream is reduced to something on the order of the square of the individual biases.) The XOR mask is an alternating string of 0s and 1s (hence the reference you saw to flipping every other bit).
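That quadratic reduction is easy to check numerically. The little sketch below assumes two independent streams, each with a 5% bias towards 1 (an exaggerated figure of my own choosing, just to make the effect easy to see, not a measured value):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000_000
a = (rng.random(n) < 0.55).astype(np.uint8)   # stream A, bias +0.05
b = (rng.random(n) < 0.55).astype(np.uint8)   # stream B, bias +0.05
combined = a ^ b                              # Orion-style XOR of the two streams

print(a.mean() - 0.5)          # about +0.05
print(combined.mean() - 0.5)   # about -0.005 = -2 * 0.05 * 0.05
```

(The exact combined probability of a 1 is 2 * 0.55 * 0.45 = 0.495, so the residual bias is the product of the two individual biases, up to a factor of 2.)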
In Mindsong, the XORing is built into the device, and uses a patented 560-bit mask. As I mentioned before, this doesn't work particularly well when applied to only 200 bits, because it produces a spurious increase in the variance of the sum of the 200 bits.
Anyhow, it's really only the sums of 200 bits for each second that are important for the GCP analysis, not the individual bits. The non-ideal behaviour of the individual bits does affect the sums, because although the mean will be 100 thanks to the XORing, the frequency distribution of the values (including the value of the variance) will depart slightly from the ideal binomial distribution. And in principle values of the sums of 200 bits for successive seconds for the same device could be slightly correlated. But there is no reason for the sums of 200 bits for different devices to be correlated.
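If one wanted to check those last two points against actual data, a quick way would be to look at the lag-1 autocorrelation of a device's per-second sums, and the correlation between two devices over the same seconds; for ideal devices both should be indistinguishable from zero. The sketch below just uses simulated ideal devices as placeholders; the variable names and the simulated day of data are mine, not the GCP's format.

```python
import numpy as np

def lag1_autocorr(sums):
    """Correlation between each second's sum and the next second's sum."""
    s = np.asarray(sums, dtype=float)
    return np.corrcoef(s[:-1], s[1:])[0, 1]

def cross_corr(sums_a, sums_b):
    """Correlation between two devices' sums over the same seconds."""
    return np.corrcoef(sums_a, sums_b)[0, 1]

# Placeholder data: one simulated day (86,400 seconds) of ideal 200-bit sums
rng = np.random.default_rng(2)
dev_a = rng.binomial(200, 0.5, size=86_400)
dev_b = rng.binomial(200, 0.5, size=86_400)

print(lag1_autocorr(dev_a))      # ~0 for an ideal device
print(cross_corr(dev_a, dev_b))  # ~0 for independent devices
```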
(2017-09-15, 12:53 AM)Max_B Wrote: [ -> ]I don't really know what to make of it... is it really random, or is some signal embedded in the data... or is it just a statistical wet dream, and there isn't anything really there other than masses of data one can find all sorts of stuff in...
That is a danger for the post hoc analyses, and it seems that Peter Bancel eventually concluded his findings of structure in the data were inconclusive for this reason. There was also some post hoc analysis of the data for 9-11-2001 which was similarly criticised. But it shouldn't be a problem for the formal series of pre-registered hypotheses.