The Global Consciousness Project


Thanks.

I'll try to reply later, but will be a bit busy morning and early afternoon.

In the meantime, if the problem you've been having with the thread is related to one I had before, this extra message may solve it.
(2017-09-14, 04:33 PM)Max_B Wrote:  but the authors of the paper I keep mentioning claim that it is theoretically impossible to prove that the output of such a noise-based RNG is truly random, because you can't ever know everything about what is hidden within the noise - i.e. the cause of the noise is not well understood.

Having done some sound engineering, I can back you up on this, if I'm understanding the conversation right. You can take a noise sample and use it to remove the noise across the piece you are working on, but there is always some residue due to hidden frequencies.
(2017-09-15, 12:53 AM)Max_B Wrote: Thanks... I'm not getting anywhere here fast... I've been reading older papers all night, it's now 1:47am here, and I'm no closer to understanding the process, so I'm going to bed.

The opening inference, once again, from Bancel's latest 2017 paper you mentioned is that these noise-based devices are true random number generators... when it's not exactly clear that they are... see my previous comments.

I'm rather frustratingly back to square one again, without knowing the precise way they get from noise (A), to raw data (B), to some sort of final number (C) which is timestamped every second and uploaded to the GCP.

It sort of seems like the experiment assumes true randomness is the case, but the encryption lectures I've been listening to tonight all seem to mirror the comments I've read elsewhere, that there is no actual way to prove whether the RNG data is really random or not... all I've gathered is that you can conduct as many tests as you can on the device to try to show that it's not random... the question almost seems philosophical.

Some of the GCP communications don't seem very interested in the devices, other than that they are true noise-based random number generators... but that still seems like a sticking point to me... one of the devices seems to use dual noise-data streams and XORs both streams together, and then it's mentioned in a personal communication that every second bit is flipped... I don't really know what to make of it... is it really random, or is some signal embedded in the data... or is it just a statistical wet dream, and there isn't anything really there other than masses of data one can find all sorts of stuff in...

I have a sore head.
Hey Max-

Good for you for digging in and really trying to get to the details!

I'll join you if I get a little extra time. Got my hands full next day or so.
Yes, thanks for taking the time to look at this material.

(2017-09-15, 12:53 AM)Max_B Wrote: The opening inference, once again, from Bancel's latest 2017 paper you mentioned is that these noise-based devices are true random number generators... when it's not exactly clear that they are... see my previous comments.

I think Bancel is just using that phrase to distinguish devices based on unpredictable physical processes from pseudo-random number generators that use deterministic algorithms. I don't think there's any implication that the "true" RNGs behave ideally - in the sense that the probability of producing a 0 is equal to the probability of producing a 1, or that each bit is statistically independent of the preceding bits.

(2017-09-15, 12:53 AM)Max_B Wrote: I'm rather frustratingly back to square one again, without knowing the precise way they get from noise (A), to raw data (B), to some sort of final number (C) which is timestamped every second and uploaded to the GCP.

If I understand correctly, the process is as follows, in general terms (a rough sketch in code follows the list):
(1) The noise is used to generate a stream of bits.
(2) An XOR mask is applied to remove bias.
(3) For each second, the first 200 bits are extracted and added up (which if the RNGs were behaving ideally would produce a binomially distributed random variable with mean 100 and variance 50), and that is what initially goes into the database.
(4) Periodically, the values in the database are renormalised based on the long-term measured variance for each device, to try to make the variance equal to the ideal value (I'm not sure this additional processing is necessarily a good thing. Maybe it would be better to keep the values produced by step (3), and to bear in mind when analysing them that the variance may depart slightly from the ideal value.)
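
To put steps (1)-(3) in concrete terms, here is a rough sketch in Python. The function names and the simulated "noise" are mine, not the GCP's, and the bias value is just an illustration:

import random

def noise_bits(n, bias=0.02):
    # Stand-in for the physical noise source: a slightly biased bit stream.
    return [1 if random.random() < 0.5 + bias else 0 for _ in range(n)]

def apply_mask(bits):
    # The alternating 0,1,0,1,... XOR mask, i.e. flip every other bit.
    return [b ^ (i % 2) for i, b in enumerate(bits)]

def one_second_trial(bits_per_second=200):
    # Steps (1)-(3): generate bits, apply the mask, sum the first 200.
    return sum(apply_mask(noise_bits(bits_per_second)))

# For an ideal device the sum is Binomial(200, 0.5): mean 100, variance 50.
trials = [one_second_trial() for _ in range(10000)]
mean = sum(trials) / len(trials)
var = sum((t - mean) ** 2 for t in trials) / len(trials)
print(round(mean, 2), round(var, 2))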

More specifically, the devices are mostly of two kinds, called Orion and Mindsong.

In Orion, the bitstream is produced from two bitstreams generated by noise, by XORing them together. (If there is a small bias in each of the two bitstreams, the bias in the combined bitstream is reduced to something on the order of the square of the individual biases.) The XOR mask is an alternating string of 0s and 1s (hence the reference you saw to flipping every other bit).
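
Just to illustrate that "square of the bias" point (a toy simulation of my own, not the Orion circuitry): if each stream produces a 1 with probability 0.5 + e, the XOR of the two produces a 1 with probability 2(0.5 + e)(0.5 - e) = 0.5 - 2e^2, so a 5% bias per stream becomes roughly a 0.5% bias in the combined stream:

import random

def biased_stream(n, eps):
    # An independent stream that produces a 1 with probability 0.5 + eps.
    return [1 if random.random() < 0.5 + eps else 0 for _ in range(n)]

n, eps = 1_000_000, 0.05
combined = [a ^ b for a, b in zip(biased_stream(n, eps), biased_stream(n, eps))]

# Expected proportion of 1s: 0.5 - 2*eps**2 = 0.495, versus 0.55 per input stream.
print(sum(combined) / n)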

In Mindsong, the XORing is built into the device, and uses a patented 560-bit mask. As I mentioned before, this doesn't work particularly well when applied to only 200 bits, because it produces a spurious increase in the variance of the sum of the 200 bits.

Anyhow, it's really only the sums of 200 bits for each second that are important for the GCP analysis, not the individual bits. The non-ideal behaviour of the individual bits does affect the sums, because although the mean will be 100 thanks to the XORing, the frequency distribution of the values (including the value of the variance) will depart slightly from the ideal binomial distribution. And in principle values of the sums of 200 bits for successive seconds for the same device could be slightly correlated. But there is no reason for the sums of 200 bits for different devices to be correlated.

(2017-09-15, 12:53 AM)Max_B Wrote: I don't really know what to make of it... is it really random, or is some signal embedded in the data... or is it just a statistical wet dream, and there isn't anything really there other than masses of data one can find all sorts of stuff in...

That is a danger for the post hoc analyses, and it seems that Peter Bancel eventually concluded that his findings of structure in the data were inconclusive for this reason. There was also some post hoc analysis of the data for 9-11-2001 which was similarly criticised. But it shouldn't be a problem for the formal series of pre-registered hypotheses.
(2017-09-15, 06:49 PM)Chris Wrote: Yes, thanks for taking the time to look at this material. [...]
Chris-
Super summary of how these things work. That's how I read it too.

The only thing I'd add is that the typical failure mechanisms are in the power supply and component drift. Component drift is mostly dealt with by the XOR process, because it typically causes a bias (offset) from the mean - until the components fail outright, that is.

When a component fails or the power supply goes off the rails, the failure is very easy to see (they showed some plots where the failure is quite obvious) and the device is simply taken off-line until it can be replaced.

There is a bit more manual monitoring and tweaking of the data streams than I would like to see, but it seems that this is the nature of the beast.
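
To make that concrete with a toy check of my own (this is not the GCP's monitoring code, just an illustration of why such failures are easy to see in a plot): because an ideal device's per-second sums have mean 100 and variance 50, even a small sustained offset from a drifting component or a sick power supply stands out once you average over an hour or so:

import math

def flag_drifting_blocks(sums, window=3600, threshold=5.0):
    # sums: the per-second 200-bit sums for one device.
    # For an ideal device, the mean of `window` sums has standard deviation
    # sqrt(50 / window), so a sustained offset shows up very quickly.
    sd_of_mean = math.sqrt(50.0 / window)
    flagged = []
    for start in range(0, len(sums) - window + 1, window):
        block = sums[start:start + window]
        block_mean = sum(block) / window
        if abs(block_mean - 100.0) > threshold * sd_of_mean:
            flagged.append((start, round(block_mean, 2)))
    return flagged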
(2017-09-15, 07:36 PM)jkmac Wrote: [...] There is a bit more manual monitoring and tweaking of the data streams than I would like to see, but it seems that this is the nature of the beast.

Ah, yes, between steps (3) and (4) I should also have mentioned that outlying values (45 or more away from the mean - or about 6.4 standard deviations) are removed, together with data considered to be "unstable" - amounting to 0.02% of otherwise valid data. The instability criteria are described in the section "Data vetting and normalisation" here:
https://www.researchgate.net/publication...al_details
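
In code terms, the outlier removal amounts to something like this (my own paraphrase of the cutoff just described, not the GCP's actual script - the "unstable" criteria in the linked section are more involved):

import math

def drop_outliers(sums, mean=100.0, cutoff=45):
    # Remove per-second sums that are 45 or more away from the mean;
    # with an ideal variance of 50, that is 45 / sqrt(50), about 6.4 standard deviations.
    return [s for s in sums if abs(s - mean) < cutoff]

print(round(45 / math.sqrt(50), 2))   # ~6.36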
(2017-09-15, 08:53 PM)Max_B Wrote: Sorry Chris, (1)-(4) is pretty much what the paper says... but it means absolutely nothing to me, and doesn't make any sense... remember you are talking to a layman... the rest means nothing either. I'm interested in how bias from a noise-based device could get into the data analysis presented by the GCP.

As far as I can see, such a signal does get into the data... because they are certainly not adding up 200-bit chunks of XORed, equally distributed bits (which would always give them a sum of 100) and uploading that figure to the GCP database... because that would be pointless... we would simply get a database containing time-stamped 100s every second from every RNG in the network. A database containing billions of the number 100 would tell them nothing, I'm sure... so despite what you have previously said, this cannot be what they are doing.

Below are three truncated lines of raw Comma Separated Values (CSV) data taken from the GCP database... what they contain is not clearly explained... the first line starts with some sort of field headers; the significance of the numbers to the right is unknown, but I suspect they are the reference numbers for each RNG in the GCP network.

The second and third lines clearly start with a timestamp, date and time corresponding to the headers in the line above, then a truncated stream of numbers which seem to fall above and below the 100 figure. I assume these are a series of the summed 200 equally balanced bits (100 '0's and 100 '1's) that you referred to, which would remove all bias from the RNG device.

But what is absolutely clear is that these numbers cannot be the sum of 200 equally balanced 0s and 1s, because if they were, all we would see is 100 for each Comma Separated Value...

12,"gmtime","Date/Time",1,28,37,100,101,102,105,106,108,110,111,112,114,115,116,119,134,161,226,228,231,1004,1005,1021,1022,1025,1026,...

13,1102294861,2004-12-06 01:01:01,111,106,97,93,93,100,116,103,91,88,94,103,85,94,94,99,100,102,103,97,89,114,91,93,100,96,,100,89,103,...

13,1102294862,2004-12-06 01:01:02,95,105,127,106,94,105,100,100,96,99,88,98,101,107,95,103,106,101,105,102,96,95,94,99,101,107,88,100,...


So whatever you said before, Chris - that these 200 bits taken from the RNG are made equally balanced 0s and 1s by XORing - can't be right... I can see, and anybody looking can see, higher and lower deviations from the 100 we should expect...

So how do we get to these numbers above, which are taken from the GCP database? According to you that shouldn't be possible, because the sum of 200 XORed bits should always equal 100?

I think I see the difficulty.

What I was doing before with my example bit sequence was trying to demonstrate that, regardless of the input, after XORing with a balanced mask, the expected numbers of 0s and 1s would become equal. That is, if we average the numbers of 0s and 1s over all the possible positions of the mask, those two average numbers will be equal.

That's not to say that for each of the possible positions of the mask individually, the numbers of 0s and 1s will be equal. They won't in general (and they weren't in the example I made up). So in the GCP data, for a particular bitstream produced by noise and a particular position of the mask relative to that bitstream, in general the numbers of 0s and 1s won't be equal. So in general when 200 bits are added up the answer won't be 100.

But if we consider the average value of the sum of the 200 bits - that is, averaged over all the possible bitstreams and all the possible positions of the mask - then the numbers of 0s and 1s must come out equal. So the average value of the sum is 100, and the XORing has overcome the bias to produce the right average - the average which would be produced if the bitstreams were behaving ideally. (And the average would still be right no matter how badly behaved the input bitstream was. But if the bitstream was very badly behaved, other features of the frequency distribution, such as the variance, wouldn't be close to their ideal values.)
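
A quick toy example of my own (assuming a constant per-bit bias, which real devices won't have exactly): take a badly biased stream with P(1) = 0.6, flip every other bit, and sum blocks of 200. The individual sums scatter above and below 100, just as in the database extracts, the long-run mean is 100, but the variance comes out near 200 x 0.6 x 0.4 = 48 rather than the ideal 50:

import random

def masked_sum(p=0.6, n=200):
    # One second's worth of bits from a badly biased source, with the
    # alternating XOR mask applied (flip every other bit), then summed.
    bits = [1 if random.random() < p else 0 for _ in range(n)]
    return sum(b ^ (i % 2) for i, b in enumerate(bits))

sums = [masked_sum() for _ in range(20000)]
mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / len(sums)
print(min(sums), max(sums))           # the individual sums are not all 100
print(round(mean, 2), round(var, 2))  # mean ~100, variance ~48 rather than 50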

[Edited to add: Your interpretation of the extracts from the database is correct. The first line has a sequence of reference numbers for the RNGs whose data are included in the file. Each of the other lines contains data for one second, and the values are the sums of 200 bits for the RNGs specified in the first line.]
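
For anyone who wants to poke at those files, here is a minimal way to read an extract laid out like the one above. This is my own sketch, assuming full rather than truncated rows; the leading field on each line looks like a record-type code, and empty fields appear to mean no data for that device:

import csv

def read_gcp_extract(path):
    # Returns a list of (date_time, {rng_id: sum_of_200_bits}) pairs,
    # assuming the layout quoted above: a header row listing the RNG ids,
    # then one row per second with epoch time, date/time and the sums.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    rng_ids = rows[0][3:]              # ids start after the first three header fields
    records = []
    for row in rows[1:]:
        date_time = row[2]
        sums = {rid: (int(v) if v else None)   # empty field = missing data
                for rid, v in zip(rng_ids, row[3:])}
        records.append((date_time, sums))
    return records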
