The Global Consciousness Project
Thanks.
I'll try to reply later, but will be a bit busy this morning and early afternoon. In the meantime, if the problem you've been having with the thread is related to one I had before, this extra message may solve it.
(2017-09-14, 04:33 PM)Max_B Wrote: but the authors of the paper - I keep mentioning - claim that it is theoretically impossible to prove that the output which has originated from such a noise-based RNG is truly random, because you can't ever know everything about what is hidden within the noise - i.e. the cause of the noise is not well understood.

Having done some sound engineering, I can back you up on this, if I'm understanding the conversation right. You can take a noise sample and use it to remove the noise across the piece you are working on, but there is always some residue due to hidden frequencies.

(2017-09-15, 12:53 AM)Max_B Wrote: Thanks... I'm not getting anywhere here fast... I've been reading older papers all night, it's now 1:47am here, and I'm no clearer to understanding the process, and I'm going to bed.

Hey Max - Good for you for digging in and really trying to get to the details! I'll join you if I get a little extra time. Got my hands full for the next day or so.
Yes, thanks for taking the time to look at this material.
(2017-09-15, 12:53 AM)Max_B Wrote: The opening inference once again from Bancel's latest 2017 paper you mentioned is that these noise-based devices are true random number generators... when it's not exactly clear that they are... see my previous comments.

I think Bancel is just using that phrase to distinguish devices based on unpredictable physical processes from pseudo-random number generators that use deterministic algorithms. I don't think there's any implication that the "true" RNGs behave ideally - in the sense that the probability of producing a 0 is equal to the probability of producing a 1, or that each bit is statistically independent of the preceding bits.

(2017-09-15, 12:53 AM)Max_B Wrote: I'm rather frustratingly back to square one again, without knowing the precise way they get from noise A), to raw data B), to some sort of a final number C) which is timestamped every second, and uploaded to the GCP.

If I understand correctly, the process is as follows, in general terms (there's a rough code sketch of steps (1)-(3) at the end of this post):

(1) The noise is used to generate a stream of bits.
(2) An XOR mask is applied to remove bias.
(3) For each second, the first 200 bits are extracted and added up (which, if the RNGs were behaving ideally, would produce a binomially distributed random variable with mean 100 and variance 50), and that is what initially goes into the database.
(4) Periodically, the values in the database are renormalised based on the long-term measured variance for each device, to try to make the variance equal to the ideal value.

(I'm not sure the additional processing in step (4) is necessarily a good thing. Maybe it would be better to keep the values produced by step (3), and to bear in mind when analysing them that the variance may depart slightly from the ideal value.)

More specifically, the devices are mostly of two kinds, called Orion and Mindsong. In Orion, the bitstream is produced by XORing together two bitstreams generated by noise. (If there is a small bias in each of the two bitstreams, the bias in the combined bitstream is reduced to something on the order of the square of the individual biases.) The XOR mask is an alternating string of 0s and 1s (hence the reference you saw to flipping every other bit). In Mindsong, the XORing is built into the device, and uses a patented 560-bit mask. As I mentioned before, this doesn't work particularly well when applied to only 200 bits, because it produces a spurious increase in the variance of the sum of the 200 bits.

Anyhow, it's really only the sums of 200 bits for each second that are important for the GCP analysis, not the individual bits. The non-ideal behaviour of the individual bits does affect the sums, because although the mean will be 100 thanks to the XORing, the frequency distribution of the values (including the value of the variance) will depart slightly from the ideal binomial distribution. And in principle the values of the sums of 200 bits for successive seconds from the same device could be slightly correlated. But there is no reason for the sums of 200 bits from different devices to be correlated.

(2017-09-15, 12:53 AM)Max_B Wrote: I don't really know what to make of it... is it really random, or is some signal embedded in the data... or is it just a statistical wet dream, and there isn't anything really there other than masses of data one can find all sorts of stuff in...

That is a danger for the post hoc analyses, and it seems that Peter Bancel eventually concluded that his findings of structure in the data were inconclusive for this reason.
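To make steps (1)-(3) a bit more concrete, here is a toy Python sketch of the pipeline as I understand it. It's only an illustration - the simulated noise source, the bias value and the function names are my own inventions, not the actual device firmware or GCP code - and step (4), the periodic renormalisation against each device's long-term variance, isn't shown.

import random

def noisy_bits(n, p_one=0.52):
    # Step (1): stand-in for a physical noise source; deliberately biased
    # towards 1s so the effect of the mask is visible.
    return [1 if random.random() < p_one else 0 for _ in range(n)]

def apply_alternating_mask(bits):
    # Step (2): XOR with an alternating 0101... mask, i.e. flip every other bit.
    # A constant bias then cancels out of the expected sum.
    return [b ^ (i % 2) for i, b in enumerate(bits)]

def one_second_trial(bits):
    # Step (3): sum the first 200 masked bits for the second.
    # For an ideal device this is Binomial(200, 0.5): mean 100, variance 50.
    return sum(bits[:200])

trial = one_second_trial(apply_alternating_mask(noisy_bits(200)))
print(trial)  # typically close to 100; the mean stays 100 despite the bias above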
There was also some post hoc analysis of the data for 9-11-2001 which was similarly criticised. But it shouldn't be a problem for the formal series of pre-registered hypotheses.

(2017-09-15, 06:49 PM)Chris Wrote: Yes, thanks for taking the time to look at this material.

Chris - Super summary of how these things work. That's how I read it too. The only thing I'd add is that the typical failure mechanisms are in the power supply and component drift. Component drift is mostly dealt with by the XOR process, because it typically causes a bias (offset) from the mean - until the component fails outright, that is. When a component fails or the power supply goes off the rails, the failure is very easy to see (they showed some plots where the failure is quite obvious) and the device is simply taken off-line until it can be replaced. There is a bit more manual monitoring and tweaking of the data streams than I would like to see, but it seems that this is the nature of the beast.

(2017-09-15, 07:36 PM)jkmac Wrote: Chris-

Ah, yes, between steps (3) and (4) I should also have mentioned that outlying values (45 or more away from the mean - or about 6.4 standard deviations) are removed, together with data considered to be "unstable" - amounting to 0.02% of otherwise valid data. The instability criteria are described in the section "Data vetting and normalisation" here: https://www.researchgate.net/publication...al_details
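Just to show where the "about 6.4 standard deviations" figure comes from, and what that cutoff looks like applied to a list of per-second sums, here's a minimal Python sketch. It's my own illustration - the vet helper and trial_sums argument are hypothetical names, and the "unstable data" criteria from the linked paper aren't modelled.

import math

MEAN = 100          # expected sum of 200 fair bits
SD = math.sqrt(50)  # ideal standard deviation, sqrt(200 * 0.5 * 0.5) ~ 7.07

print(45 / SD)      # ~6.36, i.e. "about 6.4 standard deviations"

def vet(trial_sums):
    # Drop per-second sums lying 45 or more away from the nominal mean of 100.
    return [s for s in trial_sums if abs(s - MEAN) < 45]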
(2017-09-15, 08:53 PM)Max_B Wrote: Sorry Chris, 1-4 that is pretty much what the paper says... but it means absolutely nothing to me, and doesn't make any sense... remember you are talking to a layman... the rest means nothing either. I'm interested in how bias from a noise-based device could get into the data analysis presented by GCP?

I think I see the difficulty. What I was doing before with my example bit sequence was trying to demonstrate that, regardless of the input, after XORing with a balanced mask the expected numbers of 0s and 1s become equal. That is, if we average the numbers of 0s and 1s over all the possible positions of the mask, those two average numbers will be equal. (There's a small numerical check of this at the end of this post.)

That's not to say that for each of the possible positions of the mask individually, the numbers of 0s and 1s will be equal. They won't in general (and they weren't in the example I made up). So in the GCP data, for a particular bitstream produced by noise and a particular position of the mask relative to that bitstream, in general the numbers of 0s and 1s won't be equal, and so in general when 200 bits are added up the answer won't be 100.

But if we consider the average value of the sum of the 200 bits - that is, averaged over all the possible bitstreams and all the possible positions of the mask - then the numbers of 0s and 1s must come out equal. So the average value of the sum is 100, and the XORing has overcome the bias to produce the right average - the average which would be produced if the bitstreams were behaving ideally. (And the average would still be right no matter how badly behaved the input bitstream was. But if the bitstream was very badly behaved, other features of the frequency distribution, such as the variance, wouldn't be close to their ideal values.)

[Edited to add: Your interpretation of the extracts from the database is correct. The first line has a sequence of reference numbers for the RNGs whose data are included in the file. Each of the other lines contains data for one second, and the values are the sums of 200 bits for the RNGs specified in the first line.]
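For what it's worth, here is the promised numerical check of that averaging argument, using a made-up 8-bit input and an alternating mask (my own example, not GCP data). The count of 1s after masking varies with the mask position, but averaged over the positions it comes out to exactly half the bits, whatever the bias of the input.

bits = [1, 1, 1, 1, 0, 1, 1, 1]   # arbitrary input, heavily biased towards 1s
mask = [0, 1] * (len(bits) // 2)  # balanced alternating mask

def ones_after_xor(offset):
    # Number of 1s after XORing the input with the mask shifted by `offset`.
    return sum(b ^ mask[(i + offset) % len(mask)] for i, b in enumerate(bits))

counts = [ones_after_xor(k) for k in range(len(mask))]
print(counts)                     # [3, 5, 3, 5, ...] -- individual positions aren't balanced
print(sum(counts) / len(counts))  # 4.0 -- averaged over positions, exactly half of 8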