Chris
2017-09-15, 10:31 PM
(2017-09-15, 08:53 PM) Max_B wrote: Sorry Chris, points 1-4 are pretty much what the paper says... but they mean absolutely nothing to me, and don't make any sense... remember you are talking to a layman... the rest means nothing to me either. I'm interested in how bias from a noise-based device could get into the data analysis presented by the GCP.
As far as I can see, such a signal does get into the data... because they are certainly not adding up 200-bit-long chunks of XORed, equally distributed bits (which would always give them a sum of 100) and uploading that figure to the GCP database... because that would be pointless... we would simply get a database containing time-stamped 100s every second from every RNG in the network. A database containing billions of copies of the number 100 would, I'm sure, tell them nothing... so despite what you have previously said, this cannot be what they are doing.
Below are three truncated lines of raw Comma Separated Values (CSV) data taken from the GCP database... what they contain is not clearly explained... the first line starts with some sort of field headers; the significance of the numbers to the right is unknown, but I suspect they are the reference numbers for the RNGs in the GCP network.
The second and third lines clearly start with a timestamp, a date and a time corresponding to the headers in the line above, followed by a truncated stream of numbers that seem to fall above and below 100. I assume these are a series of the sums of 200 equally balanced bits (100 '0s' and 100 '1s') that you referred to, which would remove all bias from the RNG device.
But what is absolutely clear is that these numbers cannot be sums of 200 equally balanced 0s and 1s, because if they were, all we would see is 100 for each comma-separated value...
12,"gmtime","Date/Time",1,28,37,100,101,102,105,106,108,110,111,112,114,115,116,119,134,161,226,228,231,1004,1005,1021,1022,1025,1026,...
13,1102294861,2004-12-06 01:01:01,111,106,97,93,93,100,116,103,91,88,94,103,85,94,94,99,100,102,103,97,89,114,91,93,100,96,,100,89,103,...
13,1102294862,2004-12-06 01:01:02,95,105,127,106,94,105,100,100,96,99,88,98,101,107,95,103,106,101,105,102,96,95,94,99,101,107,88,100,...
So whatever you said before, Chris, about these 200 bits taken from an RNG being balanced into equal numbers of 0s and 1s by XORing can't be right... I can see, and anybody looking can see, deviations above and below the 100 we should expect...
So how do we get the numbers above, which are taken from the GCP database? According to you that is not possible, because the sum of 200 XORed bits should always equal 100.
I think I see the difficulty.
What I was doing before with my example bit sequence was trying to demonstrate that, regardless of the input, after XORing with a balanced mask, the expected numbers of 0s and 1s would become equal. That is, if we average the numbers of 0s and 1s over all the possible positions of the mask, those two average numbers will be equal.
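A small sketch of that averaging argument, assuming the mask is a fixed balanced pattern that can sit at any cyclic position relative to the bits. The 4-bit input and the 0011 mask are made up for illustration (they are not the example from the earlier post, and not the mask the GCP devices actually use), and `ones_after_xor` / `counts_over_all_shifts` are hypothetical helpers. The key point: as the mask slides through every position, each input bit meets a mask 1 in exactly half of them, so it comes out as 1 in exactly half of them whatever its own value.

```python
from fractions import Fraction

def ones_after_xor(bits, mask, shift):
    """Count the 1s in bits XOR mask, with the mask cyclically shifted by `shift`."""
    n = len(mask)
    return sum(b ^ mask[(i + shift) % n] for i, b in enumerate(bits))

def counts_over_all_shifts(bits, mask):
    """Count of 1s for every possible mask position, plus their exact average."""
    counts = [ones_after_xor(bits, mask, s) for s in range(len(mask))]
    return counts, Fraction(sum(counts), len(counts))

# Hypothetical example: a biased 4-bit input (three 1s) and a balanced mask (two 1s).
counts, avg = counts_over_all_shifts([1, 1, 1, 0], [0, 0, 1, 1])
print(counts, avg)  # [3, 1, 1, 3] 2 -> individual counts differ, the average is 4/2
```

Trying any other 4-bit input with the same mask gives the same average of 2, even though the individual counts change.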
That's not to say that for each of the possible positions of the mask individually, the numbers of 0s and 1s will be equal. They won't in general (and they weren't in the example I made up). So in the GCP data, for a particular bitstream produced by noise and a particular position of the mask relative to that bitstream, the numbers of 0s and 1s generally won't be equal. So, in general, when 200 bits are added up the answer won't be 100.
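To see why individual one-second sums scatter around 100, here is a rough simulation. It assumes raw bits that are independent with a made-up bias (`p_one = 0.55`), a hypothetical alternating 0101... mask, and 200 bits per second; none of these parameters come from the GCP hardware, and `one_second_sum` is a hypothetical helper.

```python
import random

def one_second_sum(rng, p_one, mask, offset=0, n_bits=200):
    """Sum of n_bits raw noise bits, each XORed with a repeating balanced mask.

    p_one  -- assumed probability that a raw bit is 1 (0.5 would be unbiased)
    offset -- where in the mask pattern this second's bitstream starts
    """
    total = 0
    for i in range(n_bits):
        raw = 1 if rng.random() < p_one else 0
        total += raw ^ mask[(i + offset) % len(mask)]
    return total

rng = random.Random(2017)  # fixed seed so the run is repeatable
mask = [0, 1]              # hypothetical alternating mask
sums = [one_second_sum(rng, 0.55, mask) for _ in range(10)]
print(sums)                # ten "seconds": values scatter around 100 instead of all being 100
```

Each printed value plays the role of one entry in a data line: a single draw, not an average, so it lands above or below 100.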
But if we consider the average value of the sum of the 200 bits - that is, averaged over all the possible bitstreams and all the possible positions of the mask - then the numbers of 0s and 1s must come out equal. So the average value of the sum is 100, and the XORing has overcome the bias to produce the right average - the average which would be produced if the bitstreams were behaving ideally. (And the average would still be right no matter how badly behaved the input bitstream was. But if the bitstream was very badly behaved, other features of the frequency distribution, such as the variance, wouldn't be close to their ideal values.)
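The claim about the average and the variance can be checked exactly for one simple case: independent raw bits with probability `p_one` of being 1, XORed with a balanced mask (again a hypothetical alternating one). Each output bit is then 1 with probability p or 1 - p, so the mean of the 200-bit sum is always 100, while its variance is 200 * p * (1 - p), which only reaches the ideal value of 50 when p = 1/2. This sketch does not cover correlated bits, where the variance could be off in other ways; `masked_sum_moments` is a hypothetical helper.

```python
from fractions import Fraction

def masked_sum_moments(p_one, mask_bits):
    """Exact mean and variance of sum(raw_i XOR mask_i) for independent raw bits
    with P(raw_i = 1) = p_one."""
    mean = Fraction(0)
    var = Fraction(0)
    for m in mask_bits:
        q = p_one if m == 0 else 1 - p_one  # P(output bit = 1) after XOR with m
        mean += q
        var += q * (1 - q)                  # independent bits: variances add
    return mean, var

mask200 = [i % 2 for i in range(200)]  # hypothetical mask: 100 zeros and 100 ones
for p in (Fraction(1, 2), Fraction(6, 10), Fraction(9, 10)):
    mean, var = masked_sum_moments(p, mask200)
    print("p =", p, "mean =", mean, "variance =", var)
# p = 1/2 mean = 100 variance = 50
# p = 3/5 mean = 100 variance = 48
# p = 9/10 mean = 100 variance = 18
```

The mean stays at 100 while the variance shrinks as the bias grows: a badly biased device would produce sums that cluster too tightly around 100.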
[Edited to add: Your interpretation of the extracts from the database is correct. The first line has a sequence of reference numbers for the RNGs whose data are included in the file. Each of the other lines contains data for one second, and the values are the sums of 200 bits for the RNGs specified in the first line.]
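The file layout described in the edit can be read with a short parser. This sketch uses only the truncated excerpt quoted above (the 27 RNG IDs in the header line and the first 28 values of each data line). It treats a leading 12 as the header line and 13 as a data line simply because that is how they appear in the excerpt, and it maps an empty field to "no data". The helper names are hypothetical. It also checks one thing that can be verified from the excerpt itself: the "gmtime" field matches the Unix time of the Date/Time field (1102294861 is 2004-12-06 01:01:01 UTC).

```python
import csv
import io
from datetime import datetime, timezone

# Header IDs and data values copied from the truncated excerpt above.
EXCERPT = """\
12,"gmtime","Date/Time",1,28,37,100,101,102,105,106,108,110,111,112,114,115,116,119,134,161,226,228,231,1004,1005,1021,1022,1025,1026
13,1102294861,2004-12-06 01:01:01,111,106,97,93,93,100,116,103,91,88,94,103,85,94,94,99,100,102,103,97,89,114,91,93,100,96,,100
13,1102294862,2004-12-06 01:01:02,95,105,127,106,94,105,100,100,96,99,88,98,101,107,95,103,106,101,105,102,96,95,94,99,101,107,88,100
"""

def parse_gcp_csv(text):
    """Return (rng_ids, seconds); each second is (unix_time, date_str, {rng_id: sum or None})."""
    rng_ids, seconds = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "12":    # header line: RNG reference numbers from the 4th field on
            rng_ids = [int(x) for x in row[3:]]
        elif row[0] == "13":  # data line: one second of 200-bit sums
            # zip drops values past the last listed ID (the header is truncated here)
            sums = {rng_id: (int(v) if v.strip() else None)  # empty field = no data
                    for rng_id, v in zip(rng_ids, row[3:])}
            seconds.append((int(row[1]), row[2], sums))
    return rng_ids, seconds

def gmtime_matches(unix_time, date_str):
    """Check that the 'gmtime' field is the Unix time of the Date/Time field (UTC)."""
    stamp = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S") == date_str

rng_ids, seconds = parse_gcp_csv(EXCERPT)
for unix_time, date_str, sums in seconds:
    present = [s for s in sums.values() if s is not None]
    print(date_str, gmtime_matches(unix_time, date_str), min(present), max(present))
# 2004-12-06 01:01:01 True 85 116
# 2004-12-06 01:01:02 True 88 127
```

Even in this small excerpt the per-second sums range from 85 to 127, which is what individual draws around an average of 100 look like.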