Louie Savva's thesis

Following on from the posts on another thread, I thought Louie Savva's thesis deserved a thread of its own. It's available on the semantic scholar website:
https://pdfs.semanticscholar.org/41be/a5...e6d96e.pdf

From internal evidence, the thesis seems to have been completed in 2014, though the work was done much earlier, under the supervision of Chris French at Goldsmiths. After that, Savva spent a period as a postdoctoral researcher, but then he left the field and became an outspoken critic of parapsychology.

Of the thesis work, he wrote in 2006:
"I tested over five hundred participants in some eleven empirical tests, none of which provide evidence for any kind of paranormal functioning."
http://www.internationalskeptics.com/for...hp?t=70319

The same year he wrote:
"I was finally motivated to finish it [the thesis], in the hope that I might dissuade others who are interested in the question of the paranormal, from pursuing it any further."
https://web.archive.org/web/201902241431...ology.html

I have been looking at it, and I'm still trying to get to grips with parts of it. I plan to post some thoughts on this thread.

In summary, I count ten tests, as follows:

(A) Retrocausal Influences in Electrodermal Activity
(1) Presentiment I
(2) Presentiment II

(B) Retrocausal Influences in Stroop-Based Tasks
(1) Time-Reversed Interference I
(2) Time-Reversed Interference II
(3) Time-Reversed Interference III

(C) Testing Psi-Mediated Timing with a Simple Computer-Based Task
(1) Psi-Timing I
(2) Psi-Timing II
(3) Psi-Timing III

(D) An Indirect Test of an Exceptional Precognitive Claim [David Mandell's pictures]

(E) Insect Death-Avoidance
To deal with this one first - (D) An Indirect Test of an Exceptional Precognitive Claim [David Mandell's pictures]

As discussed on the other thread, this test consisted of judges comparing a selection of David Mandell's purportedly precognitive pictures with (1) the events Mandell believed they referred to and (2) a set of alternative events chosen by Savva, and rating the similarities.

The result was that several of the Mandell events were found to resemble the pictures significantly more closely than the alternative events did (and there were no cases where an alternative event was significantly closer than the corresponding Mandell event). But that doesn't allow us to infer that the pictures were necessarily precognitive. Savva later acknowledged that they were unable to test Mandell's claims empirically:
http://www.internationalskeptics.com/for...stcount=28

So in this case it would be more accurate to say that the test produced statistically significant results, but was intrinsically incapable of providing evidence of precognition.
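
For illustration only, a judging analysis of this general shape might be run as in the Python sketch below. The ratings, the 0-10 scale, the number of judges and the Wilcoxon signed-rank test are all my assumptions; the thesis's actual materials and statistics aren't given in this thread:

```python
from scipy import stats

# Hypothetical similarity ratings by six judges for ONE picture (0-10 scale assumed)
to_mandell_event = [8, 7, 9, 6, 8, 7]   # similarity to the event Mandell claimed
to_alternative   = [4, 5, 3, 5, 4, 5]   # similarity to Savva's alternative event

# Paired comparison: each judge rated the same picture against both events
result = stats.wilcoxon(to_mandell_event, to_alternative)
print(result.pvalue)  # a small value corresponds to "significantly closer"
```

And, as above, even a significant result of this kind only shows that the pictures match Mandell's events better than the chosen alternatives; it can't establish precognition.
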
Moving on to (A) Retrocausal Influences in Electrodermal Activity.

In the first study, Presentiment I, participants were presented with five pictures at regular intervals of 7 seconds, including a picture of a spider at a randomly chosen position in the sequence. The other pictures were neutral. Of the 60 participants, 32 were assessed to be afraid of spiders, and 28 not.

Average electrodermal activity was calculated during the period 2.5-0.5 seconds before the appearance of the spider. A statistical test was performed to see if the average activity was significantly higher for those who were afraid of spiders.

The conclusion was that it wasn't significantly higher. Instead it was lower. In fact it was so much lower that it would have been significant with p=0.028 under a one-tailed test in the direction opposite to that expected.
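
For illustration, the comparison boils down to something like the Python sketch below; the data are invented placeholders, and the two-sample t-test is my stand-in for whichever test the thesis actually used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Invented pre-stimulus EDA averages (microsiemens), one per participant:
# 32 arachnophobes and 28 non-arachnophobes, as in the study
phobic     = rng.normal(loc=5.0, scale=3.0, size=32)
non_phobic = rng.normal(loc=5.5, scale=3.0, size=28)

# One-tailed test in the predicted direction (phobics HIGHER before the spider);
# the thesis's difference ran the other way, i.e. towards alternative='less'
result = stats.ttest_ind(phobic, non_phobic, alternative='greater')
print(result.pvalue)
```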

Whatever (if anything) that means, it doesn't seem to have anything to do with a reaction to the spider picture. The variability of electrodermal activity between participants was on the order of 5-10 microsiemens. Strangely, the reaction to the spider picture (and the other pictures) wasn't examined at this stage, but when it was looked at later it turned out to be much smaller - an increase of only about 0.5 microsiemens. This meant that the experiment stood virtually no chance of detecting even the normal reaction to the spider picture, let alone any presumably weaker presentiment effect.

Incidentally, it seems to me it was a mistake to have the pictures appear at regular intervals, because participants could then anticipate that some kind of picture was about to appear, and there might well be a change in electrodermal activity just before it did. For whatever reason, that anticipatory behaviour might differ between people who were afraid of spiders and those who weren't. One could get round this by comparing the measurements just before spider pictures with those just before neutral pictures (which apparently wasn't done, maybe because the result was in the opposite direction to that expected). But why not make the gaps between pictures random rather than fixed, so that the timing couldn't be anticipated?
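
Randomising the gaps would be trivial to implement; for example, in Python (the 5-9 second range is just an illustration):

```python
import numpy as np

rng = np.random.default_rng()
gaps = rng.uniform(5.0, 9.0, size=5)   # jittered gaps instead of a fixed 7 s
spider_slot = rng.integers(0, 5)       # spider picture still at a random position
print(gaps, spider_slot)
```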

(Edited to correct the period over which the measurements were averaged.)
If ersby happens to see this - I read through some of the International Skeptics' thread on Savva linked to here, and I notice you were the one who began that thread and contributed to it throughout. With over ten years gone since then, any change in your perspective?
In the next study, Presentiment II, Savva carried out the same experiment but made the statistical test more powerful. Instead of averaging, over all participants, the raw level of electrodermal activity during the chosen period, he averaged the amount by which it changed relative to the start of that period. That is, the measurement at the start of the period was treated as zero, which drastically reduced the variation between participants. This was done for a pre-stimulus period between 2.75 and 1.5 seconds before the appearance of the spider, and also for a post-stimulus period between 2.25 and 4.5 seconds after its appearance (the signal was again effectively zeroed at the start of the post-stimulus period).
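
Concretely, that zeroing step amounts to something like this minimal Python sketch (my reconstruction from the description, not Savva's code):

```python
import numpy as np

def windowed_change(eda, start, stop):
    """Average change in EDA over the samples start..stop, measured relative
    to the value at the start of the window (so the window begins at zero)."""
    window = np.asarray(eda[start:stop], dtype=float)
    return (window - window[0]).mean()
```

Zeroing at the window start removes each participant's baseline level, which is why the between-participant variability drops so sharply.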

The variability of these averages between participants now dropped to 0.3-0.9 microsiemens. But unfortunately this wasn't quite enough to achieve a significant difference between arachnophobes and non-arachnophobes for the ordinary post-stimulus effect of the spider picture. That's despite the effect appearing quite clear to the eye in the plot of the average response for each group as a function of time (the plot is on page 74 of the thesis).

Surprisingly, given this, the pre-stimulus test came closer to significance, with a p value of 0.057 (note that in the text Savva mistakenly attributed this p value to the post-stimulus effect). That's because, although the pre-stimulus difference is smaller than the post-stimulus difference and is averaged over a shorter period, the variability between participants is much smaller before the image is displayed.

On the face of it, that would be an interesting "near miss" (Savva commented on the pre-stimulus p value that "it could be argued that there is the suggestion of significance," though he apparently thought it related to the post-stimulus period). However, there is potentially a big problem with the analysis, because it's not indicated how the pre- and post-stimulus periods were chosen. In particular, it's not stated that they were fixed before the data were examined (the pre-stimulus period was different from that used in the previous study). I find it difficult to believe they were fixed without reference to the data: the start of the pre-stimulus analysis period coincides almost exactly with the minimum of the arachnophobe signal over the whole interval recorded, and the start of the post-stimulus analysis period coincides exactly with the start of the main steep rise of the arachnophobe signal after the picture was shown. If the analysis periods were chosen by looking at the data and selecting periods where the data tended to rise, the statistical tests would be meaningless.
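
To see why, here's a small Python simulation (all parameters illustrative): two groups of pure noise, with the analysis window placed wherever the comparison looks best. The rate of "significant" results comes out far above the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_a, n_b, n_t, win = 32, 28, 60, 10   # group sizes, time points, window length
hits, n_sims = 0, 1000

for _ in range(n_sims):
    a = rng.normal(size=(n_a, n_t))   # pure noise: no real group difference
    b = rng.normal(size=(n_b, n_t))
    p_min = 1.0
    for s in range(n_t - win):        # scan every candidate analysis window
        pa = a[:, s:s + win].mean(axis=1)
        pb = b[:, s:s + win].mean(axis=1)
        p_min = min(p_min, stats.ttest_ind(pa, pb).pvalue)
    hits += p_min < 0.05

print(f"'significant' in {hits / n_sims:.0%} of simulations")  # far above 5%
```
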
(2019-08-04, 11:47 PM)Will Wrote: If ersby happens to see this - I read through some of the International Skeptics' thread on Savva linked to here, and I notice you were the one who began that thread and contributed to it throughout. With over ten years gone since then, any change in your perspective?

Golly, that's a lot of nostalgia seeing all those names again.

Do you mean a change in perspective regarding the file drawer in the ganzfeld database? Not really. In all my time researching the subject for another eight years after that thread, I only found one other ganzfeld experiment that had been started (at least, it was funded by the SPR) but then vanished completely. There may be dozens of unfinished unsuccessful ganzfeld experiments out there, but I'd need more evidence. So I still think that the file-drawer can't explain the deviation from chance seen in ganzfeld meta-analyses.

On the other hand, my points about those meta-analyses on page six still stand. None of the ganzfeld meta-analyses have coherent inclusion criteria across the whole database. I don't think that's an explanation either, but it is a flaw.
(2019-08-05, 04:19 PM)ersby Wrote: Golly, that's a lot of nostalgia seeing all those names again.

Do you mean a change in perspective regarding the file drawer in the ganzfeld database? Not really. In all my time researching the subject for another eight years after that thread, I only found one other ganzfeld experiment that had been started (at least, it was funded by the SPR) but then vanished completely. There may be dozens of unfinished unsuccessful ganzfeld experiments out there, but I'd need more evidence. So I still think that the file-drawer can't explain the deviation from chance seen in ganzfeld meta-analyses.

On the other hand, my points about those meta-analyses on page six still stand. None of the ganzfeld meta-analyses have coherent inclusion criteria across the whole database. I don't think that's an explanation either, but it is a flaw.
I suppose I was asking more about your overall assessment of Savva's given reasons for turning on the field, but thank you for this answer as well.


I notice that, on page 6, Savva refers to a blog post dealing with an e-mail exchange he had with Rupert Sheldrake, but the blog has been wiped. That's a shame; I'd like to know what that was about, and about his posts on Dean Radin.
(2019-08-06, 02:44 AM)Will Wrote: I suppose I was asking more about your overall assessment of Savva's given reasons for turning on the field, but thank you for this answer as well.


I notice that, on page 6, Savva refers to a blog post dealing with an e-mail exchange he had with Rupert Sheldrake, but the blog has been wiped. That's a shame; I'd like to know what that was about, and about his posts on Dean Radin.

"Sheldrake Vs Savva" is still available, together with other blog posts, at the Internet Archive:
https://web.archive.org/web/201508280108...savva.html
(2019-08-06, 08:16 AM)Chris Wrote: "Sheldrake Vs Savva" is still available, together with other blog posts, at the Internet Archive:
https://web.archive.org/web/201508280108...savva.html
Thanks. From the way Savva characterized it in the IS thread, I was expecting either something more combative or something more technical about the merits of telephone telepathy experiments.
The next set of studies - Retrocausal Influences in Stroop-Based Tasks - is the most complicated and also one of the most problematical. Something seems to have gone wrong with the statistical analysis, but after a couple of days trying to figure it out I'm still not sure exactly what.

A Stroop task, devised in the 1930s, has to do with the speed at which people can identify a colour from either a solid block of that colour, the printed name of the colour, or the name of a different colour printed in ink of the target colour. In the 1980s, Klintman looked at pairs of these tasks - first a solid block of colour, then the name of the colour written in white on a black background - and found, as expected, that some people were able to do the second task more quickly if the target colour was the same as in the first task (in which case the pair of tasks was described as "congruent"). Unexpectedly, he also apparently found that the congruence of the pair influenced the speed of the first task too. As the participant can't know at the time of the first task whether the pair will be congruent, he interpreted this as evidence of precognition.

The first of Savva's studies, Time-Reversed Interference I, adopted a modified version of Klintman's protocol. Each participant performed a series of 36 trials, each consisting of a pair of tasks, in which the pairs were randomly chosen to be congruent or incongruent with equal probability. Following Klintman, Savva then divided the participants into two groups. In one, known as facilitators, the second task was performed more quickly when the pair was congruent. In the other, known as inhibitors, the opposite was true. It turned out that there were 24 facilitators and 16 inhibitors.
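
In code, the facilitator/inhibitor split described above amounts to something like the following Python sketch (my reconstruction; Savva's exact criterion isn't given here):

```python
import numpy as np

def classify(rt2, congruent):
    """Facilitator if the second task was faster, on average, on congruent pairs."""
    rt2 = np.asarray(rt2, dtype=float)
    congruent = np.asarray(congruent, dtype=bool)
    return "facilitator" if rt2[congruent].mean() < rt2[~congruent].mean() else "inhibitor"
```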

Before the data were analysed, the reaction times for each participant were normalised so that their mean was zero and their standard deviation was 1.
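
That normalisation is just a per-participant z-score; whether the sample or population standard deviation was used isn't stated, so ddof=1 below is my assumption:

```python
import numpy as np

def normalise(rts):
    """Rescale one participant's reaction times to mean 0 and SD 1."""
    rts = np.asarray(rts, dtype=float)
    return (rts - rts.mean()) / rts.std(ddof=1)  # sample SD assumed
```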

On Klintman's hypothesis, the facilitators would also tend to perform faster on the first task when the pair was congruent and conversely the inhibitors would tend to perform more slowly. The relevant statistic given in the thesis, obtained from an analysis of variance, was not significant, though it came quite close (p=0.07).

That result, like the one from Presentiment II mentioned above, might have been viewed as suggestive. But as Savva explains, another researcher, Camfferman, had suggested soon after Klintman's work a mechanism by which, in the circumstances of Savva's experiment, a periodic variation in a participant's alertness could affect the reaction time for the first task in this way, without any need for precognition. A variation in alertness would cause the two reaction times in each pair to be correlated, and participants whose periods of alertness happened to coincide with congruent trials would tend to be classified as facilitators. Because of that correlation, their first reaction times would also be lower for the congruent pairs.
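
Camfferman's mechanism is easy to demonstrate with a simulation. In the Python sketch below (all parameters arbitrary), congruency has no effect at all, but the two reaction times in each pair share a common "alertness" component; classifying simulated participants by their second-task congruency difference then produces a matching first-task difference out of pure noise:

```python
import numpy as np

rng = np.random.default_rng(2)
effects = []

for _ in range(10000):                         # simulated participants
    alert = rng.normal(size=36)                # varying alertness, shared by both tasks
    rt1 = alert + rng.normal(scale=0.5, size=36)
    rt2 = alert + rng.normal(scale=0.5, size=36)
    cong = rng.random(36) < 0.5                # congruency: random, causally inert

    facilitator = rt2[cong].mean() < rt2[~cong].mean()
    d1 = rt1[~cong].mean() - rt1[cong].mean()  # first-task "congruent advantage"
    effects.append(d1 if facilitator else -d1)

# Positive: "facilitators" look faster on congruent FIRST tasks too,
# purely because rt1 and rt2 are correlated through alertness
print(f"mean first-task effect after classification: {np.mean(effects):.3f}")
```

The printed mean comes out clearly positive, which is exactly the artefact Camfferman described.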

The second study, Time-Reversed Interference II, did away with the division of participants into facilitators and inhibitors (for some reason Savva found only two inhibitors among the 50 participants in the second set). Instead, the participants just did runs of 20 trials each, and the relationship between reaction times and congruency was examined.

In this study, the timings for the second task were found to behave as expected (for facilitators), but for the first task the opposite effect was found. The participants did it more quickly when the trial was incongruent. As the result was in the opposite direction to that expected, it was classed as non-significant. But the statistic Savva calculated, t(49)=2.15, would have been significant with p=0.018 for a one-tailed test in the opposite direction.
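
For anyone who wants to check, the one-tailed p is just the tail area of the t distribution; in Python:

```python
from scipy import stats

print(stats.t.sf(2.15, 49))   # one-tailed p for t(49)=2.15, about 0.018
```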

The third and final study, Time-Reversed Interference III, was more complicated. The participants were again divided into two groups, this time arachnophobes and non-arachnophobes (20 and 34 respectively). (Edit: In this study, none of the participants was found to be an inhibitor.) The procedure was modified so that instead of a solid block of colour for the first task, a series of coloured zeroes was displayed, and instead of the name of a colour written in white for the second, there was a word written in a target colour. The word was either neutral or spider-related, with equal probability. Again the colours were either congruent or incongruent, with equal probability. Each participant performed 80 trials, each consisting of a pair of tasks.

For this study, Savva did seven statistical tests for each of the two reaction times using analysis of variance. For the first task, he found no significant results (except for one which was strongly in the opposite direction to that expected - his statistic would have given p=0.02 for a one-tailed test in the opposite direction).

Unfortunately there are some anomalies in the results, which suggest there may be problems with Savva's calculated statistics.
