That there is no experimental evidence for psi, especially from meta-analyses [split from Rune Soup: Travelling ...]


(2025-11-10, 09:36 AM)Max_B Wrote: If further studies don't replicate Bem's erotic image experiment, they are not direct replications; they are testing some other assumptions. The attempted replication study sbu linked to above was conducted online (not in a laboratory), and participants were recruited through an online recruitment service with an average age of 50 (not Cornell undergraduates).

Bem's experiment used modern erotic images grabbed from porn sites, chosen to appeal to young males. Cornell reports that undergrads over 25 comprise only 2% of their population; 98% are under 25, usually between ~18-22 y/o. Bem was very specific: he found that the young males who did best were high stimulus seekers. Then you've got the experimenter effect, which even Dr Caroline Watt accepts exists; an online experiment excludes that effect.

sbu also linked to another failed replication some weeks ago. When I looked at that study... they ran the experiment with all the male and female subjects together in the same room at the same time; the recruitment pool was different from Bem's and the average age was much higher; they also couldn't use the highly erotic images Bem used (no explanation why).

Note that in this meta-analysis by Bem from 2016 https://pmc.ncbi.nlm.nih.gov/articles/PMC4706048/ his standard for an exact replication was simply using his original software:

Quote: For this last variable, each experiment was categorized into one of three categories: an exact replication of one of Bem’s experiments (31 experiments), a modified replication (38 experiments), or an independently designed experiment that assessed the ability to anticipate randomly-selected future events in some alternative way (11 experiments). To qualify as an exact replication, the experiment had to use Bem’s software without any procedural modifications other than translating on-screen instructions and stimulus words into a language other than English if needed.

So the argument that TPP wasn't a proper replication because of population or setting differences doesn't hold up by Bem's own criteria. TPP tested his core hypothesis, and did so under proper scientific conditions that his original work lacked.
What made TPP different: protocols were pre-registered to prevent cherry-picking results. Multiple independent labs ran identical tests to rule out lab-specific effects. Strict blinding prevented experimenter influence. All data, code, and deviations were made public.
These safeguards matter because Bem's original studies didn't have them. Different samples and contexts can affect results, but methodological weaknesses are the more likely explanation for positive findings. When you control for these issues like TPP did, the effects disappear.
(2025-11-07, 09:40 PM)sbu Wrote: One can easily imagine such an experiment - but not an experiment showing an effect in the opposite direction when replicated by other scientists. The analogy to psi stops there.

OK my original comment was supposed to be a reference to Dean Radin's presentiment experiment - not Bem's work.

The presentiment effect seems, as I said, to be very hard to disprove, and has indeed been repeated both by Dean Radin himself and, crucially, by others, without a change of sign.

David
(2025-11-11, 08:40 PM)David001 Wrote: OK my original comment was supposed to be a reference to Dean Radin's presentiment experiment - not Bem's work.

The presentiment effect seems, as I said, to be very hard to disprove, and has indeed been repeated both by Dean Radin himself and, crucially, by others, without a change of sign.

David

Okay, got it - Dean Radin has conducted an experiment showing a small but statistically significant effect, and the results have been replicated by others. Forgetting everything else in this thread, I ‘buy’ that. As I’ve mentioned elsewhere, I have little knowledge of (or interest in) experiments with such small effect sizes, and one really has to be a professional statistician to assess whether the statistical methods were applied correctly and in line with best practices. Thank you for bringing attention to his new book, but it’s not for me. My interests lie mainly in physics and the philosophy of science and mind, not in this kind of experimental work. That’s what I was trying to ask (though not very clearly) when I wondered whether his new book contained any truly groundbreaking ideas.
(2025-11-11, 03:41 PM)sbu Wrote: Note that in this meta-analysis by Bem from 2016 https://pmc.ncbi.nlm.nih.gov/articles/PMC4706048/ his standard for an exact replication was simply using his original software:


So the argument that TPP wasn't a proper replication because of population or setting differences doesn't hold up by Bem's own criteria. TPP tested his core hypothesis, and did so under proper scientific conditions that his original work lacked.
What made TPP different: protocols were pre-registered to prevent cherry-picking results. Multiple independent labs ran identical tests to rule out lab-specific effects. Strict blinding prevented experimenter influence. All data, code, and deviations were made public.
These safeguards matter because Bem's original studies didn't have them. Different samples and contexts can affect results, but methodological weaknesses are the more likely explanation for positive findings. When you control for these issues like TPP did, the effects disappear.

I don't doubt TPP has methodological strengths (pre-registration, multi-lab design, blinding, and open data) over the original paper. These elements could address potential issues like p-hacking.

That said, your link to Bem's 2016 meta-analysis defines "exact replication" narrowly as using his original software, but in the paper Bem still separately analyses modified replications that alter key elements such as stimuli, population, or setting. By this framework, TPP qualifies as an exact replication (direct replication) in terms of software, but functions more as a conceptual replication, emphasising generalisability over precise fidelity to the original conditions. TPP's null result (or slight anti-psi deviation) provides evidence against a robust precognition effect, but Bem's hypothesis was inherently context-specific: it emerged best under particular conditions involving high-arousal stimuli, a targeted participant demographic, and a specific testing environment. To illustrate some of the divergences mentioned in my previous post, consider these key comparisons, drawn from your link to Bem's own 2016 meta-analytic observations:



Setting
Bem's original used in-lab, private booths with one-on-one interaction and an experimenter who left after the start. TPP was fully online/remote, without lab control or experimenter presence. Bem notes that online implementations often yield reduced effect sizes, potentially due to diminished environmental consistency.

Population
Bem targeted Cornell undergrads (18–22 y/o, screened for high stimulus-seeking males). TPP drew from a sample with an average age of 49.2, recruited online from the general public (59.8% male). Bem highlights that non-student or older samples tend to produce null results; his exact replications (primarily with young students) yielded g = 0.08 (p = .0018), while broader samples dilute the effect.

Stimuli
Bem used modern custom images from porn sites (highly explicit and arousing for the target demographic). TPP employed milder erotic images, such as artistic nudes, plus IAPS neutrals. Bem has critiqued less explicit stimuli (e.g., in Wagenmakers 2012). The erotic condition drove the largest original effect (d = 0.34 vs. near-zero for neutrals), suggesting arousal is a key factor.

Experimenter Effect
Bem included experimenter presence at the start (in his pro-psi lab environment). TPP had none; it was fully automated. Supporting work (e.g., by Dr Caroline Watt) indicates that experimenter beliefs can influence outcomes; removing this element may alter results in a potentially fragile phenomenon.



Even then, your link to Bem's 2016 meta-analysis still showed pooled statistical significance in Bem's 31 software-based exact replications (conducted with young participants in lab settings). The modifications TPP made to Bem's experiment align with the "modified" replications Bem set aside in his 2016 meta-analysis for their potential to obscure the most robust effect he found.

While some methodological limitations of Bem's original paper could bias towards positive findings, the unreplicated contextual elements in the TPP studies provide a parallel explanation for negative findings.

To resolve this definitively, a study combining Bem's precise setup with TPP's rigor would be ideal, along these sorts of lines:

1,000+ screened male undergrads (18–22 y/o, high sensation-seeking); see the power sketch after this list for why a sample of roughly this size is needed.
Private lab booths, with a pro-psi experimenter present at the start (then absent).
Highly explicit, arousing images (without substitution).
Pre-registered protocol, blinded analysis, and fully open data.
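
As a rough sanity check on the "1,000+" figure, here is a minimal power sketch using the standard normal-approximation formula for a one-sided, one-sample test. The formula is textbook, but treat the exact Ns as ballpark only; the two effect sizes are simply the ones already quoted in this thread (g ≈ 0.08 for the pooled exact replications, d ≈ 0.34 for the erotic condition).

Code:
from scipy.stats import norm

# Approximate sample size needed to detect a standardized effect d with a
# one-sided, one-sample test at significance alpha and the given power,
# using the usual normal approximation: N ~ ((z_alpha + z_beta) / d)^2.
def n_required(d, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha)   # critical value for the one-sided test
    z_beta = norm.ppf(power)        # quantile corresponding to the desired power
    return ((z_alpha + z_beta) / d) ** 2

print(round(n_required(0.08)))  # ~966 participants for the pooled g of 0.08
print(round(n_required(0.34)))  # ~53 for the erotic-condition d of 0.34

So if the pooled exact-replication estimate of g ≈ 0.08 is taken at face value, a four-digit N is roughly what an adequately powered direct test needs, whereas targeting the larger erotic-condition effect alone would need far fewer participants.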
Hi Max, or should that be Grok, or Max+Grok?

I thought your post made some good points, but some of it needs a fact-check, clarification, or at least a query.

(2025-11-12, 10:40 AM)Max_B Wrote: That said, your link to Bem's 2016 meta-analysis defines "exact replication" narrowly as using his original software, but in the paper Bem still separately analyses modified replications that alter key elements such as stimuli, population, or setting.

I just want to note something here. Further on in the paper (page 12), "exact replication" is defined more specifically in the context of this quote (emphasis added): "By defining an exact replication in our meta-analysis as one that used Bem’s experimental instructions, software, and stimuli, we ensure that the experimental parameters and data analyses are all specified ahead of time."

Thus, using Bem's software alone is insufficient for a replication to qualify as "exact"; the stimuli themselves, plus the experimental instructions, also need to be the same.

This is consistent with what you write, except that I'm not sure whether or not "population" and "setting" were part of Bem's instructions, and thus required for an "exact" replication, as you seem to imply.

I have been unable to locate those instructions though; they seem to have been supplied privately, so I can't check this.

In any case, that's a bit of a pedantic by-the-by. More important to clarify is this:

The Transparent Psi Project (TPP) is a replication that was itself followed up by a further replication by the metascientific project to which @sbu linked earlier in the thread (let's abbreviate it as MSP).

You refer in your post to the TPP but I think what you're actually referencing, based especially on some numbers you provide (see below), is the later MSP.

I'll assume, going forward, then, that where you write TPP you actually mean MSP.

Also: Bem's original Feeling the Future paper included nine experiments; both of these replications (TPP and MSP) only attempt to replicate the first one.

In any case, this...

(2025-11-12, 10:40 AM)Max_B Wrote: By this framework, TPP qualifies as an exact replication (direct replication) in terms of software, but functions more as a conceptual replication, emphasizing generalisability over precise fidelity to the original conditions.

...is incorrect. The MSP used the software of the TPP, which, it seems, was not Bem's original software (I'm open to correction/clarification if that turns out not to be the case).

However, the MSP paper does note that the TPP procedures on which its own procedures were based "were developed using a consensus-design process involving both sceptics as well as proponents of psi research, including Bem himself" (emphasis added).

Given, then, that Bem himself was involved in developing those procedures, it's probably not all that important whether or not the replication was "exact".

(2025-11-12, 10:40 AM)Max_B Wrote: Bem notes that online implementations often yield reduced effect sizes potentially due to diminished environmental consistency.

He does note this in his 2016 meta-analysis:

"For example, Galak et al. (2012) used their own software to conduct seven of their 11 modified replications in which 87% of the sessions (2,845 of 3,289 sessions) were conducted online, thereby bypassing the controlled conditions of the laboratory. These unsupervised sessions produced an overall effect size of -0.02. Because experiments in a meta-analysis are weighted by sample size, the huge N of these online experiments substantially lowers the mean effect size of the replications: When the online experiments are removed, the mean ES for this protocol rises
to 0.06 [0.00, 0.12]; z = 1.95, p = .05."
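
To make the arithmetic in that passage concrete, here's a minimal sketch of sample-size weighting with purely illustrative numbers (these are not the actual studies or Ns from the meta-analysis; they're just chosen to show how one huge unsupervised online sample can dominate the pooled estimate):

Code:
# Illustrative sample-size-weighted pooling, as described in the quoted passage.
# The labels, effect sizes, and Ns below are hypothetical, not taken from the paper.
studies = [
    ("lab study A",   0.10,  100),
    ("lab study B",   0.08,  120),
    ("lab study C",   0.05,   90),
    ("online study", -0.02, 2845),  # one large unsupervised online sample
]

def pooled_es(rows):
    total_n = sum(n for _, _, n in rows)
    return sum(es * n for _, es, n in rows) / total_n

print("pooled ES, all studies: %.3f" % pooled_es(studies))       # about -0.010
print("pooled ES, lab only:    %.3f" % pooled_es(studies[:-1]))  # about 0.078

That's the mechanism being described: the big-N online sessions swamp the weighted average, so whether they belong in the same pool as the supervised lab sessions is exactly where the substantive disagreement lies. (Real meta-analyses typically use inverse-variance rather than raw-N weights, but the point is the same here.)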

I'm not sure, though, whether he has noted anywhere else, as strongly as you seem to suggest here, that online implementations "often" yield reduced effect sizes. I'd welcome a reference on that.

(2025-11-12, 10:40 AM)Max_B Wrote: Bem targeted Cornell undergrads (18–22 y/o, screened for high stimulus-seeking males).

No, he didn't screen for high stimulus-seeking males. His experiment number one (again, the only one that TPP and MSP attempt to replicate) was gender-balanced (literally 50-50) and neither included nor excluded anyone based on stimulus seeking.

What he did do was a post hoc analysis of the correlation of stimulus seeking with performance. He didn't tie this analysis to gender though.

He found, for that first experiment, that stimulus seeking was positively correlated with performance (at 0.18), but this sort of positive correlation was also found in only four of his other eight experiments.

Thus, your point here about the difference in populations between the original and the replication(s) holds only for average age and college attendance; gender and stimulus seeking are (presumably at least roughly) the same between both.

(2025-11-12, 10:40 AM)Max_B Wrote: TPP drew from an average age of 49.2, with mixed online recruits (59.8% male, general public).

This is why I think you're confusing TPP with MSP, because those values are correct for MSP, but not for TPP. For TPP, the values are: age range of 18–29 years (no average provided); 32.39% male and 67.52% female.

(2025-11-12, 10:40 AM)Max_B Wrote: Bem highlights that non-student or older samples tend to produce null results

He does? Where?

(2025-11-12, 10:40 AM)Max_B Wrote: his exact replications (primarily with young students)

How did you determine that the exact replications were primarily with young students?

(2025-11-12, 10:40 AM)Max_B Wrote: yielded g = 0.08 (p = .0018)

Those are indeed the correct figures for exact replications from his (et al.) 2016 meta-analysis. I don't, though, see anything in that meta-analysis that mentions age or student status, let alone in relation to exact replications.

(2025-11-12, 10:40 AM)Max_B Wrote: while broader samples dilute the effect.

Again, I don't see any support for this in that meta-analysis, but maybe he's said it elsewhere or I've missed it.

(2025-11-12, 10:40 AM)Max_B Wrote: Even then, your link to Bem's 2016 meta-analysis still showed pooled statistical significance in Bem's 31 software-based exact replications (conducted with young participants in lab settings). The modifications TPP made to Bem's experiment align with the "modified" replications Bem set aside in his 2016 meta-analysis for their potential to obscure the most robust effect he found.

I'm not sure that I understand what all of this is supposed to mean, but from what I do understand, it seems to be at least misleading:

While the exact replications as a "pool" were indeed statistically significant, so were the modified replications, at a similar level, and I see nothing in the meta-analysis to indicate that Bem et al. "set aside" any modified replications. The only three experiments he excluded from the 93 he found were two that were "severely underpowered: the first had only one participant; the second had nine" and one deemed to be too exploratory because it "rested on several post-hoc analyses".

The rest of your post seems accurate.
This post is just to note @sbu's failure, having been given plenty of time, to acknowledge fair points and respond to fair questions that have been put to him in this thread:

(2025-11-07, 08:33 AM)Laird Wrote: Which particular meta-analysis (or meta-analyses) are you referring to? Be specific.

We've had no answer to this, even when followed up with:

(2025-11-08, 12:40 PM)Laird Wrote: I asked you to be specific about which meta-analysis or meta-analyses you were referring to. Don't dodge the question.

Additionally, if you are aware of some "heavy lifting", then share it with us.

We've also had no acknowledgement of this:

(2025-11-07, 08:33 AM)Laird Wrote: Multiple meta-analyses have been performed for multiple different types of psi. They certainly don't all combine data from highly heterogeneous studies. The Ganzfeld meta-analyses, for example, don't.

Beyond:

(2025-11-11, 10:08 PM)sbu Wrote: Okay, got it - Dean Radin has conducted an experiment showing a small but statistically significant effect, and the results have been replicated by others. Forgetting everything else in this thread, I ‘buy’ that.

If you had looked around, you would have found that there are plenty of other examples of psi experiments that consistently return significant positive results.

As you say, though, you...

(2025-11-11, 10:08 PM)sbu Wrote: have little knowledge of

...such psi experiments, so why not at least just read Meta-Analysis in Parapsychology on the Psi Encyclopedia, even though you also say you have little interest in these experiments?

If not, then please do us the courtesy of holding your tongue in future.

You have also not yet acknowledged your hypocritical error that I pointed out to you here:

(2025-11-07, 08:33 AM)Laird Wrote: OK, let's go back there: as I pointed out to you by quoting from a referenced article, precognitive priming according to Etzel Cardeña is "[t]he only paradigm that seems to have had mostly lack of or at best mixed recent replications".

In other words: you are cherry-picking here, the exact deception you mistakenly accuse others of ("mistakenly" because it is inapplicable in that context; in demonstrating that at least one white crow exists, it is not deceptive or fallacious to select the best evidence for white crows).

Do you acknowledge this?
