Statistical Significance

Kamarling · Kamarling 2017-08-16, 11:30 PM

(2017-08-16, 11:14 PM)Chris Wrote: No, I'm not saying that if something could have happened by chance then it must have happened by chance, or anything like that. I'm just saying that statistics helps us to judge the likelihood that it happened by chance.

Ok - sorry if I misunderstood, Chris.

Kamarling · Kamarling 2017-08-16, 11:33 PM

(2017-08-16, 11:08 PM)malf Wrote: Here is the problem as "the skeptics" see it... In more or less plain English. Not too much maths:

http://theness.com/neurologicablog/index...ical-sins/

And we are back to what was being discussed in another thread - the problem of repeatability when it comes to psi.

Steven Novella Wrote:The Fix

The problems with overusing the P-value and P-hacking are all fixable. As I discussed in my previous post on the Simmons paper, one important fix is exact replications. Exact replications remove all the degrees of freedom.

#13 · Chris 2017-08-17, 08:04 AM Unregistered

(2017-08-16, 11:08 PM)malf Wrote: Here is the problem as "the skeptics" see it... In more or less plain English. Not too much maths:

http://theness.com/neurologicablog/index...ical-sins/

To be fair, with regard to pre-registration, it's not really something the sceptics are pushing and parapsychologists as a whole are arguing against. I think there's broad agreement in the field that it's a good thing, though getting everyone to do it is another thing.

"P hacking" has become a bit of a sceptical mantra, to the extent that some sceptics will dismiss any work that isn't pre-registered, without bothering to consider whether there was really scope for multiple hypotheses, or multiple statistical tests of the hypothesis.
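To make concrete why the scope for multiple tests matters, here is a minimal sketch, not taken from any of the linked articles. It assumes the alternative tests are independent and that there is no real effect, so each p-value is uniform on [0, 1]; the function names and the number of simulated studies are hypothetical choices for illustration. It compares the textbook formula 1 - (1 - alpha)^k with a simulation in which a study is reported as "significant" if any of its k tests falls below alpha.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_min_p_rate(num_tests, alpha=0.05, num_studies=20000, seed=0):
    """Simulate studies with no real effect and report the fraction in which
    the smallest of num_tests p-values is below alpha.

    Assumption: under the null hypothesis each p-value is uniform on [0, 1],
    and the tests are independent of one another.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(k), 3), round(simulate_min_p_rate(k), 3))
```

With a single test the false-positive rate stays at alpha. It only climbs when there really are several tests to choose from, which is why it matters whether such scope existed in a given experiment. Correlated tests, which are more realistic, would inflate the rate less than this independent case suggests.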

***Laird*** · ***Laird*** 2017-08-17, 10:13 PM Administrator

(2017-08-17, 08:04 AM)Chris Wrote: "P hacking" has become a bit of a sceptical mantra, to the extent that some sceptics will dismiss any work that isn't pre-registered, without bothering to consider whether there was really scope for multiple hypotheses, or multiple statistical tests of the hypothesis.

Reminiscent of the Skeptiko thread you started on Daryl Bem's work, where (IMO) several "skeptics" unreasonably postulated multiple degrees of freedom.

#15 · Chris 2017-08-17, 10:25 PM Unregistered

(2017-08-17, 10:13 PM)Laird Wrote:
(2017-08-17, 08:04 AM)Chris Wrote: "P hacking" has become a bit of a sceptical mantra, to the extent that some sceptics will dismiss any work that isn't pre-registered, without bothering to consider whether there was really scope for multiple hypotheses, or multiple statistical tests of the hypothesis.

Reminiscent of the Skeptiko thread you started on Daryl Bem's work, where (IMO) several "skeptics" unreasonably postulated multiple degrees of freedom.

Yes, the criticism of Bem's work (not just on that Skeptiko thread) was the main example I was thinking of. For part of the work, there was clearly scope for alternative hypotheses, but not for most of it, as far as I could see. I think if people are going to make that criticism, they should be prepared to say what alternative hypotheses or tests they are thinking of.

***Laird*** · ***Laird*** 2017-08-17, 11:08 PM Administrator

(2017-08-17, 10:25 PM)Chris Wrote: I think if people are going to make that criticism, they should be prepared to say what alternative hypotheses or tests they are thinking of.

Exactly. All I recall seeing in that thread was handwaving.

malf · malf 2017-08-18, 04:17 AM

Bem has condemned himself in his own words:

“I’m all for rigor,” he continued, “but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it.” It’s been hard for him, he said, to move into a field where the data count for so much. “If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’ ”

This is more than 'hand waving', Laird (and I'm sorry, it's Novella again, but nobody else writes as well as him in this field):

http://theness.com/neurologicablog/index...-research/

***Laird*** · ***Laird*** 2017-08-18, 04:44 AM Administrator

(2017-08-18, 04:17 AM)malf Wrote: Bem has condemned himself in his own words:

“I’m all for rigor,” he continued, “but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it.” It’s been hard for him, he said, to move into a field where the data count for so much. “If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’ ”

I've bolded what to me seem to be the important parts. My understanding is that he's talking about the experiments he undertook before he "move[d] into a field where the data count for so much", and that the attitude he expresses (not being worried about replication) dates from before his Feeling the Future experiments. Clearly he worried about replication with those experiments, because he provided the full set of materials for others to use!

(2017-08-18, 04:17 AM)malf Wrote: This is more than 'hand waving' Laird

Not really. There's no critique of any actual experimental techniques or analysis there, just a vague insinuation that a lack of rigour led to... I don't know what, you'll have to fill me in.

(2017-08-18, 04:17 AM)malf Wrote: (and I'm sorry, it's Novella again, but nobody else writes as well as him in this field):

http://theness.com/neurologicablog/index...-research/

The only serious point in that article is the failed multi-site replication. Two points in response:

1. Yes, this counts as evidence against, but it has to be balanced with the prior meta-analysis which found a highly significant result.
2. I get that having to adjust a pre-registered means of analysis is not kosher, but it's not totally meaningless that they found significant results with a different method of analysis - we'll have to see whether the next set of pre-registered experiments with that different method turns up more significant results.

#19 · Chris 2017-08-18, 08:10 AM Unregistered

(2017-08-18, 04:17 AM)malf Wrote: Bem has condemned himself in his own words:

“I’m all for rigor,” he continued, “but I prefer other people do it. I see its importance—it’s fun for some people—but I don’t have the patience for it.” It’s been hard for him, he said, to move into a field where the data count for so much. “If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, ‘Will this replicate or will this not?’ ”

This is more than 'hand waving', Laird (and I'm sorry, it's Novella again, but nobody else writes as well as him in this field):

http://theness.com/neurologicablog/index...-research/

I think Laird is clearly right - those rhetorical devices he's referring to were the experiments he did in straight psychology. Otherwise he's just saying he doesn't enjoy the rigorous approach and he's not temperamentally suited to it. But he sees its importance and he's all for it. That's hardly a self-condemnation.

Similarly, sceptics often quote from a paper Bem wrote in 2003, entitled "Writing the Empirical Journal Article":
dbem.ws/WritingArticle.pdf
In that paper he advocates extensive post hoc analysis of experimental data - going on a "fishing expedition". But in the very next sentence (which sceptics rarely quote) he draws a distinction between the "Context of Justification" (relating to how results should be presented) and the "Context of Discovery" (relating to exploratory analysis of experimental data). To my mind the approach he describes isn't sufficiently rigorous, and there are dangers in it (and I would guess he subsequently came to realise that). But in that paper he certainly isn't advocating anything like the "p hacking" strategy he is often accused of.

I think it's possible there are other problems with Bem's "Feeling the Future" study but, as I said, generally I don't see very much scope for "p hacking" using alternative hypotheses or alternative statistical tests. There is scope for it in his Experiment 1 (of 9), but for the other experiments I'd like to see the critics explain what alternative hypotheses or tests they have in mind.

#20 · Chris 2017-08-18, 08:39 AM Unregistered

(2017-08-18, 04:17 AM)malf Wrote: This is more than 'hand waving', Laird (and I'm sorry, it's Novella again, but nobody else writes as well as him in this field):

http://theness.com/neurologicablog/index...-research/

Just to comment on one point from that article: it quotes the article on Bem in Slate Magazine, as follows:
"In their conference abstract, though, Bem and his co-authors found a way to wring some droplets of confirmation from the data. After adding in a set of new statistical tests, ex post facto, they concluded that the evidence for ESP was indeed “highly significant.”"
[my emphasis]

What the conference abstract actually says is this:
"While the preregistered hypothesis that was assessed on a participant basis did not show a significant psi effect, when the statistical power was increased by using a single trial analysis, the primary hypothesis was highly significant. The results did not support a correlation between study outcome and experimenter expectancy. Overall, these results support the feasibility of a multi-laboratory collaboration and show that single trial analysis might be more appropriate and powerful to process these types of data."

The abstract is clear that the pre-registered test gave a null result, and that the other test which gave a highly significant result was post hoc. So I don't think it's fair to say that the authors concluded that there was strong evidence for psi.
