Statistical Significance


(2018-03-02, 03:12 PM)Chris Wrote: J. E. Kennedy has posted a new paper on his site, written with Caroline Watt, entitled "How to Plan Falsifiable Confirmatory Research".

At the risk of becoming (more of) a bore, something in the paper that I definitely disagree with is the idea that a study designed purely to test whether or not the null hypothesis is true, without specifying a definite alternative hypothesis that can be falsified, should be classified as "exploratory".

The problem with that is that "exploratory" usually means research for which hypotheses are not fixed in advance, that is, research which - because of the flexibility available in picking post hoc hypotheses - is of no evidential value. But there is all the difference in the world between an exploratory study in which measurements are made without any pre-specified hypothesis in mind, on the one hand, and a study specifically designed to test for inconsistency with the null hypothesis, on the other. In the latter case, there is no flexibility in interpretation, and no reason not to assign evidential value to the result.

So I think there are really three different things here: (1) exploratory studies, (2) studies designed to investigate whether an effect exists, and (3) studies designed to investigate whether a particular model of the effect is adequate.

If no distinction is made between (1) and (2), psi research becomes very difficult. The point is that, if psi exists, we don't know how it works, and any attempt to model it is extremely speculative. If, for that reason, experimental studies of psi are deemed to be non-evidential, there is a danger of a "Catch 22" situation, in which there can be no experimental evidence of the existence of psi without a theory of psi, and a theory of psi will never be developed because adequate research resources won't be available without experimental evidence of psi.

In reality, if there is experimental data indicating the null hypothesis should be rejected at a "5 sigma" level, I think that should count as evidence.
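For reference, a one-tailed "5 sigma" result corresponds to a p value of roughly 3 x 10^-7. A quick check, sketched in Python (assuming scipy is available):

Code:
from scipy.stats import norm

# One-tailed probability of a result at least 5 standard deviations
# above its expected value under the null hypothesis.
print(norm.sf(5))   # ~2.9e-07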
Courtesy of the Anomalist website - here's an article by Christopher Bergland, entitled "Rethinking P-Values: Is "Statistical Significance" Useless?", which summarises the contents of a special issue of the journal The American Statistician on that topic:
https://www.psychologytoday.com/us/blog/...ce-useless

Apparently the purpose of the issue was "to end the practice of using a probability value (p-value) of less than 0.05 as strong evidence against a null hypothesis or a value greater than 0.05 as strong evidence favoring a null hypothesis." (Have people really been treating p>0.05 as strong evidence for the null hypothesis?)

The problem with this kind of thing is that it tends to jump from legitimate advice about how to avoid misusing p values to the wild conclusion that people shouldn't use p values at all. Apparently the editors give a list of five "don'ts":

Quote:
  1. Don’t base your conclusions solely on whether an association or effect was found to be “statistically significant."
  2. Don’t believe that an association or effect exists just because it was statistically significant.
  3. Don’t believe that an association or effect is absent just because it was not statistically significant.
  4. Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.
  5. Don’t conclude anything about scientific or practical importance based on statistical significance (or lack thereof).


Of course, those are pretty unexceptionable, with the possible exception of the first - because in some circumstances a measure of statistical significance may be all you have, and a p value may be the most appropriate measure.

But then (at least as summarised by Bergland) they take the huge leap to advocating the benefits of not using p values at all. How on earth does that follow from the measured list of "Don'ts"?

And although the editors reassure readers that the contributions in this issue provide alternative solutions to "the very hard problem of separating signal from noise in data and making decisions under uncertainty", it's noticeable that none of the alternatives is so much as mentioned by Bergland.

It's understandable that people want a change if there are problems with the way things are now, but if no one has a clear plan for an alternative, they may just end up in an almighty mess.
(2017-10-07, 05:31 PM)Chris Wrote: But supposing someone pressed on regardless, and applied the formula to a typical psi experiment, they could make H0 the null hypothesis, in which psi doesn't exist, and H1 some kind of psi hypothesis, and they could dream up prior probabilities P0 and P1 to assign to these two hypotheses. 

Then, as usual, they could characterise the result of the experiment using some variable y, and they could choose a small number (call it a) as a significance level, so that "success" would be a result in which y was so large that such a value would be obtained with probability a if the null hypothesis were true. They could design their experiment by choosing another number less than 1 (call it b) as the power, so that the experiment would be successful with probability b if the psi hypothesis were true.

They could use an equation similar to the one above, with the set of results considered being those for which the experiment was a success. Swapping H0 and H1 in the equation above, they could use it to estimate the probability that the null hypothesis was true despite the experiment being a success (i.e. a "false positive"). In the equation, the probability of the result given H0 would simply be a, and the probability of the result given H1 would simply be b. 

Then after a bit of rearrangement, they would get:

Probability of null given success = 1 / (1 + b x P1 / (a x P0))

Probability of psi given success = (b x P1 / (a x P0)) / (1 + b x P1 / (a x P0))

So to comment on Laird's point above, both the significance level (in other words the p value) and the power - as well as the prior estimates P0 and P1 - influence the estimates of the likelihood of psi and no-psi. In fact, as the power is likely to be a number somewhat less than 1, but not small, whereas the significance level may be quite small, the probabilities are likely to be more sensitive to the significance level than to the power.
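For concreteness, here is a minimal sketch of that calculation in Python; the numerical values of a, b, P0 and P1 below are purely illustrative.

Code:
def posterior_given_success(a, b, P0, P1):
    """Posterior probabilities of the null and psi hypotheses, given a significant result.

    a      significance level, i.e. P(success | null)
    b      power,              i.e. P(success | psi)
    P0, P1 prior probabilities assigned to the null and psi hypotheses
    """
    ratio = (b * P1) / (a * P0)
    return 1 / (1 + ratio), ratio / (1 + ratio)

# Illustrative values only: a = 0.05, power 0.8, even priors.
print(posterior_given_success(0.05, 0.8, 0.5, 0.5))    # null ~0.06, psi ~0.94
print(posterior_given_success(0.005, 0.8, 0.5, 0.5))   # a ten times smaller: null ~0.006
print(posterior_given_success(0.05, 0.5, 0.5, 0.5))    # b reduced to 0.5: null ~0.09

Since the power can only range between roughly 0.5 and 1 in a sensibly designed study, while the significance level can easily vary by an order of magnitude or more, in practice the result is indeed more sensitive to the significance level.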

(2017-10-07, 06:00 PM)Chris Wrote: Incidentally, the first form of Bayes's Theorem in the last post is helpful in examining the suggestion that "smaller effects increase the likelihood that positive results are false-positives, even in the setting of very low p-values". If the p value (i.e. a in the equation above) is the same, and the prior probabilities assigned to H0 and H1 are the same, then effect size can influence the likelihood of a false positive only through the power of the experiment (b above).

If the small-effect-size experiment is designed to have the same power as the large-effect-size experiment, the likelihood of a false positive will not be any greater.
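To put a number on that, one can ask how large an experiment has to be before a small effect gives the same power as a large one; once a and b are fixed by design, the false-positive probability calculated above comes out the same either way. A rough sketch, in which the 0.25 chance rate and the two hypothetical psi hit rates are purely illustrative:

Code:
import numpy as np
from scipy import stats

def smallest_n_for_power(p_psi, p_null=0.25, alpha=0.05, target_power=0.8):
    """Smallest number of trials giving the target power for a one-sided binomial test."""
    for n in range(10, 5000):
        # Critical hit count: smallest k with P(X >= k | null) <= alpha.
        tail_null = stats.binom.sf(np.arange(n + 1) - 1, n, p_null)
        k_crit = int(np.argmax(tail_null <= alpha))
        power = stats.binom.sf(k_crit - 1, n, p_psi)
        if power >= target_power:
            return n, power
    raise ValueError("no design found in range")

print(smallest_n_for_power(0.40))   # large effect: relatively few trials needed
print(smallest_n_for_power(0.28))   # small effect: far more trials for the same power

The effect size determines how many trials are needed, but once the design achieves the chosen a and b, the formula above gives the same probability of a false positive whatever the effect size.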

Chris,

Unfortunately, you cannot use the p-value and power to calculate the posterior probability of a hypothesis, given a result R.  The p-value is the probability of the observed result or a more extreme one.  But in Bayesian inference, only the probability of the observed result itself is relevant.

The observed value R will be a realization of some random variable Y.  The correct value for a is the likelihood of R under H0, that is, the value of the probability density function (pdf) of Y specified by H0 at the value R. 

The correct value for b is the marginal likelihood of R under H1, that is, the average value of the likelihood of R averaged over all possible probability density functions under H1. Typically, there are an infinite number of these pdfs. For example if H1 is that the hit rate in a Ganzfeld experiment will not be p=0.25, then there is a pdf under H1 for every value p in the interval [0,1] except for 0.25.  To compute the marginal likelihood, we specify a probability distribution g(p) over the possible values of p, and compute the weighted average of the likelihood of R under each such pdf with weights specified by g(p).  Since g(p) is a continuous function, the marginal likelihood is computed by integration.
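As a concrete sketch of that integration, suppose the observed result R is 32 hits in 100 trials (purely hypothetical numbers) and take g(p) to be uniform on [0, 1]:

Code:
from scipy import stats, integrate

n_trials, hits = 100, 32                          # hypothetical observed result R
like_h0 = stats.binom.pmf(hits, n_trials, 0.25)   # likelihood of R under H0 (p = 0.25)

g = stats.beta(1, 1)                              # prior g(p) over hit rates under H1 (uniform)
like_h1, _ = integrate.quad(lambda p: stats.binom.pmf(hits, n_trials, p) * g.pdf(p), 0, 1)
                                                  # marginal likelihood of R under H1

print(like_h0, like_h1, like_h1 / like_h0)        # the last value is the Bayes factor for H1

The choice of g(p) matters a great deal here: a uniform prior puts much of its mass on hit rates no one would seriously propose, which drags the marginal likelihood down.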
(2019-08-09, 10:48 PM)Beasty Wrote: Chris,

Unfortunately, you cannot use the p-value and power to calculate the posterior probability of a hypothesis, given a result R.  The p-value is the probability of the observed result or a more extreme one.  But in Bayesian inference, only the probability of the observed result itself is relevant.

The observed value R will be a realization of some random variable Y.  The correct value for a is the likelihood of R under H0, that is, the value of the probability density function (pdf) of Y specified by H0 at the value R. 

The correct value for b is the marginal likelihood of R under H1, that is, the average value of the likelihood of R averaged over all possible probability density functions under H1. Typically, there are an infinite number of these pdfs. For example if H1 is that the hit rate in a Ganzfeld experiment will not be p=0.25, then there is a pdf under H1 for every value p in the interval [0,1] except for 0.25.  To compute the marginal likelihood, we specify a probability distribution g(p) over the possible values of p, and compute the weighted average of the likelihood of R under each such pdf with weights specified by g(p).  Since g(p) is a continuous function, the marginal likelihood is computed by integration.

I think you have misunderstood the calculation. The probabilities I've considered are the probabilities of obtaining a statistically significant result, not the probabilities of observing a particular value of any variable.

The probability a is the probability of obtaining a significant result under the null hypothesis. So it's not the value of the pdf for any particular value of your variable R, but the integral of the pdf over the range of values of R corresponding to significant results.

The probability b is the power. That is the probability of obtaining a significant result under the psi hypothesis. So that too is an integral of the pdf corresponding to the psi hypothesis, also over the range of values of R corresponding to significant results.

In deriving those equations, no assumption is necessary about what the psi hypothesis is, though of course in any particular application the psi hypothesis will need to be specified mathematically in order to calculate b. It could be a hypothesis of the form you specify, namely a kind of superposition of different values of p. But of course, psi may not work in that simple way, so the hypothesis could be something much more complicated.

(Edit: Just to be clear, I am defining "success" as obtaining a statistically significant result. Not a particular result, but any result in the range defined as statistically significant.)
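In other words, for a simple binomial design a and b are tail sums rather than point probabilities. A minimal sketch, in which the 100-trial design, the 0.33 psi hit rate and the even priors are purely illustrative assumptions:

Code:
from scipy import stats

n_trials = 100                 # hypothetical Ganzfeld-style design
p_null, p_psi = 0.25, 0.33     # chance hit rate vs an assumed psi hit rate
P0 = P1 = 0.5                  # illustrative prior probabilities

# Critical hit count: smallest k with P(X >= k | null) <= 0.05.
k_crit = next(k for k in range(n_trials + 1)
              if stats.binom.sf(k - 1, n_trials, p_null) <= 0.05)

a = stats.binom.sf(k_crit - 1, n_trials, p_null)   # P(success | null): tail sum under H0
b = stats.binom.sf(k_crit - 1, n_trials, p_psi)    # P(success | psi): tail sum under H1, i.e. the power

prob_null_given_success = 1 / (1 + b * P1 / (a * P0))
print(k_crit, a, b, prob_null_given_success)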
(2017-10-07, 05:31 PM)Chris Wrote: But supposing someone pressed on regardless, and applied the formula to a typical psi experiment, they could make H0 the null hypothesis, in which psi doesn't exist, and H1 some kind of psi hypothesis, and they could dream up prior probabilities P0 and P1 to assign to these two hypotheses. 

Then, as usual, they could characterise the result of the experiment using some variable y, and they could choose a small number (call it a) as a significance level, so that "success" would be a result in which y was so large that such a value would be obtained with probability a if the null hypothesis were true. They could design their experiment by choosing another number less than 1 (call it b) as the power, so that the experiment would be successful with probability b if the psi hypothesis were true.

They could use an equation similar to the one above, with the set of results considered being those for which the experiment was a success. Swapping H0 and H1 in the equation above, they could use it to estimate the probability that the null hypothesis was true despite the experiment being a success (i.e. a "false positive"). In the equation, the probability of the result given H0 would simply be a, and the probability of the result given H1 would simply be b. 

Then after a bit of rearrangement, they would get:

Probability of null given success = 1 / (1 + b x P1 / (a x P0))

Probability of psi given success = (b x P1 / (a x P0)) / (1 + b x P1 / (a x P0))

So to comment on Laird's point above, both the significance level (in other words the p value) and the power - as well as the prior estimates P0 and P1 - influence the estimates of the likelihood of psi and no-psi. In fact, as the power is likely to be a number somewhat less than 1, but not small, whereas the significance level may be quite small, the probabilities are likely to be more sensitive to the significance level than to the power.

(2019-08-09, 11:21 PM)Chris Wrote: I think you have misunderstood the calculation. The probabilities I've considered are the probabilities of obtaining a statistically significant result, not the probabilities of observing a particular value of any variable.

The probability a is the probability of obtaining a significant result under the null hypothesis. So it's not the value of the pdf for any particular value of your variable R, but the integral of the pdf over the range of values of R corresponding to significant results.

The probability b is the power. That is the probability of obtaining a significant result under the psi hypothesis. So that too is an integral of the pdf corresponding to the psi hypothesis, also over the range of values of R corresponding to significant results.

In deriving those equations, no assumption is necessary about what the psi hypothesis is, though of course in any particular application the psi hypothesis will need to be specified mathematically in order to calculate b. It could be a hypothesis of the form you specify, namely a kind of superposition of different values of p. But of course, psi may not work in that simple way, so the hypothesis could be something much more complicated.

(Edit: Just to be clear, I am defining "success" as obtaining a statistically significant result. Not a particular result, but any result in the range defined as statistically significant.)

Chris, you stated in a couple of places that a is the p-value.  Is that not what you meant?
(2019-08-09, 11:33 PM)Beasty Wrote: Chris, you stated in a couple of places that a is the p-value.  Is that not what you meant?

a is the specified significance level for the experiment - the value that is commonly set to 0.05. I see that I described it parenthetically above as "in other words the p value", which is confusing. I should have said "the p value for significance" or something like that.

And thank you for raising this, because it's made me realise that I have recently erred in using a p value rather than the significance level in another thread - the one on Louie Savva's thesis. In the following post, the power value should have been recalculated so that it was based on the p value rather than on 0.05 (I think this reduces the "psi probability" to 90%* rather than 94%, but I'll check that as soon as I have a chance):
https://psiencequest.net/forums/thread-l...2#pid30962

(* Edit, on reflection I think it's fairer to use the 0.05 significance level and not recalculate the power. That gives 85%, compared with 94% if the study had been properly powered. Fortunately that's about the same answer obtained by calculating directly the probability of the observed hit rate under psi and no-psi hypotheses.)
