Intuitive Assessment of Mortality Based on Facial Characteristics

#1 · Chris 2018-07-31, 02:32 PM Unregistered

Courtesy of the SPR Facebook page, here's a paper from the current number of Explore: The Journal of Science and Healing:

Intuitive Assessment of Mortality Based on Facial Characteristics: Behavioral, Electrocortical, and Machine Learning Analyses
Arnaud Delorme, Alan Pierce, Leena Michel and Dean Radin
Explore 14(4): 262-267 (2018)

Studies of various characteristics of the human face indicate that it contains a wealth of information about health status. Most studies involve objective measurement of facial features as correlated with historical health information. But some individuals also claim to be adept at intuitively gauging mortality based solely upon a quick glance at a person's photograph. To test this claim, we invited 12 such individuals to see if they could tell if a person was alive or dead based solely on a brief examination of his or her photograph. All photos used in the experiment were transformed into a uniform gray scale and counterbalanced across eight categories as follows: gender, age, gaze direction, glasses, head position, smile, hair color, and image resolution. Participants examined 404 photographs displayed on a computer monitor, one photo at a time, each shown for a maximum of 8 seconds. Half of the individuals in the photos were deceased, and half were alive at the time the experiment was conducted. Participants were asked to indicate if they thought the person in a photo was living or deceased by pressing an appropriate button. Overall, mean accuracy on this task was 53.6%, where 50% was expected by chance (P = .005, two tail). Statistically significant accuracy was independently obtained in 5 of the 12 participants. We also collected 32-channel electrocortical recordings and observed a robust difference between images of deceased individuals correctly vs. incorrectly classified in the early event related potential at 100 ms post-stimulus onset. We then applied machine learning techniques to classify the photographs based on 11 image characteristics; both random forest and logistic regression machine learning approaches were used, and both classifiers failed to achieve accuracy above chance level. Our results suggest that some individuals can intuitively assess mortality based on some as-yet unknown features of the face.

Unfortunately the full text doesn't seem to be freely available online.

I don't understand at all why this is presented in terms of assessing whether someone is alive or dead "based on some as-yet unknown features of the face". Surely the idea behind the experiment is that they are determining this by psychical means, and cues from facial features would be one of the sceptical counter-explanations. And surely the machine-learning trial was an attempt to argue against that possibility.

#2 · Chris 2018-07-31, 04:43 PM Unregistered

Although there is a very small difference in the reported hit rate, it sounds as though this was a revised version of a paper by the same authors which was published and then retracted in 2016 by the journal Frontiers in Human Neuroscience:
https://www.frontiersin.org/articles/10....00173/full

#3 · Chris 2018-07-31, 05:02 PM Unregistered

(2018-07-31, 04:43 PM)Chris Wrote: Although there is a very small difference in the reported hit rate, it sounds as though this was a revised version of a paper by the same authors which was published and then retracted in 2016 by the journal Frontiers in Human Neuroscience:
https://www.frontiersin.org/articles/10....00173/full

The text of the retracted paper can be found here:
https://pdfs.semanticscholar.org/3329/e7...1533055500

Here's a report about it from Retraction Watch:
https://retractionwatch.com/2016/11/07/j...bjections/

And here's a blog post referred to in that report:
http://neurocritic.blogspot.com/2016/08/...s-are.html

The blog post and comments point to several potential flaws in the experimental design:
(1) The photos included pictures of young people from old school yearbooks, and more recent photos of older people. So there was a correlation between the age of the people depicted and how recently the photos had been taken.
(2) The school yearbooks came from two quite different periods, c. 1940 and c. 1966, so there could be clues to the period from the style of clothes and hair.
(3) Each subject was apparently shown equal numbers of dead people and living people, rather than being presented with a random selection. Though this wouldn't tend to raise the hit rate, the variance would potentially differ from that of the binomial distribution. [And the abstract above says the photos were counterbalanced across 8 other categories, which would exacerbate the problem.]
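
For a sense of scale on point (3), here is a rough sketch of how far the variance could drift from the binomial value. It rests on one assumed guessing strategy that is not described in the paper: a subject with no information who answers "deceased" for exactly half of the photos, chosen at random. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def exact_hit_variance(n_photos):
    """Brute-force variance of the hit count for a small, balanced photo set.

    Assumption: half the photos are deceased, and the subject labels exactly
    half of them 'deceased', picking that half uniformly at random.
    """
    half = n_photos // 2
    deceased = set(range(half))  # photos 0..half-1 play the deceased ones
    hits = []
    for guessed_dead in combinations(range(n_photos), half):
        x = len(deceased.intersection(guessed_dead))
        # x correct 'deceased' calls force x correct 'living' calls as well
        hits.append(2 * x)
    mean = Fraction(sum(hits), len(hits))
    return sum((h - mean) ** 2 for h in hits) / len(hits)


def balanced_hit_variance(n_photos):
    """Closed form for the same setup (hypergeometric): N^2 / (4 (N - 1))."""
    return Fraction(n_photos ** 2, 4 * (n_photos - 1))


def binomial_hit_variance(n_photos):
    """Variance if every answer were an independent 50-50 trial: N / 4."""
    return Fraction(n_photos, 4)


# The closed form agrees with brute force on a small case.
assert exact_hit_variance(8) == balanced_hit_variance(8)

n = 404  # number of photos quoted in the abstract
print(float(balanced_hit_variance(n)), float(binomial_hit_variance(n)))
```

Under that particular assumption the two variances differ only by a factor of N/(N - 1), which is tiny for 404 photos. Other strategies, or the extra counterbalancing across eight categories, are not covered by this sketch.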

#4 · Chris 2018-07-31, 09:35 PM Unregistered

(2018-07-31, 05:02 PM)Chris Wrote: The blog post and comments point to several potential flaws in the experimental design:
(1) The photos included pictures of young people from old school yearbooks, and more recent photos of older people. So there was a correlation between the age of the people depicted and how recently the photos had been taken.
(2) The school yearbooks came from two quite different periods, c. 1940 and c. 1966, so there could be clues to the period from the style of clothes and hair.
...

Actually, the retracted paper does mention in the results section that "all databases contained the same number of living and deceased individuals" (though that doesn't seem to be explained clearly in the methods section). There were three databases - one from the older school yearbooks, another from the more recent ones, and the third from the photos of older people. So in that case the clues to the age of the photos shouldn't have helped.

#5 · Chris 2018-08-01, 08:19 AM Unregistered

(2018-07-31, 05:02 PM)Chris Wrote: The blog post and comments point to several potential flaws in the experimental design:
...
(3) Each subject was apparently shown equal numbers of dead people and living people, rather than being presented with a random selection. Though this wouldn't tend to raise the hit rate, the variance would potentially differ from that of the binomial distribution. [And the abstract above says the photos were counterbalanced across 8 other categories, which would exacerbate the problem.]

Thinking about it, rather than that fairly subtle statistical flaw, I would be more concerned about the fact that the statistical analysis assumes that the responses of the different subjects to the same set of photos are independent. The headline hit rate of 53.6% wouldn't be statistically significant for a single subject, given that there were 404 photos altogether, and the subjects weren't forced to give an answer for all of them. It's only because the results for 12 subjects have been combined that a significant result has been obtained, and that depends on the assumption that the responses of the different subjects are independent. But in reality the different subjects may well have shared biases in judging whether a person in an old photo is now likely to be dead or alive.
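
As a rough check on the single-subject point, here is a minimal sketch using a normal approximation to the binomial. It assumes every one of the 404 photos was answered (which the subjects were not required to do) and treats pooled responses as independent trials. This is not the authors' analysis, and the abstract does not say how the reported P = .005 was computed; `two_sided_p` is just an illustrative helper.

```python
import math


def two_sided_p(hit_rate, n_trials, chance=0.5):
    """z-score and two-sided p-value for a hit rate, via the normal approximation."""
    se = math.sqrt(chance * (1 - chance) / n_trials)
    z = (hit_rate - chance) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of a standard normal
    return z, p


# One subject: 404 photos, 53.6% correct (assuming all photos were answered).
z_single, p_single = two_sided_p(0.536, 404)

# Twelve subjects pooled and naively treated as 12 * 404 independent trials.
z_pooled, p_pooled = two_sided_p(0.536, 12 * 404)

print(f"single subject: z = {z_single:.2f}, p = {p_single:.3f}")
print(f"naive pooling:  z = {z_pooled:.2f}, p = {p_pooled:.2g}")
```

For a single subject the z-score comes out at roughly 1.4, short of the conventional 0.05 threshold. Treating all twelve subjects' answers as independent trials pushes it to about 5, so the strength of the pooled result depends on the independence assumption.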

That would tend to inflate the significance of the results even if those biases didn't coincide with a real likelihood of the person dying or surviving. But given that the only database that produced significant results was the one that included photos from published obituaries, it may just be that some of the dead people pictured didn't appear to be in the best of health, compared to age-matched people who hadn't died.

To be fair, the authors do conclude that "The most straightforward interpretation of our results is that the participants were sensitive to facial features that indicated impending health problems." The alternative explanation of the difference between the results for the three databases is that the subjects performed better with people who had died more recently. Of course, that idea could be tested by finding out when the people in the two older databases died, and testing whether date of death correlated with hit rate.

#6 · Chris 2018-08-05, 01:00 AM Unregistered

Just for the record, the statistical fallacy here is the same one that would be encountered if someone ran a dream precognition experiment with one randomly selected target and one decoy, asked 10,000 people to say which of the two pictures their dreams matched more closely - and then analysed the success rate as if the experiment consisted of 10,000 independent trials, each with a 50-50 likelihood of success.

Suppose one of the two pictures - picture A - contained something that was very commonly seen in dreams - say 60% of the time - and the other didn't. If picture A happened to be picked as the target, there would be 6,000 hits and 4,000 misses. Analysing that as a hit rate from 10,000 independent trials would produce a phenomenal degree of statistical significance. But that would happen, in the absence of any psi effect, every time picture A was the target - in other words 50% of the time.

Ignoring the dependence between the choices made by different subjects would produce a spuriously significant result, where there was no real significance at all.

(Edit: Or rather, it's a consequence of the fact that there isn't an independent choice of target for different subjects, rather than that the subjects' choices lack independence.)
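
The dream example above can be simulated in a few lines. This is only a sketch under the stated assumptions: no psi effect at all, 60% of dreamers pick picture A whatever the target is, and the target is chosen by a fair coin. The function and variable names are hypothetical.

```python
import math
import random


def run_experiment(rng, n_subjects=10_000, p_prefer_a=0.6):
    """One experiment: a single random target shared by every subject."""
    target_is_a = rng.random() < 0.5
    # Each subject picks picture A with probability 0.6, regardless of the target.
    chose_a = sum(rng.random() < p_prefer_a for _ in range(n_subjects))
    hits = chose_a if target_is_a else n_subjects - chose_a
    # Naive analysis: treat the subjects as independent 50-50 trials.
    z = (hits - n_subjects / 2) / math.sqrt(n_subjects / 4)
    return hits, z


rng = random.Random(0)  # fixed seed so the run is repeatable
results = [run_experiment(rng) for _ in range(200)]

# Fraction of experiments that look like overwhelming evidence for psi.
frac_positive = sum(z > 3 for _, z in results) / len(results)
print(f"experiments with z > 3: {frac_positive:.0%}")
print(f"smallest |z| seen: {min(abs(z) for _, z in results):.1f}")
```

Every simulated experiment gives a z-score of around plus or minus 20 under the naive analysis, and roughly half of them land on the positive side, even though nothing but a shared preference for picture A is at work. With one target per experiment, the real sample size is one, not 10,000.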