Why exams intended for humans might not be good benchmarks for LLMs like GPT-4
Ben Dickson
Quote: For example, one experiment showed that GPT-4 performed very well on Codeforces programming challenges created before 2021, when its training data was gathered. Its performance dropped dramatically on more recent problems. Narayanan found that in some cases, when GPT-4 was provided the title of a Codeforces problem, it could produce the link to the contest where it appeared.
In another experiment, computer scientist Melanie Mitchell tested ChatGPT’s performance on MBA tests, a feat that was widely covered in the media. Mitchell found that the model’s performance on the same problem could vary substantially when the prompt was phrased in slightly different ways.
“LLMs have ingested far more text than is possible for a human; in some sense, they have ‘memorized’ (in a compressed format) huge swaths of the web, of Wikipedia, of book corpora, etc.,” Mitchell told VentureBeat. “When they are given a question from an exam, they can bring to bear all the text they have memorized in this form, and can find the most similar patterns of ‘reasoning’ that can then be adapted to solve the question. This works well in some cases but not in others. This is in part why some forms of LLM prompts work very well while others don’t.”
Quote: “This is not to say that enormous statistical models like LLMs could never reason like humans — I don’t know whether this is true or not, and answering it would require a lot of insight into how LLMs do what they do, and how scaling them up affects their internal mechanisms,” Mitchell said. “This is insight which we don’t have at present.”
What we do know is that such systems make hard-to-predict, non-humanlike errors, and “we have to be very careful when assuming that they can generalize in ways that humans can,” Mitchell said.
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'
- Bertrand Russell
(This post was last modified: 2023-04-09, 11:03 PM by Sciborg_S_Patel. Edited 1 time in total.)