AI megathread


(2024-10-14, 02:41 PM)Typoz Wrote: There was a Twitter thread asking the question,
"Can Large Language Models (LLMs) truly reason?"
which discussed this paper:
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
...

(2025-01-11, 09:21 PM)Sciborg_S_Patel Wrote: Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
Kyle Orland

(2025-01-12, 09:29 PM)Sciborg_S_Patel Wrote: AI still lacks “common” sense, 70 years later

Gary Marcus

Marcus had some insightful commentary on the Apple Study, backed by additional examples:

Quote:This kind of flaw, in which reasoning fails in light of distracting material, is not new. Robin Jia and Percy Liang of Stanford ran a similar study, with similar results, back in 2017 (which Ernest Davis and I quoted in Rebooting AI, in 2019):

[Image: https://substack-post-media.s3.ama...35x638.png]
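To make that kind of perturbation concrete, here is a rough sketch in Python of the test being described: append a numerically irrelevant clause to a word problem and see whether the answer shifts. The problem text is modelled on the much-discussed kiwi example, and ask_llm is just a hypothetical placeholder for whatever model API you use; nothing here is the papers' actual code.

Code:
# Rough sketch of a distractor-clause test in the GSM-Symbolic / Jia & Liang spirit.
# `ask_llm` is a hypothetical placeholder; wire it to whatever model API you use.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")

# Base problem modelled on the widely discussed kiwi example; answer is 44 + 58 + 2*44 = 190.
BASE = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks double the number of kiwis he picked on Friday. "
    "How many kiwis does Oliver have? Answer with a single number."
)
# Numerically irrelevant clause: the correct answer does not change.
DISTRACTOR = " Five of the kiwis picked on Sunday were a bit smaller than average."
CORRECT = 44 + 58 + 2 * 44  # 190 either way

def check(problem: str) -> bool:
    """Crude exact-match check on the digits in the reply."""
    reply = ask_llm(problem)
    return str(CORRECT) in reply

# Uncomment once ask_llm is wired up:
# print("original:       ", "correct" if check(BASE) else "wrong")
# print("with distractor:", "correct" if check(BASE + DISTRACTOR) else "wrong")

The appended clause changes nothing mathematically, so any shift in the model's answer is a reasoning failure rather than a harder problem.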

Quote:Another manifestation of the lack of sufficiently abstract, formal reasoning in LLMs is the way in which performance often falls apart as problems are made bigger. This comes from a recent analysis of GPT o1 by Subbarao Kambhampati’s team:

[Image: https://substack-post-media.s3.ama...88x532.png]

Quote:We can see the same thing on integer arithmetic. A fall-off on increasingly large multiplication problems has repeatedly been observed, in both older and newer models. (Compare with a calculator, which would be at 100%.)

[Image: https://substack-post-media.s3.ama...0x396.jpeg]

Even o1 suffers from this:

[Image: https://substack-post-media.s3.ama...1x1078.png]

The fact that a smaller language model can do this with "implicit Chain of Thought" is interesting, though. It seems this issue can be solved... but needing a few years and a few billion dollars to match what a calculator can already do doesn't feel impressive.
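
For anyone who wants to check the fall-off informally, here is a rough sketch that measures exact-match accuracy against operand length, with Python's arbitrary-precision integers playing the role of the calculator at 100%. Again, ask_llm is a made-up placeholder and the answer parsing is deliberately naive.

Code:
import random
import re

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")  # hypothetical placeholder

def multiplication_accuracy(n_digits: int, trials: int = 20) -> float:
    """Fraction of random n-digit by n-digit products the model gets exactly right.
    Python's arbitrary-precision integers act as the calculator baseline (always 100%)."""
    correct = 0
    for _ in range(trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        reply = ask_llm(f"What is {a} * {b}? Reply with only the number.")
        digits = re.sub(r"\D", "", reply)  # drop commas, spaces, stray words
        if digits and int(digits) == a * b:
            correct += 1
    return correct / trials

# Uncomment once ask_llm is wired up:
# for n in range(2, 10):
#     print(f"{n}-digit operands: {multiplication_accuracy(n):.0%}")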

Quote:The refuge of the LLM fan is always to write off any individual error. The patterns we see here, in the new Apple study, and the other recent work on math and planning (which fits with many previous studies), and even the anecdotal data on chess, are too broad and systematic for that.


(2025-01-13, 11:05 PM)Sciborg_S_Patel Wrote: Marcus had some insightful commentary on the Apple Study, backed by additional examples:

AGI versus “broad, shallow intelligence”

Gary Marcus

Quote:When I was pressed to define AGI myself in 2022 I proposed (after consultations with Goertzel and Legg) the following, which I still stand by:
Quote:shorthand for any intelligence ... that is flexible and general, with resourcefulness and reliability comparable to (or beyond) human intelligence

LLMs don’t meet that; as the world has discovered, reliability is not their strong suit. As I have often written here, and as Goertzel also emphasizes, LLMs lack the ability to reliably generalize to novel circumstances. Likewise, the inability of LLMs to do basic fact checking and sanity checking speaks to their lack of resourcefulness.

GenAI answers are frequently superficial; they invent things (“hallucinations”, or what I would prefer to call “confabulations”), they fail to sanity check their own work, and they regularly make boneheaded errors in reasoning, mathematics, and so on. One never knows when one will get a correct answer or a ludicrous response like this one observed by AI researcher Abhijit Mahabal:

[Image: https://substack-post-media.s3.ama...2x840.jpeg]

Quote:...The river crossing example and many others show that LLMs often use words without a deep understanding of what those words mean. As Mahabal noted in email to me, “[at times LLMs] seem quite capable of regurgitating or replicating someone's deep analysis that they have found on the internet, and thereby sound deep”, but that regurgitation is an illusion. Genuine depth is lacking.

For me, “broad but shallow” well captures the current regime.
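
The river-crossing point is easy to probe yourself: give the model degenerate variants of the classic puzzle, where the usual constraints don't apply and the answer is a single trip, and see whether it recites the multi-trip textbook solution anyway. A rough sketch, with the same hypothetical ask_llm placeholder and a deliberately crude check:

Code:
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real model call")  # hypothetical placeholder

# Degenerate river-crossing variants: each is solvable in a single trip, so an answer
# that describes ferrying items back and forth suggests the model is pattern-matching
# on the classic wolf/goat/cabbage puzzle rather than reading the actual constraints.
TRIVIAL_VARIANTS = [
    "A farmer needs to take a goat across a river. His boat can carry him and the goat. "
    "What is the minimum number of crossings?",
    "A farmer with a wolf, a goat, and a cabbage must cross a river. The boat easily "
    "holds the farmer and all three items at once. What is the minimum number of crossings?",
]

# Uncomment once ask_llm is wired up:
# for prompt in TRIVIAL_VARIANTS:
#     reply = ask_llm(prompt)
#     # Very naive flag: the right answer is one crossing, so talk of going 'back'
#     # or a larger crossing count hints at the regurgitated textbook solution.
#     flagged = "back" in reply.lower() or any(f"{n} crossings" in reply for n in (3, 5, 7))
#     print("suspicious" if flagged else "looks direct", "|", reply)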


