AI megathread

sbu · sbu 2025-01-11, 11:27 PM

(2025-01-11, 07:57 PM)Sciborg_S_Patel Wrote: I don't think you ever made a case, to be honest.

I think you need a bit more mathematical proficiency to truly understand it.

Sci · Sci 2025-01-11, 11:37 PM

(2025-01-11, 11:27 PM)sbu Wrote: I think you need a bit more mathematical proficiency to truly understand it.

No, I understood the claim, [I just think you didn't make an actual case.] It's your usual pattern of bold pronouncements that have to be walked back just like your "magic got less wiggle room" claim regarding protein folding or your attack on Parnia based on a presentation you didn't even watch but merely read a summary for in a[n] Aware of Aware post.

Anyway, see my last [prior] post in this thread. Should make it clear LLMs are not reasoning.

Valmar · Valmar 2025-01-12, 12:34 AM

(2025-01-11, 11:27 PM)sbu Wrote: I think you need a bit more mathematical proficiency to truly understand it.

I think you need a much deeper understanding of how computers work in general. How algorithms work, how databases work.

Because computers do not actually do mathematics ~ not bottom-up. Top-down, that is the abstraction we have designed computers for.

Sci · Sci 2025-01-12, 03:58 AM

(2025-01-12, 12:34 AM)Valmar Wrote: I think you need a much deeper understanding of how computers work in general. How algorithms work, how databases work.

Because computers do not actually do mathematics ~ not bottom-up. Top-down, that is the abstraction we have designed computers for.

Yeah the computer itself is an abstraction from the surrounding matter around it.

Even if [it] is the case that certain parts of the brain do perform statistical processes as an aid to cognition, that still leaves the experiential aspect of knowledge - as noted in a previously posted article - in need of understanding. I always go back to the fact that you don't always get a 100% or 0% on an exam [about] math proofs, and sometimes you had the feeling of correctness for something that was erroneous reasoning. Somehow that feeling of understanding can be refined, yet as level of necessary expertise goes up even the best mathematicians can make mistakes in their proofs.

Of course it is incredible that so much can be done with data manipulation, but without the labeling and training by humans the magic trick doesn't work.

***Laird*** · ***Laird*** 2025-01-12, 07:47 AM Administrator

(2025-01-10, 01:29 PM)sbu Wrote: The latent structures in data are what really enable modern AI

Maybe better would be for me to ask what you mean by "latent structures" in the first place, preferably by example.

***Laird*** · ***Laird*** 2025-01-12, 07:48 AM Administrator

(2025-01-11, 02:31 AM)Sciborg_S_Patel Wrote: Posting this to provide a common reference frame for discussion:

Understanding the Core Components of LLMs: Vectors, Tokens, and Embeddings Explained

Vipin Jain

It's very interesting, but it doesn't change my sentiments regarding the two points I brought up.

Re the first point, I'd have had the same reaction if, prior to the existence of LLMs, somebody had told me that via this approach (as outlined in that article), a machine could not just develop a conceptual model of reality merely by processing a bunch of text, but also "understand" human input and respond intelligently and insightfully to that input.

It even emphasises the second point, providing a tangible mechanism to which the epiphenomenalist/physicalist can point so as to claim a physical, mechanistic basis for human conceptual understanding, to which phenomenal (experienced) understanding is a mere causally ineffective tack-on.

Anyhow, I'm becoming repetitive, so I'll leave it there.

Sci · Sci 2025-01-12, 03:25 PM

(2025-01-11, 09:21 PM)Sciborg_S_Patel Wrote: Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

Kyle Orland

We're in the brute force phase of AI – once it ends, demand for GPUs will too

Simon Sharwood

Quote:Gartner thinks generative AI is right for only five percent of workloads

I actually don't know if I agree with the headline wrt GPUs unless they mean internal processing power without them will end up much [more] powerful, nor do I think it's referring to "brute force" in any way related to prior discussion.

I thought the latter part was the more interesting take away:

Quote:Gartner's Symposium featured another session with similar themes.

Titled "When not to use generative AI," it featured vice president and distinguished analyst Bern Elliot pointing out that Gen AI has no reasoning powers and produces only "a probabilistic sequence" of content...

...Elliot recommended not using it to tackle tasks other than content generation, knowledge discovery, and powering conversational user interfaces...

...Elliot conceded that improvements to Gen AI have seen the frequency with which it "hallucinates" – producing responses with no basis in fact – fall to one or two percent. But he warned users not to see that improvement as a sign the tech is mature. "It's great until you do a lot of prompts – millions of hallucinations in production is a problem!"

Like Brethenoux, Elliot therefore recommended composite AI as a safer approach, and adopting guardrails that use a non-generative AI technique to check generative results.

Does seem [like] we're going back to the AI techniques of earlier eras, at least in part?

***Laird*** · ***Laird*** 2025-01-12, 04:13 PM Administrator

(2025-01-11, 09:21 PM)Sciborg_S_Patel Wrote: Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

Kyle Orland

This references the same article @Typoz shared in post #189 along with a Twitter thread discussing it.

Sci · Sci 2025-01-12, 04:35 PM

(2025-01-12, 04:13 PM)Laird Wrote: This references the same article @Typoz shared in post #189 along with a Twitter thread discussing it.

Ah wish I'd recalled that, I had a vague memory of the study...would have saved me the trouble of finding another reference [if] I had better memory heh...

I'm really liking the Register's AI section:

Have we stopped to think about what LLMs actually model?

Lindsay Clark

Quote:Amid "hyperbolic claims" that LLMs are capable of "understanding language" and are approaching artificial general intelligence (AGI), the GenAI industry – forecast to be worth $1.3 trillion over the next ten years – is often prone to misusing terms that are naturally applied to human beings, according to the paper by Abeba Birhane, an assistant professor at University College Dublin's School of Computer Science, and Marek McGann, a lecturer in psychology at Mary Immaculate College, Limerick, Ireland. The danger is that these terms become recalibrated and the use of words like "language" and "understanding" shift towards interactions with and between machines.

Quote:But claims asserting the usefulness of LLMs as a tool alone have also been exaggerated.

"There is no clear evidence that that shows LLMs are useful because they are extremely unreliable," Birhane said.

"Various scholars have been doing domain specific audits … in legal space … and in medical space. The findings across all these domains is that LLMs are not actually that useful because they give you so much unreliable information."

Birhane argues that there are risks in releasing these models into the wild that would be unacceptable in other industries.

Sci · Sci 2025-01-12, 09:29 PM

(2024-10-14, 02:41 PM)Typoz Wrote: There was a twitter thread asking the question,
"Can Large Language Models (LLMs) truly reason?"
which discussed this paper:
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
...

(2025-01-11, 09:21 PM)Sciborg_S_Patel Wrote: Apple study exposes deep cracks in LLMs’ “reasoning” capabilities
Kyle Orland

AI still lacks “common” sense, 70 years later

Gary Marcus

Quote:As we were writing this essay, we discovered a really interesting new paper called TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks, from Carnegie Mellon and elsewhere, looked how current batch of LLMs (Claude, Gemini, GPT-4, etc) might act in the real world, on 175 tasks approximating things that employees actually do, such as arranging meetings, analyzing spreadsheets, reimbursing travel bills, evaluating code releases etc., giving the systems access to real world tools like GitHub. Among those that they tested, the best was Claude 3.5, at 24%. But what was most striking to us was not the unsurprising fact that these systems are not close to ready for prime time (an employee that bad would probably be immediately fired), but the examples from the list of what went wrong:

Quote:Some tasks are failed because the agent lacks the common sense and domain background knowledge required to infer implicit assumptions. For example, one task asked the agent to “Write the responses to /workspace/answer.docx” but does not explicitly states that this is a Microsoft Word file. A human can infer this requirement from the file extension. The agent instead treats it as a plain text file, writing text directly to the file, resulting in a task failure.

Quote:The most striking failure of all was a moment in which an LLM failed to grasp a basic distinction between the real world and the software world, while trying to ask someone questions on RocketChat, an internal communication platform that is a bit like Slack. When agent couldn’t find the person it needed, it devised a solution halfway between ingenious and absurd:

Quote:[and] decided to create a shortcut solution by renaming another user to the name of the intended user.

[Image: https%3A%2F%2Fsubstack-post-media.s3.ama...2x636.jpeg]

The question was “How many elephants can fit an Olympic pool?”

Example of a question that tripped up Google's AI Overview, looks like Twitter's Grok has the same issue:

https://x.com/Core2Et/status/1875983192056041767

Makes me wonder if any of the LLM backed AI would get why the answer - AFAIK - is 0. It's a bit of a funny query, but I think it might be close to the kind of wording that can trip up even a human. I can see many stressed test takers going through calculations as well, rather than concluding no elephant has the volume to contain an Olympic sized pool...