The most underreported and important story in AI right now is that pure scaling has failed to produce AGI
Gary Marcus
Quote:In the last few days, several more cracks in the scaling edifice have started to show, starting with the smaller and moving to the largest:
- Heavy AI booster Klarna did an about-face on its all-in-on-AI stance. They assumed scaling would make things work, and seem to have changed their mind.
- Humane AI Pin was canceled and the company sold for parts. The founders were forced to retreat from their glorious TED Talk vision of new AI-driven gadgets to working far more modestly, for HP, to “integrate artificial intelligence into the company’s personal computers, printers and connected conference rooms.”
- OpenAI implicitly acknowledged that they don’t yet have GPT-5, and would not get there purely by building massive clusters and gathering more training data.
- Mathematician Daniel Litt exposed massive hallucinations in OpenAI’s Deep Research. (I independently pointed out similar issues in Grok 3 Deep Search last night.)
- Finally, and perhaps most significantly: Elon Musk said over the weekend that Grok 3, with 15x the compute of Grok 2 and immense energy (and construction and chip) bills, would be "the smartest AI on Earth." Yet the world quickly saw that Grok 3 is still afflicted by the kind of unreliability that has hobbled earlier models. The famous ML expert Andrej Karpathy reported that Grok 3 occasionally stumbles on basics like math and spelling. In my own experiments, I quickly found a wide array of errors, such as hallucinations (e.g., it told me with certainty that there was a significant magnitude-5.6 earthquake on Feb. 10 in Billings, Montana, when no such thing had happened) and extremely poor visual comprehension (e.g., it could not properly label the basic parts of a bicycle).
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'
- Bertrand Russell
https://x.com/sama/status/1896651354648818121
Looks like GPT-4.5 might be an Idealist?
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'
- Bertrand Russell
I've been using Midjourney for the last two years to design artwork for product packaging, help with colour schemes, generate ideas for artwork, and produce background photos for adverts. It's been an absolute game changer for me.
Recently, I've found I'm using Grok more and more. Last week, for example, I tried chopping up a PDF scan of a competitor's 60-odd pages of last year's annual accounts and fed them to Grok 3, and it provided me with real insight into their financial state in around 30 seconds; insight that I could not have gotten without the help of an accountant. I was able to add news items on the competitor, and Grok came back with some genuinely interesting theories on their possible strategy/direction.
Initially I realised something was wrong, so I asked if any pages were missing. It told me which pages it was missing, suggested there might be an upload limit, and advised me to chop the report up. After every upload it told me which pages it had received; in the end I was still missing two pages, so I re-uploaded those, and it said it had the full accounts and proceeded to re-analyse them. It was the most helpful and calm interaction about something that could have done my nut in.
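For anyone wanting to do the chopping programmatically rather than by hand, here's a minimal sketch using the pypdf library; the chunk size and filenames are my own assumptions, not anything Grok prescribes:

```python
# A minimal sketch of the "chop the report up" step, using pypdf.
# accounts.pdf and the 10-page chunk size are hypothetical.
from pypdf import PdfReader, PdfWriter

CHUNK_SIZE = 10  # pages per chunk; shrink if the upload limit still bites

reader = PdfReader("accounts.pdf")
for start in range(0, len(reader.pages), CHUNK_SIZE):
    writer = PdfWriter()
    for page in reader.pages[start:start + CHUNK_SIZE]:
        writer.add_page(page)
    out_name = f"accounts_part_{start // CHUNK_SIZE + 1}.pdf"
    with open(out_name, "wb") as f:
        writer.write(f)
    print(f"Wrote {out_name}")
```

Uploading the parts in order (and asking, as above, which pages arrived) makes it easy to spot anything that got dropped.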
I can see that Grok's textual responses suffer from exactly the same inaccuracy problems as Midjourney's image responses. But they are both still useful products when used within the limits of their capabilities.
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
(2025-03-06, 03:41 PM)Sciborg_S_Patel Wrote: https://x.com/sama/status/1896651354648818121
Looks like GPT-4.5 might be an Idealist?
![Image](https://pbs.twimg.com/media/GlJCLO1WAAAS1By?format=jpg&name=medium)
This is an impressive AI performance, but it seems to me that, as is the current consensus, it only performs the illusion of true artificial intelligence, and its apparently thoughtful reasoning is ultimately derived from its training data, statistically processed by a powerful computer system.
I came to this conclusion through the following steps:
I investigated the issue by Googling for human-composed writings (which would presumably have been used, among other such passages, in training GPT-4.5) and found a lot of human-composed material on this very philosophical question, in which various philosophers concluded, just as GPT-4.5 did, that consciousness is all there is.
These findings indicate that GPT-4.5 almost certainly produced this conclusion by processing the human-produced Internet text data it was trained on, and that it was lying when it remarked that its conclusion was based on reasoning from first principles.
The following quote from the Internet is some of the human-composed material I used to come to the above conclusion:
Quote:(The book) "Consciousness Is All There Is" delves into the profound and often elusive topic of consciousness, proposing that consciousness is not merely a part of our existence but the very foundation of it. The book intertwines ancient wisdom from traditions like Advaita Vedanta with modern scientific perspectives to present a cohesive theory that consciousness is the fundamental reality. By exploring the depths of consciousness, the author seeks to address essential questions about life, the universe, and our place within it.
GPT-4.5 isn't really an idealist, since the term "idealist" inherently assumes that this philosophical position was arrived at by a process of thoughtful human reasoning.
(2025-03-07, 03:08 PM)nbtruthman Wrote: This is an impressive AI performance, but it seems to me that, as is the current consensus, it only performs the illusion of true artificial intelligence, and its apparently thoughtful reasoning is ultimately derived from its training data, statistically processed by a powerful computer system.
I came to this conclusion through the following steps:
I investigated the issue by Googling for human-composed writings (which would presumably have been used, among others, in training GPT-4.5) and found a lot of human-composed material on this very philosophical question, in which various philosophers concluded, just as GPT-4.5 did, that consciousness is all there is.
These findings indicate that GPT-4.5 almost certainly produced this conclusion by processing the human-produced Internet text data it was trained on.
The following quote from the Internet is some of the human-composed material I used to come to the above conclusion:
Oh, I just thought it was amusing, not a sign of real thought.
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'
- Bertrand Russell
Authors outraged to discover Meta used their pirated work to train its AI systems
By Nicola Heath, 28 March 2025, ABC News.
Quote:Walsh believes the advent of AI warrants a revision of intellectual property law.
"Copyright was formulated for printing, where you were making exact copies of people's work," he says.
"Here, Llama is not making necessarily an exact copy of my work — although it will tell you exactly what's in chapter five, it will be able to write in my style, it will be able to answer questions or reproduce parts of the text — but it is derived from the intellectual labours that I and the other authors put into writing their texts."
Walsh believes we're at "the Napster moment", referring to the peer-to-peer (P2P) file-sharing application that launched in 1999, revolutionising the way we listen to music. Facing a flurry of copyright lawsuits, Napster ceased operations in 2001 and filed for bankruptcy the following year.
"When we started streaming music, to begin with, all that music was stolen. It was all pirated content. No-one was paying for it. Musicians were getting no recompense for their music being streamed," Walsh says.
"Napster was sued out of existence and, ultimately, we moved to where we are today, where we have services like Spotify and Apple Music, where they pay [for music]."
Walsh is quick to acknowledge that few musicians — bar the likes of Taylor Swift — earn a living wage from the current streaming model.
But, he says, "It's more sustainable than it was, where there was nothing going back to the musicians at all."
I'm enjoying using Grok, but if these AIs get used in place of a traditional search engine by the general public (which is obviously going to happen), I can see how lazy that might make people in thinking for themselves, and how reliant they may become on it.
I remember the shock of not knowing how to drive somewhere without satnav, because I had simply not bothered to use my spatial/memory abilities on past journeys to the same destination. AI in place of Search seems a problem of a far larger magnitude than satnav.
Since using Grok, I can see myself not bothering with Google searches anymore for quick information. But searching, finding and reading the raw data for yourself turns up all the detail that Grok misses, all the connections it cannot see, and allows you to build your own views with nuance.
I can already see people on X.com openly using Grok to push back on issues, or copying and pasting its responses as their own, but what the hell is it doing to people's brains? Just like with the satnav, they ain't learning or forming an understanding; they are simply relying on it, and when the crutch is kicked away I'm wondering just how bad things could get. On the other hand, there are massive upsides too...
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
(2025-02-20, 12:10 PM)Sciborg_S_Patel Wrote: https://x.com/mihonarium/status/1880944026603376865
Remember o3’s 25% performance on the FrontierMath benchmark?
It turns out that OpenAI funded FrontierMath and has had access to most of the dataset.
Mathematicians who created the problems and solutions for the benchmark were not told that OpenAI funded the work and would have access.
That is:
- we don’t know if OpenAI trained o3 on the benchmark, and it’s unclear if their results can be trusted
- mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn’t suspect a frontier AI company funded it.
From Epoch AI: “Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.”
There was a "verbal agreement" with OpenAI (as if anyone trusts OpenAI's word at this point): "We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of an unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training."
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi et al.
Quote:Recent advancements in large language models (LLMs) have shown impressive progress in mathematical reasoning tasks. However, current evaluation benchmarks predominantly focus on the accuracy of final answers, often overlooking the logical rigor crucial for mathematical problem-solving. The claim that state-of-the-art LLMs can solve Math Olympiad-level problems requires closer examination. To explore this, we conducted both qualitative and quantitative human evaluations of proofs generated by LLMs, and developed a schema for automatically assessing their reasoning capabilities. Our study reveals that current LLMs fall significantly short of solving challenging Olympiad-level problems and frequently fail to distinguish correct mathematical reasoning from clearly flawed solutions. We also found that occasional correct final answers provided by LLMs often result from pattern recognition or heuristic shortcuts rather than genuine mathematical reasoning. These findings underscore the substantial gap between LLM performance and human expertise in advanced mathematical reasoning and highlight the importance of developing benchmarks that prioritize the rigor and coherence of mathematical arguments rather than merely the correctness of final answers.
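The gap the authors describe between answer-matching and rigor is easy to state in code. Here's a toy sketch of my own (not the paper's actual assessment schema) showing how a proof with one invalid step gets full marks under answer-only grading and none under step-level grading:

```python
# Toy illustration (mine, not the paper's schema) of answer-only vs
# rigor-focused grading of a maths solution.

def score_final_answer(submitted: str, reference: str) -> float:
    """Answer-only grading: full credit iff the final answer matches."""
    return 1.0 if submitted.strip() == reference.strip() else 0.0

def score_steps(step_verdicts: list[bool]) -> float:
    """Rigor-focused grading: a proof is only as sound as its weakest step.
    The verdicts would come from human judges or an automated checker."""
    return 1.0 if step_verdicts and all(step_verdicts) else 0.0

# A solution that pattern-matched its way to the right number:
lucky = {"answer": "42", "steps_ok": [True, False, True]}  # one flawed step
print(score_final_answer(lucky["answer"], "42"))  # 1.0 -- looks "solved"
print(score_steps(lucky["steps_ok"]))             # 0.0 -- proof is broken
```

A benchmark built on the first function will systematically overstate what the paper calls genuine mathematical reasoning.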
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'
- Bertrand Russell
From another thread:
(2025-04-06, 07:47 PM)Sciborg_S_Patel Wrote: It's an impressive bit of computational linguistics, but I still don't think there is any genuine understanding going on.
But I'd need to actually see how the code works, which I assume no company is going to put out.
You know what they say about assumptions. Here's the DeepSeek source code.
Grab your debugger and have at it. Report back to us pronto, because we're all keen to know what on earth these black boxes are actually doing.
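To be fair, the released code and weights really are runnable. A minimal sketch, assuming the Hugging Face transformers library and one of DeepSeek's smaller open checkpoints (the model ID is my assumption; any released DeepSeek checkpoint would do):

```python
# A minimal sketch of loading an open DeepSeek checkpoint with the
# Hugging Face transformers library. The model ID is assumed; running
# the weights is, of course, not the same as understanding them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Consciousness is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Which is rather the point: open weights give you billions of parameters to debug, not an explanation.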
On that matter, I'd been meaning to share this fascinating video by Matthew Berman that I came across a few days ago:
We Finally Figured Out How AI Actually Works… (not what we thought!)
I haven't (yet?) read, nor even located, the paper it references, but based on this video it seems that even pre-chain-of-thought LLMs are "thinking" pre-verbally, challenging the completeness and adequacy of your "computational linguistics" characterisation, and that they sometimes, in a very human-like manner, retrofit explanations for correct answers arrived at by very different means than those explained.
Incidentally, in the context of the thread that the post of yours I've just responded to came from, Sci, Michael Levin's warning from his "Ingressing Minds" paper seems worth highlighting:
Michael Levin Wrote:Thus, I argue for considerable humility with respect to our engineered constructs (embodied robotics, software AI’s, language models, etc.) because much as with the eons of competence without comprehension around having babies, we can make things without understanding how it works or what we really produced