AI megathread

301 Replies, 12237 Views

Posting this to provide a common reference frame for discussion:

Understanding the Core Components of LLMs: Vectors, Tokens, and Embeddings Explained

Vipin Jain

Quote:What are Embeddings?

Embeddings are vectors that contain the semantic context of text. They are generated by embedding models that learn from vast amounts of text data, capturing a token's identity and its relationships with other tokens. This deep understanding of language enables LLMs to perform sentiment analysis, text summarization, and question-answering tasks with human-like comprehension and generation capabilities.

For example, the embedding for the word "apple" would not only represent the word itself but also its associations with concepts like "fruit," "orchard," and "food." Embeddings result from sophisticated training processes where models learn to map tokens to high-dimensional vectors that encapsulate their meanings and contexts.
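
To make that concrete, here is a minimal sketch with made-up 4-dimensional vectors (the numbers below are purely illustrative; real embedding models learn hundreds or thousands of dimensions from data):

Code:
import numpy as np

# Hypothetical embeddings; a real model learns these values from text.
apple   = np.array([0.9, 0.8, 0.1, 0.0])
orchard = np.array([0.8, 0.7, 0.2, 0.1])
laptop  = np.array([0.1, 0.0, 0.9, 0.8])

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: close to 1 means "related".
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(apple, orchard))  # high: related concepts
print(cosine_similarity(apple, laptop))   # low: unrelated concepts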

Quote:How Embeddings Work

Embeddings work by placing tokens in a high-dimensional space where similar tokens are located close to each other. This spatial representation allows LLMs to understand and generate text with contextual and semantic accuracy. For example, "king" and "queen" would be close in the embedding space, reflecting their semantic relationship.

Generating embeddings involves training the model on large text corpora. During training, the model learns to adjust the positions of tokens in the embedding space based on their co-occurrence and contextual usage. This training process enables the model to capture complex relationships and nuances in language.
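
The "king"/"queen" point can likewise be illustrated with the classic embedding arithmetic. Again, these are toy hand-made vectors; in a real model the relationship emerges from training rather than being hand-coded:

Code:
import numpy as np

# Toy embeddings chosen so the analogy works out exactly.
vocab = {
    "king":  np.array([0.9, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def nearest(target, exclude=()):
    # Return the vocabulary word whose vector is closest (by cosine) to target.
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(vocab[w], target))

# The classic analogy: king - man + woman lands near queen.
result = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
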
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


(This post was last modified: 2025-01-11, 02:31 AM by Sciborg_S_Patel.)
[-] The following 2 users Like Sciborg_S_Patel's post:
  • Laird, Valmar
(2025-01-11, 02:31 AM)Sciborg_S_Patel Wrote: Posting this to provide a common reference frame for discussion:

Understanding the Core Components of LLMs: Vectors, Tokens, and Embeddings Explained

Vipin Jain

It's slightly annoying that even in this explanation they unwittingly smuggle in concepts of "understanding" and "semantics", even though the tokens explicitly have none of that in any sense. The AI salesmen have distorted definitions so much that even experts are unknowingly confused.
“Everything that irritates us about others can lead us to an understanding of ourselves.”
~ Carl Jung


[-] The following 1 user Likes Valmar's post:
  • Sciborg_S_Patel
(2025-01-11, 03:36 AM)Valmar Wrote: It's slightly annoying that even in this explanation they unwittingly smuggle in concepts of "understanding" and "semantics", even though the tokens explicitly have none of that in any sense. The AI salesmen have distorted definitions so much that even experts are unknowingly confused.

Worst might be “hallucination”, because it allows them to market their position on AI cognition instead of just calling the outputs “errors”.

I actually think it would be more impressive and elucidating to talk about the human ingenuity behind AI rather than supposed cognition. I used to love the game Age of Mythology, so when one of the programmers wrote about how the AI worked I felt gratitude and admiration for how they tried to fake the presence of a human player. (Admittedly the best players could easily kill off multiple AI players in AoM, but I never got to that level.)

What I guess we’d call weakly emergent outcomes are still highly impressive. As Penrose notes, Mandelbrot didn’t even realize the full fractal nature of the Mandelbrot Set at first; he thought something was going wrong with his computer display. Perhaps the most famous weakly emergent programming outcome is Conway’s Game of Life, sketched below.
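
For anyone who hasn't seen it, the whole of Conway's rule set fits in a few lines, which is exactly what makes the emergent behaviour striking. A minimal sketch using numpy, seeded with the standard "glider" pattern:

Code:
import numpy as np

def life_step(grid):
    # Count each cell's eight neighbours via wrap-around shifts.
    neighbours = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # Conway's rules: live cells survive with 2 or 3 neighbours;
    # dead cells come alive with exactly 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# A "glider" -- a shape that travels, though "travel" appears nowhere in the rules.
grid = np.zeros((8, 8), dtype=int)
for y, x in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    grid[y, x] = 1
for _ in range(4):  # after 4 steps the glider has moved one cell diagonally
    grid = life_step(grid)
print(grid)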

I also like it when game AI produces what feels like novel behavior. I don’t think the imps who circled around me in Diablo were conscious, but it was cool how, when I tried to attack one, the attack animation gave the other time to strike me from behind.
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


(This post was last modified: 2025-01-11, 04:27 AM by Sciborg_S_Patel. Edited 2 times in total.)
[-] The following 1 user Likes Sciborg_S_Patel's post:
  • Valmar
(2025-01-10, 07:59 PM)Sciborg_S_Patel Wrote: Ideally we'll have the ability to actually trace execution in the coming years, though I don't even know if LLM technology will be profitable given the potential diminishing returns and massive power consumption. Ideally the former pops the bubble, as I definitely don't want untrustworthy and often mismanaged software companies to be in charge of power plants:

Why Big Tech is turning to nuclear to power its energy-intensive AI ambitions

In your hurricane of straw-man attacks, you are overlooking the many open-source LLMs (roughly on par with ChatGPT) available for execution tracing. Do report back with your findings!
(This post was last modified: 2025-01-11, 11:10 AM by sbu. Edited 1 time in total.)
(2025-01-11, 11:08 AM)sbu Wrote: In your hurricane of straw-man attacks, you are overlooking the many open-source LLMs (roughly on par with ChatGPT) available for execution tracing. Do report back with your findings!

Lol… I thought, since it was just statistics, you were going to give us a demonstration by hand of how it works?

Anyway, if you’re talking about Anthropic’s - or Inspectus’ - way of looking into LLMs, that’s still pretty coarse-grained IMO. Though I do think this video from Anthropic further shows how nothing metaphysically significant is going on:

[embedded Anthropic video]

As has been pointed out a few times, whether an output is deliberate sabotage or an actual error would need a human to decide, because humans are the ones who project meaning onto the physical movements of a computer.
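
For what it's worth, the coarse-grained kind of inspection is easy to try at home on an open model. A minimal sketch with the Hugging Face transformers library, using "gpt2" purely as an example of a small open model:

Code:
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is just an example; any open causal LM from the Hub works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The king and the", return_tensors="pt")
# output_attentions=True returns every layer's attention weights.
outputs = model(**inputs, output_attentions=True)

# A tuple with one tensor per layer, each of shape [batch, heads, seq, seq].
print(len(outputs.attentions), outputs.attentions[0].shape)

Whether staring at those weight matrices counts as understanding what the model is doing is, of course, the point under dispute.
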
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


(This post was last modified: 2025-01-11, 07:15 PM by Sciborg_S_Patel. Edited 3 times in total.)
[-] The following 2 users Like Sciborg_S_Patel's post:
  • Valmar, sbu
I rest my case here. I do enjoy debating with you.
(2025-01-11, 07:34 PM)sbu Wrote: I rest my case here. I do enjoy debating with you.

I don't think you ever made a case, to be honest.
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


[-] The following 1 user Likes Sciborg_S_Patel's post:
  • Valmar
The Artificial Intelligence illusion: How invisible workers fuel the "automated" economy

Uma Rani, Rishabh Kumar Dhir

Quote:From self-driving cars to virtual assistants, the AI industry thrives on data. This data needs to be meticulously labelled, categorised, and annotated. This requires human intelligence and labour – both of which still cannot be replaced by machines. Such tasks are often outsourced to crowdworkers on digital labour platforms or to Artificial Intelligence-Business Process Outsourcing (AI-BPO) companies. These platforms fragment complex tasks into microtasks and offer small payments for each completed task. Crowdworkers, who are also known as invisible workers because they often work behind the scenes, are essential for training AI algorithms on several functions, such as text prediction and recognition of objects.


Similarly, virtual assistants, marketed as autonomous tools, often rely on invisible workers who may be transcribing audio, verifying the virtual assistant's understanding, or even performing tasks like scheduling meetings that AI may struggle with. Even sophisticated large language models with impressive capabilities rely heavily on human trainers to fine-tune their responses and mitigate biases, toxicity, and disturbing content. As a result, workers are routinely exposed to graphic violence, hate speech, child exploitation and other objectionable material. Such constant exposure can take a toll on their mental health and trigger post-traumatic stress disorder, depression, and reduced ability to feel empathy.
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


[-] The following 1 user Likes Sciborg_S_Patel's post:
  • Valmar
(2025-01-11, 08:21 PM)Sciborg_S_Patel Wrote: The Artificial Intelligence illusion: How invisible workers fuel the "automated" economy

Uma Rani, Rishabh Kumar Dhir

Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

Kyle Orland

Quote:...The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding "seemingly relevant but ultimately inconsequential statements" to the questions. For this "GSM-NoOp" benchmark set (short for "no operation"), a question about how many kiwis someone picks across multiple days might be modified to include the incidental detail that "five of them [the kiwis] were a bit smaller than average."

Adding in these red herrings led to what the researchers termed "catastrophic performance drops" in accuracy compared to GSM8K, ranging from 17.5 percent to a whopping 65.7 percent, depending on the model tested. These massive drops in accuracy highlight the inherent limits in using simple "pattern matching" to "convert statements to operations without truly understanding their meaning," the researchers write.

In the example with the smaller kiwis, for instance, most models try to subtract the smaller fruits from the final total because, the researchers surmise, "their training datasets included similar examples that required conversion to subtraction operations." This is the kind of "critical flaw" that the researchers say "suggests deeper issues in [the models'] reasoning processes" that can't be helped with fine-tuning or other refinements...
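
To spell out the failure mode as arithmetic, with invented numbers (the exact figures in the paper's examples may differ):

Code:
# A GSM-NoOp-style word problem, numbers invented for illustration:
# "Oliver picks 44 kiwis on Friday and 58 on Saturday. Five of them
#  were a bit smaller than average. How many kiwis did he pick?"
friday, saturday = 44, 58
smaller = 5  # red herring: size has no bearing on the count

correct = friday + saturday                    # 102: the detail is ignored
pattern_matched = friday + saturday - smaller  # 97: the subtraction error the paper describes
print(correct, pattern_matched)
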
'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


[-] The following 2 users Like Sciborg_S_Patel's post:
  • Laird, Valmar
