Another demonstration of ChatGPT 4.0 capabilities

122 Replies, 3709 Views

Why exams intended for humans might not be good benchmarks for LLMs like GPT-4

Ben Dickson

Quote: For example, one experiment showed that GPT-4 performed very well on Codeforces programming challenges created before 2021, when its training data was gathered. Its performance dropped dramatically on more recent problems. Narayanan found that in some cases, when GPT-4 was provided the title of a Codeforces problem, it could produce the link to the contest where it appeared.

In another experiment, computer scientist Melanie Mitchell tested ChatGPT’s performance on MBA tests, a feat that was widely covered in the media. Mitchell found that the model’s performance on the same problem could vary substantially when the prompt was phrased in slightly different ways. 

“LLMs have ingested far more text than is possible for a human; in some sense, they have ‘memorized’ (in a compressed format) huge swaths of the web, of Wikipedia, of book corpora, etc.,” Mitchell told VentureBeat. “When they are given a question from an exam, they can bring to bear all the text they have memorized in this form, and can find the most similar patterns of ‘reasoning’ that can then be adapted to solve the question. This works well in some cases but not in others. This is in part why some forms of LLM prompts work very well while others don’t.”

Quote: “This is not to say that enormous statistical models like LLMs could never reason like humans — I don’t know whether this is true or not, and answering it would require a lot of insight into how LLMs do what they do, and how scaling them up affects their internal mechanisms,” Mitchell said. “This is insight which we don’t have at present.”

What we do know is that such systems make hard-to-predict, non-humanlike errors, and “we have to be very careful when assuming that they can generalize in ways that humans can,” Mitchell said.

'Historically, we may regard materialism as a system of dogma set up to combat orthodox dogma...Accordingly we find that, as ancient orthodoxies disintegrate, materialism more and more gives way to scepticism.'

- Bertrand Russell


(2023-04-09, 11:03 PM) Sciborg_S_Patel Wrote: Why exams intended for humans might not be good benchmarks for LLMs like GPT-4

Ben Dickson

GPT-4 and professional benchmarks: the wrong answer to the wrong question

Arvind Narayanan & Sayash Kapoor

Quote: OpenAI didn’t release much information about GPT-4 — not even the size of the model — but heavily emphasized its performance on professional licensing exams and other standardized tests. For instance, GPT-4 reportedly scored in the 90th percentile on the bar exam. So there’s been much speculation about what this means for professionals such as lawyers.

We don’t know the answer, but we hope to inject some reality into the conversation. OpenAI may have violated the cardinal rule of machine learning: don’t test on your training data. Setting that aside, there’s a bigger problem. The manner in which language models solve problems is different from how people do it, so these results tell us very little about how a bot will do when confronted with the real-life problems that professionals face. It’s not like a lawyer’s job is to answer bar exam questions all day...

Award-winning photo turns out to be AI creation

Quote: Boris Eldagsen said he used the image to test the competition and to create an "open discussion".

Well, I'm not going to pretend that I would have known; there are some pretty strange and creative users of conventional photography, so something quirky or unusual doesn't necessarily arouse suspicion. But with the benefit of hindsight, there seem to be some defects of a similar kind to those mentioned and illustrated by @Brian earlier. There also seems to be something peculiar about the light and shade; I can't quite make it add up.
[Image: _129385308_1748e041-a3f0-486f-83ad-180cb...7.jpg.webp]
Here's a self-test quiz to see whether or not we consider an image or video to be real or fake.

https://www.bbc.co.uk/bitesize/articles/zqnwxg8
(2023-04-18, 12:30 AM) Typoz Wrote: Award-winning photo turns out to be AI creation

Well, I'm not going to pretend that I would have known; there are some pretty strange and creative users of conventional photography, so something quirky or unusual doesn't necessarily arouse suspicion. But with the benefit of hindsight, there seem to be some defects of a similar kind to those mentioned and illustrated by @Brian earlier. There also seems to be something peculiar about the light and shade; I can't quite make it add up.

The judges did seem to know that digital photo editing would be involved, at which point it is difficult to distinguish a photo that's been edited from a generated image.

I don't think it's a "creation", though, and it would be worth testing whether the image is actually a photo stolen by an AI art generation company.

AI Spits Out Exact Copies of Training Images, Real People, Logos, Researchers Find


(2023-04-20, 02:14 PM) Typoz Wrote: Here's a self-test quiz to see whether or not we consider an image or video to be real or fake.

https://www.bbc.co.uk/bitesize/articles/zqnwxg8

I got 6 out of 8, so AI needs a bit of work. :D
I asked it to "Write a function that takes a string and prints a list of all the English words in that string." It wrote a Python program that did what I asked. But the sample output it showed was incorrect because, of course, it did not actually run the program. I saw that the output was wrong but didn't notice that the code was correct, so I told it about its error based on the output. It apologized, updated the program with another loop that was an expensive no-op, and showed the same bogus output. I complained again, and it crashed.
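For anyone curious what such a function might look like, here is a minimal Python sketch. It is not the code ChatGPT produced. It assumes that an "English word" simply means a token found in a word list, and the tiny `ENGLISH_WORDS` set and the function names are hypothetical stand-ins (a fuller version might load a real dictionary file instead).

```python
import re

# Hypothetical stand-in for a real English dictionary (an assumption for
# illustration; a fuller version would load a proper word list).
ENGLISH_WORDS = {"the", "cat", "sat", "on", "mat", "a", "dog"}


def english_words(text: str, dictionary: set[str] = ENGLISH_WORDS) -> list[str]:
    """Return the tokens in `text` that appear in `dictionary`, in order."""
    # Treat runs of letters (and apostrophes) as tokens; everything else separates them.
    tokens = re.findall(r"[A-Za-z']+", text)
    # Compare case-insensitively, but keep the original spelling in the result.
    return [tok for tok in tokens if tok.lower() in dictionary]


def print_english_words(text: str) -> None:
    """Print the list of English words found in `text`."""
    print(english_words(text))


if __name__ == "__main__":
    print_english_words("The cat sat on the xyzzy mat!")
```

Running it prints `['The', 'cat', 'sat', 'on', 'the', 'mat']`. This sketch keeps duplicates and original order; collecting the words into a set would be an equally valid reading of the request. Actually executing the code, rather than trusting sample output written alongside it, is the quickest way to catch the kind of mismatch described above.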

~~ Paul
If the existence of a thing is indistinguishable from its nonexistence, we say that thing does not exist. ---Yahzi
(2023-04-24, 12:24 AM) Paul C. Anagnostopoulos Wrote: I asked it to "Write a function that takes a string and prints a list of all the English words in that string." It wrote a Python program that did what I asked. But the sample output it showed was incorrect because, of course, it did not actually run the program. I saw that the output was wrong but didn't notice that the code was correct, so I told it about its error based on the output. It apologized, updated the program with another loop that was an expensive no-op, and showed the same bogus output. I complained again, and it crashed.

~~ Paul

Sounds like the second time it might have gone into some sort of (recursive?) loop and there was no way to exit.

(2023-04-20, 05:36 PM) Ninshub Wrote: I got 6 out of 8, so AI needs a bit of work. :D

I'm not sure what I scored; I was doing some research in another tab on some of the 'well-known' people I knew nothing about. I think I marked one as fake, but it was real.
I got the Tom Cruise video and the Trump one wrong.
