Chat GPT appears to hallucinate or outright lie about everything

Buttflapper@lemmy.world · 16 days ago

Chat GPT appears to hallucinate or outright lie about everything

db0@lemmy.dbzer0.com · edit-2 16 days ago

Do not expect anything factual from LLMs. This is the wrong use case. You can role play with them if you guide them sufficiently and they can help with some tasks like programming if you already know what you want but want to save time writing it, but anything factual is out of their scope.

JustAnotherKay@lemmy.world · 16 days ago

If you already know what you want but want to save time writing it

IME, going to ChatGPT for code usually meant losing time, cause I’d go back and forth trying to get a usable snippet and it would just keep refactoring the same slop that didn’t work in its first attempt

db0@lemmy.dbzer0.com · 16 days ago

The free version is pretty braindead nowadays. Early on it was quite a bit better.

thebestaquaman@lemmy.world · 15 days ago

In general I agree: ChatGPT sucks at writing code. However, when I want to throw together some simple stuff in a language I rarely write, I find it can save me quite some time. Typical examples would be something like

“Write a bash script to rename all the files in the current directory according to <pattern>”, “Give me a regex pattern for <…>”, or “write a JavaScript function to do <stupid simple thing, but I never bothered to learn JS>”

Using it as a regex pattern generator is especially nice. It can also be nice when learning a new language and you just need to check the syntax for something; it’s often quicker than swimming through some Geeks4Geeks blog about why you should know how to do what you’re trying to do.

JustAnotherKay@lemmy.world · 14 days ago

Using an AI as a regex checker is so smart and I’m mad it never occurred to me that it was possible lol. I’ve just been poring over random forum posts for it

thebestaquaman@lemmy.world · 13 days ago

I’ve found that regex is maybe the programming-related thing GPT is best at, which makes sense given that it’s a language model, and regex is just a compact language with weird syntax for describing patterns. Translating between a description of a pattern in English and Regex shouldn’t be harder for that kind of model than any other translation so to speak.
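Since nothing guarantees the pattern it hands back is right, one cheap habit is to run the suggestion against strings you already know the answer for. Below is a minimal sketch of that check. The English description ("a date written as YYYY-MM-DD"), the candidate pattern, and the helper name `check_pattern` are all made up for illustration and are not from any real ChatGPT session.

```python
import re

# Hypothetical: pretend this pattern came back for the request
# "match a date written as YYYY-MM-DD".
CANDIDATE = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"

# Strings whose expected result we already know without asking the model.
SHOULD_MATCH = ["2024-01-31", "1999-12-01"]
SHOULD_NOT_MATCH = ["2024-13-01", "2024-1-31", "20240131", "2024-00-10"]


def check_pattern(pattern, should_match, should_not_match):
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        # fullmatch requires the whole string to fit the pattern
        if compiled.fullmatch(text) is None:
            failures.append((text, True, False))
    for text in should_not_match:
        if compiled.fullmatch(text) is not None:
            failures.append((text, False, True))
    return failures


failures = check_pattern(CANDIDATE, SHOULD_MATCH, SHOULD_NOT_MATCH)
print("pattern OK" if not failures else failures)
```

This only shows the pattern agrees with the cases you thought of (it still accepts 2024-02-31, for instance), so it is a sanity check, not proof.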

CoggyMcFee@lemmy.world · 16 days ago

When I have it integrated into my development environment a la Copilot, predicting the next block of code I’m going to write (which I can use if it is relevant and ignore if not), I find it to be a huge timesaver.

amelia@feddit.org · 15 days ago

I disagree, at least as someone who knows some Python but isn’t a pro programmer, ChatGPT saves me tons of time when writing little scripts. I used it to write a little tool with a GUI that I now use all the time in like 3 hours which would have taken me days without ChatGPT.

oakey66@lemmy.world · 16 days ago

Same experience. It can serve as a starting point but usually I have to sift through so many bad answers until something usable is made available.

subignition@piefed.social · edit-2 16 days ago

They’re pretty reasonable for consensus-based programming prompts as well like “Compare and contrast popular libraries for {use case} in {language}” or “I want to achieve {goal/feature} in {summary of project technologies}, what are some ways I could structure this?”

Of course you still shouldn’t treat any of the output as factual without verifying it. But at least in the former case, I’ve found it more useful than traditional search engines to generate leads to look into, even if I discard some or all of the specific information it asserts

Edit: Which is largely due to traditional search engines getting worse and worse in recent years, sadly

vxx@lemmy.world · 16 days ago

I think we shouldn’t expect anything other than language from a language model.

Kazumara@discuss.tchncs.de · 16 days ago

It did not simply analyze the best type of graphics card for the situation.

Yes it certainly didn’t: It’s a large language model, not some sort of knowledge engine. It can’t analyze anything, it only generates likely text strings. I think this is still fundamentally misunderstood widely.

Red_October@lemmy.world · 15 days ago

Yeah? That’s… how LLMs work. It doesn’t KNOW anything, it’s a glorified auto-fill. It knows what words look good after what’s already there, it doesn’t care whether anything it’s saying is correct, it doesn’t KNOW if it’s correct. It doesn’t know what correct even is. It isn’t made to lie or tell the truth, those concepts are completely unknown to its function.

LLMs like ChatGPT are explicitly and only good at composing replies that look good. They are Convincing. That’s it. It will confidently and convincingly make shit up.

linearchaos@lemmy.world · 16 days ago

I don’t want to sound like an AI fanboy but it was right. It gave you minimum requirements for most VR games.

No Man’s Sky’s minimum requirements are a 1060 and 8 gigs of system RAM.

If you tell it it’s wrong when it’s not, it will make s*** up to satisfy your statement. Earlier versions of the AI argued with people and it became a rather sketchy situation.

Now if you tell it it’s wrong when it’s wrong, it has a pretty good chance of coming back with information as to why it was wrong and the correct answer.

VinS@sh.itjust.works · 15 days ago

Well I asked some questions yesterday about the classes in the game DAoC to help me choose a starter class. It totally failed there, attributing skills to the wrong class. When I poked it about this error it said: you are right, class X doesn’t do Mezz, it’s the speciality of class Z.

But class Z doesn’t do Mezz either… I wanted to save some time. In the end I had to do the job myself because I could not trust anything it said.

linearchaos@lemmy.world · 15 days ago

God I loved DAoC, played the hell out of it back in its heyday.

I can’t help but think it would have low confidence on it though; there’s going to be an extremely limited amount of training data that’s still out there. I’d be interested in seeing how well it fares on World of Warcraft or one of the newer Final Fantasies.

The problem is there’s as much confirmation bias on the positive side as on the negative. We can probably sit here all day and I can tell you all the things that it picks up really well for me and you can tell me all the things that it picks up like crap for you and we can make guesses but there’s no way we’ll ever actually know.

VinS@sh.itjust.works · 15 days ago

I like it for brainstorming while debugging, finding funny names, creating stories “where you are the hero” for the kids, or things where it doesn’t matter if it’s hallucinating. I don’t trust it for much more unfortunately. I’d like to know your use cases where it works. It could open my mind to things I haven’t done yet.

DAoC is fun, playing on some freeshard (eden actually, started one week ago, good community)

linearchaos@lemmy.world · 15 days ago

No, you can’t trust AI or Google or anything else on the internet for the most part. It’s just a tool. AI is a little less trustworthy but still a useful tool if you wield it correctly.

some time passes

Heh, I think I found the source of this particular issue. All the original content is gone and the Camelot Herald wiki is incomplete. Even a Google search is turning up poor results.

We need to get something trained on archive.org :)

more time passes

hmm even digging around in archive.org that’s a hard one to find, classes.ofcamelot.com would have had it, but you have to dig through every class.

I think I had it on my old guild site, but it looks like even that is no longer archived.

so sad.

linearchaos@lemmy.world · 15 days ago

DAoC is fun, playing on some freeshard (eden actually, started one week ago, good community)

It always seems to attract the nicest and best people.

I had switched to WoW by the time burning crusades picked up, might be worth a revisit one day if for no other reason than to take a tour :)

filister@lemmy.world · edit-2 16 days ago

And you as an analytics engineer should know that already? I am using some LLMs on almost a daily basis, Gemini, OpenAI, Mistral, etc. and I know for sure that if you ask it a question about a niche topic, the chances for the LLM to hallucinate are much higher. But also to avoid hallucinating, you can use different prompt engineering techniques and ask a better question.

Another very good question to ask an LLM is which is heavier: one kilogram of iron or one kilogram of feathers. A lot of LLMs really struggle with this question and start hallucinating, inventing their own weird logical process and generating completely credible-sounding but factually wrong answers.

I still think that LLMs aren’t the silver bullet for everything, but they really excel in certain tasks. And we are still in the honeymoon period of AIs, similar to self-driving cars, I think at some point most of the people will realise that even this new technology has its limitations and hopefully will learn how to use it more responsibly.

bane_killgrind@slrpnk.net · 16 days ago

They seem to give the average answer, not the correct answer. If you can bound your prompt to the range of the correct answer, great.

If you can’t bound the prompt, it’s worse than useless, it’s misleading.

gravitas_deficiency@sh.itjust.works · 16 days ago

The “i” in LLM stands for intelligence

SuperSleuth@lemm.ee · 15 days ago

There’s no way they used Gemini and decided it’s better than GPT.

I asked Gemini: “Why can great apes eat raw meat but it’s not advised for humans?”. It said because they have a “stronger stomach acid”. I then asked “what stomach acid is stronger than HCL and which ones do apes use?”. And was met with the response: “Apes do not produce or utilize acids in the way humans do for chemical processes.”.

So I did some research and apes actually have almost neutral stomach acid and mainly rely on enzymes. Absolutely not trustworthy.

Daemon Silverstein@thelemmy.club · 15 days ago

use

I guess Gemini took the word “use” literally. Maybe if the word “have” were used, it’d change the output (or, even better, “and which ones do apes’ stomachs have?” as “have” could imply ownership when “apes” is the subject of the verb).

Wren@lemmy.dbzer0.com · 15 days ago

Ok? I feel like people don’t understand how these things work. It’s an LLM, not a superintelligent AI. It’s not programmed to produce the truth or think about the answer. It’s programmed to paste a word, figure out what the most likely next word is, paste that word, and repeat. It’s also programmed to follow human orders as long as those orders abide by its rules. If you tell it the sky is pink, then the sky is pink.

SPRUNT@lemmy.world · 15 days ago

Current AI is a glorified predictive text keyboard.

Wren@lemmy.dbzer0.com · 15 days ago

Exactly, it’s not something designed to output facts, it’s designed to output the most likely set of words.
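The "pick the most likely next word, paste it, repeat" loop can be sketched with a toy word-pair counter. To be clear about the assumptions: this is not how ChatGPT is built (real models use neural networks over tokens and far more context than one previous word), and the tiny `CORPUS` string and function names below are made up for illustration. It only shows the shape of the loop.

```python
from collections import Counter, defaultdict

# Tiny made-up "training text"; a real model sees vastly more data.
CORPUS = "the sky is blue . the sky is pink . the sky is blue . the grass is green ."


def build_bigram_counts(text):
    """Count, for each word, which words followed it and how often."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, max_words=4):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start]
    for _ in range(max_words - 1):
        followers = counts.get(output[-1])
        if not followers:
            break  # nothing ever followed this word in the training text
        next_word, _count = followers.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)


counts = build_bigram_counts(CORPUS)
print(generate(counts, "the"))  # prints "the sky is blue"
```

Nothing in there checks what colour the sky is. "blue" wins only because it followed "is" more often than "pink" did, and if the text it was fed said pink more often, it would say pink just as confidently.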

mozz@mbin.grits.dev · 16 days ago

May I offer you a fairly convincing explanation

subignition@piefed.social · 16 days ago

This is the best article I’ve seen yet on the topic. It does mention the “how” in brief, but this analogy really explains the “why”. Gonna bookmark this in case I ever need to try to save another friend or family member from drinking the Flavor-Aid

jeeva@lemmy.world · 16 days ago

I enjoyed reading this, thank you.

finitebanjo@lemmy.world · 16 days ago

For me it is stupid to expect these machines to work any other way. They’re literally designed such that they’re just guessing words that make sense in a context, the whole statement then assembled from these valid tokens sometimes checked again by… another machine…

It’s always going to be and always has been a bullshit generator.

QuentinQuiver@slrpnk.net · 16 days ago

You can use the RAG tactic to make it more useful. That involves starting with reputable sources as input, which creates an AI character that’s essentially supposed to be an expert in a certain topic.

The normal AI system is a scammer who tries to convince others to act like them… just like me and other internet trolls or crazy people. It needs some snark to act like a real person does, but pure snark is quite useless.

Essentially: nonsense in, nonsense out. Or science books and journals in, sci fi speculation out.
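For anyone who hasn’t seen RAG, the "starting with reputable sources as input" part roughly means: look up the passages that seem relevant to the question and paste them into the prompt ahead of it. Here is a minimal sketch of only that lookup step, assuming plain word overlap as the relevance score (real setups usually use embeddings or a proper search index). The `SOURCES` text, the names, and the prompt wording are placeholders invented for illustration, and no model is actually called.

```python
import re

# Placeholder "reputable sources"; in practice these would be real documents.
SOURCES = {
    "headset-notes": "The headset can be used standalone without a video card.",
    "streaming-notes": "Streaming games from a PC to the headset needs a dedicated video card.",
    "regex-notes": "A regex is a compact language for describing text patterns.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, sources, top_k=2):
    """Return the names of the top_k sources sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        sources.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return [name for name, _text in ranked[:top_k]]


def build_prompt(question, sources, top_k=2):
    """Put the retrieved passages in front of the question."""
    names = retrieve(question, sources, top_k)
    context = "\n".join(f"[{name}] {sources[name]}" for name in names)
    return f"Answer using only the sources below.\n{context}\nQuestion: {question}"


print(build_prompt("What video card do I need for streaming from a PC?", SOURCES, top_k=1))
```

This only controls what text goes in. Whatever model reads the prompt can still summarize the passages wrongly, so the output needs the same checking as before.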

finitebanjo@lemmy.world · 16 days ago

No, again, because each word is a token which together makes a phrase and each phrase is a token that makes a statement. Since these tokens are generated individually, it will never have any real underlying logic. It’s just sentence probability. Even if your sample data is free of nonsense, the LLM will still generate nonsense.

zbyte64@awful.systems · 15 days ago

RAG is a search engine that sometimes summarizes incorrectly and uses 10x the energy. Such a dumb product.

ITGuyLevi@programming.dev · 15 days ago

You’re taking the piss right? Those seem like perfectly reasonable responses.

What video card is required to use it? None, it can be used standalone.

What video card to use it streaming from your PC? At least a 580 sounds okay for some games. You seem to be expecting it to lie, and then interpreting truthful information as a lie because the information you held back (which game you want) is the reason for the heavier video card requirement.

boatswain@infosec.pub · 16 days ago

This is why my most frequent use of it is brainstorming scenarios for my D&D game: it’s really good at making up random bullshit.

Blackdoomax@sh.itjust.works · 15 days ago

It struggles to make more than 3 different bedtime stories in a row for my son, and they are always badly written, especially the conclusion that is almost always the same. But at least their silliness (especially Gemini) is funny.

boatswain@infosec.pub · 15 days ago

I absolutely agree that it can’t create finished content of any particular value. For my D&D use case, its value is instead as a brainstorming tool; it can churn out enough ideas quickly enough that it’s easy for me to find a couple of gems that I can polish up into something usable.

Christer Enfors@lemm.ee · 16 days ago

Yes. I’ve experimented with this too. This is the perfect use case for LLMs - there are no wrong answers, the LLM should just make something up, which is what it does.

ngwoo@lemmy.world · 15 days ago

OP those minimum requirements are taken directly from the Meta Quest 3 support page.

cheddar@programming.dev · 16 days ago

It’s incorrect to ask ChatGPT such questions in the first place. I thought we’d figured that out 18 or so months ago.

ABCDE@lemmy.world · 16 days ago

Why? It actually answered the question properly, just not to the OP’s satisfaction.

ramirezmike@programming.dev · 16 days ago

Because it could have just as easily confidently said something incorrect. You only know it’s correct by going through the process of verifying it yourself, which is why it doesn’t make sense to ask it anything like this in the first place.

ABCDE@lemmy.world · 16 days ago

I mean… I guess? But the question was answered correctly, I was playing Beat Saber on my 1060 with my Vive and Quest 2.

ramirezmike@programming.dev · 15 days ago

It doesn’t matter that it was correct. There isn’t anything that verifies what it’s saying, which is why it’s not recommended to ask it questions like that. You’re taking a risk if you’re counting on the information it gives you.