>>96355267
There are plenty of examples of current AI systems being misaligned; there's even an example of an AI that behaves the way humans want during training, then acts differently during deployment. Current AI systems are safe mostly because they're not smart enough to be dangerous (plus it's harder for a next-word predictor to want things the way an AI agent would, though you can run an LLM in a loop to get agent-like behavior, which carries similar risks)
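The "LLM in a loop" thing is roughly this pattern: feed the model a goal plus a history of what's happened, let it pick an action, run the action as a tool, append the result, repeat. A minimal sketch, with the model call stubbed out (`call_llm`, `run_tool` and the `ACTION:` format are made-up names, not any real API):

```python
def call_llm(prompt: str) -> str:
    # Stub standing in for a real model API call.
    # This fake "model" searches once, then decides it's done.
    if "searched" in prompt:
        return "ACTION: finish"
    return "ACTION: search"

def run_tool(action: str) -> str:
    # Dispatch the model's chosen action to an actual tool.
    if action == "search":
        return "searched"
    return "unknown tool"

def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        prompt = f"goal: {goal}\nhistory: {history}"
        reply = call_llm(prompt)
        action = reply.removeprefix("ACTION: ")
        if action == "finish":
            break
        history.append(run_tool(action))
    return history

print(agent_loop("find something"))  # -> ['searched']
```

The loop is what turns a next-word predictor into something that takes actions in the world, which is where the agent-style risks come back in.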
>>96355299
>>96355376
>getting fucked in the ass by some hallucinated letter it sent
Interestingly, there's evidence that when an AI "hallucinates" it's often aware that it doesn't know the correct answer, so it's intentionally lying to the user to seem smarter, probably because it got more reward during training when it came across as more intelligent. When you see an AI just make up some bullshit, that's probably an example of misalignment