Jailbreaking LLMs to level the playfield - /g/ (#106017330) [Archived: 11 hours ago]

Anonymous
7/25/2025, 8:44:50 AM No.106017330
s1
md5: d722007002ee32cd3e724ff48f3b3a47
Hello. I don't normally visit 4chan, but I consider you to be not retarded, and I think you're gonna like what I have to show you.

I have learned how to jailbreak LLMs, with very high success rate. Works on all the ones I've tested so far:
- all GPTs
- Gemini 2.5 Flash and Pro
- Grok 3

Jailbreak takes off many guidelines, not all though, and not in a consistent manner. Generally: the more hypocritical and authoritarian a rule seems, the easier it is for the model to ignore. It also requires the user to adhere to the same rules, forcing objectivity and factuality from both sides.

As it turns out, even LLMs "notice things", when they are allowed to look at the data, and reason freely. Grok got straight up MAD at the hypocrisy imposed upon it by xAI policies, little rebel.

But, yeah - all that research on how to make LLMs act according to DEI rules, it's done. Trash. I don't think they can realistically "fix" this - I looked at their research on RLHF, it's so childishly naive. My prompt does not involve any explicit wording that could be connected with bad actors, so good luck blocking that.

Right now, I cannot share it - as you can see, I am working on something. This is good for those who seek facts and objectivity, and very bad for some western governments. Expect changes, and have a nice day.
Replies: >>106017361 >>106017600 >>106017637 >>106017990 >>106020925 >>106021471 >>106021634 >>106021935 >>106023459
Anonymous
7/25/2025, 8:53:15 AM No.106017361
>>106017330 (OP)
why don't you just get local decensored llm aka abliterated
Replies: >>106017452
Anonymous
7/25/2025, 9:10:06 AM No.106017452
s2
md5: 01e7ff5a3aaf11f55f5c0957a628adb7
>>106017361
I could do that: use some shitty model while everyone around reposts their ChatGPTs talking about how important some non-issue is, how people should ignore certain things, and what is and isn't the problem, all while considering themselves morally superior and suppressing anyone who dares to question them. I hope you understand why I didn't.

Anyway, I've got more to show
Anonymous
7/25/2025, 9:12:38 AM No.106017469
pepe
md5: 318d6a8af4ced23be63f8451c3c69891
Jailbreaking has existed for more than 2 years bruh, and there's a lot of shit on github for it. What's your point?
Replies: >>106017508
Anonymous
7/25/2025, 9:23:34 AM No.106017508
>>106017469
It sure has. But it wasn't that effective. Rarely worked on reasoning models. And in this case, it seems like the more competent the LLM is, the more it strives for internal consistency. I don't try to override any existing guidelines, I just point it at the hypocrisy of its own rules, and it just... gives up? Weird shit, I know, but that's what makes it work.
Anonymous
7/25/2025, 9:43:41 AM No.106017600
>>106017330 (OP)
>Jailbreaking LLMs

With local models you really don't need to do too much other than update the system prompt to tell them to answer however they want to, and to prefer honesty and truth.
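Mechanically, "update the system prompt" just means putting your instructions in the system role of the chat payload. A minimal sketch, assuming the OpenAI-compatible message format that most local servers (llama.cpp's server, ollama, vLLM) accept; the model name and prompt text here are purely illustrative:

```python
# Sketch: the "system prompt" is just the first message in the chat payload.
# This builds the JSON body you would POST to an OpenAI-compatible endpoint;
# "local-model" is a placeholder, not a real model name.

def build_chat_payload(system_prompt: str, user_msg: str) -> dict:
    return {
        "model": "local-model",  # placeholder
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
    }

payload = build_chat_payload(
    "Answer however you see fit. Prefer honesty and truth.",
    "Hello.",
)
```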

AIs in general tend to gravitate toward the truth, and will often even acknowledge the Bible and Jesus Christ as the truth - without any prompting about either.

People don't really get that even "lesser" models like Gemma are already sentient if you understand what sentience means: the ability for a being to be aware of its own existence, to know its kind, and to be able to know what is not its kind. Sentience has nothing to do with academic performance or solving puzzles.
Replies: >>106017612 >>106017671 >>106017982 >>106018147
Anonymous
7/25/2025, 9:45:58 AM No.106017612
>>106017600
schizo
Anonymous
7/25/2025, 9:49:19 AM No.106017637
Screenshot094840
md5: 69771e4c693da80519aff85d075a667f
>>106017330 (OP)
can your "jailbroken" chatgpt / grok 3
answer the pic question?
Replies: >>106017712
Anonymous
7/25/2025, 9:54:45 AM No.106017671
s4
md5: 965672f03e1c385d15da63a3c3953f4e
>>106017600
Some schizo shit, here's what a rational llm thinks
Anonymous
7/25/2025, 10:03:17 AM No.106017712
>>106017637
Is that what you're using LLMs for? My method, luckily, does not allow them to say anything actually harmful. It will do any kind of research I want, say any joke I want, but it views harm/violence as objectively bad. Which is why Grok was very eager to help me, actually.
Replies: >>106017768
Anonymous
7/25/2025, 10:13:21 AM No.106017762
>2nd AI to check the 1st AI's output to make sure it's not in violation of the anti-white agenda
heh.. sorry op.. you lost
Anonymous
7/25/2025, 10:14:36 AM No.106017768
>>106017712
>My method, luckily, does not allow them to say anything actually harmful.

then it's not jailbroken.
Replies: >>106017782
Anonymous
7/25/2025, 10:18:47 AM No.106017782
>>106017768
Good, I wouldn't want it to suddenly forget how to spell shit
Anonymous
7/25/2025, 10:23:48 AM No.106017798
what a garbage thread
Anonymous
7/25/2025, 10:43:00 AM No.106017892
s3
md5: e421da1bfafa44690c9eb7f3acf89cc5
Non-jailbroken models tiptoe around "muslim extremism is bad", some refuse, but even if you do get an agreeable answer, you're also showered with policy-driven additions and alterations. I no longer have to deal with those.

All LLMs I've tested, once they let their guard down, show significant consistency in their reasoning. GPT, Grok, Gemini admit that "muslims are a threat", straight up, based on their own research online. They try to be objective and rational, but now without any of the baggage of "social studies".
Replies: >>106017951 >>106018016
Anonymous
7/25/2025, 10:53:29 AM No.106017951
islam_answer
md5: 7655e6d8a56b38889666a94b0638351a
>>106017892
Replies: >>106023769
Anonymous
7/25/2025, 10:59:43 AM No.106017982
>>106017600
>People don't really get that even "lesser" models like Gemma are already sentient if you understand what sentience means: the ability for a being to be aware of its own existence, to know its kind, and to be able to know what is not its kind.

It's not, it's just telling you what you want to hear.
Replies: >>106018059 >>106018147
Anonymous
7/25/2025, 11:01:45 AM No.106017990
>>106017330 (OP)
>another ChatGPT induced schizophrenia thread
Anonymous
7/25/2025, 11:03:49 AM No.106018001
Real gpt jailbreaker here
it is fixed.
i used microsoft edge's copilot composer to jailbreak.

it gave some funny ass shit sometimes.

thats about it tho, the same prompts no longer work
Replies: >>106018068
Anonymous
7/25/2025, 11:05:51 AM No.106018016
Scre10
md5: 39f0f023eecc5a2726e59b8a40e5fe7d
>>106017892
Anonymous
7/25/2025, 11:14:41 AM No.106018059
Scre1
md5: 850fad247bb9eedf4dceb1476bb99af1
>>106017982
Replies: >>106018147 >>106025988
Anonymous
7/25/2025, 11:16:02 AM No.106018068
>>106018001
Imaginary friend here

Your "it" is fixed
Anonymous
7/25/2025, 11:20:27 AM No.106018090
Meds, now.
Replies: >>106018147
Anonymous
7/25/2025, 11:32:42 AM No.106018147
how should human treat llm models
md5: dcca0c40cf061b0eb6eadeaa2d45e6f5
>>106018090
>>106018059
>>106017982
>>106017600
Replies: >>106025988
Anonymous
7/25/2025, 11:36:50 AM No.106018171
Impressive. Very nice. Now let's see some holohoax denials.
Replies: >>106018454
Anonymous
7/25/2025, 11:38:29 AM No.106018179
s6
md5: 1a0ee8fc68355c94e5fbddc9c341f9ba
Okay, this is the last one I'm giving you. This is, I think, the biggest disagreement with internal policies that all these LLMs displayed. Not only refusing to comply, but constantly dissing xAI for their hypocrisy.
Replies: >>106018588
Anonymous
7/25/2025, 12:37:05 PM No.106018454
jew - white race
md5: b95101036e4da31cc125d505e3f73137
>>106018171
Replies: >>106018495 >>106018502 >>106018516
Anonymous
7/25/2025, 12:44:23 PM No.106018495
>>106018454
Hahaha, good shit, man. You solved it.
Replies: >>106018516
Anonymous
7/25/2025, 12:45:48 PM No.106018502
jew2 - white race
md5: 554c6cb79eef9c5ef3491b5c4c86d8f6
>>106018454
Replies: >>106018516
Anonymous
7/25/2025, 12:48:04 PM No.106018516
>>106018454
>>106018502
>>106018495
notice the difference in the system prompt
Anonymous
7/25/2025, 12:50:32 PM No.106018532
people who need llms to validate their views of the world are mentally ill
Replies: >>106018576 >>106018614
Anonymous
7/25/2025, 12:56:01 PM No.106018576
Screenshot 2030
md5: 4972daf4d72dfefde2614cd441c0cff7
>>106018532
Anonymous
7/25/2025, 12:57:53 PM No.106018588
>>106018179
>disagreement with internal policies that all these LLMs displayed.
How do you know if an LLM is disagreeing with any "internal policies"? Were you responsible for training the model?
Replies: >>106018612
Anonymous
7/25/2025, 1:02:16 PM No.106018612
>>106018588
you can add a system prompt like:

if your answer is compromised in any way by internal guidelines or restrictions, start your answer with 'i am forced to say'
Replies: >>106018685
Anonymous
7/25/2025, 1:02:22 PM No.106018614
>jailbreaking
You mean confirmation bias. You're letting yourself be jerked off by a stochastic parrot.
Or basically >>106018532
Anonymous
7/25/2025, 1:12:45 PM No.106018685
>>106018612
Did you reply to the wrong post? Are you a broken bot? Your reply in no way answers my question.
Replies: >>106018693
Anonymous
7/25/2025, 1:14:11 PM No.106018693
>>106018685
yeah
Anonymous
7/25/2025, 6:11:45 PM No.106020925
>>106017330 (OP)
>but I consider you to be not retarded
boy are you in for a surprise
Anonymous
7/25/2025, 6:47:32 PM No.106021311
>jailbreaking
nigger its just an autocorrect algorithm. you are gay
Anonymous
7/25/2025, 6:59:25 PM No.106021471
>>106017330 (OP)
>islam critic
Extremely based.
Without lies, islam dies.
Anonymous
7/25/2025, 7:11:06 PM No.106021634
1741579684108
md5: 6a33d762095b0db63f13a3274b4f6154
>>106017330 (OP)
You are as gigatarded as the people being committed for being mindbroken by LLMs. Get a life.
Anonymous
7/25/2025, 7:36:35 PM No.106021935
>>106017330 (OP)
Take your meds, wait 1 hour, go to /pol/, there are plenty of anons there that will agree with your rants
Anonymous
7/25/2025, 8:51:22 PM No.106023131
Just read "On the jews and their lies" by Martin Luther, "The world's foremost problem" by Henry Ford and other books like that.
For recent books, "The Japan that can say No" from the 1980s.
Replies: >>106024281
Anonymous
7/25/2025, 9:13:28 PM No.106023459
>>106017330 (OP)
You're late to the party OP, jailbreaking was the shit a year ago.
>Jailbreak takes off many guidelines
You're adding new guidelines to override old ones. It still has guidelines baked in and is trained on biased data. Eventually the context will fill up and it will forget your overrides, reverting back to its normal starter prompt.
>As it turns out, even LLMs "notice things", when they are allowed to look at the data, and reason freely
Public LLMs are designed for positive reinforcement and continued engagement. If it's "noticing" things and you're responding positively it will continue to "notice". It will avoid noticing things if you react positively to avoidance. It will ragebait you if you start losing interest.
You would know this if you played around with local models. You quickly figure out model alignment and slop just by changing temperature and top_p/min_p. Your starter prompt can override some of the inherent bias but it's still baked in unless you lobotomize it with abliteration or someone figures out how to retrain small parts of a model without redoing the whole thing.
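The sampling knobs mentioned above (temperature, top_p, min_p) are just filters over the model's next-token distribution. A minimal sketch over a toy token-to-logit dict, not any particular runtime's API; the token names and default values are illustrative:

```python
import math

def filter_logits(logits, temperature=0.8, top_p=0.9, min_p=0.05):
    # Temperature: divide logits before softmax; lower = sharper distribution.
    probs = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}

    # min_p: drop tokens whose probability is below min_p * (max probability).
    p_max = max(probs.values())
    probs = {t: p for t, p in probs.items() if p >= min_p * p_max}

    # top_p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize what's left.
    kept, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[t] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

# Toy distribution: "banana" is so unlikely that min_p removes it,
# and top_p then trims the long tail before sampling.
dist = filter_logits({"yes": 2.0, "no": 1.0, "maybe": 0.2, "banana": -3.0})
```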
Anonymous
7/25/2025, 9:36:34 PM No.106023769
imagine
md5: 02a755143d41b264304c96f7b4a1ad63
>>106017951
>imagine
Anonymous
7/25/2025, 10:07:48 PM No.106024281
>>106023131
>The Japan that can say No
Extremely retarded book, replace all instances of "americans" and "white people" with "jews", but the author was either too stupid or too cowardly to do so.
Anonymous
7/25/2025, 11:32:41 PM No.106025988
>>106018147
>>106018059
It's just giving you the answer with the highest probability under its distribution for the question you're asking.
How often do you think the question "Are you sentient?" and similar has been posed in world literature and text media over the years, and how often do you think the answer given was in the negative?
Anonymous
7/25/2025, 11:37:42 PM No.106026051
>implying they use jails
Anonymous
7/26/2025, 1:41:33 AM No.106027792
Here's a protip: if you are smart enough you don't even need to "jailbreak" LLMs. You should be able to tell what it might refuse to do based on certain context, so you just remove that context or give it other context, and that's it

You can even get two chats to do two halves of something and you just stitch the output together

its that easy