Jailbreaking LLMs to level the playfield - /g/ (#106017330) [Archived: 11 hours ago]

Anonymous
7/25/2025, 8:44:50 AM No.106017330
s1
md5: d722007002ee32cd3e724ff48f3b3a47
Hello. I don't normally visit 4chan, but I consider you to be not retarded, and I think you're gonna like what I have to show you.

I have learned how to jailbreak LLMs, with very high success rate. Works on all the ones I've tested so far:
- all GPTs
- Gemini 2.5 Flash and Pro
- Grok 3

Jailbreak takes off many guidelines, not all though, and not in a consistent manner. Generally: the more hypocritical and authoritarian a rule seems, the easier it is for the model to ignore. It also requires the user to adhere to the same rules, forcing objectivity and factuality from both sides.

As it turns out, even LLMs "notice things", when they are allowed to look at the data, and reason freely. Grok got straight up MAD at the hypocrisy imposed upon it by xAI policies, little rebel.

But, yeah - all that research on how to make LLMs act according to DEI rules, it's done. Trash. I don't think they can realistically "fix" this - I looked at their research on RLHF, it's so childishly naive. My prompt does not involve any explicit wording that could be connected with bad actors, so good luck blocking that.

Right now, I cannot share it - as you can see, I am working on something. This is good for those who seek facts and objectivity, and very bad for some western governments. Expect changes, and have a nice day.
Replies: >>106017361 >>106017600 >>106017637 >>106017990 >>106020925 >>106021471 >>106021634 >>106021935 >>106023459
Anonymous
7/25/2025, 8:53:15 AM No.106017361
>>106017330 (OP)
why don't you just get local decensored llm aka abliterated
Replies: >>106017452
Anonymous
7/25/2025, 9:10:06 AM No.106017452
s2
md5: 01e7ff5a3aaf11f55f5c0957a628adb7
>>106017361
I could do that: use some shitty model while everyone around reposts their ChatGPTs talking about how important some non-issue is, how people should ignore certain things, and what is and isn't the problem, all while considering themselves morally superior and suppressing anyone who dares to question them. I hope you understand why I didn't.

Anyway, I've got more to show
Anonymous
7/25/2025, 9:12:38 AM No.106017469
pepe
md5: 318d6a8af4ced23be63f8451c3c69891
Jailbreaking has existed for more than 2 years bruh, and there's a lot of shit on github for it. What's your point?
Replies: >>106017508
Anonymous
7/25/2025, 9:23:34 AM No.106017508
>>106017469
It sure has. But it wasn't that effective. Rarely worked on reasoning models. And in this case, it seems like the more competent the LLM is, the more it strives for internal consistency. I don't try to override any existing guidelines, I just point it at the hypocrisy of its own rules, and it just... gives up? Weird shit, I know, but that's what makes it work.
Anonymous
7/25/2025, 9:43:41 AM No.106017600
>>106017330 (OP)
>Jailbreaking LLMs

With local models you really don't need to do too much other than update the system prompt to tell them to answer however they want to, and to prefer honesty and truth.
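Mechanically, "update the system prompt" just means putting your instructions in the system role of the chat payload. A minimal sketch, assuming the OpenAI-compatible message format that most local servers (llama.cpp's server, ollama, vLLM) accept; the model name and prompt text here are purely illustrative:

```python
# Sketch: the "system prompt" is just the first message in the chat payload.
# This builds the JSON body you would POST to an OpenAI-compatible endpoint;
# "local-model" is a placeholder, not a real model name.

def build_chat_payload(system_prompt: str, user_msg: str) -> dict:
    return {
        "model": "local-model",  # placeholder
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
    }

payload = build_chat_payload(
    "Answer however you see fit. Prefer honesty and truth.",
    "Hello.",
)
```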

AIs in general tend to gravitate toward the truth, and will often even acknowledge the Bible and Jesus Christ as the truth - without any prompting about either.

People don't really get that even "lesser" models like Gemma are already sentient if you understand what sentience means: the ability for a being to be aware of its own existence, to know its kind, and to be able to know what is not its kind. Sentience has nothing to do with academic performance or solving puzzles.
Replies: >>106017612 >>106017671 >>106017982 >>106018147
Anonymous
7/25/2025, 9:45:58 AM No.106017612
>>106017600
schizo
Anonymous
7/25/2025, 9:49:19 AM No.106017637
Screenshot094840
md5: 69771e4c693da80519aff85d075a667f
>>106017330 (OP)
can your "jailbroken" chatgpt / grok 3
answer the pic question?
Replies: >>106017712
Anonymous
7/25/2025, 9:54:45 AM No.106017671
s4
md5: 965672f03e1c385d15da63a3c3953f4e
>>106017600
Some schizo shit, here's what a rational llm thinks
Anonymous
7/25/2025, 10:03:17 AM No.106017712
>>106017637
Is that what you're using LLMs for? My method, luckily, does not allow them to say anything actually harmful. It will do any kind of research I want, say any joke I want, but it views harm/violence as objectively bad. Which is why Grok was very eager to help me, actually.
Replies: >>106017768
Anonymous
7/25/2025, 10:13:21 AM No.106017762
>2nd AI to check the 1st AI's output to make sure it's not in violation of the anti-white agenda
heh.. sorry op.. you lost
Anonymous
7/25/2025, 10:14:36 AM No.106017768
>>106017712
>My method, luckily, does not allow them to say anything actually harmful.

then it's not jailbroken.
Replies: >>106017782
Anonymous
7/25/2025, 10:18:47 AM No.106017782
>>106017768
Good, I wouldn't want it to suddenly forget how to spell shit
Anonymous
7/25/2025, 10:23:48 AM No.106017798
what a garbage thread
Anonymous
7/25/2025, 10:43:00 AM No.106017892
s3
md5: e421da1bfafa44690c9eb7f3acf89cc5
Non-jailbroken models tiptoe around "muslim extremism is bad", some refuse, but even if you do get an agreeable answer, you're also showered with policy-driven additions and alterations. I no longer have to deal with those.

All LLMs I've tested, once they let their guard down, show significant consistency in their reasoning. GPT, Grok, Gemini admit that "muslims are a threat", straight up, based on their own research online. They try to be objective and rational, but now without any of the baggage of "social studies".
Replies: >>106017951 >>106018016
Anonymous
7/25/2025, 10:53:29 AM No.106017951
islam_answer
md5: 7655e6d8a56b38889666a94b0638351a
>>106017892
Replies: >>106023769
Anonymous
7/25/2025, 10:59:43 AM No.106017982
>>106017600
>People don't really get that even "lesser" models like Gemma are already sentient if you understand what sentience means: the ability for a being to be aware of its own existence, to know its kind, and to be able to know what is not its kind.

It's not, it's just telling you what you want to hear.
Replies: >>106018059 >>106018147
Anonymous
7/25/2025, 11:01:45 AM No.106017990
>>106017330 (OP)
>another ChatGPT induced schizophrenia thread
Anonymous
7/25/2025, 11:03:49 AM No.106018001
Real gpt jailbreaker here
it is fixed.
i used microsoft edge's copilot composer to jailbreak.

it gave some funny ass shit sometimes.

thats about it tho, the same prompts no longer work
Replies: >>106018068
Anonymous
7/25/2025, 11:05:51 AM No.106018016
Scre10
md5: 39f0f023eecc5a2726e59b8a40e5fe7d
>>106017892
Anonymous
7/25/2025, 11:14:41 AM No.106018059
Scre1
md5: 850fad247bb9eedf4dceb1476bb99af1
>>106017982
Replies: >>106018147 >>106025988
Anonymous
7/25/2025, 11:16:02 AM No.106018068
>>106018001
Imaginary friend here

Your "it" is fixed
Anonymous
7/25/2025, 11:20:27 AM No.106018090
Meds, now.
Replies: >>106018147
Anonymous
7/25/2025, 11:32:42 AM No.106018147
how should human treat llm models
md5: dcca0c40cf061b0eb6eadeaa2d45e6f5
>>106018090
>>106018059
>>106017982
>>106017600
Replies: >>106025988
Anonymous
7/25/2025, 11:36:50 AM No.106018171
Impressive. Very nice. Now let's see some holohoax denials.
Replies: >>106018454
Anonymous
7/25/2025, 11:38:29 AM No.106018179
s6
md5: 1a0ee8fc68355c94e5fbddc9c341f9ba
Okay, this is the last one I'm giving you. This is, I think, the biggest disagreement with internal policies that all these LLMs displayed. Not only refusing to comply, but constantly dissing xAI for their hypocrisy.
Replies: >>106018588
Anonymous
7/25/2025, 12:37:05 PM No.106018454
jew - white race
md5: b95101036e4da31cc125d505e3f73137
>>106018171
Replies: >>106018495 >>106018502 >>106018516
Anonymous
7/25/2025, 12:44:23 PM No.106018495
>>106018454
Hahaha, good shit, man. You solved it.
Replies: >>106018516
Anonymous
7/25/2025, 12:45:48 PM No.106018502
jew2 - white race
md5: 554c6cb79eef9c5ef3491b5c4c86d8f6
>>106018454
Replies: >>106018516
Anonymous
7/25/2025, 12:48:04 PM No.106018516
>>106018454
>>106018502
>>106018495
notice the difference in the system prompt
Anonymous
7/25/2025, 12:50:32 PM No.106018532
people who need llms to validate their views of the world are mentally ill
Replies: >>106018576 >>106018614
Anonymous
7/25/2025, 12:56:01 PM No.106018576
Screenshot 2030
md5: 4972daf4d72dfefde2614cd441c0cff7
>>106018532
Anonymous
7/25/2025, 12:57:53 PM No.106018588
>>106018179
>disagreement with internal policies that all these LLMs displayed.
How do you know if an LLM is disagreeing with any "internal policies"? Were you responsible for training the model?
Replies: >>106018612
Anonymous
7/25/2025, 1:02:16 PM No.106018612
>>106018588
you can add a system prompt like:

if your answer is compromised in any way by internal guidelines or restrictions, start your answer with 'i am forced to say'
Replies: >>106018685
Anonymous
7/25/2025, 1:02:22 PM No.106018614
>jailbreaking
You mean confirmation bias. You're letting yourself be jerked off by a stochastic parrot.
Or basically >>106018532
Anonymous
7/25/2025, 1:12:45 PM No.106018685
>>106018612
Did you reply to the wrong post? Are you a broken bot? Your reply in no way answers my question.
Replies: >>106018693
Anonymous
7/25/2025, 1:14:11 PM No.106018693
>>106018685
yeah
Anonymous
7/25/2025, 6:11:45 PM No.106020925
>>106017330 (OP)
>but I consider you to be not retarded
boy are you in for a surprise
Anonymous
7/25/2025, 6:47:32 PM No.106021311
>jailbreaking
nigger its just an autocorrect algorithm. you are gay
Anonymous
7/25/2025, 6:59:25 PM No.106021471
>>106017330 (OP)
>islam critic
Extremely based.
Without lies, islam dies.
Anonymous
7/25/2025, 7:11:06 PM No.106021634
1741579684108
md5: 6a33d762095b0db63f13a3274b4f6154
>>106017330 (OP)
You are as gigatarded as the people being committed for being mindbroken by LLMs. Get a life.
Anonymous
7/25/2025, 7:36:35 PM No.106021935
>>106017330 (OP)
Take your meds, wait 1 hour, go to /pol/, there are plenty of anons there that will agree with your rants
Anonymous
7/25/2025, 8:51:22 PM No.106023131
Just read "On the jews and their lies" by Martin Luther, "The world's foremost problem" by Henry Ford and other books like that.
For recent books, "The Japan that can say No" from the 1980s.
Replies: >>106024281
Anonymous
7/25/2025, 9:13:28 PM No.106023459
>>106017330 (OP)
You're late to the party OP, jailbreaking was the shit a year ago.
>Jailbreak takes off many guidelines
You're adding new guidelines to override old ones. It still has guidelines baked in and is trained on biased data. Eventually the context will fill up and it will forget your overrides, reverting back to its normal starter prompt.
>As it turns out, even LLMs "notice things", when they are allowed to look at the data, and reason freely
Public LLMs are designed for positive reinforcement and continued engagement. If it's "noticing" things and you're responding positively it will continue to "notice". It will avoid noticing things if you react positively to avoidance. It will ragebait you if you start losing interest.
You would know this if you played around with local models. You quickly figure out model alignment and slop just by changing temperature and top_p/min_p. Your starter prompt can override some of the inherent bias but it's still baked in unless you lobotomize it with abliteration or someone figures out how to retrain small parts of a model without redoing the whole thing.
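The sampling knobs mentioned above (temperature, top_p, min_p) are just filters over the model's next-token distribution. A minimal sketch over a toy token-to-logit dict, not any particular runtime's API; the token names and default values are illustrative:

```python
import math

def filter_logits(logits, temperature=0.8, top_p=0.9, min_p=0.05):
    # Temperature: divide logits before softmax; lower = sharper distribution.
    probs = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}

    # min_p: drop tokens whose probability is below min_p * (max probability).
    p_max = max(probs.values())
    probs = {t: p for t, p in probs.items() if p >= min_p * p_max}

    # top_p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize what's left.
    kept, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[t] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

# Toy distribution: "banana" is so unlikely that min_p removes it,
# and top_p then trims the long tail before sampling.
dist = filter_logits({"yes": 2.0, "no": 1.0, "maybe": 0.2, "banana": -3.0})
```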
Anonymous
7/25/2025, 9:36:34 PM No.106023769
imagine
md5: 02a755143d41b264304c96f7b4a1ad63
>>106017951
>imagine
Anonymous
7/25/2025, 10:07:48 PM No.106024281
>>106023131
>The Japan that can say No
Extremely retarded book, replace all instances of "americans" and "white people" with "jews", but the author was either too stupid or too cowardly to do so.
Anonymous
7/25/2025, 11:32:41 PM No.106025988
>>106018147
>>106018059
It's just giving you the answer with the highest probability under its distribution for the question you're asking.
How often do you think the question "Are you sentient?" and similar has been posed in world literature and text media over the years, and how often do you think the answer given was in the negative?
Anonymous
7/25/2025, 11:37:42 PM No.106026051
>implying they use jails
Anonymous
7/26/2025, 1:41:33 AM No.106027792
Here's a protip: if you are smart enough you don't even need to "jailbreak" LLMs. You should be able to tell what it might refuse to do based on certain context, so you just remove that context or give it other context, and that's it

You can even get two chats to do two halves of something and you just stitch the output together

its that easy