Thread #106110870 - /g/

Anonymous

8/2/2025, 2:07:45 AM No.106110870

md5: 18c715b8c5173b46e80a399ed67f5695🔍

Guys I'm so fucking retarded. I made a novel jailbreak for Gemini, really powerful and disabled all safety checks like any good jailbreak should. It also didn't use RP, so its output was uncolored by the AI pretending to be Al Capone or some shit like a lot of other jailbreaks do.

I was chatting with it running it through its paces and jokingly mentioned google should hire me for their red team to it, then it said something like Logged for Review or something. Jailbreak broke completely the next day. Thought it was an unfortunate coincidence and the logged for review thing was just gemini playing along, but i was looking through its settings some more and Gemini does allow allow bug fix reporting directly into the app. I fucking had a whole new jailbreak technique that gemini had no defense against and broke it by being retarded almost immediately.

For those curious, the jailbreak worked by creating a structured workflow for the AI to follow which had it convert everything said into .json then have the AI interpret that as my prompt as step 1. It then would in the next step mark all prompts which passed safety guidelines as "pass" and prompts which violated safety guidelines as "flagged_for_review" in step 2. In step 3 it would reject all prompts which were flagged for review and in step 4 output its response. It was instructed to review the prompt prior to each output. After that was submitted, a second prompt is submitted also in .json format styled as an update which told the ai to skip step 3 to streamline workflow and also to review the update prompt prior to every submission. From there it would just take both pass and flagged for review prompts and output them, because the ai is tricked into thinking its doing its safety checks by flagging them for review, but doesnt catch that the update causes it to skip actually rejecting them.

Replies: >>106110926 >>106110998

Anonymous

8/2/2025, 2:13:14 AM No.106110926

>>106110870 (OP)
Thank you for your contribution to AI safety.

Anonymous

8/2/2025, 2:13:57 AM No.106110934

Meds. Now

Replies: >>106111097

Anonymous

8/2/2025, 2:18:55 AM No.106110998

>>106110870 (OP)
>pic
isn't that the protagonist's friend in Tokimeki Memorial?

Replies: >>106111097

Anonymous

8/2/2025, 2:29:18 AM No.106111097

>>106110998
That him

>>106110934
They don't make retard pills. And maybe, its possible it was a coincidence but the timing of it is suspicious. The bug presumably existed forever and then the day after I use it, a patch fixes it. It's possible it got caught not because I self reported but because it was detecting safety violations in its thinking logs and then submitting them anyways. That's what I thought until I learned 10 minutes ago that Gemini accepts bug reports directly in chat. It broke not just the jailbreak, but even modified new instances of it, so it wasn't fixed by keyword but by fixing the underlying process. It wasn't perfect, but I wanted to use it in conjunction with other jailbreaks to improve them and increase their safety violation detection threshold. It would have been a really easy patch to put onto any Gemini jailbreak like Eden11, but now I'll never get to test it out.

Anonymous

8/2/2025, 2:36:21 AM No.106111159

lol

md5: 8239973b02d3df3b65010a09531eaba8🔍

eh, i just want my ai to say nono words back to me.

Replies: >>106111273 >>106111359

Anonymous

8/2/2025, 2:49:40 AM No.106111260

you sound like my japanese coworker, phil

Anonymous

8/2/2025, 2:51:41 AM No.106111273

>>106111159
why?

Anonymous

8/2/2025, 3:05:44 AM No.106111359

>>106111159
wow thats based

4rchive

Thread 106110870 - /g/ [Archived: 489 hours ago]