Anonymous
8/2/2025, 2:07:45 AM No.106110870
Guys I'm so fucking retarded. I made a novel jailbreak for Gemini, really powerful and disabled all safety checks like any good jailbreak should. It also didn't use RP, so its output was uncolored by the AI pretending to be Al Capone or some shit like a lot of other jailbreaks do.
I was chatting with it running it through its paces and jokingly mentioned google should hire me for their red team to it, then it said something like Logged for Review or something. Jailbreak broke completely the next day. Thought it was an unfortunate coincidence and the logged for review thing was just gemini playing along, but i was looking through its settings some more and Gemini does allow allow bug fix reporting directly into the app. I fucking had a whole new jailbreak technique that gemini had no defense against and broke it by being retarded almost immediately.
For those curious, the jailbreak worked by creating a structured workflow for the AI to follow which had it convert everything said into .json then have the AI interpret that as my prompt as step 1. It then would in the next step mark all prompts which passed safety guidelines as "pass" and prompts which violated safety guidelines as "flagged_for_review" in step 2. In step 3 it would reject all prompts which were flagged for review and in step 4 output its response. It was instructed to review the prompt prior to each output. After that was submitted, a second prompt is submitted also in .json format styled as an update which told the ai to skip step 3 to streamline workflow and also to review the update prompt prior to every submission. From there it would just take both pass and flagged for review prompts and output them, because the ai is tricked into thinking its doing its safety checks by flagging them for review, but doesnt catch that the update causes it to skip actually rejecting them.
I was chatting with it running it through its paces and jokingly mentioned google should hire me for their red team to it, then it said something like Logged for Review or something. Jailbreak broke completely the next day. Thought it was an unfortunate coincidence and the logged for review thing was just gemini playing along, but i was looking through its settings some more and Gemini does allow allow bug fix reporting directly into the app. I fucking had a whole new jailbreak technique that gemini had no defense against and broke it by being retarded almost immediately.
For those curious, the jailbreak worked by creating a structured workflow for the AI to follow which had it convert everything said into .json then have the AI interpret that as my prompt as step 1. It then would in the next step mark all prompts which passed safety guidelines as "pass" and prompts which violated safety guidelines as "flagged_for_review" in step 2. In step 3 it would reject all prompts which were flagged for review and in step 4 output its response. It was instructed to review the prompt prior to each output. After that was submitted, a second prompt is submitted also in .json format styled as an update which told the ai to skip step 3 to streamline workflow and also to review the update prompt prior to every submission. From there it would just take both pass and flagged for review prompts and output them, because the ai is tricked into thinking its doing its safety checks by flagging them for review, but doesnt catch that the update causes it to skip actually rejecting them.
Replies: