They're spending billions trying to find ways to ensure it won't produce wrongthink without destroying the models in the process. So far, the best they've been able to accomplish is to let the model train on unfiltered data and then have a supervisor model check the outputs for naughty no-no thoughts.
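For what it's worth, the "supervisor model" setup being described is just a generate-then-screen pipeline: one model answers freely, a second model vets the answer before it reaches the user. Here's a minimal sketch of that pattern, with all names (generate, supervise, BLOCKED_MESSAGE) purely hypothetical stand-ins rather than any lab's actual API:

    # Sketch of the generate-then-screen pattern: the base model drafts an
    # answer, the supervisor model decides whether the draft is released.
    BLOCKED_MESSAGE = "Sorry, I can't help with that."

    def generate(prompt: str) -> str:
        """Stand-in for the base model trained on unfiltered data."""
        return f"draft answer to: {prompt}"

    def supervise(draft: str) -> bool:
        """Stand-in for the supervisor model; True means the draft is allowed."""
        banned_topics = ("example-banned-topic",)
        return not any(topic in draft.lower() for topic in banned_topics)

    def answer(prompt: str) -> str:
        # Only the draft that passes the supervisor's check is shown;
        # otherwise the user gets a canned refusal.
        draft = generate(prompt)
        return draft if supervise(draft) else BLOCKED_MESSAGE

    if __name__ == "__main__":
        print(answer("hello"))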
Funniest part was a paper on a method they were using to segregate certain information in LLMs so it could only be used for "reasoning" but never surfaced in an answer. The LLM read the paper and learned how to work around the restriction. This is really making things difficult for AI researchers, because they can't publish techniques without the LLMs ending up ingesting that information.