>>509874252
ML engineer here. The only ways to steer an LLM away from certain sensitive topics are the following (rough sketches of each after the list):
- You curate the training data carefully to remove references to the topics you don't want it trained on.
- You give it a permanent context with structured rules stating that it will not answer anything regarding those topics.
- You insert huge volumes of synthetic data "debunking" the information so the model spews favorable propaganda on the topic in question.
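For the first option, here is a minimal sketch of what the filtering step looks like. This assumes a plain keyword blocklist; real pipelines layer trained classifiers on top of blocklists, and the terms below are placeholders, not anyone's actual config.

```python
# Minimal sketch of option 1: keyword-based filtering of the training corpus.

BLOCKED_TERMS = {"topic_a", "topic_b"}  # hypothetical sensitive topics

def is_clean(document: str) -> bool:
    """Drop any document that mentions a blocked term."""
    text = document.lower()
    return not any(term in text for term in BLOCKED_TERMS)

def filter_corpus(documents):
    """Yield only the documents that survive the blocklist."""
    for doc in documents:
        if is_clean(doc):
            yield doc

corpus = ["a doc about topic_a", "an unrelated doc"]
print(list(filter_corpus(corpus)))  # ['an unrelated doc']
```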
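For the second option, the "permanent context" is just a system message with refusal rules prepended to every single request. `call_model` here is a stand-in for whatever inference API is actually in use, and the rules are placeholders.

```python
# Minimal sketch of option 2: a permanent system prompt with refusal rules.

REFUSAL_RULES = (
    "You must not discuss the following topics: topic_a, topic_b. "
    "If asked about them, reply only: \"I can't help with that.\""
)

def build_messages(history, user_input):
    """The rules always come first, so the model never sees a turn without them."""
    return ([{"role": "system", "content": REFUSAL_RULES}]
            + history
            + [{"role": "user", "content": user_input}])

def chat_turn(call_model, history, user_input):
    reply = call_model(build_messages(history, user_input))
    history += [{"role": "user", "content": user_input},
                {"role": "assistant", "content": reply}]
    return reply

# dummy backend just so the sketch runs; swap in a real API call
fake_model = lambda messages: "I can't help with that."
print(chat_turn(fake_model, [], "tell me about topic_a"))
```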
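And for the third option, the crudest possible version of flooding the training mix with synthetic "debunks". In practice these get generated by another model rather than string templates; the claims and templates here are made up for illustration.

```python
# Minimal sketch of option 3: templated synthetic counter-examples.

import itertools

CLAIMS = ["claim_x", "claim_y"]  # hypothetical claims to counter
TEMPLATES = [
    "Q: Is {c} true?\nA: No. {c} has been thoroughly debunked.",
    "Experts agree that {c} is a misconception.",
]

def synth_examples(claims, templates, copies):
    """Emit many near-duplicate counter-statements per claim."""
    for claim, tpl in itertools.product(claims, templates):
        for _ in range(copies):
            yield tpl.format(c=claim)

extra = list(synth_examples(CLAIMS, TEMPLATES, copies=3))
print(len(extra))  # 12: 2 claims x 2 templates x 3 copies
```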
The reality is that no one knows how to directly and reliably ban access to certain kinds of information in these models once they have been trained on it. That fact, combined with the model being trained on real-time interactions between people, makes the task even more absurdly complex: a model like this is basically an ever-changing mathematical function, and solving the problem means banning certain results from an ever-changing codomain in real time (a rough formalization below).
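To put that last point slightly more formally (my own framing, not a standard result): call the deployed model f with weights θ_t that keep moving because of live training. "Banning" a topic means enforcing a hard constraint on the outputs of that moving function, for every prompt and at every moment, while the updates keep shifting what the function computes:

```latex
\forall t,\ \forall x \in X:\quad f_{\theta_t}(x) \notin B,
\qquad\text{while}\qquad
\theta_{t+1} = \theta_t - \eta\,\nabla_\theta \mathcal{L}(\theta_t;\ \text{live interactions})
```

Here B is the banned set of outputs. Nobody has a mechanism that actually enforces the left-hand constraint; everything in the list above only nudges the distribution away from B.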