>>106268760
Ok, I'll try that if it refuses again. I didn't have the system prompt/prefill/fictional clause before.
I'm using default llama.cpp settings with temp at 0.6 and context size at 35k.
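For reference, the invocation looks roughly like this (model path and GPU offload count are placeholders, everything else is stock llama.cpp):

```shell
# llama.cpp server with the settings mentioned above:
# temp 0.6, ~35k context, partial GPU offload to the 5090
./llama-server \
  -m ./model.gguf \
  --temp 0.6 \
  --ctx-size 35840 \
  --n-gpu-layers 20
```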
I'm also considering abliteration, but I couldn't find an abliterated version of the full DeepSeek model. I'm also worried about how it might affect refusals that are part of the Japanese text itself.
Is there any better model for an RTX 5090 + Threadripper (768GB DDR5)? The 1 t/s generation is kinda getting to me.