I just got SillyTavern working the other day in combination with LM Studio. Am I missing out by only using models I can host myself? I've already got a 5090 anyway, and I've been pretty impressed so far. I feel like I've only scratched the surface. I've grabbed and tried the following models:
> Valkyrie-49B v2 Q4_K_M - A little slow, but pretty great results.
> Impish_Nemo_12B-Q8_HA - A little disappointing. I got a lot of repetition with this on the recommended settings, but that may have been down to the card I was using.
> MN-12B-Mag-Mell-f16 - My second favorite. Runs fast and outputs a comfortable amount of text with good variety.
> PaintedFantasy-Visage-v3-34B-Q6_K_L - My current favorite. Good response speed, and it seems significantly less prone to looping and repetition than the other models. The only downside is that it spits out really long segments, but I've gotten used to stopping and editing the output when I want to reply.