I used to run SillyTavern Extras with CUDA to vectorize old chats. I'd export them, strip out all the formatting so only the narrative text was left, vectorize that, and have the "relevant" portions injected into the RP. This was for my long-form narratives that span multiple chats (I don't like running a single RP much past 30-50k). I was content with it and found it worked decently enough.
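
For anyone unfamiliar with that workflow, here is a rough Python sketch of the idea: strip formatting, "vectorize" each message, then pull the most relevant one for a query. Everything in it is an assumption for illustration. The sample messages are made up, `strip_formatting`, `embed`, `cosine`, and `top_k` are hypothetical helper names, and the bag-of-words `embed` is only a toy stand-in for a real embedding model. It is not how SillyTavern or Extras does it internally.

```python
import math
import re
from collections import Counter


def strip_formatting(text: str) -> str:
    """Drop HTML-like tags and Markdown markers, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)    # HTML-like tags
    text = re.sub(r"[*_`~#>]+", "", text)   # Markdown emphasis/heading/quote markers
    return re.sub(r"\s+", " ", text).strip()


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (ASSUMPTION: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up messages standing in for an exported chat.
old_chat = [
    "*The rain hammered the harbor* as <b>Mira</b> hid the map in the lighthouse.",
    "They shared a quiet meal of bread and cheese.",
    "Captain Orlo swore he would find the map before the winter storms.",
]
chunks = [strip_formatting(m) for m in old_chat]
print(top_k("Where is the map hidden?", chunks, k=1))
```

In a real setup the `embed` step is the part you would swap for an actual embedding model, which is also where the CPU vs. GPU cost comes from.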

But it's been a while since I've done this, and I see SillyTavern now has a lot of different options. Has anyone experimented with them and have some thoughts to share? I assume the Local option just uses the CPU, which takes time. I'd like to avoid setting up Extras if I can, but if I have to, I'd like to use a custom model rather than the Extras default.