>>715680439
"Good shit" is highly subjective, but in my opinion, anything below 70 billion parameters is a toy novelty, while 70B and above can be used how you would expect. 70B is when you can give rules in the context and a model will keep them factored in accurately the whole way, setting up things like dice rolls, systems for <--AI-decided notes as hidden comments-->>, etc.. 70B is also the very rock bottom of the good stuff.

Having the hardware to run 70B locally isn't common. At bare minimum, you'd want around 100GB of regular RAM (like 2x 48GB). That will be very slow, but it'll give decent generations every time you check back. You can improve on that with beefy GPUs with lots of VRAM, and especially with multiple GPUs (most programs, like kobold, can split a model across GPUs). Models can be split between RAM and VRAM, and VRAM is roughly 10x faster, so you fill it as much as possible before spilling into RAM. 70B is good for local, but going higher (125B, 150B, 225B) can be even better, and it'll always be your own and free to use (after the build costs).
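If you want to sanity-check a build, the napkin math is simple: a quantized model takes roughly (parameters x bytes per weight) plus some headroom for context. A rough Python sketch (the 0.6 bytes/weight figure is a ballpark assumption for a ~4-bit quant with overhead, not an exact number, and the context allowance is a guess too):

params_b = 70           # model size in billions of parameters
bytes_per_weight = 0.6  # ballpark for a ~4-bit quant, including overhead
context_gb = 8          # rough allowance for KV cache / context

model_gb = params_b * bytes_per_weight   # ~42 GB just for the weights
total_gb = model_gb + context_gb         # what you need across VRAM + RAM

vram_gb = 24                             # e.g. a single 24GB card
gpu_share = min(1.0, vram_gb / total_gb) # fraction of layers to offload to GPU
print(f"~{total_gb:.0f}GB total; put ~{gpu_share:.0%} of layers in VRAM, rest in RAM")

At ~1 byte/weight (Q8-ish), the same 70B wants closer to 78GB, which is why ~100GB of system RAM is a comfortable floor.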

Or you can pay a service to handle all the hardware for you, so you can enjoy top-end models without needing a better build. Corpo services run hardware beyond anything a hobbyist can afford and give you the very best of what's available, but it comes with subscription prices, gen limits, and no privacy whatsoever. By definition, "the good shit" can only come from these corpo services.

Finally, if you're an absolute poor, with no chance of even $200 for 100GB of RAM and no realistic way to pay a subscription, you can try to slip through the cracks by living off trials and free daily generation limits. Other people would recommend running [latest big-news 7B model] locally, but I don't. 70B is a watershed moment in coherence and ability to follow rules, which is everything in this tech and what separates it from a novelty.