Hey, it's been a while since I posted about local AI. Hope you are all doing well. There's a lot to catch up on since I last posted almost half a year ago, so to give some context I'll be repeating old news and writing a ton of text. This will take two posts, so forgive me.
Last we talked, Deepseek was rumored to be working on something, and in May we found out what: they updated R1 with the 0528 release. The background rumor is that Deepseek is held back both by lack of compute from the Nvidia ban and by not having anything novel left to change on the algorithm side. That may change, but as of now nothing new has been announced from them. R1 is still pretty cutting edge, but it's no longer the best model in this update if you're going by benchmarks, though it's probably still top for RP.
Qwen, the Alibaba team that was experimenting with thinking last time via QwQ, finally finished their testing. They released a final QwQ model, and Qwen 3 with thinking right after. On top of that, they released a bunch of MoE models in the Qwen 3 line, taking lessons from Deepseek, and then did another release in July to split the reasoning and non-reasoning models. The biggest model is Qwen 3 235B A22B. The A number is how many parameters are active per token in the MoE, so 22B active out of 235B total here. On proper benchmarks this is the highest scorer, roughly equaling what Gemini 2.5 is now and around GPT-5 mini level. It is dry as fuck though and badly needs a finetune, so I wouldn't use it. Of more interest is the smaller model below it, Qwen 3 30B A3B. Quantized, this runs even on a CPU-only system with 16GB of RAM. The only downside is repetition, which you need to mitigate with sampler settings (quick sketch below), but it's not too bad otherwise. I think the Chinese labs don't have a handle on long-context training, hence you'll see this as a recurring theme.
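Since repetition mitigation keeps coming up, here's a minimal sketch of what I mean, using llama-cpp-python to run a quantized 30B A3B GGUF on CPU. The .gguf filename is hypothetical and the sampler values are just assumptions to start from, not gospel, so tune them yourself.

```python
# Minimal sketch: run a quantized Qwen 3 30B A3B GGUF on CPU with
# llama-cpp-python, with sampler settings aimed at taming repetition.
# The .gguf filename is a placeholder; the sampler values are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Q3_K_M.gguf",  # hypothetical quant filename
    n_ctx=8192,     # context window; longer costs more RAM
    n_threads=8,    # set to your physical core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a short tavern scene."}],
    max_tokens=512,
    temperature=0.7,       # some randomness helps break loops
    repeat_penalty=1.1,    # penalize recently repeated tokens
    presence_penalty=0.3,  # discourage reusing the same tokens at all
)
print(out["choices"][0]["message"]["content"])
```

The idea is just that a mild repeat penalty plus a presence penalty keeps the model from looping its favorite phrases; the same knobs exist in most local frontends.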
A new entrant has emerged: Zhipu.ai, founded by alumni of Tsinghua, China's best university, who started it as a side project and then spun it off into its own company. They made CogVideo, one of the first video diffusion models, but soon branched into LLMs proper with GLM 4 (yes, the name is a nod to ChatGPT), which started small, scaled up to 32B, and drew some attention but was outclassed. Recently, though, they released two models that follow somewhat in the footsteps of Deepseek's MoE work: GLM 4.5 (355B with 32B active) and GLM 4.5 Air (106B with 12B active). People have found both to be really good for RP, and surprisingly the Air model is less slopped than the full big version. They can be repetitive, which again can be mitigated.
Another new entrant is Moonshot.ai, started by some smart CS people who had already made fortunes in startups, effectively retired, and came back to take a shot at AGI. Their Kimi models used to be online-only, but with Kimi K2 they went open weights. They too took Deepseek's general architecture but pushed it to 1T parameters with 32B active. Some swear by it but honestly, if you have to tardwrangle a model to the extent you do with K2 from what I have seen, you might as well use another model. It also suffers from repetition issues.
Baidu has Ernie 4.5, but honestly it's not impressive compared to everyone else, and the open release only came several months after the model launched in the cloud. Someone described it as "enthusiastic but dumb". May or may not be worth your time.
Overall though, most of these are within striking distance of the closed models performance-wise, or even beat them. EQBench, the closest thing we have to a benchmark that tries to measure this objectively, generally shows the same picture for the Chinese models. It's a great time with a lot of choices if you have the hardware to run them, provided you don't mind that you can't ask about Tiananmen Square and that whatever CCP ideology gets baked into the model limits how hard you can RP. It's pretty much the age of Chinese models ruling the open-source world.
1/2