>>107113749
RAM is only useful if it has a really fast connection to the CPU, and most boards don't. Recent Macs use unified memory that's several times faster than ordinary desktop RAM (on the Max/Ultra chips), but still nowhere near GPU VRAM. That lets them run really big models, just not fast.
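A rough way to see why bandwidth is the whole game: at decode time every generated token has to stream all active weights out of memory, so bandwidth divided by bytes read per token is a hard ceiling on tokens/sec. A minimal sketch, using approximate published bandwidth specs rather than measurements; real throughput lands below these ceilings:

```python
# Back-of-envelope decode speed for memory-bound inference:
# each token must stream all active weights from memory, so
# tokens/sec is capped at bandwidth / bytes_of_weights_read.
# Bandwidth figures are approximate specs, not measurements.

def max_tokens_per_sec(bandwidth_gb_s: float, params_b: float,
                       bytes_per_param: float = 1.0) -> float:
    """Upper bound on decode tokens/sec (1.0 bytes/param ~ 8-bit quant)."""
    return bandwidth_gb_s / (params_b * bytes_per_param)

model = 70  # dense 70B at 8-bit -> ~70 GB read per token

for name, bw in [("dual-channel DDR5-6000", 96),
                 ("Mac M2 Ultra unified",   800),
                 ("RTX 5090 GDDR7",         1792)]:
    print(f"{name:24s} ~{max_tokens_per_sec(bw, model):5.1f} t/s ceiling")
```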
The only other platform where this is a plausible option for running large models on CPU is Epyc server boards: server CPUs get 8-12 memory channels instead of a desktop's 2, which multiplies bandwidth accordingly, but the CPUs and boards are very expensive. And it makes no sense to buy DDR4 for this purpose.
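Peak DDR bandwidth is just channels × transfer rate × 8 bytes per transfer, which is why channel count matters more than anything else here. A quick sketch; these are theoretical peaks:

```python
# Peak DDR bandwidth: channels * transfer rate (MT/s) * 8 bytes/transfer.
def peak_bw_gb_s(channels: int, mt_s: int) -> float:
    return channels * mt_s * 8 / 1000

print(peak_bw_gb_s(2, 3200))   # desktop dual-channel DDR4-3200:  51.2 GB/s
print(peak_bw_gb_s(2, 6000))   # desktop dual-channel DDR5-6000:  96.0 GB/s
print(peak_bw_gb_s(12, 4800))  # 12-channel Epyc DDR5-4800:      460.8 GB/s
```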
>>107114040
Loading large models into RAM is indeed a CPU inference strategy. It used to be that people ran stacks of 3090s to host big local models (70B dense).
But lately there have been almost no new 70B dense models. Instead, MoE lets a model run as if it were much smaller, since only a few billion parameters (3/7/14B active) are touched per token, while the big models have ballooned to hundreds of billions of total parameters. Nobody can run those at home without a dozen hefty cards that draw so much power the house has to be rewired.
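The key distinction with MoE: total parameters set how much memory you need to hold the model, active parameters set how much is read per token and therefore the speed. A sketch using DeepSeek-V3's published figures (671B total, ~37B active) and an assumed ~Q4 quant at roughly 0.5 bytes/param:

```python
# MoE: total params determine memory to HOLD the model,
# active params determine bytes READ per token (i.e. speed).
BYTES_PER_PARAM = 0.5  # assumed ~Q4 quantization

total_b, active_b = 671, 37  # DeepSeek-V3: 671B total, ~37B active
hold_gb = total_b * BYTES_PER_PARAM
read_gb = active_b * BYTES_PER_PARAM
print(f"hold ~{hold_gb:.0f} GB, read ~{read_gb:.1f} GB per token")
# vs. a dense 70B at Q4: holds ~35 GB, but also reads all ~35 GB per token
```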
That leaves people running small dense models and MoEs, which turn out to be pretty good. Most buy a couple of cards, say two 5090s for 64GB of VRAM, and run a local MoE with a lot of context. But that setup can't hold the big models.
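For a feel of how the 64GB gets spent, here's a rough budget of weights plus KV cache; every model hyperparameter below is hypothetical, picked just to show the shape of the math:

```python
# Rough VRAM budget: quantized weights + KV cache for long context.
# All hyperparameters here are hypothetical illustration values.

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_el: int = 2) -> float:
    # K and V entries per token per layer, fp16 (2 bytes) elements
    return 2 * layers * kv_heads * head_dim * bytes_per_el * tokens / 1e9

weights_gb = 30  # e.g. a ~60B-total MoE at ~Q4
kv = kv_cache_gb(tokens=128_000, layers=60, kv_heads=8, head_dim=128)
print(f"weights ~{weights_gb} GB + KV cache ~{kv:.1f} GB "
      f"= ~{weights_gb + kv:.1f} GB of the 64 GB")  # ~61.5 GB: it just fits
```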
That's where the CPU option comes in. The idea is to get enough RAM to hold the entire model and run it on the CPU, the logic being that slow beats nothing.
But only a retard would buy DDR4 for this purpose. The whole point is that you need the fastest RAM you can get, or your token rate will be 1 token an hour.
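Concretely, here are the bandwidth-implied ceilings for a big MoE on different memory setups (again DeepSeek-V3-like: ~37B active at an assumed ~Q4, so ~18.5 GB read per token; actual speeds land below these):

```python
# Decode ceiling = bandwidth / bytes read per token.
# ~37B active params at ~Q4 (0.5 bytes/param) -> ~18.5 GB per token.
gb_per_token = 37 * 0.5

for name, bw in [("dual-channel DDR4-3200",    51.2),
                 ("dual-channel DDR5-6000",    96.0),
                 ("12-channel Epyc DDR5-4800", 460.8)]:
    print(f"{name:26s} <= {bw / gb_per_token:4.1f} t/s")
```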
If you want to run large models in RAM and you don't mind it being a bit slow, Mac is probably the way to go, unless you have money for Epyc.
Or feel free to spend $100K on a huge stack of RTX 6000s.