A note that this setup runs the 671B model in Q4 quantization at 3-4 TPS; running Q8 would need something beefier. To run the 671B model in the original Q8 at 6-8 TPS you'd need a dual-socket EPYC server motherboard with 768GB of RAM.
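As a rough sanity check on those RAM figures, here's a back-of-the-envelope sketch (plain Python, assuming a uniform bits-per-weight, which real quantization formats like Q4_K_M only approximate, and ignoring KV cache and runtime overhead) of how much memory the weights alone take at Q4 vs Q8:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate storage for the model weights in gigabytes (1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 671e9  # 671B parameters

q4 = weight_memory_gb(PARAMS, 4)
q8 = weight_memory_gb(PARAMS, 8)

print(f"Q4 weights: ~{q4:.0f} GB")  # ~336 GB
print(f"Q8 weights: ~{q8:.0f} GB")  # ~671 GB
```

At ~671 GB just for Q8 weights, 768GB of RAM leaves only modest headroom for the KV cache and the OS, which is why the Q8 setup needs the dual-socket board.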
小莱卡 - 5mon
Brb gotta start my FundMe campaign for one of these servers lol
FuckBigTech347 - 5mon
DW. In like two years from now, companies will start throwing out similar machines. Just keep an eye on second-hand markets and dumpsters.
CriticalResist8 - 5mon
btw do you recommend running a quantized higher-parameter model (locally) or lower-parameter but not quantized, if I had to pick between the two?
☆ Yσɠƚԋσʂ ☆ - 5mon
I find higher parameter counts tend to produce better output, but it depends on what you're doing too. For stuff like code generation, accuracy matters more, so even a smaller model that's not quantized might do better. It also depends on the specific model.
yogthos in technology
How To Run Deepseek R1 671b Fully Locally On a $2000 EPYC Server
https://digitalspaceport.com/how-to-run-deepseek-r1-671b-fully-locally-on-2000-epyc-rig/
Thanks I'll have to try them both then it seems