yogthos in technology
Nvidia breakthrough gives 4-bit pretraining technique the accuracy of FP8
https://arxiv.org/abs/2509.25149

NVIDIA just trained a 12B-parameter language model on 10 trillion tokens entirely in 4-bit precision.

Here’s why this matters:

- NVFP4 delivers 2–3× faster math throughput and 50% less memory vs FP8
- Accuracy? Practically identical (MMLU-Pro: FP8 = 62.62%, NVFP4 = 62.58%)
- Stability issues have been solved using Random Hadamard transforms, stochastic rounding, and 2D scaling (see the sketch after this list)
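For readers curious what those stabilizers actually do, here is a minimal Python/NumPy sketch of one quantize–dequantize round trip. This is not NVIDIA's implementation: the 16-element block size and the E2M1 value grid {0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6} are assumptions based on the publicly documented NVFP4 format, and the paper's 2D scaling is simplified here to a single scale per block.

```python
# Sketch only: block size (16) and the E2M1 grid are assumptions from the public
# NVFP4 format description; the paper's 2D scaling is reduced to one per-block scale.
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable FP4 magnitudes

def random_hadamard(n, rng):
    """Orthonormal random Hadamard transform H*D (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:                              # Sylvester construction
        H = np.block([[H, H], [H, -H]])
    D = np.diag(rng.choice([-1.0, 1.0], size=n))       # random sign flips
    return (H @ D) / np.sqrt(n)

def stochastic_round_fp4(x, rng):
    """Round to a neighboring FP4 grid point with probability proportional to
    closeness, so the rounding error is zero in expectation."""
    sign, mag = np.sign(x), np.abs(x)
    hi_idx = np.clip(np.searchsorted(E2M1_GRID, mag), 1, len(E2M1_GRID) - 1)
    lo, hi = E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]
    p_up = np.clip((mag - lo) / (hi - lo), 0.0, 1.0)
    return sign * np.where(rng.random(x.shape) < p_up, hi, lo)

def fp4_roundtrip(block, rng):
    """Rotate, scale into FP4 range, stochastically round, then undo both."""
    R = random_hadamard(block.size, rng)               # spreads outliers across the block
    t = R @ block
    scale = np.max(np.abs(t)) / E2M1_GRID[-1] + 1e-12  # per-block scale
    q = stochastic_round_fp4(t / scale, rng)           # the 4-bit values the hardware would see
    return R.T @ (q * scale)                           # dequantize and rotate back

rng = np.random.default_rng(0)
w = rng.standard_normal(16)
print("max abs round-trip error:", np.max(np.abs(w - fp4_roundtrip(w, rng))))
```

The intuition: without the random rotation, a single outlier forces a large block scale and crushes every other value in the block toward zero; without stochastic rounding, small but consistent updates round to the same value every step and the bias compounds over 10 trillion tokens.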
This is the first successful demonstration of large-scale 4-bit pretraining without losing accuracy.
The next generation of frontier models will be faster and cheaper to train, without compromising accuracy.