Because of different quantization. However, parameter count is generally the more interesting number so long as quantization isn't too extreme (as it is here). E.g., FP32 weights are 4x the size of 8-bit quantized weights, but the quality difference is close to nonexistent in most cases.
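The size scaling is just bytes-per-weight arithmetic. A quick sketch (the 7B parameter count is an arbitrary example, and this counts weights only, ignoring activations and runtime overhead):

```python
# Rough weight-memory footprint of a hypothetical 7B-parameter model
# at different precisions.
PARAMS = 7e9

for name, bits in [("FP32", 32), ("FP16", 16),
                   ("8-bit", 8), ("ternary (~1.58-bit)", 1.58)]:
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:>20}: {gb:5.1f} GB")

# FP32 comes out to ~28 GB vs ~7 GB for 8-bit: the 4x size gap,
# usually with near-identical quality after post-training quantization.
```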
>so long as quantization isn't too extreme (as it is here)
This is true for post-training quantization, but not for quantization-aware training, and not for something like BitNet. Here they claim performance per parameter comparable to normal models; that's the entire point.
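For context, the core trick that separates quantization-aware training from post-training quantization is training *through* the quantizer with a straight-through estimator, so the model adapts to the low precision rather than having it imposed afterward. A minimal sketch of that idea, using BitNet b1.58-style absmean ternarization (this `TernaryLinear` is a hypothetical illustration, not the paper's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Linear):
    """QAT sketch: ternary weights in the forward pass, full-precision
    weights (and gradients) kept for the optimizer."""
    def forward(self, x):
        w = self.weight
        # Absmean quantization to {-1, 0, +1}, rescaled back for the matmul
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses w_q, but the gradient
        # flows to the full-precision w as if quantization were identity
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)
```

Because the network trains against its own quantization error from the start, the usual "extreme quantization destroys quality" intuition from PTQ doesn't directly transfer.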