Because of different quantization. However, parameter count is generally the more interesting number so long as quantization isn't too extreme (as it is here). E.g., FP32 weights are 4x the size of 8-bit quantized weights, but the quality difference is close to nonexistent in most cases.
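The size scaling is just bytes-per-weight arithmetic. A quick sketch (the 7B parameter count is an arbitrary example, and this counts weights only, ignoring activations and runtime overhead):

```python
# Rough weight-memory footprint of a hypothetical 7B-parameter model
# at different precisions.
PARAMS = 7e9

for name, bits in [("FP32", 32), ("FP16", 16),
                   ("8-bit", 8), ("ternary (~1.58-bit)", 1.58)]:
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:>20}: {gb:5.1f} GB")

# FP32 comes out to ~28 GB vs ~7 GB for 8-bit: the 4x size gap,
# usually with near-identical quality after post-training quantization.
```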
>so long as quantization isn't too extreme (as it is here)
This is true for post-training quantization, but not for quantization-aware training, and not for something like BitNet. Here they claim performance per parameter comparable to normal models; that's the entire point.
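For context, the core trick that separates quantization-aware training from post-training quantization is training *through* the quantizer with a straight-through estimator, so the model adapts to the low precision rather than having it imposed afterward. A minimal sketch of that idea, using BitNet b1.58-style absmean ternarization (this `TernaryLinear` is a hypothetical illustration, not the paper's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Linear):
    """QAT sketch: ternary weights in the forward pass, full-precision
    weights (and gradients) kept for the optimizer."""
    def forward(self, x):
        w = self.weight
        # Absmean quantization to {-1, 0, +1}, rescaled back for the matmul
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses w_q, but the gradient
        # flows to the full-precision w as if quantization were identity
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)
```

Because the network trains against its own quantization error from the start, the usual "extreme quantization destroys quality" intuition from PTQ doesn't directly transfer.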