
Qwen3.5-397B-A17B behaves more like a 17B parameter model. Omitting the MoE part from the headline makes it a lie and stupid hype.

Quantizing is also a cheat code that makes the numbers lie; next up, someone is going to claim they're running a large model when they're actually running a 1-bit quantization of it.
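To make the quantization point concrete, here's a rough back-of-the-envelope sketch of how precision changes the on-disk size of a checkpoint. This is illustrative only: it assumes an fp16 baseline and ignores quantization metadata (scales, zero-points) and non-weight tensors.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage for a checkpoint, ignoring
    quantization metadata such as scales and zero-points."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 397B-parameter model at different precisions (illustrative):
fp16 = model_size_gb(397, 16)    # ~794 GB
int4 = model_size_gb(397, 4)     # ~199 GB
one_bit = model_size_gb(397, 1)  # ~50 GB
```

So "running a 397B model" can mean anything from ~800 GB down to ~50 GB of weights depending on how aggressively it's quantized, which is why the precision matters as much as the parameter count.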



It behaves more like a ~80B parameter model (the geometric mean of active and total params), and has world knowledge closer to a 400B parameter model.
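The geometric-mean figure above can be sketched in a couple of lines. Note this is a community rule of thumb for a "dense-equivalent" size of an MoE model, not an official benchmark:

```python
import math

def effective_params_b(active_b: float, total_b: float) -> float:
    """Rule-of-thumb dense-equivalent size for an MoE model:
    geometric mean of active and total parameter counts (in billions)."""
    return math.sqrt(active_b * total_b)

# 17B active, 397B total -> roughly 82B dense-equivalent
effective = effective_params_b(17, 397)
```

That's where the "~80B" figure comes from: sqrt(17 x 397) is about 82.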

There's no misleading here; they show every detail, from the model to the quantization to that atrocious time to first token. Stuff like this is more like code golf than a claim that the mainstream phone user is going to download 100GB of model weights.


I think we're using different meanings of "behaves like". I meant "has tokens/sec performance comparable to".


I'm using model quality, because inference speed is definitely not comparable to a 17B model when you're streaming model weights on and off disk storage.




