
Qwen3.5-397B-A17B behaves more like a 17B parameter model. Omitting the MoE part from the headline makes it a lie and stupid hype.

Quantizing is also a cheat code that makes the numbers lie; next up, someone is going to claim they're running a large model when they're actually running a 1-bit quantization of it.
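To make the quantization point concrete, here's a rough back-of-the-envelope sketch of how precision changes the on-disk size of a checkpoint. This is illustrative only: it assumes an fp16 baseline and ignores quantization metadata (scales, zero-points) and non-weight tensors.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage for a checkpoint, ignoring
    quantization metadata such as scales and zero-points."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 397B-parameter model at different precisions (illustrative):
fp16 = model_size_gb(397, 16)    # ~794 GB
int4 = model_size_gb(397, 4)     # ~199 GB
one_bit = model_size_gb(397, 1)  # ~50 GB
```

So "running a 397B model" can mean anything from ~800 GB down to ~50 GB of weights depending on how aggressively it's quantized, which is why the precision matters as much as the parameter count.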



It behaves more like a ~80B parameter model (the geometric mean of active and total params), and has world knowledge closer to a 400B parameter model.
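The geometric-mean figure above can be sketched in a couple of lines. Note this is a community rule of thumb for a "dense-equivalent" size of an MoE model, not an official benchmark:

```python
import math

def effective_params_b(active_b: float, total_b: float) -> float:
    """Rule-of-thumb dense-equivalent size for an MoE model:
    geometric mean of active and total parameter counts (in billions)."""
    return math.sqrt(active_b * total_b)

# 17B active, 397B total -> roughly 82B dense-equivalent
effective = effective_params_b(17, 397)
```

That's where the "~80B" figure comes from: sqrt(17 x 397) is about 82.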

There's no misleading here; they show every detail, from the model to the quantization to that atrocious time to first token. Stuff like this is more like code golf than a claim that the mainstream phone user is going to download 100GB of model weights.


I think we're using different meanings of "behaves like". I meant "has tokens/sec performance comparable to".


I'm using model quality, because inference speed is definitely not comparable to a 17B model when you're streaming model weights on and off disk storage.




