Hacker News

If you bake the model weights onto the chip itself, which is what should eventually happen for local LLMs once a good-enough one is trained, you'd be looking at an orders-of-magnitude reduction in power consumption at the same inference speed.
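For a rough sense of where that claim comes from: in a conventional setup, every weight has to be streamed from DRAM for each token, and off-chip memory access costs far more energy than the multiply-accumulate it feeds. Below is a back-of-envelope sketch, assuming a dense 7B-parameter model with 8-bit weights and order-of-magnitude per-operation energies in the spirit of commonly cited ~45 nm figures (roughly 640 pJ per 32-bit DRAM access, a fraction of a picojoule per 8-bit MAC); the specific constants are assumptions for illustration, not measurements.

```python
# Back-of-envelope: energy per token for a dense 7B-parameter model,
# comparing weights streamed from DRAM vs. weights baked on-chip.
# All constants are rough, illustrative order-of-magnitude values.

PARAMS = 7e9            # weights; also ~MACs per token for a dense pass
DRAM_PJ_PER_BYTE = 160  # ~640 pJ per 32-bit DRAM access -> ~160 pJ/byte
MAC_PJ = 0.2            # ~8-bit multiply-accumulate, well under a picojoule

# Conventional: every 1-byte weight fetched from DRAM each token.
fetch_j = PARAMS * DRAM_PJ_PER_BYTE * 1e-12
compute_j = PARAMS * MAC_PJ * 1e-12
streamed_j = fetch_j + compute_j

# Weights hardwired on the die: fetch energy ~vanishes, compute remains.
baked_j = compute_j

print(f"DRAM-streamed: {streamed_j:.2f} J/token")
print(f"weights on-chip: {baked_j:.4f} J/token")
print(f"ratio: {streamed_j / baked_j:.0f}x")
```

Under these assumed constants the weight movement, not the arithmetic, dominates by a few hundred times, which is the mechanism behind the orders-of-magnitude claim. Real designs wouldn't capture all of it (activations still move, and fixed weights mean no model updates without new silicon), but the direction holds.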

