I'm surprised people are surprised. Of course this is possible, and of course this is the future. It has been demonstrated already: why do you think we even have GPUs at all?! We made this exact same transition, from running in software to largely running in hardware, for all 2D and 3D computer graphics. And these LLMs are practically the same math. It's all obvious and inevitable if you pay attention to how we got the hardware we have.
I believe this is a CPU/GPU vs ASIC comparison, rather than CPU vs GPU. They have always(ish) coexisted, being optimized for different things: ASICs have cost/speed/power advantages, but the design is more difficult than writing a computer program, and you can't reprogram them.
Generally, you use an ASIC to perform a specific task. In this case, I think the takeaway is the LLM functionality here is performance-sensitive, and has enough utility as-is to choose ASIC.
But the BTC mining algorithm has not changed and will not change. That's the only reason ASICs make at least a bit of sense for crypto.
AI as static weights is already challenged by the frequent model updates we see today - and may become a relic entirely once we find a new architecture.
We can expect the model landscape to consolidate some day. Progress will become slower, innovations will become smaller. Not tomorrow, not next year, but the time will come.
And then it'll increasingly make sense to build such a chip into laptops, smartphones, wearables. Not for high-end tasks, but to drive the everyday bread-and-butter tasks.
The world continues to evolve in a way that requires flexibility, not more constraints. I just fail to see a future where we want fewer general-purpose computers and more hard-wired ones. Would be interesting to be proven wrong, though!
A TPU USB-C dongle costs less than $100 (widely used for detecting people in Home Assistant / Frigate NVR camera feeds). If a one-off $100 purchase can replace an Anthropic subscription (and beat it 10x on speed), even if it only lasts 12 months, I don't see why not.
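As rough arithmetic (both prices here are illustrative assumptions, not quotes), the break-even point for a one-off accelerator vs. a monthly subscription comes fast:

```python
# Rough break-even sketch: one-off accelerator purchase vs. a monthly
# subscription. Both numbers are illustrative assumptions, not real quotes.
DONGLE_COST = 100.0           # one-off hardware purchase (USD)
SUBSCRIPTION_MONTHLY = 20.0   # assumed subscription price (USD/month)

months_to_break_even = DONGLE_COST / SUBSCRIPTION_MONTHLY
print(months_to_break_even)   # 5.0 months
```

So even if the hardware is obsolete within a year, it can still come out ahead under these assumptions.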
Sounds to me like there’s potential to use these for established models to provide cost/scale advantage while frontier models will run in the existing setup.
IME Llama et al. require LoRA or fine-tuning to be usable. That's their real value vs. closed-source massive models, and their small size makes this possible, appealing, and doable on a recurring basis as things evolve. Again, rendering ASICs useless.
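For anyone unfamiliar with why LoRA makes recurring fine-tuning cheap, here's a minimal numpy sketch (shapes, rank, and scaling are illustrative): instead of updating the full weight matrix, you train two small low-rank factors, and the resulting weight delta is exactly the kind of thing a chip with baked-in weights can't absorb.

```python
import numpy as np

# Minimal LoRA sketch: keep the d_out x d_in base weight W frozen, and train
# only two small factors A (r x d_in) and B (d_out x r). The effective
# weight is W + (alpha / r) * B @ A. All shapes here are illustrative.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.standard_normal((d_out, d_in))   # frozen base weights
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero: no change at init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the adapted layer matches the frozen base layer exactly.
assert np.allclose(lora_forward(x), W @ x)
# Trainable parameter count: r*(d_in + d_out) vs d_in*d_out for full tuning.
print(r * (d_in + d_out), d_in * d_out)  # 512 4096
```

The point: the adapter is tiny and retrainable on a whim, but applying it still means changing effective weights, which hard-wired silicon can't do.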
Neither the blog nor Taalas' original post specifies what speed to expect when using the SRAM in conjunction with the baked-in weights. To be taken seriously, that really needs to be explained in detail, not in a passing mention.
FPGAs don’t scale; if they did, all GPUs would’ve been replaced by FPGAs for graphics a long time ago.
You use an FPGA when spinning a custom ASIC doesn’t make financial sense and a generic processor such as a CPU or GPU is overkill.
Arguably the middle ground here is TPUs: they take the most efficient parts of a “GPU” for these workloads but still rely on memory access in every step of the computation.
I thought it was because the number of logic elements in a GPU is orders of magnitude higher than in an FPGA, rather than just processing speed. And GPU processing is inherently parallel, so the GPU beats the FPGA just on transistor count.
With an FPGA you are sacrificing performance for flexibility: for any given task you are far less efficient per transistor than with a dedicated ASIC, even if it’s a general-compute ASIC like a GPU is today.
The reason no one is building large FPGAs is that there is no market for them.
If an H200 scale FPGA was viable we would have one.
> Because we did this exact same transition from running in software to largely running in hardware for all 2D and 3D Computer Graphics.
We transitioned from software on CPUs to fixed GPU hardware... But then we transitioned back to software running on GPUs! So there's no way you can say "of course this is the future".
It's not certain this is the future: the obvious trade-off is lack of flexibility, not only when a new model comes out, but also with varying demand in the data centers - one day people want more LLM queries, another day more diffusion queries.
Aaand, this blocks the holy grail of self-improving models, beyond in-context learning.
A realistic use case? More efficient vision-based drone targeting in Ukraine/Taiwan/wherever's next. That's where energy efficiency, processing speed, and also weight are most critical. Not sure how heavy these ASICs are, but they should be proportional to the model size.
I've heard many complaints about onboard AI 'not being there yet', and this may change that.
Not listing the Middle East as there is no serious jamming problem there.
In a not-too-distant future (5 years?) small LLMs will be good enough to be used as generic models for most tasks. And if you have a dedicated ASIC small enough to fit in an iPhone, you have a truly local AI device, with the bonus that you get something really new to sell in every generation (i.e. access to an even more powerful model).
Yes, but not in five years. The chips will be dirt cheap by then. We'll get “intelligent” washing machines that discuss the amount of detergent and eventually berate us. Toasters with voice input. And really annoying elevators. Also bugs that keep an extremely low RF profile (only phoning home when the target is talking business).
Perceptible latency is somewhere between 10 and 100 ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you were expecting near-realtime responses (for example, if you were using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through an SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.
Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.
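The physics alone backs this up. A toy calculation (the fiber propagation speed is roughly real; the 1000 km distance is just an assumed example):

```python
# Back-of-envelope latency floor for a network round trip. Light in fiber
# travels at roughly 200,000 km/s, so transit time alone eats into a
# 10-100 ms perceptibility budget before any queueing or inference happens.
def min_rtt_ms(distance_km, fiber_speed_km_per_s=200_000):
    """Lower bound on round-trip time in ms, ignoring all server-side work."""
    return 2 * distance_km / fiber_speed_km_per_s * 1000

print(min_rtt_ms(1000))  # 10.0 ms in transit alone for a 1000 km hop
```

A local chip skips that floor entirely, which is what makes the autocomplete-while-typing case plausible.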
It doesn't have to be true for all models to be useful. Thinking about small models running on phones or edge devices deployed in the field, that would be a perfect use case for a "printed model".
The real benefit, to a very particular type of mind, is that the alignment will be baked in (presumably a lot more robustly than today) and wrongthink will be eliminated once and for all. It will also help flag anyone who would need anything as dangerous as custom, uncensored models. Win/win.
To your point, it's neat tech, but the limitations are obvious, since 'printing' only one LLM ensures further concentration of power. In other words, history repeats itself.
This is a ridiculous mindset. Llama 3.1 8B can do lots of things today and it'll still be able to do those things tomorrow.
If you baked one of these into a smart speaker that could call tools to control lights and play music, it will still be able to do that when Llama 4 or 5 or 6 comes out.
The point is that the GP's mindset is not very ridiculous if you value things by a price/utility ratio. Software and hardware advancements will lead to buyer's remorse faster than people get an ROI from local inference.
SW and HW advancements will bring this topic into "good enough for the vast majority" territory, thus making the GP's point moot. You don't care that your LLM ASIC isn't the latest one, because it works for the use you purchased it for.
The highly dynamic nature of LLMs themselves will make part of the advantage of upgradable software not that interesting anymore. [1]
[1] although security might be a big enough reason for upgrades to still be required
Doesn't Google have custom TPUs that are kind of a halfway point between Taalas' approach and a generic GPU? I wonder if that kind of hardware will reach consumers. It probably will, though as I understand them NPUs aren't quite it.
I think the interesting point is the transition time. When is it ROI-positive to tape out a chip for your new model? There’s a bunch of fun infra to build to make this process cheaper/faster and I imagine MoE will bring some challenges.
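A toy break-even model for that ROI question (every number below is an assumption, chosen only to show the shape of the calculation, not real NRE or margin figures):

```python
# Toy ROI model for taping out a model-specific chip. All numbers are
# assumptions for illustration - real NRE and margins vary enormously.
tapeout_cost = 20e6            # NRE: masks, design, verification (USD)
per_chip_margin = 500.0        # assumed margin per chip vs serving on GPUs
model_useful_life_months = 18  # assumed time before the model is obsolete

chips_to_break_even = tapeout_cost / per_chip_margin
chips_per_month_needed = chips_to_break_even / model_useful_life_months
print(int(chips_to_break_even), round(chips_per_month_needed))
```

The interesting lever is exactly the one mentioned: any infra that shrinks the tape-out cost or turnaround time moves the break-even point and widens the set of models worth hardening.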