It’s a bit misleading to say nothing special, as they are doing more than just increasing parameter count. Progress has been steady in all the subcomponents of training, from data filtering and weighting to sparse attention and optimizers, plus various efficiency gains up and down the training compute stack.
They’re using more compute, a bigger model and tons of training quality improvements to get more out of an equivalent model.
This has been my thought for a long time. I think all that matters from attention is that there is crosswise comparison going on.
You need some amount of parallel compute and some amount of global comparison.
And the rest is basically a way to add parameters and scale.
(This is in theory. In practice, you can get a lot of small-percentage stability and efficiency improvements from the algorithmic details of model architecture, and they really compound.)
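To make the "crosswise comparison" point concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is only an illustration: the function names and the toy `tokens` input are made up for this example, and real models add learned projections, multiple heads, masking and so on.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Global comparison: this position's query is scored against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors using those comparison weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        out.append(mixed)
    return out

# Toy input (hypothetical): three 2-dimensional token vectors attending to each other.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each output row depends on every input row, which is the global comparison part. The loop over queries has no dependency between iterations, which is the parallel compute part.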
Confidently yes. OpenAI for sure has been training larger models internally and distilling.
Pre-training scaling laws all support larger models being more cost efficient to train than smaller models. And distillation is comparatively cheap. So you can get the most juice by training the biggest model you can and distilling it.
There are endless returns to frontier intelligence. Just because most people can't make use of it doesn't mean someone can't make a ton of money off of it.
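For anyone unfamiliar with what "distilling" means mechanically, a rough sketch of the classic soft-label distillation loss is below: the student is trained to match the teacher's temperature-softened output distribution. This is the textbook formulation, not a claim about what any particular lab does internally; the logits and the function names are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # A higher temperature flattens the distribution, exposing the teacher's
    # relative preferences among the less likely classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits for one token position over a 3-word vocabulary.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
bad_student = [0.5, 1.0, 4.0]    # prefers a different token

loss_good = distillation_loss(teacher, good_student)
loss_bad = distillation_loss(teacher, bad_student)
```

The student that roughly agrees with the teacher gets the smaller loss. In training you would minimize this over the student's parameters, usually mixed with the ordinary next-token loss.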
Most software engineers will just need cheap tokens.
But things like physics and drug discovery have no foreseeable upper bound.
Or governance of large organizations... There are a huge number of factors to consider, counterfactuals, studies, lots of non-obvious second and third order effects, etc. We're barely able to get basic governance without creating huge problems (low density zoning rubber stamped across the nation creating a housing crisis, for example), so the bar isn't high.
We pay CEOs an enormous amount because a small improvement in an org's performance that is attributable to them can make a massive difference in organizational value.
Within software engineering, security, reliability, and scale also seem boundless.
Software that never breaks (including because it never runs into scaling problems) and never leaks your data is preferable to software that breaks and leaks your data sometimes, but it has been too costly to be practical.
Current models are still very far from the reasoning muscle required to build things that never break, scale to billions of users with no issues, and cannot be exploited.
> Software that never breaks (including because it never runs into scaling problems) and never leaks your data is preferable to software that breaks and leaks your data sometimes, but it has been too costly to be practical.
It's almost impossible to prove non-trivial software is invulnerable.
It's very easy to prove that it sort of works.
For one, you have hardware vulnerabilities - period. If you're running on any operating system, you have OS vulnerabilities. If you're not running on bare metal, you may have who knows what kind of vulnerabilities. If you're running literally any other piece of software on the same machine, depending on the hardware and OS, you could have vulnerabilities...
Nothing ever happens. In 20 years we will still be painfully dying from the same shit as now. Maybe there are like 5 new drugs for some exact specific type of cancer out of like what, thousands?
You're still able to do so, as we've been able to in ClojureScript land for many years already, since ultimately they're just Promises! I don't think that's going away with these new function hints.
That's all well and good, and they had astounding growth rates, but it doesn't mean much. And 1B in ARR is not _that_ much in comparison.
Also, reportedly they spend all their revenue and they have no control over the spend-side. The models they use will very likely get much more expensive. All the foundation model companies have a competing product.
Cursor has the first mover advantage, but that will only help them so much. There have been plenty of companies that grew fast and had huge revenue but failed in the end because they never became profitable. That's also in the cards for Cursor if they don't fundamentally change their business model.
Put 1B into a better product and 10B into marketing. If you can’t beat their 1B in revenue, the market for making your money back on the Cursor acquisition also isn’t there.
There are step changes that actually merit this though. And a zero day machine IS one of those. It went from a 4% to an 85% zero day success rate on Firefox.
A 0 day is just a vulnerability that wasn’t known before now.
What’s the criticality of these? Are they realistically exploitable? En masse? Through a complex and highly contextual set of actions? What’s the impact? Etc etc etc.
Yes those numbers are a big change but they’re also not spelling doom for us in the security world until we actually know what they mean.
The demonstrated ones that they have on the red team blog are neat, the kernel chain is impressive and fun. But nothing I’m seeing here is as world ending as the presser implies.
> The demonstrated ones that they have on the red team blog are neat, the kernel chain is impressive and fun
So by your estimation, rogue actors being able to uncover hundreds of this class in each major software product, roughly for free, would not be a big issue?
We must have read two different red team blogs from Anthropic if that’s what you think is happening. But let’s go ahead and take what you’re asking at face value.
It would not be a doomsday issue as implied, no. Org security has gone far beyond static detections and “just exclude some IPs that fail to log in too much and we’re good”. SOAR exists. Behavioral analysis and monitoring exist. Layered defenses exist.
Believe it or not, for those of us in security at large, highly targeted companies, we’ve been dealing with the potential for multiple chained 0 days for years, and the processes, monitoring, and (yes, automated) response architecture are already there.
I get that this is absolutely frightening for some and that causes panic but for us this is Tuesday.