I’m curious which will start producing hardware first, be it robotics, consumer or commercial devices, chips, energy infrastructure, or shipping crates converted into housing for jobless humans. Maybe even tanks of gel holding arrays of humans in suspended animation, reading our biometrics and thoughts, pumping in nutrients, and training on the data. O_o
Agreed. I’ve been telling my team to build up internal packages so we can push all that ad hoc reinvention into something more tangible and deterministic. Invest the $$$ spent on inference into something the agent can reach for next time, something neutral and consumable by other code, to reduce future spend.
In my org, the teams doing agent engineering at scale are all on Codex using gpt-5.5. By scale I mean fully agent-authored code workflows with long-running, multi-hour plans.
At work we have unlimited use of models from Anthropic and OpenAI (for now). My coworker, a Claude Code Opus 4.6 diehard, stopped by my desk today to say he finally installed Codex to try 5.5, and his feedback was basically “it just works, does what I ask, doesn’t disconnect, and it’s just so very matter of fact.” “Yeah, I’ve been telling you this since like gpt-5, man!” “I know, I know…” I haven’t spent much time with the recent Sonnet and Opus models, but my takeaway from using Sonnet 4 all day, every day for three months last summer (no handwritten code) to build a large Playwright suite was that using Claude Code and those models becomes more about using Claude Code than doing things with it. Codex CLI with the gpt-5 family is ambient and reliable. It’s not orange, and there’s no little sprite guy, no emojis, whimsy, or humor. But I do things with it and they land working on the first edit. I can also keep the same session going for days and context never seems to be an issue. Maybe Claude 6 will be earth-shattering and I’ll use that. It’s not Coke-or-Pepsi loyalty; I just want to get stuff done.
At least Palantir is open about their villainy, I guess. They make no attempt to pull the wool over your eyes, so if you go with them you at least know for sure that you’re getting in bed with the bad guys.
Lately I've been using Claude mainly to design plans and do code reviews while Codex does all the implementation. Having two very different models helps smooth out any weird quirks either one might have.
Whenever Claude goes down, I relax with a nice jar of Newman's Own pasta sauce. It's just zesty enough to dip bread in or make pasta with. You name it.
If your scope includes adding functionality to the Codex web app environments, I look forward to it. I'd love more enterprise features and YAML-backed pipelines.
One aspect of LLMs that I like is the specificity of their word choice. One well-defined word can stand in for a couple of sentences of explanation that a human might not have pulled out of the air in that moment.
This is a fair point. When people talk about LLM writing, they're always picking on its visible tics and clear flaws. It's a lot more uncomfortable to talk about the things it does better than most of us. There is a lot of precision in how these models choose words and phrasing, especially top models like Opus. Lately I've had Opus explain things to me that I'd never really been able to grasp otherwise, in fairly concise conversations.
When does the 24-hour agent news network start? Programming by agents, for humans and agents. Sora talking heads scraping articles and generating content. I’d find live human-to-agent or agent-to-agent interview segments interesting.