
Can anyone explain to a non-server person what Oxide hopes to accomplish? Is it basically just a new server with its own OS that makes it more secure?


Their target market is essentially the private cloud space combined with turnkey racks - a pretty common on-premise setup where you order not individual servers but complete racks that are supposed to be "plug and play" in your DC (in practice YMMV; I once spent a month fighting a combination of manglement and Rackable mess).

You can think of the final product in this case as a pretty big hypervisor cluster that is delivered complete. I'll admit more than once I'd have killed for that kind of product, and I suspect that the price/performance ratio might actually be pretty good.

The operating system in this case is used for the internal service processor bits (compare: Minix 3 on Intel ME, whatever that proprietary RTOS on AMD's PSP is, etc.) that help keep the whole thing running and shipshape.


Bingo. My guess is this is for control plane microcontrollers.


Bryan has also complained in interviews about how many microcontrollers are already on your motherboard and how few of them Linux really controls. It's all proprietary, and God knows what's actually running in there (and how many bugs and security vulnerabilities they have).

None of that is a great situation for multitenant scenarios.

This doesn't have to be control plane only. It could also be IO subsystems.


That said, Oxide doesn't put as much control in the hands of the owner as Raptor does, but Raptor doesn't provide a highly integrated rack like that :<


It is primarily being used for the root of trust as well as our service processor, aka "totally not a BMC."


> a pretty common on-premise setup

I have been wondering if it will become a thing in some cloud hosting services as well. I guess we need to see their pricing.


Depending on the market, I would be totally unsurprised to see some cloud providers using turnkey racks (though they might usually have nicer deals with places like Quanta), and Oxide could definitely strike some contracts there. The question is how it would mesh with the existing setup.


Just to be clear, this OS, Hubris, is for the service processor. It's an OS for firmware, not the main OS that will run on the CPU.

However, they will likely ship with something derived from illumos and the bhyve hypervisor. You can then provision VMs through the API (likely integrated with tools like Terraform or whatever). You will likely not interact directly with illumos.
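
As a purely hypothetical sketch (the endpoint, field names, and token are all made up here, since no API has been published; it assumes the reqwest crate with the "blocking" and "json" features plus serde_json), provisioning through such an API could look something like:

    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Hypothetical rack-level control plane endpoint; a tool like
        // Terraform would presumably wrap calls much like this one.
        let client = reqwest::blocking::Client::new();
        let resp = client
            .post("https://rack.example.com/v1/instances") // placeholder URL
            .bearer_auth("API_TOKEN") // placeholder credential
            .json(&json!({
                "name": "web-01",
                "ncpus": 4,
                "memory_gib": 16,
                "image": "debian-11"
            }))
            .send()?;
        println!("provision request: {}", resp.status());
        Ok(())
    }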

It's basically an attempt to make running a data center easier.


Pretty much "let's redo datacenter hardware from the ground up for current requirements, cutting off legacy things we don't need anymore"


But the BMC would be the #1 item on my list of "things I don't need any more". How do you come up with a from-scratch, legacy-free universe that still includes BMCs?


Because BMC is a term for a function (which turns out to be very useful and important), not a specific technology. (I like the tongue-in-cheek "totally not a BMC" used by some people from Oxide.)


You don't need low-level remote management anymore? Or what specifically are you associating with the term "BMC"? (i.e., for me, "BMC" is "turn it off and on and force it to boot from the network, remotely")
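
For a concrete sense of that function, here is a minimal sketch that drives a conventional BMC by shelling out to ipmitool (the host and credentials are placeholders; "chassis bootdev pxe" and "chassis power cycle" are standard ipmitool subcommands):

    use std::process::Command;

    // Force the next boot to PXE, then power cycle the machine:
    // the classic BMC workflow for remotely reimaging a server.
    fn netboot_and_cycle(host: &str, user: &str, pass: &str) -> std::io::Result<()> {
        for subcmd in [["chassis", "bootdev", "pxe"], ["chassis", "power", "cycle"]] {
            let status = Command::new("ipmitool")
                .args(["-I", "lanplus", "-H", host, "-U", user, "-P", pass])
                .args(subcmd)
                .status()?;
            if !status.success() {
                return Err(std::io::Error::new(std::io::ErrorKind::Other, "ipmitool failed"));
            }
        }
        Ok(())
    }

    fn main() -> std::io::Result<()> {
        netboot_and_cycle("10.0.0.42", "admin", "password") // placeholder values
    }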


Correct. The larger my installation becomes, the less I care about the state of individual machines. The ability to promptly remediate a single broken machine becomes irrelevant at scale.


But at scale you now have more and more machines that are going offline. To me, that tends to push the organization more and more toward having something do this sort of management. And without a BMC-like system, that means more in-person work, which, again, becomes a real cost burden at scale.

It sounds to me more like, at the scale you are at, you are no longer the person making sure that individual computers are still running, and so you are forgetting that this job needs to be done.


So if a machine behaves oddly or goes away and its OS doesn't respond, you don't want the management plane to be able to redeploy it, run hardware checks, etc., automatically?


If you put it that way, you make it too simple. The question is whether I want a second, smaller computer inside my larger computer that may at any time corrupt memory, monkey with network frames, turn off the power, assert PROCHOT, or do a million other bad things. It's not just a tool with benefits. It has both benefits and risks, and in my experience the risks are not worth those benefits.


But we are talking in the context of a project which specifically aims to do these things without the baggage of other platforms. And these things are BMC functions.


So, you manage computers using crash carts like in the ugly old days of PC servers?

I thought that had considerable issues with scaling...


I guess you thought wrong. What do you think this hardware tech is doing in this video?

https://youtu.be/XZmGGAbHqa0?t=224

Men with carts scale perfectly. One guy can manage 1000 machines. 2 guys can manage 2000. Perfect scaling.


I see no crash cart there, only hardware maintenance, while crash carts are where they belong: in the dustbin of history (yeah, Google servers have BMCs).

(For reference: a crash cart is a cart with a monitor, keyboard, mouse, cabling for them, and possibly external drives to help you install a base OS, used when you have no remote management.)


That's exactly what they are doing. They are removing as many things from the BMC as possible. It only contains a few things: it boots, hands over, and allows for some remote control at the low level. That's it.


No, it's a rack-level design with the target market being not companies that buy single servers and fill racks with them, but hyperscalers that need to fill whole datacenters with servers.

Basically, there is a huge range of scale levels where having hardware on-prem makes sense financially.

Also, Bryan Cantrill has some sort of personal gripe with modern servers basically being x86 PCs in a different form factor, and with the fact that in modern servers hardware and software do not cooperate at all (and on some occasions, hardware gets in the way of software).


> but hyperscalers that need to fill whole datacenters with servers.

I strongly doubt this is aimed at Amazon, Google, or Microsoft (hyperscalers). They all already have their own highly customized hardware and firmware. If that is their target, I wish them luck; there's no margin and a ton of competition in that space, and given how long they've been working on this, that feels like a pretty poor gamble.

What I believe this is actually targeting is small enterprises and up: companies with dozens to thousands of servers that are willing to pay a premium for an easier go-to-market.


There's a big "turnkey rack" market, where multiple servers are delivered as complete racks, supposedly already wired up and everything.

All ranges of business except the very small turn up in those purchases.


> I strongly doubt this is aimed at Amazon, Google, or Microsoft (hyperscalers).

Indeed, it is not aimed at those hyperscalers.


They are pretty much the "only" hyperscalers. The only two you could add are possibly Alibaba and Tencent Cloud.


I think IBM (SoftLayer), Hetzner, and OVH would disagree. They may not have the breadth of services, but they measure their scale in datacenters, not servers.


They are certainly large, with their own datacenters, along with Oracle, which is growing fast. But they are not hyperscalers, at least not by the common analytical definition of the term. And no one in the same industry (Linode or DO) would think of SoftLayer, Hetzner, or OVH as hyperscalers. The three names combined wouldn't equal Azure or GCP in scale.

Even Tencent Cloud and Alibaba are relatively new additions to the term, added once people discovered their scale, although industry analysts generally still don't include those two when they use it.


Joyent?



Which was a result of Joyent being unable to scale like the hyperscalers, because there was no third party that could make the hardware as well as the hyperscalers do. That's what Oxide is for: to fix what Joyent was unable to do and enable others to become hyperscalers.


Hyperscalers have already moved their custom stuff in a direction quite far from x86 PCs (how many new form factors, interconnects, and whatnot are under the Open Compute Project already?), while the typical Supermicro/Dell/HPE/whatever boxes available to regular businesses are still in that "regular PC" world. This is what they're trying to solve, yeah.


I think this OS is intended to run in embedded contexts where there are significant memory constraints; read its description: no runtime allocations, etc.
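
For illustration, a minimal sketch of that no-runtime-allocation style in Rust (this is not actual Hubris code): every buffer is fixed-size at compile time, so worst-case memory use is known before the program ever runs.

    #![no_std] // library-style crate: no heap, no std collections

    const MAX_MSG: usize = 256;

    pub struct Mailbox {
        buf: [u8; MAX_MSG], // fixed-size buffer instead of a growable Vec
        len: usize,
    }

    impl Mailbox {
        pub const fn new() -> Self {
            Mailbox { buf: [0; MAX_MSG], len: 0 }
        }

        // Reject messages that don't fit rather than growing storage;
        // there is no allocator to grow from.
        pub fn receive(&mut self, msg: &[u8]) -> Result<(), ()> {
            if msg.len() > MAX_MSG {
                return Err(());
            }
            self.buf[..msg.len()].copy_from_slice(msg);
            self.len = msg.len();
            Ok(())
        }
    }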

I linked two talks where he goes over this in a bit more detail, but I hope the presentation opens it up even more.


I haven't looked at Oxide in depth. Hubris seems to be about reducing the attack surface of a server by:

* decreasing the active codebase by at least three orders of magnitude

* using no C code (Rust only?)

* keeping most code kernel-independent and unprivileged (e.g., drivers, task management, crash recovery)

Also: administration is mostly done by rebooting components (see the sketch below).
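
A toy sketch of that restart-to-recover idea, with an ordinary thread standing in for a task (illustrative only; this is not how Hubris's supervisor actually works):

    use std::thread;
    use std::time::Duration;

    // A component that does some work and may crash; a panic here
    // models a faulted task.
    fn run_component() {
        panic!("component fault");
    }

    fn main() {
        loop {
            let handle = thread::spawn(run_component);
            if handle.join().is_err() {
                // Instead of repairing the component in place, restart
                // it from a known-good initial state.
                eprintln!("component faulted; restarting");
                thread::sleep(Duration::from_millis(100));
            }
        }
    }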


Hubris is for system management components like the BMC, not for the main CPU.



