Proxmox Docker Containers Monster – 13000 containers on a single host (virtualizationhowto.com)
98 points by new_user_final on April 17, 2023 | hide | past | favorite | 62 comments


>docker swarm init

And just like that you have a cluster to run containers on. I really like the simplicity of Docker Swarm. I've been using it for at least five years and it's just worked.

During the COVID lockdown I got tired of having to open a UI (at the time I was using CapRover[0]) to edit any of the services I run so I decided to make my own PaaS with a nice CLI. Connecting to the docker socket is easy and the API is simple enough. It's been working no problem for the last two years.
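Talking to the Docker socket really is simple; the Engine API is plain HTTP over a unix socket. A minimal sketch using only the Python standard library (the endpoint paths are from the Docker Engine API, but this is an illustration, not the parent's actual tool):

```python
import http.client
import json
import socket

DOCKER_SOCK = "/var/run/docker.sock"  # default Docker socket path

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects over a unix domain socket."""
    def __init__(self, sock_path):
        super().__init__("localhost")  # host is ignored for unix sockets
        self.sock_path = sock_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.sock_path)

def service_update_path(service_id: str, version: int) -> str:
    # Swarm service updates go through POST /services/<id>/update,
    # with the current spec version passed to detect concurrent edits.
    return f"/services/{service_id}/update?version={version}"

def list_services(sock_path: str = DOCKER_SOCK) -> list:
    """GET /services returns all Swarm services as JSON."""
    conn = UnixHTTPConnection(sock_path)
    conn.request("GET", "/services")
    return json.loads(conn.getresponse().read())
```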

The only complaint I have is that I can't see the user's IP for HTTP requests[1] but there is some hope in the form of Proxy Protocol[2]. I have no idea how complex the code for Docker Swarm ingress is, but I may spend a weekend in the near future scouting the code to get an idea. The current possible solution is to put a load balancer in front of the cluster that either sets the X-Forwarded-For header (or any of the equivalent ones) or speaks Proxy Protocol but I will avoid that solution for now.
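For reference, PROXY protocol v1 is just a text preamble the load balancer prepends to the TCP stream, so the receiving end can recover the real client IP. Parsing it is trivial (a toy parser for the human-readable v1 form only; real deployments should also handle v2's binary format and the UNKNOWN variant):

```python
def parse_proxy_v1(header: bytes) -> dict:
    """Parse a PROXY protocol v1 line, e.g. as sent by HAProxy.

    Format: PROXY <TCP4|TCP6> <src> <dst> <src_port> <dst_port>\\r\\n
    """
    parts = header.rstrip(b"\r\n").split(b" ")
    if parts[0] != b"PROXY" or len(parts) != 6:
        raise ValueError("not a valid PROXY v1 header")
    proto, src, dst, sport, dport = parts[1:]
    return {
        "proto": proto.decode(),
        "src": src.decode(),       # the real client IP
        "dst": dst.decode(),
        "src_port": int(sport),
        "dst_port": int(dport),
    }
```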

I recommend Docker Swarm as a solution for anyone starting that doesn't want to spend hours and hours configuring a production environment. Even if it is just one node, you get services, replication, healthchecks, restart policies, secrets... And it all starts with that simple command, no further config needed.
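Everything in that list fits in one short stack file. A minimal sketch (service, image, and secret names here are made up):

```yaml
# deploy with: docker stack deploy -c stack.yml myapp
version: "3.8"
services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost/"]
      interval: 30s
    secrets:
      - api_key
secrets:
  api_key:
    external: true   # created beforehand with `docker secret create`
```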

[0]: https://caprover.com/

[1]: https://github.com/moby/moby/issues/25526

[2]: https://github.com/moby/moby/issues/39465


I liked the Rancher 1.x runtime (Cattle) for similar reasons. It was a little bit more complicated than Swarm, but also offered an IPv4 overlay network that could be accessed by containers across hosts (amongst other things): https://rancher.com/docs/rancher/v1.6/en/rancher-services/

For version 2 they've replaced it entirely with K8s however.


Eh....kind of. There are several problems with Swarm in production, even at a small scale. I've run into issues with nodes refusing workloads, inability to attach to the cluster, Docker failing to start, etc. Also, since it's so sparingly used, I have to dig into the depths of Google to try to remediate an issue if I can't figure it out on my own.

Making Swarm production ready is also a time sink. Most people end up with a bunch of bash scripts mashing together YAML for secrets, deployments, etc.

Personally I think it's a waste of developer time. Most people don't need the features of Swarm or K8S. Just use a VM at a popular IaaS provider and stick a LB in front of it. Then you don't have to ask questions about IP rate limiting, or other gotchas. If you outgrow a couple of VMs, we can talk about throwing ASGs or containers at the problem. Honestly though most people I talk to don't need containerized solutions.


You can absolutely start simpler, with even a single docker host and something like dokku. Then it's a matter of a git push for deploys.


I've been using a relatively small vm with caddy to act as a reverse proxy, this can pass the client IP (X-Forwarded-For) down into the relevant services. I'll usually wrap a logger in the context, so request id and client ip are used, and a request id is passed downstream to peer services per request as well.


I've done this before too. Another approach is to front the site with Cloudflare, which passes the original IP in an HTTP header.


As opposed to "minikube start"? Or "k3s server"?


Why Swarm over Compose?

Why Swarm over k3s?


Compose has no proper concept of a service and its replicas. Deploying an update to a service in Compose can be problematic. Swarm also enables multi-node deployments, which you can't do with Compose.

k3s, as simple as it is compared to full k8s, still carries some of the complexity of k8s. Swarm just feels simpler and easier to manage.
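The rolling-update behaviour that Compose lacks is a one-stanza thing in a Swarm stack file (values here are illustrative):

```yaml
services:
  web:
    image: myapp:v2
    deploy:
      replicas: 4
      update_config:
        parallelism: 1      # replace one task at a time
        delay: 10s
        order: start-first  # start the new task before stopping the old one
      rollback_config:
        parallelism: 1
```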


Trying to figure out what's unique here, but it seems like Proxmox is being used to create VMs that then run docker. Then docker on these VMs is used to spin up a bunch of containers. So really, it's just Proxmox -> VM -> Docker -> Containers. So it's dedicated docker VMs to coordinate containers...

I was expecting Proxmox's LXC capabilities to be used to scale up to 13000, but this is just VMs + Docker allowing that. Seems like the same thing could be done with any KVM hypervisor and VMs? Can someone correct me if I'm missing something?


AFAIK it’s the only way to run docker containers on proxmox. You can run lxc containers directly but docker requires an intermediate vm… which is one more reason to avoid docker altogether :)


You can install Docker directly on Proxmox. I did a little comparison of the 3 methods [1].

[1] https://danthesalmon.com/running-docker-on-proxmox/


No, sounds like you got it.


If you keep in mind that containers are only namespaces for the filesystem and network resources, this is not too different from running 13000 processes on the host without containers.

Comparable to building something with make -j64.
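You can see those namespaces directly on Linux: every process's namespace memberships are symlinks under /proc/<pid>/ns, and two processes "in the same container" simply share the same namespace inode numbers. A Linux-only sketch:

```python
import os

def namespaces(pid: str = "self") -> dict:
    """Map namespace name -> identifier, e.g. 'pid:[4026531836]'."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}

# Two processes in the same container report identical values here;
# a containerized process differs from the host in mnt, pid, net, etc.
```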


I guess I still don't fully understand containers / Docker. If they are only namespaces, what does it mean to run an Ubuntu image on my Mac?


On Mac there is simply the additional step of Docker Desktop running a Linux VM for you, so that it has something to run the containers in.

(EDIT: On Windows that's also an option, or you set it to run against Windows' native container support - which then can only run windows-based images. But really, usually people mean "on Linux" when they discuss how Docker works)


Hmm, what kind of VM? By default, my M1 Mac will only run arm64 software under Docker. If an arm64 build is not available for something (i.e. x64 only), I can change the architecture setting, and only then will Docker attempt to run the process using QEMU. If Docker on Mac is already a VM, why does it need QEMU? Why does the Docker VM not emulate across architectures?


> Why does the Docker VM not emulate across architectures?

Because it's easier to use a CPU emulator on just one kernel in a single VM than to manage multiple VMs. So you can use QEMU just like Macs already use Rosetta for x86_64 macOS binaries on aarch64 macOS.

Linux further has a built-in feature for running 'foreign' (different CPU architecture, but still Linux) binaries (and more!) in a transparent way, so if you set this up in the VM you can invoke those x86_64 binaries as if they were native, without really thinking about it: https://www.kernel.org/doc/html/latest/admin-guide/binfmt-mi...

You can also use it with DOS and Windows binaries, or via Apple's Rosetta for Linux instead of QEMU for CPU emulation (and there are some other open-source competitors in that space as well). Pretty neat!


VM != cpu emulation, but fwiw I hear the opposite: M1 mac runs aarch64 containers natively, and emulates x64 containers. Mostly this works until it becomes confused and tries to run an x64 container without emulation.


> M1 mac runs aarch64 containers natively, and emulates x64 containers

Macs don't run any containers natively, because macOS doesn't support containers and doesn't have a Linux ABI translation layer (some operating systems, like Illumos and NetBSD, do, and you can run Linux binaries on them almost like you can run Windows binaries on Linux via WINE). On M1, when you run aarch64 containers under Docker, you are 'merely' emulating an environment for a Linux kernel to run in and you can pass certain aspects of your CPU through. But when you do the same with x86_64 containers, you are additionally 'emulating' (via a translation layer) x86_64.

Hopefully some day macOS will have its own truly native containers, like Linux and Windows do. That could be a serious savings for containerized macOS development environments, because they would be way more efficient than Docker Desktop on Mac currently can be. For now, the only way to get that kind of efficiency is Nix, which is great and better suited to that use case than Docker, but underutilized at many organizations.

Probably even then people will still use virtualized Linux, since that's a closer environment to where apps will usually run in prod. But at such an org I'd argue for using macOS containers for local development, should they ever come to be available. The efficiency gains are well worth it IME.


Yes I got the architectures messed up, meant what you said.


I'd guess it's so they get away with one VM (and have all the containers in one VM, and thus under one kernel, for networking them etc.)? One arm64 VM, using QEMU to run x86 when needed, achieves that.


Let's say you run `docker run -it ubuntu:22.04 bash`, then roughly this happens:

There's a Linux VM hosted on your mac. There's a docker daemon running on your mac. The docker command process above communicates with said daemon which has the ability to exec processes inside the Linux VM. It uses that capability to exec runc (or something similar) which in turn starts a bash process inside a namespace inside the VM.

If you run some other container the same thing happens, using the same VM.

So the secret is: Docker isn't running anything on your Mac, it's running stuff on a Linux VM hosted on your Mac.


Docker on Mac runs via a VM, so you wouldn't necessarily be running it as a process on your Mac directly.


Adding to what the others are saying about Docker running inside a VM on Mac:

When it comes to running, for example, an Ubuntu Docker container: it would share the Linux kernel running in the VM with other containers on the same host. The "Ubuntu" is the distribution packaged specifically for containers, meaning that the regular Ubuntu-packaged kernel is not used, and neither is the init system, because Linux distribution container images are meant to containerize single applications and traditionally do not need the concept of "services".

I'm sure it's possible to run systemd inside a container, just as some people run Cron and even X11 inside containers.


When people talk about containers they almost always mean Linux containers, i.e. a collection of features in the Linux kernel that allow creating the illusion of containers.

Needless to say, this doesn't make any sense on macOS because there's no Linux kernel. Therefore you need a Linux VM to run Linux containers on macOS.


Depends on the kind of container. It is not necessarily true that 1 container == 1 process. I run Docker containers with 1000s of processes.


Generally this is not what Docker containers are 'for', right? Docker is intended as a system for 'application containers', which yes, may spawn child processes, but are generally not expected to include a process supervisor or service management layer internally.

Contrast this to 'operating system containers' which are designed to run an init system, so individual containers are intentionally multiprocess. Many virtual private servers run in containers of this kind, e.g., by OpenVZ.

On that note, how come you've gone with Docker for this rather than something like LXC or OpenVZ, which are ostensibly designed for the large, multiprocess container use case?


Yes I run an init system that starts multiple processes and services. All in a Docker container.

I would be very keen to understand why LXC is better than Docker for this. I've also read this kind of "marketing" around so called "operating system containers" but I am still clueless why they are better. Concretely, what are the benefits?


Well, they could also just mean their service is a multi-process program (as opposed to threads, which...are also processes), but it's not super clear.


see my reply above


The "container" is the first process, which can create children that inherit the same properties.


I strongly suggest any sysadmin look at the relatively new Proxmox Backup Server https://www.proxmox.com/en/proxmox-backup-server which makes full incremental backups so light thanks to well-engineered deduplication.


It works well and saves a fair bit of space due to deduplication (which is filesystem-independent, as it implements its own deduplication layer). It can get quite slow when backing up containers, e.g. a container with a 200GB root filesystem takes a bit more than 2 hours (PBS running in a container on the same Proxmox host, backup from a SAS array to a single SATA drive). Backup of a group of 4 filesystems totalling 2.25TB containing mostly larger files (PeerTube video storage, Nextcloud data directories and image archives) on the same installation takes about 4 hours, so the time needed varies significantly with the characteristics of the data to be backed up.
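The deduplication is content-addressed: data is split into chunks, each chunk is stored once under its SHA-256 digest, and a backup is just an index of digests. A toy sketch of the idea (not PBS's actual code; PBS uses 4 MiB fixed-size chunks for block-device backups and content-defined chunking for file-level ones):

```python
import hashlib

def backup(store: dict, data: bytes, chunk_size: int = 4 * 1024 * 1024) -> list:
    """Store unseen chunks keyed by digest; return this backup's chunk index."""
    index = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks are stored only once
        index.append(digest)
    return index
```

A second backup of mostly-unchanged data then adds only the chunks that actually changed, which is why the incrementals are so light.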


It really pairs well with proxmox. I've tried many methods but for proxmox vm and cts nothing was better than their own backup server.


Also, this can be deployed on Hyper-V, so you can back up your Proxmox server to your desktop machine easily. Most people's desktops are well spec'd, so that's functionally free.


It could well be that the only sysadmins among this site's visitors are FreeBSD guys; everyone else is a developer, or DevOps/SRE in the worst case :)

And on FreeBSD there are Jails and BHyve, so not a Proxmox audience as well, IMHO.


Oh hey, you don't see a lot of Docker Swarm nowadays, though in my experience it's still a wonderful solution for getting started with container orchestration, that will take a lot of the smaller/medium scale projects pretty far, before you need to look at something else (e.g. Nomad or Kubernetes). There's a lot of benefit in being able to hit the ground running even when you're self-hosting your clusters and administering them yourself.

It comes available with an install of Docker, is easy to set up and operate, has great optional UI solutions like Portainer (an analogue to Rancher for Kubernetes), has one of the lowest resource usages for the orchestrator itself, and supports the Docker Compose specification, which in my opinion is far more usable than Kubernetes manifests (though less powerful than Helm charts) and far more common than Nomad's HCL.

For my Master's Degree, I explored a comparison where I ran the same workloads across a Docker Swarm cluster and a K3s cluster (a great Kubernetes distro that's low on resource usage as well) and even then Swarm used less memory (~2x less than Kubernetes for the leader node both under load and when idle) and used a bit less CPU (~30% less for the leader nodes under load) as well. That said, K3s still performed admirably, at least in comparison to RKE which wouldn't even run in a stable fashion on the limited hardware that I had at the time.

Maybe one of these days I should run Proxmox in my homelab as well, instead of just something like Debian or Ubuntu directly on the hardware. Also, while Podman is great, Docker still seems like a dependable option just because of how common it is and given how it's gotten more stable over time (despite the arguable architecture disadvantages).

I think the only actual issues I've had since when using Docker Swarm have been using a network that ran out of addresses to assign to the containers (probably some default), some Oracle Linux bug where kswapd would top out the CPU when the swap got full, as well as some Debian bug years ago on an old version of Docker that caused networking to fail and the cluster needed to be re-created to fix it.


> comparison where I ran the same workloads across a Docker Swarm cluster and a K3s cluster (a great Kubernetes distro that's low on resource usage as well) and even then Swarm used less memory (~2x less than Kubernetes for the leader node both under load and when idle) and used a bit less CPU (~30% less for the leader nodes under load) as well.

Have you found out why that is? I'm curious - in theory all the processes inside are the same, except for what should be a slight overhead from Swarm/K3s itself.


> I'm curious - in theory all the processes inside are the same, except for what should be a slight overhead from Swarm/K3s itself.

What I saw during benchmarking was that the load on the worker nodes was almost the same for both K3s and Swarm, presumably because most of the resources were used by the actual container runtimes and containers themselves, the overhead for communication with the orchestrator not being too notable.

However, when I tested the whole setup under load (containers serving lots of web requests, databases inside of containers etc., some containers eventually getting OOM killed and restarted), I found that the leader nodes had more pronounced resource usage differences, as described above.

My guess is that Kubernetes (even K3s) is just a bigger and more complex system than Swarm, which probably also explains the comparatively higher requirements/overhead (for even single node Kubernetes as well). That's also why I'd be careful about running anything apart from the orchestrator on the leader nodes.


We've been using Docker Swarm for running some basic services, as Kubernetes seems way over-engineered for simple stuff. Our basic requirement is that a service will be auto-restarted as soon as possible if/when something breaks.

However, though Swarm is easy to use and configure, I've found that having a small swarm (e.g. three nodes) means you end up making each node a manager, as the swarm loses quorum unless a majority of the manager nodes is up.
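That's the Raft majority requirement: with N managers, more than half must be reachable, so the cluster tolerates (N-1)/2 manager failures. A quick arithmetic sketch:

```python
def quorum(managers: int) -> int:
    # Raft consensus: more than half the managers must be up
    return managers // 2 + 1

def tolerated_failures(managers: int) -> int:
    return (managers - 1) // 2

# 1 manager -> tolerates 0 failures
# 2 managers -> tolerates 0 (strictly worse than 1!)
# 3 managers -> tolerates 1, which is why 3 is the practical minimum
```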

I've also found that "docker compose config" doesn't work in a compatible manner and have to use "docker-compose config" instead.


"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should." - Dr. Ian Malcolm, Jurassic Park

Not knocking this achievement though, it's awesome they were able to pack that many containers in one host.


This is all well & good, but I'm not sure what useful images you could actually run with 10 megabytes of memory each. The nginx containers provisioned would not have been able to do much, if anything.


Why run docker in lxc when you can run lxc directly?


LXC doesn't have the image and tooling ecosystem around it. Also, LXC and Proxmox have a mutual vibe that is amazing to work with (if you care to take a closer look).

LXC is a first-class citizen in Proxmox, with clear documentation, config files, CLI tools and a web UI. It can be backed up just like a VM (Proxmox Backup Server FTW!), can be set up mostly like a VM, it boots instantly and so on. It's such a joy to work with, unless you want access to actual hardware (which includes things like mounting or hosting NFS/Samba), but even then it's easy to find help in the docs and forums (the latter are surprisingly up to date).

But LXC does not have that "ephemeral" nature like Docker. Container "templates" are clean but full operating systems; full hard drives are attached and host all data, both user files/images and the OS. Just like a VM, I guess.

Now, you can actually run Docker inside LXC easily and have the best of both worlds: Docker with docker-compose and the like to quickly prototype and homelab, AND super light, quick-to-boot machines which will host that Docker for you.

It's actually a very clean and pleasant approach; I highly recommend it as a tool for testing and homelabbing.
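For anyone wanting to try it: on Proxmox this mostly comes down to enabling nesting on the (ideally unprivileged) container, e.g. with `pct set 101 --features nesting=1,keyctl=1`, which ends up in the container config roughly like this (container ID 101 is just an example):

```
# /etc/pve/lxc/101.conf
features: nesting=1,keyctl=1
unprivileged: 1
```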


Just out of curiosity - why wouldn't you do Samba in LXC? Especially since there's a TurnKey template exactly for this.

Is it for making a RAID directly on the HW drives?


Oh no, it's just that these services still use some /dev/ and /proc/ stuff, and an unprivileged container doesn't have access to that. So you either have to run a privileged container (I wanted to avoid this), try to pass through all the devices needed, or mount those on the underlying host OS and pass them to the container as a bind mount.

For me so far there has always been a decent workaround when I want to use a USB device etc. The only thing I am currently stuck on is sound in an LXC container (I'm building kind of a VDI desktop - RDP access to a system with KDE) and I struggle to pass the actual sound card through. There is an active Proxmox forum thread on that, so I'm hopeful.


I would LOVE it if proxmox supported docker (or podman) out-of-the-box. (I mean with docker/podman showing up in the gui, just like LXC containers and vms)

LXC is just a container; Docker is much more than that. It is a recipe (Dockerfiles), it is a sort of social project-sharing setup, it is like a version control system (the layered filesystem that only stores changes), it is a disposable dev container, it is a deployable runtime container, etc.


I've been running nested Docker in unprivileged LXC on Proxmox on ZFS since 2019 without problems. GitLab, Nextcloud, mailcow-dockerized etc., for 10 people - altogether 10 LXCs with about 25 Docker services, 1-2% CPU utilization on average, thanks to sharing all resources of the host. I've written a blog post about it [1].

[1]: https://du.nkel.dev/blog/2021-03-25_proxmox_docker/


This is one of the reasons I switched to TrueNAS Scale for my home server. Before that I always had a VM that was the Docker host and another one with OpenMediaVault as a backup server. Both of these VMs are replaced by TrueNAS. Less complexity and less maintenance overhead.


Have you by chance tried GPU passthrough on TrueNAS Scale? I'm planning a Homelab do-over this summer and one requirement is to have a Windows VM with a dedicated GPU I can use as a Steam host via Parsec.

I haven't used TrueNAS since it was FreeNAS and BSD based.


Heck, I didn't know you could run containers in containers. I'm setting up some container VMs on my Proxmox soon and didn't even think of using LXC since I thought it technically infeasible.

And yes as others have said, LXC and docker use cases seem different.

LXC seems to be for the same use case as VMs, but smaller, easier to spin up, better resource utilization, etc.

Docker can be used that way too, but is more useful with the surrounding ecosystem of being ephemeral (all data stored on remote storage, container only contains the "app") and with supporting deployment infrastructure.


Actually, I'm confused. They talk about running it in LXC containers, but where exactly they install Portainer is still a mystery to me. Furthermore, the screenshots show a lot of VM icons, not CT icons. So where is this person using LXC?

edit: Wait, did they install docker/portainer on Proxmox bare metal? They say to access Portainer through the Proxmox host IP, but any CT or VM created on Proxmox would probably have its own IP on a Proxmox bridge. So the IP should be the IP of the VM/CT hosting the docker install, not the Proxmox host.


While there is merit to this post, my main criticism is that it's running 13000 containers with zero load - essentially all the nginx processes are doing nothing (zero I/O, etc.) after launch. It would be more interesting to see N containers running something synthetic that mimics a workload.

That said, containers are very lean (or can be with the right setup) given there is no kernel, drivers, etc to load.


It's not the same, but reminded me of this post [1] I wrote some time ago, about a 63-nodes EKS cluster running on VMs with Firecracker on a single instance.

[1]: https://www.ongres.com/blog/63-node-eks-cluster-running-on-a...


Tl;dr - there is nothing really special in Proxmox that lets you run and manage 13000 containers. The author created 10 VMs on a Proxmox host and ran Docker on them. You don't really need Proxmox for what's described.

When I first saw Proxmox I also wanted to see if I could use it to manage Docker containers, but it doesn't support that directly. For working with containers you need other tools, e.g. Kubernetes.


If you wanna watch a beefy Windows machine fall apart, run this under WSL. Even relatively modest Docker workloads completely destroy vmmem.exe (or vmmemWSL.exe, depending on your WSL version).


13000x 10MB memory, to be precise


I am so glad the latest Proxmox VE release contains a dark mode.


dark mode? what is it?


As a technology risk and compliance manager who is embroiled in a big disaster recovery / business continuity / ISO 22301 project at the moment, reading this headline made parts of me turn to dust and drop off.


Why ?



