> why were you trying to hand allocations over to malloc()? So that they could b...

Karellen · on Aug 8, 2024

> So that they could be released with free().

Well, yes, you have to replace free() and calloc()/realloc() too. Sorry, didn't think that needed spelling out.

> > Why not entirely replace malloc() for the process?

> Now that's just rude.

Isn't that generally what alternate custom allocators do, though? Like dmalloc and jemalloc?

Joker_vD · on Aug 8, 2024

Well, it's one thing when you, as the author of a program, link libraries into your code, and then you pick a custom allocator, and then the libraries you've picked will (transparently) use it. It's an altogether different thing when you, as the author of a library, decide to use a custom allocator and then, since your users probably can't be persuaded to use a custom foo_free() that you could export, you hijack libc's malloc implementation from anyone who links against you and replace it with your own. I personally think it's just rude.

So, with those constraints it's either memcpy-ing into the buffer that you get from the global malloc(), or trying to remap the addresses that that buffer spans onto your own buffer. I thought the latter could be faster starting with moderately large buffers, but since I couldn't make it work, I couldn't benchmark it, so nothing came of it.
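For concreteness, here is a minimal sketch of the memcpy option, assuming a POSIX system with mmap(). The `foo_arena` type and the `foo_arena_*` functions are hypothetical names made up for illustration, not anyone's real API: the library builds data in its own private mapping and only copies it into a global malloc() buffer at the end, so the caller can release the result with plain free().

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical library-private arena; never handed to the caller directly. */
typedef struct {
    unsigned char *base; /* start of the private mapping */
    size_t cap;          /* bytes reserved */
    size_t len;          /* bytes written so far */
} foo_arena;

/* Reserve `cap` bytes of private address space. Returns 0 on success. */
static int foo_arena_init(foo_arena *a, size_t cap) {
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->cap = cap;
    a->len = 0;
    return 0;
}

/* Append `n` bytes; fails (returns -1) if the reservation is exhausted. */
static int foo_arena_append(foo_arena *a, const void *src, size_t n) {
    assert(a->base != NULL);
    if (n > a->cap - a->len) return -1;
    memcpy(a->base + a->len, src, n);
    a->len += n;
    return 0;
}

/* Copy the contents into a buffer from the global malloc() and drop the
   private mapping. The caller owns the result and releases it with free().
   Returns NULL if malloc() fails; the mapping is released either way. */
static void *foo_arena_finalize(foo_arena *a, size_t *out_len) {
    void *out = malloc(a->len ? a->len : 1);
    if (out != NULL) {
        memcpy(out, a->base, a->len);
        *out_len = a->len;
    }
    munmap(a->base, a->cap);
    a->base = NULL;
    a->cap = 0;
    a->len = 0;
    return out;
}
```

The cost is one full copy of the final data per finalize, which is exactly what the remapping idea was hoping to avoid for large buffers.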

Karellen · on Aug 9, 2024

Ah.

One other approach that does occur to me is to just malloc() a 16GiB buffer to begin with. Only pages you touch should end up being backed by RAM(/swap). Then your finalize() operation is just a realloc() to "shrink" the buffer down to its final size. Any decent allocator should keep the data where it is, and just make the now-unused tail portion of address space available again, without ever having needed to back it.
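A rough sketch of that reserve-then-shrink idea, with hypothetical names (`big_buf`, `big_buf_*`) and the reservation size left as a parameter. Two assumptions to flag: whether a huge malloc() succeeds at all depends on the system's overcommit settings, and the C standard does not promise that a shrinking realloc() keeps the block in place, so the code uses whatever pointer realloc() returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer backed by one oversized malloc() block. */
typedef struct {
    char *data;      /* block from the global malloc() */
    size_t reserved; /* bytes requested up front */
    size_t used;     /* bytes actually written */
} big_buf;

/* Reserve `reserve` bytes up front. On systems that back pages lazily,
   only the pages written to later should consume RAM or swap. */
static int big_buf_init(big_buf *b, size_t reserve) {
    b->data = malloc(reserve); /* may fail, e.g. under strict overcommit */
    if (b->data == NULL) return -1;
    b->reserved = reserve;
    b->used = 0;
    return 0;
}

/* Append `n` bytes; fails (returns -1) once the reservation is full. */
static int big_buf_append(big_buf *b, const void *src, size_t n) {
    assert(b->data != NULL);
    if (n > b->reserved - b->used) return -1;
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}

/* "Shrink" the block to its final size and hand it to the caller, who
   releases it with free(). The returned pointer may differ from b->data
   if the allocator decides to move the block. */
static char *big_buf_finalize(big_buf *b) {
    size_t final_size = b->used ? b->used : 1;
    char *p = realloc(b->data, final_size);
    if (p == NULL) p = b->data; /* shrink failed; old block is still valid */
    b->data = NULL;
    b->reserved = 0;
    b->used = 0;
    return p;
}
```

Usage would be `big_buf_init(&b, (size_t)16 << 30)` on a 64-bit system, a series of `big_buf_append()` calls, then `big_buf_finalize(&b)`.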

Joker_vD · on Aug 9, 2024

It's possible, absolutely, but if you allocate N such fragments one after another, do some allocations in them, and then finalize them from first to last, you'll be left with gaps of less than 16 GiB each, so they can only be reused for smaller allocations; and IIRC allocators with special support for huge allocations (e.g. using so-called huge pages) do not reuse that memory for "normal-sized" allocations (although I could be wrong on this).

So it's a tradeoff: this fragmentation is not that bad, but it's still noticeable in a sufficiently long-running program, because 16 GiB is 2**34, so you can only make 16 Ki such allocations before you hit the 2**48 limit. And if you could just simply remap them!..
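Spelling out that arithmetic: 2**48 / 2**34 = 2**14 = 16384, i.e. 16 Ki. The helper below (`max_reservations` is a made-up name) just does the shift, assuming both sizes are exact powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Number of reservations of 2**reserve_bits bytes that fit into an
   address space of 2**addr_bits bytes. */
static uint64_t max_reservations(unsigned addr_bits, unsigned reserve_bits) {
    assert(reserve_bits <= addr_bits && addr_bits < 64);
    return UINT64_C(1) << (addr_bits - reserve_bits);
}
```

`max_reservations(48, 34)` gives 16384, and that is an upper bound that ignores everything else already mapped into the process.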