Ben Kloosterman
2013-11-12 05:42:16 UTC
In contrast to a buddy allocator, however, this space can mostly be
defragmented. The stacks *may* (not always) be non-relocatable because of
C frames, but all of the other blocks can be evacuated over time. And we
need to do that anyway to avoid fragmentation of the *virtual* address
space in the LOS. This layer is also the layer at which real pages get
returned to the underlying operating system.
There is some cost however, if you could have any 2^N size then there isdefragmented. The stacks *may* (not always) be non-relocatable because of
C frames, but all of the other blocks can be evacuated over time. And we
need to do that anyway to avoid fragmentation of the *virtual* address
space in the LOS. This layer is also the layer at which real pages get
returned to the underlying operating system.
far less point in copying , yet i agree it should not be an issue ...
which is interesting..
So let me propose a non copying RC-Immix heap - its not as silly as it
first looks. ie you start with large blocks as normal (likely larger) but
when you scan for space looking for opportunities to copy you can split
existing blocks to reclaim holes.. In fact even with 3 sizes this is a
very useful property which could reduce copying substantially maybe to the
point of not being needed ( Id gladly give 20% fragmentation space , a few
percent in locality and some work on a background thread for no copying and
great scaling / low pauses) . With RC-Immix the main reason we copy (and
its not a lot of copying compared to a gen collector) is because a few
objects can hold an entire block and crucially there are too many of these
for the recycled block allocation , but we can for example just break the
512K block into 5 64K blocks and possibly even break the last block into
8K blocks. (Likely you would leave the recycled 128K block as 128K and
only under heavy memory pressure or after a long time do you break it down
further)
Quick Notes
- The nice part is this block splitting / merging can be done by a
background thread.
- It doesnt have the nice property Nursery collectors have of
improving locality of mid aged objects by compacting but Rc-Immix does a
poor job on this as well. ( eg they have poor small heap performance) .
As far as i can see this is the main negative but it should be at least as
good as Immix which is not bad. Immix copying is only for defragmenation.
- Copying has a cost as well. Adding a stage 1 Nursery like Gen-Immix
will likely be better (esp for low thread counts) but this has to be
weighed up against the cost of copying especially for highly concurrent
cases ie the requirement for barriers and safe points as well as a
complexity cost which restricts optimization ( and im not talking about
just LLVM) . When trying to run fully concurrent RC-Immix ( as opposed to
mutators and collectors running) you will get worse performance. A Nursery
is much simpler than a fully concurrent collector requiring just a write
barrier and will be better than Rc-Immix and a non relocating heap by
itself is likely only a little worse and likely similar to a concurrent
Rc-Immix .
- It scales well.
- It doesnt have the low cache1 hit rate of sized order heaps because
different sized objects ( related / similar liveliness) are still
allocated adjacent as normal.
- RC-Immix inserts new objects into holes in recycled blocks , a new
smaller block will make this more efficient as the mutator does not have to
scan for a hole ( which is one of the main reasons for lines but you still
need it for object /fragmentation counts and ref count) . We only have
recycled blocks that have poor splitting potential.
- These empty blocks are then reused as new and can be remerged with
adjacent ( blocks that are still partially full after being released by the
TLA and obviously adjacent empty blocks are also merged) . Thus allowing
blocks to reform. When all blocks are in use they should all be default
size ( - blocks held for stack , regions and TLA) . This is important so
the heap doesnt just degraded to minimum sized blocks.
- Simple to implement.
- Under high memory pressure /fragmentation you can break blocks down
further reducing fragmentation but increasing the block splitting work and
the TLA has to change blocks more frequently .. I think this will be more
efficient than scanning for lots of holes which Immix has to do.
- As stated earlier unlike a Buddy allocator we can work with smaller
blocks when needed ( as long as the minimum size is equal or larger than
the minimum large object) and as you say only a few sizes are used which
reduce fragmentation cost.
- It should not be that hard to test on Jikes ( but take a while to
tune) eg remove copying from Rc-Immix patch and change the block
manager/allocator , try various sizes and change the TLA allocator block
strategy. ( eg instead of use all empty , use all recycled collect you
need to work out when to use smaller blocks and when to look for them ).
Likely ( Use large empty block , use smaller sized empty blocks ( There
will only be many small blocks when fragmentation is high but medium blocks
will be common) , use recycled blocks) .
I think i need to think further on this (especially the cost of managing
block sizes and the cost of the locality changes for mid aged objects) as
its too obvious but i thought Id throw it out there as it seems a good
compromise.
Ben