Introduction
This is the second part of a series of blog posts on SGen, Mono’s new garbage collector. In this installment we shall be taking a closer look at the first generation, also called the “nursery”.
Nursery
As the name implies, the nursery is the generation where objects are allocated, or, to be more poetic, born. In SGen, it is a contiguous region of memory that is allocated when SGen starts up, and it does not change size. The default size is 4 MB, but a different size can be specified via the environment variable MONO_GC_PARAMS
when invoking Mono. See the man page for details.
Allocation
In a classic semi-space collector allocation is done in a linear fashion by pointer bumping, i.e. simply incrementing the pointer that indicates the start of the current semi-space’s region that is still unused by the amount of memory that is required for the new object.
In a multi-threaded environment like Mono using a single pointer would introduce additional costs because we would have to do at least a compare-and-swap to increment it, potentially looping because it might fail due to contention. The standard solution to this problem is to give each thread a little piece of the nursery exclusively from which it can bump-allocate without contention. Synchronization is only needed when a thread has filled up its piece and needs a new one, which should be far less frequent. These pieces of nursery are called “thread local allocation buffers”, or “TLABs” for short.
TLABs are assigned to threads lazily. The relevant pointers, most notably the “bumper” and the address of the TLAB’s end, are stored in thread local variables. Currently TLABs are fixed in size to 4 KB (actually that’s the maximum size – they might be smaller because the nursery can be fragmented). In the future we might dynamically adapt that size to the current situation. The less threads that are allocating objects, the larger the TLABs should be so that they need to be assigned less often. On the other hand, the more threads we allocate TLABs to the sooner we run out of nursery space, so we will collect prematurely. In that situation the TLABs should be smaller. Ideally a thread should get a TLAB whose size is proportional to its rate of allocation.
Calling from managed into unmanaged code involves doing a relatively costly transition. To avoid this the fast paths of the allocation functions are managed code. They only handle allocation from the TLAB. If the thread does not have a TLAB assigned yet or it is not of sufficient size to fulfill the allocation request we call the full, unmanaged allocation function.
Collection
Collecting the nursery is not much different in principle from collecting the major heap, so much of what I discuss here is also valid there.
Coloring
The collection process can be thought of as a coloring game. At the start of the collection all objects start out as white. First the collector goes through all the roots and marks those gray, signifying that they are reachable but their contents have not been processed yet. Each gray object is then scanned for further references. The objects found during that scanning are also colored gray if they are still white. When the collector has finished scanning a gray object it paints it black, meaning that object is reachable and fully processed. This is repeated until there are no gray objects left, at which point all white objects are known not to be reachable and can be disposed of whereas all black objects must be considered reachable.
One of the central data structures in this process, then, is the “gray set”, i.e. the set of all gray objects. In SGen it is implemented as a stack.
The roots
As I mentioned above as well as in my overview post, the collection starts with the roots. These include, most prominently, the references on the stacks of all managed threads and static class fields. In addition to that there are numerous other roots that the runtime registers and uses for internal purposes. Since SGen is a generational collector, the roots for the nursery collection also include all references from the major heap to objects in the nursery. These are kept track of with write barriers which I will discuss in a later blog post.
Forwarding
In a copying collector like SGen’s nursery collector, which copies reachable objects from the nursery to the major heap, coloring an object also implies copying it, which in turn requires that all references to it must be updated. To that end we must keep track of where each object has been copied to. The easiest way to do this is to use a forwarding pointer. In Mono, the first pointer-sized word of each object is a pointer to the object’s vtable, which is aligned to at least 4 bytes. This alignment leaves the two least significant bits for SGen to play with during a collection, so we use one of them to indicate whether that particular object is forwarded, i.e. has already been copied to the major heap. If it is set, the rest of the word doesn’t actually point to the vtable anymore, but to the object’s new location on the major heap. The second bit is used for pinning, which I will discuss further down.
The loop
Here is a simplified (pinning is not handled) pseudo-code implementation of the central loop for the nursery collector:
1: while (!gray_stack_is_empty ()) { 2: object = gray_stack_pop (); 3: foreach (refp in object) { 4: old = *refp; 5: if (!ptr_in_nursery (old)) 6: continue; 7: if (object_is_forwarded (old)) { 8: new = forwarding_destination (old); 9: } else { 10: new = major_alloc (object_size (old)); 11: copy_object (new, old); 12: forwarding_set (old, new); 13: gray_stack_push (new); 14: } 15: *refp = new; 16: } 17: }
In line 2
we pop an object out of the gray stack and in line 3
we loop over all the references that it contains. refp
is a pointer to the reference we’re currently looking at. In line 4
we fetch the actual reference. We are only interested in collecting the nursery, so if the reference already points to the old generation we skip it (lines 5
and 6
). If the object is already forwarded (line 7
) we fetch its new location (line 8
). Otherwise we must copy it to the major heap. To that end we allocate the space required (line 10
), do the actual copying (line 11
) and forward the object (line 12
). That newly copied object must also be processed, so we push it on the gray stack (line 13
). Finally, for both cases, we update the reference to point to the new location of the object (line 15
).
Pinning
In the overview post I mentioned that we currently cannot scan stack frames precisely, i.e. we do not know whether a word that looks like a reference just does so by coincidence or is actually one. Work for scanning managed stack frames precisely is under way, but for unmanaged frames this is not possible anyway because the C compiler does not provide us with the necessary information (some runtimes avoid this problem by not exposing managed references to unmanaged code directly).
Because we lack this knowledge we must proceed conservatively and assume that what looks like a reference is one, which means considering the object it points to as reachable. On the other hand, since we cannot be sure, we are not allowed to modify this supposed reference, because it might just be a number. That implies that we also cannot move the object. In other words the object is “pinned”.
Pinned objects are a bit of a complication. Having pinned objects in the nursery means that we cannot clean it out completely and so it becomes fragmented. It also requires an additional check in the central loop.
Finishing up
After all the copying is done what’s left is to clean up the nursery for the next round of allocations. SGen doesn’t actually zero the nursery memory at this point, however. Instead that happens at the point when TLABs are assigned to threads. The advantage is not only that it potentially gets distributed over more than one thread, but also that is has better cache behavior – the TLAB is likely to be touched again soon by that thread.
What has to happen at this point, though, is to take score of the regions of nursery memory that are available for allocation, which means everything apart from pinned objects. Since we have a list of all pinned objects this is done very efficiently – I will discuss this “pin queue” in a later post. It is these “fragments” from which TLABs will be assigned in the next mutator round.
I should mention that there is a bit of additional work to be done prior to fragment creation, namely making sure unreachable finalizable objects are finalized and weak references are zeroed if the objects they pointed to have become unreachable. Both are very similar issues and shall be discussed in a later post.