SGen – The Nursery

Introduction

This is the second part of a series of blog posts on SGen, Mono’s new garbage collector. In this installment we shall be taking a closer look at the first generation, also called the “nursery”.

Nursery

As the name implies, the nursery is the generation where objects are allocated, or, to be more poetic, born. In SGen, it is a contiguous region of memory that is allocated when SGen starts up, and it does not change size. The default size is 4 MB, but a different size can be specified via the environment variable MONO_GC_PARAMS when invoking Mono. See the man page for details.

Allocation

In a classic semi-space collector allocation is done in a linear fashion by pointer bumping, i.e. simply incrementing the pointer that indicates the start of the current semi-space’s region that is still unused by the amount of memory that is required for the new object.

In a multi-threaded environment like Mono using a single pointer would introduce additional costs because we would have to do at least a compare-and-swap to increment it, potentially looping because it might fail due to contention. The standard solution to this problem is to give each thread a little piece of the nursery exclusively from which it can bump-allocate without contention. Synchronization is only needed when a thread has filled up its piece and needs a new one, which should be far less frequent. These pieces of nursery are called “thread local allocation buffers”, or “TLABs” for short.

TLABs are assigned to threads lazily. The relevant pointers, most notably the “bumper” and the address of the TLAB’s end, are stored in thread local variables. Currently TLABs are fixed in size to 4 KB (actually that’s the maximum size – they might be smaller because the nursery can be fragmented). In the future we might dynamically adapt that size to the current situation. The less threads that are allocating objects, the larger the TLABs should be so that they need to be assigned less often. On the other hand, the more threads we allocate TLABs to the sooner we run out of nursery space, so we will collect prematurely. In that situation the TLABs should be smaller. Ideally a thread should get a TLAB whose size is proportional to its rate of allocation.

Calling from managed into unmanaged code involves doing a relatively costly transition. To avoid this the fast paths of the allocation functions are managed code. They only handle allocation from the TLAB. If the thread does not have a TLAB assigned yet or it is not of sufficient size to fulfill the allocation request we call the full, unmanaged allocation function.

Collection

Collecting the nursery is not much different in principle from collecting the major heap, so much of what I discuss here is also valid there.

Coloring

The collection process can be thought of as a coloring game. At the start of the collection all objects start out as white. First the collector goes through all the roots and marks those gray, signifying that they are reachable but their contents have not been processed yet. Each gray object is then scanned for further references. The objects found during that scanning are also colored gray if they are still white. When the collector has finished scanning a gray object it paints it black, meaning that object is reachable and fully processed. This is repeated until there are no gray objects left, at which point all white objects are known not to be reachable and can be disposed of whereas all black objects must be considered reachable.

One of the central data structures in this process, then, is the “gray set”, i.e. the set of all gray objects. In SGen it is implemented as a stack.

The roots

As I mentioned above as well as in my overview post, the collection starts with the roots. These include, most prominently, the references on the stacks of all managed threads and static class fields. In addition to that there are numerous other roots that the runtime registers and uses for internal purposes. Since SGen is a generational collector, the roots for the nursery collection also include all references from the major heap to objects in the nursery. These are kept track of with write barriers which I will discuss in a later blog post.

Forwarding

In a copying collector like SGen’s nursery collector, which copies reachable objects from the nursery to the major heap, coloring an object also implies copying it, which in turn requires that all references to it must be updated. To that end we must keep track of where each object has been copied to. The easiest way to do this is to use a forwarding pointer. In Mono, the first pointer-sized word of each object is a pointer to the object’s vtable, which is aligned to at least 4 bytes. This alignment leaves the two least significant bits for SGen to play with during a collection, so we use one of them to indicate whether that particular object is forwarded, i.e. has already been copied to the major heap. If it is set, the rest of the word doesn’t actually point to the vtable anymore, but to the object’s new location on the major heap. The second bit is used for pinning, which I will discuss further down.

The loop

Here is a simplified (pinning is not handled) pseudo-code implementation of the central loop for the nursery collector:

 1:  while (!gray_stack_is_empty ()) {
 2:      object = gray_stack_pop ();
 3:      foreach (refp in object) {
 4:          old = *refp;
 5:          if (!ptr_in_nursery (old))
 6:              continue;
 7:          if (object_is_forwarded (old)) {
 8:              new = forwarding_destination (old);
 9:          } else {
10:              new = major_alloc (object_size (old));
11:              copy_object (new, old);
12:              forwarding_set (old, new);
13:              gray_stack_push (new);
14:         }
15:         *refp = new;
16:     }
17:  }

In line 2 we pop an object out of the gray stack and in line 3 we loop over all the references that it contains. refp is a pointer to the reference we’re currently looking at. In line 4 we fetch the actual reference. We are only interested in collecting the nursery, so if the reference already points to the old generation we skip it (lines 5 and 6). If the object is already forwarded (line 7) we fetch its new location (line 8). Otherwise we must copy it to the major heap. To that end we allocate the space required (line 10), do the actual copying (line 11) and forward the object (line 12). That newly copied object must also be processed, so we push it on the gray stack (line 13). Finally, for both cases, we update the reference to point to the new location of the object (line 15).

Pinning

In the overview post I mentioned that we currently cannot scan stack frames precisely, i.e. we do not know whether a word that looks like a reference just does so by coincidence or is actually one. Work for scanning managed stack frames precisely is under way, but for unmanaged frames this is not possible anyway because the C compiler does not provide us with the necessary information (some runtimes avoid this problem by not exposing managed references to unmanaged code directly).

Because we lack this knowledge we must proceed conservatively and assume that what looks like a reference is one, which means considering the object it points to as reachable. On the other hand, since we cannot be sure, we are not allowed to modify this supposed reference, because it might just be a number. That implies that we also cannot move the object. In other words the object is “pinned”.

Pinned objects are a bit of a complication. Having pinned objects in the nursery means that we cannot clean it out completely and so it becomes fragmented. It also requires an additional check in the central loop.

Finishing up

After all the copying is done what’s left is to clean up the nursery for the next round of allocations. SGen doesn’t actually zero the nursery memory at this point, however. Instead that happens at the point when TLABs are assigned to threads. The advantage is not only that it potentially gets distributed over more than one thread, but also that is has better cache behavior – the TLAB is likely to be touched again soon by that thread.

What has to happen at this point, though, is to take score of the regions of nursery memory that are available for allocation, which means everything apart from pinned objects. Since we have a list of all pinned objects this is done very efficiently – I will discuss this “pin queue” in a later post. It is these “fragments” from which TLABs will be assigned in the next mutator round.

I should mention that there is a bit of additional work to be done prior to fragment creation, namely making sure unreachable finalizable objects are finalized and weak references are zeroed if the objects they pointed to have become unreachable. Both are very similar issues and shall be discussed in a later post.

About these ads

7 thoughts on “SGen – The Nursery

  1. Why is it needed to use one bit in the vtable pointer to mark forwarding pointers? That bit could be used elsewhere I guess. Wouldn’t it be enough to perform a range-check on the address, after requiring that vtables are never allocated in the young generation? I guess it might be slower and this looks like an inner loop, but one would need to benchmark this.

    • No, this wouldn’t work. The forwarding address is the new address, in the old generation, so it doesn’t fall into the nursery. Some of our major collectors don’t allocate one contiguous heap space, so a range check wouldn’t work there, either.

      • I see – I misread your post (very much) and thought you had another semispace to copy to – but I was missing the accounting of generation.
        Another question: have you considered the design of the Hotspot generational collector? It has an “eden” (the young generation) and two survivor spaces, so that one can tune the number of GC after which an object is promoted – that makes it more flexible.

    • We’ve had an experimental semi-space nursery implementation once that would promote objects only once the semi-space reached a certain fill ratio, but it performed worse than the current implementation. Note that since we do conservative stack scanning, many young objects remain in the nursery due to pinning.

      • The heuristic you describe looks quite different from checking the age of an object to decide for promotion. If you don’t store the age of an object, you’d risk promoting young objects too early once the fill rate is exceeded.
        I guess you need a quite low fill rate to keep copying collection convenient.

        Anyway, about this heuristic, I’d be curious to read archived discussions on MLs, to understand the rationale.

        Another question, prompted by your remark on pinning: might I suggest you look at Immix? It is a recently introduced kind of Garbage Collector (published at the top conference PLDI 2008), which supports pinned objects much more cleanly than a copying collector (where pinning destroys some fundamental properties), and without significant performance degradation for most benchmarks. It is also faster and more memory efficient than many other canonical strategies. There was a Google Summer of Code project to use it for the GHC Haskell runtime.

        You can find their paper by looking on Google Scholar for:
        Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance

    • Keeping track of the ages of objects is more administrative complexity, and we couldn’t be bothered to implement that (yet).

      We’re aware of IMMIX. In fact our Mark&Sweep collector’s evacuation (which I’ll describe in the following post) is inspired by it.

Comments are closed.