SGen – The Nursery

Introduction

This is the second part of a series of blog posts on SGen, Mono’s new garbage collector. In this installment we shall be taking a closer look at the first generation, also called the “nursery”.

Nursery

As the name implies, the nursery is the generation where objects are allocated, or, to be more poetic, born. In SGen, it is a contiguous region of memory that is allocated when SGen starts up, and it does not change size. The default size is 4 MB, but a different size can be specified via the environment variable MONO_GC_PARAMS when invoking Mono. See the man page for details.

Allocation

In a classic semi-space collector allocation is done in a linear fashion by pointer bumping, i.e. simply incrementing the pointer that indicates the start of the current semi-space’s region that is still unused by the amount of memory that is required for the new object.

In a multi-threaded environment like Mono using a single pointer would introduce additional costs because we would have to do at least a compare-and-swap to increment it, potentially looping because it might fail due to contention. The standard solution to this problem is to give each thread a little piece of the nursery exclusively from which it can bump-allocate without contention. Synchronization is only needed when a thread has filled up its piece and needs a new one, which should be far less frequent. These pieces of nursery are called “thread local allocation buffers”, or “TLABs” for short.

TLABs are assigned to threads lazily. The relevant pointers, most notably the “bumper” and the address of the TLAB’s end, are stored in thread local variables. Currently TLABs are fixed in size to 4 KB (actually that’s the maximum size – they might be smaller because the nursery can be fragmented). In the future we might dynamically adapt that size to the current situation. The less threads that are allocating objects, the larger the TLABs should be so that they need to be assigned less often. On the other hand, the more threads we allocate TLABs to the sooner we run out of nursery space, so we will collect prematurely. In that situation the TLABs should be smaller. Ideally a thread should get a TLAB whose size is proportional to its rate of allocation.

Calling from managed into unmanaged code involves doing a relatively costly transition. To avoid this the fast paths of the allocation functions are managed code. They only handle allocation from the TLAB. If the thread does not have a TLAB assigned yet or it is not of sufficient size to fulfill the allocation request we call the full, unmanaged allocation function.

Collection

Collecting the nursery is not much different in principle from collecting the major heap, so much of what I discuss here is also valid there.

Coloring

The collection process can be thought of as a coloring game. At the start of the collection all objects start out as white. First the collector goes through all the roots and marks those gray, signifying that they are reachable but their contents have not been processed yet. Each gray object is then scanned for further references. The objects found during that scanning are also colored gray if they are still white. When the collector has finished scanning a gray object it paints it black, meaning that object is reachable and fully processed. This is repeated until there are no gray objects left, at which point all white objects are known not to be reachable and can be disposed of whereas all black objects must be considered reachable.

One of the central data structures in this process, then, is the “gray set”, i.e. the set of all gray objects. In SGen it is implemented as a stack.

The roots

As I mentioned above as well as in my overview post, the collection starts with the roots. These include, most prominently, the references on the stacks of all managed threads and static class fields. In addition to that there are numerous other roots that the runtime registers and uses for internal purposes. Since SGen is a generational collector, the roots for the nursery collection also include all references from the major heap to objects in the nursery. These are kept track of with write barriers which I will discuss in a later blog post.

Forwarding

In a copying collector like SGen’s nursery collector, which copies reachable objects from the nursery to the major heap, coloring an object also implies copying it, which in turn requires that all references to it must be updated. To that end we must keep track of where each object has been copied to. The easiest way to do this is to use a forwarding pointer. In Mono, the first pointer-sized word of each object is a pointer to the object’s vtable, which is aligned to at least 4 bytes. This alignment leaves the two least significant bits for SGen to play with during a collection, so we use one of them to indicate whether that particular object is forwarded, i.e. has already been copied to the major heap. If it is set, the rest of the word doesn’t actually point to the vtable anymore, but to the object’s new location on the major heap. The second bit is used for pinning, which I will discuss further down.

The loop

Here is a simplified (pinning is not handled) pseudo-code implementation of the central loop for the nursery collector:

 1:  while (!gray_stack_is_empty ()) {
 2:      object = gray_stack_pop ();
 3:      foreach (refp in object) {
 4:          old = *refp;
 5:          if (!ptr_in_nursery (old))
 6:              continue;
 7:          if (object_is_forwarded (old)) {
 8:              new = forwarding_destination (old);
 9:          } else {
10:              new = major_alloc (object_size (old));
11:              copy_object (new, old);
12:              forwarding_set (old, new);
13:              gray_stack_push (new);
14:         }
15:         *refp = new;
16:     }
17:  }

In line 2 we pop an object out of the gray stack and in line 3 we loop over all the references that it contains. refp is a pointer to the reference we’re currently looking at. In line 4 we fetch the actual reference. We are only interested in collecting the nursery, so if the reference already points to the old generation we skip it (lines 5 and 6). If the object is already forwarded (line 7) we fetch its new location (line 8). Otherwise we must copy it to the major heap. To that end we allocate the space required (line 10), do the actual copying (line 11) and forward the object (line 12). That newly copied object must also be processed, so we push it on the gray stack (line 13). Finally, for both cases, we update the reference to point to the new location of the object (line 15).

Pinning

In the overview post I mentioned that we currently cannot scan stack frames precisely, i.e. we do not know whether a word that looks like a reference just does so by coincidence or is actually one. Work for scanning managed stack frames precisely is under way, but for unmanaged frames this is not possible anyway because the C compiler does not provide us with the necessary information (some runtimes avoid this problem by not exposing managed references to unmanaged code directly).

Because we lack this knowledge we must proceed conservatively and assume that what looks like a reference is one, which means considering the object it points to as reachable. On the other hand, since we cannot be sure, we are not allowed to modify this supposed reference, because it might just be a number. That implies that we also cannot move the object. In other words the object is “pinned”.

Pinned objects are a bit of a complication. Having pinned objects in the nursery means that we cannot clean it out completely and so it becomes fragmented. It also requires an additional check in the central loop.

Finishing up

After all the copying is done what’s left is to clean up the nursery for the next round of allocations. SGen doesn’t actually zero the nursery memory at this point, however. Instead that happens at the point when TLABs are assigned to threads. The advantage is not only that it potentially gets distributed over more than one thread, but also that is has better cache behavior – the TLAB is likely to be touched again soon by that thread.

What has to happen at this point, though, is to take score of the regions of nursery memory that are available for allocation, which means everything apart from pinned objects. Since we have a list of all pinned objects this is done very efficiently – I will discuss this “pin queue” in a later post. It is these “fragments” from which TLABs will be assigned in the next mutator round.

I should mention that there is a bit of additional work to be done prior to fragment creation, namely making sure unreachable finalizable objects are finalized and weak references are zeroed if the objects they pointed to have become unreachable. Both are very similar issues and shall be discussed in a later post.

Advertisements

SGen

Introduction

SGen is Mono’s new garbage collector that we have been working on intensely for almost two years now and that has been becoming stable and competitive during the past few months.

In this series of blog posts I will try to explain how garbage collection works in general, how SGen works in particular, how to get the best performance out of it and, finally, what our plans are for the future.

This first post will give a very brief overview over what a garbage collector does and how it does it, before outlining the broad architecture of SGen.

Why a new garbage collector?

Since its inception Mono, like many other garbage collected runtimes, has been using the Boehm-Demers-Weiser collector, which I shall refer to as “Boehm” henceforth. Boehm’s main advantages are its portability, stability and the ease with which it can be embedded. It is designed as a garbage collector for C and C++, not for managed languages, however, so it does not come as a surprise that it falls short for that purpose compared to collectors dedicated to such languages or runtimes.

Our goal with SGen was to overcome Boehm’s limitations and provide better performance for managed applications, in terms of allocation speed, collector pauses as well as memory utilization. In this and the following posts of this series I will mention points where we improve upon Boehm whenever appropriate.

Garbage collection

Before digging into the details of SGen it seems prudent to discuss how garbage collection actually works. I will only discuss topics that are relevant to SGen, and even paint those only with very broad strokes. For more comprehensive overviews of garbage collection see “Uniprocessor Garbage Collection Techniques” by Wilson or, for more detailed information, Jones and Lins’s book.

The garbage collector, in short, is that part of the language’s or platform’s runtime that keeps a program from exhausting the computer’s memory, despite the lack of an explicit way to free objects that have been allocated. It does that by discovering which objects might still be reached by the program and getting rid of the rest. It considers as reachable objects those that are directly or indirectly referenced by the so-called “roots”, which are global variables, local variables on the stacks or in registers and other references explicitly registered as roots. In other words, all those objects are considered reachable, as well as those that they reference, and those that they reference, and so on. The pedantic will note that the program might not actually be able to reach all of those. Computing the minimum set of objects that the program could reach is impossible, however, which we know thanks to this guy.

SGen, as well as Boehm, are “stop-the-world” collectors, meaning that they do their work while the actual program, slightingly called the “mutator” in garbage collection lingo, is stopped.

There are three classic garbage collection algorithms: Mark-and-sweep, Copying, and Compaction, of which only the first two are relevant to our discussion of SGen since we have not implemented a compacting collector and have no plans of doing so in the foreseeable future.

Mark-and-Sweep

Mark-and-sweep is the oldest and probably most widely implemented garbage collection algorithm. Boehm is a mark-and-sweep collector.

The idea is to have a “mark” bit on each object that is set on all reachable ones in the “mark” phase of the collection, by recursively traversing references starting from the root set. Of course in practice this is not actually implemented recursively, but by using a queue or stack.

The “sweep” phase then frees the memory occupied by the objects that were not marked and resets the mark bits on the marked ones.

Of course many variants of this algorithm exist that vary in details.

Copying

Traditional mark-and-sweep has two main weaknesses. First, it needs to visit all objects, including the unreachable ones. Second, it can suffer from memory fragmentation like a malloc/free allocator.

A copying collector addresses both problems by copying all reachable objects to a new memory region, allocating them linearly one after the other. The old memory region can then be discarded wholesale. The classic copying collector is the so called “semi-space” algorithm where the heap is divided into two halves. Allocation is done linearly from one half until it is full, at which point the collector copies the reachable objects to the second half. Allocation proceeds from there and at the next collection the now empty first half is used as the copying destination, the so-called “to-space”.

Since with a copying collector objects change their addresses it is obvious that the collector also has to update all the references to the objects that it moves. In Mono this is not always possible because we cannot in all cases unambiguously identify whether a value in memory (usually on the stack) is really a reference or just a number that points to an object by coincidence. I will discuss how we deal with this problem in a later installment of this series.

Generational garbage collection

It is observed that for most workloads the majority of objects either die very quickly or live for a very long period of time. One can take advantage of this so-called “generational hypothesis” by dividing the heap into two or more “generations” that are collected at different frequencies.

Objects begin their lives in the first generation, also referred to as the “nursery”. If they manage to stick around long enough they get “promoted” to the second generation, etc. Since all objects are born in the nursery it grows very quickly and needs to be collected often. If the generational hypothesis holds only a small fraction of those objects will make it to the next generation so it needs to be collected less frequently. Also, it is expected that while only a small fraction of the objects in the nursery will survive a collection, the percentage will be higher for older generations, so a collection algorithm that is better suited to high survival rates can be used for those. Some collectors go so far as to give objects “tenure” at some point, making them immortal so they don’t burden the collection anymore.

One difficulty with generational collection is that it’s not quite possible to collect a young generation without looking at the older generations at all, because the mutator might have stored a reference to a young generation object in an older generation object. Even if that young object is not referenced from anywhere else it must still be considered alive. Clearly scanning all objects in older generations for such references would defeat the whole purpose of separating them in the first place. To address this problem generational collectors make the mutator register all new references from old to young generations. This registration is referred to as a “write barrier” and will be discussed in a later installment. It is also possible to register reads instead of writes, with a “read barrier”, but this is impractical without hardware support. It’s obvious that the write barrier must be very efficient since it’s invoked for every write to a reference field or array member.

SGen

SGen, which historically stands for “Simple Generational”, is a generational garbage collector with two generations. The nursery is collected with a copying collector that immediately promotes all live objects, if possible, to the old generation (or “major heap”).

For the old generation we have two collectors that the user can choose between: A mark-and-sweep and a copying collector. The mark-and-sweep collector is available in four variants, with and without a parallel mark phase and with a dynamic or fixed-size heap.

In addition to that SGen has a separate space for large objects (larger than 8000 bytes), called the “Large Object Space”, or “LOS”, which is logically part of the old generation. Large objects are not born in the nursery but directly in the LOS, and they are not moved.

One major difficulty SGen has to deal with is objects that are “pinned”, which means that they cannot be moved. The reason for this is usually that they are referenced from stack frames that we cannot scan “precisely”, i.e. for which we do not know what is a reference and what is not. Work is currently under way to scan managed stack frames precisely, but we will always have to handle unmanaged stack frames, too, for which no type information is available and which can therefore only be scanned “conservatively”. More on this in one of the following posts.

Future installments

Having covered a few basic concepts of garbage collection I shall go into more detail about SGen in the blog posts following this one. Here is a tentative list of the posts I intend to write in the coming weeks. Comments and suggestions are of course welcome.

Allocating objects and the minor collector

How is the nursery organized, how are new objects allocated and how is the nursery collected?

The major collectors

An overview of SGen’s major collectors, mark-and-sweep and copying. I’ll discuss how they organize the heap and how they collect.

Sundry

Several shorter topics: Large objects, GC descriptors, conservative scanning, pinning, write barriers, finalization, weak references and domain unloading.

Debugging a garbage collector

Finding bugs in a garbage collector is ridiculously hard. Here I’ll describe some of the tools we use for assistance.

Getting the best performance out of SGen

How to fine-tune SGen for a workload and how to take write applications to take advantage of SGen’s performance characteristics.

The future

What’s in stock for SGen in the future?