SGen is Mono’s new garbage collector, which we have been working on intensely for almost two years now and which has become stable and competitive during the past few months.

In this series of blog posts I will try to explain how garbage collection works in general, how SGen works in particular, how to get the best performance out of it and, finally, what our plans are for the future.

This first post will give a very brief overview of what a garbage collector does and how it does it, before outlining the broad architecture of SGen.

Why a new garbage collector?

Since its inception Mono, like many other garbage collected runtimes, has been using the Boehm-Demers-Weiser collector, which I shall refer to as “Boehm” henceforth. Boehm’s main advantages are its portability, stability and the ease with which it can be embedded. It is designed as a garbage collector for C and C++, not for managed languages, however, so it does not come as a surprise that it falls short for that purpose compared to collectors dedicated to such languages or runtimes.

Our goal with SGen was to overcome Boehm’s limitations and provide better performance for managed applications, in terms of allocation speed, collector pauses as well as memory utilization. In this and the following posts of this series I will mention points where we improve upon Boehm whenever appropriate.

Garbage collection

Before digging into the details of SGen it seems prudent to discuss how garbage collection actually works. I will only discuss topics that are relevant to SGen, and even paint those only with very broad strokes. For more comprehensive overviews of garbage collection see “Uniprocessor Garbage Collection Techniques” by Wilson or, for more detailed information, Jones and Lins’s book.

The garbage collector, in short, is that part of the language’s or platform’s runtime that keeps a program from exhausting the computer’s memory, despite the lack of an explicit way to free objects that have been allocated. It does that by discovering which objects might still be reached by the program and getting rid of the rest. It considers as reachable objects those that are directly or indirectly referenced by the so-called “roots”, which are global variables, local variables on the stacks or in registers and other references explicitly registered as roots. In other words, all those objects are considered reachable, as well as those that they reference, and those that they reference, and so on. The pedantic will note that the program might not actually be able to reach all of those. Computing the minimum set of objects that the program could actually reach is impossible, however, since it would amount to solving the halting problem.

SGen, like Boehm, is a “stop-the-world” collector, meaning that it does its work while the actual program, slightingly called the “mutator” in garbage collection lingo, is stopped.

There are three classic garbage collection algorithms: Mark-and-sweep, Copying, and Compaction, of which only the first two are relevant to our discussion of SGen since we have not implemented a compacting collector and have no plans of doing so in the foreseeable future.


Mark-and-sweep is the oldest and probably most widely implemented garbage collection algorithm. Boehm is a mark-and-sweep collector.

The idea is to have a “mark” bit on each object that is set on all reachable ones in the “mark” phase of the collection, by recursively traversing references starting from the root set. Of course in practice this is not actually implemented recursively, but by using a queue or stack.

The “sweep” phase then frees the memory occupied by the objects that were not marked and resets the mark bits on the marked ones.
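To make the two phases concrete, here is a minimal mark-and-sweep sketch over a toy object graph. The `Obj` layout, the fixed-size heap and the `MAX_REFS` limit are all invented for the example and bear no relation to SGen’s actual data structures; the point is only the shape of the algorithm: mark via an explicit stack, then sweep the whole heap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJS 8
#define MAX_REFS 2

/* A toy object: a mark bit and up to two outgoing references. */
typedef struct Obj {
    int marked;
    struct Obj *refs[MAX_REFS];
} Obj;

/* Mark phase: traverse from the roots with an explicit stack
 * (not recursion), setting the mark bit on each reachable object.
 * Marking before pushing also makes cycles harmless. */
static void mark(Obj **roots, int nroots) {
    Obj *stack[MAX_OBJS];
    int top = 0;
    for (int i = 0; i < nroots; i++)
        if (roots[i] && !roots[i]->marked) {
            roots[i]->marked = 1;
            stack[top++] = roots[i];
        }
    while (top > 0) {
        Obj *o = stack[--top];
        for (int j = 0; j < MAX_REFS; j++) {
            Obj *child = o->refs[j];
            if (child && !child->marked) {
                child->marked = 1;
                stack[top++] = child;
            }
        }
    }
}

/* Sweep phase: walk the whole heap, count (a real collector would
 * free) the unmarked objects and reset the mark bits on survivors. */
static int sweep(Obj *heap, int nobjs) {
    int freed = 0;
    for (int i = 0; i < nobjs; i++) {
        if (heap[i].marked)
            heap[i].marked = 0;   /* survivor: clear for the next cycle */
        else
            freed++;              /* unreachable: would be reclaimed */
    }
    return freed;
}
```

Note that `sweep` has to visit every object, reachable or not, which is exactly the first weakness discussed below.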

Of course many variants of this algorithm exist that vary in details.


Traditional mark-and-sweep has two main weaknesses. First, it needs to visit all objects, including the unreachable ones. Second, it can suffer from memory fragmentation like a malloc/free allocator.

A copying collector addresses both problems by copying all reachable objects to a new memory region, allocating them linearly one after the other. The old memory region can then be discarded wholesale. The classic copying collector is the so called “semi-space” algorithm where the heap is divided into two halves. Allocation is done linearly from one half until it is full, at which point the collector copies the reachable objects to the second half. Allocation proceeds from there and at the next collection the now empty first half is used as the copying destination, the so-called “to-space”.
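The semi-space scheme can be sketched in a few lines of C, using Cheney-style scanning of to-space. The object layout is a deliberate simplification: each object here has a dedicated `forward` field and a single `ref` slot, whereas real collectors store the forwarding pointer in the object header and scan arbitrary reference fields.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

#define HALF_SIZE 64   /* objects per semi-space, toy scale */

typedef struct Obj {
    struct Obj *forward;   /* forwarding pointer, set once copied */
    struct Obj *ref;       /* one outgoing reference, for simplicity */
    int payload;
} Obj;

static Obj space_a[HALF_SIZE], space_b[HALF_SIZE];
static Obj *from_space = space_a, *to_space = space_b;
static size_t alloc_ptr = 0;   /* bump pointer into the active half */

/* Linear ("bump pointer") allocation out of the active half. */
static Obj *alloc(void) {
    assert(alloc_ptr < HALF_SIZE);
    Obj *o = &from_space[alloc_ptr++];
    memset(o, 0, sizeof *o);
    return o;
}

/* Copy one object to to-space, leaving a forwarding pointer behind
 * so later references to it get updated to the new address. */
static Obj *copy(Obj *o, size_t *to_ptr) {
    if (o == NULL) return NULL;
    if (o->forward) return o->forward;   /* already moved */
    Obj *new_o = &to_space[(*to_ptr)++];
    *new_o = *o;
    o->forward = new_o;
    return new_o;
}

/* A semi-space collection: copy the roots, then scan the freshly
 * copied objects and copy whatever they reference (Cheney's loop). */
static void collect(Obj **roots, int nroots) {
    size_t to_ptr = 0, scan = 0;
    for (int i = 0; i < nroots; i++)
        roots[i] = copy(roots[i], &to_ptr);
    while (scan < to_ptr) {
        to_space[scan].ref = copy(to_space[scan].ref, &to_ptr);
        scan++;
    }
    /* Flip: to-space becomes the new allocation space. */
    Obj *tmp = from_space; from_space = to_space; to_space = tmp;
    alloc_ptr = to_ptr;
}
```

Everything left behind in the old half, i.e. every object nobody copied, is discarded wholesale by the flip; unreachable objects are never even visited.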

Since with a copying collector objects change their addresses it is obvious that the collector also has to update all the references to the objects that it moves. In Mono this is not always possible because we cannot in all cases unambiguously identify whether a value in memory (usually on the stack) is really a reference or just a number that points to an object by coincidence. I will discuss how we deal with this problem in a later installment of this series.

Generational garbage collection

It is observed that for most workloads the majority of objects either die very quickly or live for a very long period of time. One can take advantage of this so-called “generational hypothesis” by dividing the heap into two or more “generations” that are collected at different frequencies.

Objects begin their lives in the first generation, also referred to as the “nursery”. If they manage to stick around long enough they get “promoted” to the second generation, etc. Since all objects are born in the nursery it grows very quickly and needs to be collected often. If the generational hypothesis holds only a small fraction of those objects will make it to the next generation so it needs to be collected less frequently. Also, it is expected that while only a small fraction of the objects in the nursery will survive a collection, the percentage will be higher for older generations, so a collection algorithm that is better suited to high survival rates can be used for those. Some collectors go so far as to give objects “tenure” at some point, making them immortal so they don’t burden the collection anymore.
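A toy model of a minor collection with promotion might look like the following. The age-based tenure threshold shown here is one common policy, not SGen’s: as described further below, SGen’s nursery collector promotes all surviving objects immediately. All names and sizes are made up for illustration.

```c
#include <assert.h>

#define NURSERY_CAP 4
#define OLD_CAP 16
#define TENURE_AGE 2   /* survive this many minor collections, then promote */

typedef struct { int alive; int age; } Obj;

static Obj nursery[NURSERY_CAP];
static Obj old_gen[OLD_CAP];
static int nursery_count = 0, old_count = 0;

/* Minor collection: dead nursery objects are discarded; survivors
 * either age in place or, once old enough, get promoted. */
static void minor_collect(void) {
    int kept = 0;
    for (int i = 0; i < nursery_count; i++) {
        if (!nursery[i].alive)
            continue;                           /* reclaimed */
        if (++nursery[i].age >= TENURE_AGE)
            old_gen[old_count++] = nursery[i];  /* promoted */
        else
            nursery[kept++] = nursery[i];       /* stays young */
    }
    nursery_count = kept;
}
```

If the generational hypothesis holds, most objects die before their first or second minor collection, so `old_gen` grows far more slowly than the nursery fills.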

One difficulty with generational collection is that it’s not quite possible to collect a young generation without looking at the older generations at all, because the mutator might have stored a reference to a young generation object in an older generation object. Even if that young object is not referenced from anywhere else it must still be considered alive. Clearly scanning all objects in older generations for such references would defeat the whole purpose of separating them in the first place. To address this problem generational collectors make the mutator register all new references from old to young generations. This registration is referred to as a “write barrier” and will be discussed in a later installment. It is also possible to register reads instead of writes, with a “read barrier”, but this is impractical without hardware support. It’s obvious that the write barrier must be very efficient since it’s invoked for every write to a reference field or array member.
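A write barrier of the kind described above can be sketched as follows. The remembered set here is a flat array of slot addresses, which is a deliberate simplification; production collectors, SGen included, use more efficient structures such as card tables or store buffers, and the nursery range check below stands in for however the real collector distinguishes generations.

```c
#include <assert.h>
#include <stdint.h>

#define REMSET_CAP 64

static void *nursery_mem[32];   /* stand-in for the nursery's address range */

/* Remembered set: slots in old-generation memory that may point
 * into the nursery, recorded by the write barrier.  A minor
 * collection treats these slots as additional roots. */
static void **remset[REMSET_CAP];
static int remset_count = 0;

static int in_nursery(void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)nursery_mem && a < (uintptr_t)(nursery_mem + 32);
}

/* The write barrier: every reference store goes through here.  If an
 * old-generation slot is made to point at a nursery object, remember
 * the slot.  This must be cheap: it runs on every reference write. */
static void write_barrier(void **slot, void *value) {
    *slot = value;
    if (!in_nursery(slot) && in_nursery(value))
        remset[remset_count++] = slot;
}
```

Only the old-to-young case pays the cost of recording; young-to-young and old-to-old stores fall through after two range checks.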


SGen, which historically stands for “Simple Generational”, is a generational garbage collector with two generations. The nursery is collected with a copying collector that immediately promotes all live objects, if possible, to the old generation (or “major heap”).

For the old generation we have two collectors that the user can choose between: A mark-and-sweep and a copying collector. The mark-and-sweep collector is available in four variants, with and without a parallel mark phase and with a dynamic or fixed-size heap.

In addition to that SGen has a separate space for large objects (larger than 8000 bytes), called the “Large Object Space”, or “LOS”, which is logically part of the old generation. Large objects are not born in the nursery but directly in the LOS, and they are not moved.
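The size-based routing between the nursery and the LOS amounts to a trivial predicate at allocation time. The function name below is made up for illustration, but the cutoff matches the 8000-byte threshold stated above (objects strictly larger than that go to the LOS).

```c
#include <assert.h>
#include <stddef.h>

/* Objects strictly larger than this many bytes go to the LOS. */
#define LOS_THRESHOLD 8000

typedef enum { ALLOC_NURSERY, ALLOC_LOS } AllocSpace;

/* choose_space is a hypothetical name; it only illustrates the rule
 * that large objects bypass the nursery and are never moved. */
static AllocSpace choose_space(size_t size) {
    return size > LOS_THRESHOLD ? ALLOC_LOS : ALLOC_NURSERY;
}
```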

One major difficulty SGen has to deal with is objects that are “pinned”, which means that they cannot be moved. The reason for this is usually that they are referenced from stack frames that we cannot scan “precisely”, i.e. for which we do not know what is a reference and what is not. Work is currently under way to scan managed stack frames precisely, but we will always have to handle unmanaged stack frames, too, for which no type information is available and which can therefore only be scanned “conservatively”. More on this in one of the following posts.

Future installments

Having covered a few basic concepts of garbage collection I shall go into more detail about SGen in the blog posts following this one. Here is a tentative list of the posts I intend to write in the coming weeks. Comments and suggestions are of course welcome.

Allocating objects and the minor collector

How is the nursery organized, how are new objects allocated and how is the nursery collected?

The major collectors

An overview of SGen’s major collectors, mark-and-sweep and copying. I’ll discuss how they organize the heap and how they collect.


Several shorter topics: Large objects, GC descriptors, conservative scanning, pinning, write barriers, finalization, weak references and domain unloading.

Debugging a garbage collector

Finding bugs in a garbage collector is ridiculously hard. Here I’ll describe some of the tools we use for assistance.

Getting the best performance out of SGen

How to fine-tune SGen for a workload and how to write applications to take advantage of SGen’s performance characteristics.

The future

What’s in stock for SGen in the future?

22 thoughts on “SGen”

  1. Hi,

    First off I would like to thank you, and of course the Mono team, for bringing to the world a very nice and unique implementation of the .NET Framework and the C# language on Unix/Linux/BSD-like systems. I’ve been following Mono since version 0.29 or so, and I can tell there have been great improvements in Mono since that early version.
    I’m completely fine with SGen and I’m eagerly waiting for it to be complete and to work on many architectures.

    Since I’m a FreeBSD user, I’m using the NPTL (Native POSIX Thread Library).
    Also, I’m using two other Linux systems, openSUSE and Arch Linux, in virtual machines.
    For now Linux supports two thread libraries: LinuxThreads, which is marked as “no longer supported” in “man pthreads” on Linux, and of course NPTL (glibc), which is supposed to be supported on Linux systems with a kernel above 2.6.
    It seems Mono is still using LinuxThreads as a default, whose Mono implementation may be more stable than the NPTL one.
    What I would like very much is for the NPTL implementation to be more complete in the future, since for now it is not: try SGen + pthreads; it’s not working quite well *for now* (mark my words) on Linux (openSUSE, Arch Linux) and FreeBSD (x86/i386 and x86_64/amd64).

    I know that implementations may not be complete for now, that SGen is still under development.
    I’m just wishing for a better pthread (NPTL) support in SGen inside Mono in the future.

    I don’t know if you’ll devote a (small) part of your posts to the underlying thread system used by SGen, and there might be no use for it, but if you’ve got information on that particular subject, I’ll be delighted to read about it.

    And of course, nice post !
    (I hope my comment is not rude, I didn’t intend it to be rude.)

    • I don’t know enough about the differences between these thread implementations to answer this question satisfyingly, other than to say that the main issue with regard to thread implementations for SGen is the handling of TLS (thread-local storage). Patches are always welcome ;-)

      • Thanks for your answer :)
        Unfortunately I can’t patch this bug myself since I’m not qualified enough with TLS. If I knew more about NPTL/LinuxThreads and TLS in Mono, I would surely give patching a try.
        I’ll try to find things out about TLS in Mono by myself; that’s just reading code and trying to understand how it works.

  2. I’d be interested to learn why you chose 2 generations rather than the more obvious 3. When collecting your nursery, some fraction of those objects will survive, because they are currently referenced by stack frames and the like, but will still have a very short lifetime; they’re just those nursery objects unlucky enough to still be alive at the time you happened to choose to collect the nursery. Assuming that the optimal nursery size would be something near L2 cache (for efficient collecting), you’d be collecting it very often, and would end up leaking a lot of these “in flight” young objects who die very shortly after promotion – i.e. have an unpleasantly large number of middle-age-dying objects polluting your major heap. This will force you to (expensively) collect the major heap more often than is optimal, no?

    • One can always make this argument about any number of generations. You are right, though, that we should probably have three generations, like many current high performance VMs, and this is certainly something we will look into in the future.

      • There’s also the variant that the nursery is cut into several segments which are used like a ring buffer. Only the oldest segment is collected and recycled when the nursery is full. This could reduce the problem of some very-newborn objects to be matured immediately, just because they happen to be allocated when the nursery is nearly full.

        AFAIR, the .NET implementation has a variant using the server collector which simply allocates a new nursery, and lets the old one be collected asynchronously.

  3. That’s a nice thumbnail of the difference between mark-and-sweep and copying (plus the generational extension of copying).

    I am also curious about what the triggers are for copying some or all of the nursery.

    I’m thinking there might be ways to maybe not be so hard on the nursery by exploiting some sort of zoning that reflects working-set locality properties that might not be too hard to track.

    I’ve never analyzed situations where write barriers were involved, so I am speculating wildly. That said, I wonder that there might be clues in write-barriers held by entities that may themselves be near end-of-life based on stack-frame nesting. Zone compaction and coalescence might be more appropriate as a way of pruning the nursery rather than wholesale migration to the big generation.

    • We had a prototype of a semi-space nursery collector once that would only promote the nursery objects once a certain fill threshold in the semi-space was reached. Interestingly, though, it turned out not to improve performance at all, so we concentrated our efforts on other issues. It didn’t take into account how many nursery collections an object had survived, though, i.e. once the threshold was reached, it would promote the whole nursery. Some form of aging might have improved it.

      One of our (current) shortcomings helps us here, though: We still scan stack frames conservatively, so we have to pin objects that are referenced from stack frames and cannot promote them. They are likely to be the youngest ones.

      I’m not sure I understand what you’re saying about using write-barrier/stack frame nesting information to guide promotion. Could you elaborate please?

  4. Afterthought: Still, wildly speculating, I wonder how well the runtime neonatal (on the stack) technique in Chicken Scheme would be adaptable in a VM to relieve the need to sweep the nursery so aggressively. (And I have no clue on the performance trade-offs.)

    • Chicken uses Baker’s incredibly clever and simple stack-as-heap technique to get proper tail recursion and first class continuations despite compiling to C. It requires CPS compilation, however, which makes it almost impossible to have unmanaged code that can call back to managed code – if you have to do a stack collection and there’s a few unmanaged stack frames sandwiched between managed ones, what do you do with them? You couldn’t even move them.

      There are other approaches that utilize the stack, though. Azul, for example, optimistically allocates objects on the stack and promotes them to the heap if a reference to them “escapes”. I believe they have hardware support for this, though.

  5. “If the generational hypothesis holds only a small fraction of those objects will make it to the next generation so it only needs to be collected less frequently”

    isn’t the idea that it needs to be collected less frequently because objects making it to the second generation are thought to be very long-lived (till the end of the program)? And not that there are just fewer of them?

    At least that’s what I remember from the MSDN article about the .NET GC. Or is the Mono approach totally different?

    • Two factors make the old generation require less frequent collection. First, it grows less quickly. Even in the very worst case, namely that all objects survive a nursery collection, it will only grow as quickly as the nursery. Second, it is usually larger than the nursery, so there’s more space to fill up. How long its objects live doesn’t enter into it directly – if the heap is full, it’s full, and requires collection, period.

      Where object longevity does come into play is heuristics about when to do a major collection. The collector doesn’t usually wait until the heap is completely full, but sets a certain fill threshold at which it triggers. If it has information about how long the objects will live, that should affect the heuristic. For example, in the pathological case of all objects on the major heap living until the program halts, no major collection should ever occur because it couldn’t reclaim any space anyway. The hard part is getting that information, of course.

  6. If you have time, I would like more information on the following topics:

    – dig a bit more into how the list of roots is created. Is it created only once? How do you keep it updated? How is it used by SGen? How do you know an object is a root?
    – how to configure SGen’s log to know when it kicks in for a minor or major collection (for example).

    Also, it would be great if, when discussing the nursery for example, you could point to the related code inside the mono source code.

    Thank you very much.

    • There are two primary sources for roots from SGen’s perspective: Registered roots and thread stacks. The runtime can at any time register (and deregister) references as roots. It does that, for example, for static fields, i.e. global variables. Threads are also registered with SGen, and their stacks are scanned as roots. We don’t go through all objects and decide whether they are roots or not. I’m not sure whether that answers your first question to your satisfaction.

      As to the second question: just set “MONO_GC_DEBUG=1” and SGen will log each collection.

      SGen is not very well modularized, unfortunately. Some pieces, like the major collectors, are; others, like the nursery, are not. Maybe I’ll give a short overview.

      • Alright. In SGen’s case, all (or most) roots are added using the “GC_add_roots” method from the libgc/mark_rts.c file?

        It would be really great if you had some time to give a short overview of the nursery with references to the code base.

        Thank you.

  7. Great to see Mono moving towards accurate garbage collection!

    Two things you might like to consider if you haven’t already are using a shadow stack as an easy way to make the GC accurate, and mark region GCs as an alternative to generational collection.

    You may be interested in a blog post of mine about the performance overhead of the shadow stack in my HLVM project, which is quite reasonable for most workloads even though I’m stacking quad-word references (!).

    I’ve also been measuring the performance of a prototype mark-region collector recently.


Comments are closed.