The Best Books I’ve Read This Year


Maybe 15 years ago I read William Gibson’s Neuromancer and hated it. At some point after that my brain started confusing Gibson with Neal Stephenson, with the result that I wouldn’t touch any of Stephenson’s books until this year, when an overwhelming number of recommendations had finally accumulated. I relented and picked up “Anathem”, expecting confused and unfinished sentences. Instead, I was hooked right from the start.

“Anathem” is set on a world that is home to an ancient system and culture of secular[1] monasteries, dedicated to science and philosophy, and carefully sealed from the rest of civilization. Most of the plot plays out in those monasteries, slowly revealing their weird culture, such as how even the monasteries are strictly separated into parts that open every ten years, every hundred years, and every thousand years, respectively.

The civilizations outside the monasteries are volatile, seemingly reverting from modern culture to medieval barbarism within decades, something that’s never explained. The book starts off with a tenner monk interviewing somebody from the outside world, in preparation for the opening of the tenner part of the monastery, leading off with a question from a script for making contact: “Do your neighbors burn one another alive?”

During the time the monastery is open, a significant and shocking discovery is made, the nature of which would be a spoiler if you haven’t read the book. It took me from appreciating the world Stephenson had built and being curious about its many intricacies to being totally engrossed in the plot and longing for explanations, which Stephenson, through his characters’ philosophical discussions, does provide.

I proceeded to read “Cryptonomicon” and “Seveneves”, but out of those three “Anathem” was by far the strongest book for me.

Zero to One

This is a book about entrepreneurship and innovation, by Peter Thiel, one of PayPal’s founders. He puts out a lot of ideas, not all of which I agree with, but many of which I found good food for thought.

One of Thiel’s central theses is that only monopolies can innovate. Businesses in competition have too much on their hands just to survive and cannot do long-term development:

In business, money is either an important thing or it is everything. Monopolists can afford to think about things other than making money.

A monopoly is not necessarily a huge business. Most successful start-ups begin very small but they’re the only ones providing a particular service. In their domains, they are monopolies.

To get there, according to Thiel, you need a definite outlook on the future—having an idea of where things are going. He believes that most of Western society today is indefinite, explaining the prevalence of finance:

Finance epitomizes indefinite thinking because it’s the only way to make money when you have no idea how to create it.

But in an indefinite world, people actually prefer unlimited optionality; money is more valuable than anything you could possibly do with it.

We place far too much importance on “well-rounded” education:

a definite person determines the one best thing to do and then does it. Instead of working tirelessly to make herself indistinguishable, she strives to be great at something substantive—to be a monopoly of one.

This resonates with me, having had to waste time at school studying parochial subjects such as Austrian history and geography or Latin, when I would have been better off just doing math and English, and having the rest of my time for playing with computers.

Another central theme in the book is secrets. A secret is something you know, but nobody else yet knows, or believes. It’s an essential ingredient to a start-up. Discovering secrets might seem daunting, but Thiel is optimistic:

There’s plenty more to learn: we know more about the physics of faraway stars than we know about human nutrition.

Thing Explainer

Bill Gates wrote a wonderful review of Randall Munroe’s “Thing Explainer”, a book that uses illustrations and only the one thousand most used English words to explain difficult concepts and products.

The language is not just a gimmick. Avoiding difficult terminology forces Munroe to reduce the inferential distance to his readers, making even difficult concepts easy to grasp. His explanation of turbofan jet engines—sky boat pushers—is much better than the one on the Wikipedia page.

Sometimes the reduction of vocabulary makes banal statements utterly hilarious, like when, describing “bags of stuff inside you”, he writes about the mouth:

It’s where air goes in and out, food goes in, and words come out. Note: Some people don’t like it when you make words come out while you’re putting food in.

Aside from being entertained, I learned quite a few things. For example, in nuclear reactors, there is a room below the core:

If there are problems and everything is on fire, the special metal can get so hot that it starts moving like water. Sometimes, it can get hot enough to burn a hole through the floor. If that happens, this room is here so the watery metal can fall down and then spread out over the floor.

It’s good if the metal can spread out, since when it’s all close together, it keeps making itself hotter. If this room ever gets used, it means everything has gone very, very wrong.

I read the Kindle edition on my iPad, which worked very well, except that the page explaining operating rooms—rooms for helping people—was broken. I hope that’ll get fixed in an update soon.

Rationality: From AI to Zombies

This book is a compilation of Eliezer Yudkowsky’s writings on the topics of rationality and artificial intelligence. It contains a lot of wisdom. As far as I know all of it can be found online, but the book provides a nice structure.

I have highlighted and annotated more passages in this book than in any other by far. This is but a small selection.

Yudkowsky is big on separating the map, your beliefs, from the territory, reality. A lot of “philosophy” and deep-sounding “wisdom” vanishes when you do, like some people’s love of mystery:

But ignorance exists in the map, not in the territory. If I am ignorant about a phenomenon, that is a fact about my own state of mind, not a fact about the phenomenon itself. A phenomenon can seem mysterious to some particular person. There are no phenomena which are mysterious of themselves. To worship a phenomenon because it seems so wonderfully mysterious is to worship your own ignorance.

The ontological argument is just one weird application of this confusion, treating the map (your idea of God) like territory that just lacks existence. It’s as if the shape I drew into the Pacific Ocean part of my globe were a full-fledged continent whose only drawback was that it doesn’t exist.

This carries over to other topics, like in his discussion of quantum mechanics (he’s a firm proponent of the Many-worlds “interpretation” of QM, as he should be):

Quantum physics is not “weird.” You are weird. You have the absolutely bizarre idea that reality ought to consist of little billiard balls bopping around, when in fact reality is a perfectly normal cloud of complex amplitude in configuration space. This is your problem, not reality’s, and you are the one who needs to change.

If we want our map to reflect the territory as best we can, we better use words whose boundaries are boundaries in the real world, too. Hence we should be careful about how we define words:

Your brain doesn’t treat words as logical definitions with no empirical consequences, and so neither should you. The mere act of creating a word can cause your mind to allocate a category, and thereby trigger unconscious inferences of similarity. Or block inferences of similarity; if I create two labels I can get your mind to allocate two categories.

Many chapters on cognitive biases and how to combat them. For example, when confronted with a problem, hold off on proposing solutions, because

Once you can guess what your answer will be, you have probably already decided. If you can guess your answer half a second after hearing the question, then you have half a second in which to be intelligent. It’s not a lot of time.

Some beliefs are very important to us, and yet we might be wrong about them. How can we even begin to evaluate those beliefs dispassionately?

You would advise a religious person to try to visualize fully and deeply the world in which there is no God, and to, without excuses, come to the full understanding that if there is no God then they will be better off believing there is no God. If one cannot come to accept this on a deep emotional level, one will not be able to have a crisis of faith. So you should put in a sincere effort to visualize the alternative to your belief, the way that the best and highest skeptic would want you to visualize it.


it is wisest to regard our past selves as fools beyond redemption— to see the people we once were as idiots entire. […] As long as we are making excuses for the past, trying to make it look better, respecting it, we cannot make a clean break.

This obviously applies to non-religious beliefs, too.

A lot of discussion about politics:

Politics is an extension of war by other means. Arguments are soldiers. Once you know which side you’re on, you must support all arguments of that side, and attack all arguments that appear to favor the enemy side; otherwise it’s like stabbing your soldiers in the back—providing aid and comfort to the enemy.

This is why I have a hard time watching documentaries on politicized issues. They are usually done by a proponent of one side of the issue, and they will lie through their teeth to make their side look good. In reality, it’s rarely the case that one proposed solution has no downsides:

If you defend yourself, you may have to kill. If you kill someone who could, in another world, have been your friend, that is a tragedy. And it is a tragedy. The other option, lying down and dying, is also a tragedy. Why must there be a non-tragic option? Who says that the best policy available must have no downside?

Somewhat related to this is a post by Scott Alexander, whose blog you should definitely read.

Something that I worry about, too, in the chapter Superstimuli and the Collapse of Western Civilization:

If people have the right to be tempted—and that’s what free will is all about—the market is going to respond by supplying as much temptation as can be sold.

Paul Graham also wrote about this.

A big thing Yudkowsky is wrong about is Bayesianism and positivism. The central idea of positivism:

Above all, don’t ask what to believe—ask what to anticipate. Every question of belief should flow from a question of anticipation, and that question of anticipation should be the center of the inquiry.

As David Deutsch points out, this very idea, that only beliefs about physical observables are in any way relevant, is itself a belief that is not about physical observables, shooting itself in the foot.

The Bayesian in Yudkowsky comes through here:

Rationality is not for winning debates, it is for deciding which side to join.

Sometimes, all sides are wrong, and only a new idea will help, yet Bayesianism is very quiet on the topic of the generation of new hypotheses. Again, see David Deutsch for more arguments why science is not Bayesian.

Becoming Steve Jobs

Walter Isaacson’s biography was badly received by the cognoscenti, so I didn’t bother reading it and can’t draw any comparisons to Brent Schlender and Rick Tetzeli’s “Becoming Steve Jobs”, but I can definitely recommend the latter.

This is not a collection of anecdotes about Jobs, although it does have its fair share. It’s the story of a person who learned and changed.

The story starts with the visionary product creator who led the development of the Apple II and the Macintosh but failed to command the then much larger company Apple, ultimately being fired by the board. With NeXT he created yet another technological masterpiece but failed to bring the company to success, for a variety of reasons, while at the same time helping Pixar become what it is today. He emerged having learned from his failures, taking the helm at Apple again, but this time succeeding spectacularly, having become the Steve Jobs he’s remembered for now.

What stayed most with me from this book was Jobs’ ability to completely change his mind, very quickly, and damn the sunk costs:

On the car ride over to the prototype hangar, Johnson told Steve that he thought they’d gotten it all wrong. “Do you know how big a change this is,” Steve roared. “I don’t have time for this. I don’t want you to say a word to anyone about this. I don’t know what I think of this.” They sat for the rest of the short ride in silence.

When they arrived at the hangar, Steve spoke to the assembled group: “Well,” he said, “Ron thinks we’ve designed our stores all wrong.” Johnson waited to hear where this line of thought would go. “And he’s right,” said Steve, “so I’m going to leave now and you should just do what he’s going to tell you to do.” And Jobs turned around and left.

  1. Stephenson doesn’t help by calling the world outside the monasteries “saecular”, when it clearly isn’t secular. It takes a while to get used to his invented words here, but it’s worth it.  ↩

Polish History

Often, when working on private branches with long histories, it so happens that a commit somewhere in between breaks the build, or some tests don’t pass anymore. If you know which commit it is you can fix it with git-rebase. But maybe you don’t know. Maybe you aren’t even sure whether all the commits in your history build and pass the tests.

git-polish-history to the rescue!

Let’s say you are on a private feature branch with quite a few commits that you are unsure about. The last one that’s been pushed is last-pushed-commit. To check whether your code compiles and passes all the tests you can run make check, but you don’t want to do that manually for each commit, so let’s automate it:

git polish-history --test="make check" start last-pushed-commit

git-polish-history will now go through all commits since last-pushed-commit and run make check on them. On the first one that fails it will stop and tell you to fix the problem. You can do this any way you want as long as you commit your changes. Typically you’d amend the last commit. Once you’re done, you do

git polish-history continue

and the process continues. It will stop again whenever a commit fails the test, when a merge conflict occurs, or when it’s done. If you want it to do the committing of your changes automatically, use

git polish-history continue --automatic
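
A typical round of fixing a failing commit might then look something like this (assuming the fix is a simple code change that belongs in the commit the tool stopped at):

# edit the offending files, then fold the fix into the commit
git add -u
git commit --amend --no-edit
git polish-history continue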

Here’s a video with a demonstration:

Associated Types Considered Weird

In my first post on Swift I expressed mild bewilderment at Apple’s decision to make protocols generic not by giving them generic type arguments, like C# does, but via what they call “associated types”. To reiterate the example, the Sequence protocol in Swift is defined as

protocol Sequence {
    typealias GeneratorType : Generator
    func generate() -> GeneratorType
}

whereas if Swift had generic type arguments in protocols (as it does in classes), the definition would look more like

protocol Sequence<T> {
    func generate() -> Generator<T>
}

When playing around with a set implementation that Alexis Gallagher posted, I came across a problem with associated types: they expose more type information than the user of a class needs and, worse, more than the user of a class should have.

Here’s how Alexis implemented the generate method on his set type:

func generate() -> SetGeneratorType<KeyType> {
    let dictGenerator = self.dictionaryOfItems.generate()
    return SetGeneratorType<KeyType>(dictGenerator)
}

The type that generate is actually required to return (to adhere to the Sequence interface) is Generator. It’s not possible, however (as far as I can tell), to declare generate as such:

func generate() -> Generator

The reason being that Generator here is under-specified—it must be a Generator whose associated type Element is KeyType, but that can’t be specified. If Swift had generic type arguments for protocols, it would simply be Generator<KeyType>.

The problem here is that the set type needs to publicly expose an implementation detail, namely the concrete type by which it implements its Generator. If that implementation detail changes, the method’s type changes, and code might break.

This problem is exacerbated by the fact that users of the generator might have to depend on its type. As an example, look at my simplification of Alexis’ code:

typealias GeneratorType = MapSequenceGenerator<Dictionary<KeyType, Bool>.GeneratorType, KeyType>

func generate() -> GeneratorType {
    return dictionaryOfItems.keys.generate()
}

The body of the function explains the weird GeneratorType: The dictionary has a keys property that is exactly what I need—a sequence of all the elements in the set. The type that I’m returning now is the generator type of that sequence, instead of my own, and I must specify the concrete return type. The only way to get the concrete type, however, is to look up the type that keys returns, which means that I now depend on an implementation detail of Dictionary, and, making matters worse, I’m surfacing it to my users.

For comparison, let’s look at how this would work if Swift had generic type arguments for protocols. Sequence and Generator would be defined as

protocol Sequence<T> {
    func generate() -> Generator<T>
}

protocol Generator<T> {
    func next() -> T?
}

Dictionary’s keys property would be

var keys: Sequence<KeyType> { get }

And this would be our set’s generate method:

func generate() -> Generator<KeyType> {
    return dictionaryOfItems.keys.generate()
}

No leaked implementation details and no unneeded dependencies.
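
For contrast, here’s roughly the same shape in C#, where generic interfaces make this a non-issue. This is a minimal sketch of mine, not Alexis’ code, and the Dictionary<T, bool> representation is just an assumption to mirror the Swift version:

class MySet<T> : IEnumerable<T> {
    readonly Dictionary<T, bool> items = new Dictionary<T, bool> ();

    public void Add (T item) { items [item] = true; }

    // The concrete type of the Keys enumerator never appears in any
    // signature; callers only ever see IEnumerator<T>.
    public IEnumerator<T> GetEnumerator () {
        return items.Keys.GetEnumerator ();
    }

    IEnumerator IEnumerable.GetEnumerator () { return GetEnumerator (); }
}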

Generators in Swift

In my last post on Swift I examined sequences and generators. Today we’ll implement our own sequence.

Many languages have a function like Clojure’s interpose, which takes an item and a sequence, and inserts the item between each two subsequent elements of the sequence:[1]

Array(interpose("in", ["San Francisco", "California", "the USA"]))
=> ["San Francisco", "in", "California", "in", "the USA"]

There are three things we need here: The interpose function itself, which must return a Sequence, which, in turn, generates a Generator, which does the actual work of producing the items.

Let’s start with the Generator:

struct InterposeGenerator<G: Generator, T where T == G.Element>
        : Generator {
    let sep: T
    var gen: G
    var needSep: Bool
    var nextOrNil: T?

    init (sep: T, gen: G) {
        self.sep = sep
        self.needSep = false
        self.gen = gen
        self.nextOrNil =
    }

    mutating func next() -> T? {
        if needSep {
            needSep = false
            return sep
        } else {
            let n = nextOrNil
            if n {
                nextOrNil =
                needSep = nextOrNil != nil
            }
            return n
        }
    }
}

InterposeGenerator is a little state machine that alternates between producing either an element of the original generator or the separator. A bit of care is required to make sure that it doesn’t produce the separator again after the generator is exhausted.

One noteworthy thing here is the invocation of in init. If we try to call next on the constructor argument gen instead of the property self.gen we get a compiler error:

immutable value of type 'G' only has mutating members named 'next'

This makes sense: the Generator is a value type, and it’s passed in here as an argument, which apparently makes it immutable, and next is a mutating function. At least, that is how it would typically happen—the compiler doesn’t seem to have any information to that effect, though. Generator is a protocol, which can be implemented by reference as well as value types:

protocol Generator {
    typealias Element
    func next() -> Element?
}

As you can see, next is not marked as mutating, either. I suspect there are two things going on here: the protocol definition that Xcode shows is probably incorrect, in that the mutating modifier is missing. The definition that Swift actually uses contains the modifier. In addition, the compiler doesn’t know whether the Generator we are using to create InterposeGenerator is a value type or not, so it must assume that it is, for the sake of safety in the case of mutating functions.

Let’s move on to the Sequence. It doesn’t really have to do anything other than creating an InterposeGenerator when asked to, so apart from all the type fiddling involved it’s rather simple:

struct InterposeSequence<S: Sequence, T
                         where T == S.GeneratorType.Element>
        : Sequence {
    let sep: T
    var seq: S
    func generate () -> InterposeGenerator<S.GeneratorType, T> {
        return InterposeGenerator(sep: sep, gen: seq.generate())
    }
}

Finally, the interpose function just creates an InterposeSequence:

func interpose<S: Sequence, T
               where T == S.GeneratorType.Element>
        (sep: T, seq: S) -> InterposeSequence<S, T> {
    return InterposeSequence(sep: sep, seq: seq)
}

It seems to be a pattern in the base library for the Sequence and Generator to be implemented in the same type, which we could easily do here, as well. I’ll leave it as an exercise for the reader.

Languages with a yield operator, like C# or Python, allow sequence generation to be expressed much more succinctly, with the compiler taking over the task of managing the state machine:

static IEnumerable<T> Interpose<T> (T sep, IEnumerable<T> seq) {
    var en = seq.GetEnumerator ();
    if (!en.MoveNext ())
        yield break;
    for (;;) {
        yield return en.Current;
        if (!en.MoveNext ())
            yield break;
        yield return sep;
    }
}
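
Using it from C# would then look something like this:

foreach (var s in Interpose ("in", new [] { "San Francisco", "California", "the USA" }))
    Console.Write (s + " ");
// prints: San Francisco in California in the USA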

  1. It doesn’t modify the original sequence, of course, but rather returns a new one.  ↩

Playing with Swift

I really like what we’ve seen so far of Swift, Apple’s new programming language. For starters, I was happy to learn that references cannot be nil[1], and integer arithmetic is checked by default[2].

Let’s look at how Swift deals with sequences, and learn a little bit about generics on the way. Arrays, for example, are sequences, so you can iterate over them with a for loop:

for x in [1,2,3] {
    println(x)
}

Lots of other types are sequences, too, like strings, dictionaries and ranges. What makes them sequences is that they all implement the Sequence protocol:

protocol Sequence {
    typealias GeneratorType : Generator
    func generate() -> GeneratorType
}

I’m not sure why Swift elects to do generic protocols via typealias instead of just having the protocol take type arguments, like C# does, which might look something like this:

protocol Sequence<T> {
    func generate() -> Generator<T>
}

Whatever the reason might be, we can see that a sequence can generate a Generator, which is defined as follows:

protocol Generator {
    typealias Element
    func next() -> Element?
}

Generators have a next method which returns either the next element in the sequence, or nil, when the end has been reached. The for loop from above is probably mostly equivalent to

var gen = [1,2,3].generate()
while let x = {
    println(x)
}

The equivalent of Sequence and Generator in C# are IEnumerable and IEnumerator.
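
Stripped of a few details (the non-generic base interfaces and IDisposable), those two look roughly like this:

interface IEnumerable<T> {
    IEnumerator<T> GetEnumerator ();
}

interface IEnumerator<T> {
    bool MoveNext ();
    T Current { get; }
}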

The map function takes a sequence and a function, returning a new sequence containing the results of applying the function to the elements of the given sequence. For example:

for x in map([1,2,3], {i in i*2}) {
    println(x)
}

It’s important to realize that the sequence which map returns is not necessarily of the same type as the one it is passed:

println([1,2,3])
println(map([1,2,3], {i in i*2}))

[1, 2, 3]
VSs17MapCollectionView (has 2 children)

Also, sequences can generate their elements lazily:

var seq = map([1,2,3], { (i: Int) -> Int in
    println("processing " + String(i))
    return i*2
})
var gen = seq.generate()

processing 1

Let’s write a function that takes a sequence and returns an array containing all the sequence’s elements!

First we need to figure out the type of this function. It turns out that this doesn’t work:

func toArray (seq: Sequence) -> seq.GeneratorType.Element[]

The compiler complains that seq is an undeclared type. This doesn’t work, either:

func toArray (seq: Sequence) -> Sequence.GeneratorType.Element[]

Here the complaint is that Sequence.GeneratorType is ambiguous. That’s because we can’t tell the compiler that the two Sequence types are the same. Maybe we can use generics to solve the problem:

func toArray<S:Sequence> (seq: S) -> S.GeneratorType.Element[] {
    var arr = S.GeneratorType.Element[]()
    return arr
}

Swift is now happy with the type of the function, but unfortunately it doesn’t let us instantiate the array. Instead it thinks we’re trying to subscript S.GeneratorType.Element, which isn’t possible, because it’s a type and not a value. As far as I can tell, the only way to get around this problem is to introduce a second generic argument, which we can force to the correct type via a where clause:

func toArray<S : Sequence,
             T where T == S.GeneratorType.Element>
        (seq : S) -> T[] {
    var arr = T[]()
    for x in seq {
        arr.append(x)
    }
    return arr
}

println(toArray(map([1,2,3], {x in x*2})))

And there we have it:

[2, 4, 6]

  1. Swift would be the first mainstream programming language that corrects Tony Hoare’s “billion dollar mistake”, after barely 50 years of all others repeating it.  ↩

  2. It seems we’re finally learning that it’s better for a program to halt than to give away all your secrets and then launch all the nuclear missiles in the case of a bug.  ↩

Stacks and Signals

Now and then we get bug reports of test cases crashing on a specific configuration in an indeterminate manner or frequency, both from our own build bots as well as from our users. One of those I looked at recently involved a runtime test case on Linux/AMD64. The test would only crash infrequently, but luckily I could reproduce it in a VM.

What the program does is run two threads, one of which causes null reference exceptions in a loop over and over again:

while (!finished) {
    object o = null;
    try {
        o.ToString ();
    } catch {
    }
}

The other thread also runs a very simple loop, manually invoking the garbage collector again and again:

while (i < niter) {
    if (i % 100 == 0)
        Console.Write (".");
    GC.Collect ();
    i++;
}

Mono generates null reference exceptions by just letting the code trigger a segmentation violation signal, which we catch in a signal handler that then proceeds to throw the NRE. The advantage of this approach, as opposed to doing a null check whenever we access an object, is that there is no overhead at all when the reference is legit, and the code we generate is shorter.

Garbage collection, on the other hand, also involves signals: stopping the world for a collection on Linux is done by sending a signal to each managed thread. Each thread catches its signal, which tells it that it has to suspend until the GC has done its work.[1]

When we look for the cause of a bug like this the first thing we try to do is catch the crash in a debugger, in this case GDB. Unfortunately, the crash was caused by a segmentation violation that our signal handler cannot handle (because it doesn’t occur in managed code), but a regular run of the program also triggers segmentation violations like crazy, so just breaking at each SIGSEGV didn’t get us anywhere.

My first attempt at getting to the crashing SIGSEGV was to write an Expect script that drove GDB to the first SIGSEGV, presumably caused by the null reference in the managed code loop, remember the offending address and then keep continuing until it hit a SIGSEGV at a different address. Sadly (and quite typical of cases like this), running the code that way in GDB never succeeded in triggering the bug, so we had to think of other options.

At first we thought the crash was caused by the interaction of the two signals (the segmentation violation from the null reference and the stop-the-world signal), but we looked into other hypotheses as well. We tried two different ways of getting rid of the segmentation violation: using explicit null checks to trigger NREs, and just throwing an exception manually without accessing a null reference.

Explicit null checks (enabled with the environment variable MONO_DEBUG=explicit-null-checks) makes the JIT emit less efficient and more elaborate null checks whenever an object is accessed, so an NRE can be thrown without going through a SIGSEGV.

The other way to get rid of the SIGSEGV while still retaining most of the code’s behavior is to just throw an exception, like so:

while (!finished) {
    var e = new Exception ();
    try {
        throw e;
    } catch {
    }
}

We figured it probably didn’t matter which type the exception was, and indeed it did not: both with explicit null checks as well as with the manual exception throw the crash still occurred. We also looked into whether invoking the garbage collector was critical to the crash, and it turned out it was: just making the other thread sleep instead of calling the GC made the crash go away. Also, despite what the bug report claims, the crash was not dependent on SGen—it occurred with the Boehm collector, as well.

Armed with this new data we set out to finally get a close look at the crash in the debugger. Since the crash still only occurred very infrequently I brought my Expect script to bear again, and after a while we did get the crash in GDB. It did, regrettably, give us less information than we had hoped: instead of crashing somewhere in unmanaged code while dereferencing an invalid pointer, the crash resulted from the program counter being set to an invalid address, namely zero.

There are three main ways of setting the program counter on AMD64: CALLing a function, RETurning from a function and JMPing to an address. After spending some time looking at the stack dump, we decided that the most likely cause was a jump, which did not make our job easier, since it leaves no trace of where it came from.

After some more tinkering we decided to investigate whether the point at which the exception throwing thread was interrupted by the GC played any role. We got lucky—it so happened that every time the test crashed, the last time the thread was stopped was always in the exact same piece of code; even more, at the exact same instruction. This is the code in which the thread suspension occurred, with the instruction where it happened labeled:

    mov    %rdi,%r11
    mov    (%r11),%rax
    mov    0x10(%r11),%rcx
    mov    0x18(%r11),%rdx
    mov    0x8(%r11),%rbx
    mov    0x20(%r11),%rbp
    mov    0x30(%r11),%rsi
    mov    0x38(%r11),%rdi
    mov    0x60(%r11),%r12
    mov    0x68(%r11),%r13
    mov    0x70(%r11),%r14
    mov    0x78(%r11),%r15
    mov    0x28(%r11),%rsp
    mov    0x80(%r11),%r11
    jmpq   *%r11

This piece of code is executed as the last step of throwing an exception. It jumps to the catch handler, which involves first restoring a bunch of registers, including the stack pointer, and then doing the actual jump. We guessed that this jump to register r11 was the one that landed us at address zero. Observing in a few more crashes that r11 was indeed always zero corroborated that hypothesis.

The code that calls this restore_context() function looks like this:

mono_amd64_throw_exception (MonoObject *exception, ...)
{
    MonoContext ctx;
    ... figure out where on the stack the matching catch clause is ...
    ... set ctx to the values required to end up in that catch block ...
    restore_context (&ctx);
}

Note that the context—the data structure that contains all the values for the registers, including the program counter—is a local variable of this function, i.e., it resides in this function’s stack frame. Said stack frame is on the same stack as the catch handler that it eventually dispatches to, but further down, of course. We pass the address of the context to the above restore_context() function.

It seemed obvious now that for whatever reason the value for the program counter was set to zero in the context, so we added an assertion to check for that directly before the call to restore_context(). The assertion never failed, but the program still crashed.

Looking at the assembly code again, Rodrigo suddenly proclaimed that he knew exactly what caused the bug: after setting the stack pointer to the catch handler’s frame we did another load from the context, which we weren’t allowed to do. By moving the stack pointer upward, the context ended up lying not above the stack pointer, where it used to be, but below.

Most ABIs specify that programs must assume that data below the stack pointer (beyond the red zone) is subject to arbitrary change, because it’s used by the operating system for invoking signal handlers, for example—exactly what happened in our case with the thread suspend signal.

The correct way to do the context restore, then, is to load everything from the context before restoring the stack pointer. In our case we use the scratch register r8 to hold the value that will end up in rsp:

    mov    0x28(%r11),%r8
    mov    0x80(%r11),%r11
    mov    %r8,%rsp
    jmpq   *%r11

It’s somewhat embarrassing that we could already generate the correct code, but only did so when running under Valgrind, because it gave us a warning about it that we apparently thought was a false positive.

Looking at our code for x86, after fixing AMD64, we found that we had the same bug there, but nobody had reported it yet, probably because it was harder to trigger.

  1. Reality is a tiny bit more complicated: Sending a signal to a thread is asynchronous, meaning that the GC has no direct way of knowing that a thread has already gotten the signal and reacted to it. Instead, the suspend signal handlers message back to the GC that the point of suspension has been reached, upon which they actually suspend, waiting for yet another signal from the GC, telling them to resume. Note that if the resume signal arrived between the point where the GC is notified of the suspension and the point of initiating the wait for said resume signal, the signal would be lost and the thread would never resume.  ↩

How to Split a Pizza

Assume two people want to split a pizza in a fair way. In our context, fair means that if both people follow the procedure, each person gets half a pizza, or, if not, only has themselves to blame. The obvious answer seems to be “split it in the middle”, but that leads to all kinds of further questions and complications: Who does the splitting? How is “the middle” determined? Even if we had protocols for that, it would still leave the possibility of one person not being satisfied with how they were carried out—sloppy measurements or sloppy cutting.

The standard way to avoid all such problems is to let one person cut the pizza in two halves, and the other person choose who gets which half. If the cutter thinks they got the smaller piece it’s because they didn’t cut fairly or precisely enough, and if the chooser thinks their piece is smaller, they should have chosen the bigger one.

What if there are three people? Before we try to come up with a procedure, let’s consider people’s motivations first. In the case of two people, whatever one person wins, the other loses. With three people, one person might elect to get a smaller piece for themselves in order to also make one of the other two worse off. Or, more realistically, two people might collude against the third. We’d like a procedure that ensures that even under those circumstances, fairness is guaranteed.

One generalization of the two-people procedure that comes to mind is to let one person—let’s call them “A”—cut, and then let the two others choose, first B, then C. Unfortunately, if A cuts in such a way that there is one large piece and two pieces that are each less than 1/3, and B chooses the large piece, C doesn’t get their fair share (neither does A, but that’s A’s fault). Randomizing the order of cutter and choosers is not a solution, either, because there is still the possibility of somebody getting the short end of the stick.

Here’s a procedure that works: A cuts the pizza, B chooses their slice. C has two options: they can either choose one of the remaining two slices, or they can call shenanigans. In the latter case, B picks a second slice[1], then both slices are put together and C divides them into two halves. B picks one half, C gets the other.

To see why this procedure is fair, let’s consider each person in turn. A will always get the piece that B and C didn’t pick, so if A cut fairly, they get their fair share. B will either get the piece they pick, or at least half of the two pieces they pick, so if they pick the two largest pieces, they will get at least one third of the pizza.

C will either get the piece they pick or, in the case of shenanigans, half of the two pieces that B picked. It’s not obvious that they will get at least a third in the latter case. We can assume that C only calls shenanigans if the two pieces that are left after B picked one are both smaller than one third; if one piece is at least a third, C should of course just pick it, thereby getting their fair share. The larger of the two remaining pieces will have size 1/3-x, the smaller one 1/3-x-y, where x>0 and y>=0 (if the two pieces are the same size, y is zero). The piece that B picked therefore has size 1/3+2x+y. B now picks one of the two smaller pieces. If B wants to minimize C’s share, B must pick the smallest piece, i.e. 1/3-x-y, which then gets put together with B’s original piece for C to cut. The two pieces together have size 2/3+x, so if C cuts fairly, they will actually get more than their fair share.
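
Spelled out, the arithmetic of the shenanigans case is simply:

(1/3 + 2x + y) + (1/3 - x - y) = 2/3 + x

and C’s half of that is

(2/3 + x) / 2 = 1/3 + x/2 > 1/3, since x > 0.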

I don’t know whether this procedure can be generalized to a larger number of people.

Update: As many commenters have thankfully pointed out, I have solved a problem that smarter people have already solved for more general cases. For a good overview of a few methods, read this article. My personal favorite is the Last Diminisher algorithm. The method I came up with is a variant of the three-person Lone Divider procedure.

  1. It is important that it is B, not C, who picks the second slice when C calls shenanigans.  ↩

SGen – Concurrency and Evacuation


In my last blog post about SGen, Mono’s garbage collector, I wrote about concurrent mark-and-sweep, which makes most of the garbage collection work happen concurrently with the running program.

One thing we cannot do in a concurrent collector, as opposed to a stop-the-world collector, is move objects around in memory[1]. This is typically not a problem in a mark-and-sweep collector, but SGen goes beyond basic mark-and-sweep and prevents fragmentation by evacuating objects when the need arises.

Fragmentation is no less of an issue in a concurrent collector, so being unable to do evacuation is problematic. Our best solution at this point is to stop the world when it happens[2]. The threshold we use to decide when to do evacuation with concurrent mark-and-sweep is higher than on our non-concurrent collector, but there is still a chance that a full collection pause will hit the program when interactive speeds are desired.


We’re introducing a new API in the Mono.Runtime class. The application can use it to ask the GC not to do full major collection pauses:

if (!Mono.Runtime.SetGCAllowSynchronousMajor (false))
        Console.WriteLine ("Sorry, the GC won't cooperate.");

// your low-latency code here

Mono.Runtime.SetGCAllowSynchronousMajor (true);

One possible use case would be a game, where you want the lowest latency possible while the player is in a level, but between levels, or during loads, a longer collection pause is not much of an issue. Note that there are still no guarantees about the lengths of the pauses—this is a best-effort approach, not a real-time garbage collector.

This API will be available in the next Mono release, but only via reflection. It will be a public API on mobile soon.
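
In the meantime, invoking it via reflection might look something like this; a sketch that assumes the method is a non-public static member of Mono.Runtime, as internal runtime methods usually are:

static bool SetGCAllowSynchronousMajor (bool flag) {
    var type = Type.GetType ("Mono.Runtime");
    if (type == null)
        return false;   // not running on Mono
    var method = type.GetMethod ("SetGCAllowSynchronousMajor",
                                 BindingFlags.Static | BindingFlags.NonPublic);
    if (method == null)
        return false;   // runtime doesn't have the API yet
    return (bool) method.Invoke (null, new object[] { flag });
}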

  1. This is technically possible, but it involves using a read barrier, which would not only be forbiddingly complicated to implement in Mono, but would also probably incur a significant performance hit as well as break our embedding API.  ↩

  2. An approach that would keep marking mostly concurrent would be to decide which blocks to evacuate before starting concurrent mark, and then during marking to keep tabs on the references from already marked objects to objects within those blocks. In the finishing collection pause we could evacuate the marked objects in those blocks, but we’d have to update the references to them. Depending on the workload that might still require a significant pause.  ↩

SGen – Concurrent Mark


Like most garbage collectors, SGen is of the stop-the-world variety, which means that when the time has come to perform its work, it stops all managed threads in their tracks and only resumes them once it has finished.

Nursery collections are usually relatively short affairs, and can be sped up even further by using a smaller nursery size. Major collections, on the other hand, can take a long time because they must scan every single live object in the system. Long garbage collection pauses are clearly a serious problem for many applications, be they interactive or on the server, and garbage collector engineers make considerable efforts to shorten pause times.

One approach to this is to do parts of the major collection concurrently with the running program, aka the mutator. SGen’s major collector, mark-and-sweep, consists of two main phases, unsurprisingly called mark and sweep.

Thanks to Zoltan’s recent work—lazy sweep—SGen already does most of sweep’s work outside of major collection pauses. I have been working on the other part of the story for the past few weeks: concurrent mark.

How it works

Our implementation of concurrent mark is very similar to that of Sun JVM’s concurrent mark-and-sweep (CMS) collector.

A major garbage collection starts with an initial marking pause, during which all roots are scanned.[1] Then it restarts the world and in the background a worker thread performs the marking and scanning of major heap objects, in the usual manner. When the concurrent marking phase is done, the collector stops the world again to finish up the collection in the final marking pause.

Dealing with concurrent mutation

Consider the following code, which manipulates a linked list, swapping the order between the two nodes following a:

1   b = a->next;
2   c = b->next;
3   a->next = c;
4   b->next = c->next;
5   c->next = b;

Let’s assume that right after a->next has been set in line 3, the concurrent worker thread scans and marks the node referenced by a, then follows next and processes c. If at that point our program hasn’t come around to setting c->next to b in line 5, the garbage collector will never encounter b—it will have been lost because for a brief moment that node wasn’t part of the list.

We can recruit the help of our friend the write barrier to deal with this issue, but we need to extend its scope a little: instead of just recording stores that point from the major heap to the nursery, it must now also record stores pointing from the major heap to the major heap. In other words: all reference stores that happen in the major heap, no matter where they point to, must be recorded.[2]

With this new write barrier the store to c->next will record the node c as requiring a re-scan. This re-scanning of all modified major heap objects happens in the final marking pause, so it’s not further disturbed by the mutator.

What if the final marking pause happens before c->next is set in line 5, however? The node b might not be pointed to by any object in the heap, so we won’t find it that way, yet it is still live. It’s live because it’s referenced by the variable b on the stack, so the final marking pause must also re-scan all roots.[3]

Nursery collections

Nursery collections that happen while concurrent mark is running stop the world, but concurrent mark keeps chugging along. It never follows references into the nursery anyway, so there is little potential for conflict.

There is an issue with the card table, however: nursery collections clear the card table, but its data is required during the final marking pause, which might only occur a few nursery collections later. The solution to this is to maintain a separate card table that accumulates the marked bits from the “regular” card table, and to use this union table in the final scan.
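
Conceptually, accumulating into the union table is just a bytewise OR over the card data, performed before the nursery collection clears the cards. Here’s a sketch, not SGen’s actual code:

// Fold the current card marks into the union table before they are
// cleared; the final marking pause then scans unionCards instead.
static void AccumulateCards (byte[] cards, byte[] unionCards) {
    for (int i = 0; i < cards.Length; i++)
        unionCards [i] |= cards [i];
}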


This graph shows maximum pause times for a few of the programs in our GC benchmarks suite. All of the pause times are relative to Boehm’s, so values below 1 are better than Boehm’s. Note that the values correspond to the longest stop-the-world pause that occurred when running the program, not the median or some average.

The three SGen configurations are standard mark-and-sweep in blue, concurrent mark-and-sweep in green and concurrent mark-and-sweep in conjunction with cementing, in yellow.

The programs represented in this graph are those whose maximum pause times with Boehm were longer than 10ms. The ones with the longest pause times are on the left—on graph4 Boehm had a pause of 2.5s—which are those that benefit most from concurrent mark, as expected: the longest pause on graph4 is less than 180ms with concurrent mark. I have not yet investigated why graph8 is such an outlier.

  1. For concurrent mark the roots not only include the usual suspects like thread stacks and global variables, but also nursery objects. In fact, the initial marking pause performs a nursery collection, after which only pinned objects will be left in the nursery. Those are treated as roots, or rather the references they contain are.  ↩

  2. Since we use a single card table for both kinds of references, this will increase the number of cards marked that don’t have references pertinent to nursery collections. The card scanning overhead will therefore increase, but in practice this doesn’t seem to be a problem. An alternative would be to use two separate card tables, one for references to nursery objects and the other for references to major heap objects. That would be more expensive both in terms of memory usage as well as in write barrier performance.  ↩

  3. This entails doing a full mark-scan loop, of course, which might still take a very long time in pathological cases. Imagine a program that keeps a huge linked list around, referenced only by a single object on the major heap. At some point concurrent mark kicks in. The program now loads the reference to the list into a local variable and nulls the reference on the heap. The garbage collector will not be able to mark the list until it scans the stack in the final marking pause, and will now have to traverse the list with the world stopped.  ↩

SGen – The Write Barrier


In previous installments of my blog post series on SGen, Mono’s generational garbage collector, I wrote about the major and minor heaps and their organizations. What I have for the most part neglected to pontificate on is how they interact with each other.

The advantage of generational garbage collectors comes from being able to collect the minor heap, also referred to as the nursery, very quickly because it doesn’t have to look at the objects in the major heap. There is a problem, however, which this illustration from my talk on SGen explains:

Let’s say we collect the nursery, starting from the single root reference. We find a graph of four live objects, colored blue, and promote them to the major heap, discarding all the others, including the three white ones. But wait—there’s a reference in some object on the major heap pointing to one of the objects we have just destroyed! This cannot stand.

Where did this reference come from? Consider the previous nursery collection. We’ll ignore pinning for now, so after that collection the nursery was empty. The mutator (that’s what garbage collector people affectionately call the pesky thing that insists on messing with the heap and that most people refer to as the program) allocated that white object, and stored a reference to it in an object on the major heap—in that order, but not necessarily in direct succession.

One way to handle those references is to scan the whole major heap for them and to treat them as roots. That would make nursery collections as slow as major collections, so it’s not a good idea. What’s needed is a way to record all such references happening between nursery collections. The nursery collector would then process them like ordinary roots and everything would be in order again.

Write barriers

Write barriers[1] do exactly that: On every store of a reference in an object they check whether the reference resides in the major heap and points to the nursery. If those conditions are satisfied, the reference is recorded for the benefit of the next nursery collection. This sounds like an awful overhead for each store, but it really isn’t, and for almost all applications it’s far outweighed by the advantages from generational collection.[2]

A note on terminology: The literature sometimes uses “write barrier” to refer to just the code that identifies references to remember, and sometimes to refer to both the code that identifies as well as records those references. I shall be using the term in the latter sense.

SGen’s write barrier is a so-called card table. We also implement an alternative, a sequential store buffer, which I will not discuss here.

Card tables

The basic idea is to conceptually divide the major heap into small, equally sized regions, called cards, and to have one bit per card that is set—the card is marked—when a reference to a nursery object is written to it.[3]

The advantages of this scheme are that it requires only a handful of instructions to record a store, so it can be inlined, and that its space overhead is small and proportional to the heap size. The disadvantage is that it doesn’t record the exact locations of the references we’re interested in.
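
To make that concrete, here is a rough sketch, in C#, of what the barrier conceptually does on every reference store. All the names are made up (IsMajorHeap and IsInNursery stand in for the runtime’s address range checks), and the real barrier is a handful of inlined instructions emitted by the JIT:

const int CardSize = 512;   // one card covers 512 bytes of heap
static byte[] cards;        // one mark byte per card (see footnote 3)
static IntPtr heapStart;    // start of the card-covered region

// Conceptually run for every store of the reference value into the
// heap slot at address slot.
static void WriteBarrier (IntPtr slot, IntPtr value) {
    if (IsMajorHeap (slot) && IsInNursery (value)) {
        long card = ((long) slot - (long) heapStart) / CardSize;
        cards [card] = 1;   // mark the card
    }
    // ... the actual reference store follows ...
}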

In SGen’s case, cards are 512 bytes large, so the nursery collector needs to scan roughly that amount of memory to potentially locate a single reference.[4] The references it does find it treats like ordinary roots: the referenced objects are promoted to the major heap and the references are updated to reflect their new locations. The newly promoted objects are then scanned in turn, of course.

Once all marked cards have been scanned there are no more references from the major heap to the nursery, so all card mark bits are cleared, to be ready for the next round.

Pinned objects

So far we have been assuming that when we encounter a reference from the major heap to the nursery, we can always move the nursery object to the major heap, thereby rendering the bothersome reference harmless.

As I explained in the blog post on the nursery, sometimes objects become pinned, which implies that they cannot be moved. This threatens to confound our cunning plan: we can only clear the card mark bits because there are no references left that we need to keep track of. The solution here is simple: Cards containing references to pinned objects must stay marked so that they will be processed again during the next nursery collection, in the hopes that the object referred to will no longer be pinned.

This can lead to another potential problem: Assume that some nursery object remains pinned, and that there are lots of objects in the major heap that refer to it.[5] Lots of cards would remain perpetually marked and nursery collections would become dominated by card scanning. That is in fact what we noticed on two benchmarks in our small GC benchmark suite—nursery collections took much longer than expected and the programs took twice as long or more to run with SGen than with Boehm.


I just implemented a new feature—not yet pushed to master—that solves this new problem in a very simple way. I call it “cementing”, because it institutionalizes bothersome pinned objects: if we encounter a certain number of references from the major heap to a particular pinned nursery object, we just give up on it. Instead of hoping that it will become unpinned soon, we cement it—the punishment for badly behaved pinned objects. A cemented object will always be pinned at the start of future nursery collections as a matter of principle. The consequences of this are that it will be treated as a root, and that it will not be moved, which in turn means that we don’t have to worry about references to it from the major heap any more: the object is kept alive and the references can stay the same. Cards with references to cemented objects can be cleared, reducing the card scanning load.
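
In sketch form, the bookkeeping could look something like the following; the names and the threshold value are invented for illustration, and the real implementation is C code inside the runtime:

const int CementThreshold = 1000;   // invented value for illustration

static Dictionary<IntPtr, int> refCounts = new Dictionary<IntPtr, int> ();
static HashSet<IntPtr> cemented = new HashSet<IntPtr> ();

// Called during card scanning for each reference from the major heap
// to a pinned nursery object.
static void NotePinnedReference (IntPtr pinnedObj) {
    int count;
    refCounts.TryGetValue (pinnedObj, out count);
    refCounts [pinnedObj] = ++count;
    if (count >= CementThreshold)
        cemented.Add (pinnedObj);   // give up: treat it as a root from now on
}

// A major collection rescans all live objects anyway, so it resets
// the cementing state, giving those objects another chance.
static void ResetCementing () {
    refCounts.Clear ();
    cemented.Clear ();
}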

Major collections give us a blank slate because we need to scan all live major heap objects anyway, so we use them as opportunities to reset cementing, giving those poor objects another chance after all.

The preliminary results from cementing are encouraging. It doesn’t seem to introduce performance regressions, and for the two benchmarks affected, it works wonders:

The red bars represent the wall clock run time with Boehm. The green bars are SGen run times without cementing, and blue are SGen with cementing.

  1. There is at least one other meaning for the term “write barrier”, namely a machine instruction (or series of instructions) that ensures that all stores preceding it are seen by other threads before the stores following it. Those are not the barriers we’re looking for.  ↩

  2. An alternative way of solving the reference problem is a read barrier. Read barriers make it possible to write moving concurrent collectors, a magnificent example of which is Azul’s C4 collector. If you want a glimpse of the future of garbage collection, read the paper.  ↩

  3. SGen uses a whole byte per card for marking. Not only can whole bytes be written easier than single bits, but multiple threads can do so without conflicts between contiguous ones. If we used individual bits, we’d have to use CAS to avoid losing marks by other threads.  ↩

  4. Nitpickers will notice that a marked card might not even contain a single reference to a nursery object—the mutator can write a reference, hence mark the card, and then later overwrite it with null, for example.  ↩

  5. A simple way to trigger this is to allocate some object high up on the call stack, keep a reference to it on the stack, and then to construct a long list, with every node containing a reference to said object. Keep the list around and its nodes will eventually be promoted, staining the major heap with perpetually marked cards.  ↩