Mono is a .NET just-in-time (JIT) compiler, which means that it translates .NET code (CIL code, to be precise) into native machine code on the fly. A very frequent operation is looking up where the native code for a given piece of .NET code (i.e. for a method) is. The opposite happens less frequently, namely looking up which method corresponds to a piece of native code. That need arises, for example, when walking the stack to find the method which should catch an exception, or when printing stack traces.
To that end Mono uses a lookup table which, until yesterday, required locking for readers and writers. Since the table isn’t used very often that’s not a real scalability problem. Paolo’s new garbage collector, however, requires non-blocking read access to the table, so a new design was necessary.
The old JIT info table, as it is called, was a simple sorted array of pointers to the methods’ JIT info structures, which in turn contain pointers to the methods they describe. The table was sorted by the addresses of the native code, so looking up a method given a native code address just required a binary search. Inserting and deleting elements required shifting elements up and down, but that was never a performance issue.
Being as lazy as I am, and considering that we always need to be careful about using space-efficient data structures, my first thought was whether it wasn’t possible to make the sorted array lock-free for readers. It turns out it is. Let’s pretend we only want to insert and never remove elements. If we are very careful we can insert an element and at each point during the process make the following two promises:
- The array is always in a sorted state.
- If an element is replaced by another one, then the replacement will be less than or equal to the replaced element.
These two guarantees imply that we can still do a binary search on the array while elements are being added. The worst thing that can happen is that we end up not exactly at the element we’re looking for but a little bit to the left of it (but never to the right), so we have to do a quick local linear search to the right afterwards.
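As a minimal sketch of the reader side (plain ints stand in for native code addresses, and `find_leq` is a name I made up): the binary search may land left of the true match while an insert is in progress, and a linear scan to the right corrects for that.

```c
/* Sketch: `a` is the sorted array, and we want the last element <= key.
 * Under the two promises above, a concurrent insert can only make the
 * binary search land too far left, so a rightward scan fixes it up. */
static int find_leq(const int *a, int n, int key) {
    int lo = 0, hi = n;                /* search window is [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    int i = lo - 1;                    /* candidate: last element <= key */
    while (i + 1 < n && a[i + 1] <= key)
        i++;                           /* correct for concurrent shifts */
    return i;                          /* -1 if every element is > key */
}
```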
The way to add an element and keep those promises is by doing it in three steps:
- Find the position where to put the new element.
- Shift up the elements to the right of it, one by one, starting at the end of the array and working towards the left.
- Put the new element in place.
As an example, let’s take this array:
2 3 5 7 11 17 19 23
and insert the number 13. The first step is easy and can be implemented with a binary search: the place for the new element is between 11 and 17. Now we shift up the elements one by one, first 23, then 19, then 17:
2 3 5 7 11 17 19 23 23
2 3 5 7 11 17 19 19 23
2 3 5 7 11 17 17 19 23
Note that we end up with duplicate elements during the process, but the array remains sorted. Now we just need to insert the 13 at its correct place and we end up with:
2 3 5 7 11 13 17 19 23
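The three steps can be sketched in C, using C11 atomics for the ordered stores (Mono itself predates C11 and uses its own memory-barrier macros; the names here are hypothetical, and a single writer is assumed):

```c
#include <stdatomic.h>

/* Sketch of the lock-free insert, assuming one writer and an array
 * with room for one more element.  Each store is a release store so
 * readers see the shifts in order; a reader that sees a stale `count`
 * merely misses the newest element, which is harmless. */
static void insert_sorted(atomic_int *a, int *count, int value) {
    int n = *count;
    int pos = 0;
    while (pos < n && atomic_load(&a[pos]) < value)
        pos++;                          /* step 1: find the slot */
    for (int i = n; i > pos; i--) {     /* step 2: shift up, right to left */
        int v = atomic_load(&a[i - 1]);
        atomic_store_explicit(&a[i], v, memory_order_release);
    }
    atomic_store_explicit(&a[pos], value, memory_order_release);
    *count = n + 1;                     /* step 3 done; publish new size */
}
```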
So far so good. Now, how do we delete elements? Doing the same process in reverse breaks our promises and the binary search wouldn’t work anymore without locking. For such cases lock-free algorithms use a trick called “tombstones”. Instead of removing the element, we turn it into a tombstone, essentially just marking it dead. It’s still there, so our promises still hold, but when we look for an element we have to make sure to ignore the tombstones.
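As a sketch, a tombstone can be just a flag next to the entry’s key, so the sort order is preserved (the layout and names here are illustrative, not Mono’s actual structures):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative entry: the key (standing in for a code address) stays
 * in place so the array remains sorted; only the flag changes. */
struct entry { int key; atomic_bool dead; };

static void tombstone(struct entry *e) {
    atomic_store(&e->dead, true);      /* "remove" without moving anything */
}

/* Readers skip dead entries (linear search for brevity). */
static struct entry *lookup(struct entry *a, int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i].key == key && !atomic_load(&a[i].dead))
            return &a[i];
    return NULL;
}
```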
Using tombstones leads to another problem, however, namely the fact that if we never really remove elements the table can only grow and never shrink. Upon closer inspection we have to deal with a similar problem anyway, namely an overflow of the array, which can only be allocated with a limited size. In such a case we simply allocate a new, bigger, array, copy the whole table over there, without the tombstones, and then, in one atomic step, replace the old table with the new one. We have to make sure not to throw away the old table just yet, though, because there might still be other threads reading from it. That’s a problem I’ll be talking about further down.
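The reallocation step might look roughly like this (a sketch with an illustrative entry layout; `current_table` and all the names are hypothetical):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct entry { int key; int dead; };
struct jit_table { int count; int capacity; struct entry entries[]; };

static _Atomic(struct jit_table *) current_table;

/* Copy the live entries into a bigger table, then publish it with one
 * atomic release store.  The old table is NOT freed here: other
 * threads may still be reading from it. */
static struct jit_table *compact_into(struct jit_table *old, int new_capacity) {
    struct jit_table *fresh =
        malloc(sizeof *fresh + new_capacity * sizeof(struct entry));
    fresh->capacity = new_capacity;
    fresh->count = 0;
    for (int i = 0; i < old->count; i++)
        if (!old->entries[i].dead)     /* leave the tombstones behind */
            fresh->entries[fresh->count++] = old->entries[i];
    atomic_store_explicit(&current_table, fresh, memory_order_release);
    return fresh;                      /* caller retires `old` later */
}
```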
The shifting-up step in the insertion process has to be done in exactly the way described, so memmove() cannot be used, because there’s no guarantee about how it goes about its business. Furthermore, we have to make sure that the writes to the array become visible to all other threads in the correct order, which requires a write barrier after every shift step, and that’s very expensive. Too expensive, as it turns out, because that implementation results in a significant overall slow-down.
To avoid shifting so many elements Paolo suggested splitting up the table into a number of small fixed-size chunks. The implementation is quite similar to a B+-tree, the main difference being that it is always two levels deep and the root node can grow as big as necessary. In this data structure, insertions are local, so only a few elements need to be shifted. When an overflow in one of the chunks occurs, we have several strategies:
First, we count the number of non-tombstone elements in the whole table. If that count is below a low-water mark or above a high-water mark (both relative to the capacity of the table), we reallocate the whole table, making sure that not too much space is wasted and that we still have enough room to grow.
If we don’t reallocate the whole table we have to deal with the overflowing chunk locally. We inspect it to see if it contains any tombstones. If it does, we reallocate only the chunk, leave out the tombstones and insert the new element.
If the chunk does not contain any tombstones it really is full. In that last case we split it up into two chunks, which also requires reallocating the root node, but none of the other chunks. There might even be a way to handle this case without reallocating the root node, using an insertion strategy similar to the one we use in the chunks, but I didn’t go that far.
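The two-level layout can be sketched like this (`CHUNK_SIZE` and all names are illustrative; Mono’s real chunks are larger and hold JIT info pointers rather than ints):

```c
#include <stddef.h>

#define CHUNK_SIZE 4                   /* illustrative; the real one is larger */

/* Two-level structure: a growable root of chunk pointers over small
 * fixed-size sorted chunks.  An insert shifts elements only inside one
 * chunk, so at most CHUNK_SIZE - 1 elements move. */
struct chunk { int count; int keys[CHUNK_SIZE]; };
struct chunked_table { int num_chunks; struct chunk **chunks; };

static int chunk_max(const struct chunk *c) {
    return c->keys[c->count - 1];      /* chunks are sorted, so this is the max */
}

/* Pick the chunk that could contain `key` (linear scan for brevity;
 * the real thing would binary-search the root). */
static struct chunk *find_chunk(const struct chunked_table *t, int key) {
    for (int i = 0; i < t->num_chunks; i++)
        if (key <= chunk_max(t->chunks[i]))
            return t->chunks[i];
    return t->chunks[t->num_chunks - 1];
}
```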
Using this new data structure improves performance to the point where there shouldn’t be a noticeable difference to the old, locking implementation.
I’ve mentioned above that when we replace the table with a new copy we have to make sure that we free the old table only when there are no more reader threads using it. To achieve this without locking we use hazard pointers. A hazard pointer is a very cheap and simple way for a thread to signal that it’s using some data structure. Read Maged Michael’s paper for details.
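The core of the hazard-pointer protocol fits in a few lines (a one-hazard-per-thread sketch; slot management and retirement lists are left out, and the names are mine, not Mono’s):

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 8                  /* illustrative fixed bound */

static _Atomic(void *) hazard[MAX_THREADS];  /* one published pointer each */
static _Atomic(void *) the_table;            /* the current JIT info table */

/* Reader: publish the pointer, then re-check that it is still current.
 * If the table was swapped in between, retry; once the loop exits, a
 * reclaimer is guaranteed to see our hazard pointer before freeing. */
static void *acquire_table(int tid) {
    void *t;
    do {
        t = atomic_load(&the_table);
        atomic_store(&hazard[tid], t);
    } while (atomic_load(&the_table) != t);
    return t;
}

static void release_table(int tid) {
    atomic_store(&hazard[tid], NULL);
}

/* Reclaimer: an old table may be freed only if no thread currently has
 * it published as a hazard pointer. */
static int can_free(void *old) {
    for (int i = 0; i < MAX_THREADS; i++)
        if (atomic_load(&hazard[i]) == old)
            return 0;
    return 1;
}
```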
There are some further details I won’t go into here. For example, the JIT info table doesn’t map from exact locations to methods, but from code regions, i.e., intervals, which complicates things a bit, but not very much.