Fast Virtual Generic Calls

Until about a week ago, Mono did not have an efficient implementation of virtual generic method calls. Unlike non-virtual generic methods, it is not know at compile-time which method must be called, so some kind of look-up needs to be performed. However, unlike non-generic virtual methods, there can be more than one instantiation of a particular virtual generic method per class, and it is not known in advance how many or which instantiations will be needed, so a traditional virtual method table cannot be used. Mono’s previous solution was to do the look-up dynamically, by calling into unmanaged code, which is very inefficient. According to a simple benchmark I wrote, performing a generic virtual call was about 200 times slower than performing a non-generic virtual call.

My first stab at implementing a fast virtual generic call mechanism involved the runtime generic context data structure I implemented for generic code sharing. The idea was to use the RGCTX as an extensible virtual generic method table. Whenever a new instantiation of a virtual generic method was needed, we would reserve a new slot for it in the RGCTX. The performance this approach achieved was quite impressive, making generic virtual calls only about 3 times slower than non-generic virtual calls, but memory usage made it impractical. For only 64 different instantiations of virtual generic methods one benchmark required 46k more memory than with the old implementation.

Another implementation idea came from Paolo, who wanted to reuse the machinery we have for doing interface method lookups. Every class’s VTable has a fixed number of slots set aside for interface methods. These slots are used as a hash table, with pointers to the individual interface method’s metadata as the keys. Since the interface method to be called is known at compile-time (but not which particular implementation of it), the hash can be computed then and used as a constant index into the hash table. Thus, an interface method call is done in almost the same way as a “normal” virtual method call. Problems occur only when more than one method hashes to the same slot. In that case, a small piece of code, called an “IMT thunk”, is generated, which resolves the lookup by doing a hard-coded search for the method’s implementation, given the method metadata pointer, which is passed as an implicit argument in interface calls.

Applying this technique to virtual generic methods requires reserving a single VTable slot for each virtual generic method, which points to a thunk that performs the look-up of the actual implementation, given its type arguments, which we represent by a single pointer. In contrast to interface methods, where we have a complete list of all methods, the required instantiations of a virtual generic method become know one by one during run-time, so we do a fall-back into unmanaged code if the required instantiation is not found. There the thunk might be re-built to include the new instantiation.

Currently the policy for deciding which instantiations to include in the thunk is very simple: If an instantiation was invoked at least 10 times then it will be included. The potential danger of this policy is that the thunks might grow too big. We discussed a few alternatives, like having an upper limit on the number of instantiations and including only the most frequently used ones, but decided to keep the current implementation until we run into problems.

Performance numbers are a bit harder to come by for this implementation since the speed of the look-up depends on the size of the thunk. For small thunks the performance seems to be comparable to that of my first approach, i.e. about 3 times slower than non-generic virtual calls. Interface calls are not supported, yet.

Advertisements

One thought on “Fast Virtual Generic Calls

  1. What about using multiple class descriptors (including vtable) for the same class? For example if List and List don’t share all their virtual methods, create two different full class descriptors, one for List and List.

    I’m sure you’ve considered this; I bring it up only because I’m curious to know why this doesn’t work for you guys. Too much memory spent on class descriptors? Or maybe I misunderstand the problem?

Comments are closed.