Juggling, Photography, Software, and Atheism

May 16, 2009

Wolfram Alpha disappoints

Filed under: Uncategorized — by schani @ 3:19 pm

Wolfram Alpha, the much-hyped “computational knowledge engine” by Wolfram Research, the people behind Mathematica, went online last night. I played a bit with it today, and I am disappointed. Alpha is not what it’s cracked up to be.

Alpha is hard to figure out

Alpha is supposed to figure out what you mean by your query and present the answer. Unfortunately, as with every natural language processing system in existence, it often fails to understand you. In the end, what that means is that you have to learn how Alpha parses language and adjust your input accordingly.

For example, I tried to get Alpha to give me a plot of infant mortality vs gross domestic product. It doesn’t seem to be able to do that, but that’s not the point. Rather, my attempts illustrate the problems with Alpha’s language parsing.

One of the examples for Alpha’s usage is the comparison between the Pioneer 10 and 11 space probes. You can enter, for example, “Pioneer 10 vs Pioneer 11″, or “Pioneer 10, Pioneer 11″ and get the same nice comparison tables. Enter “infant mortality vs gdp”, however, and you get the result “1 child deaths | $54.61 trillion per year”. I don’t know what that even means. Enter “infant mortality, gdp” and you get “5.499×10^7 person children/yr | $54.61 trillion per year”, which seems to be the number of infant deaths per year in the whole world (but isn’t) and the sum of the GDP’s of all countries. Enter “plot of infant mortality, gdp” or “plot of infant mortality vs gdp” and you get the same nonsensical “1 child deaths | $54.61 trillion per year” answer, but in a slightly different table form.

Interestingly, if you’re interested in mortality rather than infant mortality, Alpha will give you the plot if you ask for “mortality vs gdp” or “mortality, gdp”.

Upon closer inspection the problem is that Alpha knows nothing about infant mortality (even though that data is available in Mathematica’s CountryData) but interprets “infant mortality” as “child multiplied by mortality”. The one number it gave that seemed like it might be the number of infant deaths per year was actually the total number of deaths per year, multiplied by one child, i.e. pure nonsense.

Another thing I tried was something similar to an example they give for finding all words ending with “ager”, the query for which is “words ending with ager”. What if I want to find all words containing “ager” instead? The search “words containing ager” gives a list of words which contain each of the letters in “ager”, but not necessarily in that order. How to get the correct answer, or rather how to correctly pose the question, is anybody’s guess.

Alpha gives wrong data

Not only does Alpha sometimes give nonsensical answers which look like they might mean something, it also presents completely wrong data now and then. For example, when asking for information on the Spanish language, it tells us that there are 321.6 million native speakers and 322.3 million speakers in total, which would mean that only 700 thousand people speak Spanish as a second language, or only 0.2% of the total number of speakers. I find that very hard to believe. According to Alpha, that proportion is 36% for English, 17% for Russian, 12% for German, over 8.6% for Italian and nearly 6% for Turkish. If we are to believe Alpha there are more people who speak Czech as a second language than there are who speak Spanish non-natively.

But wait – Alpha has a feature which lists the sources of its data! Clicking on that link gives us a list of “Background sources and references” (among them Wikipedia), with the nice hint that those sources might not actually be where the data is from. As “Primary source” is lists only one item: “Wolfram|Alpha curated data”. In other words, you can either believe Wolfram or find the correct information yourself.

Alpha doesn’t give all data

To add insult to injury, Alpha also fails to indicate that the data it has is incomplete. Looking up the German language we are informed that it has 60% lexical similarity to English and 29% to French. No other languages are given as points of comparison, leading one to believe that those are the two top ones. However, German is much more closely related to, for example, Dutch, but apparently Alpha doesn’t have that data, and it doesn’t tell us, either.

Alpha isn’t hyperlinked

There are almost no hyperlinks or further explanations in the data Alpha spits out, not even for the properties of whatever it is that is presented. One of the pieces of information on the icosahedron, for example, is its dual polyhedron. What is a dual polyhedron? Alpha has no idea. Wikipedia knows.

Alpha is incomplete and very biased

When asked about “Turing”, Alpha spits out a short table with biographical data about Alan Turing. Regarding his work all it seems to know is that he was a mathematician. “Turing machine” produces no data at all and neither does “Lambda calculus”, whereas Alpha seems to know a whole lot about cellular automata, Stephen Wolfram’s pet subject.

And, possibly worst of all, Alpha knows nothing about juggling or siteswaps.

May 12, 2009

Austria officially prefers superstition to science

Filed under: Atheism, Politics — by schani @ 9:20 pm

Austria cannot affort to participate in CERN anymore. We simply cannot spare the 17 million Euros per year to finance the probably most important and exciting experiment in physics right now because we need to relieve the tax burden on membership fees to, say, a homophobic organization whose leader proclaims that condoms will increase the spread of HIV. Or to an organization that maintains that using blood transfusions to treat an ill child is immoral. And those tax reliefs cost us 30 million Euros, so sorry, science – we can’t afford you anymore.

December 25, 2008

A Solver for Puzzle Quest and Aurora Feint

Filed under: Software — by schani @ 9:07 pm
Tags: ,

Puzzle Quest and Aurora Feint are two very similar games. In both, you have to match three or more square blocks with the same color horizontally or vertically by exchanging two neighboring blocks. Those matched blocks are then eliminated from the board and the blocks above them fall down to take their place. Both games have special pre-built maps which challenge the player to clear all blocks. In Puzzle Quest the difficulty is that only moves which result in matched blocks are permitted, whereas Aurora Feint requires that the challenge be completed with a certain number of moves or less.

Being too lazy to think for myself I wrote a program to solve those maps for me. I have set up a public Mercurial repository for it at http://hg.13thfloor.at/hg/puzzlequest/.

Here’s how to solve an Aurora Feint puzzle with a maximum of two moves:

$ ./puzzlequest -a -d2 <af-puzzles/fall_rain.2
. . . . . .
. . . . . .
. . . . . .
. . . g . .
. . b g . .
. g b b . .
 
2,1 -> 3,1
 
. . . . . .
. . . . . .
. . . . . .
. . . g . .
. . g b . .
. g b b . .
 
3,0 -> 4,0
 
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .

Coordinates are given in the form column,row, with the lower left block as the origin (0,0).

The program is written in Haskell. It’s only my second program written in that language, and it shows: it’s neither elegant nor efficient. My first Haskell program was an interpreter of a small subset of Scheme, which I’ve made a point of implementing as the first thing in any new programming language I learn, a practice I can highly recommend. Here are interpreters I’ve written in Forth and O’Caml.

November 25, 2008

Jim Purbrick

Filed under: Photography, Software — by schani @ 6:54 pm

Jim Purbrick

Jim Purbrick of Linden Lab, the guy who made Second Life run on Mono, shot after the Mono Summit 2007. Yes, I’m a year behind with my photo processing.

More Mono people

November 9, 2008

Shooting Jonglissimo

Filed under: Juggling, Photography — by schani @ 2:13 am
Tags: ,

Last weekend I had the pleasure of photographing my friends Christoph and Manuel Mitasch, aka Jonglissimo, a world-class club passing duo, holders of four club-passing world records and two-time IJA stage competition gold medal winners.

We did the first shooting session at the Stephansplatz at around three in the morning. It was seriously cold but – our reason for shooting at such an uncommon hour – there were almost no people there:

Photo

Photo

After a few hours of sleep we had another session in a studio, where I learnt a few things: Motion-freezing juggling photos tend to suck. Even if you shoot from an uncommon angle they’re barely more than tolerable:

Photo

Shooting with glowing props is much more interesting:

Photo

Five club double backcrosses – I should have shot this from a ladder and with a longer exposure:

Photo

Club swinging sucks, but it looks kinda cool:

Photo

Shooting people while they’re jumping is fun:

Photo

Even more so if you throw stuff at them while they’re in the air:

Photo

Obligatory back-lit portrait:

Photo

Manuel:

Photo

More photos of jugglers.

Faster Synchronized Methods

Filed under: Software — by schani @ 1:23 am
Tags: , ,

Until about a week ago Mono’s Monitor implementation, which is used for synchronized methods and the lock statement, was comparatively slow. Each Monitor.Enter and Monitor.Exit called into unmanaged code, which is very inefficient compared to a normal method call. We improved this situation by implementing fast-paths that can be called cheaply and that can handle most cases.

The fast-paths come in two varieties: Portable fast-paths, implemented in CIL code, and native fast-paths, in hand-optimized assembler code. We only have assembler fast-paths for x86 and AMD64, as those are the most-used architectures that Mono runs on.

I did some micro-benchmarking to compare the fast-paths with the old unoptimized implementation. Each of the tests does 100 million calls to synchronized methods. Two of the tests call static methods, the other two non-static ones, and two of the tests call empty non-recursive methods whereas the other two call recursive ones.

Here are the results for my x86 machine (2.16 GHz Core Duo):

Chart

The blue bars represent our old implementation. The orange bars stand for the IL fast-paths and the yellow bars for the assembler fast-paths. The green bars represent the run-times for non-synchronized methods.

The results for my AMD64 machine (1.86 GHz Core 2) look quite similar:

Chart

To finish things off I also ran the benchmark on Microsoft .NET on my x86 machine. Here is the comparison to Mono with the assembler fastpaths:

Chart

October 12, 2008

Fast Virtual Generic Calls

Filed under: Software — by schani @ 5:02 pm
Tags: , ,

Until about a week ago, Mono did not have an efficient implementation of virtual generic method calls. Unlike non-virtual generic methods, it is not know at compile-time which method must be called, so some kind of look-up needs to be performed. However, unlike non-generic virtual methods, there can be more than one instantiation of a particular virtual generic method per class, and it is not known in advance how many or which instantiations will be needed, so a traditional virtual method table cannot be used. Mono’s previous solution was to do the look-up dynamically, by calling into unmanaged code, which is very inefficient. According to a simple benchmark I wrote, performing a generic virtual call was about 200 times slower than performing a non-generic virtual call.

My first stab at implementing a fast virtual generic call mechanism involved the runtime generic context data structure I implemented for generic code sharing. The idea was to use the RGCTX as an extensible virtual generic method table. Whenever a new instantiation of a virtual generic method was needed, we would reserve a new slot for it in the RGCTX. The performance this approach achieved was quite impressive, making generic virtual calls only about 3 times slower than non-generic virtual calls, but memory usage made it impractical. For only 64 different instantiations of virtual generic methods one benchmark required 46k more memory than with the old implementation.

Another implementation idea came from Paolo, who wanted to reuse the machinery we have for doing interface method lookups. Every class’s VTable has a fixed number of slots set aside for interface methods. These slots are used as a hash table, with pointers to the individual interface method’s metadata as the keys. Since the interface method to be called is known at compile-time (but not which particular implementation of it), the hash can be computed then and used as a constant index into the hash table. Thus, an interface method call is done in almost the same way as a “normal” virtual method call. Problems occur only when more than one method hashes to the same slot. In that case, a small piece of code, called an “IMT thunk”, is generated, which resolves the lookup by doing a hard-coded search for the method’s implementation, given the method metadata pointer, which is passed as an implicit argument in interface calls.

Applying this technique to virtual generic methods requires reserving a single VTable slot for each virtual generic method, which points to a thunk that performs the look-up of the actual implementation, given its type arguments, which we represent by a single pointer. In contrast to interface methods, where we have a complete list of all methods, the required instantiations of a virtual generic method become know one by one during run-time, so we do a fall-back into unmanaged code if the required instantiation is not found. There the thunk might be re-built to include the new instantiation.

Currently the policy for deciding which instantiations to include in the thunk is very simple: If an instantiation was invoked at least 10 times then it will be included. The potential danger of this policy is that the thunks might grow too big. We discussed a few alternatives, like having an upper limit on the number of instantiations and including only the most frequently used ones, but decided to keep the current implementation until we run into problems.

Performance numbers are a bit harder to come by for this implementation since the speed of the look-up depends on the size of the thunk. For small thunks the performance seems to be comparable to that of my first approach, i.e. about 3 times slower than non-generic virtual calls. Interface calls are not supported, yet.

September 12, 2008

Hack Week 3: MathMap

Filed under: Photography, Software — by schani @ 5:39 pm
Tags: , , ,

MathMap Screenshot

I’ve used this Hack Week, like the last one, to work on my long-time pet project MathMap, culminating in two releases, 1.3.3 and 1.3.4.

MathMap 1.3.3, released early in the week, features an optically much improved composer, implemented by Herbert Pötzl, as you can see in the screenshot above.

Also new is a Gaussian blur filter which can be used in the composer as well as in hand-written filters. What’s noteworthy about this filter is that it’s the first filter that is not implemented in the MathMap language, but rather in MathMap’s implementation language, C, which is why I call it a “native filter”. The reason for this is that the MathMap language, which is very similar in spirit to pixel shader languages (but more powerful), does not lend itself well to filters which cannot be efficiently calculated for each pixel separately. The only way to implement it would be for every pixel to be calculated to loop over its pixel region (the size of which depends on the blur radius) and sum up the pixels.

At the end of the week I released MathMap 1.3.4, which, apart from bug fixes and performance improvements, adds yet another native filter (three, actually), namely convolution. It doesn’t realize its full potential yet, because there’s no sane way of providing pixel-precise convolution kernels, but I’ll fix that sooner or later.

To give you a feel for the new features of MathMap I’ve put together a new screencast, despite grave technical difficulties. It focusses mostly on the composer and the new native filters. I’m also presenting an interesting way to make black and white images which you might find interesting independent of MathMap. Have fun!

June 27, 2008

Another Generic Sharing Update

Filed under: Software — by schani @ 7:37 pm
Tags: , ,

Since my last report on generic code sharing I chased down a few bugs we uncovered when trying out IronPython 2.0. That new version uses the Microsoft Dynamic Language Runtime, which extensively utilizes generics. One issue we came across was how to figure out the actual method for a delegate when only the native code pointer (acquired with ldftn) and no target class is given. For example:

public class Gen<T> {
  public void work () { ... }
}

With generic code sharing the methods Gen<string>.work and Gen<object>.work will share the same native code, so given only a pointer to it it’s not possible to differentiate between the two. What one could do to make it possible to tell between the two would be to let ldftn produce a pointer not to the method directly but to a small piece of trampoline code for which there is one for each instantiation of the method. Fortunately it seems like we don’t have to bother with that, since the .NET CLR doesn’t either. Instead it gives you the instantiation of the method where all type arguments are object, so we do the same.

Another thing I did was implement sharing of methods of generic value types. There doesn’t seem too much code out there which utilizes generic value types extensively, but it wasn’t a big deal to implement so I went ahead and did it. Since instances of value types don’t contain VTable pointers we need to pass the runtime generic context (RGCTX) explicitly for all methods, like we do for static methods of reference types. One complication that arises here is when the value type implements an interface. When casting such a value type to the interface type it gets boxed and receives a VTable for the interface methods. Since the caller of those methods doesn’t know it’s dealing with a value type, much less which particular one, it cannot pass the RGCTX, so the methods in the interface VTable need a wrapper which will pass it. This is very similar to the wrapper we use when taking the address of a static method of a reference type (for constructing a delegate, for example).

I’ll end with an updated table of memory statistics for a few test applications. “Nemerle” is the Nemerle compiler compiling itself. “IronPython 2.0″ is running pystone. “F# 1.9″ is running a simple “Hello world” program on the command line and “F# 2.0″ is compiling a simple program.


No sharing Sharing
Methods
compiled
Code
produced
Methods
compiled
Code
produced
Memory for
(M)RGCTXs
Savings
Nemerle 7127 2008k 6159 1895k 23k 90k
IronPython 2.0 9060 1607k 5833 1011k 42k 554k
F# 1.9 15268 2187k 9828 1659k 111k 417k
F# 2.0 27186 3781k 15828 2830k 239k 712k

June 2, 2008

Sharing Generic Methods

Filed under: Software — by schani @ 7:12 pm
Tags: , ,

All of my posts on sharing generic code so far have been about sharing non-generic methods of generic classes, like this one:

class Gen<T> {
  public T [] newArr (int n) {
    return new T [n];
  }
}

But what about generic methods, like this one:

class Gen<S> {
  public Dictionary<S,T> newDict<T> () {
    return new Dictionary<S,T> ();
  }
}

If we want to share this methods between different instantiations, i.e. different type values of T, we need to provide a place for the code to look up the type of Dictionary<S,T>. This place cannot be the runtime generic context, because the data in there only depends on the type arguments of the class, i.e. S, but not of generic methods.

Our solution is to introduce a data structure very similar to the runtime generic context, called the method runtime generic context, or MRGCTX. It is associated not with generic classes and their type arguments, like the RGCTX, but with generic methods and their type arguments. We use the same MRGCTX for generic methods of a specific class if the method type arguments are the same. As an example, these methods would all share the same MRGCTX:

Gen<object>.foo<object,string> ()
Gen<object>.bar<object,string> ()
Gen<object>.quux<object,string> ()

while no two of these methods would use the same MRGCTX:

Gen<object>.foo<object,string> ()
Gen<object>.foo<string,object> ()
Gen<string>.foo<string,object> ()
Ban<string>.foo<string,object> ()

The MRGCTX contains, apart from the RGCTX-like slots, two items of data: A pointer to the vtable of the method’s class, and the values of the method’s type arguments. The first one is needed to get to the class’s RGCTX if no this argument is passed, i.e. in static generic methods. The type arguments are needed to instantiate new slots in the MRGCTX – without knowing what the value of T is, for example, we cannot look up the type Dictionary<S,T>.

So how much memory do we save with shared generic methods? In my previous post on sharing generic code I presented a table with the savings in memory my three large test applications. Here it is again, updated with data for sharing generic methods:


No sharing Sharing Sharing w/gen methods
Methods
compiled
Code
produced
Methods
compiled
Code
produced
Methods
compiled
Code
produced
Memory for
(M)RGCTXs
Savings
IronPython 3614 719k 3368 691k 3324 687k 7k 25k
Nemerle 7210 2001k 6302 1943k 6150 1891k 34k 76k
F# 15529 2193k 11431 2062k 9823 1652k 154k 387k

Note that this time I’m also counting all the memory used by (M)RGCTXs and the (M)RGCTX templates, which I didn’t do last time.

Next Page »

Powered by WordPress.com