Wolfram Alpha disappoints

Wolfram Alpha, the much-hyped “computational knowledge engine” by Wolfram Research, the people behind Mathematica, went online last night. I played a bit with it today, and I am disappointed. Alpha is not what it’s cracked up to be.

Alpha is hard to figure out

Alpha is supposed to figure out what you mean by your query and present the answer. Unfortunately, as with every natural language processing system in existence, it often fails to understand you. In practice, that means you have to learn how Alpha parses language and adjust your input accordingly.

For example, I tried to get Alpha to give me a plot of infant mortality vs gross domestic product. It doesn’t seem to be able to do that, but that’s not the point. Rather, my attempts illustrate the problems with Alpha’s language parsing.

One of the examples for Alpha’s usage is the comparison between the Pioneer 10 and 11 space probes. You can enter, for example, “Pioneer 10 vs Pioneer 11”, or “Pioneer 10, Pioneer 11” and get the same nice comparison tables. Enter “infant mortality vs gdp”, however, and you get the result “1 child deaths | $54.61 trillion per year”. I don’t know what that even means. Enter “infant mortality, gdp” and you get “5.499×10^7 person children/yr | $54.61 trillion per year”, which seems to be the number of infant deaths per year in the whole world (but isn’t) and the sum of the GDPs of all countries. Enter “plot of infant mortality, gdp” or “plot of infant mortality vs gdp” and you get the same nonsensical “1 child deaths | $54.61 trillion per year” answer, but in a slightly different table form.

Curiously, if you want mortality rather than infant mortality, Alpha will give you the plot if you ask for “mortality vs gdp” or “mortality, gdp”.

Upon closer inspection the problem is that Alpha knows nothing about infant mortality (even though that data is available in Mathematica’s CountryData) but interprets “infant mortality” as “child multiplied by mortality”. The one number it gave that seemed like it might be the number of infant deaths per year was actually the total number of deaths per year, multiplied by one child, i.e. pure nonsense.

Another thing I tried was something similar to an example they give for finding all words ending with “ager”, the query for which is “words ending with ager”. What if I want to find all words containing “ager” instead? The search “words containing ager” gives a list of words which contain each of the letters in “ager”, but not necessarily in that order. How to get the correct answer, or rather how to correctly pose the question, is anybody’s guess.
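To make the difference between the two readings concrete, here is a minimal Python sketch. The word list and the function names are made up for illustration, and this is only a guess at what the two interpretations amount to, not a description of how Alpha works internally.

```python
def contains_substring(word, fragment):
    """True if the fragment appears in the word as one contiguous run."""
    return fragment in word


def contains_all_letters(word, fragment):
    """True if every letter of the fragment occurs somewhere in the word,
    in any order (the reading Alpha appears to apply)."""
    return set(fragment) <= set(word)


# Hypothetical sample words, not taken from Alpha's output.
words = ["manager", "garbage", "eager", "wager", "agreement", "grape", "table"]

substring_hits = [w for w in words if contains_substring(w, "ager")]
letter_hits = [w for w in words if contains_all_letters(w, "ager")]

print("contiguous 'ager':", substring_hits)
print("letters a, g, e, r in any order:", letter_hits)
```

Every word that contains “ager” as a unit also passes the letter test, but not the other way round, which is why the second list is so much longer than what I was actually asking for.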

Alpha gives wrong data

Not only does Alpha sometimes give nonsensical answers which look like they might mean something, it also presents completely wrong data now and then. For example, when asking for information on the Spanish language, it tells us that there are 321.6 million native speakers and 322.3 million speakers in total, which would mean that only 700 thousand people speak Spanish as a second language, or only 0.2% of the total number of speakers. I find that very hard to believe. According to Alpha, that proportion is 36% for English, 17% for Russian, 12% for German, over 8.6% for Italian and nearly 6% for Turkish. If we are to believe Alpha there are more people who speak Czech as a second language than there are who speak Spanish non-natively.
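The arithmetic behind the Spanish figure is easy to check. The following is a small Python sketch using only the two numbers Alpha reported; the function name is my own invention for illustration.

```python
def second_language_share(native_millions, total_millions):
    """Return (second-language speakers in millions, their share of all speakers in percent)."""
    second_language_millions = total_millions - native_millions
    share_percent = 100 * second_language_millions / total_millions
    return second_language_millions, share_percent


# Figures for Spanish as reported by Alpha, in millions of speakers.
spanish_second, spanish_share = second_language_share(321.6, 322.3)

print(f"{spanish_second:.1f} million second-language speakers")
print(f"{spanish_share:.1f}% of all speakers")
```

That comes out to roughly 0.7 million people, or about 0.2% of all Spanish speakers.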

But wait – Alpha has a feature which lists the sources of its data! Clicking on that link gives us a list of “Background sources and references” (among them Wikipedia), with the nice hint that those sources might not actually be where the data is from. As “Primary source” it lists only one item: “Wolfram|Alpha curated data”. In other words, you can either believe Wolfram or find the correct information yourself.

Alpha doesn’t give all data

To add insult to injury, Alpha also fails to indicate that the data it has is incomplete. Looking up the German language we are informed that it has 60% lexical similarity to English and 29% to French. No other languages are given as points of comparison, leading one to believe that those are the two top ones. However, German is much more closely related to, for example, Dutch, but apparently Alpha doesn’t have that data, and it doesn’t tell us, either.

Alpha isn’t hyperlinked

There are almost no hyperlinks or further explanations in the data Alpha spits out, not even for the properties of whatever it is that is presented. One of the pieces of information on the icosahedron, for example, is its dual polyhedron. What is a dual polyhedron? Alpha has no idea. Wikipedia knows.

Alpha is incomplete and very biased

When asked about “Turing”, Alpha spits out a short table with biographical data about Alan Turing. Regarding his work all it seems to know is that he was a mathematician. “Turing machine” produces no data at all and neither does “Lambda calculus”, whereas Alpha seems to know a whole lot about cellular automata, Stephen Wolfram’s pet subject.

And, possibly worst of all, Alpha knows nothing about juggling or siteswaps.

3 thoughts on “Wolfram Alpha disappoints”

  1. I gave it a spin, and yes, it drew blanks on a lot of my questions. I think your points are valid, but keep in mind…

    1. It’s been online for less than 24 hours. Google uses prior clicks to adapt its results. When you click a link, you are actually refining the results shown to others. Perhaps Alpha will have “learning” capabilities as well.

    2. Even if it doesn’t satisfy the casual user (me), even if it does require us to learn its semantics, that doesn’t render it useless. It’s a new way of thinking. It will take a decade to measure the viability of this sort of thing.

  2. Another example of its incompleteness: one example they have is “earthquakes near the Akashi-Kaikyō bridge”. Well, that’s nice, but what about “earthquakes in austria”? The nice part is that the information is up to date: the earthquake Alpha knows about happened only about two weeks ago. But looking at the historical data, we find that this is the sole earthquake Alpha knows of in Austria.

    The source: “W|A Curated Data”. What about the earthquake on April 20? Maybe it was too weak? After all, it only measured 4.1 (Richter) compared to the 4.2 of the one Alpha knows about.

  3. Just wanted to say great job with the blog, today is my first visit here and I’ve enjoyed reading your posts so far :)
    Juan
