January | 2010 | Mostly Software

Often when I learn a new programming language that’s different enough to what I already know, the first thing I implement is an interpreter for some subset of the Scheme programming language. I’ve done Scheme implementations in OCaml, Mercury/Prolog, Forth and Unlambda, among others. Recently I had to learn JavaScript for a small, fun project about which I’ll blog some other time, but I figured that writing an interpreter would be a bit boring, so I decided to do a Scheme compiler that produces JavaScript code. And it wouldn’t be written in JavaScript, but in Scheme. And it would compile itself. And for some reason I had to call it JScreme (to be pronounced “jay scream”). I’m weird.

You can try it out with Firefox or Chrome here. Other browsers might work, too, but I haven’t tested any.

Data representation

JavaScript offers many data types very similar to the ones in Scheme, so I decided to use them to represent Scheme data directly. A Scheme number is represented by a JavaScript number, strings are represented as JavaScript strings, booleans as booleans, nil is null (but we also treat undefined as nil), vectors are arrays. JavaScript doesn’t have a data type for characters, but for my purposes it’s good enough to treat single-character strings as Scheme characters as well as Scheme strings. The most important data type, the function, can fortunately also be represented directly.

The only two Scheme data types that need special consideration are conses (or pairs, as Scheme calls them) and symbols. Conses are represented as JavaScript objects with the properties car and cdr. Symbols are also represented as objects, with the property symbol set to the string that is the symbol’s name. Since Scheme symbols with the same name must not only be equal (in the sense of equal?), but identical (in the sense of eq?), we must make sure that only one such object exists for each symbol. This is called “interning”. We use a global dictionary, indexed by the symbol’s name, and whenever a symbol is to be created, we look it up there, or insert it if it’s new.

The compiler

Let’s look at how JScreme compiles a few simple examples:


Scheme	JavaScript
`123`	`123`
`#\a`	`"a"`
`"abc"`	`"abc"`
`#t` / `#f`	`true` / `false`

Functions are also easy:


Scheme	JavaScript
`(lambda (x) x)`	`function (x) { return x; }`
`(f x)`	`(f (x))`
`(apply f x y z)`	`(f.apply (null, [x, y].concat (list_to_vector (z))))`

The only rule here that merits explanation is the one for apply. In case you’re not familiar with Scheme’s (or most other Lisp’s) apply, here’s an equivalence:


`apply`	no `apply`
`(apply f 1 2 '(3 4))`	`(f 1 2 3 4)`

So, apply takes a function, some arguments, and a list of more arguments, and calls the function with those arguments. The reason apply is needed is because it lets you call a function with a number of arguments that is not statically known.

The function list_to_vector above converts a Scheme list into a vector, i.e. a JavaScript array. It’s implemented in Scheme as well. We’ll get to the details later.

Obviously there must also be a way for a function to receive a variable number of arguments:


Scheme	JavaScript
`(lambda (x . y) y)`	`function (x) { var y = listify_vector (arguments, 1); return y; }`

Here, the function listify_vector converts the arguments array, starting at index 1, into a Scheme list.

At this point I should mention that I’ve been lying a bit about names. Due to JavaScript’s scoping rules being slightly different than Scheme’s we must make sure that different variables have different names in JavaScript, even though they’re named the same in Scheme. The easiest way to achieve that is to use a name generator to create a unique name for each variable. In Lisps, it’s usually called gensym. So, this is what code actually looks like:


Scheme	JavaScript
`(lambda (x) x)`	`function(__space_g8757_){return __space_g8757_;}`

Of course there’s also if:


Scheme	JavaScript
`(if a b c)`	`(a ? b : c)`

For simplicity’s sake JScreme also implements letrec in the compiler. It creates a new function to make scoping easier and to be able to treat the whole thing as an expression:


Scheme	JavaScript
`(letrec ((x 1) (y x)) y)`	`(function () { var x = 1; var y = x; return y; } ())`

The toplevel

At this point we’ve covered almost all non-trivial features of the compiler. The rest of JScreme is implemented in JScreme itself as macros and functions. Macros and toplevel definitions are not handled by the compiler directly but by the toplevel handler. It looks at each toplevel item and filters out definitions and macros, which are handed to the compiler and then put into symbol tables. It also does macro expansion. Since it’s boring I won’t speak any more of it.

The library

Most of what’s in the library is also boring, so I’ll stick to the more interesting bits. Here are some fundamental parts of the Scheme language that are not implemented in the compiler but via macros:


before `macroexpand`	after `macroexpand`
`(begin x y z)`	`((lambda () x y z))`
`(let ((x 1)) x)`	`((lambda (x) x) 1)`
`(let* ((x 1) (y x)) y)`	`((lambda (x) ((lambda (y) ((lambda () y))) x)) 1)`
`(cond (a 1) (b 2) (c 3))`	`(if a 1 (if b 2 (if c 3 #f)))`
`(and a b c)`	`(if a (if b c #f) #f)`
`(or a b c)`	`(if a #t (if b #t c))`

Data types are next. How do we know something’s a number? Here’s the definition:

(define (number? x)
  (and (not (null? x))
       (js-op (.. x constructor) "==" (js-quote "Number"))))

There are three tiny compiler features in this piece of code that I haven’t mentioned before. Here’s what they do:


JScreme	JavaScript
`(.. x y)`	`x.y`
`(js-op x "==" y)`	`x == y`
`(js-quote "Number")`	`Number`

With that out of the way, we can see how number? works: If an object’s constructor is Number, that object is a number. Since null and undefined aren’t objects, though, they need to be filtered out first with null?, which, together with eq?, is defined like so:

(define (eq? a b)
  (or (js-op a "===" b)
      (and (js-op a "==" '())
           (js-op b "==" '()))))

(define (null? x)
  (eq? x '()))

The definition of eq? is a bit convoluted, since we want to have null and undefined be identical, which they are not by JavaScript’s operator ===, so we must handle them separately.

There’s yet another new compiler feature being used in cons:

(define (cons a b)
  (js-object (car a) (cdr b)))

js-object constructs an object via JSON notation:


JScreme	JavaScript
`(js-object (car 1) (cdr 2))`	`({car:1, cdr:2})`

Some functions are surprisingly easy to implement:

(define (list . args)
  args)

I mentioned the function listify_vector above, which is also implemented in JScreme:

(define (listify-vector vec start)
  (letrec ((recur (lambda (i l)
                    (if (< i start)
                        l
                        (recur (js-op i "-" 1)
                               (cons (vector-ref vec i) l))))))
    (recur (js-op (vector-length vec) "-" 1) '())))

It’s basically a loop that conses up the vector’s elements, starting with the last and counting down. The reason why I had to use js-op for the subtraction instead of Scheme’s - is that, since - is a variable argument function it is compiled to JavaScript code that uses listify-vector, so we’d end up with an endless recursion.

Much of the rest of the library is not very interesting and easy to understand, so I won’t bore you with the details.

The reader

Interpreting or compiling Scheme is nice and easy if you do it in Scheme, where code is represented as data, but if you bootstrap Scheme, you must also have a parser that reads Scheme code/data from a string. In Lisps, that parser is called the reader, and JScreme has one, too, also implemented in Scheme. It’s actually completely functional, i.e., it doesn’t use side-effects, because I wrote it before I had implemented set! in the compiler.

Limitations

To the seasoned Schemer it will not have gone unnoticed that I haven’t mentioned tail calls or continuations, yet. It will also not have gone unnoticed that the triple negations in the previous sentence and in this one don’t exactly improve the text’s clarity, but that is beside the point (which is also beside the point).

The reason for my silence on those two issues is that I’m cheating by simply implementing neither proper tail calls nor continuations (or, more specifically, call/cc).

Since recursion is the only way to do iteration in Scheme, and I haven’t implemented any other iteration construct, unless your JavaScript implementation has proper tail calls, JScreme is fundamentally flawed. One way to remedy this is to instead of doing a tail call, return a specially marked function that does the tail call, and then wrap each function call into a loop that continues invoking those functions until something else (the proper result) is returned.

Properly implementing call/cc would require compiling to continuation passing style. It’s not awfully complicated, but it would make JScreme even slower than it already is.

Playing around with JScreme

The sources for JScreme are hosted on GitHub. To build it you’ll need Guile. As far as I know the only Guile-specific feature JScreme is using is defmacro is probably also available in most other Scheme implementations.

The main pieces of JScreme are the compiler in compiler.scm, the reader in reader.scm, the library in primitives.scm and the toplevel handler in toplevel.scm and toplevel-compile.scm. There’s also a test suite in tests.scm. JScreme compiles itself in jscreme.scm.

Mostly Software

Menu

Monthly Archives: January 2010

JScreme