A Smile Posting!

Okay, so it’s apparently been forever since I talked here about Smile.  Which is sad, because I think hard about it every day, and I work on its code multiple times a week.

So here goes with a big status update.

First, let’s review the timeline so far.

Early pre-history

  • 1997:  It begins.  I learned about Lisp and Smalltalk for the first time, and I began pondering what a good Lisp/Smalltalk/C/JavaScript hybrid might look like.
  • 1998-2004:  Whiteboard language experiments.  A lot of these didn’t pan out, but a few core syntactic constructs stuck, like [square brackets] and Smalltalk-style unary/binary transforms.
  • 2003-2012:  Multiple code experiments.  The language wasn’t well-formed at this point, but I started building lexers and parsers and runtimes and evals to help work through the ideas.  I don’t even know how many directories I created named “smile” during this time period.

C# Prototype and Reddit

  • January-May 2013:  Inspiration.  A few critical pieces fell into place in a short time, mostly while pondering problems in the shower.  |Vertical bar| syntax for functions (à la Ruby).  Inverting the operand/operator problem by using declarations to identify operands and treating everything else as an operator.  Coloring the S-boxes (cons cells) red and black to solve the function/OO problem.
  • Feb 2013:  Began the C# prototype.  I felt confident enough in the language that I began writing a C# interpreter.  I knew it would be slower than C, but I needed an environment where I could code and refactor fast enough to continue experimenting with the language design.
  • June 2013:  Bought smile-lang.org.  Increasingly confident, with the small prototype starting to be usable, I bought a website.  I had no content, so I put a “Coming soon” page there.
  • October 2013:  My son is born.  The next three months are very blurry, and not much progress is made.
  • January 2014:  The First Real Post about Smile.  I had a working (albeit limited) interpreter at this point, so I felt confident that I could begin talking about it for real on my blog.
  • April 2014:  Reddit notices.  There’s nothing they can use yet, and not a lot of information available, but folks at Reddit see my blog posting and start talking and asking questions.  I respond to some of them, and continue coding.  Since there’s nothing they can use, Reddit gets bored, and stops talking about it.
  • January 2015:  The C# prototype works fairly well, but it’s slow.  Absurdly slow.  I start to seriously consider whether to ditch it and restart in C, now that I know what the language is going to be like.  I keep expanding the language and adding features.
  • February-March 2015:  Attempts to port some of the C# code to “managed C++.”  The managed C++ code is faster, but still an impractical solution; it will take more effort to port the whole thing to C++ and detach it from .NET than it will to simply rewrite it.

C Implementation

  • April 2015:  Restarted in C.  The C# prototype reached its realistic limit when I tried to make file I/O work well.  I made the painful decision to stop work in C#, and start over, doing it right this time, in C.
  • Spring/summer 2015:  Garbage collection, strings, dictionaries, unit tests.  Began the core constructs needed to make a usable dynamic language.
  • Fall 2015:  Lexer and binary integer decimal.  I begin work on the lexer.  I make the decision to support IEEE decimal floats as the primary floating-point type (with IEEE binary float still available), and integrate the Intel Binary Integer Decimal library.
  • January 2016:  My daughter is born.  The next three months are very blurry, and not much progress is made.
  • February 2016:  Lexer is finished.  I start work on the parser.
  • June 2016:  Parser is complete.  A C equivalent of the C# parser now exists, but I detour to support #syntax forms, which the C# version never had.
  • August 2016:  Dynamic-#syntax now works.  As of today (August 28), the #syntax code is nearly complete, and most of it works.

This brings us to today.

Current State

There are 619 unit tests (all passing!) for the current Smile C implementation.  It consists of 399,827 lines of code, divided up roughly like this:

  • 7,000 lines of C header files
  • 32,000 lines of C source files
  • 41,940 lines in the Boehm Garbage Collector
  • 304,000 lines in Intel’s Binary Integer Decimal library
  • 15,000 lines in unit tests

The supporting types (like string and dictionary and real64) are complete; the garbage collector (Boehm's) works; and the lexer is complete.  The parser is just this side of complete and should be truly complete by the end of September.  That includes the super-powerful #syntax construct, which lets you embed arbitrary LL(1) grammars anywhere in your code.  As a result, syntax forms like if-then-else no longer need to be built-in special constructs: they can just be declarations out in code somewhere.

So far, though, the C implementation doesn’t actually do anything but run a suite of unit tests that guarantee the language is what I intend it to be.  The C# implementation eventually worked, but it was always a little quirky in places, and a lot of that had to do with its lack of testing.  I don’t know if I have full code coverage in the C implementation, but it’s pretty extensive, and I’m pretty confident in its quality this time around.

Also:  The Smile C implementation is in C.  Not C++.  I chose C for consistency, portability, and speed:  You can get a C compiler anywhere; all C compilers basically work correctly now; and they’re all stupid-fast compared to their C++ brethren.

The C implementation is intentionally designed to be portable:  While the C# version was dependent on .NET, the C implementation is dependent on nothing.  At one point back in March, I had it building and running in all four of Visual Studio, Cygwin, Linux, and MacOS X.  Some of the builds have probably gotten buggy and need fixing, but there’s no reason the new interpreter shouldn’t run on every major OS when it reaches version 1.0.

So When Can I Get It?

Well…  right now.  Sort of.

You see, while the C# version was closed-source mainly because the code was embarrassing and hacky and constantly in flux, the C version is open-source under the Apache License.  You can go to GitHub right now and download a copy:  https://github.com/seanofw/smile

That said, it doesn’t do anything.  Like I said, it’s a giant pile of code with unit tests.  You can’t run programs in it — yet.  You can open it in Visual Studio 2013, build it, run the unit tests, and watch a bunch of green marks appear on your screen.

But you’re welcome to read through the code.  It’s open this time, properly open, and has been since I started work on the C version.  (I just didn’t share the GitHub link on my blog until now; but it’s been a public GitHub project since I wrote the first line of C code.)

There’s a growing pile of documentation written for smile-lang.org, but it hasn’t been re-posted there yet since the server crash.

What’s Next?

I need to finish the parser, and then finally start work on eval.

In the C# version, I implemented a simple recursive eval, but by the time I was done with the C# version, it had caused more trouble than it was worth.  I had started work on a bytecode compiler in the C# version, but the performance was so awful that I stopped after about two thousand lines of code.  This time around, in C, I can make that bytecode compiler scream, and it's going to be the only interpreter.  (It'd still be possible to build a recursive eval, but nobody wants one of those except academics exploring the theoretical properties of the language.)

Once eval exists, and a few primitive list bindings, I can finally at least eval my most primitive test program:  car cdr `[1 2 3]

(For those who know Lisp, that Smile code is loosely equivalent to (car (cdr `(1 2 3))), but with an interesting object-oriented twist and some funky syntax transforms taking place behind the scenes.  For those who don’t know Lisp, that code outputs the number 2.)

After eval, I need to make the ByteArray type, and properly bind up the String type.  Then I need to make the File type, and build a few of its transforms.  And then, finally, I can write the real first program that does something interesting:  print “Hello, World.”

So we have a ways to go.

I had hoped to get to Hello, World before the end of the year, but I don't think we'll reach that.  There's still work to be done in the parser, and eval doesn't exist yet.  I'm revising my goal for the rest of 2016 to just getting car cdr `[1 2 3] to work, and I think that's at least realistic.

Conclusion

It’s got a ways to go, but it’s coming.  The code’s public and on GitHub, which keeps me honest.  I think when you guys see all the pieces come together, you’ll be properly impressed with what Smile really is and does.  But it’s currently still just a big pile of code and unit tests, and a long way from production-ready.