If you have read any of my last few posts you will know that I am working on a tiny language called SIMPL which was inspired by Forth.
SIMPL stands for Serial Interpreted Minimalist Programming Language, which reflects how it started out in life: as a serial command shell for microcontrollers, where a few single-character alphabetic commands could be used to run simple programs and loops on boards such as Arduino and Teensy.
SIMPL was adapted from Ward Cunningham’s Txtzyme, a character interpreter running inside a loop. Over a period of a few years I have augmented the language with more functions such as maths operators, logic operators, loops and conditional branching and, perhaps most importantly, a feature borrowed from Forth: the means to compile new words using the “colon definition”.
SIMPL is a lightweight language. It does not attempt to do all the things that Forth can do, and with a word set that has been reduced to a maximum of 85 unique functions, it is much smaller than almost all implementations of Forth.
In just 300 bytes, the heart of the SIMPL kernel can take serial text, either typed in manually or sent as a file from a serial terminal program, and buffer it into RAM. It can also take colon definitions and, based on their identifying character (normally a capital letter), store them at precalculated addresses in RAM so that they can be accessed and run later with the minimum of decoding.
This unique feature of SIMPL effectively removes the overhead of the dictionary search in Forth. A character read from the input buffer is decoded within a few cpu cycles, and program execution is directed to that block of code.
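As a rough illustration of how a letter can lead straight to a definition without a dictionary search, here is a minimal sketch in C rather than in the MSP430 assembly the kernel is actually written in. The slot size of 32 bytes, the 26 slots, and the names def_ram, slot_for and define_word are all assumptions made for the example, not values taken from the SIMPL source.

```c
#include <assert.h>
#include <string.h>

#define SLOT_BYTES 32   /* assumed size of each definition slot */
#define NUM_SLOTS  26   /* assumed: one slot per capital letter A..Z */

static char def_ram[NUM_SLOTS * SLOT_BYTES];   /* stand-in for the RAM block */

/* Map a capital letter to the start of its slot: no search, just arithmetic. */
static char *slot_for(char name)
{
    assert(name >= 'A' && name <= 'Z');
    return &def_ram[(name - 'A') * SLOT_BYTES];
}

/* Store the body of a colon definition such as ":B100 200+." under its letter.
   Returns 1 on success, 0 if the line is not a definition or does not fit. */
static int define_word(const char *line)
{
    if (line[0] != ':' || line[1] < 'A' || line[1] > 'Z')
        return 0;                               /* not a colon definition */
    size_t len = strlen(line + 2);
    if (len >= SLOT_BYTES)
        return 0;                               /* body too long for the slot */
    memcpy(slot_for(line[1]), line + 2, len + 1);
    return 1;
}
```

Calling define_word(":B100 200+.") copies the text 100 200+. into the slot for B, and slot_for('B') later finds it again with one subtraction and one multiplication.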
In addition to the engine that drives the inner interpreter, there are a series of low level primitives that define the SIMPL kernel. These primitive words are effectively the minimum wordset from which the rest of the language can be written. Bill Muench, C.H. Ting and Chuck Moore all arrived at roughly the same conclusion, back in the early 1990s, that there was a small subset of words from which a full implementation of Forth could be synthesised. It is in this spirit that SIMPL has been written.
These primitives include stack operators, math and logic operators, memory access operators and those that allow conditional branching and control program flow.
There are just 32 primitives – which allows for a 5 bit instruction scheme. These primitives mostly consist of symbols taken from ascii characters 0x20 to 0x40.
The primitives have been chosen to offer register compatibility with Ting’s MSP430 eForth. This offers a proven route to extension and makes building and testing the code a lot simpler.
At its lowest level, SIMPL takes the place of a serial bootloader, allowing source code to be loaded into the program space of the target microcontroller using nothing more sophisticated than a serial terminal program such as Tera Term.
SIMPL is written in MSP430 assembly language for efficiency, speed and compactness, and it has strong influences from Dr CH Ting’s eForth for MSP430.
As I read through Ting’s book Zen and the Forth Language, I realised that it should be possible to make the kernel of SIMPL a cut-down version of eForth that resides in just 1K bytes of program space and presents a programmer’s “toolkit”, in addition to the means to load source code into the device and run that code on a virtual machine.
Progress to Date
Inspired by CH Ting’s book on the MSP430 eForth, I decided to restructure my rudimentary code to make it compatible with his register model. This will allow for easier expansion later on.
I have included a “jump table” to jump to the code addresses of all of the 96 possible ascii character instructions. This framework will allow me to quickly add the other functions and run them from the code addresses stored in the jump table.
Whilst this jump table in itself is 192 bytes long, just over one quarter of the code so far, it reduces other overheads of instruction decoding. The jump table structure is not going to be as fast as direct threaded code, and this is the price we pay for using a tokenised instruction set, but it is easier to understand and extend with more functionality. In hardware, this jump table would be analogous to a microcode ROM for instruction decoding.
The jump table accepts an index value in the form of a single ascii character. This is first copied to a temporary register, as we need the original character later. Then we subtract 32 from the copy to remove the offset from zero, as the first printable character is “space”, which is ascii 32. Finally we multiply what’s left by 2 (by adding it to itself) and add this to the program counter. This causes the pc to be advanced by the correct number of words into the table, where the entry for that character jumps to the function we wish to run. The snippet below shows how this selects from the first 5 instructions:
; Now we need to decode the instructions using a jump table
; Jump table uses 2 bytes per instruction - so 2 x 96 = 192 bytes

next:       mov.b   @ip+,R12        ; Get the next character from the instruction memory
            mov.w   R12,R13         ; Copy into R13 - as needed to decode Jump Address
            sub.w   #0x0020,R13     ; subtract 32 to remove offset of space
            add.w   R13,R13         ; double it for word addressing
            add.w   R13,pc          ; jump to table entry

JumpTbl:    jmp     space           ; SP
            jmp     store           ; !
            jmp     dup             ; "
            jmp     lit             ; #
            jmp     swap            ; $
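For readers who do not speak MSP430 assembly, the same decoding idea can be modelled in C as an array of function pointers indexed by the character minus 32. This is only a sketch under my own naming: the handler functions, last_op and init_table are hypothetical, and the C compiler does the "double it for word addressing" step implicitly when it indexes the array.

```c
#include <assert.h>

static int last_op;   /* records which handler ran, purely for demonstration */

static void op_space(void) { last_op = ' '; }
static void op_store(void) { last_op = '!'; }
static void op_dup(void)   { last_op = '"'; }
static void op_lit(void)   { last_op = '#'; }
static void op_swap(void)  { last_op = '$'; }
static void op_nop(void)   { last_op = 0;   }   /* placeholder for unassigned tokens */

typedef void (*handler_t)(void);

static handler_t jump_table[96];   /* one entry per ascii character 32..127 */

/* Fill the table: every token defaults to op_nop, then the first five are assigned. */
static void init_table(void)
{
    for (int i = 0; i < 96; i++)
        jump_table[i] = op_nop;
    jump_table[' ' - 32] = op_space;
    jump_table['!' - 32] = op_store;
    jump_table['"' - 32] = op_dup;
    jump_table['#' - 32] = op_lit;
    jump_table['$' - 32] = op_swap;
}

/* Decode one token: subtract 32 to remove the offset of space, then index the table. */
static void dispatch(unsigned char c)
{
    assert(c >= 32 && c < 128);
    jump_table[c - 32]();
}
```

After init_table(), dispatch('$') runs op_swap, while an unassigned character such as 'z' falls through to op_nop.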
In this way all of the 96 possible ascii tokens are handled, including numbers – which have their own decoding routine, also accessed via this jump table.
Most of Ting’s set of eForth primitives have been coded in now – and I can do simple arithmetic, logical operations and memory 16 bit fetch and store.
The code to enter and decode decimal numbers and print them out to the terminal is also in place and working. This allows me to type in 123 456+. and the machine responds 00579, as I have yet to suppress leading zeroes in my printnum routine.
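The 123 456+. example can be traced with a small C model of the number entry and printing logic. It is a sketch of the behaviour described above, not a translation of the assembly: the 16-entry stack, the out buffer and the function names are assumptions, and only digits, space, + and . are handled.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t stack[16];   /* assumed stack depth */
static int sp;               /* number of items currently on the stack */
static char out[64];         /* captures what would be sent to the terminal */
static int out_len;

static void push(uint16_t v) { assert(sp < 16); stack[sp++] = v; }
static uint16_t pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* Print a 16-bit value as five decimal digits, leading zeroes included. */
static void printnum(uint16_t v)
{
    assert(out_len + 5 < (int)sizeof out);
    for (uint16_t div = 10000; div > 0; div /= 10) {
        out[out_len++] = (char)('0' + v / div);
        v %= div;
    }
    out[out_len] = '\0';
}

/* Interpret a line of text containing decimal numbers, '+' and '.'. */
static void interpret(const char *src)
{
    while (*src) {
        char c = *src++;
        if (c >= '0' && c <= '9') {
            /* accumulate consecutive digits into one number */
            uint16_t n = (uint16_t)(c - '0');
            while (*src >= '0' && *src <= '9')
                n = (uint16_t)(n * 10 + (*src++ - '0'));
            push(n);
        } else if (c == '+') {
            uint16_t b = pop();
            uint16_t a = pop();
            push((uint16_t)(a + b));
        } else if (c == '.') {
            printnum(pop());
        }
        /* space and any other character are ignored in this sketch */
    }
}
```

The space between 123 and 456 is what ends the first number; without it the digits would run together into a single value.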
I have tested memory fetch and store, and also the conditional operators < = and >.
With the conditional operators returning either 0 for false or 1 for true, the next thing to implement will be the simple looping structures, and the mechanism that allows conditional execution of code.
These structures make use of parentheses ( ), and a single parameter on the stack which is passed into the loop counter.
Any code to be conditionally executed is placed between the parentheses. The loop counter defines how many times the code within the brackets should be executed:
0 not at all – skip everything between the brackets until the end ) is detected
1 execute once – i.e use for conditional execution
n execute n times – decrementing the loop counter until it is zero
This is a useful structure: as well as conditional execution and DO-LOOPs, it can also be used for comments. By setting the counter to zero, anything between the brackets is not executed, so the text characters can be treated as a comment.
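Since the loop is not implemented yet, the following C sketch is only one possible way the three cases above could work. It assumes no nesting, uses a plain variable in place of the data stack, and invents a * token that simply counts how often it is executed; none of these names or choices come from the SIMPL source.

```c
#include <assert.h>

static int hits;   /* incremented by the demo '*' token */

/* Run a line of text. A decimal number sets the value that '(' takes as its
   loop counter; the body between '(' and ')' then runs that many times. */
static void run(const char *src)
{
    unsigned top = 0;              /* stand-in for the top-of-stack value */
    unsigned k = 0;                /* loop counter */
    const char *loop_start = 0;    /* first character after '(' */

    while (*src) {
        char c = *src++;
        if (c >= '0' && c <= '9') {
            top = (unsigned)(c - '0');
            while (*src >= '0' && *src <= '9')
                top = top * 10 + (unsigned)(*src++ - '0');
        } else if (c == '(') {
            k = top;
            if (k == 0) {
                /* counter is zero: skip everything up to the closing ')' */
                while (*src && *src != ')')
                    src++;
                if (*src)
                    src++;
            } else {
                loop_start = src;
            }
        } else if (c == ')') {
            assert(loop_start != 0);
            if (--k > 0)
                src = loop_start;  /* go round again */
            else
                loop_start = 0;    /* loop finished */
        } else if (c == '*') {
            hits++;
        }
    }
}
```

With this model, 0(***) executes nothing, 1(*) executes the body once, 3(**) executes it three times, and 0(any text here) behaves as a comment.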
I am fairly confident that I can keep the kernel below 1024 bytes, with the higher level routines built on top of this 1K kernel. The codesize is now 738 bytes.
I have created a github gist to allow you to view the code so far – as this is clearly a work in progress.