Exploring the J1 Instruction Set and Architecture using SIMPL

James Bowman developed the J1 Forth CPU in 2010 – and presented a paper describing its architecture at EuroForth that same year. It’s a stack based processor – capable of executing most Forth primitives in a single cycle. It’s also an ideal candidate to explore stack machine and MISC architectures – and has been ported to FPGAs using entirely open source tool chains.

This ongoing project was inspired by ideas first discovered in the book “The Elements of Computing Systems”  – and the associated course “From NAND to Tetris”.   Having ploughed easily through the first chapter on hardware, I realised that my software skills were dreadfully lacking – so I thought I’d create a simple language  (SIMPL) from scratch and get it to run on an open source FPGA platform  – myStorm – using Clifford Wolf’s Project IceStorm open source tool chain.

In this post I explore the J1 ISA (Instruction set & architecture) and hack together a simple cross-compiler to execute code on a simulated J1 processor.

 The J1 Architecture.

Having read James Bowman’s J1 Paper several times over, I managed to build up a simplified model of his instruction set and architecture.

J1 uses 16 bit long instruction words – where each word is divided into different length fields. The following 3 images – taken from James’s 2010 EuroForth paper – explain the architecture:

[Images: J1 ISA, J1 instruction encoding, J1 ALU codes]

Bits 15, 14 and 13 define the Instruction Class – and there are 5 classes of instruction:

1xx    Literal
000    Jump
001    Conditional Jump
010    Call
011    ALU instruction

Bit 12, if set in an ALU instruction, provides the return from subroutine mechanism by loading the top of the return address stack R into the PC.

Bits 11,10,9 and 8  define a 4 bit ALU opcode – allowing up to 16 arithmetical, logical and memory transfer instructions.

Bits 7,6,5 and 4 are used to control the data multiplexers – so that data can be routed around the cpu according to which of these bits are set.  Here lies a little anomaly with the J1, in that Bit 4 is not used, and it would seem more logical to use it to provide the return function bit – currently done in bit 12.

Bits 3 and 2 define how the return stack pointer is manipulated in an instruction. It can be incremented or decremented by setting these bits.

Bits 1 and 0 define the manipulation of the data stack pointer  – it has a range of  +1,0,-1,-2 depending on the setting of these bits. To pop off the stack you subtract 1 from the stack pointer, to push on the stack, you add one to the stack pointer.

As the various control fields of the J1 instruction exercise different parts of hardware – they can operate in parallel – so for example a return or exit from subroutine can be had for free.
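The field layout described above can be sketched as a small C decoder. This is only an illustration: the struct and function names (`j1_fields`, `j1_decode`, `j1_signed2`) are made up for this post and do not come from James's Verilog, and the two bit stack fields are assumed to be read as two's complement, which is what gives the +1, 0, -1, -2 range.

```c
#include <stdint.h>

/* Illustrative field names; none of these come from the J1 sources. */
typedef struct {
    unsigned is_literal;   /* bit 15: the remaining 15 bits are a literal   */
    unsigned literal;      /* bits 14..0, meaningful when is_literal is set */
    unsigned insn_class;   /* bits 15..13: 0 jump, 1 cond. jump, 2 call, 3 ALU */
    unsigned target;       /* bits 12..0: jump or call target               */
    unsigned ret;          /* bit 12: R -> PC (ALU class only)              */
    unsigned alu_op;       /* bits 11..8: 4 bit ALU opcode                  */
    unsigned t_to_n;       /* bit 7: copy T to N                            */
    unsigned t_to_r;       /* bit 6: copy T to R                            */
    unsigned n_to_mem;     /* bit 5: write N to memory at address T         */
    int rstack_delta;      /* bits 3..2: return stack pointer change        */
    int dstack_delta;      /* bits 1..0: data stack pointer change          */
} j1_fields;

/* Assumed two's complement reading: 00 -> 0, 01 -> +1, 10 -> -2, 11 -> -1 */
int j1_signed2(unsigned two_bits)
{
    two_bits &= 3u;
    return (two_bits & 2u) ? (int)two_bits - 4 : (int)two_bits;
}

/* Split a 16 bit instruction word into its fields. Bit 4 is ignored. */
j1_fields j1_decode(uint16_t insn)
{
    j1_fields f;
    f.is_literal   = (insn >> 15) & 1u;
    f.literal      = insn & 0x7fffu;
    f.insn_class   = (insn >> 13) & 7u;
    f.target       = insn & 0x1fffu;
    f.ret          = (insn >> 12) & 1u;
    f.alu_op       = (insn >> 8) & 0xfu;
    f.t_to_n       = (insn >> 7) & 1u;
    f.t_to_r       = (insn >> 6) & 1u;
    f.n_to_mem     = (insn >> 5) & 1u;
    f.rstack_delta = j1_signed2((insn >> 2) & 3u);
    f.dstack_delta = j1_signed2(insn & 3u);
    return f;
}
```

Decoding the word 6203 with this sketch gives class 3 (ALU), ALU opcode 2 and a data stack change of -1, with every other field clear.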

Modifying the J1 Instruction Set.

Whilst the J1 instruction set is neat and compact – the anomaly in Bit 4 is a bit of a sticking point with me.  If we put the “return bit” into the Bit 4 field, this would free up the Bit 12 field.  Bit 12 could then be used to define more classes of instruction, or better still added to the 4 bit ALU instruction field, to make the number of op-codes 32 rather than 16.

If there were two J1 cores acting together in a single FPGA – bit 12 could be used to channel instructions to either core. One J1 could be used for computation and the other could be used as a graphics processor or similar specialised co-processor.

James has already made some changes in this area – moving the return bit to Bit 7 – plus some other internal improvements – and has subsequently released a J1b – which can be instantiated as either a 16 bit or 32 bit Forth cpu, whilst the J1a is a minimum footprint cut down 16-bit J1 with *K memory to fit into small FPGAs – like the Lattice IceStick. See James’s Github for a demo. The Verilog for the new J1b – is here

For more information about possible modifications to the J1 – read Victor Yurkovsky’s excellent post in “FPGA Related”.  Some neat ideas for those that want to tinker with the internals of the J1.  As the J1 original design verilog is so neat and clean, it’s easy to make small incremental changes and test them out for usefulness on the J1 simulator. Victor’s spring clean freed up two bits of the instruction for future expansion and reduced the number of CPU slices from 150 to 76 – allowing the scope for multiple J1 instantiations in a single small FPGA.


Creating a Simulator and SIMPL to J1 Cross Compiler

I came across a very compact simulator for James Bowman’s J1 cpu – written in about 100 lines of C. I decided to port this across to an ARM based Arduino Due, so I could use it to explore the architecture further, as a step on the road to implementing a J1 on a myStorm FPGA board.

I have put the Arduino code for this SIMPL cross-compiler/ J1 Simulator on this Github Gist

As an input the simulator expects 16 bit J1 machine code instructions in an array of RAM, and the bulk of the code is an execution model of the cpu which steps through this section of memory executing each instruction in turn. After each program step I chose to print out the main stack elements, the program counter, and a few memory cells.
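A minimal sketch of such an execution model is shown below. It is not the code from the Gist: the names (`j1_cpu`, `j1_step`) are hypothetical, memory is assumed to be word addressed for simplicity, and the ALU opcode numbering follows my reading of the ALU table in the J1 paper.

```c
#include <stdint.h>

#define J1_MEM_WORDS 8192          /* 13 bit word addresses */

typedef struct {
    uint16_t t;                    /* top of data stack */
    uint16_t d[32];                /* rest of data stack, d[dsp] is N */
    uint16_t r[32];                /* return stack, r[rsp] is R */
    uint16_t pc;
    unsigned dsp, rsp;             /* stack pointers, kept in 0..31 */
    uint16_t mem[J1_MEM_WORDS];    /* code and data share one RAM */
} j1_cpu;

/* Execute one instruction. Simplification: memory is word addressed. */
void j1_step(j1_cpu *c)
{
    static const int delta[4] = { 0, 1, -2, -1 };
    uint16_t insn = c->mem[c->pc % J1_MEM_WORDS];
    uint16_t next_pc = (uint16_t)(c->pc + 1);
    uint16_t t = c->t, n = c->d[c->dsp], new_t = t;

    if (insn & 0x8000) {                         /* literal: push 15 bits */
        c->dsp = (c->dsp + 1) & 31;
        c->d[c->dsp] = t;
        new_t = insn & 0x7fff;
    } else {
        uint16_t target = insn & 0x1fff;
        switch (insn >> 13) {
        case 0:                                  /* jump */
            next_pc = target;
            break;
        case 1:                                  /* conditional jump, pops T */
            if (t == 0) next_pc = target;
            new_t = n;
            c->dsp = (c->dsp - 1) & 31;
            break;
        case 2:                                  /* call */
            c->rsp = (c->rsp + 1) & 31;
            c->r[c->rsp] = next_pc;
            next_pc = target;
            break;
        default:                                 /* ALU */
            if (insn & 0x1000) next_pc = c->r[c->rsp];   /* return bit */
            switch ((insn >> 8) & 0xf) {
            case 0:  new_t = t; break;
            case 1:  new_t = n; break;
            case 2:  new_t = (uint16_t)(t + n); break;
            case 3:  new_t = t & n; break;
            case 4:  new_t = t | n; break;
            case 5:  new_t = t ^ n; break;
            case 6:  new_t = (uint16_t)~t; break;
            case 7:  new_t = (n == t) ? 0xffff : 0; break;
            case 8:  new_t = ((int16_t)n < (int16_t)t) ? 0xffff : 0; break;
            case 9:  new_t = (uint16_t)(n >> (t & 15)); break;
            case 10: new_t = (uint16_t)(t - 1); break;
            case 11: new_t = c->r[c->rsp]; break;
            case 12: new_t = c->mem[t % J1_MEM_WORDS]; break;
            case 13: new_t = (uint16_t)(n << (t & 15)); break;
            case 14: new_t = (uint16_t)((c->rsp << 8) | c->dsp); break;
            default: new_t = (n < t) ? 0xffff : 0; break;    /* unsigned < */
            }
            c->dsp = (c->dsp + delta[insn & 3]) & 31;
            c->rsp = (c->rsp + delta[(insn >> 2) & 3]) & 31;
            if (insn & 0x80) c->d[c->dsp] = t;              /* T -> N   */
            if (insn & 0x40) c->r[c->rsp] = t;              /* T -> R   */
            if (insn & 0x20) c->mem[t % J1_MEM_WORDS] = n;  /* N -> [T] */
            break;
        }
    }
    c->t = new_t;
    c->pc = next_pc;
}
```

Loading the three words 8002, 8003 and 6203 at address 0 of a zeroed `j1_cpu` and calling `j1_step` three times leaves 5 in `t`, which is a quick sanity check before adding any debug printing.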

However, the process of hand assembling J1 machine language needs patience and knowledge of the chip’s architecture,  and to write half a dozen native instructions to execute a simple loop counter was about as much as I would recommend for anyone. So I decided to investigate easier ways to automate this process, and hit upon the idea of a cross-compiler, using my SIMPL language and text interpreter framework to input the source code.

SIMPL uses single ascii character tokens to represent primitive machine instructions – I just needed to create a look-up table mechanism where these characters are decoded into whatever the machine language of the host processor happens to be – in this case J1.  However it could be any other cpu – just by changing the contents of the look-up table, so a series of cross compilers could be written so that SIMPL could be hosted virtually (no pun) on any processor.

SIMPL exists as a virtual, stack oriented processor, hosted on real hardware cpu. More often the host processor is register based – such as ARM or MSP430 – but in the case of the J1 it is also a stack machine (albeit simulated on a register based ARM) . This means that the choice of the SIMPL minimal primitive instructions – are a very good fit to actual J1 instructions – with often a 1 to 1 match.  This makes a real J1 an excellent host for the SIMPL language – and it is this relationship that I wish to explore further.

In previous posts I have described SIMPL as a lingua franca – which allows the communication of programs and data between widely varying classes of computing machines.  Each machine just needs to host a SIMPL interpreter, and at the start of a communication exchange, the communicating machine needs to send a descriptive list of what the various words mean – in terms of the primitives that go to make up their definitions. Once this has been done, SIMPL becomes a very compact medium for transferring code, data and objects – with a code compaction of between 3 and 4 over other source code or object code techniques.

In Practice

I took the SIMPL text interpreter framework and mapped into it the J1 instructions, focusing initially on the stack, ALU and memory operations – listed below

SIMPL    Operation    J1 hex code

"        DUP          6081
'        DROP         6183
$        SWAP         6180
%        OVER         6181
+        ADD          6203
&        AND          6303
|        OR           6403
^        XOR          6503
~        INV          6600
@        FETCH        6C00
!        STORE        6123

As can be seen from the above – they all begin 6xxx which means ALU operation – and the 2nd significant digit  is the 4 bit ALU opcode. The 3rd digit is the transfer destination of the ALU result, and the least significant digit is the stack manipulation bits.
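The look-up itself can be sketched in a few lines of C. The opcodes are copied from the table above; the type and function names (`simpl_j1_entry`, `simpl_j1_lookup`) are hypothetical, and a linear search is used purely to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char token;          /* SIMPL character */
    const char *name;    /* operation name from the table above */
    uint16_t opcode;     /* J1 machine word from the table above */
} simpl_j1_entry;

const simpl_j1_entry simpl_j1_table[] = {
    { '"',  "DUP",   0x6081 },
    { '\'', "DROP",  0x6183 },
    { '$',  "SWAP",  0x6180 },
    { '%',  "OVER",  0x6181 },
    { '+',  "ADD",   0x6203 },
    { '&',  "AND",   0x6303 },
    { '|',  "OR",    0x6403 },
    { '^',  "XOR",   0x6503 },
    { '~',  "INV",   0x6600 },
    { '@',  "FETCH", 0x6C00 },
    { '!',  "STORE", 0x6123 },
};

/* Return the table entry for a SIMPL token, or NULL if it is not mapped. */
const simpl_j1_entry *simpl_j1_lookup(char token)
{
    size_t count = sizeof simpl_j1_table / sizeof simpl_j1_table[0];
    for (size_t i = 0; i < count; i++) {
        if (simpl_j1_table[i].token == token)
            return &simpl_j1_table[i];
    }
    return NULL;
}
```

Retargeting to another cpu would then, in principle, only mean swapping the contents of the `opcode` column.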

The cross compiler produces a line of output thus – in response to a single ascii character being typed in  – in this case ^ for XOR

XOR 6503 PC=9 TOP=FFFF DS0=0 DS1=0 DS2=0 DS3=0 RTN=0 MEM20=0 MEM21=6F00 MEM22=0

This also shows the Program Counter PC, the Top of Stack plus the first three elements of the stack below TOP,  the value on the top of the return stack and three consecutive locations in memory.  This debug output is essential to ensure that the J1 simulator is correctly executing each new instruction – and a great visual insight into the working of the CPU.

The cross compiler is now producing output in J1 machine language – and because it uses the SIMPL front end text interpreter, this automates the process of putting text strings together so it can be used to compose small snippets of  J1 code and run them in an interpreted manner – allowing new words to be created from SIMPL primitives.

Future work will enhance the cross compiler allowing the larger program flow constructs such as DO-LOOP, IF-THEN and BEGIN-END  to be implemented. Whilst output from the simulator is currently just single lines of Debug – the program will be extended so as to compile into files of J1 opcodes, and SIMPL source.


“It’s Forth-like, Jim – but not as we know it….”







Making it Minimal – MISC Machines

As I read more about Forth, and engage with some of the World’s Forth-Thinkers, I get the impression that at some point Forth becomes addictive and productive.  I have yet to reach that level of enlightenment, but my journey continues slowly, in fits and starts, as there is always some other more pressing engagement on my mental capacity.

I have been for some time a proponent of minimal instruction set (MISC) machines – and Chuck Moore has been prolific in this area for the last 35 years or so.  In a progressive career in creating silicon integrated circuits that execute Forth primitives directly, he has slowly whittled away the complexity of these machines, to the point where 144 individual cores can be put on a single die – as part of his Green Arrays work.

The problem then becomes not so much a hardware challenge as a software one: just how do you efficiently co-ordinate the tasks of 144 cores? Perhaps we have something to learn from the behaviour of bees or other colonial insects including ants and termites?

Each worker has a specific task to perform be it foraging for food, nursery duty or general housekeeping.  Each agent is running a low level behavioural model, with a higher level specialisation – based on the specific task or in response to certain stimuli.

This might be one way to organise a multicore processor  – a set of low level functions which are augmented by task or sensor specific application functions.

For the moment however, I first have to define the Instruction Set and Architecture (ISA) of my proposed Stack CPU.

Charles Moore, Chen Hanson Ting and Bill Muench provided an important framework for MISC machines, and proved them to be an important family of cpus.  The small instruction set means simplified hardware, fewer transistors or logic cells and the ability to implement them using low cost FPGA technology.

With a minimal instruction set,  and a Forthlike language, it’s always possible to create more complex instructions by concatenating the primitive instructions.  Provided that you have a small number of useful instructions – any program can be synthesised.

Moore and Ting postulated that the minimum useful instruction set would be about 25 to 30 instructions, and this fits in well with using 5 bit tokens to represent instructions. Moore packed four 5 bit instructions into 21 bit wide memory – and this formed a very simple pipeline architecture, which could also be used to hold a 16 bit literal and a 5 bit instruction. This scheme was used in Moore’s ShBoom processor and also the MuP21.
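The packing arithmetic is easy to try out in C. In this sketch four 5 bit opcodes occupy the low 20 bits of a word; the slot order (slot 0 in the most significant position) and the function names are assumptions made for illustration, not a description of Moore's actual layout.

```c
#include <stdint.h>

/* Pack four 5 bit opcodes into the low 20 bits of a word.
   Assumption: slot 0 sits in the most significant position. */
uint32_t misc_pack4(const uint8_t ops[4])
{
    uint32_t word = 0;
    for (int slot = 0; slot < 4; slot++)
        word = (word << 5) | (ops[slot] & 0x1fu);
    return word;
}

/* Extract the opcode held in slot 0..3 of a packed word. */
uint8_t misc_unpack(uint32_t word, int slot)
{
    return (uint8_t)((word >> (5 * (3 - slot))) & 0x1fu);
}
```

A processor built this way fetches one memory word and then steps through the four slots in turn, which is where the simple pipelining comes from.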

The Novix NC4016 however used a different approach and used the bit fields of the instruction to directly control different parts of the hardware.

This was the same approach taken by James Bowman – with his J1 Forth CPU.  This is fully described in just 200 lines of Verilog and has been implemented on Xilinx and Lattice FPGA devices.

James’s device was influenced by Charles Moore’s Novix NC4016 architecture, and has been successfully incorporated into a retro gaming shield for Arduino – Gameduino.  James is also a professional Silicon Architect, and the J1 is also used as a core on some graphic processor chips.

Using a microcode ROM it would be possible to decode 5 bit instruction tokens into multiple field wider instructions – and this would allow the simple pipelining scheme to be utilised.

The Virtual Instruction Set

Here’s a non-exhaustive list of instructions that would be useful candidates for a Stack oriented MISC computer. Most of these could execute in a single cycle on a FPGA based processor as run-time code,  but some of the program flow constructs would need to be compiled.

Stack Manipulation

Memory & Register Transfer
Program Flow

In total about 35 instructions,  but some savings could be made bringing it down to 32, which allows 5 bit tokens and packing of instructions into wider memory words

This maps well to the 33 printable ascii symbols (not alpha or numbers).  These are chosen for their strong mnemonic bias – to form the basis of the SIMPL language.  With only 32 symbols to learn it’s like a shorthand version of Forth, yet it retains its human readability.  With a compiler, there is no reason why the longhand version of the instruction cannot be included in the listing. It also reduces the source code from an average of 4 characters (including space) per instruction to just 1 character per instruction. This compaction of source code means that snippets of executable code can be transferred between processors in small packets – using wireless connectivity or even as SMS messages over the GSM network.

In the next post I’ll cover some aspects of mapping this minimal instruction set onto the J1 architecture, plus a look at some of the cross-compiler and simulator tools needed to allow further exploration of the MISC class of computing machines.











Making progress with SIMPL

If you have read any of my last few posts you will know that I am working on a tiny language called SIMPL which was inspired by Forth.

SIMPL stands for Serial Interpreted Minimalist Programming Language – as that’s what it started out in life as being – a serial command shell for microcontrollers, when a few simple single alpha character commands could be used to run simple programs and loops on microcontrollers such as Arduino and Teensy.

SIMPL was adapted from Ward Cunningham’s Txtzyme – a character interpreter running inside a loop, and over a period of a few years I have augmented the language to provide  more functions such as maths operators, logic operators, loops and conditional branching, and perhaps most importantly, and borrowed from Forth –  the means to compile new words using the “colon definition”.

SIMPL is a lightweight language. It does not even attempt to do all the things that Forth can do, and with a word set that has been reduced to a maximum of 85 unique functions, it is much smaller than almost all implementations of Forth.

In just 300 bytes, the heart of the SIMPL kernel can take serial text – either typed in manually or sent as a file from a serial terminal program, and buffer it into RAM memory. It can also take colon definitions, and based on their identifying character – normally a capital letter, store them at precalculated addresses in RAM so that they can be accessed and run later with the minimum of decoding.

This unique feature of SIMPL effectively removes the overhead of the dictionary search in Forth.  A character read from the input buffer is decoded within a few cpu cycles, and program execution is directed to that block of code.
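The idea of a precalculated address can be sketched in C as below. The real kernel is MSP430 assembly, and the post does not give the slot size, so the 32 byte slot and all of the names here (`SIMPL_SLOT_BYTES`, `simpl_slot`, `simpl_define`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SIMPL_SLOT_BYTES 32   /* hypothetical fixed size per definition */

char simpl_defs[26 * SIMPL_SLOT_BYTES];

/* Address of the slot for word 'A'..'Z': one multiply and add, no search. */
char *simpl_slot(char name)
{
    if (name < 'A' || name > 'Z')
        return NULL;
    return &simpl_defs[(name - 'A') * SIMPL_SLOT_BYTES];
}

/* Store the body of a colon definition; returns 0 on success, -1 on error. */
int simpl_define(char name, const char *body)
{
    char *slot = simpl_slot(name);
    size_t len = strlen(body);
    if (slot == NULL || len >= SIMPL_SLOT_BYTES)
        return -1;
    memcpy(slot, body, len + 1);   /* include the terminating NUL */
    return 0;
}
```

The trade-off is that every word reserves a full slot whether it needs it or not, in exchange for a lookup that costs only a subtraction and a multiply.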


In addition to the engine that drives the inner interpreter, there are a series of low level primitives that define the SIMPL kernel.  These primitive words are effectively the minimum wordset from which the rest of the language can be written. Bill Muench, C.H. Ting and Chuck Moore all arrived at roughly the same conclusion – back in the early 1990s – that there was a small subset of words from which a full implementation of Forth could be synthesised. It is in this spirit that SIMPL has been written.

These primitives include stack operators, math and logic operators, memory access operators and those that allow conditional branching and control program flow.

There are just 32 primitives – which allows for a 5 bit instruction scheme.  These primitives mostly consist of symbols taken from  ascii characters  0x20 to 0x40.

The primitives have been chosen to offer register compatibility with Ting’s MSP430 eForth.  This offers a proven route to extension and makes building and testing the code a lot simpler.

At its lowest level, SIMPL takes the place of a serial bootloader allowing source code to be loaded into the program space of the target microcontroller, using nothing more sophisticated than a serial terminal program such as Tera Term.

SIMPL is written in MSP430 assembly language for efficiency, speed and compactness, and it has strong influences from Dr CH Ting’s eForth for MSP430.

As I read through Ting’s book Zen and the Forth Language – I realised that it should be possible to make the kernel of SIMPL a cut down version of eForth, that resides in just 1k bytes of program space, and presents a programmer’s “toolkit” in addition to the means to load source code into the device and run that code on a virtual machine.

Progress to Date

Inspired by CH Ting’s book on the MSP430 eForth, I decided to restructure my rudimentary code to make it compatible with his register model.  This will allow for easier expansion later on.

Jump Table

 I have included a “jump table” to jump to the code addresses of all of the 96 possible ascii character instructions. This framework will allow me to quickly add the other functions and run them from the code addresses stored in the jump table.

Whilst this jump table in itself is 192 bytes long, just over one quarter of the code so far, it reduces other overheads of instruction decoding. The jump table structure is not going to be as fast as direct threaded code, and this is the price we pay for using a tokenised instruction set – but it is easier to understand and extend with more functionality. In hardware, this jump table would be analogous to a microcode ROM for instruction decoding.

The jump table accepts an index value in the form of a single ascii character. This is first copied to a temporary register as we need the original character later. Then we subtract 32 from this – to remove the offset from zero – as the first printable character is “space” which is ascii 32. Finally we multiply what’s left by 2 (by adding it to itself) and add this to the program counter.  This causes the pc to be incremented by the correct number of words into the table, where we find the jump to the code address of the function we wish to run. The snippet below shows how this selects from the first 5 instructions:

; Now we need to decode the instructions using a jump table
; Jump table uses 2 bytes per instruction - so 2 x 96 = 192 bytes

next:     mov.b @ip+,R12     ; Get the next character from the instruction memory
          mov.w R12,R13      ; Copy into R13 - as needed to decode Jump Address
          sub.w #0x0020,R13  ; subtract 32 to remove offset of space
          add.w R13,R13      ; double it for word addressing
          add.w R13,pc       ; jump to table entry

JumpTbl:  jmp space          ; SP
          jmp store          ; !
          jmp dup            ; "
          jmp lit            ; #
          jmp swap           ; $


In this way all of the 96 possible ascii tokens are handled, including numbers – which have their own decoding routine, also accessed via this jump table.
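For anyone more comfortable in C than MSP430 assembly, the same dispatch can be sketched with an array of function pointers indexed by the character minus 32. This is an analogue, not a translation of the kernel: the handler names and the tiny stack are hypothetical, only three operations are filled in, and there is no stack bounds checking.

```c
#include <stdint.h>

int16_t simpl_stack[16];
int simpl_sp;                      /* number of items on the stack */

void op_nop(void)  { }
void op_dup(void)  { simpl_stack[simpl_sp] = simpl_stack[simpl_sp - 1]; simpl_sp++; }
void op_swap(void)
{
    int16_t top = simpl_stack[simpl_sp - 1];
    simpl_stack[simpl_sp - 1] = simpl_stack[simpl_sp - 2];
    simpl_stack[simpl_sp - 2] = top;
}
void op_add(void)
{
    simpl_sp--;
    simpl_stack[simpl_sp - 1] =
        (int16_t)(simpl_stack[simpl_sp - 1] + simpl_stack[simpl_sp]);
}

typedef void (*simpl_handler)(void);
simpl_handler simpl_jump[96];      /* one entry per character 32..127 */

/* Fill every slot with a no-op, then install the handlers we have. */
void simpl_jump_init(void)
{
    for (int i = 0; i < 96; i++)
        simpl_jump[i] = op_nop;
    simpl_jump['"' - 32] = op_dup;
    simpl_jump['$' - 32] = op_swap;
    simpl_jump['+' - 32] = op_add;
}

/* Subtract the offset of "space" and jump through the table. */
void simpl_dispatch(char c)
{
    unsigned index = (unsigned char)c - 32u;
    if (index < 96u)
        simpl_jump[index]();
}
```

The doubling step in the assembly version disappears here because the C compiler scales the array index by the pointer size for us.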


Primitive Progress

Most of Ting’s set of eForth primitives have been coded in now – and I can do simple arithmetic, logical operations and memory 16 bit fetch and store.

The code to enter and decode decimal numbers and print them out to the terminal is also in place and working – this allows me to type in 123 456+. and the machine responds 00579, as I have yet to suppress leading zeroes in my printnum routine.
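The leading zeroes fall out naturally if a 16 bit value is always converted into five decimal digits. A minimal C sketch of that conversion follows; the name `printnum5` and the buffer interface are assumptions, and the real printnum is written in MSP430 assembly.

```c
#include <stdint.h>

/* Convert a 16 bit value to exactly five decimal digits plus a NUL.
   No leading zero suppression, so 579 becomes "00579". */
void printnum5(uint16_t value, char out[6])
{
    for (int i = 4; i >= 0; i--) {
        out[i] = (char)('0' + value % 10);
        value /= 10;
    }
    out[5] = '\0';
}
```

Suppressing the zeroes would only need a scan for the first non-zero digit before printing.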

I have tested memory fetch and store, and also the conditional operators < = and >.

With the conditional operators returning either 0 for false or 1 for true, the next thing to implement will be the simple looping structures, and the mechanism that allows conditional execution of code.

These structures make use of parentheses ( ), and a single parameter on the stack which is passed into the loop counter

Any code to be conditionally executed is placed between the parentheses.  The loop counter defines how many times the code within the brackets should be executed –

0   not at all  – skip everything between the brackets until the end ) is detected

1   execute once – i.e use for conditional execution

n  execute n times – decrementing the loop counter until it is zero

This is a useful structure: as well as conditional execution and DO-LOOPs, it can also be used for comments – by setting the counter to zero, anything between the brackets is not executed – so the text characters can be treated as a comment.
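The behaviour of the parentheses can be modelled in a few lines of C. This is a sketch under stated assumptions rather than the kernel's actual logic: only decimal numbers, `+` and the brackets are handled, loops cannot be nested, and the function name `simpl_run` and its calling convention are invented for the example.

```c
#include <ctype.h>

/* Run a SIMPL-like string against a caller supplied stack.
   sp is the number of items on the stack; the new sp is returned. */
int simpl_run(const char *src, int *stack, int sp)
{
    const char *p = src;
    const char *loop_start = 0;
    int loop_count = 0;

    while (*p) {
        char c = *p++;
        if (isdigit((unsigned char)c)) {
            int value = c - '0';
            while (isdigit((unsigned char)*p))
                value = value * 10 + (*p++ - '0');
            stack[sp++] = value;
        } else if (c == '+') {
            sp--;
            stack[sp - 1] += stack[sp];
        } else if (c == '(') {
            loop_count = stack[--sp];          /* counter comes off the stack */
            if (loop_count <= 0) {
                while (*p && *p != ')')        /* 0: skip to the closing ) */
                    p++;
                if (*p)
                    p++;
            } else {
                loop_start = p;                /* remember where the body begins */
            }
        } else if (c == ')') {
            if (--loop_count > 0)
                p = loop_start;                /* go round again */
        }
        /* any other character, including space, is ignored here */
    }
    return sp;
}
```

With an empty stack, running `0 3(1+)` leaves 3 on the stack, `7 0(1+)` leaves 7 untouched, and `0(any text)4` simply skips the bracketed text and pushes 4.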

I am fairly confident that I can keep the kernel below 1024 bytes, with the higher level routines built on top of this 1K kernel. The codesize is now 738 bytes.

I have created a github gist to allow you to view the code so far – as this is clearly a work in progress.








More Thoughts SIMPL and eForth



I recently downloaded Dr C.H. Ting’s book via Kindle “Zen and the Forth Language” – published by fellow Forther, Juergen Pintaske.

In the book, Ting describes in detail his eForth model ported to the MSP430, and how a complete direct threaded Forth can be built using a limited set of primitives.

Ting’s book is very comprehensive, giving details of how to use TI’s Code Composer to assemble a program and create an output file in a form that can be used by the 4E4TH IDE.  There is sufficient information in his book for any curious person to discover the inner workings of Forth and have a complete working system on the inexpensive MSP430 Launchpad.

Whilst at Forth Day, I had the opportunity to talk with Ting and this was the motivation I needed to take the plunge with MSP430 assembly language. Ting’s book, and its detail of the eForth model, convinced me that I was on the right lines with my approach to SIMPL – indeed almost to the point where my implementation of SIMPL could be considered to be a sub-set of Ting’s eForth.

Having got the basics of the SIMPL interpreter running in assembly language, the plan is now to massage the allocation of the registers, so as to make it compatible with Ting’s model. Then SIMPL really will be a simplified version of eForth.  This means that if I ever choose to extend SIMPL at a later date – there is a clear and proven route via Ting’s eForth.

SIMPL removes the dictionary search overhead that otherwise exists in Forth. The “words” are just single ASCII characters, and it is easy to jump to an address calculated from that value. Whilst this might appear wasteful of memory, the ultimate MSP430 target has 256Kbytes of FRAM to play with.  It massively reduces the overhead of decoding text strings.

Can we switch between SIMPL and Forth?

SIMPL source code can be written in such a way that it can be expanded into standard Forth, using the SIMPL interpreter to automatically generate this more verbose form.  The SIMPL primitives can all be expanded out to their conventional Forth names, as can the internal words associated with letters a-z and user words A-Z.

Conversely, small eForth application programs could be analysed for their word content and mapped onto the primitives and user symbols used in SIMPL.  Provided that only 52 user defined words have been used in writing the application – then the switch between Forth and SIMPL should be relatively straightforward.

SIMPL is essentially a shorthand or shortform of Forth. The two should be designed in such a way that they are interchangeable. SIMPL is just a subset of Forth – simplified!


MUP21  & eForth Primitives 

From Ting, Dr. Chen-Hanson. Zen and the Forth Language: eFORTH for the MSP430 from Texas Instruments (Kindle Locations 406-413). Juergen Pintaske. Kindle Edition.

Here’s a list of the instruction set of Chuck Moore’s MUP21 Forth cpu. Ting compared his 31 eForth primitives with these and this confirmed that they were thinking along very similar lines.

Transfer Instructions JMP, JZ, JC, CALL, RET, LOOP

Math/ Logic Instructions AND, OR, XOR, NOT, SHR, ADDC

Memory Instructions LDR, STR, LIT

Stack Instructions PUSH, POP, DUP, DROP, SWAP, OVER, NOP

These are also very similar to the instruction set of James Bowman’s J1 Forth processor, which is the intended FPGA target for my SIMPL language. So my plan is to have a maximum of 32 primitives – which can be represented by a 5 bit instruction, and three of these instructions can be pipelined into a 16 bit wide memory – again an idea borrowed from Chuck’s MUP21.

However, the J1 uses a 16 bit wide instruction, where the bit fields directly control the hardware. In my proposal, there will need to be a further level of decode – effectively a 5 bit to 16 bit ROM, which generates the J1 instructions from the 5 bit instruction token. This is the price we pay for having a token set that has been chosen for human readability – it needs a further level of decoding to suit the ISA of the Forth processor.
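The extra level of decode can be sketched in C as a 32 entry ROM indexed by the 5 bit token. The token numbering below is entirely hypothetical, as is the choice to put the first token in bits 14..10 and leave bit 15 unused; the J1 words are ordinary ALU class instructions, with 6000 being a no-op.

```c
#include <stdint.h>

/* Hypothetical token numbering. Unassigned tokens are left as 0x0000,
   which a J1 would treat as a jump to address 0. */
const uint16_t token_rom[32] = {
    [0] = 0x6000,   /* NOP: ALU op T with no stack movement */
    [1] = 0x6081,   /* DUP  */
    [2] = 0x6180,   /* SWAP */
    [3] = 0x6181,   /* OVER */
    [4] = 0x6203,   /* ADD  */
    [5] = 0x6503,   /* XOR  */
};

/* Expand one packed 16 bit word into three 16 bit J1 instructions.
   Assumption: first token in bits 14..10, bit 15 unused. */
void expand_word(uint16_t packed, uint16_t out[3])
{
    for (int slot = 0; slot < 3; slot++) {
        unsigned token = (packed >> (5 * (2 - slot))) & 0x1fu;
        out[slot] = token_rom[token];
    }
}
```

In hardware the array would become a small ROM or a block of combinational logic sitting between instruction memory and the J1 decoder.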

We can of course model the instruction set and architecture of the J1 in MSP430 assembly language, and this might be a useful step to do.  In effect we will have created a cross assembler, where SIMPL code is assembled into J1 instructions. Whilst slower than running SIMPL directly in MSP430 assembler, it would be useful to prove the development of the language on the J1 – albeit simulated in code. When run directly on the J1, there will be an order of magnitude speed up.

Uses of SIMPL.

Writing SIMPL has been an education for me, as I have learned about the inner workings of virtual stack based machines, and how they are created within a conventional register based processor.  A learning exercise for me can equally be an educational process for others, so I see one of the uses of SIMPL as a learning tool to teach students about the instruction set and the architecture of simple stack processors. In the same theme as “From NAND to Tetris”, “Building a SIMPL Stack Processor” could be offered in the form of an online study course, consisting of theory and implementation on the MSP430 Launchpad or Juergen Pintaske’s MicroBox. For some, learning electronic engineering and hardware description languages – the implementation in a dedicated soft core processor on an FPGA would also be a useful learning exercise.

At its heart, SIMPL is a text interpreter, and can be used to process text files. Many of our modern manufacturing processes, such as pcb production, CNC machining, 3D printing and laser cutting transfer their information using fairly simple text file formats – such as Gerber and G code.  These files consist of numbers and alpha characters which are translated into sequential machine operations. SIMPL can be used as a text processor to convert these files between standards, or to generate new compatible formats – which may be of interest to the open source and maker communities.  Animating a walking robot, or flying a drone through a sequence of aerobatic manoeuvres, are both things that could be translated into SIMPL text files.

In a later post I will describe how SIMPL source code can be used as a lingua franca for bi-directional exchange of information and program code between radically different classes of computing hardware – from the humble Arduino or Launchpad to the fastest laptop or Octa-core Android platform. Laptops and other mobile computing devices are convenient viewing platforms because of their high performance graphics capabilities, whilst small microcontrollers generally struggle to produce anything other than a serial text output.

However if this serial text was effectively a list of drawing commands for graphics primitives – the mobile platform, running a SIMPL virtual machine (written in iForth or Processing for convenience and portability) could interpret the “display list” and produce the fully rendered graphics display from it. SIMPL can send serial text to a Laptop or tablet at some 200,000 bytes per second – perfectly fast enough to render images for datalogging, oscilloscope or IDE type applications.




SIMPL – a Small Forth Inspired Language

SIMPL really is a very small language. The interpreter and primitives kernel fit into less than 1Kbytes on the MSP430.  It can be ported to virtually any microcontroller that has a reasonable number of registers. SIMPL is like Forth – but simplified!

SIMPL was inspired by Ward Cunningham’s TXTZYME – a minimal language written in C that allowed a microcontroller – such as Arduino or Teensy to perform basic operations, such as printing text, looping, toggling port pins and reading an ADC all controlled from a serial terminal. It only had 13 instructions and was quite limited in what it could do.

I was intrigued by TXTZYME and how it worked – and quickly realised that it could easily be extended to include math and logical operations and conditional execution, and that, like Forth, it could have new words created from its set of low level primitives.

In the last 4 years, as inspiration leads me, I have developed SIMPL much further and ported it to other microcontrollers such as MSP430 and ARM M4. There are now 400MHz ARM M7 microcontrollers available – and porting SIMPL to these monsters should be relatively SIMPL.

This year – taking my inspiration from a recent trip to Forth Day at Stanford University – with Forth cpu expert James Bowman, and eForth guru Dr. C.H. Ting – I have sought to write SIMPL in MSP430 assembly language, as a learning exercise, and so I can experiment further with the mechanics of the language. All I’m doing is following Chuck Moore’s journey, but 50 years later. I just want to create the tool set that makes programming computers easier and more productive.  I don’t want 20 million lines of code just to run an operating system. I want the machine to spend all its time on my application!

I see SIMPL as a universal “Shorthand” allowing us to write code at a fairly low level for a wide variety of microcontrollers – at least for simple applications.  I think of it as a debugging toolbox, or Smart Bootloader, that once installed on a microcontroller, grants you the ability to load code into memory,  communicate with the mcu at a fairly basic level, execute code and examine the contents of memory or print out results. It also gives the means to exercise the mcu peripherals – such as GPIO, ADC, UART, SPI etc.

Just like Chuck Moore, I don’t want to learn a whole new set of assembly language for every new processor I take on.  I want to put a version of SIMPL (coded in C or Arduino C++) on the chip, and gain a familiar set of basic tools.

My goal is to produce an extensible language that fits into just 1k bytes – which is effectively an “Access All Areas Pass”  for the mcu.

The MSP430 has been chosen because it is a low-power 16 bit processor and newer devices have up to 256K bytes of nonvolatile FRAM memory. MSP430 devices also have some neat features – like a 24 bit ADC, and a very fast UART allowing serial communications at up to 8 Mbaud.

Once the SIMPL virtual machine  has been written in MSP430 assembly language – it is a relatively easy task to port it to any other modern register based processor – including ARM, x86 etc.  But for now I am just finding my way around the MSP430 assembly language – which will suffice as a very capable virtual machine.

Ultimately, the plan is to port SIMPL to its own custom stack processor – such as James Bowman’s J1, based on a soft core running in an FPGA. Most of the primitive instructions are executed directly by the J1 hardware, so the port will consist of decoding the primitives from ASCII using a look up table and generating the 16 bit wide instructions that the J1 uses.
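To make that last step a little more concrete, here is a minimal Python sketch of packing the J1 instruction classes into 16 bit words, using the class codes in bits 15 to 13 from James Bowman's paper. The function names are my own inventions for illustration, and the ALU routing bits below bit 8 are deliberately left at zero rather than guessed.

```python
# Hypothetical helpers (names are assumptions, not part of any J1 tool chain).
# Bits 15..13 select the instruction class; bit 12 is the ALU "return" bit.

def j1_literal(value):
    """Literal class: bit 15 set, value carried in the low 15 bits."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError("a J1 literal holds 15 bits")
    return 0x8000 | value

def j1_branch(kind, target):
    """Jump (000), conditional jump (001) or call (010) with a 13 bit target."""
    classes = {"jump": 0b000, "cjump": 0b001, "call": 0b010}
    if not 0 <= target <= 0x1FFF:
        raise ValueError("a J1 branch target holds 13 bits")
    return (classes[kind] << 13) | target

def j1_alu(opcode, ret=False):
    """ALU class (011): 4 bit opcode in bits 11..8, optional return in bit 12."""
    if not 0 <= opcode <= 0xF:
        raise ValueError("the ALU opcode is 4 bits")
    return (0b011 << 13) | (int(ret) << 12) | (opcode << 8)

print(hex(j1_literal(42)), hex(j1_branch("call", 0x100)), hex(j1_alu(3, ret=True)))
```

A SIMPL to J1 cross-compiler would then be little more than a table mapping each punctuation character to one of these packed words.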

What I see SIMPL useful for:

It’s a low overhead toolkit that allows you to explore the inner workings of microcontrollers.

It uses many of the ideas of Forth, but without getting bogged down with the Forth virtual machine. Forth could be considered to be an extension of SIMPL, and learning SIMPL teaches you a lot of what you need to know for Forth.

It allows the newcomer to explore a microcontroller with just a serial terminal. Ideal as an educational language, showing what can be achieved in a very low byte count on virtually any microcontroller.

Very compact – a 140 character tweet or SMS could do a lot in SIMPL

Automates the process of getting code to run on a new microcontroller – a useful alternative to the traditional bootloader – with less than 1K of overhead.

Fast – all primitives are written in assembly language. Instructions are decoded in logic or by look-up table – which removes the overhead of a dictionary search.

Uses a very simple syntax – many machine tools, 3D printers and laser cutters could be controlled from text files that are essentially lists of SIMPL instructions.

SIMPL has just 32 primitive instructions – based on the non-alphanumeric symbols used in ascii. Lower case alphabetic characters are used for function calls to code that exercises processor specific hardware, or for examining the contents of memory.  The uppercase characters are reserved for user defined words – created from primitives and lowercase words.
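As a quick check on that split, here is a small Python sketch that sorts the printable ASCII characters into the layers described above. The function name and the layer labels are mine, chosen only for illustration.

```python
import string

def simpl_layer(ch):
    """Return the SIMPL layer a character falls into (labels are illustrative)."""
    if ch in string.punctuation:
        return "primitive"        # the 32 punctuation and symbol characters
    if ch in string.ascii_lowercase:
        return "system word"      # processor specific helper functions
    if ch in string.ascii_uppercase:
        return "user word"        # reserved for user definitions
    if ch in string.digits:
        return "number"           # digits build numeric literals
    if ch == " ":
        return "space"
    return "unknown"

# Tally every printable ASCII code from 32 (space) to 126 (~).
counts = {}
for code in range(32, 127):
    layer = simpl_layer(chr(code))
    counts[layer] = counts.get(layer, 0) + 1

print(counts)
```

Python's own `string.punctuation` happens to contain exactly 32 characters, which matches the count of primitives.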

SIMPL is a highly mnemonic language – where the choice of character reflects the operation being performed. This is not a new idea – it was used on the very first Cambridge-built EDSAC machine – where instructions were loaded in from paper tape. SIMPL pays homage to this machine and the simple way in which things were once done.

SIMPL is Forth like in that it uses words or tokens that cause blocks of code to be executed sequentially. It differs from Forth in that the words consist of single, printable ascii characters that are decoded directly causing the cpu to call a routine at a given address, execute the subroutine found there and then return control to the inner interpreter.

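Setting aside the ten digits, which are used for numeric literals, there are 85 printable ascii characters (out of 95, codes 32 to 126) – so this defines how many unique codes the machine will interpret as instructions.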

The inner interpreter is very compact on the MSP430 with the main interpreter loop fitting into about 300 bytes.
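To show the shape of such a loop, here is a minimal Python sketch of a character-dispatched interpreter. It is only a model of the idea, not the MSP430 code: the handler names are assumptions, only a handful of stack primitives are included, and plain ASCII quote characters stand in for DUP and DROP.

```python
def run(source, stack):
    """Execute each character of `source` by direct table lookup."""
    def dup():
        stack.append(stack[-1])
    def drop():
        stack.pop()
    def swap():
        stack[-1], stack[-2] = stack[-2], stack[-1]
    def over():
        stack.append(stack[-2])
    def add():
        stack.append(stack.pop() + stack.pop())

    # One entry per character: no dictionary search, just an indexed jump.
    table = {'"': dup, "'": drop, "$": swap, "%": over, "+": add}

    for ch in source:
        handler = table.get(ch)
        if handler is None:
            raise ValueError(f"no routine for {ch!r}")
        handler()   # call the routine, then fall back into the loop
    return stack

print(run('"+', [21]))
```

On the real machine the table lookup becomes a computed jump, which is why the loop can stay so small.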

On top of this interpreter loop, you have the code for the low level primitive symbols – listed below, and then higher level words – such as “toggle a port pin” “n milliseconds delay” “read an adc channel” “output a string to serial port” “produce a hex dump of memory” which are represented by the lower case alphabet characters.

This layer of the language is what I refer to as the “Arduino Layer” – all those useful helper functions that are processor specific.

The next layer is the user’s application code, where the user’s words use capital letters. For example – the classic washing machine program:


This would be condensed in SIMPL as FWEFRES – which is a code reduction factor of more than 5.

But each of these can have a parameter – for example if the WASH is an AGITATE cycle A

– In Forth


In SIMPL we also use a colon definition to define W. There is no need for the ; as when this snippet is entered it has a null terminator added, and this is the cue to the interpreter to return to fetch the NEXT word

:W50(A) – the code in the parentheses is repeated 50 times
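The repeat construct can be modelled in a few lines of Python. This sketch simply unrolls each `n(body)` into text so the effect is visible; a real interpreter would keep a loop counter instead. The function name is made up for illustration and nested parentheses are not handled.

```python
DIGITS = "0123456789"

def expand_loops(source):
    """Unroll every `n(body)` in `source`; nesting is not handled here."""
    out = []
    i = 0
    while i < len(source):
        j = i
        while j < len(source) and source[j] in DIGITS:
            j += 1                       # scan a run of digits
        if j > i and j < len(source) and source[j] == "(":
            close = source.index(")", j) # find the matching close bracket
            body = source[j + 1:close]
            out.append(body * int(source[i:j]))
            i = close + 1
        elif j > i:
            out.append(source[i:j])      # digits with no "(" pass through
            i = j
        else:
            out.append(source[i])        # any other character passes through
            i += 1
    return "".join(out)

print(expand_loops("3(A)"))
```

So the body of W above, `50(A)`, stands for fifty A's in a row.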

In fact the ascii code for W is used to store this code snippet at a given address – calculated as (W-61)*32. So as soon as the interpreter encounters W, it jumps to that code address.
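Taking the formula exactly as written, the arithmetic looks like this in Python. The constant and function names are my own, and the result is only an offset, since the base address of the definition area is not given here.

```python
BLOCK_SIZE = 32   # bytes reserved per user word, from the formula above
CODE_OFFSET = 61  # value subtracted from the character code, as stated above

def word_offset(ch):
    """Offset of a user word's code block, computed as (code - 61) * 32."""
    return (ord(ch) - CODE_OFFSET) * BLOCK_SIZE

print(word_offset("W"))
```

The useful property is that each successive letter lands exactly 32 bytes further on, so no search is needed to find a definition.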

Most of the primitive Forth words are located in the ASCII characters 32 to 63, which leaves 64 to 126 available for the user’s vocabulary and constructs made from primitives.

There are approximately 32 printable ascii punctuation and symbol characters – and with a bit of pre-processing they can be used to form the basic machine primitives.

Stack Commands (7) PUSH, POP, DUP, DROP, SWAP, OVER, NOP

,        PUSH
.        POP
"        DUP
'        DROP
$        SWAP
%        OVER
SP        NOP

Maths & Logical Operations (9) ADDC, SUB, MUL, DIV, AND, XOR, SHR, OR, INV

+        ADDC
-        SUB
*        MUL
/        DIV
&        AND
^        XOR
`        SHR
|        OR
~        INV (NOT/COMPL)

Transfer Instructions (8)

:        CALL (COMPILE)
;        RET
(       LOOP-START
)        LOOP-END
<       LESS
=        EQU
>        MORE
\        JUMP – condition follows

IN/OUT (2)

?        KEY (INPUT)
.        PRINT (OUTPUT)

Memory Instructions (3)

@        LOAD
!        STORE
#        LIT

Others (6)

_ _        String Print
[ ]        String Store
{ }        Array of elements – switch/case

The use of single ascii characters as tokens, makes the language a lot less verbose than Forth, and snippets of source code can be sent in very few characters – for example as an SMS message between 2 systems equipped with GSM modems.
It is a fairly simple task to have a word table – where the verbose forms of the words are stored, and can be printed out, to expand the source into something more readable – for example


Three ascii characters expanded to 11 plus 3 spaces
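Such a word table is easy to model. Here is a minimal Python sketch using some of the names from the primitive list above; the table and function names are assumptions, the sample input is my own, and plain ASCII quotes stand in for DUP and DROP.

```python
# Verbose names for a few primitives, taken from the tables above.
VERBOSE = {
    '"': "DUP", "'": "DROP", "$": "SWAP", "%": "OVER",
    "+": "ADDC", "&": "AND", "|": "OR", "^": "XOR", "~": "INV",
}

def expand(source):
    """Replace each known symbol with its verbose name, separated by spaces."""
    return " ".join(VERBOSE.get(ch, ch) for ch in source)

print(expand("$%~"))
```

Characters that are not in the table are passed through unchanged, so user words and numbers survive the expansion.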

SIMPL does not use the space as word separator, but in the special case where you have several numbers to put on the stack, the space is used to indicate “push onto stack”
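Here is a minimal Python sketch of that special case: digits accumulate into a number and a space pushes it. The function name is hypothetical, and I have assumed that reaching the end of the input also pushes a pending number.

```python
def push_numbers(source):
    """Push each space-separated decimal number in `source` onto a stack."""
    stack = []
    digits = ""
    for ch in source:
        if "0" <= ch <= "9":
            digits += ch                 # accumulate the current number
        elif ch == " ":
            if digits:                   # space means "push onto stack"
                stack.append(int(digits))
                digits = ""
        else:
            raise ValueError(f"this sketch only handles digits and spaces, got {ch!r}")
    if digits:                           # assumption: end of input also pushes
        stack.append(int(digits))
    return stack

print(push_numbers("10 11"))
```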

The intention is to write an implementation of SIMPL in MSP430 assembly language. This not only helps me learn MSP430 asm code, but it’s a good exercise in creating a SIMPL virtual machine using a very conventional register based microcontroller, to gain experience of the language and its limitations. For example, have I got the right mix of primitives to allow a complete language to be synthesised? The choice of primitives was based on Chuck Moore’s MuP21 instruction set and C.H. Ting’s eForth model.

Writing in assembler exposes you to the raw roots of the processor, where you have to think hard about what you want the code to do, and always be prepared to rewrite and refine every routine.  Having ported an application to one register-rich processor, porting to a new device like an ARM is much easier, as you have already done the hard work of developing the register functional model.

As the microcontroller is running the SIMPL machine, the SIMPL primitives should give access to the widest range of the host’s instruction set, but with the simplification that the stack takes the place of having multiple registers to write to. This is further simplified in that the top and next locations on the stack are stored in registers rather than RAM.

As the VM needs to perform tests and comparisons on numbers – the usual comparison operators are provided  > , < and  = .

These are used in conjunction with the parentheses operators which provide the means of skipping or looping sections of code depending on the result of the comparison.

As an example     10 11>(_Print if Greater_)













SIMPL – Now Ported to MSP430 Assembly Language

For some time I have been working on a tiny, Forth like language called SIMPL (serially interpreted minimal programming language). Whilst its origins are in Ward Cunningham’s “Txtzyme” – and it was originally written in C for Arduino and Teensy – I have decided to pare it down even more and write it in assembly language.

One of my challenges is to create an extensible language that can be ported to virtually any microcontroller, and that provides the basis of a complete, interactive programming environment, requiring only a terminal program for communications – consisting of text input and output.

A Primitive Command Set

The language will allow interactive programming of the microcontroller by way of a virtual machine. ASCII characters will be interpreted as commands, which operate on data held either on the stack or in system memory.

Some of these commands are treated as primitives – and executed directly within a few machine cycles. Others are longer program operations which are synthesised from the primitives and called as subroutines. It is the ability to build up complex sequences of primitives into loops and other control structures that gives this language its extensibility.

Primitive commands include arithmetic and logical operations – such as ADD, SUB, MUL, DIV, AND, OR, XOR, NOT

These are often not much more than a couple of register operations, written in the native assembly language of the target processor.

Other primitives allow data to be fetched or stored to and from memory and the stack

Other commands are used for stack manipulation such as PUSH, POP, DUP, DROP, SWAP and OVER.

For convenience the instructions to send a character to the terminal, or receive a character from the input buffer, are included as primitives – but routines to print a decimal number to the terminal or produce a hexadecimal dump of memory are created as subroutines, composed of several primitive instructions built into program control structures such as loops.

With the right mix of primitive instructions, and the ability to create subroutines using these, virtually any programming task can be achieved – and the user is coding in effectively the native machine language of the virtual machine.  This means that if the virtual machine can be hosted on almost any microcontroller – then the applications can be ported from platform to platform.





Tiny languages have existed for decades – literally since the first days of stored program computers, and later in the early days of resource limited microprocessors – and there is a certain amount of discussion on just how small they can be – and still remain useful.

My challenge is to get the complete language kernel into just 1024 bytes of program memory, and this will contain the serial communications routines, a text editor, a compiler and an interactive interpreter – plus whatever else will fit. The language will have the capability of accepting a program in the form of an ascii text file, and will also be able to dump areas of memory – either as text or as hexadecimal – to the terminal screen or to a capture file.

I have chosen the MSP430 series of microcontrollers as the target. The reason for this is that the MSP430 is a good 16 bit mcu, low power and now available in variants that offer lots of non-volatile FRAM – and peripherals such as 24 bit ADCs.

Plus its assembly language is not difficult to get to grips with (unlike that of the more complex ARMs) – and as such, it makes a good first choice of affordable 16 bit processor for learning about the mechanics of Forth.




I have bitten the bullet and started to port it into MSP430 assembly language for speed and compactness.

My aspirations lie in creating a minimal tokenised interpreted language – where the words or tokens are single ascii characters. This removes a lot of the more complex aspects of Forth, such as dictionary searches, and it also makes a very mnemonic rich language – in that I can choose what symbols I use for the various stack operations – so I use ” for DUP, $ for SWAP, % for OVER and ‘ for DROP.

Whilst to some in the Forth community this may appear heretical, it makes sense to the way my brain is wired. More conventionally I have &, |, ^ and ~ for the AND, OR, XOR and INVERT logic operations and the usual +, -, * and / for the maths operators.

Inspired by Ting’s eForth and its minimal word set, I can get most of the primitives into the 32 punctuation symbols, leaving the capital letters for user words, and the lower case for other system words – that are constructed from primitives – for example h to set a port high and l to set it low, m for millisecond delay, u for microsecond delay.

The language is evolving on a daily basis and I can now do basic maths and logic operations and some of the stack manipulations – yet the whole kernel (so far) with UART support is just 582 bytes long – which roughly translates into about 500 lines of code. I have posted the code so far up on Github Gist here


This is very much a work in progress – and I have not yet completed the primitive definitions – but it’s my way of learning how these obscure languages work, and good fun to tinker with. You just need a Launchpad with a MSP430G2553 – but with a little fiddling with the UART routines it will run on virtually any MSP430.

What’s it good for, you may ask. Well I see it as a lingua franca to allow widely different machines to communicate with each other efficiently at a low level – but still human readable. I also see it as being ported easily to a specialist Forth processor – and being used for controlling CNC machines and 3D printers, and also rendering any common file format – such as Gerber or G-code that uses a mix of single letter characters and numbers to represent machine control instructions.

The MSP430 is only the start of the project, hopefully leading on to custom stack processors that execute Forth primitives directly.


Coming Up For Air


2017 has been an eventful year so far, and for the first time in ages, I have been able to pause to take stock of these events. I write this from my quiet seaside dwelling in Hove – actually.

As of 16th November, I will no longer be a PAYE Employee – so it’s back to freelancing and picking up whatever work interests me and pays the bills.

This is a self imposed change – and it’s only the 3rd time in my 30 year career in electronic engineering that I have pulled the ripcord and bailed out of an otherwise steady job.

Whilst I will continue to pick up some part time contract work from my current employer, and wrap some things up in an orderly manner, the plan is to focus on my own project goals for at least two days a week.

In order to transit gently into this new way of working – I have decided to take a break, and so I am off to San Francisco on Thursday for 12 days.

During this trip, I will be reacquainting with old friends and colleagues, and attending the annual Forth Day – which is held on the Stanford University Campus – by the Silicon Valley Forth Interest Group.

As well as meeting up with some of the Forth group I will be doing a  short presentation to introduce some of the things that I have worked on this year, plus some projects from other friends in the European Forth Community.

It’s a tall order, but somehow I have to condense some fairly broad subjects into a concise 18 minute presentation – so, in a rather tenuous manner, that’s why this blog begins with a picture of the Forth Bridge.

In this post I collect together some of the threads of various projects that I have started in the last 12 months – but for a variety of reasons have not yet been taken to conclusion.

I am weaving these various threads together to make a stronger fabric – or canvas – on which I am going to paint a whole new strand of embedded technology. It’s taken all summer, the UK MegaTour with Toby Yu, the forced escape from my day job, and a bit of time to rationalise and coalesce some ideas – and these will form the basis of my presentation at Forth Day.

The Ideal Forth Microcontroller?

Earlier this year I began working with simple microcontrollers, in particular the MSP430 with non volatile ferro-electric memory or FRAM.

The MSP430 makes a good choice as a Forth machine as a result of its 16 bit architecture, its non-volatile FRAM memory and the inclusion of an instruction that allows the $NEXT macro of the inner interpreter to be reduced to a single instruction:

mov @ip+,pc

This indirect, autoincrement memory read instruction efficiently implements the $NEXT macro – and makes the direct threaded model for Forth superior to other threading models.
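To see what that one instruction does, here is a toy Python model of a direct threaded inner interpreter. It is only a sketch under obvious assumptions: Python functions stand in for machine code addresses, a list stands in for memory, and a central loop plays the part of the NEXT that each real primitive would end with. All the names are made up.

```python
def run_thread(thread, stack):
    """Walk a thread of primitives the way `mov @ip+,pc` does."""
    ip = 0                       # interpreter pointer into the thread
    while ip < len(thread):
        code = thread[ip]        # @ip : read the next code address
        ip += 1                  # +   : auto-increment the pointer
        code(stack)              # pc  : jump to that primitive
    return stack

# Three toy primitives operating on the data stack.
def lit5(stack):
    stack.append(5)

def dup(stack):
    stack.append(stack[-1])

def mul(stack):
    stack.append(stack.pop() * stack.pop())

print(run_thread([lit5, dup, mul], []))
```

The fetch, increment and jump are three separate steps here; on the MSP430 they collapse into the single instruction shown above.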

Key details of this direct threaded model and MSP430 eForth are now available in Dr. C.H. Ting’s excellent book “Zen and the Forth Language” – now available from Amazon – and I am indebted to Juergen Pintaske for making this publication available to the wider Forth Community. Dr Ting explains the inner working of the eForth implementation on the MSP430 in a most readable fashion and thus makes this key information available to a much wider audience.

Zen and the Forth Language: EFORTH for the MSP430 from Texas Instruments by [Ting, Dr. Chen-Hanson]

The MSP430 makes an excellent choice for a Forth Computer using the direct threaded code model. It already has a 16 bit architecture, and its von Neumann, unified memory space makes dealing with data and instructions somewhat simpler. Its choice of low power, non volatile FRAM memory makes it virtually unique in the microcontroller market.

FRAM has a fast write cycle – about 100 times the speed of Flash – and with a life of 10E+14 cycles it will not wear out anytime soon. EEPROM is made redundant with FRAM, and the boundary between FRAM and SRAM can be moved – so that FRAM is used as RAM – allowing up to 8MHz operation.

The MSP430FR5994 with its massive 256K of FRAM is a new addition to the FRAM range of microcontrollers and at $3.76 (1000+) it is very affordable. Add to this an external SRAM memory and you have the makings of a powerful little machine. With volume production, it’s possible to have a complete Forth system for around $5 – including the USB programming adaptor.

Performance wise, it’s not in the ARM class, not even close, but at 16MHz full speed, it will run about 4 times the throughput of the Arduino R3.

One of its strengths is that it has been blessed with 3 communications ports – allowing a rich mix of asynchronous UART and synchronous SPI hardware to be added.

The MSP430 may be programmed with a variety of Forths, including Mecrisp, Amforth and 4E4th – the latter having a special port of Camel Forth available specifically for FRAM operation.

Recently I came across Fast Forth by Jean-Michel Thoorens – this is Direct Threaded Code, and has support for SD card.  Programming is done making use of TeraTerm for file sending – at a full 921600 Baud.   It was designed specifically for MSP430 with FRAM – needing just 8K bytes of FRAM.

Forth was designed as a complete, self contained tool-chain – including Editor, Compiler and Assembler – all the tools that you need to develop code, at your fingertips – present on the target chip. Now for the first time, with ChipStick we can have all of this on a tiny, manageable 20 pin DIL module – complete with its own USB programming interface.


ChipStick was my first venture into creating an MSP430 FRAM board in the form of a 20 pin DIL module that could replace the MSP430G2553 that comes in the Value Line Launchpad.

It features a detachable programming section, USB serial interface and 128K of SPI SRAM – which can be powered or backed up by supercapacitor or LiPo cell.

The early prototypes were created in March and sent out to various Forthers around the UK and Europe.  As a result there are now three ports of Forths available for this tiny design.

Fast Forth – Jean Michel Thoorens

Mecrisp – Matthias Koch

Camel Forth for MSP430 FRAM – by Michael Kalus

Additional support came from Juergen Pintaske, Dirk Bruehl, Lars Brinkhoff and Mark Willis.

Now that the MSP430FR2433 is in volume production – this prototype – slightly enhanced can go into production.




Nanode 24

This year I am relaunching my Nanode Design and Technology company with a range of new board designs.

The tiny MSP430FR2433 used on ChipStick is now commercially available – whereas in March it was only available in sample quantities. This means that I can push ahead with ChipStick and other related designs – including the even smaller Nanode 2016. This tiny module has the MSP430FR2433, which is $1.37 in production quantities, and the module can be fitted with external SPI RAM or FRAM – up to 128Kbytes. If you need a capable microcontroller in a small space – then this will do much more than the Arduino Nano.

Inspired by my colleagues in France and Germany with their ports of Forth to the MSP430, and by my own efforts with the diminutive ChipStick running SIMPL in March and April, I was pleased to see in July that the long awaited 256Kbyte FRAM MSP430 had become available, and I had a couple shipped across from the US. This little $16 board has to be the preferred platform for anyone serious about experimenting with Forth and other stack based languages – on a mainstream commercially available microcontroller.

  • 35 lines of GPIO brought out to Launchpad headers
  • 2 User LEDs and 2 switches
  • 256K bytes of FRAM!
  • Low Energy Accelerator – a maths coprocessor
  • microSD card
  • Super Capacitor
  • Very low energy with energy monitoring feature
  • Programming interface with USB serial connection

Talking to small microcontrollers has traditionally been done with a serial terminal link, a text editor or an Integrated Development Environment (IDE).

I have been ruminating on a new idea, which I hinted at earlier in the Summer – which I am now going to call “Forth Bridge”. It’s a way of connecting widely differing computer or processor systems using Forth like commands to convey information and instructions.

The idea arose from the work that James Bowman and I did on the FT812 EVITA graphics processor board back in January.  James successfully applied the FT812 to his Gameduino II product – allowing a high performance handheld gaming device to be driven using an Arduino.  This is possible because all of the hardware needed to support the graphics of the LCD is provided in the FT812 – and the Arduino need only supply a list of commands in order to generate the graphics display – and not the actual RGB pixel signals.

This offloading of the video to the video co-processor is something that has been done in home computers and PCs since the early 1980s, but the FT812 does this in a $5 chip – effectively reducing the 65MHz bandwidth video signal down to a list of video control commands sent over SPI at a few tens of kilobaud. This puts it firmly within the grasp of any small microcontroller. I reworked James’s ideas so that the FT810 could drive a large LCD monitor at up to 1024×768 pixels – and thus my little EVITA board was born.

The next part of the jigsaw was to realise that all that Gameduino and EVITA are doing is connecting two computers together – one is a general purpose 8-bit microcontroller and the other is a machine dedicated to creating high bandwidth streams of pixels and rendering images. The 32 bit SPI commands that are sent between them are a shorthand means of forcing the co-processor to execute its graphic generation primitives – so one computer is controlling another computer over a communications link. Whilst the ICs are generally only a few centimeters from each other – and SPI is the most appropriate comms interface – they could be in different rooms or continents, linked by RS232, ethernet or any other practical comms link.

The next piece of the puzzle was J-M Thoorens’ Fast Forth – a direct threaded MSP430 FRAM Forth, which he supports with a simple serial UART connection to a PC. The only difference is that he is able to send Forth source from the PC text editor to the target at 6Mbaud using a low cost serial UART board that costs about a dollar from China (CP2104). Whilst the source code is generally sent from the PC to the target, there is no reason why debug and other interactive data cannot be sent in the reverse direction at a similar speed, and then be rendered by the PC, tablet or whatever to produce a highly interactive video display. If the PC emulates the command set of the FT810 – then the target can either talk to a real on board FT810 or a PC – without changing the command set.

This then allows a whole host of interactive devices to be created from low cost microcontrollers fitted with low cost LCDs.  You may recall that I mentioned in the Summer that a 7″ touch screen is now about $30.

Commands, communications, computers talking to each other? This sounds like a job for Forth. An intelligent Forth, that runs across all platforms, that if I were naming it – I’d probably call it iForth. So I googled it, and I see that it already exists – and is well into its 21st year!

That’s good – it makes the PC end so much simpler if there’s already a tool that can do all the heavy lifting. So we can host a GUI, an IDE and whatever cross compilers are needed on any platform from a smart phone, tablet, laptop or PC. And just reading that last sentence – the word GUIDE leaps out of the text. A graphical user interactive development environment…….

We can also start to explore some of the other ideas I hinted at in the summer. New ways to write code and display the inner workings of the processor we are targeting. Now that the target can send a lot more than a hex-dump or source code listing to the PC, we can start to look at new ways of understanding the way the code sits and runs on the target. We can interact with the processor in whole new ways to better grasp the application and see much further into the logic processes. And in this respect – it makes an excellent platform for education. More on this in a later post.

So we have a PC or tablet platform running an iForth toolset, and a useful “off the shelf” educational target in the form of the latest MSP430 FRAM launchpad running Fast Forth. The third thing in this unholy alliance is to extend these principles to custom Forth processors – and this is the main thrust of my involvement this year.



Forth and FPGAs

myStorm is a low cost FPGA and ARM dev board – aimed at makers and hobbyists. It consists of a Lattice ICE40HX4K FPGA and an STM32L433 microcontroller.  There is also a 256K x16 SRAM closely coupled to the FPGA and a USB programming interface.

James Bowman has already ported his J1a Forth processor to the myStorm board and Matthias Koch has ported his Mecrisp Forth to the STM32 ARM Cortex-M4 microcontroller. Now we have two very different processors, both running Forth – and coupled with a high bandwidth link. This is where things start to become exciting.

Whilst the ARM M4 is a very capable processor in its own right, its main role is to provide a standard set of peripherals, interfaces and a 5 Msps ADC, and make these available to the FPGA. The user can then focus on the design of the FPGA, knowing that the peripherals are standard – and won’t need to be synthesised within the FPGA. The user can then develop his ideas in the knowledge that all the support hardware is taken care of.



Since the early 1960s, computer engineers have been connecting widely different computing systems together – sometimes this might be several machines on a network, but initially it started with just a complementary pair of machines connected to achieve a specific goal.

This idea of connecting two widely different computers harks back to the time of the first commercial transistorised machines – using RTL, DTL and eventually TTL to implement the architectures.

Digital Equipment Corporation (DEC) and Control Data Corporation (CDC) are typical of the players at this time – working on a class of machine which became the first mass-market, (relatively) affordable mini-computers.

I choose these machines because they were new, stripped down 12 bit architectures – with the emphasis on reducing the complexity and cost of the system.

The CDC 160 series of 12 bit processors – reputedly designed over a long weekend by Seymour Cray in 1960 – was created initially to serve as an I/O processing front end to the CDC 1604 mainframes, and was later modified in the mid-1960s (CDC 160A) to be used as Peripheral Processors to the larger CDC 6000 series mainframe.

In a similar way, when Digital sold first a PDP-1 and later a PDP-4 to the Canadian Atomic Energy as a reactor controller – Gordon Bell and Edson de Castro conceived and implemented the PDP-5 (the forerunner to the PDP-8) – that was used as an analogue data capture front-end at their Chalk River Nuclear labs.

It’s with this heritage that I anticipate that the myStorm board will be used – providing the means to use the FPGA to capture and generate high speed data – for digital oscilloscope or logic analyser projects, and then use the ARM to package all this data up into a form that can be used to drive the display of a tablet or PC.

As you can see – the streams are converging – and we live in interesting times.







