James Bowman developed the J1 Forth CPU in 2010 – and presented a paper describing its architecture at EuroForth that same year. It’s a stack-based processor – capable of executing most Forth primitives in a single cycle. It’s also an ideal candidate to explore stack machine and MISC architectures – and has been ported to FPGAs using entirely open source tool chains.
This ongoing project was inspired by ideas first discovered in the book “The Elements of Computing Systems” – and the associated course “From NAND to Tetris”. Having ploughed easily through the first chapter on hardware, I realised that my software skills were dreadfully lacking – so I thought I’d create a simple language (SIMPL) from scratch and get it to run on an open source FPGA platform – myStorm – using Clifford Wolf’s Project IceStorm open source tool chain.
In this post I explore the J1 ISA (Instruction set & architecture) and hack together a simple cross-compiler to execute code on a simulated J1 processor.
The J1 Architecture.
Having read James Bowman’s J1 Paper several times over, I managed to build up a simplified model of his instruction set and architecture.
J1 uses 16-bit instruction words – where each word is divided into fields of different lengths. The following 3 images – taken from James’s 2010 EuroForth paper – explain the architecture:
Bits 15, 14 and 13 define the Instruction Class – and there are 5 classes of instruction:
1 x x Literal
000 Jump
001 Conditional Jump
010 Call
011 ALU instruction
Bit 12 if set in ALU mode provides the return from subroutine mechanism by loading the top of the return address stack R into the PC.
Bits 11,10,9 and 8 define a 4 bit ALU opcode – allowing up to 16 arithmetical, logical and memory transfer instructions.
Bits 7,6,5 and 4 are used to control the data multiplexers – so that data can be routed around the cpu according to which of these bits are set. Here lies a little anomaly with the J1, in that Bit 4 is not used, and it would seem more logical to use it to provide the return function bit – currently done in bit 12.
Bits 3 and 2 define how the return stack pointer is manipulated in an instruction. It can be incremented or decremented by setting these bits.
Bits 1 and 0 define the manipulation of the data stack pointer – it has a range of +1,0,-1,-2 depending on the setting of these bits. To pop off the stack you subtract 1 from the stack pointer, to push on the stack, you add one to the stack pointer.
As the various control fields of the J1 instruction exercise different parts of hardware – they can operate in parallel – so for example a return or exit from subroutine can be had for free.
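As a rough sketch, the field layout described above could be unpacked in C as follows. The type and function names (j1_fields, j1_decode, signed2) are made up for illustration, and the Jump (000) and Call (010) class codes and the 13-bit jump target are taken from my reading of the J1 paper rather than from the simulator code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Instruction classes: bit 15 alone marks a literal, otherwise bits 15..13. */
typedef enum { J1_LITERAL, J1_JUMP, J1_CJUMP, J1_CALL, J1_ALU } j1_class;

/* Hypothetical container for the decoded fields of one instruction. */
typedef struct {
    j1_class cls;
    uint16_t value;   /* 15-bit literal, or 13-bit jump/call target */
    bool     ret;     /* bit 12: load PC from top of return stack */
    uint8_t  alu_op;  /* bits 11..8: ALU opcode */
    uint8_t  mux;     /* bits 7..4: data multiplexer controls */
    int8_t   rstack;  /* bits 3..2: return stack pointer change */
    int8_t   dstack;  /* bits 1..0: data stack pointer change */
} j1_fields;

/* Read a 2-bit field as two's complement: 00 -> 0, 01 -> +1, 10 -> -2, 11 -> -1. */
static int8_t signed2(uint16_t bits)
{
    bits &= 3u;
    return (int8_t)((bits & 2u) ? (int)bits - 4 : (int)bits);
}

static j1_fields j1_decode(uint16_t insn)
{
    j1_fields f = {0};

    if (insn & 0x8000u) {               /* 1 x x : literal */
        f.cls = J1_LITERAL;
        f.value = insn & 0x7FFFu;
        return f;
    }
    switch (insn >> 13) {
    case 0:  f.cls = J1_JUMP;  break;   /* 000 */
    case 1:  f.cls = J1_CJUMP; break;   /* 001 */
    case 2:  f.cls = J1_CALL;  break;   /* 010 */
    default: f.cls = J1_ALU;   break;   /* 011 */
    }
    if (f.cls != J1_ALU) {
        f.value = insn & 0x1FFFu;       /* branch target */
        return f;
    }
    f.ret    = (insn & 0x1000u) != 0;
    f.alu_op = (uint8_t)((insn >> 8) & 0xFu);
    f.mux    = (uint8_t)((insn >> 4) & 0xFu);
    f.rstack = signed2(insn >> 2);
    f.dstack = signed2(insn);
    return f;
}
```

Calling j1_decode(0x6503), for example, gives an ALU instruction with opcode 5, no multiplexer bits set and a data stack change of -1.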
Modifying the J1 Instruction Set.
Whilst the J1 instruction set is neat and compact – the anomaly in Bit 4 is a bit of a sticking point with me. If we put the “return bit” into the Bit 4 field, this would free up the Bit 12 field. Bit 12 could then be used to define more classes of instruction, or better still added to the 4 bit ALU instruction field, to make the number of op-codes 32 rather than 16.
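To make the proposal concrete, here is a small C sketch of what such a modified ALU encoding might look like. This is purely hypothetical – it is not James Bowman's encoding, and the names alt_alu and j1_to_alt are invented for the example. It assumes the return bit moves to Bit 4 and Bit 12 becomes the top bit of a 5-bit ALU opcode.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical modified ALU layout (an assumption, not the real J1):
 *   bits 15..13  011, the ALU class (unchanged)
 *   bits 12..8   5-bit ALU opcode (was 4 bits in 11..8)
 *   bits 7..5    multiplexer controls (unchanged)
 *   bit  4       return bit (moved here from bit 12)
 *   bits 3..0    return and data stack pointer fields (unchanged)
 */
static uint16_t alt_alu(uint8_t op5, uint8_t mux3, bool ret,
                        uint8_t rfield, uint8_t dfield)
{
    return (uint16_t)(0x6000u
        | ((op5 & 0x1Fu) << 8)
        | ((mux3 & 0x7u) << 5)
        | (ret ? 0x0010u : 0u)
        | ((rfield & 0x3u) << 2)
        | (dfield & 0x3u));
}

/* Re-encode an original J1 ALU-class instruction in the modified layout.
 * Only meaningful for words whose top three bits are 011. */
static uint16_t j1_to_alt(uint16_t insn)
{
    bool ret = (insn & 0x1000u) != 0;               /* original return bit */
    uint16_t out = (uint16_t)(insn & 0xEFEFu);      /* clear bits 12 and 4 */
    return ret ? (uint16_t)(out | 0x0010u) : out;   /* return bit now in bit 4 */
}
```

Under this scheme an original word with Bit 12 set, such as 700C, would become 601C, while the 16 existing opcodes keep their values and opcodes 16 to 31 become available.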
If there were two J1 cores acting together in a single FPGA – bit 12 could be used to channel instructions to either core. One J1 could be used for computation and the other could be used as a graphics processor or similar specialised co-processor.
James has already made some changes in this area – moving the return bit to Bit 7 – plus some other internal improvements – and has subsequently released a J1b – which can be instantiated as either a 16 bit or 32 bit Forth cpu, whilst the J1a is a minimum footprint, cut down 16-bit J1 with *K memory to fit into small FPGAs – like the Lattice IceStick. See James’s Github for demo. The Verilog for the new J1b – is here
For more information about possible modifications to the J1 – read Victor Yurkovsky’s excellent post in “FPGA Related”. It has some neat ideas for those that want to tinker with the internals of the J1. As the original J1 Verilog design is so neat and clean, it’s easy to make small incremental changes and test them out for usefulness on the J1 simulator. Victor’s spring clean freed up two bits of the instruction for future expansion and reduced the number of CPU slices from 150 to 76 – allowing scope for multiple J1 instantiations in a single small FPGA.
Creating a Simulator and SIMPL to J1 Cross Compiler.
I came across a very compact simulator for James Bowman’s J1 cpu – written in about 100 lines of C. I decided to port this across to an ARM based Arduino Due, so I could use it to explore the architecture further, as a step on the road to implementing a J1 on a myStorm FPGA board.
I have put the Arduino code for this SIMPL cross-compiler/ J1 Simulator on this Github Gist
As an input the simulator expects 16 bit J1 machine code instructions in an array of RAM, and the bulk of the code is an execution model of the cpu which steps through this section of memory executing each instruction in turn. After each program step I chose to print out the main stack elements, the program counter, and a few memory cells.
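The following is a minimal sketch of that kind of execution model – it is not the code from the Gist. It makes several simplifying assumptions: memory is a 64-word array addressed by word, both stacks are 32 deep, the jump, call and multiplexer bit meanings follow my reading of the J1 paper, and only the ALU opcodes that appear in this post are implemented. The names j1_cpu and j1_step are made up for the example.

```c
#include <stdint.h>

#define MEM_WORDS   64   /* assumed memory size, for illustration only */
#define STACK_DEPTH 32   /* assumed stack depth (must be a power of two) */

typedef struct {
    uint16_t pc;                /* program counter, as a word address */
    uint16_t t;                 /* TOP: top of the data stack */
    uint16_t d[STACK_DEPTH];    /* data stack below TOP; d[dsp] is N */
    uint16_t r[STACK_DEPTH];    /* return stack; r[rsp] is R */
    uint8_t  dsp, rsp;          /* stack pointers */
    uint16_t mem[MEM_WORDS];    /* program and data memory */
} j1_cpu;

/* 2-bit stack field as a signed change: 00 -> 0, 01 -> +1, 10 -> -2, 11 -> -1 */
static int delta2(uint16_t bits)
{
    bits &= 3u;
    return (bits & 2u) ? (int)bits - 4 : (int)bits;
}

static uint8_t wrap(int sp) { return (uint8_t)(sp & (STACK_DEPTH - 1)); }

/* Execute the instruction at PC and advance the program counter. */
static void j1_step(j1_cpu *c)
{
    uint16_t insn = c->mem[c->pc % MEM_WORDS];
    uint16_t t = c->t, n = c->d[c->dsp], r = c->r[c->rsp];
    uint16_t target = insn & 0x1FFFu;
    uint16_t next = (uint16_t)(c->pc + 1);

    if (insn & 0x8000u) {               /* literal: push the 15-bit value */
        c->dsp = wrap(c->dsp + 1);
        c->d[c->dsp] = t;
        c->t = insn & 0x7FFFu;
    } else switch (insn >> 13) {
    case 0:                             /* jump */
        next = target;
        break;
    case 1:                             /* conditional jump: pop T, branch if it was 0 */
        if (t == 0) next = target;
        c->t = n;
        c->dsp = wrap(c->dsp - 1);
        break;
    case 2:                             /* call: push the return address */
        c->rsp = wrap(c->rsp + 1);
        c->r[c->rsp] = next;
        next = target;
        break;
    default: {                          /* ALU instruction */
        uint16_t res;
        switch ((insn >> 8) & 0xFu) {
        case 0x0: res = t; break;
        case 0x1: res = n; break;
        case 0x2: res = (uint16_t)(t + n); break;
        case 0x3: res = t & n; break;
        case 0x4: res = t | n; break;
        case 0x5: res = t ^ n; break;
        case 0x6: res = (uint16_t)~t; break;
        case 0xC: res = c->mem[t % MEM_WORDS]; break;
        default:  res = t; break;       /* remaining opcodes omitted in this sketch */
        }
        if (insn & 0x1000u) next = r;                   /* bit 12: R -> PC */
        if (insn & 0x0020u) c->mem[t % MEM_WORDS] = n;  /* bit 5: N -> [T] */
        c->dsp = wrap(c->dsp + delta2(insn));
        c->rsp = wrap(c->rsp + delta2(insn >> 2));
        if (insn & 0x0080u) c->d[c->dsp] = t;           /* bit 7: T -> N */
        if (insn & 0x0040u) c->r[c->rsp] = t;           /* bit 6: T -> R */
        c->t = res;
        break;
    }
    }
    c->pc = next;
}
```

Loading mem with 8003, 8004 and 6203 and calling j1_step three times pushes two literals and adds them, leaving 7 in TOP. A debug print of pc, t and a few stack and memory cells after each call would give the kind of per-step trace described above.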
However, the process of hand assembling J1 machine language needs patience and knowledge of the chip’s architecture, and to write half a dozen native instructions to execute a simple loop counter was about as much as I would recommend for anyone. So I decided to investigate easier ways to automate this process, and hit upon the idea of a cross-compiler, using my SIMPL language and text interpreter framework to input the source code.
SIMPL uses single ascii character tokens to represent primitive machine instructions – I just needed to create a look-up table mechanism where these characters are decoded into whatever the machine language of the host processor happens to be – in this case J1. However it could be any other cpu – just by changing the contents of the look-up table, so a series of cross compilers could be written so that SIMPL could be hosted virtually (no pun) on any processor.
SIMPL exists as a virtual, stack oriented processor, hosted on a real hardware cpu. More often the host processor is register based – such as ARM or MSP430 – but in the case of the J1 it is also a stack machine (albeit simulated on a register based ARM). This means that the minimal set of SIMPL primitive instructions is a very good fit to actual J1 instructions – often with a 1 to 1 match. This makes a real J1 an excellent host for the SIMPL language – and it is this relationship that I wish to explore further.
In previous posts I have described SIMPL as a lingua franca – which allows the communication of programs and data between widely varying classes of computing machines. Each machine just needs to host a SIMPL interpreter, and at the start of a communication exchange, the communicating machine needs to send a descriptive list of what the various words mean – in terms of the primitives that go to make up their definitions. Once this has been done, SIMPL becomes a very compact medium for transferring code, data and objects – with a code compaction of between 3 and 4 times over other source code or object code techniques.
I took the SIMPL text interpreter framework and mapped into it the J1 instructions, focusing initially on the stack, ALU and memory operations – listed below
SIMPL   Operation   J1 hex code
"       DUP         6081
'       DROP        6183
$       SWAP        6180
%       OVER        6181
+       ADD         6203
&       AND         6303
|       OR          6403
^       XOR         6503
~       INV         6600
@       FETCH       6C00
!       STORE       6123
As can be seen from the above – they all begin 6xxx which means ALU operation – and the 2nd significant digit is the 4 bit ALU opcode. The 3rd digit is the transfer destination of the ALU result, and the least significant digit is the stack manipulation bits.
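The look-up table idea can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the Gist: the opcode values are copied from the table in this post, the DUP and DROP tokens are assumed to be the plain ASCII double and single quote characters, and the names simpl_op, simpl_lookup and simpl_compile are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: SIMPL token, operation name, J1 machine code. */
typedef struct {
    char        token;
    const char *name;
    uint16_t    code;
} simpl_op;

/* Values copied from the table in this post. */
static const simpl_op simpl_ops[] = {
    { '"',  "DUP",   0x6081 },
    { '\'', "DROP",  0x6183 },
    { '$',  "SWAP",  0x6180 },
    { '%',  "OVER",  0x6181 },
    { '+',  "ADD",   0x6203 },
    { '&',  "AND",   0x6303 },
    { '|',  "OR",    0x6403 },
    { '^',  "XOR",   0x6503 },
    { '~',  "INV",   0x6600 },
    { '@',  "FETCH", 0x6C00 },
    { '!',  "STORE", 0x6123 },
};

/* Return the table entry for a token, or NULL if it is not a known primitive. */
static const simpl_op *simpl_lookup(char token)
{
    size_t count = sizeof simpl_ops / sizeof simpl_ops[0];
    for (size_t i = 0; i < count; ++i) {
        if (simpl_ops[i].token == token)
            return &simpl_ops[i];
    }
    return NULL;
}

/* Translate a string of SIMPL tokens into J1 opcodes.
 * Unknown characters (spaces, for instance) are skipped.
 * Returns the number of opcodes written to out. */
static size_t simpl_compile(const char *src, uint16_t *out, size_t max)
{
    size_t n = 0;
    for (; *src != '\0' && n < max; ++src) {
        const simpl_op *op = simpl_lookup(*src);
        if (op != NULL)
            out[n++] = op->code;
    }
    return n;
}
```

Retargeting SIMPL to a different cpu would then only mean swapping the contents of simpl_ops, which is the appeal of keeping the mapping in a table rather than in code.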
The cross compiler produces a line of output thus – in response to a single ascii character being typed in – in this case ^ for XOR
XOR 6503 PC=9 TOP=FFFF DS0=0 DS1=0 DS2=0 DS3=0 RTN=0 MEM20=0 MEM21=6F00 MEM22=0
This also shows the Program Counter PC, the Top of Stack, the data stack entries below TOP (DS0 to DS3), the value on the top of the return stack and three consecutive locations in memory. This debug output is essential to ensure that the J1 simulator is correctly executing each new instruction – and gives a great visual insight into the working of the CPU.
The cross compiler is now producing output in J1 machine language – and because it uses the SIMPL front end text interpreter, this automates the process of putting text strings together so it can be used to compose small snippets of J1 code and run them in an interpreted manner – allowing new words to be created from SIMPL primitives.
Future work will enhance the cross compiler allowing the larger program flow constructs such as DO-LOOP, IF-THEN and BEGIN-END to be implemented. Whilst output from the simulator is currently just single lines of Debug – the program will be extended so as to compile into files of J1 opcodes, and SIMPL source.
“It’s Forth-like, Jim – but not as we know it….”